Spanner Tiered Storage

Built on the ones who came before: Tiers to Tears

What is it?

Tiered Storage allows us to move our data from expensive and very performant SSDs to less-expensive and less-performant HDDs.

Who cares?

If you’re here reading this doc, and are a member of a data-services team who owns one or more Spanner databases, then YOU should care. Spanner is great, but it’s expensive, and our friends (overlords?) at Google have given us a way to lower those expenses by enabling Tiered Storage.

How do I do it?

[[#Step 1. Commit Timestamps|Enable Commit Timestamps]]
[[#Step 2. Timestamp Predicate Pushdown|Enable Timestamp Predicate Pushdown]]
[[#Step 3. Locality Group|Modify the Default Locality Group]]

According to GCP docs, all you need to enable Tiered Storage is to create or modify an age-based policy for a locality group.

However, teams(Curry) ran into some major performance issues after enabling spillover to due to some unexpected behavior by spanner, i.e. scanning data outside of a timestamp bounded query which caused slow reads from HDD on every request.

To avoid this there are some optimizations we recommend everyone make before enabling Tiered Storage. These optimizations will make sure spanner only scans the rows within the time-bounds of the query.

For the below steps, we recommend making them in lower environments before enabling on Production.

Step 1. Commit Timestamps

We can optimize time-based queries using commit_timestamps by adding them to a TIMESTAMP column in our Spanner table. You’ll want to use whatever column defines the timestamp for the initial creation of the row. This is done via a manual query or preferably during a migration of the database. You’ll set this per-table.

ALTER TABLE Table_Name
ALTER COLUMN Timestamp_Column SET OPTIONS (allow_commit_timestamp = true);

Step 2. Timestamp Predicate Pushdown

To take advantage of the commit_timestamp we need to provide our queries a sql hint to force spanner to read the rows within the time-bounds of the query initially instead of later in the query process. To any sensitive time-bound queries we need to add {@allow_timestamp_predicate_pushdown=TRUE} to the beginning of our query. Ex:

  @{allow_timestamp_predicate_pushdown=TRUE}
  SELECT s.SingerInfo
  FROM Singers s
  WHERE s.ModificationTime > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 12 HOUR);

There may be cases where this hint could negatively affect performance, which are discussed more in-depth in the GCP docs here.

Step 3. Locality Group

For our use-case of “spilling over” data of a certain age to HDD, we can get by modifying the default locality group and setting our spill timespan to whatever best fits our situation.

ALTER LOCALITY GROUP `default` SET OPTIONS (ssd_to_hdd_spill_timespan = '365d');

What timespan should I use?

This is context dependent. The spillover timespan should be set to some threshold after which the data in your Spanner table is very infrequently accessed. We recommend starting higher than you think, and ratcheting down as you monitor for unexpected performance changes.