Making sense of Redis’ SCAN cursor

Why Redis returns those seemingly random numbers

At Q42 we build and maintain the Cloud of the Philips Hue lighting system. If you turn on your lights via Google Home or Alexa, this passes through the Hue Cloud. Sometimes it is necessary to perform some task for every bridge connected to the cluster (example task: synchronizing which bridges are connected to which cluster). We keep a record in Redis for every connected Hue Bridge, and it contains the address of the backend server the bridge is connected to. If we want to get the IDs of all connected bridges, we can traverse all keys in Redis. For this, Redis has two commands:

  • KEYS pattern (docs), returns all matching keys at once. This block the Redis server and you should ⚠️️️ NEVER DO THIS IN PRODUCTION ⚠️.
  • SCAN cursor [MATCH pattern] [COUNT count](docs), returns count keys (default = 10) and a seemingly random ‘cursor’ so you can get the next page of keys in a next request. This prevents blocking the server for too long.

This article is about that SCAN-command’s cursor.


Have you ever wondered what the cursor value means that Redis returns for SCAN commands? No? Read on! I set out to make sense of these seemingly arbitrary numbers.

SCAN cursor MATCH prefix:* COUNT 1000

Some time ago I was reading a StackOverflow article, that linked the documentation inside the Redis SCAN implementation. At the time, I only quickly skimmed the text. I remember thinking that it should be possible to come up with an estimate of how far along the SCAN is.

Today I ‘cracked’ the algorithm. It is not really that much of an achievement, given it is thoroughly documented, but figuring it out without the source code documentation still gave me a eureka moment. The result:

-> % node debug-redis-SCAN.js 21
INPUT   REVERSE-BITWISE       REVERSE VAL  % OF 2^21-1 (111111…)
858947  11000010110110001011  1596182      76.11%
1885267 110010100010001100111 1655911      78.96%
2008915 110010101110010101111 1662127      79.25%
1566163 110010111010011111101 1668349      79.55%
962867  11001100100011010111  1675694      79.90%
307123  1100110111110101001   1687204      80.45%
784031  11111001011011111101  2043386      97.43%

The table says it all if you look closely, but might seem a bit cryptic at first. I’ll explain what Redis does. Spoiler: it is as elegant as it is simple.

Suppose you have N keys, then Redis has reserved a hash-table that is a bit bigger than N, namely it reserves a power of 2 such that 2^X > N (read: such that it fits). In above result the X is 21, so the maximum number of ‘buckets’ the table fits is 2.097.151. Once you use more keys than that, Redis will double the table, more on that later. The minimal binary index of that table is `00000…` and the maximum is `11111…` (with N bits).

If you SCAN, you give it a number where to start, which defaults to 0. Redis will take this number, and revert the bits. Then it will read the keys at that index, increment the index, and loop, until the COUNT indexes have been tried. Suppose your table can hold 8 items (X=3, 2³=16). Lets see what happens if we SCAN with a COUNT of 1, Redis would use these values:

COMMAND  BITS SWAPPED  NEXT  SWAPPED  CURSOR RETURNED
SCAN 0   __0  000      001   100      4
SCAN 4   100  001      010   010      2
SCAN 2   _10  010      011   110      6
SCAN 6   110  011      100   001      1
SCAN 1   __1  100      101   101      5
SCAN 5   101  101      110   011      3
SCAN 3   _11  110      111   111      7
SCAN 7   111  111      000   000      0

‘Rehashing’

Here, we increase the cursor with 1 every time, by using COUNT 1. If we would instead use a larger COUNT, then Redis would scan the range of keys that are inside the `[START…START+COUNT]` range. Because it swaps the bit order, it can ‘survive’ situations where the table size is doubled/quadrupled. Example when using COUNT 2:

# no resize
COMMAND  BITS  SWAPPED  +1    +2    SWAPPED  CURSOR RETURNED
SCAN 0   __0   000      001   010   010      2
SCAN 2   _10   010      011   100   001      1
vs
# with resize
COMMAND  BITS  SWAPPED  +1    +2    SWAPPED  CURSOR RETURNED
SCAN 0   __0   000      001   010   010      2
====== table resizes to X=4 =========================
SCAN 2   __10  0100     0101  0110  0110     6
SCAN 6   _110  0110     0111  1000  0001     1

As you scan see from this demo, it doesn’t matter that the table resizes half-way the SCAN: it still reaches cursor 1, it only takes 1 extra iteration.

Ok, great. But how does this ensure we get every element at least once? Because when the table is resized, the hashing is ‘stable’. So a key that is stored at index 001 with x=3, will be stored at 0010 or 0011 with x=4. Important: the key keeps the prefix 001! And since we’re using this bit-ordering to walk through all keys, we’ll never miss a key when the table is rehashed.

Since a picture says more than a thousand words, here is a picture of what happens when we resize a x=2 table to x=4. Say we just got `[KEY2, KEY3]` and cursor 2 from Redis. If we would continue, we would get KEY1. But what happens if KEY5 is inserted first, and Redis rehashes: if we continue after the rehashing, we will get [KEY5] and cursor 6, and we’ll get [KEY1] the time after.

Rehashing from x=2 to x=3, cursor stays stable

If the table doubles, every row is ‘split’ and the existing cursors out in the client applications will point at the first of the 2 new rows.

Exploiting this knowledge

Given this knowledge, we can now do more stuff than we could before. For example:

  • if we know the approximate size of the table, we can guess X
  • given X, we can generate a cursor at approximate percentages (say: 0% & 50%) and parallelize the iteration
  • given a cursor and knowing X we can estimate how far along we are

In a follow-up article we’ll dive into how to do these things!

References


Check out our job vacancies (in Dutch) at werkenbij.q42.nl!