Getting progress out of a Redis SCAN cursor

14 Jun 2020 - 5 minute read

tl; dr

The cursor is not a random number, but is actually a reverse-bitwise number that indicates the index on a hash table, where redis stores the keys for each database. By reversing the cursor, you can see your progress while SCANning!

See the code and comments below if you’re in a hurry, or see the explanation below if you want more insights.

function reverseBits($cursor, $pow) {
    //This is probably not the fastest way to invert a binary
    //number. If you want speed, you could probably do a conversion 
    //from little to big endianness

    //Convert from decimal to a binary string
    $bin = decbin($cursor);

    //Pad the string on the left in order to have $pow chars
    $bin = str_pad($bin, $pow, "0", STR_PAD_LEFT);

    //Reverse string, then convert back to decimal
    return bindec(strrev($bin));
}
function scanProgress($cursor, $totalKeys) {
    //Each database in redis is a hash table with size 2^x, depending
    //on how many keys are present. To find x, we need to find the 
    //closest next power of two 
    $nextPowerOfTwo = ceil(log($totalKeys)/log(2));

    //Calculate the hash table size
    $hashTableSize = pow(2, $nextPowerOfTwo);

    //Get how many keys were read from the cursor by reversing 
    //the binary representation of the cursor
    $hashTableIndex = reverseBits($cursor, $nextPowerOfTwo);

    //Calculating percent is now easy, the proportion 
    //is $hashTableIndex : $hashTableSize = x : 100
    $percent = $hashTableIndex * 100 / $hashTableSize;
    return $percent;
}

//To use:
$redis->select(10);
$totalKeys = $redis->dbSize();
$cursor = 0;
while (false !== ($keys = $redis->scan($cursor))) {
    $percent = scanProgress($cursors, $totalKeys);
    echo "Iteration $cursor, progress $percent%";
}

function reverseBits(cursor, pow) {
    //This is probably not the fastest way to invert a binary
    //number. If you want speed, you could probably do a conversion 
    //from little to big endianness

    //Convert from decimal to a binary string
    let bin = cursor.toString(2);

    //Pad the string on the left in order to have $pow chars
    bin = bin.padStart(pow, "0");

    //Reverse string, then convert back to decimal
    return parseInt(bin.split("").reverse().join(""), 10);
}
function scanProgress(cursor, totalKeys) {
    //Each database in redis is a hash table with size 2^x, depending
    //on how many keys are present. To find x, we need to find the 
    //closest next power of two 
    const nextPowerOfTwo = Math.ceil(Math.log(totalKeys)/Math.log(2));

    //Calculate the hash table size
    const hashTableSize = Math.pow(2, nextPowerOfTwo);

    //Get how many keys were read from the cursor by reversing 
    //the binary representation of the cursor
    const hashTableIndex = reverseBits(cursor, nextPowerOfTwo);

    //Calculating percent is now easy, the proportion 
    //is keysRead : hashTableSize = x : 100
    const percent = hashTableIndex * 100 / hashTableSize;
    return percent;
}

//To use:
!async function() {
    await redis.select(10); //Select a db

    const totalKeys = await redis.dbSize();
    let cursor = 0;
    while (false !== (keys = await redis.scan(cursor))) {
        const percent = scanProgress(cursors, totalKeys);
        console.log(`Iteration ${cursor}, progress ${percent}%`;
    }
}()

How the cursor works

Let’s imagine we have a database with 200 keys. To store N keys, Redis reserves a hash table bigger than N, so that it doesn’t have to reserve a bigger or smaller space everytime you delete or add keys. Namely it reserves a number that follows the power of 2, so that the following is true: 2^x > N. In this case, the closest power of 2 bigger than 200 is 8 (math).

$totalKeys = $redis->dbSize();
$nextPowerOfTwo = ceil(log($totalKeys)/log(2));
$hashTableSize = pow(2, $nextPowerOfTwo);

So, in this case redis reserves a hash table as big as 2^8, that is 256 or in binary 100000000. As a result, the hash table of a 200 keys database goes from index 00000000 to index 100000000. Once you use more or less keys than that, Redis will increase or decrease that power of 2, with a process called rehashing, see below for this topic.

Back to the cursor, what Redis does is take each cursor and reverse the binary value: by doing so it obtains an actual index, one that in our example is between 00000000 to index 100000000. Starting from that index, Redis increase the index while looping over the hash table, until at least COUNT keys are found, then reverse the binary again, converting it to decimal and returning it to you. So for example, with the same database with 200 keys, SCAN 0 COUNT 10 could follow this flow:

COMMAND  BINARY     SWAPPED    NEW BIN   DECIMAL  SWAPPED   NEW CURSOR
SCAN 0   00000000   00000000   00001010  10       01010000  80
SCAN 80  01010000   00001010   00011110  30       01111000  120
SCAN 120 01111000   00011110   00101000  40       00010100  20
SCAN 20  ...

The index increase (column DECIMAL) is not linear because Redis can read multiple indexes until it finds COUNT keys, and in the same index there can be more than one key. That’s why Redis can sometimes return more keys than there are in COUNT.

Example of a Redis hash table

You can also see that all the binary numbers have 8 bits each, which is same as the corresponding power of 2, which is important to maintain the correct index when reversing using strings. 80 in binary is 1010000 (usually you can discard the leftmost 0), but inverted is 0000101 (5 in decimal). 80 in binary with left padding to 8 bits becomes 01010000 which inverted is 00001010 (10 in decimal form). You can see that it’s quite an error, just for a missing 0 😑.

$binary = decbin($cursor);

//After converting to binary, pad the binary string 
//to the correct number of bits ($pow)
$binary = str_pad($binary, $pow, "0", STR_PAD_LEFT);

//Here we have the actual hash table position!
$hashTableIndex = bindec(strrev($binary));

After reversing the bits of the cursor, you can do a simple proportion to see at which percent you’ve come along while using SCAN:

$percent = $hashTableIndex * 100 / $hashTableSize;
echo "Iteration $cursor, progress $percent%";

And you’ll have this:

Iterator 80, progress 3.90625%
Iterator 120, progress 11.71875%
Iterator 20, progress 15.625%

Max's Cookies