Mersenne Twister: The algorithm which fakes randomness really well

[README] Warning: Jargon heavy paper [README]

This paper is rather jargon heavy for non technical people and beginners who do not know much about cryptography or random number generation. So, here we cover most of the technical jargon and words used in this paper. Pseudo Random Number Generator (PRNG): An algorithm which generates statistically random-like numbers using deterministic logic
Seed: The initial value to start a PRNG
Internal State: A set of values used by the PRNG to generate further outputs
Diffusion: A property where a small change spreads out and affects many parts of the output
Avalanche effect: When a small change in input leads to a large, unpredictable change in output
Hash function: A function that converts input data into a fixed-size string of bytes
Cryptographic hash function: A hash function designed to be secure and suitable for cryptographic use for example SHA256, BlAKE2 etc
Cryptographically Secure Pseudo Random Number Generator (CSPRNG): A type of PRNG where predicting the future or past outputs in not possible
Bitwise AND: An operation that compares bits of two numbers and returns 1s only where both have 1s
Bitwise XOR: Bitwise exclusive OR which returns 1 if bits differ, 0 if they are the same
Bitshift: Moves bits left or right within a number
Hamming weight: The number of 1s in a binary representation of a number. High Hamming weight = many bits set as 1
You will also see many arbitary constant values in the code of the paper, a basic explanation of why those constants are used has been provided. All of them are the part of the MT19937 specification

Introduction

The Mersenne Twister (MT) is arguably the most well known pseudorandom number generator (PRNG) in existence. It was developed in the 1990s by Japanese cryptographers Makoto Matsumoto and Takuji Nishimura. Today, it is used as the default PRNG in languages such as PHP, Python and R.

Its name comes from the fact that it has period length 2^19937 -1, a Mersenne Prime

While MT has several variants, we will only be talking about the standard implementation ie the MT19937. MT19937 is a 32 bit generator and it is used by Python as its default generator.

A high level look at MT19937

Like all PRNGs, we start our journey at the seed value. It is the input a PRNG recieves from which it populates its internal state and finally generates the random numbers. Once the seed value is recieved, the generator extends it to populate its internal state.

MT19937 has an internal state array containing 624 elements, each element being a 32 bit integer. There exist many algorithms to convert the seed to the initial state, but we will be primarily focusing on how it is done in the Python default generator ie the random module.

In the random module, seeding can be done using random.seed(value-here) where the value can be anything, an integer, a float, a string or even a bytearray. First the seed undergoes preprocessing, where its type is determined and it is processed accordingly, and a key array is constructed from the seed.

After this the 624 element state array is constructed using a constant value. Once the base state is ready, the key array is diffused throughout the state. This process of "state mixing" takes place twice and then with a few minor changes, the initial state is ready. Once the initial state is ready, it undergoes the "twist" operation and then you can start generating random numbers.

Think of MT19937 as a gun and the random numbers as the bullets. With the initialization process, the parts have been assembled into the actual gun. Now think of the twist operation as putting in a magazine containing 624 bullets.

Every time MT19937 has the generate a new random number, it takes the integer at the i-th index. This integer now undergoes another operation known as "tempering", and the tempered value is the output you recieve. Once you have generated 624 random numbers, the internal state undergoes the twist operation once again , think of it as putting in another magazine after using all the bullets.

                    MT19937
                
1. Seed Input
Initialize with seed value
seed(5489)
32-bit integer
2. State Array
Populate 624 elements
state[0..623]
Extend seed into array
3. Twist Operation
Generate new state values
twist() → new_state[]
Updates all 624 elements
4. Extract Value
Get next state element
value = state[index]
Index increments each call
5. Tempering
Transform state value
temper(value) → output
Bit shifts & XOR operations
6. Random Output
32-bit random number
0x3A7F2B1E
Final pseudorandom value

Now we will deep dive into how each one of these operations is implemented.

Assembling the gun: Seeding and state initialization

As stated earlier, MT19937 can take any integer, float, string or a bytearray as its input. Python processes the seed using a mutli step process, first the type of the seed is determined.

If the seed type is INT (integer), it is used directly. If it is a float, then python uses a non cryptographic hash function to process it. If it is a string or bytearray, the python uses the SHA2-512 cryptographic hash function to convert it into an integer.

Once the pre-processing is done, the seed value is split into 32 bit chunks from right to left. For example if the seed after pre-processing is 0x909192939495969798, it is split into [0x95969798 , 0x91929394 , 0x90] which forms our key array consisting of 32 bit integers.

If initially no seed is provided ie None is used as the seed, then python uses system entropy to form the key array for the generator. This system entropy is OS dependent, for example the BCryptGenRandom() API call on Windows and /dev/urandom on Unix/Linux based systems. 624 random 32 bit integers are used to create the key array.

If system entropy is unavailable for some reason, python fallbacks to using the system time and process ID (PID) for seeding. It uses current time (2 x 32 bit values), monotonic time (2 x 32 bit values) and the PID (1 x 32 bit value) for a total of 160 bits of entropy in the best case, though the actual entropy is usually a lot lower.

INTEGER

12345

Used directly without modification

No processing

32-bit integer

FLOAT

3.14159

Processed using non-cryptographic hash function

hash(float)

32-bit integer

STRING

"hello"

SHA2-512 hash

SHA2-512(string)

32-bit integer

BYTEARRAY

b'\x01\x02\x03'

SHA2-512 hash

SHA2-512(bytes)

32-bit integer

NONE

None

Use system CSPRNG

OS entropy source

624 × 32-bit array

Once they key array is ready, the initial state is initialized using the starting constant of 19650218. This value forms the first cell of the 624 element internal state. To explain the key mixing procedure, we use the following variables:

Internal state --> mt (Array of 624 elements, each element being a 32 bit integer)
The initial state is constructed using a linear congruential relation like this:

mt = [0]*624 mt[0] = 19650218; for i in range(1 , 624): mt[i] = (1812433253 * (mt[i-1] ^ (mt[i-1] >> 30)) + i) & 0xFFFFFFFF

First ,mt is initialized as an array of 624 0s, then mt[0], the first element in the state is set to 19650218

Then a loop is run from 1 to 623, where each element's value is calculated as

mt[i] = (1812433253 * (mt[i-1] ^ (mt[i-1] >> 30)) + i) & 0xFFFFFFFF
mt[i-1] is the previous element
mt[i-1] >> 30 is the previous element but bit shifted to the right 30 times
(mt[i-1] ^ (mt[i-1] >> 30)) is the previous element XORed with its bitshifted version
(1812433253 * (mt[i-1] ^ (mt[i-1] >> 30)) + i) it is then multiplied with a constant value of 1812433253 and the index value is added to it
(1812433253 * (mt[i-1] ^ (mt[i-1] >> 30)) + i) & 0xFFFFFFFF finally the value undergoes bitwise AND with 0xFFFFFFFF to ensure that it remains a 32 bit integer

After the initial state is ready, the first round of key mixing takes place. For this we make use of the following variables:

Internal state --> mt (Array of 624 elements, each element being a 32 bit integer)

Key array --> k (Array of variable length, each element being a 32 bit integer)

Updated internal state --> mt2 (The internal state after mixing, array of 624 element, each element being a 32 bit integer)

mt2 = mt.copy() for i in range(1, 624): j = (i - 1) % len(k) mt2[i] = ((mt[i] ^ ((mt2[i-1] ^ (mt2[i-1] >> 30)) * 1664525)) + k[j] + j) & 0xFFFFFFFF mt = mt2.copy() #this is actually done inplace however for ease of understand we make use of two seperate arrays

mt2 is initialized as a copy of the original state, then a loop is run from index 1 to 623. The line j = (i - 1) % len(K) calculates the index modulo the length of the key array, ensuring the index remains in range.

Each value in the updated state is calculated as

mt2[i] = ((mt[i] ^ ((mt2[i-1] ^ (mt2[i-1] >> 30)) * 1664525)) + k[j] + j) & 0xFFFFFFFF
mt2[i-1] is the previous element in the updated internal state
(mt2[i-1] >> 30) is the previous element in the updated internal state but bit shifted to the right 30 times
(mt2[i-1] ^ (mt2[i-1] >> 30)) is the previous element in the updated internal state XORed with its bitshifted version
((mt2[i-1] ^ (mt2[i-1] >> 30)) * 1664525) then it is multiplied with a constant value of 1664525
mt[i] is the corresponding element in the old internal state
(mt[i] ^ ((mt2[i-1] ^ (mt2[i-1] >> 30)) * 1664525)) it is XORed with value we obtained earlier
((mt[i] ^ ((mt2[i-1] ^ (mt2[i-1] >> 30)) * 1664525)) + k[j] + j) the corresponding key value is added to it, aswell as the key index value
mt2[i] = ((mt[i] ^ ((mt2[i-1] ^ (mt2[i-1] >> 30)) * 1664525)) + k[j] + j) & 0xFFFFFFFF finally it undergoes bitwise AND with 0xFFFFFFFF to ensure that it remains a 32 bit integer

After this a second and final mixing loop takes place. For this we make use of the following variables:

Internal state --> mt (Array of 624 elements, each element being a 32 bit integer)

for i in range(1, 624): mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941) - i) & 0xFFFFFFFF

Here a loop is run from 1 to 623, where the value of each element in the state is computed as

mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941) - i) & 0xFFFFFFFF
mt[i-1] is the previous element
mt[i-1] >> 30 is the previous element but bit shifted to the right 30 times
(mt[i-1] ^ (mt[i-1] >> 30)) is the previous element XORed with its bitshifted version
((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941) then it is multiplied with a constant value of 1566083941
(mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941) - i) it is XORed with the current value of the internal state ie mt[i] and the index value is subtracted from it
(mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941) - i) & 0xFFFFFFFF finally it undergoes bitwise AND with 0xFFFFFFFF to ensure that it remains a 32 bit integer

Compiled initialization code

mt = [0]*624 mt[0] = 19650218; for i in range(1 , 624): mt[i] = (1812433253 * (mt[i-1] ^ (mt[i-1] >> 30)) + i) & 0xFFFFFFFF mt2 = mt.copy() for i in range(1, 624): j = (i - 1) % len(k) mt2[i] = ((mt[i] ^ ((mt2[i-1] ^ (mt2[i-1] >> 30)) * 1664525)) + k[j] + j) & 0xFFFFFFFF mt = mt2.copy() for i in range(1, 624): mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941) - i) & 0xFFFFFFFF

The three constants—1812433253, 1664525, and 1566083941—originate from different sources in the literature. The first, 1812433253, is taken from Donald Knuth’s The Art of Computer Programming, Volume 2, while the other two are taken from alternative well-established sources.

This seeding process ensures good diffusion and shows a strong avalanche effect, both of which are critical properties for any high quality PRNG. The diffusion ensures that the influence of each bit of the input seed spreads across a large portion of the internal state. The avalanche effect refers to the phenomenon where even a single bit change in the input seed causes widespread and unpredictable changes in the resulting internal state.

The twist operation: Putting the magazine in the gun

The twist operation periodically refreshes the internal state of MT19937, ensuring long-term randomness. The twist function ensures that new numbers are not just linear combinations of old ones. It also ensures that the output is statistically uniform across a high-dimensional space, making it suitable for applications requiring high quality pseudo random numbers.

To understand the twist operation, we make use of the following variables:

Internal state --> mt (Array of 624 elements, each element being a 32 bit integer)

for i in range(624): y = (mt[i] & 0x80000000) + (mt[(i+1) % 624] & 0x7FFFFFFF) mt[i] = mt[(i + 397) % 624] ^ (y >> 1) if y % 2 != 0: mt[i] = mt[i] ^ 0x9908B0DF

Here the code iterates through the internal state, and calculates an intermediate value y for all indices.

y = (mt[i] & 0x80000000) + (mt[(i+1) % 624] & 0x7FFFFFFF)
mt[i] is the current element
mt[(i+1) % 624] is the next element with a loop around, so if mt[i] is the 624th element, mt[(i+1) % 624] wraps back to the 1st element
(mt[i] & 0x80000000) the current element undergoes bitwside AND with the constant of 0x80000000
(mt[(i+1) % 624] & 0x7FFFFFFF) the next element undergoes bitwise AND with the constant of 0x7FFFFFFF
(mt[i] & 0x80000000) + (mt[(i+1) % 624] & 0x7FFFFFFF) both of the values obtained above are added to obtain y

The first half of the calculation isolates the most significant bit of the current element (mt[i]) using a bitwise AND with 0x80000000. The second part isolates the lower 31 bits of the next element (mt[(i+1) % 624]) using 0x7FFFFFFF.

These two values are added together to form an intermediate result y.

After this the current element is updated using

mt[i] = mt[(i + 397) % 624] ^ (y >> 1)
mt[(i + 397) % 624] is the next 397th element with a loop around, so if mt[i] is the 624th element, mt[(i + 397) % 624] becomes the 397th element
(y >> 1) is the previously obtained intermediate value y bitshifted to the right once
mt[(i + 397) % 624] ^ (y >> 1) both these values are XORed and then assigned to the current element

As we can see, each element in the "twisted" grid depends on three elements, the element with the same index, the element with the next index and the element located 397 positions ahead in the internal state. This provides further diffusion

Finally, if the intermediate y was odd, which is checked by the statement y % 2 != 0, the current element in the state is updated with the XOR of itself and the constant 0x9908B0DF.

This final update introduces non linearity to the function and also destroys symmetry and predictability which existed earlier due to the values being linear. The constant 0x9908B0DF also has a high Hamming weight ie it has many bits set, which means when it is XORed with the element, it changes many bits of the element. The constant also exhibits optimal behaviour required for a PRNG.

After generating 624 random numbers, the internal state is considered "exhausted" ie all the values obtained from the previous twist operation have been consumed. To continue generation more random numbers, the internal state undergoes another twist operation and it is ready to generate another batch of 624 random numbers. Think of this like reloading the gun after firing all 624 bullets.

Firing the bullets: The tempering operation and the death of any remaining linearity

To finally generate a random number, we take the nth element from the internal state. This nth counter is reset every time the twist operation is performed. To understand the temper operation in detail we make use of the following code

index = 0 #assume twist operation has just been called random_num = mt[index] random_num = random_num ^ (random_num >> 11) random_num = random_num ^ ((random_num << 7) & 0x9d2c5680) random_num = random_num ^ ((random_num << 15) & 0xefc60000) random_num = random_num ^ (random_num >> 18) if index == 623: index = 0 twist() else: index = index + 1

We start by assuming that the index is 0, ie that the twist operation was just called and the first element of the internal state is taken. Then we process it step by step:

random_num = random_num ^ (random_num >> 11) (random_num >> 11) is the variable random_num but bitshifted to the right 11 times. random_num is XORed with it and that value is used for further computations. random_num = random_num ^ ((random_num << 7) & 0x9d2c5680) (random_num << 7) is the variable random_num but bitshifted to the right 7 times. It then undergoes bitwise AND with a constant value of 0x9d2c5680 and is XORed with random_num. random_num = random_num ^ ((random_num << 15) & 0xefc60000) random_num = random_num ^ (random_num >> 18) In these two steps aswell, the number is bitshifted, XORed and undergoes bitwise AND with another constant

Finally we obtain our random_num, the output of a long sequence of operations that represent careful mathematical and algorithmic design.

This single value is the product of a carefully initialized internal state, a well tuned recurrence relation that ensures period and statistical uniformity, a twist transformation that diffuses correlations across the internal state, and a tempering process which improves the distribution of output bits.

So much just to simulate a coin toss ...