Attacking a random number generator

In software dealing with security, randomness is often necessary to generate keys or tokens for resetting passwords or identifying sessions. There, randomness is required to be unpredictable for an attacker. However, sometimes developers do not use cryptographically secure pseudo random number generators (CSPRNG) in this scenario. Instead, they utilize faster pseudorandom number generators (PRNG) which are usually used for Monte-Carlo simulations. These PRNGs, such as Python’s random.random() do not offer unpredictability and are random in the sense of no statistical relationships between their outputs.

Consequently, the question arises how hard it is to attack a common (not cryptographically secure) random number generator. We focus on analyzing the Mersenne Twister MT19937 which is the most widely used PRNG. Our approach uses an SMT solver to clone a given instance of MT19937 given 624 consecutive outputs. This attack succeeds in under ten seconds on commodity hardware. Therefore, we conclude that using a PRNG where a CSPRNG is needed represents a significant risk and may be easily exploitable.

Introduction

In statistical programming, say for Monte-Carlo simulations, random numbers are needed to simulate random events. To achieve this, usually so-called pseudorandom number generators (PRNGs) are used. Contrary to their name these are not random but deterministic algorithms depending on a starting value (seed) that outputs a sequence of numbers that ‘behave randomly’. In particular, the output sequence should be distributed uniformly and have no trivial relationships. This is sufficient for numerical simulations, as numerical simulations do not interact with the PRNG in an adversarial fashion. However, in the case of cryptography this is not enough as an attacker will try to attack the PRNG in order to, e.g., steal money or forge signatures. In cryptography, randomness is important for various tasks such as generating keys for encryption or TLS sessions. The notion of randomness in the field of cryptography is used in the sense that the outputs should be unpredictable. This additional requirement is usually not covered for common PRNGs.

When auditing a code base from a security standpoint, a common mistake is the usage of a PRNG instead of a cryptographically secure pseudorandom number (CSPRNG). This may be due to a lack of knowledge of the developer or because PRNGs are usually more efficient than CSPRNGs. The usage of a PRNG instead of a CSRPNG is easy to detect given access to the source code, but the associated risk is well-understood only in the cryptographic community. Therefore, we want to show how easy it is to break a PRNG. For this purpose, we provide code to clone an instance of MT19937 – by far the most widely used PRNG – given 624 consecutive outputs.

A possible attack scenario is as follows. Assume a web server uses MT19937 for generating session tokens. An attacker can query the server for a sequence of 624 outputs by repeatedly logging in and out and recording the session tokens. From these outputs the attacker may be able to clone the PRNG and generate the next outputs and consequently the next session tokens. As a result, the attacker is able to hijack all future sessions as long as its cloned PRNG is manually synchronized with the PRNG on the server.

Preliminaries

Our analysis focuses on the Mersenne Twister. This is the most widely used pseudorandom number generator (PRNG). We focus on the version MT19937, which has a period of 2^19937−1. It is used by default in many libraries and programs such as PHP, Python, Ruby, Microsoft Excel, and many more.

Note that even though Python uses MT19937 internally, we reimplement it in pure Python. The implementation that is actually used in Python is done within a C module (see https://github.com/python/cpython/blob/3.8/Modules/_randommodule.c). While our attack works on Pythons random.random(), the presentation of the attack is more straightforward when attacking a reimplementation of MT19937 in pure Python.

The Mersenne Twister MT19937 has an internal state consisting of 624 32-bit integers which is periodically updated. Additionally, the Mersenne Twister contains some static parameters.

class mersenne_rng(object):
    def __init__(self, seed=5489):
        self.state = [0]*624
        self.f = 1812433253
        self.m = 397
        self.u = 11
        self.s = 7
        self.b = 0x9D2C5680
        self.t = 15
        self.c = 0xEFC60000
        self.l = 18
        self.index = 624
        self.lower_mask = (1 << 31)-1
        self.upper_mask = 1 << 31

After an output is requested, one of the internal state integers is updated. As MT19937 is a generalized feedback shift register, the integer is updated by assigning a function of the other state variables to it. This state update is called twisting and is shown below.

def twist(self):
    for i in range(624):
        temp = self.int_32(
            (self.state[i] & self.upper_mask)+(self.state[(i+1) % 624] & self.lower_mask))
        temp_shift = temp >> 1
        if temp % 2 != 0:
            temp_shift = temp_shift ^ 0x9908b0df
        self.state[i] = self.state[(i+self.m) % 624] ^ temp_shift
    self.index = 0

The output of MT19937 is computed by applying a function on a single state integer. This function is usually called temper. It implements a multiplication of a state bit with a tempered matrix to improve the statistical properties of the output. In particular, the purpose of transforming an internal state integer to an output by tempering is not performed to increase security. The temper function is shown below.

def temper(self, in_value):
    y = in_value
    y = y ^ (y >> self.u)
    y = y ^ ((y << self.s) & self.b)
    y = y ^ ((y << self.t) & self.c)
    y = y ^ (y >> self.l)
    return y

The Attack

As each output is a tempered state integer, the outputs correspond to the internal state of MT19937. The tempering function can be computationally inverted, in the sense that a preimage can be found. Consequently, we can recover an internal state that would produce the same sequence of outputs. Note that we cannot find ‘the’ preimage as tempering is not injective. Thus, if we have 624 outputs we can ‘untemper’ them and get the (a) complete internal state of MT19937. We implement the untempering by using the SMT solver Z3. An SMT solver is basically a program that can solve equations in a satisfiability modulo theory. Our approach uses a system of equations with bit vectors in order to untemper the output, as shown below.

from z3 import *

def untemper(out):
    """
    This is the untemper function, i.e., the inverse of temper. This
    is solved automatically using the SMT solver Z3. I could probably
    do it by hand, but there is a certain elegance in untempering symbolically.
    """
    y1 = BitVec('y1', 32)
    y2 = BitVec('y2', 32)
    y3 = BitVec('y3', 32)
    y4 = BitVec('y4', 32)
    y = BitVecVal(out, 32)
    s = Solver()
    equations = [
        y2 == y1 ^ (LShR(y1, 11)),
        y3 == y2 ^ ((y2 << 7) & 0x9D2C5680),
        y4 == y3 ^ ((y3 << 15) & 0xEFC60000),
        y == y4 ^ (LShR(y4, 18))
    ]
    s.add(equations)
    s.check()
    return s.model()[y1].as_long()

Observations

Even though we used an SMT solver which is usually considered slow for cloning an instance of MT19937, the running time is practical. On a current Lenovo Thinkpad laptop cloning takes under ten seconds. Of course optimizations are possible, but ten seconds is usually fast enough for an attack.

Interestingly, the tempering function is not injective. This means that multiple internal states produce the same sequence of outputs. When we clone MT19937 we do not necessarily get the same internal state, but we can get a different state that produces the same sequence of outputs. The future outputs of both, the cloned and the original Mersenne Twister are the same.

We tried computing the seed from these states, but some of these states do not correspond to a seed at all. Consequently, seed recovery from the internal state integers is hard and should instead be done directly from the outputs. Fortunately, a full seed recovery is often not necessary, as for most scenarios it is enough to clone the generator. The generated outputs of the cloned MT19937 are the same as for the original MT19937.

Conclusion

We showed that it is practical to clone a PRNG. Consequently, when developing software, care should be taken to use a CSPRNG when necessary. CSPRNGs produce outputs that are unpredictable under an appropriate attacker model, whereas PRNGs are optimized for speed.

References

  • The implementation of MT19937 that we used: https://github.com/james727/MTP
  • Cloning MT19937 is one of the cryptopals challenges. Consequently there are other solutions to this problem out on the Internet such as this one.
  • The original PRNG in Python is available here. We did not use that one directly, because it is a C-module inside of Python and thus is not easy to access directly.
  • Title picture: Photo by Pixabay from Pexels

Code

Below is the complete Python code. Alternatively, it can be downloaded here.

#!/usr/bin/env python

# Randomness is the true foundation of mathematics.
# -- Gregory Chaitin

# This code clones a pseudorandom number generator (PRNG). The
# attacked PRNG is the Mersenne Twister (MT19937)
# (https://en.wikipedia.org/wiki/Mersenne_Twister) as it is used nearly
# everywhere.

# The internal state of MT19937 consists of 624 32-bit integers. Each of
# those correspond to an output. In particular, there is a temper
# function that maps an integer of the internal state to an output. This
# function is invertible. I.e., there is a function "untemper" that can
# even be computed analytically.
# If I have 624 consecutive numbers from an MT19937 output, I
# can recover the whole internal state.

# I use the implementation of MT19937 from here:
# https://github.com/james727/MTP

from z3 import *

# heavily based on https://github.com/james727/MTP
# Usage:
# generator = mersenne_rng(seed = 123)
# random_number = generator.get_random_number()


class mersenne_rng(object):
    def __init__(self, seed=5489):
        self.state = [0]*624
        self.f = 1812433253
        self.m = 397
        self.u = 11
        self.s = 7
        self.b = 0x9D2C5680
        self.t = 15
        self.c = 0xEFC60000
        self.l = 18
        self.index = 624
        self.lower_mask = (1 << 31)-1
        self.upper_mask = 1 << 31

        # update state
        self.state[0] = seed
        for i in range(1, 624):
            self.state[i] = self.int_32(
                self.f*(self.state[i-1] ^ (self.state[i-1] >> 30)) + i)

    def twist(self):
        for i in range(624):
            temp = self.int_32(
                (self.state[i] & self.upper_mask)+(self.state[(i+1) % 624] & self.lower_mask))
            temp_shift = temp >> 1
            if temp % 2 != 0:
                temp_shift = temp_shift ^ 0x9908b0df
            self.state[i] = self.state[(i+self.m) % 624] ^ temp_shift
        self.index = 0

    def temper(self, in_value):
        y = in_value
        y = y ^ (y >> self.u)
        y = y ^ ((y << self.s) & self.b)
        y = y ^ ((y << self.t) & self.c)
        y = y ^ (y >> self.l)
        return y

    def get_random_number(self):
        if self.index >= 624:
            self.twist()
        out = self.temper(self.state[self.index])
        self.index += 1
        return self.int_32(out)

    def int_32(self, number):
        return int(0xFFFFFFFF & number)


# compare with
# https://blog.infosectcbr.com.au/2019/08/cryptopals-challenge-23-clone-mt19937.html

def untemper(out):
    """
    This is the untemper function, i.e., the inverse of temper. This
    is solved automatically using the SMT solver Z3. I could prpbably
    do it by hand, but there is a certain elegance in untempering symbolically.
    """
    y1 = BitVec('y1', 32)
    y2 = BitVec('y2', 32)
    y3 = BitVec('y3', 32)
    y4 = BitVec('y4', 32)
    y = BitVecVal(out, 32)
    s = Solver()
    equations = [
        y2 == y1 ^ (LShR(y1, 11)),
        y3 == y2 ^ ((y2 << 7) & 0x9D2C5680),
        y4 == y3 ^ ((y3 << 15) & 0xEFC60000),
        y == y4 ^ (LShR(y4, 18))
    ]
    s.add(equations)
    s.check()
    return s.model()[y1].as_long()


def recover_state_mt(numbers):
    """
    This function recovers the internal state of MT19937 given a
    sequence of outputs. Note that there can be multiple states of an
    MT19937 that yield the same sequence of outputs.
    """
    state = []
    for n in numbers[0:624]:
        state.append(untemper(n))
    return state


def main():
    """
    This function tests the implementation.
    We clone the RNG from its output and compare the next generated
    outputs of the real and the cloned PRNG. Then, we try to recover
    the seed.
    """
    rng = mersenne_rng(1337)
    print(f"real internal state of PRNG: {rng.state[0:10]} ...")
    random_nums = []
    print("generating random numbers")
    for i in range(624):
        random_nums.append(rng.get_random_number())
    print(f"generated numbers: {random_nums[0:10]} ... ")
    print("recover internal state of PRNG")
    recovered_state = recover_state_mt(random_nums)
    print(f"recovered internal state: {recovered_state[0:10]} ... ")
    print("cloning PRNG")
    cloned_rng = mersenne_rng()
    cloned_rng.state = recovered_state

    print("check equality of next 1000 outputs from the real and cloned rng")
    for i in range(1000):
        assert(cloned_rng.get_random_number() == rng.get_random_number())
    print('Success!')


if __name__ == "__main__":
    main()

Dr. Henning Kopp