Zh3r0 CTF V2 - Real Mersenne

Author: Quintec
Date: Jun 6, 2021

Do you believe in games of luck? I hope you make your guesses real or you’ll be floating around.

nc crypto.zh3r0.cf 4444

Source Analysis

 1import random
 2from secret import flag
 3from fractions import Fraction
 4
 5def score(a,b):
 6    if abs(a-b)<1/2**10:
 7        # capping score to 1024 so you dont get extra lucky
 8        return Fraction(2**10)
 9    return Fraction(2**53,int(2**53*a)-int(2**53*b))
10
11total_score = 0
12for _ in range(2000):
13    try:
14        x = random.random()
15        y = float(input('enter your guess:\n'))
16        round_score = score(x,y)
17        total_score+=float(round_score)
18        print('total score: {:0.2f}, round score: {}'.format(
19            total_score,round_score))
20        if total_score>10**6:
21            print(flag)
22            exit(0)
23    except:
24        print('Error, exiting')
25        exit(1)
26else:
27    print('Maybe better luck next time')

We can see that we have to try to guess floats generated by random.random(), and we have 2000 attempts to get 1,000,000 score. The strategy is then to clone the state of the random with the first guesses, and then use at most 977 guesses in the end to get the required number of points.

Python’s Random

Python’s random module uses a Mersenne Twister inside, as hinted by the title of the challenge. So how does a Mersenne Twister work? The short answer is through a series of states, which each consist of 624 integers (in most cases, including this one, 32-bit). When you ask for random values, some of these 624 values are used up, and each value is used only once. For example, if you call random.getrandbits(32), exactly one 32-bit state value is used, after being tempered (slightly modified), to make it harder to reverse. Once all 624 values have been used, they are “twisted” to generate a set of 624 new values, and the cycle repeats. (The generator has a period of $2^{19937} - 1$, which is why we sometimes refer to it as MT19937.)

But how does that produce floats between 0 and 1? Well let’s take a look at some source code.

 1/* random_random is the function named genrand_res53 in the original code;
 2 * generates a random number on [0,1) with 53-bit resolution; note that
 3 * 9007199254740992 == 2**53; I assume they're spelling "/2**53" as
 4 * multiply-by-reciprocal in the (likely vain) hope that the compiler will
 5 * optimize the division away at compile-time.  67108864 is 2**26.  In
 6 * effect, a contains 27 random bits shifted left 26, and b fills in the
 7 * lower 26 bits of the 53-bit numerator.
 8 * The original code credited Isaku Wada for this algorithm, 2002/01/09.
 9 */
10
11/*[clinic input]
12_random.Random.random
13  self: self(type="RandomObject *")
14random() -> x in the interval [0, 1).
15[clinic start generated code]*/
16
17static PyObject *
18_random_Random_random_impl(RandomObject *self)
19/*[clinic end generated code: output=117ff99ee53d755c input=afb2a59cbbb00349]*/
20{
21    uint32_t a=genrand_uint32(self)>>5, b=genrand_uint32(self)>>6;
22    return PyFloat_FromDouble((a*67108864.0+b)*(1.0/9007199254740992.0));
23}

So we see that calling random.random() uses up 2 state values, discards some trailing bits from them, and combines them in the numerator of some fraction to generate a random float between 0 and 1. More specifically, 27 bits and 26 bits respectively of two states are combined and divided by $2^{53}$.

The Plan

There are still a lot of issues we will need to deal with, but a hazy plan can be formed at this point. We can pass in an easy value, such as 0, to the guess function, and use the round score to solve for the numerator of the fraction in the code above (since the score is conveniently given as a Fraction with values multiplied by $2^{53}$!). Each float recovers 2 state values, so repeating this 312 times should give us 624 state values, enough to clone the random state and predict the next values perfectly, with ~800 guesses to spare at the end.

Now there are a few main issues with this plan:

Truncation

Even after solving for the numerator using the round score, and splitting that integer into the upper 27 and lower 26 bits, we’re still missing a combined 11 bits of the 2 states! If we were to brute force those bits for all 312 guesses, that would take $2^{3432}$ attempts, clearly infeasible. The solution here is to solve for the state symbolically, since the 624 values in one state are related to each other mathematically, and this should be enough information to solve for the missing bits. To save time, a teammate remembered this repository, which was used to solve a related challenge from PlaidCTF. (Which I spent many hours of my life going down the wrong path on… but I digress.)

Precision

This one was a killer, and I honestly still have no idea why this happened. But even though int(2**53*a) should be precise due to how python stores floats, and that the default precision for floats is 53 bits, in my local testing the recovered range of values for b (the lower 26 bits in the numerator) were always a little off when compared to the actual value in the random state. However, this too had a relatively simple fix, had I not wasted many hours trying to fix the precision without knowing where the error was. I just had to take less bits of b for granted, and solve for all the rest. To mitigate the smaller amount of info, I used up 624 guesses (2 whole states) instead of 312, since we had 2000 total guesses.

After making these fixes, and waiting 30 minutes for the agonizingly slow server since I did not live in India, the flag finally popped out. The full lightly-commented source code of the solution is below.

  1#This class from https://github.com/icemonster/symbolic_mersenne_cracker
  2"""
  3MIT License
  4
  5Copyright (c) 2021 icemonster
  6
  7Permission is hereby granted, free of charge, to any person obtaining a copy
  8of this software and associated documentation files (the "Software"), to deal
  9in the Software without restriction, including without limitation the rights
 10to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 11copies of the Software, and to permit persons to whom the Software is
 12furnished to do so, subject to the following conditions:
 13
 14The above copyright notice and this permission notice shall be included in all
 15copies or substantial portions of the Software.
 16
 17THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 18IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 19FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 20AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 21LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 22OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
 23SOFTWARE.
 24"""
 25
 26from z3 import *
 27from random import Random
 28from itertools import count
 29import time
 30import logging
 31
 32logging.basicConfig(format='STT> %(message)s')
 33logger = logging.getLogger()
 34logger.setLevel(logging.DEBUG)
 35
 36SYMBOLIC_COUNTER = count()
 37
 38class Untwister:
 39    def __init__(self):
 40        name = next(SYMBOLIC_COUNTER)
 41        self.MT = [BitVec(f'MT_{i}_{name}', 32) for i in range(624)]
 42        self.index = 0
 43        self.solver = Solver()
 44
 45    #This particular method was adapted from https://www.schutzwerk.com/en/43/posts/attacking_a_random_number_generator/
 46    def symbolic_untamper(self, solver, y):
 47        name = next(SYMBOLIC_COUNTER)
 48
 49        y1 = BitVec(f'y1_{name}', 32)
 50        y2 = BitVec(f'y2_{name}' , 32)
 51        y3 = BitVec(f'y3_{name}', 32)
 52        y4 = BitVec(f'y4_{name}', 32)
 53
 54        equations = [
 55            y2 == y1 ^ (LShR(y1, 11)),
 56            y3 == y2 ^ ((y2 << 7) & 0x9D2C5680),
 57            y4 == y3 ^ ((y3 << 15) & 0xEFC60000),
 58            y == y4 ^ (LShR(y4, 18))
 59        ]
 60
 61        solver.add(equations)
 62        return y1
 63
 64    def symbolic_twist(self, MT, n=624, upper_mask=0x80000000, lower_mask=0x7FFFFFFF, a=0x9908B0DF, m=397):
 65        '''
 66            This method models MT19937 function as a Z3 program
 67        '''
 68        MT = [i for i in MT] #Just a shallow copy of the state
 69
 70        for i in range(n):
 71            x = (MT[i] & upper_mask) + (MT[(i+1) % n] & lower_mask)
 72            xA = LShR(x, 1)
 73            xB = If(x & 1 == 0, xA, xA ^ a) #Possible Z3 optimization here by declaring auxiliary symbolic variables
 74            MT[i] = MT[(i + m) % n] ^ xB
 75
 76        return MT
 77
 78    def get_symbolic(self, guess):
 79        name = next(SYMBOLIC_COUNTER)
 80        ERROR = 'Must pass a string like "?1100???1001000??0?100?10??10010" where ? represents an unknown bit'
 81
 82        assert type(guess) == str, ERROR
 83        assert all(map(lambda x: x in '01?', guess)), ERROR
 84        assert len(guess) <= 32, "One 32-bit number at a time please"
 85        guess = guess.zfill(32)
 86
 87        self.symbolic_guess = BitVec(f'symbolic_guess_{name}', 32)
 88        guess = guess[::-1]
 89
 90        for i, bit in enumerate(guess):
 91            if bit != '?':
 92                self.solver.add(Extract(i, i, self.symbolic_guess) == bit)
 93
 94        return self.symbolic_guess
 95
 96
 97    def submit(self, guess):
 98        '''
 99            You need 624 numbers to completely clone the state.
100                You can input less than that though and this will give you the best guess for the state
101        '''
102        if self.index >= 624:
103            name = next(SYMBOLIC_COUNTER)
104            next_mt = self.symbolic_twist(self.MT)
105            self.MT = [BitVec(f'MT_{i}_{name}', 32) for i in range(624)]
106            for i in range(624):
107                self.solver.add(self.MT[i] == next_mt[i])
108            self.index = 0
109
110        symbolic_guess = self.get_symbolic(guess)
111        symbolic_guess = self.symbolic_untamper(self.solver, symbolic_guess)
112        self.solver.add(self.MT[self.index] == symbolic_guess)
113        self.index += 1
114
115    def get_random(self):
116        '''
117            This will give you a random.Random() instance with the cloned state.
118        '''
119        logger.debug('Solving...')
120        start = time.time()
121        print(self.solver.check())
122        model = self.solver.model()
123        end = time.time()
124        logger.debug(f'Solved! (in {round(end-start,3)}s)')
125
126        #Compute best guess for state
127        state = list(map(lambda x: model[x].as_long(), self.MT))
128        result_state = (3, tuple(state+[self.index]), None)
129        r = Random()
130        r.setstate(result_state)
131        return r
132
133#below code, the actual exploit, is mine
134import gmpy2
135from pwn import *
136
137ut = Untwister()
138sh = remote('crypto.zh3r0.cf', 4444)
139for ii in range(624):
140    print(ii)
141    sh.recvline()
142    sh.sendline(b'0')
143    lin = sh.recvline()
144    flag = False
145    sc = lin.decode().strip().split('round score: ')[1]
146    if '/' in sc:
147        score = sc.split('/')
148    else:
149        score = sc
150        flag = True
151    if not flag:
152        numer = int(score[0])
153        denom = int(score[1])
154        state_comb = gmpy2.c_div(gmpy2.mpz(9007199254740992) * gmpy2.mpz(denom), numer) + 1 # solve for state
155        bits = bin(state_comb)[2:]
156        bits = '0' * (53 - len(bits)) + bits
157        a = bits[:27] #upper
158        b = bits[27:] #lower
159        assert len(a) == 27
160        assert len(b) == 26
161        ut.submit(a + '?' * 5) #a is truncated 5 bits
162        ut.submit(b[:13] + '?' * 19) # b bits are imprecise, take only 13
163    else:
164        ut.submit('0' * 10 + '?' * 22) #we only know that the first 10 bits are 0 (1/1024)
165        ut.submit('?' * 16 + '?' * 16)
166
167rec = ut.get_random()
168
169#score 1024!
170while True:
171    print(sh.recvline())
172    sh.sendline(str(rec.random()))
173    print(sh.recvline())