redpwnCTF 2021 - 2k Reversing Challenge

  • Author: Day, FizzBuzz101
  • Date:

Introduction

2k (22 solves) was a fun stack based VM reversing challenge from redpwnCTF 2021 by EvilMuffinHa. This writeup will be logically split into two parts: reversing the VM behavior in the chall binary, and then reversing the actual VM in prog.bin.

Reversing VM Behavior

Opening the binary up in Binary Ninja, we see that all the logic resides with main itself. There is one other function that is used at 0x1b70, but that is just for determining vm executable size.

In main, one of the first things that is done is initialize a VM memory region, which has a size of about 0x20000.

0000120e  4c8d9c240000feff   lea     r11, [rsp-0x20000 {vm_mem}]

00001216  4881ec00100000     sub     rsp, 0x1000
0000121d  48830c2400         or      qword [rsp], 0x0
00001222  4c39dc             cmp     rsp, r11 {vm_mem}
00001225  75ef               jne     0x1216

Then, a VM stack of size 0x800 is calloc’d and the instructions in the prog.bin is read in.

 100001259      void* stack = calloc(n: 0x800, elem_size: 2)
 20000126c      *(rsp_1 + 0x18) = 0
 300001278      FILE* vm_file = fopen(filename: "prog.bin", mode: &data_2007)
 40000127d      char const* const rdi_10
 50000127d      if (vm_file == 0)
 600001291      else
 700001291          int32_t vm_size = get_size_1b70(vm_file, rsp_1 + 0x18)
 800001296          int16_t* vm_text
 900001296          size_t size_read
1000001296          if (vm_size != 0xffffffff)
110000129f              size_t size = *(rsp_1 + 0x18)
12000012ac              vm_text = calloc(n: size, elem_size: 1)
1300001296              if (vm_text != 0)
14000012cb                  size_read = fread(buf: vm_text, size: 1, count: size, fp: vm_file)
1500001296                  if (*(rsp_1 + 0x18) == size_read)
16000012de                      void* stack = stack
17000012e8                      fclose(fp: vm_file)

Then the VM state is initialized before hitting a signed jump table at 0x21d0 to start VM execution.

000012de  4c89e3             mov     rbx, r12
000012e1  4c8d3de80e0000     lea     r15, [rel jump_table_21d0]
000012e8  e863feffff         call    fclose
000012ed  4c8b742418         mov     r14, qword [rsp+0x18]
000012f2  31c0               xor     eax, eax  {0x0}
000012f4  31c9               xor     ecx, ecx  {0x0}
000012f6  31d2               xor     edx, edx  {0x0}
000012f8  0f1f840000000000   nop     dword [rax+rax], eax

00001300  807c050053         cmp     byte [rbp+rax], 0x53
00001305  448d6a01           lea     r13d, [rdx+0x1] 
00001309  7725               ja      0x1330

0000130b  0fb6440500         movzx   eax, byte [rbp+rax]
00001310  49630487           movsxd  rax, dword [r15+rax*4]
00001314  4c01f8             add     rax, r15
00001317  3effe0             jmp     rax

Take note of these registers as they are used over and over again in the VM. r15 stores the jump table (which has 0x54 handlers), r14 stores the code size, and eax, ecx, edx are null for now. r12 in this case holds the contents of rax after the calloc call; this implies that rbx is set up to act as the stack pointer in this VM. Not shown in this disassembly is also an instruction that sets rbp to the code region.

Before I continue, I would like to discuss a few extra details about the VM and the context of a few registers. While such information is definitely unclear at this point, in later examples it will make much more sense. The VM is a stack based VM operating on shorts as its size; edx will be used throughout in conjunction with r13 as a form of letting the VM interpreter know where the next instruction is (usually r13 is loaded as an address ahead of edx, and then edx gets set to r13d in the process of opcode handling). There is also a pseudo register that is based upon ecx (I may call it the accumulator sometimes even though it really isn’t that in the traditional sense) used for operations on vm memory region.

A few main places handlers keep transfering control flow to include the following regions:

00001300  807c050053         cmp     byte [rbp+rax], 0x53
00001305  448d6a01           lea     r13d, [rdx+0x1]
00001309  7725               ja      0x1330
0000130b  0fb6440500         movzx   eax, byte [rbp+rax] 
00001310  49630487           movsxd  rax, dword [r15+rax*4]
00001314  4c01f8             add     rax, r15
00001317  3effe0             jmp     rax

This handler checks for opcode validity, and then sets r13d based on rdx according to the behavior I described previously (this makes sense as all but one instructions are 1 opcode long) before going to the correct opcode handler.

00001330  4489ea             mov     edx, r13d

00001333  89d0               mov     eax, edx
00001335  4939c6             cmp     r14, rax
00001338  73c6               jae     0x1300

Sometimes handlers set the edx to r13d already, so those would usually head to 0x1333. Otherwise, 0x1330 would be used. The next instruction location is always checked to r14, which records the size of the VM as mentioned previously. If some of these validity checks fail or the size goes over, the VM enters an exit state (around 0x133a) where memory is freed and the program exits. I might have missed a few other locations, but in general, the places where opcodes handlers transfer control flow to afterwards perform similar operations.

Now, let us look at the opcode handlers. I won’t describe every single one, but I’ll just do a few in some important categories of operations. Do note that many of the instructions in the allowed range are also just simply no-ops, and that most opcode arguments are passed via the stack and results are pushed back (hence stack based VM).

instructions used for vm mem region operations (0x50 - 0x53, 0x40-0x41)

000015ec  83c101             add     ecx, 0x1
000015ef  4489ea             mov     edx, r13d
000015f2  e93cfdffff         jmp     0x1333

This is handler 0x50, which simply increments the “accumulator”.

00001602  66034bfe           add     cx, word [rbx-0x2]
00001606  4489ea             mov     edx, r13d
00001609  4883eb02           sub     rbx, 0x2
0000160d  e921fdffff         jmp     0x1333

This is handler 0x52, which adds a value to the “accumulator”. Note how it chooses the value based on [rbx-0x2]. That is repeatedly used throughout the program as “top of the vm stack.” Notice the later sub rbx, 0x2 instruction. If you think about this, the handler is basically popping an argument to add to the “accumaltor”.

000015d4  0fb7c1             movzx   eax, cx
000015d7  4883c302           add     rbx, 0x2
000015db  4489ea             mov     edx, r13d
000015de  0fb7444420         movzx   eax, word [rsp+rax*2+0x20]
000015e3  668943fe           mov     word [rbx-0x2], ax
000015e7  e947fdffff         jmp     0x1333

This is when those “accumulator” values come in handy (handler 0x41). Also notice how the add rbx, 0x2. It’s basically pushing the offset in a short array (vm mem) based on the accumulator onto the vm stack. The write equivalent, as well as the sub and dec equivalents for the previous 2 handlers are all within this grouping of opcodes as well.

At this point, we have the following instructions mapped out (hopefully the instruction names are self-explanatory enough):

0x40: "MEMW", (POP 1 ARG)
0x41: "MEMR", (PUSH 1 RESULT)
0x50: "INCA",
0x51: "DECA",
0x52: "ADDA", (POP 1 ARG)
0x53: "SUBA", (POP 1 ARG)
000014f8  498d4502           lea     rax, [r13+0x2]
000014fc  4939c6             cmp     r14, rax
000014ff  0f8635feffff       jbe     0x133a

00001505  420fb7442d00       movzx   eax, word [rbp+r13]
0000150b  83c203             add     edx, 0x3
0000150e  4883c302           add     rbx, 0x2
00001512  668943fe           mov     word [rbx-0x2], ax
00001516  e918feffff         jmp     0x1333

This is handler 0x22. Code region validity checks are performed, and then it loads the following 2 bytes from the vm instruction pointer, before pushing that short onto the vm stack. Notice how the edx gets added with the value of 3, as this is the only instruction that is more than length 3.

00001398  4883eb02           sub     rbx, 0x2 
0000139c  4489ea             mov     edx, r13d
0000139f  eb92               jmp     0x1333

This is handler 0x2. It basically behaves like a pop.

00001387  0fb743fe           movzx   eax, word [rbx-0x2]
0000138b  4489ea             mov     edx, r13d
0000138e  4883c302           add     rbx, 0x2
00001392  668943fe           mov     word [rbx-0x2], ax
00001396  eb9b               jmp     0x1333

This is handler 0x1. I guess this instruction can sort of belong in this grouping? All it really does is pushes the top value of the stack to the top.

So now we have the following instructions mapped out:

    1: "DUP", (1 ARG USED, PUSH 1 RESULT)
    2: "POP",
    0x22: "PUSH",

“syscall-like” instructions (0x20-0x21)

0000149c  488b3d7d2b0000     mov     rdi, qword [rel stdin]
000014a3  894c240c           mov     dword [rsp+0xc], ecx
000014a7  4883c302           add     rbx, 0x2
000014ab  e840fdffff         call    getc
000014b0  8b4c240c           mov     ecx, dword [rsp+0xc]
000014b4  4489ea             mov     edx, r13d
000014b7  668943fe           mov     word [rbx-0x2], ax
000014bb  e973feffff         jmp     0x1333

This is handler 0x20 and just pushes the result of getc onto the stack.

000014c0  0fb743fe           movzx   eax, word [rbx-0x2]
000014c4  663dff00           cmp     ax, 0xff
000014c8  0f876cfeffff       ja      0x133a

000014ce  0fbfd0             movsx   edx, ax
000014d1  488d352c0b0000     lea     rsi, [rel data_2004] // %c
000014d8  31c0               xor     eax, eax  {0x0}
000014da  894c240c           mov     dword [rsp+0xc], ecx
000014de  bf01000000         mov     edi, 0x1
000014e3  4883eb02           sub     rbx, 0x2
000014e7  e8d4fcffff         call    __printf_chk
000014ec  8b4c240c           mov     ecx, dword [rsp+0xc]
000014f0  4489ea             mov     edx, r13d
000014f3  e93bfeffff         jmp     0x1333

This is handler 0x21 and just pops a value for printf.

Hence, now we have:

    0x20: "GETC", (PUSH 1 RESULT)
    0x21: "PUTC", (POP 1 ARG)

Arithmetic Based Instructions (0x10 - 0x15)

00001464  0fb743fc           movzx   eax, word [rbx-0x4]
00001468  660faf43fe         imul    ax, word [rbx-0x2]
0000146d  4489ea             mov     edx, r13d
00001470  4883eb02           sub     rbx, 0x2
00001474  668943fe           mov     word [rbx-0x2], ax
00001478  e9b6feffff         jmp     0x1333

This is handler 0x12. 2 arguments are popped from the stack, and the product is pushed back up.

Most of the instructions in this group of handlers are quite trivial to recognize. However, it is important to note the distinction between the div and mod handlers (div is 0x13 and mod is 0x14). To note down the instructions here, I will write the operand order as well (with arg0 representing the highest element on stack when handler is hit).

0x10: "ADD", (POP 2 ARGS, PUSH 1 RESULT), ARG0 + ARG1
0x11: "SUB", (POP 2 ARGS, PUSH 1 RESULT), ARG0 - ARG1
0x12: "MUL", (POP 2 ARGS, PUSH 1 RESULT), ARG0 * ARG1
0x13: "DIV", (POP 2 ARGS, PUSH 1 RESULT), ARG0 / ARG1
0x14: "MOD", (POP 2 ARGS, PUSH 1 RESULT), ARG0 % ARG1
0x15: "MOD_MUL", (POP 3 ARGS, PUSH 1 RESULT), (ARG2 * ARG3) % ARG1
00001429  0fb773fc           movzx   esi, word [rbx-0x4]  
0000142d  488d43fe           lea     rax, [rbx-0x2]
00001431  663973fe           cmp     word [rbx-0x2], si
00001435  0f844a020000       je      0x1685

0000143b  66c743fc0000       mov     word [rbx-0x4], 0x0
00001441  4489ea             mov     edx, r13d
00001444  4889c3             mov     rbx, rax
00001447  e9e7feffff         jmp     0x1333

00001685  66c743fc0100       mov     word [rbx-0x4], 0x1
0000168b  4489ea             mov     edx, r13d
0000168e  4889c3             mov     rbx, rax
00001691  e99dfcffff         jmp     0x1333

This is handler 0x16. It compares two arguments, and then pushes a 1 if they’re equal (at 0x1685) or pushes a 0 back.

0000147d  66837bfe00         cmp     word [rbx-0x2], 0x0
00001482  0f880e020000       js      0x1696

00001488  4489ea             mov     edx, r13d
0000148b  0f84a2feffff       je      0x1333

00001696  66c743feffff       mov     word [rbx-0x2], 0xffff  {0xffff}
0000169c  4489ea             mov     edx, r13d
0000169f  e98ffcffff         jmp     0x1333

This is handler 0x17. It pops an argument, checks if it is negative, and pushes -1 if it is, 0 if it is 0, and 1 otherwise.

0x16: "CMP", (POP 2 ARGS, PUSH 1 RESULT)
0x17: "IS_NEG", (POP 1 ARG, PUSH 1 RESULT)

Now for our final series of handlers!

0000151b  0fb743fe           movzx   eax, word [rbx-0x2]
0000151f  0fb753fe           movzx   edx, word [rbx-0x2]
00001523  4883eb02           sub     rbx, 0x2
00001527  66c1f80f           sar     ax, 0xf
0000152b  31c2               xor     edx, eax
0000152d  29c2               sub     edx, eax
0000152f  0fb7d2             movzx   edx, dx
00001532  e9fcfdffff         jmp     0x1333

This is handler 0x30. The series of arithmetic instruction here is actuallly abs. As we notice, a value is “popped” to the edx register before going back to one of the main VM handlers and hence changing control flow. All the VM jumps are absolute.

00001537  66837bfc00         cmp     word [rbx-0x4], 0x0 
0000153c  488d73fc           lea     rsi, [rbx-0x4]
00001540  0f84e3fdffff       je      0x1329

00001546  0fb743fe           movzx   eax, word [rbx-0x2]
0000154a  0fb753fe           movzx   edx, word [rbx-0x2]
0000154e  4889f3             mov     rbx, rsi
00001551  66c1f80f           sar     ax, 0xf
00001555  31c2               xor     edx, eax
00001557  29c2               sub     edx, eax
00001559  0fb7d2             movzx   edx, dx
0000155c  e9d2fdffff         jmp     0x1333

00001329  4889f3             mov     rbx, rsi
0000132c  0f1f4000           nop     dword [rax], eax

Here is handler 0x32. It pops the second highest value for comparison. If satisfied the jump conditional (in this case, jne), it continues and applies the same absolute value operations to change edx. If it isn’t satisfied, it just continues. The vm stack is also adjusted accordingly. All the other handlers in this family behave similarly, where the first argument is the jump destination and the second argument is used for the conditional determination.

So here is our last grouping of instructions:

0x30: "JMP", (POP 1 ARG)
0x31: "JE", (POP 2 ARGS)
0x32: "JNE", (POP 2 ARGS)
0x33: "JS", (POP 2 ARGS)
0x34: "JG", (POP 2 ARGS)
0x35: "JLE", (POP 2 ARGS)
0x36: "JNS", (POP 2 ARGS)

just one more instruction (0x3)

There was also the case of opcode 0x3.

000013a1  488d43fe           lea     rax, [rbx-0x2]  // if TOS1 == some negative junk, halt
000013a5  0fbf5bfe           movsx   ebx, word [rbx-0x2]
000013a9  83fbff             cmp     ebx, 0xffffffff
000013ac  0f8560020000       jne     0x1612

This is basically like a halt or an exit, and the end code will be popped from the top of the stack. At 0x1612, that value makes the determination of whether the player gets the flag or not.

Using all of this information, I ended up writing a Binary Ninja Plugin Module for disassembly only (an LLIL lifter was also briefly tried just for fun but any attempts to lift the jumps seemed to have drastically slowed down analysis, perhaps just due to my infamiliarity) for this custom 2k architecture, which I will be referencing throughout the reversing of the VM itself. This article does a pretty good job showing how to do so. I’ll hand off the writeup to Day from here.

Reversing the VM

All instructions are one byte, besides push, which contains a two byte inline argument. This is simple to write a disassembly script for, which served as the basis for the Binary Ninja plugin.

 1from pwn import *
 2prog = open("prog","rb").read()
 3pc = 0
 4OPS = {
 5    1: "DUP",
 6    2: "POP",
 7    3: "HLT",
 8    0x10: "ADD",
 9    0x11: "SUB",
10    0x12: "MUL",
11    0x13: "DIV",
12    0x14: "MOD",
13    0x15: "MOD_MUL",
14    0x16: "CMP",
15    0x17: "IS_NEG",
16    0x20: "GETC",
17    0x21: "PUTC",
18    0x22: "PUSH",
19    0x30: "JMP",
20    0x31: "JE",
21    0x32: "JNE",
22    0x33: "JS",
23    0x34: "JG",
24    0x35: "JLE",
25    0x36: "JNS",
26    0x40: "MEMW",
27    0x41: "MEMR",
28    0x50: "INCA",
29    0x51: "DECA",
30    0x52: "ADDA",
31    0x53: "SUBA"
32}
33while pc < len(prog):
34    print(hex(pc),end=": ")
35    inst = prog[pc]
36    print(OPS.get(inst, hex(inst)),end=" ")
37    if inst == 0x22:
38        print(hex(u16(prog[pc+1:pc+3])))
39        pc += 3
40    else:
41        print()
42        pc += 1

Which gives us about 7000 lines of instructions. This would take ages to reverse manually, but I noticed the program employed a pattern which could be semi-automated.

The first section of code reads like so

This creates “macros” for the program to refer to - the snippet at 0x4 causes the HLT instruction to fail, ending the program. Meanwhile the snippet at 0x8 causes the HLT instruction to succeed, printing the flag. The beginning of the program simply jumps past these code snippets.

The program begins by printing “sice deets:\n” via pushing each character and printing them one by one. Afterwards, it reads 64 characters of input, storing each one consecutively on the VM memory section. Before doing so, each character has 0x2f subtracted from it. The following code snippet repeats, grabbing a character, subtracting 0x2f from it, writing it to memory, and incrementing the accumulator.

After our inputted password is written to memory, the program begins executing a systematic series of mathematical checks. If any of these checks fail, the program will jump to 0x4 and end the VM. If, at the end, all checks have passed, the program jumps to 0x8 and we gain the flag. The first around 6000 lines of code all have the same type of check - an equivalence check.

This series of instructions:

Essentially, if the values at these two indexes are equal, the check fails, jumps to 0x4 and the program ends. In fact, there are several, hundreds even, of these equivalence checks of the exact same format. This means constant offsets can be used to extract every pair of indexes that must not be the same. I made an automatic script to grab every pair of indexes that must be different.

1prog = open("prog.bin","rb").read()
2pointer = 0x1fd
3while pointer < 0x2a3b:
4    num1 = prog[pointer]
5    pointer += 9
6    num2 = prog[pointer]
7    pointer += 14
8    print(num1, num2) 

After this, there are 5 or so different types of checks that it employs. These are:

The program subtracts the two numbers, and then uses the IS_NEG and MUL opcodes in conjunction to take an absolute value, thus getting the difference. This difference is then subtracted from the difference that it should be, and 0x4 is jumped to if the result is not 0.

The program pushes all the numbers onto the stack, and then runs an add(or two adds, if there are 3 numbers) to push the sum onto the stack. This is then subtracted by what the difference should be, and 0x4 is jumped to if the result is not 0.

Same as above, except MOD_MUL is used instead, and 0x7fff is pushed to the stack before every multiplication.

The program pushes an index of input onto the stack, subtracts it with what it should be, and jumps to 0x4 if the result is not 0.

In the first stage, the program multiples inp[i] mod inp[j] and inp[j] mod inp[i]. It checks the result is 0. This is only possible if one of them is a multiple of the other.

The second stage is done through a check, in which inp[i] // inp[j] and vice versa are added, and then that total is divided by inp[i] == inp[j] + 1. This value is then compared via subtraction and jne to the expected value. If inp[i] == inp[j], the divisor would be 2 whilst the dividend would be 2. This would result in an answer of 1, which is never wanted, so we can ignore this possibility.

The other possibility means that the divisor would be 1, whilst the dividend would be inp[i] // inp[j] + inp[j] // inp[i]. Since one is a multiple of the other, one of these will be 0(integer division) whilst the other will be a whole number which represents the cofactorial. (0 + a) / 1 = a, so this whole calculation will get the number C for which inp[i] = C * inp[j] OR inp[j] = C * inp[i]. This is then compared to the expected value as normal, and 0x4 is jumped to if the value is incorrect.

Every check after the equivalence checks is of this form, and at the end when all of these checks pass, 0x8 gets jumped to and the flag is printed.

z3 scripting

All these checks are fairly mathematical, allowing for them to be entered into z3 in which we can get the solution (courtesy of clubby789)

 1from z3 import *
 2FLAG_LEN = 64
 3s = Solver()
 4inp = [BitVec(f'flag_{x}', 16) for x in range(FLAG_LEN)]
 5with open('pairs.txt') as f:
 6    pairs = f.read().split('\n')
 7
 8for i in inp:
 9    s.add(i >= 1)
10    s.add(i <= 16)
11
12s.add(Or((inp[0x1e] == inp[0x26] * 2), (inp[0x1e] * 2 == inp[0x26])))
13s.add(inp[0x20] == 0x1)
14s.add(inp[0x1f] == 3 * inp[0x27] or inp[0x1f] * 3 == inp[0x27])
15s.add(inp[0x21] + inp[0x29] == 0xc)
16s.add(inp[0x22] == 4)
17s.add(inp[0x23] * inp[0x24] * inp[0x25] * inp[0x2d] % 0x7fff == 0x1e)
18s.add(inp[0x28] == 7)
19s.add(inp[0x2a] == 3 * inp[0x32] or inp[0x2a] * 3 == inp[0x32])
20s.add(inp[0x2b] + inp[0x33] + inp[0x3b] == 0xf)
21s.add(inp[0x2c] == 3)
22s.add(inp[0x2e] == 3 * inp[0x36] or inp[0x2e] * 3 == inp[0x36])
23s.add(inp[0x2f] == 2 * inp[0x37] or inp[0x2f] * 2 == inp[0x37])
24s.add(inp[0x30] + inp[0x31] + inp[0x39] == 0xc)
25s.add(inp[0x35] - inp[0x34] == 2 or inp[0x34] - inp[0x35] == 2)
26s.add(inp[0x38] == 2)
27s.add(inp[0x3a] == 5)
28s.add(inp[0x3c] - inp[0x3d] == 2 or inp[0x3d] - inp[0x3c] == 2)
29s.add(inp[0x3e] == 7 * inp[0x3f] or inp[0x3e] * 7 == inp[0x3f])
30s.add(inp[8] - inp[0] == 2 or inp[0] - inp[8] == 2)
31s.add(inp[1] * inp[9] * inp[10] % 0x7fff == 8)
32s.add(inp[2] + inp[3] == 0xc)
33s.add(inp[4] + inp[5] + inp[6] == 0x11)
34s.add(inp[7] * inp[0xf] * inp[0x17] % 0x7fff == 0xf)
35s.add(inp[0xb] == 0x7)
36s.add(inp[0xc] + inp[0x14] + inp[0x1c] == 0x7)
37s.add(inp[0xd] == 0x8)
38s.add(inp[0xe] + inp[0x16] == 0xc)
39s.add(inp[0x11] - inp[0x10] == 2 or inp[0x10] - inp[0x11] == 2)
40s.add(inp[0x12] * inp[0x1a] * inp[0x1b] % 0x7fff == 0x90)
41s.add(inp[0x13] == 1)
42s.add(inp[0x1d] - inp[0x15] == 5 or inp[0x15] - inp[0x1d] == 5)
43s.add(inp[0x18] == 5)
44s.add(inp[0x19] == 3)
45
46for pair in pairs:
47    a, b = pair.split(' ')
48    s.add(inp[int(a)] != inp[int(b)])
49print(s.check())
50mod = s.model()
51for i in range(FLAG_LEN):
52    print(chr(int(repr(mod[inp[i]])) + 0x2f), end='')
53
54print()
55mod = s.model()

This gets a working password, 7128536053061794?=803;12427106?30632417;641520372054<:;?1743:<06, leading to the flag: flag{kenken_is_just_z3_064c4}

Concluding Thoughts

This was a pretty fun reversing challenge overall. Both of us don’t do much reversing outside of pwn related rev, but we did try more of this category than usual (as all but one of our team’s reversers are always afk :rage: - time to kick and recruit) and it went better than expected. Feel free to let us know of any questions or feedback regarding this writeup!