redpwnCTF 2021 - 2k Reversing Challenge
- Author: Day, FizzBuzz101
- Date:
Introduction
2k (22 solves) was a fun stack based VM reversing challenge from redpwnCTF 2021 by EvilMuffinHa. This writeup will be logically split into two parts: reversing the VM behavior in the chall
binary, and then reversing the actual VM in prog.bin
.
Reversing VM Behavior
Opening the binary up in Binary Ninja, we see that all the logic resides with main itself. There is one other function that is used at 0x1b70, but that is just for determining vm executable size.
In main, one of the first things that is done is initialize a VM memory region, which has a size of about 0x20000.
0000120e 4c8d9c240000feff lea r11, [rsp-0x20000 {vm_mem}]
00001216 4881ec00100000 sub rsp, 0x1000
0000121d 48830c2400 or qword [rsp], 0x0
00001222 4c39dc cmp rsp, r11 {vm_mem}
00001225 75ef jne 0x1216
Then, a VM stack of size 0x800 is calloc’d and the instructions in the prog.bin
is read in.
100001259 void* stack = calloc(n: 0x800, elem_size: 2)
20000126c *(rsp_1 + 0x18) = 0
300001278 FILE* vm_file = fopen(filename: "prog.bin", mode: &data_2007)
40000127d char const* const rdi_10
50000127d if (vm_file == 0)
600001291 else
700001291 int32_t vm_size = get_size_1b70(vm_file, rsp_1 + 0x18)
800001296 int16_t* vm_text
900001296 size_t size_read
1000001296 if (vm_size != 0xffffffff)
110000129f size_t size = *(rsp_1 + 0x18)
12000012ac vm_text = calloc(n: size, elem_size: 1)
1300001296 if (vm_text != 0)
14000012cb size_read = fread(buf: vm_text, size: 1, count: size, fp: vm_file)
1500001296 if (*(rsp_1 + 0x18) == size_read)
16000012de void* stack = stack
17000012e8 fclose(fp: vm_file)
Then the VM state is initialized before hitting a signed jump table at 0x21d0 to start VM execution.
000012de 4c89e3 mov rbx, r12
000012e1 4c8d3de80e0000 lea r15, [rel jump_table_21d0]
000012e8 e863feffff call fclose
000012ed 4c8b742418 mov r14, qword [rsp+0x18]
000012f2 31c0 xor eax, eax {0x0}
000012f4 31c9 xor ecx, ecx {0x0}
000012f6 31d2 xor edx, edx {0x0}
000012f8 0f1f840000000000 nop dword [rax+rax], eax
00001300 807c050053 cmp byte [rbp+rax], 0x53
00001305 448d6a01 lea r13d, [rdx+0x1]
00001309 7725 ja 0x1330
0000130b 0fb6440500 movzx eax, byte [rbp+rax]
00001310 49630487 movsxd rax, dword [r15+rax*4]
00001314 4c01f8 add rax, r15
00001317 3effe0 jmp rax
Take note of these registers as they are used over and over again in the VM. r15 stores the jump table (which has 0x54 handlers), r14 stores the code size, and eax, ecx, edx are null for now. r12 in this case holds the contents of rax after the calloc call; this implies that rbx is set up to act as the stack pointer in this VM. Not shown in this disassembly is also an instruction that sets rbp to the code region.
Before I continue, I would like to discuss a few extra details about the VM and the context of a few registers. While such information is definitely unclear at this point, in later examples it will make much more sense. The VM is a stack based VM operating on shorts as its size; edx will be used throughout in conjunction with r13 as a form of letting the VM interpreter know where the next instruction is (usually r13 is loaded as an address ahead of edx, and then edx gets set to r13d in the process of opcode handling). There is also a pseudo register that is based upon ecx (I may call it the accumulator sometimes even though it really isn’t that in the traditional sense) used for operations on vm memory region.
A few main places handlers keep transfering control flow to include the following regions:
- 0x1300
00001300 807c050053 cmp byte [rbp+rax], 0x53
00001305 448d6a01 lea r13d, [rdx+0x1]
00001309 7725 ja 0x1330
0000130b 0fb6440500 movzx eax, byte [rbp+rax]
00001310 49630487 movsxd rax, dword [r15+rax*4]
00001314 4c01f8 add rax, r15
00001317 3effe0 jmp rax
This handler checks for opcode validity, and then sets r13d based on rdx according to the behavior I described previously (this makes sense as all but one instructions are 1 opcode long) before going to the correct opcode handler.
- 0x1330 and 0x1333
00001330 4489ea mov edx, r13d
00001333 89d0 mov eax, edx
00001335 4939c6 cmp r14, rax
00001338 73c6 jae 0x1300
Sometimes handlers set the edx to r13d already, so those would usually head to 0x1333. Otherwise, 0x1330 would be used. The next instruction location is always checked to r14, which records the size of the VM as mentioned previously. If some of these validity checks fail or the size goes over, the VM enters an exit state (around 0x133a) where memory is freed and the program exits. I might have missed a few other locations, but in general, the places where opcodes handlers transfer control flow to afterwards perform similar operations.
Now, let us look at the opcode handlers. I won’t describe every single one, but I’ll just do a few in some important categories of operations. Do note that many of the instructions in the allowed range are also just simply no-ops, and that most opcode arguments are passed via the stack and results are pushed back (hence stack based VM).
instructions used for vm mem region operations (0x50 - 0x53, 0x40-0x41)
000015ec 83c101 add ecx, 0x1
000015ef 4489ea mov edx, r13d
000015f2 e93cfdffff jmp 0x1333
This is handler 0x50, which simply increments the “accumulator”.
00001602 66034bfe add cx, word [rbx-0x2]
00001606 4489ea mov edx, r13d
00001609 4883eb02 sub rbx, 0x2
0000160d e921fdffff jmp 0x1333
This is handler 0x52, which adds a value to the “accumulator”. Note how it chooses the value based on [rbx-0x2]. That is repeatedly used throughout the program as “top of the vm stack.” Notice the later sub rbx, 0x2
instruction. If you think about this, the handler is basically popping an argument to add to the “accumaltor”.
000015d4 0fb7c1 movzx eax, cx
000015d7 4883c302 add rbx, 0x2
000015db 4489ea mov edx, r13d
000015de 0fb7444420 movzx eax, word [rsp+rax*2+0x20]
000015e3 668943fe mov word [rbx-0x2], ax
000015e7 e947fdffff jmp 0x1333
This is when those “accumulator” values come in handy (handler 0x41). Also notice how the add rbx, 0x2
. It’s basically pushing the offset in a short array (vm mem) based on the accumulator onto the vm stack. The write equivalent, as well as the sub and dec equivalents for the previous 2 handlers are all within this grouping of opcodes as well.
At this point, we have the following instructions mapped out (hopefully the instruction names are self-explanatory enough):
0x40: "MEMW", (POP 1 ARG)
0x41: "MEMR", (PUSH 1 RESULT)
0x50: "INCA",
0x51: "DECA",
0x52: "ADDA", (POP 1 ARG)
0x53: "SUBA", (POP 1 ARG)
stack related instructions (0x1, 0x2, 0x22)
000014f8 498d4502 lea rax, [r13+0x2]
000014fc 4939c6 cmp r14, rax
000014ff 0f8635feffff jbe 0x133a
00001505 420fb7442d00 movzx eax, word [rbp+r13]
0000150b 83c203 add edx, 0x3
0000150e 4883c302 add rbx, 0x2
00001512 668943fe mov word [rbx-0x2], ax
00001516 e918feffff jmp 0x1333
This is handler 0x22. Code region validity checks are performed, and then it loads the following 2 bytes from the vm instruction pointer, before pushing that short onto the vm stack. Notice how the edx gets added with the value of 3, as this is the only instruction that is more than length 3.
00001398 4883eb02 sub rbx, 0x2
0000139c 4489ea mov edx, r13d
0000139f eb92 jmp 0x1333
This is handler 0x2. It basically behaves like a pop.
00001387 0fb743fe movzx eax, word [rbx-0x2]
0000138b 4489ea mov edx, r13d
0000138e 4883c302 add rbx, 0x2
00001392 668943fe mov word [rbx-0x2], ax
00001396 eb9b jmp 0x1333
This is handler 0x1. I guess this instruction can sort of belong in this grouping? All it really does is pushes the top value of the stack to the top.
So now we have the following instructions mapped out:
1: "DUP", (1 ARG USED, PUSH 1 RESULT)
2: "POP",
0x22: "PUSH",
“syscall-like” instructions (0x20-0x21)
0000149c 488b3d7d2b0000 mov rdi, qword [rel stdin]
000014a3 894c240c mov dword [rsp+0xc], ecx
000014a7 4883c302 add rbx, 0x2
000014ab e840fdffff call getc
000014b0 8b4c240c mov ecx, dword [rsp+0xc]
000014b4 4489ea mov edx, r13d
000014b7 668943fe mov word [rbx-0x2], ax
000014bb e973feffff jmp 0x1333
This is handler 0x20 and just pushes the result of getc onto the stack.
000014c0 0fb743fe movzx eax, word [rbx-0x2]
000014c4 663dff00 cmp ax, 0xff
000014c8 0f876cfeffff ja 0x133a
000014ce 0fbfd0 movsx edx, ax
000014d1 488d352c0b0000 lea rsi, [rel data_2004] // %c
000014d8 31c0 xor eax, eax {0x0}
000014da 894c240c mov dword [rsp+0xc], ecx
000014de bf01000000 mov edi, 0x1
000014e3 4883eb02 sub rbx, 0x2
000014e7 e8d4fcffff call __printf_chk
000014ec 8b4c240c mov ecx, dword [rsp+0xc]
000014f0 4489ea mov edx, r13d
000014f3 e93bfeffff jmp 0x1333
This is handler 0x21 and just pops a value for printf.
Hence, now we have:
0x20: "GETC", (PUSH 1 RESULT)
0x21: "PUTC", (POP 1 ARG)
Arithmetic Based Instructions (0x10 - 0x15)
00001464 0fb743fc movzx eax, word [rbx-0x4]
00001468 660faf43fe imul ax, word [rbx-0x2]
0000146d 4489ea mov edx, r13d
00001470 4883eb02 sub rbx, 0x2
00001474 668943fe mov word [rbx-0x2], ax
00001478 e9b6feffff jmp 0x1333
This is handler 0x12. 2 arguments are popped from the stack, and the product is pushed back up.
Most of the instructions in this group of handlers are quite trivial to recognize. However, it is important to note the distinction between the div and mod handlers (div is 0x13 and mod is 0x14). To note down the instructions here, I will write the operand order as well (with arg0 representing the highest element on stack when handler is hit).
0x10: "ADD", (POP 2 ARGS, PUSH 1 RESULT), ARG0 + ARG1
0x11: "SUB", (POP 2 ARGS, PUSH 1 RESULT), ARG0 - ARG1
0x12: "MUL", (POP 2 ARGS, PUSH 1 RESULT), ARG0 * ARG1
0x13: "DIV", (POP 2 ARGS, PUSH 1 RESULT), ARG0 / ARG1
0x14: "MOD", (POP 2 ARGS, PUSH 1 RESULT), ARG0 % ARG1
0x15: "MOD_MUL", (POP 3 ARGS, PUSH 1 RESULT), (ARG2 * ARG3) % ARG1
Conditional Related Instructions (0x16-0x17)
00001429 0fb773fc movzx esi, word [rbx-0x4]
0000142d 488d43fe lea rax, [rbx-0x2]
00001431 663973fe cmp word [rbx-0x2], si
00001435 0f844a020000 je 0x1685
0000143b 66c743fc0000 mov word [rbx-0x4], 0x0
00001441 4489ea mov edx, r13d
00001444 4889c3 mov rbx, rax
00001447 e9e7feffff jmp 0x1333
00001685 66c743fc0100 mov word [rbx-0x4], 0x1
0000168b 4489ea mov edx, r13d
0000168e 4889c3 mov rbx, rax
00001691 e99dfcffff jmp 0x1333
This is handler 0x16. It compares two arguments, and then pushes a 1 if they’re equal (at 0x1685) or pushes a 0 back.
0000147d 66837bfe00 cmp word [rbx-0x2], 0x0
00001482 0f880e020000 js 0x1696
00001488 4489ea mov edx, r13d
0000148b 0f84a2feffff je 0x1333
00001696 66c743feffff mov word [rbx-0x2], 0xffff {0xffff}
0000169c 4489ea mov edx, r13d
0000169f e98ffcffff jmp 0x1333
This is handler 0x17. It pops an argument, checks if it is negative, and pushes -1 if it is, 0 if it is 0, and 1 otherwise.
0x16: "CMP", (POP 2 ARGS, PUSH 1 RESULT)
0x17: "IS_NEG", (POP 1 ARG, PUSH 1 RESULT)
Control Flow Related Instructions (0x30-0x36)
Now for our final series of handlers!
0000151b 0fb743fe movzx eax, word [rbx-0x2]
0000151f 0fb753fe movzx edx, word [rbx-0x2]
00001523 4883eb02 sub rbx, 0x2
00001527 66c1f80f sar ax, 0xf
0000152b 31c2 xor edx, eax
0000152d 29c2 sub edx, eax
0000152f 0fb7d2 movzx edx, dx
00001532 e9fcfdffff jmp 0x1333
This is handler 0x30. The series of arithmetic instruction here is actuallly abs. As we notice, a value is “popped” to the edx register before going back to one of the main VM handlers and hence changing control flow. All the VM jumps are absolute.
00001537 66837bfc00 cmp word [rbx-0x4], 0x0
0000153c 488d73fc lea rsi, [rbx-0x4]
00001540 0f84e3fdffff je 0x1329
00001546 0fb743fe movzx eax, word [rbx-0x2]
0000154a 0fb753fe movzx edx, word [rbx-0x2]
0000154e 4889f3 mov rbx, rsi
00001551 66c1f80f sar ax, 0xf
00001555 31c2 xor edx, eax
00001557 29c2 sub edx, eax
00001559 0fb7d2 movzx edx, dx
0000155c e9d2fdffff jmp 0x1333
00001329 4889f3 mov rbx, rsi
0000132c 0f1f4000 nop dword [rax], eax
Here is handler 0x32. It pops the second highest value for comparison. If satisfied the jump conditional (in this case, jne), it continues and applies the same absolute value operations to change edx. If it isn’t satisfied, it just continues. The vm stack is also adjusted accordingly. All the other handlers in this family behave similarly, where the first argument is the jump destination and the second argument is used for the conditional determination.
So here is our last grouping of instructions:
0x30: "JMP", (POP 1 ARG)
0x31: "JE", (POP 2 ARGS)
0x32: "JNE", (POP 2 ARGS)
0x33: "JS", (POP 2 ARGS)
0x34: "JG", (POP 2 ARGS)
0x35: "JLE", (POP 2 ARGS)
0x36: "JNS", (POP 2 ARGS)
just one more instruction (0x3)
There was also the case of opcode 0x3.
000013a1 488d43fe lea rax, [rbx-0x2] // if TOS1 == some negative junk, halt
000013a5 0fbf5bfe movsx ebx, word [rbx-0x2]
000013a9 83fbff cmp ebx, 0xffffffff
000013ac 0f8560020000 jne 0x1612
This is basically like a halt or an exit, and the end code will be popped from the top of the stack. At 0x1612, that value makes the determination of whether the player gets the flag or not.
Using all of this information, I ended up writing a Binary Ninja Plugin Module for disassembly only (an LLIL lifter was also briefly tried just for fun but any attempts to lift the jumps seemed to have drastically slowed down analysis, perhaps just due to my infamiliarity) for this custom 2k architecture, which I will be referencing throughout the reversing of the VM itself. This article does a pretty good job showing how to do so. I’ll hand off the writeup to Day from here.
Reversing the VM
All instructions are one byte, besides push, which contains a two byte inline argument. This is simple to write a disassembly script for, which served as the basis for the Binary Ninja plugin.
1from pwn import *
2prog = open("prog","rb").read()
3pc = 0
4OPS = {
5 1: "DUP",
6 2: "POP",
7 3: "HLT",
8 0x10: "ADD",
9 0x11: "SUB",
10 0x12: "MUL",
11 0x13: "DIV",
12 0x14: "MOD",
13 0x15: "MOD_MUL",
14 0x16: "CMP",
15 0x17: "IS_NEG",
16 0x20: "GETC",
17 0x21: "PUTC",
18 0x22: "PUSH",
19 0x30: "JMP",
20 0x31: "JE",
21 0x32: "JNE",
22 0x33: "JS",
23 0x34: "JG",
24 0x35: "JLE",
25 0x36: "JNS",
26 0x40: "MEMW",
27 0x41: "MEMR",
28 0x50: "INCA",
29 0x51: "DECA",
30 0x52: "ADDA",
31 0x53: "SUBA"
32}
33while pc < len(prog):
34 print(hex(pc),end=": ")
35 inst = prog[pc]
36 print(OPS.get(inst, hex(inst)),end=" ")
37 if inst == 0x22:
38 print(hex(u16(prog[pc+1:pc+3])))
39 pc += 3
40 else:
41 print()
42 pc += 1
Which gives us about 7000 lines of instructions. This would take ages to reverse manually, but I noticed the program employed a pattern which could be semi-automated.
The first section of code reads like so
This creates “macros” for the program to refer to - the snippet at 0x4 causes the HLT instruction to fail, ending the program. Meanwhile the snippet at 0x8 causes the HLT instruction to succeed, printing the flag. The beginning of the program simply jumps past these code snippets.
The program begins by printing “sice deets:\n” via pushing each character and printing them one by one. Afterwards, it reads 64 characters of input, storing each one consecutively on the VM memory section. Before doing so, each character has 0x2f subtracted from it. The following code snippet repeats, grabbing a character, subtracting 0x2f from it, writing it to memory, and incrementing the accumulator.
After our inputted password is written to memory, the program begins executing a systematic series of mathematical checks. If any of these checks fail, the program will jump to 0x4 and end the VM. If, at the end, all checks have passed, the program jumps to 0x8 and we gain the flag. The first around 6000 lines of code all have the same type of check - an equivalence check.
This series of instructions:
- Grabs index 1 of our input
- Grabs index 0 of our input
- Subtracts them
- Jumps to address 0x4 if the result is 0
Essentially, if the values at these two indexes are equal, the check fails, jumps to 0x4 and the program ends. In fact, there are several, hundreds even, of these equivalence checks of the exact same format. This means constant offsets can be used to extract every pair of indexes that must not be the same. I made an automatic script to grab every pair of indexes that must be different.
1prog = open("prog.bin","rb").read()
2pointer = 0x1fd
3while pointer < 0x2a3b:
4 num1 = prog[pointer]
5 pointer += 9
6 num2 = prog[pointer]
7 pointer += 14
8 print(num1, num2)
After this, there are 5 or so different types of checks that it employs. These are:
- Checking the difference between two indexes of input is a specific number
The program subtracts the two numbers, and then uses the IS_NEG and MUL opcodes in conjunction to take an absolute value, thus getting the difference. This difference is then subtracted from the difference that it should be, and 0x4 is jumped to if the result is not 0.
- Checking that the sum of two or more indexes of input is a specific number
The program pushes all the numbers onto the stack, and then runs an add(or two adds, if there are 3 numbers) to push the sum onto the stack. This is then subtracted by what the difference should be, and 0x4 is jumped to if the result is not 0.
- Checking the product(modulo 0x7fff) of two or more indexes of input is a specific number
Same as above, except MOD_MUL is used instead, and 0x7fff is pushed to the stack before every multiplication.
- Checking that an index of input is a specific number
The program pushes an index of input onto the stack, subtracts it with what it should be, and jumps to 0x4 if the result is not 0.
- A two stage check, which is functionally equivalent to checking that for two bytes of input, inp[i] and inp[j], inp[i] = C * inp[j] OR inp[j] = C * inp[i] for some C that changes from check to check
In the first stage, the program multiples inp[i] mod inp[j] and inp[j] mod inp[i]. It checks the result is 0. This is only possible if one of them is a multiple of the other.
The second stage is done through a check, in which inp[i] // inp[j] and vice versa are added, and then that total is divided by inp[i] == inp[j] + 1. This value is then compared via subtraction and jne to the expected value. If inp[i] == inp[j], the divisor would be 2 whilst the dividend would be 2. This would result in an answer of 1, which is never wanted, so we can ignore this possibility.
The other possibility means that the divisor would be 1, whilst the dividend would be inp[i] // inp[j] + inp[j] // inp[i]. Since one is a multiple of the other, one of these will be 0(integer division) whilst the other will be a whole number which represents the cofactorial. (0 + a) / 1 = a, so this whole calculation will get the number C for which inp[i] = C * inp[j] OR inp[j] = C * inp[i]. This is then compared to the expected value as normal, and 0x4 is jumped to if the value is incorrect.
Every check after the equivalence checks is of this form, and at the end when all of these checks pass, 0x8 gets jumped to and the flag is printed.
z3 scripting
All these checks are fairly mathematical, allowing for them to be entered into z3 in which we can get the solution (courtesy of clubby789)
1from z3 import *
2FLAG_LEN = 64
3s = Solver()
4inp = [BitVec(f'flag_{x}', 16) for x in range(FLAG_LEN)]
5with open('pairs.txt') as f:
6 pairs = f.read().split('\n')
7
8for i in inp:
9 s.add(i >= 1)
10 s.add(i <= 16)
11
12s.add(Or((inp[0x1e] == inp[0x26] * 2), (inp[0x1e] * 2 == inp[0x26])))
13s.add(inp[0x20] == 0x1)
14s.add(inp[0x1f] == 3 * inp[0x27] or inp[0x1f] * 3 == inp[0x27])
15s.add(inp[0x21] + inp[0x29] == 0xc)
16s.add(inp[0x22] == 4)
17s.add(inp[0x23] * inp[0x24] * inp[0x25] * inp[0x2d] % 0x7fff == 0x1e)
18s.add(inp[0x28] == 7)
19s.add(inp[0x2a] == 3 * inp[0x32] or inp[0x2a] * 3 == inp[0x32])
20s.add(inp[0x2b] + inp[0x33] + inp[0x3b] == 0xf)
21s.add(inp[0x2c] == 3)
22s.add(inp[0x2e] == 3 * inp[0x36] or inp[0x2e] * 3 == inp[0x36])
23s.add(inp[0x2f] == 2 * inp[0x37] or inp[0x2f] * 2 == inp[0x37])
24s.add(inp[0x30] + inp[0x31] + inp[0x39] == 0xc)
25s.add(inp[0x35] - inp[0x34] == 2 or inp[0x34] - inp[0x35] == 2)
26s.add(inp[0x38] == 2)
27s.add(inp[0x3a] == 5)
28s.add(inp[0x3c] - inp[0x3d] == 2 or inp[0x3d] - inp[0x3c] == 2)
29s.add(inp[0x3e] == 7 * inp[0x3f] or inp[0x3e] * 7 == inp[0x3f])
30s.add(inp[8] - inp[0] == 2 or inp[0] - inp[8] == 2)
31s.add(inp[1] * inp[9] * inp[10] % 0x7fff == 8)
32s.add(inp[2] + inp[3] == 0xc)
33s.add(inp[4] + inp[5] + inp[6] == 0x11)
34s.add(inp[7] * inp[0xf] * inp[0x17] % 0x7fff == 0xf)
35s.add(inp[0xb] == 0x7)
36s.add(inp[0xc] + inp[0x14] + inp[0x1c] == 0x7)
37s.add(inp[0xd] == 0x8)
38s.add(inp[0xe] + inp[0x16] == 0xc)
39s.add(inp[0x11] - inp[0x10] == 2 or inp[0x10] - inp[0x11] == 2)
40s.add(inp[0x12] * inp[0x1a] * inp[0x1b] % 0x7fff == 0x90)
41s.add(inp[0x13] == 1)
42s.add(inp[0x1d] - inp[0x15] == 5 or inp[0x15] - inp[0x1d] == 5)
43s.add(inp[0x18] == 5)
44s.add(inp[0x19] == 3)
45
46for pair in pairs:
47 a, b = pair.split(' ')
48 s.add(inp[int(a)] != inp[int(b)])
49print(s.check())
50mod = s.model()
51for i in range(FLAG_LEN):
52 print(chr(int(repr(mod[inp[i]])) + 0x2f), end='')
53
54print()
55mod = s.model()
This gets a working password, 7128536053061794?=803;12427106?30632417;641520372054<:;?1743:<06
, leading to the flag: flag{kenken_is_just_z3_064c4}
Concluding Thoughts
This was a pretty fun reversing challenge overall. Both of us don’t do much reversing outside of pwn related rev, but we did try more of this category than usual (as all but one of our team’s reversers are always afk :rage: - time to kick and recruit) and it went better than expected. Feel free to let us know of any questions or feedback regarding this writeup!