redpwnCTF 2021 - Pickled Onions

rev

Reversing

Pickled Onions was a reversing challenge where we are given a small Python file that loads a pickle object, and checks a flag that we input. It turns out to be a nested, self-unpacking program written in the Pickle bytecode language.

Stage 1

The initial challenge file looks like this:

__import__('pickle').loads(b'(I128\nI4\nI99\n...\x85R\x85R.')

When we run it:

What is the flag? test
Nope!

Pickle is a Python standard module allowing serialization of objects. Pickled objects can define a routine used while unpickling, and this code is included in the pickled string. The bytecode language seems to be a subset of Python. It is stack based, with some 'memo' slots for more persistent storage, and can build dicts, tuples and call functions.

While researching, I found the pickletools module, which allows us to disassemble a pickle string, and optionally annotate the meaning of each opcode. We will begin by disassembling the bytecode.

>>> pickletools.dis(challenge, annotate=30)
    0: (    MARK                      Push markobject onto the stack.
    1: I        INT        128        Push an integer or bool.
    6: I        INT        4          Push an integer or bool.
    9: I        INT        99         Push an integer or bool.
   13: I        INT        112        Push an integer or bool.
   18: I        INT        105        Push an integer or bool.
   23: I        INT        99         Push an integer or bool.
   27: I        INT        107        Push an integer or bool.
   32: I        INT        108        Push an integer or bool.
   37: I        INT        101        Push an integer or bool.
   42: I        INT        10         Push an integer or bool.
   46: I        INT        105        Push an integer or bool.
   51: I        INT        111        Push an integer or bool.
   56: I        INT        10         Push an integer or bool.
   60: I        INT        40         Push an integer or bool.
   64: I        INT        140        Push an integer or bool.
   69: I        INT        18         Push an integer or bool.
[ ... ]
544273: t        TUPLE      (MARK at 0) Build a tuple out of the topmost stack slice, after markobject.
544274: p    PUT        69420           Store the stack top into the memo.  The stack is not popped.
544281: c    GLOBAL     'pickle loads'  Push a global object (module.attr) on the stack.
544295: c    GLOBAL     'builtins bytes' Push a global object (module.attr) on the stack.
544311: g    GET        69420            Read an object from the memo and push it on the stack.
544318: \x85 TUPLE1                      Build a one-tuple out of the topmost item on the stack.
544319: R    REDUCE                      Push an object built from a callable and an argument tuple.
544320: \x85 TUPLE1                      Build a one-tuple out of the topmost item on the stack.
544321: R    REDUCE                      Push an object built from a callable and an argument tuple.
544322: .    STOP                        Stop the unpickling machine.

When unpickled, this string pushes thousands of ints to the stack, then builds a tuple out of them, storing it at memo position 69420. It then loads the bytes function, converting the tuple into a bytestring, then calls pickle.loads on it. We can deduce that the integers pushed are another pickled object, and extract it.

Stage 2

The stage2 pickle code is much more complex. it pushes a number of constants, which themselves appear to be more nested pickles, and a number of names such as pickledgreekoregano. It then builds these all into a dict, and calls input with the What is the flag? prompt. At this point I was stuck for a while, as I wasn't able to follow the sequence of operations. Luckily, my teammate discovered Fickling, a module written by TrailOfBits (coincidentally, a sponsor of RedPwnCTF). This module can disassemble a pickle into a Python AST - and the built-in unparse function can transform the AST into something resembling valid Python source code. Fickling didn't have support for the INT and DICT opcodes, so I patched these in:

diff --git a/fickling/pickle.py b/fickling/pickle.py
index ab62566..3eb46c4 100644
--- a/fickling/pickle.py
+++ b/fickling/pickle.py
@@ -162,6 +162,10 @@ class ConstantOpcode(Opcode):
         interpreter.stack.append(make_constant(self.arg))
 
 
+class Int(ConstantOpcode):
+    name = "INT"
+
+
 class StackSliceOpcode(Opcode):
     def run(self, interpreter: "Interpreter", stack_slice: List[ast.expr]):
         raise NotImplementedError(f"{self.__class__.__name__} must implement run()")
@@ -743,6 +747,18 @@ class Tuple(StackSliceOpcode):
         interpreter.stack.append(ast.Tuple(tuple(stack_slice), ast.Load()))
 
 
+class Dict(StackSliceOpcode):
+    name = "DICT"
+
+    def run(self, interpreter: Interpreter, stack_slice: List[ast.expr]):
+        k = list()
+        v = list()
+        for i in range(0, len(stack_slice), 2):
+            k.append(stack_slice[i])
+            v.append(stack_slice[i+1])
+        interpreter.stack.append(ast.Dict(k, v))
+
+
 class Build(Opcode):
     name = "BUILD"

Here was the result:

from pickle import io
from builtins import input
_var0 = input('What is the flag? ')
_var1 = io
_var1.__setstate__({'pickledhorseradish': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__add__\ncio\npickledbarberry\n\x85R.', 'pickledcoconut': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__sub__\ncio\npickledbarberry\n\x85R.', 'pickledlychee': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__xor__\ncio\npickledbarberry\n\x85R.', 'pickledcrabapple': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__eq__\ncio\npickledbarberry\n\x85R.', 'pickledportabella': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__ne__\ncio\npickledbarberry\n\x85R.', 'pickledquince': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__le__\ncio\npickledbarberry\n\x85R.', 'pickledeasternmayhawthorn': b'\x80\x04cpickle\nio\ncio\npickledmacadamia.__ge__\ncio\npickledbarberry\n\x85R.', 'pickledmonstera': b'I1\n.', 'pickledcorneliancherry': b'I0\n.', 'pickledalligatorapple': b'\x80\x04I', 'pickledboysenberry': <VERY VERY LONG PICKLED DATA> 'pickledgreekoregano': '\nI', 'pickledpupunha': ('Nope!', 'Correct!'), 'pickledximenia': _var0, 'pickledgarlic': ('pickledcorneliancherry', 'pickledboysenberry')})
from io import pickledximenia.__len__
_var2 = pickledximenia.__len__()
from io import pickledximenia.encode
_var3 = pickledximenia.encode()
_var4 = _var1
_var4.__setstate__({'pickledburmesegrape': _var2, 'pickledximenia': _var3})
from builtins import print
from io import pickledpupunha.__getitem__
from pickle import loads
from io import pickledgarlic.__getitem__
from io import pickledburmesegrape.__eq__
_var5 = pickledburmesegrape.__eq__(64)
_var6 = pickledgarlic.__getitem__(_var5)
from io import _var6
_var7 = loads(_var6)
_var8 = pickledpupunha.__getitem__(_var7)
_var9 = print(_var8)
result = _var9

At first it may not be immediately obvious what this is doing, but if we follow it through it begins to make sense. We see pickle.io is assigned to var1, which then has __setstate__ called upon it. setstate assigns each key/value pair as a property of pickle.io - this is so these constant strings are added to the io module, and can be accessed later on. We can rewrite this as regular Python:

flag = input("What is the flag? ")
io.pickledhorseradish = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__add__\ncio\npickledbarberry\n\x85R.'
io.pickledcoconut = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__sub__\ncio\npickledbarberry\n\x85R.'
io.pickledlychee = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__xor__\ncio\npickledbarberry\n\x85R.'
io.pickledcrabapple = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__eq__\ncio\npickledbarberry\n\x85R.'
io.pickledportabella = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__ne__\ncio\npickledbarberry\n\x85R.'
io.pickledquince = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__le__\ncio\npickledbarberry\n\x85R.'
io.pickledeasternmayhawthorn = '\x80\x04cpickle\nio\ncio\npickledmacadamia.__ge__\ncio\npickledbarberry\n\x85R.'
# Each of these pushes io.pickledmacadamia.__add__ to the stack followed by (pickledbarberry,)
# Or another method
io.pickledmonstera = 'I1\n.'
io.pickledcorneliancherry = 'I0\n.'
io.pickledalligatorapple = '\x80\x04I'
io.pickledboysenberry = 'VERY LONG'
io.pickledgreekoregano = '\nI'
io.pickledpupunha =  ("Nope!", "Correct!")
io.pickledximenia = flag
io.pickledgarlic = ("pickledcorneliancherry", "pickledboysenberry")
io.pickledburmesgrape = flag.len()
io.pickledximenia = flag.encode()
var5 = int(io.pickledburmesgrape == 64) # 1 if len(input) is 64, 0 otherwise
var6 = pickledgarlic[var5] 				# pickledcorneliancherry if len(input) is not 64,  pickledboysenberry if it is
var7 = loads(var6)						# loads(b'I0\n.') if len(input) is not 64, loads(b'VERY LONG') if it is
var8 = pickledpupunha[var7]				# 'Nope!' if returned value is 0, 'Correct!' if it's 1
print(var8)

The first few variables saved are constants which, when unpickled, will compare io.pickledmacadamia to io.pickledbarberry in some way. If our flag length is 64, it will run the pickle in pickledboysenberry - otherwise, it will skip it and return 0. Our flag is saved to pickledximenia.

Stage 3

Again running Fickling on our new pickle, we receive this output

from pickle import io
from io import pickledgreekoregano.join
from builtins import map
from builtins import str
_var0 = map(str, pickledximenia)
_var1 = b"\nI".join(_var0)
_var2 = io
_var2.__setstate__({'pickledximenia': _var1})
from io import pickledximenia.encode
_var3 = pickledximenia.encode()
_var4 = _var2
_var4.__setstate__({'pickledximenia': _var3})
from io import pickledalligatorapple.__add__
from io import pickledximenia
_var5 = pickledalligatorapple.__add__(pickledximenia)
_var6 = _var4
_var6.__setstate__({'pickledalligatorapple': _var5})
from io import pickledalligatorapple.__add__
_var7 = pickledalligatorapple.__add__(b'\n')
_var8 = _var6
_var8.__setstate__({'pickledalligatorapple': _var7})
from io import pickledalligatorapple
_var9 = _var8
_var9.__setstate__({'pickledblackapple': pickledalligatorapple, 'pickledyellowpeartomato': 1})

If we clean it up:

flag = "\nI".join([str(x) for x in pickledximenia]).encode()
flag = b'\x80\x04I' + flag + b'\n'
io.pickledblackapple = flag
io.pickledyellowpeartomato = 1

Our flag is converted into 64 integers which are pushed onto the stack.

The next part:

from io import pickledblackapple.__add__
_var10 = pickledblackapple.__add__(b'p0\n0cpickle\nio\n(\x8c\x10pickledmacadamiag0\ndb(\x8c\x17pickledyellowpeartomatocio\npickledyellowpeartomato.__and__\ncio\npickledmacadamia.__gt__\nI32\n\x85R\x85Rdb(\x8c\x17pickledyellowpeartomatocio\npickledyellowpeartomato.__and__\ncio\npickledmacadamia.__lt__\nI127\n\x85R\x85Rdb0')
_var11 = _var9
_var11.__setstate__({'pickledblackapple': _var10})

This part is repeated 64 times, and looks like this when disassembled

>>> pickletools.dis(a, annotate=50)
    4: p    PUT        0                                  Store the stack top into the memo.  The stack is not popped.
    7: 0    POP                                           Discard the top stack item, shrinking the stack by one item.
    8: c    GLOBAL     'pickle io'                        Push a global object (module.attr) on the stack.
   19: (    MARK                                          Push markobject onto the stack.
   20: \x8c     SHORT_BINUNICODE 'pickledmacadamia'       Push a Python Unicode string object.
   38: g        GET        0                              Read an object from the memo and push it on the stack.
   41: d        DICT       (MARK at 19)                   Build a dict out of the topmost stack slice, after markobject.
   42: b    BUILD                                         Finish building an object, via __setstate__ or dict update.
   43: (    MARK                                          Push markobject onto the stack.
   44: \x8c     SHORT_BINUNICODE 'pickledyellowpeartomato' Push a Python Unicode string object.
   69: c        GLOBAL     'io pickledyellowpeartomato.__and__' Push a global object (module.attr) on the stack.
  105: c        GLOBAL     'io pickledmacadamia.__gt__'   Push a global object (module.attr) on the stack.
  133: I        INT        32                             Push an integer or bool.
  137: \x85     TUPLE1                                    Build a one-tuple out of the topmost item on the stack.
  138: R        REDUCE                                    Push an object built from a callable and an argument tuple.
  139: \x85     TUPLE1                                    Build a one-tuple out of the topmost item on the stack.
  140: R        REDUCE                                    Push an object built from a callable and an argument tuple.
  141: d        DICT       (MARK at 43)                   Build a dict out of the topmost stack slice, after markobject.
  142: b    BUILD                                         Finish building an object, via __setstate__ or dict update.
  143: (    MARK                                          Push markobject onto the stack.
  144: \x8c     SHORT_BINUNICODE 'pickledyellowpeartomato' Push a Python Unicode string object.
  169: c        GLOBAL     'io pickledyellowpeartomato.__and__' Push a global object (module.attr) on the stack.
  205: c        GLOBAL     'io pickledmacadamia.__lt__'   Push a global object (module.attr) on the stack.
  233: I        INT        127                            Push an integer or bool.
  238: \x85     TUPLE1                                    Build a one-tuple out of the topmost item on the stack.
  239: R        REDUCE                                    Push an object built from a callable and an argument tuple.
  240: \x85     TUPLE1                                    Build a one-tuple out of the topmost item on the stack.
  241: R        REDUCE                                    Push an object built from a callable and an argument tuple.
  242: d        DICT       (MARK at 143)                  Build a dict out of the topmost stack slice, after markobject.
  243: b    BUILD                                         Finish building an object, via __setstate__ or dict update.
  244: 0    POP                                           Discard the top stack item, shrinking the stack by one item.

This looks complex, but with a quick glance at the constants and functions, we can guess the purpose - checking that each character of the flag is between 32 and 127 - or 0x20 <= c <= 0x7f, i.e. the printable ASCII range.

We then reach the net part, which is where the flag checking logic actually begins:

from io import pickledalligatorapple.__add__
_var140 = pickledalligatorapple.__add__(b'000000000000000000000000000000000000000000000000000000000000000p0\n0cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI102\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x11pickledbluetongue\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
from io import pickledalligatorapple.__add__
_var141 = pickledalligatorapple.__add__(b'00000000000000000000000000000000000000000000000000000000000000p0\n00cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI108\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x16pickledroseleafbramble\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
from io import pickledalligatorapple.__add__
_var142 = pickledalligatorapple.__add__(b'0000000000000000000000000000000000000000000000000000000000000p0\n000cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI97\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x0cpickledolive\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
from io import pickledalligatorapple.__add__
_var143 = pickledalligatorapple.__add__(b'000000000000000000000000000000000000000000000000000000000000p0\n0000cpickle\nio\n(\x8c\x10pickledmacadamiag0\n\x8c\x0fpickledbarberryI103\n\x8c\rpickledgarlic\x8c\x16pickledcorneliancherry\x8c\x0fpickledvoavanga\x86dbcpickle\nloads\n\x8c\x02iocio\npickledgarlic.__getitem__\ncpickle\nloads\ncio\npickledcrabapple\n\x85R\x85R\x93\x85R.')
_var144 = _var139

If we disassemble the first, we see:

    POP * 63
    4: p    PUT        0
    7: 0    POP
    8: c    GLOBAL     'pickle io'
   19: (    MARK
   20: \x8c     SHORT_BINUNICODE 'pickledmacadamia'
   38: g        GET        0
   41: \x8c     SHORT_BINUNICODE 'pickledbarberry'
   58: I        INT        102
   63: \x8c     SHORT_BINUNICODE 'pickledgarlic'
   78: \x8c     SHORT_BINUNICODE 'pickledcorneliancherry'
  102: \x8c     SHORT_BINUNICODE 'pickledbluetongue'
  121: \x86     TUPLE2
  122: d        DICT       (MARK at 19)
  123: b    BUILD
  124: c    GLOBAL     'pickle loads'
  138: \x8c SHORT_BINUNICODE 'io'
  142: c    GLOBAL     'io pickledgarlic.__getitem__'
  172: c    GLOBAL     'pickle loads'
  186: c    GLOBAL     'io pickledcrabapple'
  207: \x85 TUPLE1
  208: R    REDUCE
  209: \x85 TUPLE1
  210: R    REDUCE
  211: \x93 STACK_GLOBAL
  212: \x85 TUPLE1
  213: R    REDUCE
  214: .    STOP

This constructs a dict of {pickledmacadamia: <first char>, pickledbarberry: 102, pickledgarlic: (pickledbluetongue, pickledcorneliancherry)}

We then use pickledcrabapple (earlier defined as pickledmacadamia.__eq__(pickledbarberry)). This is repeated 4 times, with different numbers of pops and constants. If we convert the constants to ASCII, we get flag - this pops the first 4 characters, and checks that they are 'flag'.

Checking Logic

The rest of the checking logic is a little more complex. They each load two characters from the flag, and compare them using sub, xor, eq, ne, le, ge. Some checks also add two characters together then use eq to check if it matches a constant. This is a perfect case for solving using Z3.

We create a stub like this:

from z3 import *
s = Solver()
flag = [BitVec('flag_%s' % i, 8) for i in range(0, 64)]
for c in flag:
    s.add(c <= 0x7f)
    s.add(c >= 0x20)

Then all we need to do is parse each check and convert it into a z3-style check.

data = open("checks.txt").read().split("\n")
import pickle

for line in data:
    # Convert the data into a bytestring literal
    pickletext = eval("b'" + line + "'")
    example = pickletext
    # Create a fake stack so it will unpickle properly. The value at each index is the index, so we can track the position of the checks
    flag = [i for i in range(0, 64)]
    flag = b"I" + b'\nI'.join([str(i).encode('latin1') for i in flag]) + b"\n"
    end = example.index(b'dbc') + 1
    # Slice it up to the parts calling functions - we only want to build the dict
    example = example[:end]
    example = flag + example + b"."
    example = (pickle.loads(example))
    # Load the 2
    p0 = example["pickledmacadamia"]
    p1 = example["pickledbarberry"]
    # Check which operation is used, and output the appropriate check
    if b"pickledhorseradish" in pickletext:
      tmp = (int(pickletext.split(b"I")[1].split(b"\n")[0].decode()))
      print(f"s.add(flag[{p0}] + flag[{p1}] == {tmp})")
    if b"pickledlychee" in pickletext:
      tmp = (int(pickletext.split(b"I")[1].split(b"\n")[0].decode()))
      print(f"s.add(flag[{p0}] ^ flag[{p1}] == {tmp})")
    if b"pickledcoconut" in pickletext:
      tmp = (int(pickletext.split(b"I")[1].split(b"\n")[0].decode()))
      print(f"s.add(flag[{p0}] - flag[{p1}] == {tmp})")

We are now able to print the z3-generated model, and read the flag:

flag{n0w_th4t5_4_b1g_p1ckl3_1n_4_p1ckl3_but_1t_n33d3d_s0m3_h3lp}