Dock12/Writeup: Speed-Rev Bots (HackPack CTF 2023)

Created by Cesare Pizzi Wed, 26 Apr 2023 09:44:47 +0100 Modified Wed, 26 Apr 2023 09:44:47 +0100

Sometimes I like to play CTFs: I think it’s a good way to train myself and my knowledge and may be learn something new. And time to time I stumble in some interesting challenge I think it would be interesting to share.

Speed-Rev Bots is one of these and I’d like to share here how I solved it.

The Challenge

The challenge (presented in HackPack CTF 2023) was pretty simple to understand: there was a network service to connect (via netcat or any other network utility). As soon as the connection was established you received a binary (in base64 format) asking for a “flag”.

asciicast

Our task is to reverse engineer the executable, get the “flag” and type it at the prompt. Then a second level is presented and so on, up to the 6th level.

Binaries are small so this looks pretty easy, but there are two constraints:

  • you have to solve the 6 levels in 5 minutes
  • every time you restart the connection, flags are different

So, it’s pretty clear that you need some kind of automation to solve all the challenges.

Analysis

Let’s start by looking into the first binary presented by the challenge, in order to imagine a way to solve it. Once saved and decoded, we are presented with an ELF file with few functions. This is the main:

l1-main

At point 1 we can see the input request and then we have a call to a validate function.

l1-validate

Mmmhhh…ok, this is pretty straightforward: the string is compared with static value, so nothing complex here. Let’s put in the “flag” to get level 2.

Level 2 main is exactly the same as Level 1, with a call to a validate function. But validate looks a bit different:

l2-validate

Validation is a lot more complex than before and probably is going to be even more complex in upper levels.

To solve all the levels we need to find a common pattern in the binaries, and then build something to infer the solution given this common pattern.

Fortunately for us, this is not so complex. By looking for a while the validate function, I noticed this

common-pattern

When validation is correct, 0 is moved in EAX. Otherwise the value is set to 1. And this is true for both the binaries analyzed. Moreover this value is propagated outside the validate function itself, to the main and then returned to the Operating System. That means that the exit value of the program will be 0 if falg is correct, otherwise 1.

Solution

Now that we know how to identify if the “flag” is correct, it should be easy to script something…well….no :-)

How can we know the correct “flag” to send? In the first binary it was pretty easy, may be we could extract the string from the executable, but for the second one (and all the other levels)?

It’s here were Symbolic Execution can save us. I don’t want to explain here what SE means in this article, I just say that In computer science, symbolic execution is a means of analyzing a program to determine what inputs cause each part of a program to execute.

So we can instruct a program to find for us the solution, given the fact that the EAX value must be 0 at the exit.

We have a great framework called angr, which allow us to build SE paths in a “simple” way by using Python. Mastering angr is not easy, and I’m not a guru, but I have some basic knowledge that should allow me to solve this challenge.

The very first version of the script is surprisingly easy:

import angr

project = angr.Project('l1.elf')

initial_state = project.factory.full_init_state(
  add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                  angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
)

simulation = project.factory.simgr(initial_state)

simulation.run()

for dead in simulation.deadended:
  try:
    dead.solver.add(dead.regs.eax == 0)
    print(dead.posix.dumps(0))
  except:
    pass

The first lines are boilerplate code, to load and prepare the binary for the symbolic execution. STDIN is automatically used to give the input and everything is ready. The real “magic” happens in the for loop:

from the angr docs A state goes to the deadended stash when it cannot continue the execution for some reason, including no more valid instructions, unsat state of all of its successors, or an invalid instruction pointer

So, the exit of the program should be one if this states. At this point adding a constraint on this state (EAX == 0) should bring us to the correct solution, and it actually does (look at the password of Level 1 above):

>>> python solve.py
b'cbymDgOfdACNTJzM'

Now, we basically solved the challenge. This code works even on Level 2:

>>> python solve.py
b'n50MfHBz4bEeblyF'

But before running it on the server and get the final Flag, we need to satisfy an additional constraints: all the flags must be letters and numbers, no other printable chars are allowed. In the first two levels it was the case, but in the upper levels it isn’t.

We need to instruct angr to limit possible solutions and we can do it in this way:

import angr
import claripy
from claripy import And

project = angr.Project('l1.elf')

# We assume the lentght of the flag is 16 chars
flag = claripy.BVS("flag", 16 * 8)

initial_state = project.factory.full_init_state(
  stdin=angr.SimPackets(name='stdin', content=[(flag, 16)]),
  add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
                  angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
)

# Define the allowed characters for the input
for byte in flag.chop(8):
  initial_state.solver.add(And(byte >= ord('0'), byte <= ord('9')) | \
                           And(byte >= ord('a'), byte <= ord('z')) | \
                           And(byte >= ord('A'), byte <= ord('Z')))

simulation = project.factory.simgr(initial_state)

simulation.run()

for dead in simulation.deadended:
  try:
    dead.solver.add(dead.regs.eax == 0)
    print(dead.posix.dumps(0))
  except:
    pass

Following the comments in the code should be quite self-explanatory: in this case we didn’t relied on the automatic STDIN management of angr, but we used claripy (which is a “sub-module” of angr), to control the symbolic input to the binaries. The input is limited to 16 chars (which we saw was the length of the flags of the first two levels) and the char used are a subset of the entire input range.

With this script you can actually solve all the 6 challenges in less than one minute and get the final flag!

Conclusions

Symbolic Execution is extremely powerful. You can do a lot of things and it could be really useful in reverse engineering tasks. But Symbolic Execution is also pretty hard to master: here we saw a basic example, built to be solved in this way, so we were pretty lucky. But things can get pretty hard and you can easily end up in neverending simulations, memory issues and so on. Sometimes you need to prepare your binaries for execution, by understanding before what they are doing, how to cut useless branch from execution, mocking complex functions, map memory areas with proper values and so on.

angr can also be used for a lot of other things, like code coverage, decompilation, VSA (Value-Set Analysis), etc. I encourage you to have a look to it: I can’t say it will be an easy journey, but for sure it will be interesting!