Sometimes I like to play CTFs: I think it’s a good way to train myself and my knowledge and may be learn something new. And time to time I stumble in some interesting challenge I think it would be interesting to share.
Speed-Rev Bots
is one of these and I’d like to share here how I solved it.
The Challenge
The challenge (presented in HackPack CTF 2023) was pretty simple to understand: there was a network service to connect (via netcat or any other network utility). As soon as the connection was established you received a binary (in base64 format) asking for a “flag”.
Our task is to reverse engineer the executable, get the “flag” and type it at the prompt. Then a second level is presented and so on, up to the 6th level.
Binaries are small so this looks pretty easy, but there are two constraints:
- you have to solve the 6 levels in 5 minutes
- every time you restart the connection, flags are different
So, it’s pretty clear that you need some kind of automation to solve all the challenges.
Analysis
Let’s start by looking into the first binary presented by the challenge, in order to imagine a way to solve it. Once saved and decoded, we are presented with an ELF
file with few functions. This is the main
:
At point 1
we can see the input request and then we have a call to a validate
function.
Mmmhhh…ok, this is pretty straightforward: the string is compared with static value, so nothing complex here. Let’s put in the “flag” to get level 2.
Level 2 main
is exactly the same as Level 1, with a call to a validate
function. But validate
looks a bit different:
Validation is a lot more complex than before and probably is going to be even more complex in upper levels.
To solve all the levels we need to find a common pattern in the binaries, and then build something to infer the solution given this common pattern.
Fortunately for us, this is not so complex. By looking for a while the validate
function, I noticed this
When validation is correct, 0 is moved in EAX. Otherwise the value is set to 1. And this is true for both the binaries analyzed. Moreover this value is propagated outside the validate
function itself, to the main
and then returned to the Operating System. That means that the exit value of the program will be 0 if falg is correct, otherwise 1.
Solution
Now that we know how to identify if the “flag” is correct, it should be easy to script something…well….no :-)
How can we know the correct “flag” to send? In the first binary it was pretty easy, may be we could extract the string from the executable, but for the second one (and all the other levels)?
It’s here were Symbolic Execution
can save us. I don’t want to explain here what SE means in this article, I just say that In computer science, symbolic execution is a means of analyzing a program to determine what inputs cause each part of a program to execute.
So we can instruct a program to find for us the solution, given the fact that the EAX value must be 0 at the exit.
We have a great framework called angr, which allow us to build SE paths in a “simple” way by using Python. Mastering angr
is not easy, and I’m not a guru, but I have some basic knowledge that should allow me to solve this challenge.
The very first version of the script is surprisingly easy:
import angr
project = angr.Project('l1.elf')
initial_state = project.factory.full_init_state(
add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
)
simulation = project.factory.simgr(initial_state)
simulation.run()
for dead in simulation.deadended:
try:
dead.solver.add(dead.regs.eax == 0)
print(dead.posix.dumps(0))
except:
pass
The first lines are boilerplate code, to load and prepare the binary for the symbolic execution. STDIN is automatically used to give the input and everything is ready. The real “magic” happens in the for
loop:
from the angr
docs A state goes to the deadended stash when it cannot continue the execution for some reason, including no more valid instructions, unsat state of all of its successors, or an invalid instruction pointer
So, the exit of the program should be one if this states. At this point adding a constraint on this state (EAX == 0) should bring us to the correct solution, and it actually does (look at the password of Level 1 above):
>>> python solve.py
b'cbymDgOfdACNTJzM'
Now, we basically solved the challenge. This code works even on Level 2:
>>> python solve.py
b'n50MfHBz4bEeblyF'
But before running it on the server and get the final Flag, we need to satisfy an additional constraints: all the flags must be letters
and numbers
, no other printable chars are allowed. In the first two levels it was the case, but in the upper levels it isn’t.
We need to instruct angr
to limit possible solutions and we can do it in this way:
import angr
import claripy
from claripy import And
project = angr.Project('l1.elf')
# We assume the lentght of the flag is 16 chars
flag = claripy.BVS("flag", 16 * 8)
initial_state = project.factory.full_init_state(
stdin=angr.SimPackets(name='stdin', content=[(flag, 16)]),
add_options = { angr.options.SYMBOL_FILL_UNCONSTRAINED_MEMORY,
angr.options.SYMBOL_FILL_UNCONSTRAINED_REGISTERS}
)
# Define the allowed characters for the input
for byte in flag.chop(8):
initial_state.solver.add(And(byte >= ord('0'), byte <= ord('9')) | \
And(byte >= ord('a'), byte <= ord('z')) | \
And(byte >= ord('A'), byte <= ord('Z')))
simulation = project.factory.simgr(initial_state)
simulation.run()
for dead in simulation.deadended:
try:
dead.solver.add(dead.regs.eax == 0)
print(dead.posix.dumps(0))
except:
pass
Following the comments in the code should be quite self-explanatory: in this case we didn’t relied on the automatic STDIN management of angr
, but we used claripy
(which is a “sub-module” of angr
), to control the symbolic input to the binaries. The input is limited to 16 chars (which we saw was the length of the flags of the first two levels) and the char used are a subset of the entire input range.
With this script you can actually solve all the 6 challenges in less than one minute and get the final flag!
Conclusions
Symbolic Execution
is extremely powerful. You can do a lot of things and it could be really useful in reverse engineering tasks. But Symbolic Execution
is also pretty hard to master: here we saw a basic example, built to be solved in this way, so we were pretty lucky. But things can get pretty hard and you can easily end up in neverending simulations, memory issues and so on. Sometimes you need to prepare your binaries for execution, by understanding before what they are doing, how to cut useless branch from execution, mocking complex functions, map memory areas with proper values and so on.
angr
can also be used for a lot of other things, like code coverage, decompilation, VSA (Value-Set Analysis), etc. I encourage you to have a look to it: I can’t say it will be an easy journey, but for sure it will be interesting!