System Design: Stripe Capture the Flag
We launched Stripe CTF 2.0 on Wednesday. Thus far we’ve had 14,000 signups, and over 500 people have captured the flag. Designing an architecture to handle this many users, all running their potentially malicious code on the level servers, was definitely a challenge, but it was also a ton of fun.
Our first and foremost design goal was pretty simple: don’t let people root the machines. It’s horrendously difficult to keep a shared Linux login machine secure, so the best you can do is apply all of the security countermeasures you can think of. Each level server is configured in the same way (maintaining multiple configurations is a great way to end up with an oversight due to increased complexity). All user-facing services run in a chroot with only /home, /tmp, /var/tmp, and /var/log writeable. This is implemented by mounting a filesystem (created using debootstrap) at /var/chroot and bind-mounting it read-only to /var/chroot-ro. That way a user in the chroot can’t clobber system files even with escalated privileges (thus cutting off many vectors to obtaining a root shell), but we can perform maintenance from outside the chroot. We didn’t mount /proc in the chroot (except where needed to make other software work, and even there we mounted read-only or even chmod’d 700), and we set /var/log in the chroot to be only accessible to admins. We also went and removed the setuid bit from all binaries (find / -perm -4000 is a handy command to find them).
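To make the chroot setup concrete, it boils down to a handful of mount and permission commands. This is a minimal sketch rather than our actual build scripts; the Ubuntu suite name and the exact sequence of commands are illustrative.
# Sketch of the chroot setup; suite name, paths, and ordering are illustrative
debootstrap precise /var/chroot
mount --bind /var/chroot /var/chroot-ro
mount -o remount,ro,bind /var/chroot-ro
# Fresh bind mounts inside the read-only tree stay writeable
for dir in home tmp var/tmp var/log; do
  mount --bind "/var/chroot/$dir" "/var/chroot-ro/$dir"
done
chmod 700 /var/chroot-ro/var/log                      # logs visible to admins only
find /var/chroot -perm -4000 -exec chmod u-s '{}' +   # strip setuid bits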
Our second design goal was isolation. Malicious users should not be able to affect gameplay for others. We hadn’t really expected our first CTF to take off, and hence had explicitly decided to punt on user isolation features. As a result, people kept fork-bombing the machines and preventing others from accessing the levels, and we spent the week getting very good at clearing out forkbombs (a skill I hope never to employ again). This time, we decided to run every end user’s code as a separate UNIX user, meaning that we could use Linux’s native per-user resource limits. (We also considered using LXC, but we didn’t have enough data about its scaling properties, and we were hesitant to have a security-critical component of our system be one that we didn’t know intimately.) Since everything was running as its own user, we needed to spawn server processes on demand; mod_fcgid + suEXEC provide exactly this feature. (We also threw in suPHP to make the PHP levels more intuitive to people, though I would have preferred running everything under a single setuid wrapper.) I wrote a quick wrapper script around suEXEC to set resource limits and nice the child FCGI processes; those limits are inherited by any child processes they might spawn. We also set low disk quotas per user and removed unneeded device nodes (e.g. /dev/ram*).
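For illustration, such a wrapper can be just a few shell lines; the limit values here, and exactly where it hooks into the suEXEC invocation chain, are illustrative rather than our real configuration.
#!/bin/sh
# Sketch of a resource-limit wrapper: set limits and niceness, then exec the
# real FCGI program so every process it forks inherits them.
ulimit -u 64        # cap processes for this UNIX user (blunts forkbombs)
ulimit -v 262144    # cap address space at 256 MB
exec nice -n 10 "$@"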
We used the following mod_fcgid config to make sure that users weren’t allocated too many processes (FcgidMaxProcessesPerClass) and that processes were eventually killed off (FcgidMinProcessesPerClass). The FcgidMaxProcesses setting turned out to be useless, as there’s a compile-time constant FCGID_MAX_APPLICATION (set to 1024 on Ubuntu) which dictates a hard limit, and we didn’t want to recompile mod_fcgid by hand.
FcgidMinProcessesPerClass 0
FcgidMaxProcessesPerClass 2
FcgidMaxProcesses 5000
FcgidProcessLifeTime 150
The initial CTF level was written in node, and hence we were stuck with the problem of trying to run node under FCGI (which as far as I can tell no one has actually done before). As a giant hack, I wrote a FCGI shim which translates the FCGI protocol back to HTTP (I’ll go over the details in another blog post). The shim then spawns the node process directly and speaks HTTP to it over a UNIX socket. We also used the shim in a later level to spawn a cluster of server processes on-demand — it turned out to be a pretty flexible approach for getting the process management benefits of mod_fcgid without actually having to speak FCGI.
A couple of levels required us to run a simulated browser so people could exploit XSS bugs. For those levels, we ran a PhantomJS script every few minutes. To make sure it only ran when needed (and that it ran as your assigned UNIX user, in case of Webkit vulnerabilities), we spawned the headless Webkit instance directly from the FCGI dispatcher, using the following code:
Thread.abort_on_exception = true
Thread.new do
  while true
    started = Time.now
    pid = fork do
      # The dispatcher already runs as the level's assigned UNIX user, so the
      # forked browser inherits that uid and a WebKit exploit stays contained.
      cmd = File.join(File.dirname(__FILE__), '../browser-runner.sh')
      exec([cmd, cmd])
    end
    Process.wait(pid)
    status = $?.exitstatus
    $stderr.puts "Exited with status #{$?}" unless status == 0
    # Sleep a randomized 1-4 minutes between browser runs
    sleep(60 * (3 * rand + 1))
  end
end
Giving UNIX users random names (and using mod_userdir as a routing layer) served as a convenient access-control mechanism. We assigned CTF solvers a level URL such as https://level01-2.stripe-ctf.com/user-iqrncaxifi, which was backed by a UNIX user named user-iqrncaxifi. To prevent people from discovering other users, we set permissions on /etc/passwd and /etc/group to 600. It turns out that this makes some software a bit sad (e.g. bash can’t fill in the username in your shell prompt, and Python can’t load the site module), but for the most part those issues can be worked around.
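Neither piece needs much machinery; here’s a sketch of the name generation and the lockdown (not our actual provisioning code):
# Generate an unguessable username of the user-iqrncaxifi form
username="user-$(tr -dc 'a-z' < /dev/urandom | head -c 10)"
# Keep users from enumerating one another
chmod 600 /etc/passwd /etc/group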
One issue with the UNIX user approach is actually creating the accounts. It turns out that adduser by default does a linear scan through all possible user IDs, and effectively grinds to a halt once you have lots of users on a system. If you pass it a uid and gid directly, it runs in about 500ms, which still isn’t fast enough for on-demand allocation. Hence we maintained a central database of UNIX users and preallocated 1000 users per box.
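With explicit IDs in hand, preallocation is just a loop over adduser. A sketch follows; the base ID and the inline random-name generation are illustrative, since ours came from the central database.
# Preallocate 1000 accounts with explicit IDs so adduser skips its linear scan
base=20000
for i in $(seq 0 999); do
  id=$((base + i))
  name="user-$(tr -dc 'a-z' < /dev/urandom | head -c 10)"
  addgroup --gid "$id" "$name"
  adduser --uid "$id" --gid "$id" --disabled-password --gecos "" "$name"
done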
We added a variety of other controls. For example, we firewalled off outbound network access on all servers (don’t want people using them to send spam). We also ran our machines such that even if they were completely compromised, we wouldn’t have to care. Hence we ran the CTF on a throwaway domain (stripe-ctf.com) with a fresh SSL cert generated just for the contest.
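The outbound firewall itself is only a few iptables rules, along these lines (a sketch of the policy, not our exact ruleset):
# Drop new outbound connections; loopback and replies to inbound traffic still flow
iptables -A OUTPUT -o lo -j ACCEPT
iptables -A OUTPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A OUTPUT -j DROP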
All told, building CTF was probably about two or three weeks of my time, plus as much time from a combination of other people (props to Andy, Sidd, Ludwig, Andrew, and Colin for doing an awesome job on this). We’ll be publicly posting the levels after the contest is over, so you should check those out. There are a ton of other details on the infrastructure I could delve into; let me know if there’s anything else you’re wondering about!