Sandboxes

Emilua provides support for creating actors in isolated processes using Capsicum, FreeBSD jails, Seccomp, Linux namespaces or Landlock. The idea is to prevent potentially exploitable code from accessing resources beyond what has been explicitly handed to them. That’s the basis for capability-based security systems, and it maps pretty well to APIs implementing the actor model such as Emilua.

If someone steals my laptop while I’m logged in, they can read my email, take my money, and impersonate me to my friends, but at least they can’t install drivers without my permission.
Figure 1. XKCD 1200: Authorization

Even modern operating systems are still somehow rooted in an age where we didn’t know how to properly partition computer resources adequately to user needs keeping a design focused on practical and conscious security. Several solutions are stacked together to somehow fill this gap and they usually work for most of the applications, but that’s not all of them.

Consider the web browser. There is an active movement that try to push for a future where only the web browser exists and users will handle all of their communications, store & share their photos, book hotels & tickets, check their medical history, manage their banking accounts, and much more…​ all without ever leaving the browser. In such scenario, any protection offered by the OS to protect programs from each other is rendered useless! Only a single program exists. If a hacker exploits the right vulnerability, all of the user’s data will be stolen. There is no real compartmentalisation.

The browser is part of a special class of programs. The browser is a shell. A shell is any interface that acts as a layer between the user and the world. The web browser is the shell for the www world. Www browser or not, any shell will face similar problems and has to be consciously designed to safely isolate contexts that distrust each other. The Emilua team is not aware of anything better than FreeBSD’s Capsicum to do just this. In the absence of Capsicum, we have Linux Landlock which can be used to build something close. Browsers actually use Linux namespaces which are older.

The API

Compartmentalised application development is, of necessity, distributed application development, with software components running in different processes and communicating via message passing.

— Capsicum: practical capabilities for UNIX
Robert N. M. Watson, Jonathan Anderson, Ben Laurie, and Kris Kennaway

The Emilua’s API to spawn an actor lies within the reach of a simple function call:

local my_channel = spawn_vm(module)

Check the manual elsewhere to understand the details. As for sandboxes, the idea is to spawn an actor where no system resources are available (e.g. the filesystem is mostly empty, no network interfaces are available, no PIDs from other processes can be seen, …​).

Consider the hypothetical sandbox class:

local mysandbox1 = sandbox.new()
local my_channel = spawn_vm(mysandbox1:context(module))
mysandbox1:handshake()

That would be the ideal we’re pursuing. Nothing other than 2 extra lines of code at most under your application. All complexity for creating sandboxes taken care of by specialized teams of security experts. The Capsicum paper[1] released in 2010 analysed and compared different sandboxing technologies and showed some interesting figures. Consider the following figure that we reproduce here:

Sandboxing mechanisms employed by Chromium
Operating system Model Line count Description

Windows

ACLs

22350

Windows ACLs and SIDs

Linux

chroot

605

setuid root helper sandboxes renderer

Mac OS X

Seatbelt

560

Path-based MAC sandbox

Linux

SELinux

200

Restricted sandbox type enforcement domain

Linux

seccomp

11301

seccomp and userspace syscall wrapper

FreeBSD

Capsicum

100

Capsicum sandboxing using cap_enter

Do notice that line count is not the only metric of interest. The original paper accompanies a very interesting discussion detailing applicability, risks, and levels of security offered by each approach. Just a few years after the paper was released, user namespaces was merged to Linux and yet a new option for sandboxing is now available. Fast-forward a few more years and we also have Linux Landlock which is even better than Linux namespaces. Within this discussion, we can discard most of the approaches — DAC-based, MAC-based, or too intrusive to be even possible to abstract away as a reusable component — as inadequate to our endeavour.

Out of them, Capsicum wins hands down. It’s just as capable to isolate parts of an application, but with much less chance to error (for the Chromium patchset, it was just 100 lines of extra C code after all). Unfortunately, Capsicum is not available in every modern OS.

Do keep in mind that this is code written by experts in their own fields, and their salary is nothing less than what Google can afford. 11301 lines of code written by a team of Google engineers for a lifetime project such as Google Chromium is not an investment that any project can afford. That’s what the democratization of sandboxing technology needs to do so even small projects can afford them. That’s why it’s important to use sound models that are easy to analyse such as capability-based security systems. That’s why it’s important to offer an API that only adds two extra lines of code to your application. That’s the only way to democratize access to such technology.

Rust programmers' vision of security is to rewrite the world in Rust, a rather unfeasible undertaking, and a huge waste of resources. In a similar fashion, Deno was released to exploit v8 as the basis for its sandboxing features (now they expect the world to be rewritten in TypeScript). The heart of Emilua’s sandboxing relies on technologies that can isolate any code (e.g. C libraries to parse media streams).

Back to our API, the hypothetical sandbox class that we showed earlier will have to be some library that abstracts the differences between each sandbox technology in the different platforms. The API that Emilua actually exposes as of this release abstracts all of the semantics related to actor messaging, work/lifetime accounting, process reaping, DoS protection, serialization, lots of Linux namespaces details (e.g. PID1), and much more, but it still expects you to actually initialize the sandbox.

The init.script

Every process carries associated credentials that enable operation on system-wide addressable objects such as filesystem objects and sockets. We setup a sandbox by disabling the ambient authority so the address space itself becomes inaccessible. Sandboxed code thus should be run only after such setup already completed successfully. The proper hook to perform this setup is init.script. init.script runs right after the process is created.

After the sandboxed actor is up it can receive access to new resources through its inbox. If any security exploit is performed on the sandboxed code, then only the objects it has access to are rendered vulnerable (the damage is thus contained in its compartment).

Landlock (Linux)

local init_script = [[
    local rules = C.landlock_create_ruleset{ handled_access_fs = {
        "execute", "write_file" "read_file", "read_dir", "remove_dir",
        "remove_file", "make_char", "make_dir", "make_reg", "make_sock",
        "make_fifo", "make_block", "make_sym", "refer", "truncate" } }
    set_no_new_privs()
    C.landlock_restrict_self(rules)
]]

spawn_vm{
    subprocess = {
        init = { script = init_script }
    }
}

Landlock as of now can only control access to filesystem objects, but future versions will be more complete.

Capsicum

spawn_vm{
    subprocess = {
        init = { script = "C.cap_enter()" }
    }
}