Internals

The target audience for this document is C++ programmers who want to delve into the project’s code, not lua users. Native plug-in authors should also read this page.

The intent of this page is not to detail every internal of the project, but just to give an overview of the architecture. Details change quickly and documentation would lag behind, so they’re avoided.

Once you read it, you should be familiar with the assumptions made throughout the project, and how to interact with the native code.

We assume that you already have some familiarity with the lua C API and Boost.Asio.

Multiple lua VMs

The project allows multiple OS threads to call asio::io_context::run(), so lua VMs can jump from one thread to another freely, but they will always refer to the same asio::io_context and each will be protected by its own ASIO strand.

-- Instantiates a new lua VM that shares
-- the caller's `asio::io_context`
spawn_vm(module)

-- Instantiates a new lua VM in a new
-- thread with its own `asio::io_context`
spawn_vm(module, { inherit_context=false })

You must specify a lua module name to run in the new VM, not a function. The module will be loaded and run in the new VM.

The only way for two different lua VMs to communicate is message passing. The channels are given when you instantiate the extra VMs. The channels accept a range of different values and will deep-copy them. You can also send references to IO objects, but the original references will be rendered unusable (their metatables are unset). Take care not to send objects that have pending operations (EBUSY, but do create an error code just for that).

Neither synchronization primitives (such as mutexes) nor fiber handles can be sent over the channels, and by implication they can’t be used to synchronize (or send cancellation requests to) fibers running in different lua VMs.

You can also send a channel over a channel. This will only send the channel “address” over and will allow complex routing among the lua VMs. If you send a channel’s rx-end, the other side will receive a tx-channel anyway. On the C++-side, we need to implement an MPSC strand-based channel.

These characteristics should be enough to implement actor patterns. And it is not the job of emilua to enforce good patterns on applications. The patterns can be configured purely in the lua side of coding.

-- Spawn extra threads to the
-- caller's `asio::io_context`
spawn_context_threads(count)

Leaving the actor model aside for a moment, it’s now easy to have threads with work-stealing (e.g. 8 lua VMs sharing the same asio::io_context running on 4 threads) so you don’t have to worry about load-balancing.

Inside a single lua VM

When you issue some IO operation (including chan:receive()), the calling fiber will suspend, but other fibers from the same lua VM are allowed to kick in (cooperative multitasking). Fibers can share state with each other safely (and free from contention problems) as-if the program was single-threaded.

-- Spawn a new fiber on this lua VM
spawn(fn)

You can use the fiber handle just like you’d use a thread handle. There are join(), detach() and interrupt().

All sync primitives obey some characteristics thanks to the restrictions we’ve laid out:

  • They always live in the same strand. They never migrate strands.

  • They don’t synchronize with fibers from other strands (except for channels, but that’s another story).

Given these conditions, it’s now easier to implement and reason about the C++ code.

Only the C++ code that suspended the fiber can resume it back. If the operation should be cancellable, the async op should set an interrupter before suspending the fiber. No other code from the runtime will wake this fiber up. Once the interrupter is called, it’ll be cleared automatically to prevent further complications on the async op implementation. The completion handler should also clear the interrupter to make sure it won’t be (wrongly) reused for other operations.

A good level of serialization can be achieved by exploiting these properties, which simplifies the implementation a lot. For one, you know no other code will wake the fiber up, so you can just as well call io_obj.cancel() on the interrupter and map asio::error::operation_aborted to errc::interrupted on the completion handler. A single handler (and no other) will take care of waking the fiber. There is no race to deal with here or anything alike.
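
The interrupter rules above can be sketched without any Asio or Lua dependency. Everything in the block below (`op_error`, `fiber_state`, `fake_timer`, `start_wait`) is a hypothetical stand-in invented for illustration, not emilua code. One simplification to keep in mind: the fake `cancel()` invokes the handler synchronously, whereas a real Asio object would post it through the strand.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical stand-ins; none of these names come from emilua itself.
enum class op_error { none, operation_aborted, interrupted };

struct fiber_state {
    std::function<void()> interrupter; // set only while an op is pending
    bool suspended = false;
    op_error last_result = op_error::none;

    // What fiber:interrupt() would do: call the interrupter once and
    // clear it so it can't be called a second time.
    void interrupt() {
        if (!interrupter) return;
        auto f = std::move(interrupter);
        interrupter = nullptr;
        f();
    }
};

// A fake IO object: at most one pending wait, completed by hand.
struct fake_timer {
    std::function<void(op_error)> pending;

    void async_wait(std::function<void(op_error)> handler) {
        pending = std::move(handler);
    }
    void cancel() { complete(op_error::operation_aborted); }
    void fire() { complete(op_error::none); }

private:
    void complete(op_error e) {
        if (!pending) return;
        auto h = std::move(pending);
        pending = nullptr;
        h(e);
    }
};

// The async op: register the handler, set the interrupter, then suspend.
void start_wait(fiber_state& fiber, fake_timer& timer) {
    timer.async_wait([&fiber](op_error e) {
        // Single wake-up point: clear the interrupter so it is not reused,
        // translate the cancellation error, and resume the fiber.
        fiber.interrupter = nullptr;
        fiber.last_result = (e == op_error::operation_aborted)
            ? op_error::interrupted : e;
        fiber.suspended = false;
    });
    fiber.interrupter = [&timer] { timer.cancel(); };
    fiber.suspended = true;
}
```

Calling `start_wait()` and then `fiber.interrupt()` leaves the fiber resumed with `op_error::interrupted`; calling `timer.fire()` instead leaves it resumed with `op_error::none`. In both cases the interrupter is empty afterwards.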

A lot of the boilerplate is handled already on the prologue/epilogue functions from vm_context.

Userdata practices

Besides the common practices to create custom objects through userdata, Emilua (IO) objects will also:

  • Hide the metatable. By doing that, user code is prevented from changing the metatable (the metatable is just a usual table after all) that native code relies on.

  • Assume lua_setmetatable() is an indivisible operation for userdata (i.e. if it fails, it doesn’t set a metatable nor any __gc metamethod). This assumption is important to simplify object management by doing away with all the pre-initialization tricks taught in Roberto’s manuals and their associated complexities.

  • Assume lua_setmetatable() reports errors through exceptions (i.e. it always returns 1). This is a superset of the previous point and it is guaranteed by the VM[1]. We don’t really care as much about this point, but as it is guaranteed, the assumption described in the previous point (which we do care about) is covered as well.

C++ async operations

Let’s begin with require().

require()'ing a module is also an async operation which will suspend the caller fiber. Every module has its own isolated environment (i.e. a new lua thread is created for every module and that thread’s environment is configured to use a separate lua table) sharing the same lua VM. The module’s entry point is a user-provided source code evaluated to prepare the environment with the names that should be exported to the caller fiber. But this preparatory step may not be immediately ready and may need to call other async operations. The rule we define to mark a module as loaded and ready is when its main fiber finishes (synchronization code similar to fiber:join()).

To further enforce a more manageable project layout, it is only allowed to import new modules from the main fiber. This may introduce a “slow” startup in some project layouts, but:

  • It is simpler to reason about the relationship of exported/imported names if we restrict them to the same main fiber. One use we make of this feature is detecting whether the inbox module was loaded and closing it if not.

  • We are explicitly not aiming for remote modules (e.g. JS running on a web browser), so we don’t need to care about slow startup happening in this event.

  • In the cases where some module startup is indeed slow, the module programmer himself can adopt lazy loading techniques within his module’s functions to have a quick startup with respect to the rest of the application.

Modules evaluate only once and are cached. We never unload them. We keep a reference to their lua thread for as long as the lua VM is active.

Loading a module forms a loader-loaded relationship. This relationship builds a chain that must be checked when a new module is require()d (so we can for instance prevent cyclic imports). But each module will have its own environment. This means the C++ function that implements require() needs to check lua-hidden state associated with the caller lua function (not a global one). That’s the module system state per-module.
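
The chain check can be sketched as a walk over the loaders of the caller. The block below is only a model under stated assumptions: `module_graph`, `would_cycle()` and `require()` are hypothetical names, and modules are identified by plain strings here, whereas the real per-module state hangs off the module’s lua thread.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model: each loaded module remembers which module first
// required it (its loader). The root module has no entry.
struct module_graph {
    std::map<std::string, std::string> loader_of;

    // Returns true if `target` is `caller` itself or one of the modules
    // in the chain of loaders above it, i.e. requiring it would be cyclic.
    bool would_cycle(const std::string& caller,
                     const std::string& target) const {
        std::string current = caller;
        while (!current.empty()) {
            if (current == target) return true;
            auto it = loader_of.find(current);
            if (it == loader_of.end()) break;
            current = it->second;
        }
        return false;
    }

    // Records the loader-loaded relationship unless it would be cyclic.
    // Modules already loaded keep their original loader (they are cached).
    bool require(const std::string& caller, const std::string& target) {
        if (would_cycle(caller, target)) return false;
        loader_of.emplace(target, caller);
        return true;
    }
};
```

With `main` requiring `a` and `a` requiring `b`, a later `require("b", "main")` is rejected because `main` sits in the loader chain of `b`, while `require("main", "b")` succeeds and just hits the cache.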

Rule

The per-module state is stored by using the module’s main thread as a key in the fibers table. The fibers table is strong, but this isn’t a problem because the module shall never be unloaded anyway. Code that unrefs fiber coroutines shall check whether the lua thread represents a module and skip removing it from the fibers table if so.

We can’t store the module system data directly at the thread environment because lua code can change the thread environment by calling setfenv(0, table).

We’ve already gone through the trickiest parts and added the most important restrictions to the table (no lua-related pun intended), so the remaining rules should be quick’n’easy to catch.

When you initiate an async operation, the C++ side will copy the lua_State* to handle the completion (or cancellation) later. However, any LUA_ERRMEM will trigger an emilua-call to lua_close() and L may then be invalid when we later try to resume it. So the completion handler needs to check whether the vm is still valid before accessing it and this is the purpose of the vm_context structure (also protected by the same strand as the vm).
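
A minimal sketch of that validity check follows. `vm_context_sketch` and `on_completion()` are hypothetical and heavily simplified; the real vm_context wraps a lua_State* and is only touched from the VM’s strand, which this model ignores.

```cpp
#include <cassert>
#include <memory>

// Hypothetical, simplified stand-in for vm_context.
struct vm_context_sketch {
    bool valid = true;
    int resumed = 0;

    void close() { valid = false; } // where lua_close() would be called
};

// What a completion handler would do before touching the VM.
// Returns true if the fiber was resumed.
bool on_completion(const std::shared_ptr<vm_context_sketch>& vm) {
    if (!vm->valid)
        return false; // the VM is gone; drop the result
    ++vm->resumed;    // where lua_resume() would be called
    return true;
}
```

The handler keeps the context alive through the shared pointer, so the check itself is always safe even when the VM behind it has already been closed.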

this_fiber

As long as lua code is executing, there is a current fiber and this property stays unchanged for as long as control doesn’t return to host.

transparent, adj.

Being or pertaining to an existing, nontangible object.

It’s there, but you can’t see it
— IBM System/360 announcement, 1964

virtual, adj.

Being or pertaining to a tangible, nonexistent object.

I can see it, but it’s not there.
— Lady Macbeth

This property is mostly transparent to lua code. Which is to say that the programmer is aware of this property, but there isn’t a tangible object that it can track back to this_fiber. This is mostly true, but there is a quite tangible this_fiber lua global object that the user can inspect — exposed at the beginning of the first thread execution.

However, this_fiber being a global is shared among all the fibers, so it can’t point to a single fiber. Instead, it will query which fiber is current and do operations on it.

C++ async ops will always store which fiber is current to know how to resume it back. And before a fiber is resumed, this info is stored at a known lua registry index so future async ops will get to know about it too. The reason why we can’t rely on the L argument passed to C functions registered at the VM and the current fiber needs to be remembered is because there will be an L that points to the wrong lua thread as soon as the user wraps some function in a coroutine.

This design works well because we don’t mix responsibilities of the scheduler with user code (as is the case for Fiber#resume in Ruby which would be better suited by a Fiber#spawn() that accepts post/dispatch execution policies and would avoid the (un-)parking unsound ideas altogether).

Asynchronous event notification

Some events are intrusive and will be generated even when no thread/fiber asked for them. The classic example is UNIX signals. A sighandler must be registered to handle them, but that raises the question: from which thread are these functions called? In the C world there are multiple answers:

SIGEV_SIGNAL

The handler will be called asynchronously from any thread. That means a lot of restrictions to what a sighandler can do.

SIGEV_THREAD

The handler will be called from an unspecified thread. Now we have far fewer restrictions, but some still exist (e.g. unsafe thread-local variables and thread cancelability state).

SIGEV_KEVENT

The gold standard for event multiplexing in the C world.

Generally the need for asynchronous events stems from bad design and should be avoided. However when integrating lua code with existing libraries we must deal with asynchronous events now and then. Emilua reserves a lua coroutine/thread for which no suspension is ever allowed and that will give the lua user a mix between SIGEV_SIGNAL and SIGEV_THREAD restrictions. From the handler the user can notify a condition variable to achieve friction-less handling from a different fiber similar to what SIGEV_KEVENT enables.

From the C++ side, one just needs to get the asynchronous event (lua) thread and rely on lua_pcall() (no need for complex lua_resume() handling, nor fiber APIs).

LUA_ERRMEM

Lua code cannot recover from allocation failures. As an example (and single-VM only):

my_mutex:lock()
scope_cleanup_push(function() my_mutex:unlock() end)

If the VM fails to allocate the closure passed to scope_cleanup_push(), my_mutex will be kept locked and the lua code inside that VM will be in an unrecoverable state. There’s no pattern or ordering to make resource management work here as allocation failures can happen almost anywhere and we then inherit some constraints and reasoning from preemptive scheduling. The only option (and this applies to any allocation failure reported by the lua VM when running arbitrary user code) is to terminate the VM from the C++-side.

When lua_close() is called, there is no guarantee pending operations will be canceled as they might hold strong references to the underlying IO object preventing its destructor from getting called. Therefore, the vm_context structure also holds an intrusive container of polymorphic elements which are destroyed after lua_close() is called and can be used to register cleanup code to avoid such leaks. If the operation finishes, the IO object is free to reclaim its own objects from this container and use them for other purposes.
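
The cleanup container can be pictured with the sketch below. It is an assumption-laden model: `pending_operation`, `cancel_on_destroy` and `vm_context_sketch` are hypothetical names, and a std::list of std::unique_ptr is swapped in for the intrusive container to keep the example short.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <memory>
#include <utility>

// Hypothetical base class for anything that must be torn down with the VM.
struct pending_operation {
    virtual ~pending_operation() = default;
};

// Example element: runs its cleanup (here, a counter bump) when destroyed.
struct cancel_on_destroy : pending_operation {
    explicit cancel_on_destroy(int& counter) : cancelled(counter) {}
    ~cancel_on_destroy() override { ++cancelled; }
    int& cancelled;
};

struct vm_context_sketch {
    using container = std::list<std::unique_ptr<pending_operation>>;
    container pending_operations;
    bool closed = false;

    // Registers cleanup code; the iterator lets the owner reclaim it later.
    container::iterator add(std::unique_ptr<pending_operation> op) {
        pending_operations.push_back(std::move(op));
        return std::prev(pending_operations.end());
    }

    // An operation that finished normally takes its element back.
    std::unique_ptr<pending_operation> reclaim(container::iterator it) {
        auto op = std::move(*it);
        pending_operations.erase(it);
        return op;
    }

    void close() {
        closed = true;              // where lua_close() would be called
        pending_operations.clear(); // elements are destroyed afterwards
    }
};
```

Elements still registered when `close()` runs get their destructors called right after the VM is closed; elements reclaimed beforehand are untouched and remain owned by whoever reclaimed them.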

lua_CFunction objects should never call lua_close(). If they detect LUA_ERRMEM all they have to do is to mark the flags field from vm_context and suspend the fiber. The host will take care of closing lua_State* and extra cleanup when it recovers control of the thread.

The other side of the coin is to detect LUA_ERRMEM. All interactions with the VM from the C API happen through the virtual stack, so naturally that’s the first concern. You must not push anything on the stack if there’s no extra free stack slot available. To check for such slot space, there’s lua_checkstack().

The usual C function signature is not enough to convey all the semantics required by the Lua C API. In the Functions and Types section of the manual, we find the following information:

Here we list all functions and types from the C API in alphabetical order. Each function has an indicator like this: [-o, +p, x]

[…​] The third field, x, tells whether the function may throw errors: '-' means the function never throws any error; 'm' means the function may throw an error only due to not enough memory; 'e' means the function may throw other kinds of errors; 'v' means the function may throw an error on purpose.

The 5.1’s signature for lua_checkstack() is:

int lua_checkstack(lua_State *L, int extra); // [-0, +0, m]

That’s obviously bogus. If lua_checkstack() can throw on ENOMEM that means there is no possible safe interaction with the VM. That’s — plain and simple — a bug. This bug was fixed in Lua 5.2 when the signature changed to:

int lua_checkstack(lua_State *L, int extra); // [-0, +0, -]

Lua 5.2 received a few other improvements concerning ENOMEM such as obsoleting lua_cpcall() by introducing light C functions. API-wise, Lua 5.2 was a great release as it fixed many shortcomings.

You don’t always need to call lua_checkstack() before doing anything thanks to at least LUA_MINSTACK free stack slots being guaranteed for you when the VM calls into your lua_CFunction objects. And here’s where things start to get tricky. Consider the following Lua code:

coroutine.wrap(function()
    spawn(function()
        print('Hello World')
    end)
end)()

The underlying C function implementing spawn() is exposed to 3 different lua_State* handles:

Current fiber

get_vm_context(L).current_fiber(). The one that calls coroutine.wrap().

Inner coroutine

The L parameter from lua_CFunction. The one that calls spawn().

New fiber

lua_newthread(L) return value. The one to print “Hello World”.

If lua_error() is called on L, the stack for L will be in a completely deterministic state. Anything this lua_CFunction object pushed on the stack will be popped and the whole pcall()-chain on the state L will be respected too. However lua_error() might be called indirectly through other API functions. That’s the signature for lua_newtable():

void lua_newtable(lua_State *L); // [-0, +1, m]

As we’ve seen previously:

'm' means the function may throw an error only due to not enough memory

“Throw” here means a sort of call to lua_error() (LUAI_THROW to be more accurate). That’s the pcall()-chain and each lua_State has its own (this property won’t change even if you compile the Lua VM as C++ code). This independent pcall()-chain for each lua_State is not a limitation from the C API, but an accurate model of the underlying machinery happening in Lua code itself. Consider the following snippet:

c1 = coroutine.create(function()
    pcall(function()
        -- ...
    end)
end)

If c1 is suspended in the middle of pcall(), it retains this private pcall()-chain that doesn’t get mixed with pcall()-chains from other coroutines (i.e. the other lua_State* handles). Therefore the C API accurately maps the language behaviour on retaining a private pcall()-chain for each lua_State and we can’t expect any different behaviour here really. Lua documentation on the issue has been ironed out little-by-little throughout its releases. Lua 5.3 was the one to finally explicitly state the behaviour we just described:

The panic function, as its name implies, is a mechanism of last resort. Programs should avoid it. As a general rule, when a C function is called by Lua with a Lua state, it can do whatever it wants on that Lua state, as it should be already protected. However, when C code operates on other Lua states (e.g., a Lua argument to the function, a Lua state stored in the registry, or the result of lua_newthread), it should use them only in API calls that cannot raise errors.

In short, that means our spawn() implementation that is exposed to the {L, current fiber, new fiber} triple would throw to the wrong pcall()-chain if it calls lua_newtable(new_fiber). The solution is to use lua_xmove() when necessary and maintain rigorous discipline as to which C API functions are called on “foreign” lua_State* handles paying very special attention to their respective throw specifications. As for the discipline required, Rici Lake wrote a good summary on the lua-users wiki:

There are quite a number of API functions which will never throw a Lua error. API functions that throw errors are identified in the reference manual as of 5.1.3. First, none of the stack adjustment functions throw errors; this includes lua_pop, lua_gettop, lua_settop, lua_pushvalue, lua_insert, lua_replace and lua_remove. If you provide incorrect indexes to these functions, or you haven’t called lua_checkstack, then you’re either going to get garbage or a segfault, but not a Lua error.

None of the functions which push atomic data — lua_pushnumber, lua_pushnil, lua_pushboolean and lua_pushlightuserdata ever throw an error. API functions which push complex objects (strings, tables, closures, threads, full userdata) may throw a memory error. None of the type enquiry functions — lua_is*, lua_type and lua_typename — will ever throw an error, and neither will the functions which set/get metatables and environments. lua_rawget, lua_rawgeti and lua_rawequal will also never throw an error. Aside from lua_tostring, none of the lua_to* functions will throw an error, and you can avoid the possibility of lua_tostring throwing an out of memory error by first checking that the object is a string, using lua_type. lua_rawset and lua_rawseti may throw an out of memory error. The functions which may throw arbitrary errors are the ones which may call metamethods; these include all of the non-raw get and set functions, as well as lua_equal and lua_lt.

On a side note, Lua 5.2 added the following:

If an error happens outside any protected environment, Lua calls a panic function (see lua_atpanic) and then calls abort, thus exiting the host application. Your panic function can avoid this exit by never returning (e.g., doing a long jump to your own recovery point outside Lua).

The panic function runs as if it were a message handler (see §2.3); in particular, the error message is at the top of the stack. However, there is no guarantees about stack space. To push anything on the stack, the panic function should first check the available space (see §4.2).

That’s actually behaviour that already existed in version 5.1. An alternative panic function could just throw a C++ exception to implement this __attribute__((noreturn)) behaviour. However this hypothetical panic function is not an alternative solution to our problems due to the combination of the following facts:

  • As described elsewhere in this document, we require lua_error() to act as-if it throws a C++ exception so our destructors are properly called. That requires the underlying Lua VM (LuaJIT in our case) to throw and catch C++ exceptions.

  • A C++-throw is triggered from lua_newtable(L). The type thrown here is internal to the Lua VM and we cannot throw it ourselves. LUA_ERRMEM information is correctly preserved.

  • A panic is triggered from lua_newtable(new_fiber). Our panic function would in turn discard LUA_ERRMEM and throw a generic C++ exception.

  • On lua_newtable(new_fiber) hitting LUA_ERRMEM, the L's C++-catch handler wouldn’t receive the original error (LUA_ERRMEM). That means information loss. That means our host code (the code that first calls into the Lua VM) won’t call lua_close() (when it should) as its lua_pcall()/lua_resume() call might not report the correct error reason (LUA_ERRMEM). That also means the possibility to unwind the wrong number of cascaded pcall() blocks (a pcall() from Lua code is not supposed to handle LUA_ERRMEM, if correctly detected, so the number of blocks unwound differs whenever LUA_ERRMEM is involved).

  • Although LuaJIT can catch generic C++ exceptions, it lacks context and cannot possibly restore the stack state on each lateral lua_State* handle at play (the triple {L, current fiber, new fiber} in our case). If the spawn() lua_CFunction had a value pushed on the current_fiber stack when a new_fiber panic-triggered exception raises, the value on the current_fiber stack wouldn’t be properly popped by the time L handles the C++ exception (and do remember that L is executing nested on top of current_fiber so you can already imagine the chaos here). In short, the Lua VM needs our cooperation to maintain some invariants.

  • By wrapping these calls into our own C++ catch blocks we could work around some of these issues, but the thought that thread control would still return to the Lua VM one last time after the panic handler got called is just too scary and previous mailing list threads on this topic weren’t very reassuring. For one, if the exception is panic-triggered by current_fiber, we won’t know what remains on this stack (except for the stack top), but that’s exactly the lua_State that the host is operating on when our lua_CFunction got called on L. Even if control does return safely to our host it would still have problems to deal with there.

That covers our policy when implementing lua_CFunction objects. In short, we cannot resort to Lua panics here and the only real solution is the rigorous discipline on C API usage mentioned earlier.

Now let’s talk about our policy for host code. The Lua suspending IO functions are implemented by querying which fiber is current and scheduling a lua_resume() on it as the callback for some Boost.Asio supported C++ async_*() function (plus a ton of other details properly documented elsewhere on this document such as strand handling and so on). The initiating function is called from the Lua VM, but the callback is not. The callback will act as the host.

Back to lua_resume(), this function itself doesn’t throw:

int lua_resume(lua_State *L, int narg); // [-?, +?, -]

However the code that runs before lua_resume() might throw. This is the code that pushes the arguments to the coroutine. For instance, if a string is one of the coroutine parameters, you will have to use C API that might throw on ENOMEM:

void lua_pushlstring(lua_State *L, const char *s, size_t len); // [-0, +1, m]

It’s no use trying to call lua_pcall() to wrap lua_pushlstring() here. lua_status() now returns LUA_YIELD and that means you can’t use lua_pcall() on this lua_State* handle. You can’t create a new handle and use the lua_xmove() trick either as lua_newthread() itself can throw on ENOMEM:

lua_State *lua_newthread(lua_State *L); // [-0, +1, m]

Fear not, for here is the place where we can finally use a panic function to throw a custom C++ exception. There are only two caveats. The first one is related to LuaJIT having such tight integration with native exceptions that it makes (almost) no distinction between lua_pcall() and C++ catch frames[2]. The net result is that you can use C++'s catch-all blocks and then no panic function will ever be involved (by now you must be feeling that we just travelled to the farthest candy shop in the kingdom just to make a full-turn just one block away from destination when we changed our minds and decided to go on the neighbour’s candy shop). Despite the lack of a real panic function throwing our own exceptions, I’ll still use the same previous terminology (i.e. panic-triggered exceptions).

The second caveat is a little charming race to avoid. The completion handler doing the host job is executed through the strand that protects the VM. If we let the exception escape the completion handler, another thread might try to use the VM before we have the chance to close it. In other words, the following approach has a race and thus is not used:

for (;;) {
    try {
        // Completion handler allows the panic
        // exception to escape here.
        ioctx.run();
        break;
    } catch (...) {
        // This is a bug. This code isn't executed
        // through the VM strand. A pending operation
        // that just finished could try to access
        // `current` from another thread while we're
        // here.
        vm_context* current = ...;
        current->close();
        continue;
    }
}

Therefore, it is the responsibility of the completion handler to handle the panic-triggered exception (sorry about the boilerplate on your side, but that’s the way it is).

try {
    // lua_push*() calls
} catch (...) {
    vm_ctx->close();
    return;
}
int res = lua_resume(fiber, narg);

That is enough to cover the policy for host code and finally finish the LUA_ERRMEM discussion too.

Channels and resources

The biggest challenge to cross-VM resource management is the multi-strand sync primitives (i.e. the channels). They have to execute code that jumps from one strand to another to finish their jobs. If the associated execution context already finished, then they would be stuck forever. The solution is for them to keep the execution context busy through a work guard.

However some rules are needed to make this work:

  • Rx-channels (i.e. inbox) don’t keep work guards.

  • Tx-channels keep a work guard to the other end while they are alive. But they only keep a work guard to their own strands when they have an active operation.

If the tx-channels are not closed, they will prevent execution contexts that are no longer necessary from being destroyed. But that’s the best we can do. We could periodically call the GC to free unused channels, but so will lua code anyway and there’s nothing left for us to do on the C++ side. A good practice for lua code would be to add the following chunk at the beginning of the fiber that will process the actor messages:

scope_cleanup_push(function() inbox:close() end)

Extra rules for channels management:

  • As an extra safety measure, if the main fiber finishes and inbox wasn’t imported, the runtime closes it.

  • Channels (tx and rx) also get closed when the VM is terminated.

  • Channels must only upgrade their weak references to vm_context once they migrated to the target strand. Otherwise, they would prevent the VM from auto-closing (and hairy problems would follow).
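
The last rule can be illustrated with std::weak_ptr. This is only a sketch: `vm_context_sketch` and `send_op` are hypothetical, and the strand hop itself is left out, so `deliver()` merely marks the point where the code would already be running on the target strand.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the target VM's context.
struct vm_context_sketch { int delivered = 0; };

// Hypothetical channel operation: holds only a weak reference in flight,
// so a pending send never keeps the target VM alive.
struct send_op {
    std::weak_ptr<vm_context_sketch> target;

    // Meant to run on the target VM's strand. Upgrades the weak reference
    // only here, and only for the duration of the delivery.
    bool deliver() const {
        std::shared_ptr<vm_context_sketch> vm = target.lock();
        if (!vm)
            return false; // the target VM already closed itself
        ++vm->delivered;
        return true;
    }
};
```

While the operation is in flight the use count of the context stays unchanged, so the VM remains free to auto-close; a delivery attempted after that simply fails.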

The exception mechanism

C++ exceptions must not be used to propagate errors across lua/C++ frames. However, lua errors may simply trigger stack unwinding (the code makes heavy use of setjmp()) and we do depend on RAII to keep the code correct.

It is assumed that any call to lua_error() will behave as-if it throws a C++ exception (thus triggering our destructors). We require some support from the luaJIT VM for this. Specifically, we can’t rely on the “no interoperability” category from their “exception” section on the “extensions” page because of the following restriction:

Throwing Lua errors across C++ frames will not call C++ destructors.

To make matters worse, the feature we do depend on only appears in the “full interoperability” category:

Throwing Lua errors across C++ frames is safe. C++ destructors will be called.

A different approach would be to implement an exception mechanism in terms of coroutines (although it’d add to code complexity):

Exceptions < Coroutines < Continuations

Exceptions can be thought of as a subclass of coroutines. You can implement an exception mechanism with coroutines.

— leafo
leafo.net

But this path would be a dead-end as native lua errors would still be reported through lua_error(). For luaJIT, lua_error() plays well with our code because:

The LuaJIT VM is fully resumable. This means you can yield from a coroutine even across contexts, where this would not possible with the standard Lua 5.1 VM: e.g. you can yield across pcall() and xpcall(), across iterators and across metamethods.

Were it not for this guarantee, the project would be monstrous. To understand why this guarantee is important, let’s unravel the fundamental pattern for fibers support. We always implicitly wrap every user code inside a lua coroutine:

local fib = coroutine.create(user_fn)

So async operations can suspend the calling fiber and resume them later.

But user_fn might very well contain a pcall() and execute our suspending async function inside it:

function user_fn()
    pcall(function()
        io_obj:emilua_async_op()
    end)
end

The exception mechanism should not block our ability to suspend fibers. When our own native code calls lua_yield() to suspend a fiber, the suspension mechanism should be able to cross the pcall() barrier.

To wrap it all up so far, the standard lua exception mechanism is used to report errors. The only difference is that emilua will lua_error() a structured error object inspired by std::error_code for our own errors.
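
For readers less familiar with std::error_code, the sketch below shows the value-plus-category shape that such a structured error object borrows from. The `sketch` namespace, its enumerators and its messages are invented for illustration and are not emilua’s actual error codes.

```cpp
#include <cassert>
#include <string>
#include <system_error>

namespace sketch {

// Hypothetical error enumeration; the values are invented for illustration.
enum class errc { interrupted = 1, channel_closed };

class category_impl : public std::error_category {
public:
    const char* name() const noexcept override { return "sketch"; }
    std::string message(int value) const override {
        switch (static_cast<errc>(value)) {
        case errc::interrupted: return "fiber interrupted";
        case errc::channel_closed: return "channel closed";
        }
        return "unknown error";
    }
};

// One category instance for the whole program; error codes compare
// categories by address.
inline const std::error_category& category() {
    static category_impl instance;
    return instance;
}

// Found through argument-dependent lookup by std::error_code.
inline std::error_code make_error_code(errc e) {
    return {static_cast<int>(e), category()};
}

} // namespace sketch

namespace std {
// Opt the enum into implicit conversion to std::error_code.
template<> struct is_error_code_enum<sketch::errc> : true_type {};
}
```

An error built this way carries an integer value, a category that names its domain, and a human-readable message, which is the information a lua-side error object would need to expose.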

Things would get a little tricky on the following point that we raised previously though:

[…​] and we do depend on RAII to keep the code correct.

Imagine we have some code like the following:

class reference
{
public:
    reference() : L(nullptr) {}

    reference(lua_State* L)
        : L(L)
        , idx(luaL_ref(L, LUA_REGISTRYINDEX))
    {}

    ~reference()
    {
        if (!L)
            return;

        luaL_unref(L, LUA_REGISTRYINDEX, idx);
    }

    reference(reference&& o)
        : L(o.L)
        , idx(o.idx)
    {
        o.L = nullptr;
    }

    lua_State* state() const
    {
        return L;
    }

    void push() const
    {
        assert(L);
        lua_pushinteger(L, idx);
        lua_gettable(L, LUA_REGISTRYINDEX);
    }

private:
    lua_State* L;
    int idx;
};

If an object of this type has its destructor called on lua_error()-triggered stack unwinding, it means we’re manipulating the lua_State* (luaL_unref(L) in this example) on stack unwinding (i.e. outside of a lua-catch block which would be just after a pcall() return). If the VM is not in a safe state for manipulations at this moment (this scenario just doesn’t happen if you stick with plain C which is the target lua was developed for) then we’re screwed. Luckily, the VM can handle such situations just fine as hinted in the luaJIT documentation:

static int wrap_exceptions(lua_State *L, lua_CFunction f)
{
  try {
    return f(L);  // Call wrapped function and return result.
  } catch (const char *s) {  // Catch and convert exceptions.
    lua_pushstring(L, s);
  } catch (std::exception& e) {
    lua_pushstring(L, e.what());
  } catch (...) {
    lua_pushliteral(L, "caught (...)");
  }
  return lua_error(L);  // Rethrow as a Lua error.
}
http://luajit.org/ext_c_api.html#mode_wrapcfunc
Recommended usage pattern for LUAJIT_MODE_WRAPCFUNC

This guarantee is promised again (although this version of the promise is read-only) in their “extensions” page (and again only at the full interoperability category):

Lua errors can be caught on the C++ side with catch(…​). The corresponding Lua error message can be retrieved from the Lua stack.
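
The unwinding scenario can be mimicked without Lua at all. The following is only an analogy, not Emilua code: registry_stub and slot_guard are hypothetical stand-ins for the lua registry and the reference class, and a plain C++ exception plays the role of lua_error(). It shows the destructor touching shared state while the stack is still being unwound, before any catch block runs.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>

// Hypothetical stand-in for the lua registry: it only tracks live slots.
struct registry_stub {
    std::set<int> live;
    int next = 1;

    int ref() { live.insert(next); return next++; }
    void unref(int idx) { live.erase(idx); }
};

// Hypothetical RAII wrapper playing the role of the `reference` class.
class slot_guard {
public:
    explicit slot_guard(registry_stub& r) : r_(r), idx_(r.ref()) {}
    ~slot_guard() { r_.unref(idx_); }  // runs during stack unwinding

    slot_guard(const slot_guard&) = delete;
    slot_guard& operator=(const slot_guard&) = delete;

private:
    registry_stub& r_;
    int idx_;
};

// Stand-in for a C function that raises an error while holding a reference.
inline void failing_op(registry_stub& r)
{
    slot_guard guard(r);
    assert(r.live.size() == 1);
    throw std::runtime_error("stand-in for lua_error()");
}

// Stand-in for the pcall() barrier. Returns false if an error was caught.
inline bool run_protected(registry_stub& r)
{
    try {
        failing_op(r);
    } catch (const std::exception&) {
        // By the time we get here the guard has already released its slot.
        return false;
    }
    return true;
}
```

After run_protected() returns false, the registry holds no live slots: the release happened during unwinding, which is exactly the window in which the real code calls luaL_unref().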

The final piece for our puzzle is related to async ops converting std::error_code into lua exceptions (i.e. lua_error()). The completion handler for async ops is not called in a lua context, so they cannot just call lua_error() and hope the correct context will catch the exception (there’s no API similar to resume_with() from Boost.Context). They need to return control to the native code that suspended the fiber so it can throw a lua exception before control returns to lua code.

This ability (lua_yield() returning control to its native caller) used to exist in luaJIT 1.x (which included Coco):

Now, if the current coroutine has an associated C stack, lua_yield() returns the number of arguments passed back from the resume.

The lack of allocated C stacks brings more complications to the implementation that will be discussed later. lua_yieldk() from Lua 5.2 would be enough for us (and cheaper!), but we don’t have that either.

Yet another option would be to set a one-time hook to be called immediately before resuming the lua coroutine, but it’d present challenges in the future if we ever add debugging support, so it is avoided.

The solution Emilua gets away with is wrapping the C function inside a lua function. The C function returns a 2-tuple. If the first element is not nil, the lua function itself takes care of using it to raise an error.

local error, native = ...
return function(...)
    local e, v = native(...)
    if e then
        error(e)
    else
        return v
    end
end

User-coroutines

Let’s jump straight to a topic that gives some sense of continuity to the previous section. The pcall() barrier is not the only barrier that the user can insert to prevent lua_yield() from suspending the fiber. The user might very well just wrap calls using coroutine.create():

function user_fn()
    local co = coroutine.create(function()
        io_obj:emilua_async_op()
    end)
    -- The barrier only exists once the coroutine is resumed
    coroutine.resume(co)
end

Rule

Lua’s coroutine module must never be directly exposed to lua code.

The problem is solved by exposing a different coroutine module — a small shim over the original one. This version inspects this_fiber's suspension reason (native code or lua code).

Conceptually, the implementation looks like this:

function coroutine.resume(co, ...)
    if _G.busy_coroutines[co] then
        -- CORUN
        error("cannot resume running coroutine", 2)
    end

    local args = {...}
    while true do
        local ret = {raw_coroutine.resume(co, unpack(args))}
        if ret[1] == false then
            return unpack(ret)
        end
        if _G.this_fiber.native_yield then
            _G.busy_coroutines[co] = true
            args = {raw_coroutine.yield(unpack(ret, 2))}
            _G.busy_coroutines[co] = nil
        else
            return unpack(ret)
        end
    end
end

function coroutine.yield(...)
    if _G.fibers[raw_coroutine.running()] ~= nil then
        error("bad coroutine", 2)
    end
    return raw_coroutine.yield(...)
end

function coroutine.status(co)
    if _G.busy_coroutines[co] then
        return "normal"
    end

    return raw_coroutine.status(co)
end

function coroutine.running()
    local co = raw_coroutine.running()
    if _G.fibers[co] ~= nil then
        -- Fiber's coroutines work just like the main coroutine
        return nil
    end

    return co
end

coroutine.create = ...
coroutine.wrap = ...

Dead fibers

When an exception escapes the fiber stack, the hook registered with sys.set_uncaught_hook() is called. The default hook prints the stack trace to stderr and additionally terminates the VM if the exception escaped from the main fiber. If the custom hook itself fails, the default hook is then called anyway.

Scope handlers are properly popped and called after the hook returns control of the thread to the runtime.

The hook is only called for detached fibers. Therefore, a different behaviour can be chosen for each join()ed fiber. Also, if the fiber isn’t explicitly detach()ed, the hook action will be deferred until some GC round.

There isn’t a pcall block around the whole program. lua_resume is enough and it has the nice property of not unwinding the stack so it can be examined from the error handler. A new lua thread is created to execute the uncaught-hook while it has the chance to examine the unchanged error’ed call stack.

The hook mechanism isn’t implemented yet.

Functions that receive a lua callback

There are plenty of functions that take a lua closure as a parameter (e.g. pcall(), scope(), …). If we blindly implement them in plain C, they will introduce a non-leaf C stack frame which we cannot suspend.

To avoid the C stack frame in the middle of the call-stack altogether, we implement (parts of) these functions in lua, not C. The problem is then how to expose sensitive raw resources that the C functions would use. One of the goals is to not let these resources escape elsewhere.

A quick way to achieve it is by having a lua bootstrap function/chunk to create closures and later change their upvalues through C:

local private_resource = ...
return function()
    -- use `private_resource`
end

This approach is naive as luaJIT 2.x does not implement some lua functions (i.e. the sensitive raw resources that we want to keep private) as C functions and we cannot feed them as upvalues for the imported bytecode. For instance, we have this behaviour for pcall():

lua_pushcfunction(L, luaopen_base);
lua_call(L, 0, 0);
lua_getglobal(L, "pcall");
lua_CFunction pcall_addr = lua_tocfunction(L, -1);
assert(pcall_addr == nullptr); // :-(

Therefore the lua bytecode won’t be a closure with uninitialized upvalues per se, but a function that receives the private resources and returns the needed closure. It is an extra step on startup, but at least we save some cycles by compiling the bytecode with stripped debug info in the project build stage.

Process environment

A part of the process environment (e.g. UNIX signals) should be under complete control of the program and no external library should meddle with it. However, no protections will be provided to enforce this good practice.

VM settings inheritance

New actors should inherit generic customization points for the GC (e.g. step count and period) and the JIT. They should also inherit allocator settings, but they must not be prevented from creating new actors with higher allocation quotas (unless of course the global pool is already at its limit).

Lua 5.2/LuaJIT extensions

We use some C functions found only on Lua 5.2+ and/or LuaJIT:

  • luaL_traceback()

  • luaopen_bit()

  • luaopen_jit()

  • luaopen_ffi()

  • LUAJIT_VERSION_SYM()

2GB addressing limit

luaJIT has a serious 2GB limit that has been fixed in forks. By default, the broken 64-bit addressing mode is hidden behind LUAJIT_ENABLE_GC64. Emilua might consider moving to moonjit if its author doesn’t try to move away from the lua 5.1 core and keeps his distance from the 5.3+ syntactic explosion madness. I don’t like this C++-like culture expanding to lua or other languages (kudos to Go here for avoiding it).

JIT parameters

The JIT parameters are also changed from the old defaults.

Old defaults:

maxtrace=1000
maxrecord=4000
maxmcode=512  -- in KB

New values:

maxtrace=8000
maxrecord=16000
maxmcode=40960  -- in KB

Locales

A recent POSIX standard specified anemic per-thread and per-function locale support, but, aside from this anemic support, C uses the same locale globally for the whole process.

Meanwhile, C++ has somewhat usable support for multiple locales per process (and an extra global one that also affects the global C locale).

Functions such as perror() and strerror() will query LC_MESSAGES from the global C locale. However the sole function to query this attribute — setlocale() — is not thread-safe so we shouldn’t change the locale after the program starts and minimal initialization to the process state is done. Changing the global locale is highly unsafe and such API will not be exposed to Lua code.

The thread-safe C++ locales export functionality for LC_MESSAGES through the facet std::messages. This facet allows one to open system-defined message catalogs, and get translated messages from them. This facet exposes no equivalent for the query setlocale(LC_MESSAGES, NULL). Even if we query it at the beginning of the program and try to attach a new custom facet to the global locale object, this will create a nameless locale. Unnamed global C++ locales will break LC_MESSAGES for the C ecosystem (e.g. perror() will no longer print localized messages). Therefore custom facets are out of the question.

A direct call to setlocale(LC_MESSAGES, NULL) is avoided too because ISO C++ doesn’t define the macro LC_MESSAGES. To query the current LC_MESSAGES we just look for LC_MESSAGES in the current C++ locale’s name. This approach doesn’t interfere with the C ecosystem, and also paves the way for multiple per-process locales.
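
As a rough sketch of that lookup (not the project's actual code), the helper below pulls the LC_MESSAGES value out of a locale name. The function names are made up for illustration, and the parsing assumes glibc-style names: a uniform locale is named like "pt_BR.UTF-8", while a mixed one lists every category as "LC_CTYPE=...;LC_MESSAGES=...;...". The name format is implementation-defined, so other standard libraries may need different parsing.

```cpp
#include <cassert>
#include <locale>
#include <string>
#include <string_view>

// Extracts the LC_MESSAGES value from a C++ locale name.
// ASSUMPTION: glibc-style names (see the note above the code).
// Returns an empty string when nothing usable is found.
inline std::string messages_locale(std::string_view name)
{
    // std::locale::name() returns "*" for a nameless locale.
    if (name == "*")
        return {};

    constexpr std::string_view key = "LC_MESSAGES=";
    std::size_t pos = 0;
    while (pos <= name.size()) {
        std::size_t end = name.find(';', pos);
        if (end == std::string_view::npos)
            end = name.size();

        std::string_view field = name.substr(pos, end - pos);
        if (field.substr(0, key.size()) == key)
            return std::string(field.substr(key.size()));

        pos = end + 1;
    }

    // A category list that lacks LC_MESSAGES tells us nothing.
    if (name.find('=') != std::string_view::npos)
        return {};

    // Uniform locale: the whole name applies to every category.
    return std::string(name);
}

// Convenience wrapper over the current global C++ locale.
inline std::string current_messages_locale()
{
    return messages_locale(std::locale().name());
}
```

Nothing here touches the global C locale, so the C ecosystem keeps working as before.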

One can find the list of POSIX environment variables that affect the process' locale at https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap08.html#tag_08_02. The format for these variables is defined as:

[language[_territory][.codeset][@modifier]]

This format is compatible with RDF’s Turtle where LANGTAG is defined as:

LANGTAG ::= '@' [a-zA-Z]+ ('-' [a-zA-Z0-9]+)*

And it matches the semantics of the BCP47 definition:

obs-language-tag = primary-subtag *( "-" subtag )
primary-subtag   = 1*8ALPHA
subtag           = 1*8(ALPHA / DIGIT)

So LC_MESSAGES=pt_BR becomes Turtle’s "literal"@pt-BR (and at least the subtag is case sensitive).
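
A minimal sketch of this conversion follows. The function name to_language_tag is hypothetical, and the sketch makes two assumptions of its own: the codeset and modifier are simply dropped (no attempt is made to map a modifier such as @latin to a script subtag), and the "C" and "POSIX" locales yield an empty tag because they name no natural language.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Converts a POSIX locale name, "language[_territory][.codeset][@modifier]",
// into a "language-territory" tag usable as a Turtle LANGTAG.
// ASSUMPTION: codeset and modifier are discarded, not translated.
inline std::string to_language_tag(std::string_view posix_name)
{
    // Cut at the first '.' (codeset) or '@' (modifier), whichever comes first.
    std::size_t cut = posix_name.find_first_of(".@");
    if (cut != std::string_view::npos)
        posix_name = posix_name.substr(0, cut);

    // "C" and "POSIX" carry no language information.
    if (posix_name.empty() || posix_name == "C" || posix_name == "POSIX")
        return {};

    // The only remaining difference is the separator: '_' becomes '-'.
    std::string tag(posix_name);
    for (char& c : tag) {
        if (c == '_')
            c = '-';
    }
    return tag;
}
```

The case of the input is preserved, so pt_BR.UTF-8 yields pt-BR rather than pt-br.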

A Turtle language-tagged string ceases to be of the datatype http://www.w3.org/2001/XMLSchema#string. Its datatype will be http://www.w3.org/1999/02/22-rdf-syntax-ns#langString. If this is a problem for your application, do not use Turtle language-tagged strings.

For more information about C++ locales, the following links are relevant:

Open questions

  • Describe the behaviour for sys.exit() (for main and secondary VMs). Should it call the cancellator for every active operation? Should it exit the application?

Extra caution to take when writing plug-ins

Always keep in mind:

  • If you enable your IO object to be sent over channels, it’ll also be able to migrate to a different asio::io_context and you must take care to keep a work guard to the original asio::io_context.

  • Pending operations must hold a strong reference to vm_context and a work guard — directly or indirectly — to vm_context.strand().

  • IO objects (channels included) by themselves must not hold any strong references to their own vm_context (this cycle would prevent auto-closing the VM and associated channels). Operation initiation is the perfect time to upgrade weak references (if any) to strong ones.

  • Pending operations must not trust L from the initiating operation to decide which fiber to wake-up later on. They must resort — at initiation time — to the vm_context API. Check the simple sleep_for() implementation for a code template.
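
The ownership rules above can be sketched with standard smart pointers. This is only an illustration under loud assumptions: vm_context_stub, io_object_stub and task_queue are hypothetical stand-ins (for vm_context, a plug-in IO object and asio::io_context respectively), and work guards are not modeled at all. The point is the direction of the references: the IO object holds a weak reference, and the pending operation holds the strong one.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for Emilua's vm_context (contents irrelevant here).
struct vm_context_stub {};

// Hypothetical stand-in for asio::io_context: a queue of pending completions.
using task_queue = std::vector<std::function<void()>>;

// The IO object keeps only a weak reference, so it never keeps its VM alive.
class io_object_stub {
public:
    explicit io_object_stub(const std::shared_ptr<vm_context_stub>& vm)
        : vm_(vm)
    {}

    // Operation initiation: upgrade the weak reference to a strong one.
    // The pending operation owns that strong reference until it completes.
    // Returns false if the VM is already gone.
    bool async_op(task_queue& queue, std::function<void()> on_done)
    {
        std::shared_ptr<vm_context_stub> vm = vm_.lock();
        if (!vm)
            return false;

        queue.push_back(
            [vm = std::move(vm), on_done = std::move(on_done)] {
                (void)vm;  // held only to keep the VM alive until completion
                on_done();
            });
        return true;
    }

private:
    std::weak_ptr<vm_context_stub> vm_;
};
```

With this shape, dropping every external reference to the VM while an operation is in flight does not destroy it. Once the queue is drained, the VM goes away and later initiations fail.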

Final note

Emilua software is complex. There should be no pursuit in indefinitely extending this base. Rather, we should search for stabilization and maturity (and also tooling around a solid base).

If you think there should be a nice lua library to handle IRC and what-not, by all means do write it, but write it as a separate lua library (or native plug-in), and compete against the free market of libraries. Do not submit a proposal to integrate it in the core. There are no batteries included. And there shall be no committee-driven development.

Likewise, we should stick with the current lua syntax (5.1 plus some extensions found in the beta branch of luaJIT 2.1[3]) forever. If you want more syntax, use a transpiler.


2. Do notice that contrary to the feeling nourished in the mailing list thread, panic functions also would work in our case. I’ve tested/verified and I also followed the relevant source code for multiple LuaJIT versions. Really, it’s okay.
3. http://luajit.org/extensions.html#lua52 (-DLUAJIT_ENABLE_LUA52COMPAT).