4

Snapshotting is like compiling but better

 2 years ago
source link: https://coder-mike.com/blog/2022/06/25/snapshotting-is-like-compiling-but-better/
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

TL;DR: The final output of a traditional compiler like GCC bears a family resemblance to a Microvium snapshot, but the snapshotting paradigm is both easier to use and more powerful because it allows real application code to run at build time and its state to persist until runtime.

What is snapshotting?

My Microvium JavaScript engine is built on the paradigm of creating a VM snapshot as the deployable build artifact rather than creating a traditional compiled binary. As a developer, when you run the Microvium engine on your desktop with a command like microvium main.js it will execute the script until all the top-level code is complete and then output a snapshot file containing the final VM state. The snapshot file can then be “resumed” on an embedded device using Microvium’s embedded C library (for more details, see Getting Started). Although Microvium is designed especially for microcontrollers, the principle of snapshotting goes beyond the embedded space.

Comparing to GCC

For this post, I’ll mostly compare Microvium to GCC:

gcc main.c # Compile a C program with GCC
microvium main.js # "Compile" a Microvium program
gcc main.c         # Compile a C program with GCC
microvium main.js  # "Compile" a Microvium program

These two commands are analagous. Both produce a single file as the result, and this file is what you want to deploy to the target environment. In the case of GCC, the output is of course the executable (e.g. a.out), while in the case of Microvium, it’s the snapshot (main.mvm-bc).

Both of these commands do some kind of compilation as part of the process. GCC translates your function code to machine instructions, while Microvium translates to virtual machine instructions (bytecode instructions).

Constants

Both the GCC output and the Microvium output have a section for constants, including function code. You may be familiar with this as the .text section. Among other things, this contains constant values, such as:

// JavaScript
const x = 42;
// JavaScript
const x = 42;
const int x = 42;
// C
const int x = 42;

… but better

You can do this in Microvium but not in C:

const x = foo();
function foo() {
return 42;
const x = foo();

function foo() {
  return 42;
}

In C, it’s a compile-time error to call runtime functions for the calculation of constants. But in Microvium, this is perfectly legal since there is no distinction between compile-time and runtime — there is only really runtime before the snapshot and runtime after the snapshot. Although informally we may refer to the former as “compile-time” and the latter as “runtime”.

Apart from just being cleaner and easier to use, the Microvium snapshotting paradigm here allows computationally-intensive constants to be calculated at build time, using arbitrary functions and libraries that might also be useful at runtime.

Variable initializers… but better

For non-constant variables, both GCC and Microvium have an output section for the initial1 value of all the variables, which is copied into RAM at runtime. You may know this traditionally as the .data section.

But a major difference between them is that a Microvium snapshot also contains heap data.

// JavaScript
let arr = [1, 2, 3];
int* arr = malloc(3 * sizeof(int)); // ! Can't do this (in top-level code)
// JavaScript
let arr = [1, 2, 3];

// C
int* arr = malloc(3 * sizeof(int));  // ! Can't do this (in top-level code)

The best you could do in C for the above example would be to have an init function that runs early in the program to set up the initial runtime state of the program. Snapshotting is better because this initial structure can be established at build time.

Modules … but better

Microvium and C both have support for structuring your code in multiple files which get bundled into the same output artifact:

import { foo } from './foo.js'
import { foo } from './foo.js'
#include "foo.h"
#include "foo.h"

With the #include here, the dependent module implementation (e.g. foo.c) is not automatically compiled and linked into the program by GCC — you need to separately list foo.c to be compiled by GCC, or orchestrate the dependencies using a makefile.

But in the case of a Microvium import, the import statement itself is executed at build time, performing the module resolution, loading, parsing, and linking at build time, as well as executing the top-level code of the imported module. The top-level code of the imported module may in turn import other modules, transitively importing the whole module graph and executing its top-level initialization code.

Preprocessor… but better

Both C and Microvium support “compile-time” logic:

#if USE_FOO_1
#define foo foo1
#else
#define foo foo2
#endif
// C
#if USE_FOO_1
#define foo foo1
#else
#define foo foo2
#endif
// JavaScript
const foo = USE_FOO_1 ? foo1 : foo2;
// JavaScript
const foo = USE_FOO_1 ? foo1 : foo2;

Of course, you already knew that, because all the examples so far have demonstrated the fact that snapshotting allows you to run JavaScript at compile time. But I want to emphasize some of the key reasons why the snapshotting paradigm is better for this:

  • You don’t need two different programming languages (e.g. the preprocessor language and the C language).
  • Your “compile-time” code has the full power of the main language.
  • Your runtime code carries over the state from your compile-time code.

So in some sense, this unifies the preprocessor language with the main language. This applies similarly to other “compile-time languages”, such as makefiles and linker scripts. Or if you’re coming from the JS world, consider how the snapshotting paradigm obviates the need for a webpack.config (see Snapshotting vs Bundling).

But what about using the preprocessor to conditionally include different runtime logic, such as for different devices? For example, consider the following C code:

int myFunction() {
#if SOME_CONDITION
doX();
#else
doY();
#endif
int myFunction() {
#if SOME_CONDITION
  doX();
#else
  doY();
#endif 
}

Of course, it’s easy to see how this example might translate to JS:

function myFunction(someCondition) {
if (someCondition) {
doX();
} else {
doY();
function myFunction(someCondition) {
  if (someCondition) {
    doX();
  } else {
    doY();
  }
}

Now we have the bonus that our unit tests can inject someCondition to test both cases.

But doesn’t this code mean that now we have both doX and doY branches at runtime, taking up ROM space? (and someCondition)

That’s why I’ve also been developing the experimental Microvium Boost: an optimizer that analyzes a snapshot and removes unused branches of code. For example, if the analysis shows that someCondition is always true in your program, it can remove it as a parameter from myFunction and also remove the call to doY as dead code. This is still experimental but has shown significant success so far.

Host exports… but better

So far we’ve been considering an executable output from GCC, but it would be more accurate to compare a Microvium snapshot with a compiled shared library (e.g. DLL). Like a shared library, Microvium snapshots do not have a single entry point but may contain many exported functions to be resolved at runtime on the device.

A Windows DLL suits the analogy better than a Linux shared library, so in this section the examples will use msbuild rather than GCC.

Both a Microvium snapshot and a DLL binary contain a section for dynamic linking information — a table that associates relevant functions in the DLL with a number2 so that the host program using the DLL/snapshot at runtime can find them.

In the case of a DLL, you can provide the compiler with a DEF file that tells the compiler what to put into the DLL export table. If you wanted to export the functions foo, bar, and baz from the DLL with IDs 1, 2, and 3 respectively, your DEF file might look like this:

LIBRARY MyLibrary
EXPORTS
foo @1
bar @2
baz @3
LIBRARY   MyLibrary
EXPORTS
   foo   @1
   bar   @2
   baz   @3
int foo() { return 42; }
int bar() { return 43; }
int baz() { return 44; }
// C
int foo() { return 42; }
int bar() { return 43; }
int baz() { return 44; }

The equivalent in Microvium would be as follows:

// JavaScript
function foo() { return 42; }
function bar() { return 43; }
function baz() { return 44; }
vmExport(1, foo);
vmExport(2, bar);
vmExport(3, baz);
// JavaScript
function foo() { return 42; }
function bar() { return 43; }
function baz() { return 44; }

vmExport(1, foo);
vmExport(2, bar);
vmExport(3, baz);

You may have noticed a recurring theme in this post: the Microvium snapshotting paradigm doesn’t require a whole new language in order to do different build-time tasks. In this case, Microvium doesn’t require a DEF file (or a special __declspec(dllexport) language extension), since vmExport is just a normal function. This is just simpler and more natural.

Another recurring theme here is that the snapshotting approach is more powerful, allowing you to do things that are impossible or impractical in the traditional paradigm. Take a look at the following example in Microvium:

// JavaScript
for (let i = 1; i <= 3; i++) {
vmExport(i, () => 41 + i);
// JavaScript
for (let i = 1; i <= 3; i++) {
  vmExport(i, () => 41 + i);
}

This has the same overall effect as the previous code, adding 3 functions to the export table of the deployed binary with IDs 1, 2, and 3 and which return 42, 43, and 44 respectively. But having vmExport be a normal function means that now we have the full power of the language for orchestrating these exports, or for writing an abstraction layer over the export system, or outsourcing this logic to a third-party library.

Side note: a more subtle point in this example for advanced readers is its code cohesion. The single line of code mixes both compile-time and runtime code (vmExport(i,...) and ()=> 41 + i respectively), but keeps the related parts of both in close proximity. This is the difference between temporal cohesion (grouping code by when its run) vs functional cohesion (grouping code by what feature it relates to) (see Wikipedia). A common disadvantage of having separate build-time or deploy-time code (e.g. a DEF file, makefile, linker script, webpack.config, dockerfile, terraform file, etc) is that it pushes you into temporal cohesion, which in turn damages modularity and reusability.

Conclusion

The idea of deploying a snapshot rather than a traditional compiled binary opens up a whole new paradigm for software development. The end result is very similar — a binary image with sections for different memory spaces, compiled function code, constants, initial variable values, and export/import tables — but the snapshotting paradigm is both simpler and more powerful.


  1. In the context of Microvium, the word “initial” here refers to the initial state when the snapshot is resumed, not the initial state when the program is started, since the program starts at build time, and variables in JavaScript start with the value undefined. 

  2. DLL exports can be by name or number, but Microvium exports are only by number, for efficiency reasons, so that’s what I’m using in the analogy here. 


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK