This is a design document for Temper, a programming language for high-assurance libraries that run inside anything.

How much dynamism is too much?

How can we get the most benefits of dynamism while suffering the fewest downsides?

    Abstract

    Highly static languages have some benefits and some downsides, and highly dynamic languages have different ones. This document examines what dynamism buys its users and what it costs, and sketches how staged compilation might capture most of the benefits while still producing analyzable, static systems.

    What isn't “dynamism”?

    Debates about the merits of dynamic vs. static languages on programmer forums often bog down in discussions of unrelated concerns. For the purposes of this document, whether something is dynamic is not directly related to whether a language's type system is static or dynamic.

    What is “dynamism”?

    A language is dynamic to the degree that the result of an expression can substitute for code that its author might write.

    First, let's consider some examples.

    Substituting Output

    # In Bash
    export VAR="$(cat x)"
    # when x contains 'Hello, World' is equivalent to
    export VAR='Hello, World'
    

    since $(…) spawns a subshell, captures its stdout, and makes that stdout available as terms to the Bash parser.

    // In JavaScript
    eval(x);
    // when x === 'f()' is equivalent to
    f();
    

    since eval passes its input to the JavaScript parser and executes statements within the containing lexical scope.

    Field Indirection

    // In JavaScript
    o[x]();
    // when x === 'foo' is equivalent to
    o.foo();
    

    since the square bracket operator interprets its argument as a property key.

    Dynamic Declaration

    // In JavaScript
    class C {
      [x]() { return 42; }
    }
    // when x === 'foo' is equivalent to
    class C {
      foo() { return 42; }
    }
    

    since […] where a property key is expected interprets the contained expression's result as a name to declare.

    Dynamic Loading

    // In JavaScript
    const y = await import(x);   // (assuming top-level await)
    // when x === 'foo.js' is equivalent to
    import * as y from 'foo.js';
    

    since import(…) treats its argument as a “module reference”, effectively a URL.

    // In Java
    Class<?> c = Class.forName(x);
    // when x.equals("foo.Bar") is equivalent to
    Class<?> c = foo.Bar.class;
    

    since Class.forName, given a type name, loads that type which need not be directly referenced by the program.

    Runtime Type Checks

    // In TypeScript
    function f(o, x) {
      if (!(o instanceof x)) {
        throw new TypeError();
      }
      …
    }
    // where x is T
    function f<T>(o: T) {
      …
    }
    // modulo limitations of erasure, instanceof, cross-realm code, and
    // null-prototype Objects.
    

    since instanceof checks whether a value is of a particular type by looking at its prototype chain.

    Use Cases and Stakeholders

    Code that adapts

    System architects, system integrators, and tool authors all have an interest in code that adapts to its environment.

    For example, consider an object-relational mapper (ORM) that queries a database for its table structure and generates a class for each table in the database.

    The downside of adaptability is that it's harder to predict the behavior of a system in production, for example if the system adapts to the presence of a test harness, or if the production database differs from the test database. How valuable is marginal testing effort if what you run isn't the same as what you tested?

    Auto-wiring code

    Auto-wiring is the plugging together of software components based on conventions. These conventions can be based on details that are apparent to programmers but which are typically considered semantically insignificant by static programming languages like:

    Interested parties include:

    The downside of auto-wiring is that code reviewers and static analyzers have to understand the conventions to draw reliable conclusions based on arguments of the form “x only talks to …”

    Configuration

    Static systems also depend heavily on configuration files. What makes configuration dynamic is the presence, in the configuration file, of module names, source file paths, language identifiers, even small snippets of code. For example, if a configuration file uses JSON we might see entries like:
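
    For instance, a hypothetical file (the keys and values here are invented for illustration, not taken from any particular tool) mixing a source file path, a language identifier, module names, and a snippet of code:

    {
      "parser": "./tools/custom-parser.js",
      "language": "es2022",
      "plugins": ["plugin-logging", "plugin-metrics"],
      "onStart": "services.init({ retries: 3 })"
    }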

    Interested parties include:

    Remote Procedure Calling

    Remote procedure calls depend on a relationship between bits in a network message and names of functions defined in an API. For example, a GET request with path /foo might invoke a method like FooServiceHandler.getFoo.
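
    A sketch of that correspondence in JavaScript (the handler class and request shape here are hypothetical):

    // A service with one endpoint.
    class FooServiceHandler {
      getFoo() { return { foo: 42 }; }
    }
    const handler = new FooServiceHandler();

    // Reflective dispatch: the network message picks the method name.
    function dispatch(request) {
      // "GET" + "/foo" -> "getFoo"
      const name = request.method.toLowerCase()
          + request.path[1].toUpperCase() + request.path.slice(2);
      if (typeof handler[name] !== 'function') {
        throw new Error('no such endpoint');
      }
      return handler[name]();
    }

    dispatch({ method: 'GET', path: '/foo' });  // invokes FooServiceHandler.getFoo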

    Interested parties include web service providers and cloud providers.

    Web service providers can focus on implementing functions and methods and leave the details of how to unpack network inputs and pack network outputs to common infrastructure that can handle multiple formats: JSON, XML, protocol buffers.

    Cloud providers can supply that common infrastructure, which also creates some lock-in that keeps their clients from trivially moving to other cloud providers. They can also provide value-adds like authz and automatically generated documentation on how to connect to web APIs using a variety of formats.

    Serialization and deserialization

    Serialization is the act of converting a value in a programming language into bytes or characters that can be sent outside the process, whether to the file-system, another process, a database, or another machine. Deserialization is the reverse: constructing a value from bytes or characters that might have originated outside the process.

    Both assume a relationship between parts of a string and parts of a type. The relationship may be obvious to humans, for example when marshalling between a type like struct Point { int x, y; } and JSON like {"type":"Point","x":1,"y":2}.
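
    When the relationship is that obvious, a hand-rolled pair of functions suffices; a minimal sketch matching the JSON shape above:

    // In JavaScript
    class Point {
      constructor(x, y) { this.x = x; this.y = y; }
    }

    function serializePoint(p) {
      return JSON.stringify({ type: 'Point', x: p.x, y: p.y });
    }

    function deserializePoint(s) {
      const record = JSON.parse(s);
      if (record.type !== 'Point') { throw new TypeError('expected a Point'); }
      return new Point(record.x, record.y);
    }

    deserializePoint(serializePoint(new Point(1, 2)));  // Point { x: 1, y: 2 }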

    Interested parties include:

    The downsides are various:

    Most of the security and analyzability problems with deserialization come from these properties:

    “Safe Serialization Under Mutual Suspicion” introduces a scheme that, for a highly dynamic language, avoids many of these problems. It contrasts two approaches to specifying y=deserialize(serialize(x)):

    Literal Realism

    Let x be the original data structure and y be a clone of x. Then the graphs reachable from x and y must be graph isomorphic with respect to their roots x and y.

    Society of Impressionists

    In the object paradigm, objects have no direct access to one another's state. …

    To build a serialization system for unprivileged use, it is best to build the serialization system without use of special privileges, thereby forcing it to play by the same rules imposed on all other objects. Within these constraints Literal Realism isn't possible.

    “Introducing Data-E” explains

    In our approach, the serializer must obtain for each object a portrayal — a representation of that object in terms of live references to other objects. The serializer's traversal proceeds as it obtains a portrayal for each of these “components”, etc. A serializer can simply ask an object to provide a portrayal of itself — the object's self portrait — or, as we will see, it can derive or substitute its own portrayal of the object as an expression of some other policy objective. However it obtains these portrayals, the resulting depiction is a literal picture only of the graph of these portrayals, rather than the actual graph of objects being serialized.

    The Data-E paper makes no claims about how to efficiently compute or process these portrayals.

    Diagnostics

    Related to serialization, parts of a software toolchain need to display runtime values to the user to aid debugging and auditing. For example, Node.js's util.inspect(object, [showHidden, …]) “returns a string representation of object that is intended for debugging”. Interested parties include:

    The util.inspect docs say “[its] output may change at any time and should not be depended upon programmatically” but per Hyrum's law, “… all observable behaviors of your system will be depended on by somebody.” Unfortunately, this means that inspect allows breaking information hiding schemes, which makes it hard to bound the effect of malicious and poorly thought out third-party code.
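
    For example, a small sketch (exact output varies by Node.js version) of how showHidden surfaces a property its author tried to hide via non-enumerability:

    const util = require('util');

    const account = { balance: 100 };
    Object.defineProperty(account, 'secret', {
      value: 'hunter2',
      enumerable: false,  // "hidden" from ordinary enumeration
    });

    util.inspect(account);                        // '{ balance: 100 }'
    util.inspect(account, { showHidden: true });  // also shows [secret]: 'hunter2'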

    It's worth noting that all of these legitimate cases relate to people or components who have privileged access to a system: those who could attach a debugger. Many familiar with Java's approach to testing argue that test code is just code that happens to include tests. This is not true. Test code does things, and needs to do things, that sane production code never would. Test code should be able to break abstractions to better test resilience in the face of compromises. Treating test code as just regular code leads to weakening abstractions, like information hiding, relied upon by production code.

    Domain-specific Languages

    Dynamic code loaders provide an easy way to take a program in a domain-specific language (DSL) and run it inside a program written in a dynamic, general-purpose language (GPL): just write a function in the GPL that translates a string in the DSL to a string in the GPL and then eval that string. The benefit is that there's no need to break out into a build system to run a translator.
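
    A toy sketch of the pattern (the one-operator DSL here is invented for illustration):

    // Translate a DSL string to a JavaScript string…
    function dslToJs(src) {
      const [op, arg] = src.split(' ');
      if (op !== 'double') { throw new SyntaxError('unknown DSL operator'); }
      return `(${Number(arg)}) * 2`;
    }

    // …then eval it: no build step required.
    eval(dslToJs('double 21'));  // 42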

    Examples include:

    Interested parties include:

    Ad-hoc Queries

    Many large web-facing systems allow ad-hoc queries: some, often privileged, users can enter equations into a web form that are evaluated for each row in a table of data. For example:

    Interested parties include:

    Ad-hoc queries are fundamentally different from the previously discussed use cases. They operate on inputs that are only available at runtime, after the system has started accepting connections from untrusted clients, and are based on expressions from those clients.

    Evaluating expressions can also be done in library code; “expression” can mean an arithmetic expression, not necessarily an expression in a general-purpose programming language.

    This document does not attempt to address ad-hoc queries as a source of dynamism, and instead suggests they be handled by special-purpose libraries, like one for creating spreadsheet-like abstractions, connecting them to data sources, and evaluating contained expressions.
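
    To illustrate the library route, a minimal sketch (the formula grammar and names here are hypothetical) that evaluates client-supplied formulas against a row without ever reaching the general-purpose language:

    // Supports only numbers, column names, +, and * (with the usual precedence).
    function evalFormula(formula, row) {
      const term = (t) => t.split('*')
          .map((factor) => {
            const name = factor.trim();
            const v = /^[0-9.]+$/.test(name) ? Number(name) : row[name];
            if (typeof v !== 'number') { throw new Error('unknown column: ' + name); }
            return v;
          })
          .reduce((a, b) => a * b, 1);
      return formula.split('+').map(term).reduce((a, b) => a + b, 0);
    }

    evalFormula('price * qty + tax', { price: 9, qty: 3, tax: 2 });  // 29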

    Semantic stability

    “A very simple piece of code” introduces a simple program in two languages: Go (a fairly static language) and JavaScript (a fairly dynamic language). A particular set of inputs causes the dynamic language version to change behavior markedly, with serious security consequences.

    When common operators allow runtime values to effectively substitute for code in the language, it's hard to reason about what the program never does.

    Various groups have an interest in being able to reason about what a program never does:

    Pros & Cons

    Pro: Dynamic programs can adapt to their environment.

    Con: Predicting what a dynamic program will do is hard without detailed, hard-to-formalize knowledge about the environment in which it will run.

    Pro: Small amounts of dynamic code can do a lot.

    Con: It's hard to reason about what dynamic code will never do. Whether JavaScript ({})[x][y](z) lets an attacker run arbitrary code depends on where x, y, z come from.
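
    For instance:

    // If an attacker controls x, y, and z:
    const x = 'constructor';  // ({})[x] is Object
    const y = 'constructor';  // Object[y] is Function
    const z = 'console.log("attacker code")';
    const f = ({})[x][y](z);  // compiles z into a function…
    f();                      // …and one more call runs it.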

    Pro: Reflection, strings substituting for program element names, makes it easy for programs to take advantage of conventions: correspondences between parts of a name in distant program elements.

    Con: Reflection makes it hard to optimize programs.

    function a() { /* long function body */ }
    function b() { /* long function body */ }
    export const c = [a(), b()];
    
    export function d(x) {  // (named so the export parses)
      return eval(x);
      // Unless tools can prove that x is never "a" or "b",
      // they can't safely inline & eliminate, α-rename, or
      // do many other optimizations on functions a or b.
    }
    // Note: JS engines do manage to release a and b for collection
    // by fudging what functions that might call eval close over,
    // which complicates the definition of eval.
    

    Pro: Dynamic languages can accept inputs from many sources based on informal conventions like duck-typing.

    Con: It's very hard to identify the perimeter of a dynamic system, and to check that checks for properties like well-formedness and authz happen consistently before values pass from functions that expect to deal with noisy or untrusted inputs to layers of the system that don't.

    The pros and cons above conflate dynamic languages with dynamic systems. Hopefully there is a useful space in between: static systems built using dynamic languages, which might let us get more pros with fewer cons. We proceed on the assumption that some of the dichotomies below are false ones:

    Dynamism is not necessarily at odds with analyzability / reviewability / predictability. Analysis need not be based on source code. Automated analysis can include compiler outputs. Code review can be augmented with compiler outputs.

    Static systems can arise from small amounts of code doing a lot. Boilerplate code can be addressed by macros. Much of Java's type boilerplate would go away if it had type aliasing or type-producing functions. Autowiring and DSLs can also be done via meta-programming.

    The ability for developers to iterate on many designs does not require them to produce dynamic systems. Fast, implicit, incremental recompilation addresses this and has been proven in modern IDEs. Compilation that allows running parts of a program even when other parts do not compile has shipped in modern Java IDEs.

    The Effect of Reflection on Analyzability

    “Protecting Software through Obfuscation” by Schrittwieser et al. defines

    Function recombination

    Another transformation type merges the bodies of two or more (similar) functions. The new method has a mixed parameter list of the merged functions and an extra parameter that selects the function body to be executed.

    TODO: Nagra & Collberg probably has a better quote.

    The idea is that you can make code harder to analyze by combining many functions into one.

    function sum(x, y)          { return x + y; }
    function prodMinus(x, y, z) { return x * y - z; }
    
    // can be recombined to
    
    function obscured(k, x, y, z) {
      switch (k) {
        case 0:                   return x + y;
        case 1:                   return x * y - z;
      }
    }
    
    // then it's less clear which functions call which.
    // The call-graph becomes flatter & denser.
    

    The way reflection works is similar. There's something that allows looking up a function body given a key; in the case of reflection, the key incorporates the name of the function. “Collapsing Towers of Interpreters” by Amin & Rompf POPL 2018 ¶ 9.1 shows one way a meta-layer relates names and reflected operators.

    def init_mframe(m: => MEnv) = list_to_value(List(
        binding("eval-begin", evalfun(eval_begin_fun(m))), binding("eval-EM", evalfun(eval_EM_fun(m))),
        binding("eval-clambda", evalfun(eval_clambda_fun(m))), binding("eval-lambda", evalfun(eval_lambda_fun(m))),
        …))
      

    Imagine generating code to allow reflectively calling a method of a class given an instance of the class, a method name, and an array of arguments.

    // A JavaScript class.
    class C {
      x;
      sum(y)          {   return this.x + y; }
      prodMinus(y, z) {   return this.x * y - z; }
    }
    
    // For class C, we generate a function that applies its
    // methods to a bundle of arguments.
    
    C.prototype.reflectivelyApply = function (methodName, [y, z]) {
      switch (methodName) {
        // After inlining.
        case 'sum':       return this.x + y;
        case 'prodMinus': return this.x * y - z;
      }
    };
    

    The structure of this is unintentionally similar to that of function-recombined code.

    There's a lot more to obfuscation than just merging call graph nodes, but both function recombination and runtime reflection complicate analyses that work well when “who might call f” and “f might be called by whom?” yield small sets.

    Judicious use of runtime reflection doesn't make code unanalyzable. For example, when reflection enables remote procedure calling (RPC), this is exactly what we expect. By design, external inputs can cause invocation of any API we expose to the outside world, and analyses should reflect this.

    But some patterns make it hard to bound the effects of reflection to, for example, APIs intentionally exposed via RPC:

    Staging (Building static systems dynamically)

    Many of the problems with dynamism seem to arise because it's not obvious when one part of a system, or one interested party, is done with something required by another. Note the before/after relationships in the below.

    Multi-stage programming (MSP) is a variety of metaprogramming in which compilation is divided into a series of intermediate phases, allowing typesafe run-time code generation.

    Multi-stage programming may provide a way to disentangle these knotty ordering relationships by allowing more dynamism to happen before the environment becomes complex and allowing more analysis to happen after parts of the system have settled down.
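
    A loose illustration in JavaScript (not true typed MSP, which languages like MetaOCaml provide): stage one generates code while the environment is still simple; stage two compiles it once, after which the result is ordinary static code.

    // Stage one: generate specialized source for x^n with n fixed.
    function powerSource(n) {
      let body = '1';
      for (let i = 0; i < n; i++) { body += ' * x'; }
      return body;  // e.g. "1 * x * x * x" for n = 3
    }

    // Stage two: compile the generated code once.
    const cube = new Function('x', 'return ' + powerSource(3) + ';');

    cube(2);  // 8; from here on, cube is ordinary, analyzable code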

    The main ideas are:

    Macros are a fine way to do meta-programming, but having a single stage at which macros run is limiting. Some meta-programming needs to happen early, to set up the scaffolding that supports a module's development; some benefits from running later, after passes like type inference make more information available, and should not be able to do things that violate assumptions made by those passes.

    Ordering Chaos

    It'd be nice if compiler outputs, with the exception of diagnostic logs, did not depend on the order in which the compiler scheduled modules for stage advancement, especially if, while waiting for network/file-system fetches to return a module's source code, the compiler tries to make progress on other modules whose source has been received.

    In general, more powers are available to early stages than to later stages, but when it comes to sources of nondeterminism, it's best to reserve power to the post-compile stage.

    Sources of nondeterminism include:

    Debuggability

    Running user code during compilation means that developers will run into compilation problems like ∞-loops, integer division by zero, etc.

    It'd be nice if it were at least as easy to debug compilation-time bugs as it is to debug runtime bugs.

    Our debuggability strategy will have several forks:

    Stages

    Processing proceeds through the stages defined below, in order. At each stage, user code can run based on expression satisfaction, discussed after the stages.

    For each stage, we list the following:

    Form

    What is the shape of the module? In almost all cases, the module is a tree of some form.

    Arrives

    Staging doesn't add or remove declarations to avoid masking or unmasking variables. Instead, declarations arrive and may later evaporate. Before a declaration arrives and after it evaporates, any references to it will be unsatisfiable.

    Imagine that there is a builtin eval function. Before it arrives, there is an implicit module level declaration:

    let eval : code → unknown;

    When it arrives, it acquires a value:

    let eval : code → unknown = …;

    Later, it evaporates, meaning its value becomes unavailable, and its type becomes one that poisons any type checking of expressions that refer to it.

    let eval : ☠️;

    Evaporates

    What declarations evaporate on stage exit?

    Thaws

    What becomes mutable on stage entry?

    Freezes

    What becomes immutable on stage exit?

    Processing

    What special processing does the compiler do?

    Properties

    What properties are true if compilation got this far without fatal errors?

    Prologue processing

    ‘Parameterized Modules’ defines a prologue mechanism to allow the module linker to decide whether a new module is needed based on parameter sets.

    The prologue does not go through the same stages as the module body, because references to stages in it (code annotations that further restrict when user declarations arrive or evaporate, and when references are satisfiable) should refer to stages as apparent to the module body.

    Prologue processing is abbreviated:

    1. The module loader finds the end of the prologue by running the Lex stage processor but stopping after the first “;;;” token it encounters (see the sketch after this list).
    2. The module loader then parses and lifts the prologue to an AST without allowing any prologue code to affect these steps.
    3. Then the module loader interprets the AST as if it went directly to the runtime phase without any type-checking or verification phases, but without evaporating anything.
    4. Finally, the module introspects over the module parameters to determine whether a new module instance is needed. If it is, processing of the module body proceeds as below.
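
    A sketch of step 1's scan (splitPrologue is a hypothetical helper; tokens are shaped as in the Lex stage below):

    // Everything before the first ";;;" token is the prologue;
    // everything after it is the module body.
    function splitPrologue(tokens) {
      for (let i = 0; i < tokens.length; i++) {
        if (tokens[i].text === ';;;') {
          return { prologue: tokens.slice(0, i), body: tokens.slice(i + 1) };
        }
      }
      return { prologue: [], body: tokens };  // no ";;;" means no prologue
    }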

    A module body scope includes all declarations defined at the top-level of its body and inherits unmasked declarations from its prologue. This means that the lexical scoping structure is effectively:

    {
      // The implicit environment module parameter is brought into
      // scope.  This means that the loader of the module defines
      // both the prologue and the module's “globals.”
      let {...} = environment;
      let module = {
        /* create unique identity for module instance */;
        publicIdentity: …, privateIdentity: …,
        /* set up some metadata */
        metadata: …,
        /* a tree that represents the module source code in various forms */
        tree: …
      };
      // Implicit prologue level declarations happen here.
      let import =
          /* new import fn that closes over the module's identity */;
    
      // START PROLOGUE
      {
        // Prologue declarations appear here including implicit
        // module parameters, and may mask entries in environment.
    
        /* Prologue code goes here */
    
        // START MODULE BODY
        {
          // Implicit module level declarations happen here.  E.g.
          let export =
              /* new export fn that closes over the module's identity */;
          // If the module body is needed, it gets a scope which
          // is strictly within that of its prologue.
          // Module body declarations see and may mask prologue
          // and environment declarations.
    
          /* Module body code goes here */
        }
        // END MODULE BODY
      }
      // END PROLOGUE
    }
    

    Lex stage

    The goal of the lex stage is to split source code into tokens.

    Form

    The tree forms all allow a leaf node to be a token that contains { text, sourcePosition, type: "token", tokenType }.

    Processing starts with a tree that has a single node, a token whose tokenType is undifferentiatedSourceCode.

    Processing typically ends as a root with type tokens and whose children are all tokens.

    For example, if it started as

    {
      text: "/*Hello*/ World!",
      type: "token",
      sourcePosition: { left: 0, right: 16, … },
      tokenType: "undifferentiatedSourceCode"
    }

    it might end as

    {
      type: "tokens",
      sourcePosition: { left: 0, right: 16, … },
      children: [
        { type: "token", tokenType: "comment", text: "/*Hello*/", … },
        { type: "token", tokenType: "whitespace", text: " ", … },
        { type: "token", tokenType: "word", text: "World", … },
        { type: "token", tokenType: "punctuation", text: "!", … },
      ]
    }

    Arrives

    The lexer actually arrives at prologue start, so that the prologue can import modules that tweak the lexer by adding new token definitions, or substituting a different lexer.

    Evaporates

    lexer

    Thaws

    module.tree

    Processing

    The lex stage processor walks module.tree and uses the lexer, as augmented by prologue execution, to split any undifferentiatedSourceCode tokens into other tokens and replaces them with a tokens inner node. Lexer application may involve calls out to imported user code or to functions defined in the prologue.
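
    A sketch of that walk (lexer.split is a hypothetical stand-in for the augmented lexer):

    // Recursively replace undifferentiatedSourceCode tokens with a
    // "tokens" inner node whose children come from the lexer.
    function lexWalk(node, lexer) {
      if (node.type === 'token'
          && node.tokenType === 'undifferentiatedSourceCode') {
        return {
          type: 'tokens',
          sourcePosition: node.sourcePosition,
          children: lexer.split(node),  // may call out to user code
        };
      }
      if (node.children) {
        node.children = node.children.map((child) => lexWalk(child, lexer));
      }
      return node;
    }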

    Parse stage

    Form

    Processing typically starts with a flat tree that contains a list of tokens.

    Processing typically ends with these tokens lifted into a parse tree.

    A parse tree consists of inner nodes that have an operator property and leaf nodes that are tokens per ‘Parsing Program Structure’.

    If we start with tokens:

    {
      type: "tokens",
      sourcePosition: { left: 0, right: 16, … },
      children: [
        { type: "token", tokenType: "comment", text: "/*Hello*/", … },
        { type: "token", tokenType: "whitespace", text: " ", … },
        { type: "token", tokenType: "word", text: "World", … },
        { type: "token", tokenType: "punctuation", text: "!", … },
      ]
    }

    we might end up with a parse tree containing the same tokens in infix order:

    {
      type: "syntax",
      sourcePosition: { left: 0, right: 16, … },
      operator: rootPseudoOperator,
      children: [
        {
          type: "syntax",
          ignoredBefore: [
            { type: "token", tokenType: "comment", text: "/*Hello*/", … },
            { type: "token", tokenType: "whitespace", text: " ", … },
          ],
          sourcePosition: { left: 10, right: 16 },
          operator: { type: POSTFIX, text: "!", … },
          children: [
            {
              type: "syntax",
              sourcePosition: { left: 10, right: 15 },
              operator: notAnOperator,
              children: [
                { type: "token", tokenType: "word", text: "World", … },
              ],
            },
            { type: "token", tokenType: "punctuation", text: "!", … },
          ],
        },
      ],
    }

    Arrives

    The parser actually arrives at prologue start, so that the prologue can import modules that tweak the parser by adding new operator definitions, or substituting a different parser.

    Evaporates

    parser

    Processing

    Processing involves a walk over module.tree. The parser is applied to all tokens nodes with type:"tokens" to lift them into parse tree nodes. Parser application may involve calls out to user code imported by or defined in the prologue if the parser has a custom canNest predicate.

    Grammar stage

    Form

    Processing typically starts as a parse tree and typically ends with that parse tree converted into an untyped AST. An AST has inner nodes that have a rule property whose value is a grammar rule, and leaf nodes that are tokens.

    Arrives

    The grammar actually arrives at prologue start, so that the prologue can import modules that tweak the grammar by adding new rules, or substituting a different grammar.

    Evaporates

    grammar

    Processing

    Processing proceeds by walking module.tree. Grammar application proceeds thus:

    1. identify maximal sub-trees whose roots are parse tree nodes
    2. flatten each sub-tree into a series of pseudo-tokens. Flattening leaves any non-parse-tree nodes in place.
    3. apply grammar rules to each pseudo-token list starting with the ModuleBody rule.
    4. lift the result of grammar application into an AST.
    5. substitute the lifted AST for the sub-tree.

    The Reference combinator for a rule r will pass and consume any element of the form { type: "ast", rule: r, … }, allowing graceful processing of trees that contain mixed parsed content and programmatically generated content.

    Import stage

    Form

    module.tree is typically a partially-interpreted AST by this stage, so it can be interpreted.

    A partially-interpreted AST is an AST, but instead of leaves being tokens, leaves may instead be value nodes. Values result from evaluating satisfiable expressions. For example, a subtree parsed from “ 1 + 1 ”:

    {
      rule: infixOperation, type: "ast",
      children: [
        { rule: literal, type: "ast", children: [{ type: "token", text: "1" }] },
        { type: "token", text: "+" },
        { rule: literal, type: "ast", children: [{ type: "token", text: "1" }] },
      ],
    }

    would be satisfiable (loosely, because it depends on nothing mutable, nothing that has not arrived, nor anything that has evaporated) and might be replaced in the larger module.tree with:

    {
      type: "value",
      resultType: i32,
      value: 2,
      …
    }

    Arrives

    By default, calls to import run at this stage, but import is available at other stages.

    Processing

    Processing proceeds by applying the stage simplifier to module.tree to identify each satisfiable subtree and replace it with the result of evaluating it.
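
    A sketch of one simplifier pass (isSatisfiable, evaluate, and typeOf are hypothetical helpers with the semantics described above):

    // Bottom-up: replace each satisfiable subtree with a value node
    // holding the result of evaluating it.
    function simplify(node) {
      if (node.children) {
        node.children = node.children.map(simplify);
      }
      if (node.type === 'ast' && isSatisfiable(node)) {
        const value = evaluate(node);
        return { type: 'value', resultType: typeOf(value), value };
      }
      return node;
    }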

    Untyped macro stage

    Macros that don't need inferred type information can run at this stage.

    Form

    module.tree is a partially-interpreted AST.

    Processing

    As for the import stage, processing proceeds by finding satisfiable subtrees and replacing them with their result.

    Type stage

    This is a good stage to run custom type inference code.

    Form

    module.tree is a partially-interpreted AST.

    Freezes

    Types declared within this module become immutable at the end of this stage, so that type inferences based on them can be reliable. Prior to the end of this stage, type expressions in module.tree are just expressions. Afterwards, any application of operators like | and & to type values is illegal.

    Type definitions within this module become immutable at the end. The members defined in trait and struct declarations in module.tree cannot change after this stage.

    Function signatures within this module become immutable at the end. The arity and types of formal parameters cannot change after this stage. Type bounds for formal type parameters cannot change after this stage.

    Types on variable declarations cannot change after this stage (modulo evaporation).

    After this stage ends, a switch is flipped on module.tree which prevents tree mutations that do not pass type checking. Broadly, type metadata cannot change, and substitutions are only allowed where the substituted tree is type-compatible with the original.

    Processing

    As for the import stage, processing proceeds by finding satisfiable subtrees and replacing them with their result.

    After processing ends, a type checker pass runs. For each node in module.tree, pre-order:

    1. Let type be the error type.
    2. If node.type is "value"
      1. Set type to node.resultType.
      2. Assert that node.value is compatible with type.
      3. If node.value is a function value
        1. Let origin be node's originating module.
        2. If origin.stage < type stage
          1. Let signatures be node.value.signatures.
          2. Add (signatures, immutableSnapshot(signatures)) to origin.typeCommitments.
        Note: Function values that originate in modules that have not yet been typechecked may have their signatures changed, since the signatures are specified via still-mutable AST nodes. Make sure that any assumptions made here are checked when that finally settles down.
        We could try to require that dependencies reach the end of the user-code portion of the type-checking phase together, but storing commitments and checking them eventually gets the reliability we need, as long as all modules advance to the runtime stage before the compiler produces a production bundle.
    3. Else if node.type is "ast"
      1. Let operandTypes be a new, empty list.
      2. For each child in node.children
        1. Add child.resultType to operandTypes.
      3. If operandTypes contains the error type
        1. If node.operator is StatementBlock
          1. Assert node.children is not empty.
          2. Set type to the last element of operandTypes.
            Note: this means that error types propagate root-ward but do not escape statement blocks, allowing some measure of error containment, and preventing escape from function bodies.
        2. Else set type to the error type.
      4. Else set type to node.operator.computeType(operandTypes)
        Note: the result of explicit type parameter bindings for function application is a type binding set that the implementation of the function application operator can recognize so this should be sufficient for leaf-down type resolution.
    4. Set node.resultType to type.
    5. For each trait or struct type, referencedSubtype, referenced by type
      1. Let origin be referencedSubtype's originating module.
      2. If origin.stage < type stage
        1. Add (referencedSubtype, immutableSnapshot(referencedSubtype)) to origin.typeCommitments.
      Note: See the comment about function value commitments above.
    6. Freeze node.resultType.

    For each function value that originated in this module and that escaped via export, apply the algorithm above to its signatures and bodies. Note: We may need to keep track of a modules-that-values-reached set until the end of type checking, or try something approximate via key-weak sets of function values.

    Finally, check each commitment in the current module's commitments, registering a fatal error if there might be (hand-wavy) violated assumptions.

    Properties

    AST nodes and values that originated in this module are now well-typed (or have the error type) and will remain so from here on out. Any module that has an AST node or value that received the error type will have its production-fatal error flag set.

    Typed macro stage

    A good time to run macros that need access to type information and can preserve type safety.

    Form

    module.tree is a partially-interpreted AST.

    Processing

    As for the import stage, processing proceeds by finding satisfiable subtrees and replacing them with their result.

    Export close stage

    By default, calls to export run during this stage.

    Form

    module.tree is a partially-interpreted AST.

    Evaporates

    The module's export function evaporates at the end of this stage.

    Freezes

    The record of exported values, their external names, and metadata becomes immutable at the end of this stage.

    Processing

    As for the import stage, processing proceeds by finding satisfiable subtrees and replacing them with their result.

    Properties

    After this stage, a module has exported everything it will ever export.

    Inspectable stage

    A module may inspect itself to build out metadata tables now.

    Form

    module.tree is a partially-interpreted AST.

    Thaws

    module.sideTables becomes mutable allowing building of side tables by extracting data from module.tree.

    Freezes

    At the start of this stage, module.tree becomes immutable except via replacement of a satisfied call with its inlined output.

    At the end of this stage, module becomes deeply immutable to user code. Privileged compiler code can still perform two mutations:

    Verification stage

    This is a good stage to run local code-quality analyses and unit tests.

    By default, test code co-located with the module runs at this stage.

    Form

    module.tree is a frozen AST.

    Evaporates

    APIs that allow parsing/lexing strings to tree fragments/tokens.

    Processing

    As for the import stage, processing proceeds by finding satisfiable subtrees, but instead of replacing satisfied nodes with their results, the following happens for each satisfied AST node that resolves to result:

    1. If result is void
      1. Remove the subtree
    2. Else
      1. Log an error and replace the subtree with an error node.

    At the end of this stage, any nodes that would be unsatisfiable at runtime are replaced with error nodes.

    Properties

    The module code is fully specified and fixed at the end of this stage, and consists entirely of AST nodes or error nodes.

    Ready stage

    Form

    The module code is a deeply immutable tree consisting of AST and value nodes ready to be consumed by application-language backends.

    Evaporates

    Compilation log APIs.

    ✨Magic happens✨

    The compiler waits for all modules to advance to the ready stage and then feeds them to the various application language backends to produce libraries in each of the application languages.

    Compiled stage

    Modules that import functions from other modules at the compiled stage get non-tree-interpreted versions, which should be more performant.

    Form

    An implementation detail.

    Arrives

    The compilation log is unavailable to compiled code, but runtime logging functions are, so some tricks allow logging from compiled functions to show up in the compilation log.

    Runtime stage

    Form

    The program is statically generated code or binary instructions in an application language, linked into a larger system.

    If the compiler log was free of fatal errors then the module is considered production ready.

    Arrives

    Sources of nondeterminism.

    Expression Satisfaction

    TODO: reduction semantics include a notion of satisfiability. Reduction only possible when conditions are satisfied. Ties into function application. Main points:

    Compiler outputs

    Each application backend receives a bundle of ready modules' trees and metadata and is responsible for producing a library in the output language.

    TODO: Compiler outputs include:

    TODO: Requirements for backends to enable linking of multiple compiled bundles in the same host application.

    Decisions

    TODO

    TODO: proving that what was developed and tested is what runs in production. Hashability of runtime artifacts modulo a small number of escape hatches for things like hostname. Experiment with substring-erasure before hashing?

    TODO: plan to tune for ability to use fuzzers to test properties. Static output may be more amenable to constraint solving fuzzers. Test this via fault injection.

    Thanks for reading!