Fyi, this is a design document for Temper, a programming language for high-assurance libraries that run inside anything.

Modules

Who cares about modules and how their statics and dynamics balance.

    Abstract

    Mechanisms like #include, import, and require let code express that the containing module needs something from another module.

    We explore tradeoffs in the mechanisms that modules use to find what they need to do their jobs and to provide services to other modules.

    Import mechanisms obviously affect the authors of code that needs functionality provided by other code, but other parties also have interests in how these mechanisms work. We try to identify these interested parties, which is partly an exercise in stating the obvious.

    Then we try to derive a simple import/export mechanism to balance these needs.

    Existing flavours of import

    // C
    #include <foo>
    // <foo>'s prototypes available to code that follows
    
    
    # Python
    from foo import x as y
    # Delays further processing until foo has been loaded.
    # Then the name `y` refers to foo's `x`.
    import bar as baz
    # Now the name `baz` refers to a value with
    # properties reflecting bar's exports
    
    
    // Java
    import com.example.Foo;
    // Hereafter `Foo` refers to com.example.Foo.
    // Additionally queues `com/example/Foo.java` for compilation.
    import static com.example.Bar.*;
    // Brings each accessible static member of Bar into scope.
    // If com.example.Bar.X is accessible, `X` refers to it.
    

    We can see from these that import systems variously:

    In most module systems, each source file is a module. OCaml allows embedding submodules using syntax like module … end, and submodules may be parameterized via functors. (Ruby's “module” blocks are syntactically similar, but Ruby's modules actually define a mixin mechanism and do not relate to code loading.)

    Not all languages reify (represent as values) modules, but the more dynamic ones tend to:

    Languages that reify modules tend to attach some metadata to module values, while more static languages like Java have built module metadata systems alongside the language.

    Stakeholders

    Code Authors

    Authors need to be able to bring into scope everything they need to do their job. If all the tools they need aren't available in a global scope, they need a way to import dependencies.

    Authors need to be able to work around name conflicts and the masking of global symbols.

    Many languages use import … as … to let the importer pick the local name. Others treat importing as an abbreviation mechanism, but allow the user to fall back to the unabbreviated name where the abbreviation is ambiguous.

    import com.example.foo.*;
    import com.example.bar.*;
    // If both com.example.foo and com.example.bar define a type T,
    // then the name `T` can't be used to refer to both.
    import com.example.foo.T;
    // In Java one can explicitly pick one `T` because more specific
    // imports trump less specific.
    
    // And one can fall back to qualified names.
    class X implements T, com.example.bar.T {}
    

    Sometimes a module author knows what they need but is agnostic as to which module provides it.

    let markdownToHtmlConverter;
    // Use the first of these modules that the project
    // authors chose to install.
    for (let moduleId of
         // Names of modules that the author knows are markdown to
         // HTML converters that export a convert(str) function.
         ['md2html-foo', 'md2html-bar', 'md2html-baz']) {
      try {
        markdownToHtmlConverter = require(moduleId);
      } catch (ex) {
        continue;
      }
      break;
    }
    if (!markdownToHtmlConverter) { /* We have a problem */ }
    

    In more static languages, like Java, this is often done via dependency injection.

    Code Reviewers

    When a code reviewer sees foo() and “foo” is not defined locally, they often need to figure out which module it comes from so they can decide whether it's something with simple, widely understood semantics, or whether it comes with caveats that they should double check.

    Tool authors

    Tool authors broadly benefit from being able to do as much work as possible without loading other source files.

    IDEs and text editors benefit from being able to find the definition of a symbol so that, when the user clicks through, they can open the right source file at the right line.

    IDEs and text editors also want to implement type-ahead; when a user types “foo”, figure out all symbols in scope that start with “foo” and present a list for the user to pick from.

    Incremental compilers need to keep track of which modules depend on which so that they can recompile the minimal set of modules when a source file changes.
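
    A minimal TypeScript sketch of that bookkeeping, assuming the compiler already records which modules each module imports. The `DepGraph` shape and `modulesToRecompile` name are hypothetical, not part of Temper. A conservative answer is the changed module plus everything that transitively imports it.

```typescript
// Hypothetical shape: module id -> ids of the modules it imports.
type DepGraph = Map<string, string[]>;

// Returns the changed module plus every module that transitively
// imports it: a conservative set of modules to recompile.
function modulesToRecompile(deps: DepGraph, changed: string): Set<string> {
  // Invert the graph so we can walk from a module to its importers.
  const importers = new Map<string, string[]>();
  for (const [mod, imports] of deps) {
    for (const dep of imports) {
      const list = importers.get(dep) ?? [];
      list.push(mod);
      importers.set(dep, list);
    }
  }
  // Breadth-first walk over importers, starting at the changed module.
  const dirty = new Set<string>([changed]);
  const queue = [changed];
  while (queue.length > 0) {
    const current = queue.shift()!;
    for (const importer of importers.get(current) ?? []) {
      if (!dirty.has(importer)) {
        dirty.add(importer);
        queue.push(importer);
      }
    }
  }
  return dirty;
}

const graph: DepGraph = new Map([
  ["main", ["util", "net"]],
  ["net", ["util"]],
  ["util", []],
  ["docs", []],
]);
console.log([...modulesToRecompile(graph, "net")].sort()); // [ 'main', 'net' ]
```

    A compiler that also tracked whether a module's exported interface changed could stop the walk early; this sketch does not.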

    Debuggers benefit from access to the source file behind compiled output and to symbol metadata, including for modules synthesized from content that was never checked into revision control.

    Linters benefit from knowing which warnings the code's authors and maintainers judged not useful. Linters work best on code that grew up alongside them, so when new lint rules are added, it'd be nice to be able to run a program that adds metadata opting old files out of the new warnings, while a file started on a blank page benefits from all the lessons that a strict set of lint rules encodes. Perhaps each file could define, in a visually unobtrusive place (e.g. at the end), metadata that includes the set of lint checks it is exempt from and the last time that list was auto-updated. An auto-updater could then be run over a repository to opt legacy files out of new rules, so the project would not have to exempt new source files, and the team would be free to recognize which new rules have value and decide when it's convenient to bring old files into conformance.
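
    One way the proposed auto-updater could work, sketched in TypeScript under stated assumptions: the metadata is modeled as a hypothetical `LintMetadata` record, each lint rule records when it was introduced, and a new file starts with `lastUpdated` set to its creation time. A file is then opted out of exactly the rules added since its list was last updated.

```typescript
// Hypothetical per-file metadata: the lint checks a file is exempt from
// and when that list was last auto-updated (e.g. epoch milliseconds).
interface LintMetadata {
  exemptions: Set<string>;
  lastUpdated: number;
}

// Hypothetical lint rule record: its name and when it was introduced.
interface LintRule {
  name: string;
  introduced: number;
}

// Opts a file out of every rule introduced since its list was last
// updated, then records the update time. Rules the file already had to
// satisfy are left alone, and running it again adds nothing.
function autoUpdate(meta: LintMetadata, rules: LintRule[], now: number): LintMetadata {
  const exemptions = new Set(meta.exemptions);
  for (const rule of rules) {
    if (rule.introduced > meta.lastUpdated) exemptions.add(rule.name);
  }
  return { exemptions, lastUpdated: now };
}

const lintRules: LintRule[] = [
  { name: "no-shadowing", introduced: 100 },
  { name: "no-unused-imports", introduced: 300 },
];
// A legacy file last updated at time 200 predates only the second rule;
// a file created at time 400 predates neither.
const legacyFile: LintMetadata = { exemptions: new Set(), lastUpdated: 200 };
const newFile: LintMetadata = { exemptions: new Set(), lastUpdated: 400 };
console.log([...autoUpdate(legacyFile, lintRules, 500).exemptions]); // [ 'no-unused-imports' ]
console.log([...autoUpdate(newFile, lintRules, 500).exemptions]); // []
```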

    Static analyzers benefit from having a definition of whole program so they can make closed-set arguments like “the whole program defines only 2 concrete sub-types of T so where we see a function that takes (x: T) we can run an analysis twice, each time substituting a different concrete subtype for the type of x.”

    Defining the whole program requires either the ability to enumerate, transitively, the dependencies of the main module, or metadata from the compiler listing a conservative set of the modules that might load at runtime.
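
    A rough TypeScript sketch of the first option and of the closed-set query it enables, assuming a hypothetical `ModuleInfo` record per module. Only direct supertype edges are followed, and an import that cannot be resolved statically aborts the analysis rather than silently producing an open set.

```typescript
// Hypothetical compiler-side view of one module: the modules it imports
// and the concrete types it defines, with their direct supertypes.
interface ModuleInfo {
  imports: string[];
  concreteTypes: { name: string; supertypes: string[] }[];
}

// The whole program: every module reachable from `main` through imports.
function wholeProgram(modules: Map<string, ModuleInfo>, main: string): Set<string> {
  const seen = new Set<string>();
  const stack = [main];
  while (stack.length > 0) {
    const id = stack.pop()!;
    if (seen.has(id)) continue;
    const info = modules.get(id);
    if (info === undefined) {
      // Unresolvable import: closed-set reasoning would be unsound.
      throw new Error(`cannot resolve module ${id}`);
    }
    seen.add(id);
    stack.push(...info.imports);
  }
  return seen;
}

// Closed-set query: the concrete types in the program that directly
// extend or implement `supertype`.
function concreteSubtypes(
  modules: Map<string, ModuleInfo>,
  program: Set<string>,
  supertype: string,
): string[] {
  const result: string[] = [];
  for (const id of program) {
    for (const t of modules.get(id)!.concreteTypes) {
      if (t.supertypes.includes(supertype)) result.push(t.name);
    }
  }
  return result;
}

const table = new Map<string, ModuleInfo>([
  ["main", { imports: ["shapes"], concreteTypes: [] }],
  ["shapes", { imports: [], concreteTypes: [
    { name: "Circle", supertypes: ["Shape"] },
    { name: "Square", supertypes: ["Shape"] },
  ] }],
  // Present on disk but never imported, so not part of the program.
  ["unused", { imports: [], concreteTypes: [{ name: "Blob", supertypes: ["Shape"] }] }],
]);
const program = wholeProgram(table, "main");
console.log(concreteSubtypes(table, program, "Shape")); // [ 'Circle', 'Square' ]
```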

    Project teams

    The project team as a whole benefits from code reuse: not having multiple modules that do the same thing, and not having multiple fragments of code that do the same thing. Code reuse helps focus scarce testing and debugging effort.

    Library authors

    Library authors need clients to be able to load their libraries. They want to be able to document memorable usage idioms like “copy & paste this import statement into your code and then you'll be able to call foo().”

    Library authors want to bundle things that are often used together so that it is easy for clients to get started using the library. But they don't want clients who use only some of the functionality to have to incorporate all of the code.

    Library authors want to minimize the number of versions they need to support. The faster they can migrate clients to a new version, the better.

    Library authors may want to export some experimental APIs to projects they collaborate closely with, but not to the world at large, since they're not willing to commit to supporting those APIs long term.

    Library authors sometimes want to bundle API elements (types, functions, builders, useful constants) that all assume some constant. For example, bundling map and set types that use a particular notion of equivalence with n-ary union and intersection operators that produce outputs that close over that equivalence function. One way to realize this is to allow parameterizing modules: a module can receive types and functions as inputs.
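
    TypeScript has no parameterized modules, but a function returning a bundle of types and operators can approximate one, much as an OCaml functor does. In this hypothetical sketch, `makeSetModule` plays the module, and its equivalence parameter is simplified to a key function rather than a general equals/hash pair.

```typescript
// Hypothetical equivalence: values are equivalent iff their keys match.
// (A key function stands in for a general equals/hash pair.)
interface Equivalence<T> {
  key(value: T): string;
}

// Plays the role of a parameterized module: every type and operator it
// returns closes over the same `eq`, so they cannot disagree about it.
function makeSetModule<T>(eq: Equivalence<T>) {
  class EqSet {
    readonly items = new Map<string, T>();
    constructor(values: Iterable<T> = []) {
      for (const v of values) this.add(v);
    }
    add(value: T): void {
      const k = eq.key(value);
      if (!this.items.has(k)) this.items.set(k, value); // first one wins
    }
    has(value: T): boolean {
      return this.items.has(eq.key(value));
    }
    values(): T[] {
      return [...this.items.values()];
    }
  }

  // n-ary union; the result uses the same equivalence as the inputs.
  function union(...sets: EqSet[]): EqSet {
    const out = new EqSet();
    for (const s of sets) for (const v of s.values()) out.add(v);
    return out;
  }

  // n-ary intersection: members of `first` that every other set has.
  function intersection(first: EqSet, ...rest: EqSet[]): EqSet {
    return new EqSet(first.values().filter((v) => rest.every((s) => s.has(v))));
  }

  return { EqSet, union, intersection };
}

// "Instantiate the module" with case-insensitive string equivalence.
const CaseInsensitive = makeSetModule<string>({ key: (s) => s.toLowerCase() });
const fruitsA = new CaseInsensitive.EqSet(["Apple", "banana"]);
const fruitsB = new CaseInsensitive.EqSet(["APPLE", "cherry"]);
console.log(CaseInsensitive.union(fruitsA, fruitsB).values()); // [ 'Apple', 'banana', 'cherry' ]
console.log(CaseInsensitive.intersection(fruitsA, fruitsB).values()); // [ 'Apple' ]
```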

    Potential library clients

    Library consumers need a standard way to find modules that do what they want, find documentation that explains how to integrate with them, and convince their peers that they've done due diligence in finding a reliable dependency among all possible solutions.

    Code Maintainers

    * Maintainers - Auto-update mostly works.

    Blue teamers (security support)

    * Blue team - identify APIs that are prone to misuse and limit the amount of code that accesses them, to guide developers to safer APIs.
    * Blue team - virtualize excessively powerful APIs to better approximate POLA (the principle of least authority), whether they are used by application code or library code, while still allowing some code access to the original APIs.
    * Blue team - intercept third-party module fetches to implement review and approval for new dependencies that need high levels of privilege, to match CVEs with the modules used by projects, and to make local patches to third-party modules that have zero-days.

    Configuration consumers

    * Application bootstrap code and library authors - get sensitive configuration or capabilities to where they're needed without making them widely available.
    * Author - configure a dependency. For example, tell a module to use this other module that provides a service instead of that one. (A use case for parameterized modules.)

    Code generator authors

    * Code generator authors - export an API as if the module were compiled from source.

    Devops

    * Devops - interested in reproducible builds, archiving definitions of modules with production bundles.

    Trusted Modules

    * Run early, while the global environment is still mutable.
    * For bootstrapping.
    * For virtualization.

    How are modules found?

    It'd be nice if import were a general mechanism for getting code what it needs to do its job, instead of just a name resolution mechanism.

    A module reference is a name that contains enough information for a module content fetcher to attempt to fetch the module's source code.

    In addition, a module could provide a service identified by one or more traits. Empty placeholder traits can help.

    In the future, a trait that uses async patterns could resolve to an external service.

    Service discovery

    Early-running modules can include modules that implicitly register themselves as service providers. Is such registration per-environment?

    What is a module output?

    Unless otherwise specified, each module implicitly defines a namespace type that bundles its exports.

    Who fetches module content?

    Leave it up to the host. Whoever controls the command-line flags controls how modules are fetched. Early-running code could augment or reinterpret the host's fetching.
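
    A minimal TypeScript sketch of that division of labor, assuming the host exposes module fetching as a function from a module reference to source text. `Fetcher`, `hostFetcher`, and `intercept` are hypothetical names. The wrapper shows how early-running code could implement the blue-team interception listed under Stakeholders: an approval allowlist and local patches.

```typescript
// Hypothetical host interface: resolve a module reference to source text.
type Fetcher = (moduleRef: string) => Promise<string>;

// Stand-in for whatever fetcher the host's command-line flags selected.
const hostFetcher: Fetcher = async (moduleRef) => {
  const sources: Record<string, string> = {
    "std/text": "/* text utilities */",
    "thirdparty/left-pad": "/* original source */",
  };
  const source: string | undefined = sources[moduleRef];
  if (source === undefined) throw new Error(`not found: ${moduleRef}`);
  return source;
};

// Early-running code wraps the host's fetcher: only approved dependencies
// may load, and locally patched sources replace the fetched ones.
function intercept(inner: Fetcher, approved: Set<string>, patches: Map<string, string>): Fetcher {
  return async (moduleRef) => {
    if (!approved.has(moduleRef)) throw new Error(`dependency not approved: ${moduleRef}`);
    return patches.get(moduleRef) ?? inner(moduleRef);
  };
}

const fetcher = intercept(
  hostFetcher,
  new Set(["std/text", "thirdparty/left-pad"]),
  new Map([["thirdparty/left-pad", "/* patched source */"]]),
);
fetcher("thirdparty/left-pad").then((source) => console.log(source)); // /* patched source */
```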

    Provisioning modules with privilege / secrets

    Module identity & selective opacity for reliable channels between modules

    Modified one-version policy: a one-major-version policy.

    Metadata files (like Cargo.toml) specify major versions, which are available as module metadata.
    The linker, which merges separately compiled libraries, does not merge across major versions, and gives preference to later minor/patch numbers.
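
    The merge rule can be sketched in TypeScript as follows, assuming each library carries hypothetical `name`/`major`/`minor`/`patch` metadata and that versions sharing a major version are compatible (a semantic-versioning assumption).

```typescript
// Hypothetical version metadata, as a Cargo.toml-style file might supply.
interface LibVersion {
  name: string;
  major: number;
  minor: number;
  patch: number;
}

// Keep one copy per (name, major), preferring the latest minor/patch.
// Different major versions of the same library are never merged.
function mergeVersions(requested: LibVersion[]): LibVersion[] {
  const chosen = new Map<string, LibVersion>();
  for (const v of requested) {
    const key = `${v.name}@${v.major}`;
    const prev = chosen.get(key);
    if (
      prev === undefined ||
      v.minor > prev.minor ||
      (v.minor === prev.minor && v.patch > prev.patch)
    ) {
      chosen.set(key, v);
    }
  }
  return [...chosen.values()];
}

const fmt = (v: LibVersion) => `${v.name} ${v.major}.${v.minor}.${v.patch}`;
const merged = mergeVersions([
  { name: "json", major: 1, minor: 2, patch: 0 },
  { name: "json", major: 1, minor: 4, patch: 1 },
  { name: "json", major: 2, minor: 0, patch: 3 },
]);
console.log(merged.map(fmt)); // [ 'json 1.4.1', 'json 2.0.3' ]
```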

    Import

    It's a bit easier for trusted code or a loader to virtualize a module's interactions with the outside world if importing and exporting are done via functions instead of special forms.

    This brings risk though: if an attacker controlled string causes an import, they may be able to load malicious code leading to remote-code-execution.

    Each module instance will have “import” and “export” functions that evaporate early in the compilation process. A system can thus use these powerful dynamic operators early in its lifecycle, and they are gone before the system leaves the crèche and is exposed to untrusted inputs.
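
    To model what "evaporate" means, here is a TypeScript sketch in which a hypothetical `makeEvaporating` wrapper keeps an operator callable until `evaporate()` is called. In the proposed design evaporation would happen in the compiler, leaving no runtime check at all; the boolean flag here only stands in for that.

```typescript
// Wraps an operator so it works until `evaporate()` is called, after
// which every call fails.
function makeEvaporating<A extends unknown[], R>(fn: (...args: A) => R) {
  let live = true;
  return {
    // The operator itself: usable only while `live` is true.
    call: (...args: A): R => {
      if (!live) throw new Error("this operator has evaporated");
      return fn(...args);
    },
    // Called once the system leaves the crèche.
    evaporate: (): void => {
      live = false;
    },
  };
}

// A stand-in for one module instance's import operator.
const loaded: string[] = [];
const importOp = makeEvaporating((moduleRef: string) => {
  loaded.push(moduleRef);
  return `namespace of ${moduleRef}`;
});

console.log(importOp.call("std/text")); // namespace of std/text
importOp.evaporate(); // e.g. when early compiler stages finish
try {
  importOp.call("attacker/chosen"); // e.g. driven by untrusted input
} catch (e) {
  console.log((e as Error).message); // this operator has evaporated
}
```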

    Importing no symbols

    // If modules can have side-effects, this would bring that effect.
    import(x);
    

    This document does not decide whether module initialization can have side effects besides network/file fetches of module content, but this kind of simple import would at least make the compiler aware of any services provided by the module identified by x, as discussed below.

    Import everything

    An author needs to be able to control the symbols that come into scope, but for well-understood, stable modules from a high-reputation source, authors may want to import all the symbols. We define “...” as the “surprise me” operator; in various contexts it indicates to the reader that dynamic content fills in here.

    let { ... } = import(x);
    

    During an early module stage, let { ... } = e means: evaluate e and try to convert its result to a series of (symbol, value) pairs, then introduce a “let” for each symbol, initialized to the corresponding value.
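
    A TypeScript sketch of that expansion, assuming a compile-time scope modeled as a map from names to values; `letSpread` is a hypothetical name. It also makes plain why tools cannot list the names in scope without evaluating e first.

```typescript
// Hypothetical compile-time scope: local name -> bound value.
type Scope = Map<string, unknown>;

// Models `let { ... } = e` in an early stage: convert e's value into
// (symbol, value) pairs, then add one binding per pair.
// Assumption: rejecting a clash with an existing binding is this sketch's
// choice; the design above does not say what should happen.
function letSpread(scope: Scope, namespace: object): Scope {
  const extended: Scope = new Map(scope);
  for (const [exportName, value] of Object.entries(namespace)) {
    if (extended.has(exportName)) throw new Error(`'${exportName}' is already bound`);
    extended.set(exportName, value);
  }
  return extended;
}

// A stand-in for the value of import(x).
const mathNamespace = { pi: 3.14159, square: (n: number) => n * n };
const afterImport = letSpread(new Map<string, unknown>([["main", "entry"]]), mathNamespace);
console.log([...afterImport.keys()]); // [ 'main', 'pi', 'square' ]
```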

    Import a namespace

    let name = import(x);
    

    The type of name is the export type of the module identified by x.

    Import cherrypicked symbols

    let { a, blocal: bexternal } = import(x);
    

    We need some pattern decomposition syntax. This decomposes a struct value and binds the local name “a” to “.a” from the imported namespace and binds the local name “blocal” to “.bexternal” from the imported namespace.
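
    For comparison, TypeScript destructuring writes the property name first and the local name second, the reverse of the blocal: bexternal order above. The object below is a stand-in for the namespace that import(x) would produce.

```typescript
// A stand-in for the namespace value that import(x) would produce.
const namespaceOfX = { a: 1, bexternal: "two" };

// TypeScript writes { property: local }, so this binds the local `a`
// to .a and the local `blocal` to .bexternal.
const { a, bexternal: blocal } = namespaceOfX;

console.log(a, blocal); // 1 two
```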

    Import a service

    Importing a type expression instead of a string module reference looks for a module that provides a service with those traits.

    let provider = import(Trait1 & Trait2);
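
    A minimal TypeScript sketch of how such a lookup might work, assuming early-running modules register providers with the traits they implement (see Service discovery above). `ServiceRegistry` is hypothetical, and treating an ambiguous match as an error is this sketch's choice, not a decision of this document.

```typescript
// Hypothetical registry entry: a module and the traits it provides.
interface Provider {
  moduleRef: string;
  traits: Set<string>;
}

class ServiceRegistry {
  private readonly providers: Provider[] = [];

  // Early-running modules register themselves as providers.
  register(moduleRef: string, traits: string[]): void {
    this.providers.push({ moduleRef, traits: new Set(traits) });
  }

  // Models import(Trait1 & Trait2): find the provider that implements
  // every requested trait. Zero or several matches is an error here.
  lookup(...wanted: string[]): string {
    const matches = this.providers.filter((p) => wanted.every((t) => p.traits.has(t)));
    if (matches.length !== 1) {
      throw new Error(`${matches.length} providers for ${wanted.join(" & ")}`);
    }
    return matches[0].moduleRef;
  }
}

const registry = new ServiceRegistry();
registry.register("md2html-foo", ["MarkdownToHtml"]);
registry.register("md2html-bar", ["MarkdownToHtml", "Sanitizing"]);
console.log(registry.lookup("MarkdownToHtml", "Sanitizing")); // md2html-bar
```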
    

    Parameterizing a module

    let name = import(x, param0, param1);
    

    Extra parameters to “import”, including named ones, are forwarded to the module prologue (defined below). The compiler uses the set of parameters to decide whether to reuse a previously instantiated module or to instantiate a new one.

    Instantiating a module is the act of enqueuing the module's parameters and code for processing by later compiler stages. Hereafter, we use these terms:

    * Module: A separately referenceable unit of code.
    * Module Instance: A bundle consisting of a module and a set of values for its parameters that the compiler will convert to code.
    * Formal Module Parameters: The formal parameters for module m are the set of names for which an importer of m may specify values.
    * Actual Module Parameters: The actual parameters for a module instance mi of module m are the set of (name, value) pairs that mi received from its importer, including any whose values were initialized to default values specified in m's prologue.
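
    A TypeScript sketch of computing actual module parameters from formal ones, under these assumptions: a hypothetical `FormalParam` record holds a name and an optional default thunk, and names the importer passes that are not formal parameters are rejected. Defaults are thunks so that, as the prologue example later specifies, an initializer runs only when the importer omits that parameter, once per import.

```typescript
// Hypothetical formal parameter: a name plus, optionally, a thunk that
// computes the default. Required parameters have no default.
interface FormalParam {
  name: string;
  defaultValue?: () => unknown;
}

// Computes the actual parameters for one import: the importer's values,
// plus defaults for anything it left out.
function actualParams(formals: FormalParam[], supplied: Record<string, unknown>): Map<string, unknown> {
  const actuals = new Map<string, unknown>();
  for (const formal of formals) {
    if (Object.prototype.hasOwnProperty.call(supplied, formal.name)) {
      actuals.set(formal.name, supplied[formal.name]);
    } else if (formal.defaultValue !== undefined) {
      actuals.set(formal.name, formal.defaultValue()); // evaluated only when needed
    } else {
      throw new Error(`missing required module parameter ${formal.name}`);
    }
  }
  for (const name of Object.keys(supplied)) {
    if (!formals.some((f) => f.name === name)) {
      throw new Error(`unknown module parameter ${name}`);
    }
  }
  return actuals;
}

// Mirrors a required parameter `p` and a parameter `q` defaulting to 42.
let defaultRuns = 0;
const formals: FormalParam[] = [
  { name: "p" },
  { name: "q", defaultValue: () => { defaultRuns++; return 42; } },
];
console.log(actualParams(formals, { p: "x" })); // Map(2) { 'p' => 'x', 'q' => 42 }
```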

    Virtualizing a module's environment

    let name = import(x, param0, param1, environment=environment);
    

    Each module implicitly takes a parameter named “environment” which contains values that appear global to it.

    In the normal case where the importer does not specify “environment” the importer's environment is used.

    The type of the environment is determined by early running trusted modules.
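
    A sketch of the idea in TypeScript, modeling a module body as a function of its environment. The `Environment` interface and `reportModule` are hypothetical; the real environment type would be set by early-running trusted modules.

```typescript
// Hypothetical environment type. Per the text, early-running trusted
// modules would decide what the real environment contains.
interface Environment {
  now(): number;
  log(message: string): void;
}

// A module body modeled as a function of its environment: everything that
// appears global to the module is reached through `env`.
function reportModule(env: Environment) {
  return {
    report(): string {
      const message = `generated at ${env.now()}`;
      env.log(message);
      return message;
    },
  };
}

// The importer's own environment would be the default; here the importer
// passes a virtualized one with a frozen clock and a captured log instead.
const captured: string[] = [];
const virtualEnv: Environment = {
  now: () => 0,
  log: (message) => {
    captured.push(message);
  },
};

reportModule(virtualEnv).report();
console.log(captured); // [ 'generated at 0' ]
```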

    Conditional import

    // Failover import
    let name = import(x) || fallbackExpression;
    // Optional import
    let name = Some(import(x)) || None;
    

    Code may recover from failures to import. This may come in handy for patterns like:

    Failures due to transient network problems should abort compilation entirely instead of being treated as recoverable import failures. This is the responsibility of the host code that fetches module content.
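
    The distinction can be sketched in TypeScript, assuming hypothetical failure kinds: only a definite "not found" selects the fallback, and any other failure propagates, so a network blip cannot silently change which code gets built.

```typescript
// Hypothetical failure kinds. "notFound" means the module definitely does
// not exist; "transient" covers network or file-system hiccups.
class ImportFailure extends Error {
  readonly kind: "notFound" | "transient";
  constructor(kind: "notFound" | "transient", message: string) {
    super(message);
    this.kind = kind;
  }
}

// Models `import(x) || fallbackExpression`.
function importOr<T>(load: () => T, fallback: () => T): T {
  try {
    return load();
  } catch (e) {
    if (e instanceof ImportFailure && e.kind === "notFound") {
      return fallback();
    }
    throw e; // transient or unexpected: abort compilation
  }
}

const converter = importOr<string>(
  () => {
    throw new ImportFailure("notFound", "no module md2html-foo");
  },
  () => "built-in converter",
);
console.log(converter); // built-in converter
```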

    Key management

    Each module instance's “import” function closes over its public/private identity, which comes into play when negotiating with “export” as covered below.

    Parameterized Modules

    Parameterized modules require a way to declare parameters. Given a set of actual module parameters, the compiler needs to figure out whether it can reuse a module instance with equivalent parameters or whether it needs to create a new module instance.

    We define a module prologue as module content lexically before the first token “;;;”. It will be processed before the module content allowing the compiler to decide whether to reuse an existing module instance. If the token “;;;” does not appear, then there is no prologue, and all tokens are part of the module body.
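
    Finding the prologue is a simple split over the token stream, sketched here in TypeScript with a hypothetical `splitPrologue` and pre-tokenized input. Assuming string literals are single tokens (quotes included), working on tokens rather than raw text means a ";;;" inside a string literal does not end the prologue.

```typescript
// Splits a module's tokens at the first ";;;" token: tokens before it form
// the prologue, tokens after it the body. With no ";;;" there is no
// prologue and every token belongs to the body.
function splitPrologue(tokens: string[]): { prologue: string[]; body: string[] } {
  const marker = tokens.indexOf(";;;");
  if (marker < 0) return { prologue: [], body: tokens };
  return { prologue: tokens.slice(0, marker), body: tokens.slice(marker + 1) };
}

// Hypothetical pre-tokenized input for a module declaring its output type.
const parts = splitPrologue([":", "TraitName", ";", ";;;", "let", "x", "=", "1", ";"]);
console.log(parts.prologue.join(" ")); // : TraitName ;
console.log(parts.body.join(" ")); // let x = 1 ;
```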

    It'd be nice if a module author could define the output type for a module. If a module is a service provider, this is the type of the service that the module provides.

    : TraitName;
    
    ;;;
    

    This will typically require importing a trait name, so imports are allowed in a module prologue. When an import happens in a module prologue, the importer cannot be identified by a module path and its parameters, since the full actual parameter set is not yet known. Instead, the importer is identified by the importing module's path and the token “#prologue”.

    To provide multiple services, intersect some traits.

    : Trait0 & Trait1;
    
    ;;;
    

    It'd be nice if a module could declare parameters inside its prologue:

    // A required module parameter.
    let (@ModuleParameter) p : T;
    
    // A module parameter with a default value.
    let (@ModuleParameter) q = 42;
    // The initializer expression is only evaluated
    // if the importer does not specify a value for q.
    // As such, the initializer expression will be evaluated
    // separately for each import that could result in a module
    // instance.
    
    ;;;
    

    It'd be nice if a module could control how the compiler decides whether two sets of parameter values for a module are equivalent, for the purposes of deciding whether to reuse a module instance. Module parameters may be coerced from the values received from the importer to the static type of the module parameter; each module parameter's type's equals method can then determine equivalence.
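
    A TypeScript sketch of instance reuse under that rule, assuming actual parameters have already been coerced and a hypothetical `equalsFor` callback supplies each parameter type's equivalence. A linear scan keeps it short; a real compiler would presumably want hashing consistent with those equals methods.

```typescript
// Hypothetical equivalence check for one parameter's (coerced) values,
// standing in for that parameter type's equals method.
type ParamEquals = (a: unknown, b: unknown) => boolean;

interface CachedInstance {
  moduleRef: string;
  actuals: Map<string, unknown>;
  instance: string; // stands in for the compiled module instance
}

class InstanceCache {
  private readonly entries: CachedInstance[] = [];
  private created = 0;
  private readonly equalsFor: (param: string) => ParamEquals;

  constructor(equalsFor: (param: string) => ParamEquals) {
    this.equalsFor = equalsFor;
  }

  // Reuse an instance of the same module whose actual parameters are all
  // equivalent; otherwise instantiate (here: name) a new one.
  getOrCreate(moduleRef: string, actuals: Map<string, unknown>): string {
    const hit = this.entries.find(
      (entry) =>
        entry.moduleRef === moduleRef &&
        entry.actuals.size === actuals.size &&
        [...actuals].every(
          ([param, value]) =>
            entry.actuals.has(param) && this.equalsFor(param)(entry.actuals.get(param), value),
        ),
    );
    if (hit !== undefined) return hit.instance;
    this.created += 1;
    const instance = `${moduleRef}#${this.created}`;
    this.entries.push({ moduleRef, actuals, instance });
    return instance;
  }
}

// Example: a `locale` parameter compared case-insensitively, so "en-US"
// and "EN-us" share an instance while "fr-FR" gets its own.
const caseInsensitiveEquals: ParamEquals = (a, b) =>
  String(a).toLowerCase() === String(b).toLowerCase();
const instances = new InstanceCache(() => caseInsensitiveEquals);
console.log(instances.getOrCreate("i18n", new Map([["locale", "en-US"]]))); // i18n#1
console.log(instances.getOrCreate("i18n", new Map([["locale", "EN-us"]]))); // i18n#1
console.log(instances.getOrCreate("i18n", new Map([["locale", "fr-FR"]]))); // i18n#2
```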

    Any “let”s in the module prologue remain in scope in the module body.

    It'd be nice if a module could initialize module metadata, for example to provide a helpful diagnostic name.

    module.metadata.diagnosticName = stringExpression;
    
    // stringExpression could depend on
    // module.metadata.moduleConstructorReference and
    // values of parameters.
    

    How long module metadata remains mutable by module code is not yet determined.

    Exports

    Exports can't happen until a module's prologue has finished, so, similar to imports, each module will have an “export” function that closes over its public/private identity and becomes callable before control enters the module body. The export function will evaporate before runtime, at a time not yet determined.

    It'd be nice if a module could export some symbols to all importers.

    export(x, 'name');
    
    // where name is the externally visible name.
    

    It'd be nice if a module could export some symbols to friends.

    export(x, 'name', to=predicate);
    
    // predicate is a predicate over the importer's public key.
    

    The “to” predicate is used to box the exported value. The importer's “import” function automatically unboxes values.
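
    A TypeScript sketch of boxing and automatic unboxing. Real public/private keys are replaced by an assumption: the loader mints an unforgeable token per module instance (`mintIdentity`, hypothetical) and records which public key it stands for, so code holding only its own token cannot pass as a friend. In a real system the loader would keep `mintIdentity` to itself. `ExportBox` and `makeImporter` are likewise hypothetical, and the default predicate accepts every importer.

```typescript
// Hypothetical identity scheme: the loader mints one unforgeable token per
// module instance and records the public key that token stands for.
const publicKeyOf = new WeakMap<object, string>();
function mintIdentity(publicKey: string): object {
  const token = Object.freeze({});
  publicKeyOf.set(token, publicKey);
  return token;
}

// Models export(x, 'name', to=predicate): the value is boxed behind a
// predicate over the importer's public key. The default admits everyone.
class ExportBox<T> {
  readonly #value: T;
  readonly #to: (importerKey: string) => boolean;
  constructor(value: T, to: (importerKey: string) => boolean = () => true) {
    this.#value = value;
    this.#to = to;
  }
  unbox(identity: object): T {
    const key = publicKeyOf.get(identity);
    if (key === undefined || !this.#to(key)) {
      throw new Error("not exported to this importer");
    }
    return this.#value;
  }
}

// Each module instance's import function closes over its own identity,
// so from the importer's side unboxing is automatic.
function makeImporter(identity: object) {
  return <T>(box: ExportBox<T>): T => box.unbox(identity);
}

const friendKey = "pk:friend"; // hypothetical public key
const experimentalApi = new ExportBox("v2 preview API", (key) => key === friendKey);
const friendImport = makeImporter(mintIdentity(friendKey));
const strangerImport = makeImporter(mintIdentity("pk:stranger"));
console.log(friendImport(experimentalApi)); // v2 preview API
```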

    Finally, it'd be nice if a module could declare and export a symbol in one breath.

    let (@Export(predicate)) x = 42;
    
    // desugars, via the definition of `@Export`, to
    let x = 42;
    export(x, name='x', to=predicate);
    
    // The default for to= just returns true.
    
    Thanks for reading!