Fyi, this is a design document for Temper, a programming language for high-assurance libraries that run inside anything.
Proposal: A multi-backend programming language to scale security engineering efforts by allowing message integrity preserving code to be shared by many application toolchains.
Security Engineering involves improving the toolchain to make it easier to produce software that is not prone to unforeseen behavior when exposed to crafted inputs. The toolchain includes libraries, developer tools, programming languages, code review and analysis tools.
Large internet companies' tightly integrated, homegrown toolchains used to offer a productivity advantage over open-source toolchains by making it easy to deploy programs on multiple machines in a datacenter. These same companies' public cloud services now solve that problem for open-source toolchains.
Continuing to use these homegrown stacks requires retraining new hires who already know open-source stacks, and converting acquired companies' products from open-source stacks that already integrate well with their public cloud offerings. There are gaps, but efforts like cd.foundation that route software artifacts & telemetry will eventually let large orgs recreate their processes on open-source toolchains.
This makes homegrown stacks non-viable in the long term. Unfortunately most of the investment in security engineering is in these toolchains.
There is an organizational O(n×m) problem. Security engineering requires producing small amounts of thoroughly tested & reviewed code that functions correctly on inputs crafted by an attacker. Producing solutions for n problems on m toolchains requires O(n×m) people trained in security engineering: for each (application language, problem) pair, we need someone intimately familiar with the quirks of that language, with the problem domain, who is willing to maintain tricky code on short notice over the long term.
This does not scale when m is the number of open-source stacks, leading to inconsistent coverage & quality despite heavy duplication of effort. Unless something changes, as large organizations move from homegrown stacks with high security engineering investments to open-source stacks without, the next generation of software will be less secure than the present.
Herein we propose a single common language that compiles to many application languages to turn this into an O(n+m) problem by enabling a division of responsibilities:
Most languages compile to binary, bytecode for a small number of VMs, or to source code for a small number of other languages. Unlike Java's “write once, run anywhere”, which is provided via a heavyweight virtual machine, this language will be “write once, run in anything” by supporting a large number of compiler backends that produce libraries that a host application can link to from within the host application's process:
| Backend Output | Supported Environments | Notes |
|---|---|---|
| C++ source | C++, Wasm, PHP? | |
| Java source & bytecode | JVM Languages: Java, Scala, Kotlin, JRuby, etc. | Need to adjust bytecode output for debug symbols |
| C#, .Net Assembly | .Net languages: C#, VB.net, F#, etc. | |
| TBD, probably Swift | iOS: Objective-C, Swift, etc. | |
| Python source & bytecode | Python | See Java re debug symbols |
This requirement leads to a number of lowest-common-denominator problems:
Requiring GC would make lightweight integration with C++ and Rust difficult. The language will constrain declarations of product types (e.g. struct, class) to prevent reference cycles, allowing exactly-once deallocation via simple mechanisms like shared_ptr in the C++ backend.
A cycle cannot be monotonically decreasing, per transitivity of partial orders, so an object graph cycle must either consist of edges whose types are all equivalent, or contain at least one edge such that target type ≮ source type. The first case should not occur since the target must be older than the source, and write-once, early-initialized values cannot point to younger records. The second case should not occur as long as the type system is sound.
(It should be possible to evolve the language towards a Rust-style borrow checker if this proves problematic, possibly running a Rust backend ahead of time to piggyback on its correctness checks.)
The common language will focus on integrity w.r.t. concurrency over providing a wide array of concurrency primitives.
In OOP languages, most changes to an object happen early in its life. The common language will explicitly support this lifecycle to make it easy to share immutable values. When allocating a new value of type T one gets a T_young, which is a subtype of T. A builtin operator will make it easy to derive a deeply immutable T_mature from a T_young. Syntactically, it will be easier to mention T_mature than T or T_young.
At a minimum, producer-consumer interactions can be enabled via a shared buffer that can contain mature values. The shared buffer can be split into a writable end and a readable end. A writer can commit to never modifying a prefix of the written content, which makes it available for read, and the read side can commit to never needing a prefix of the read content, which frees that space for further writing. A builtin (α reader → unit) × (α writer → unit) → unit function could delegate control as space/content becomes available, allowing a backend for an environment that provides true parallelism to provide better service where the functions do not close over mutable state.
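The commit discipline described above can be sketched in Python. This is only an illustration under stated assumptions: the class name SharedBuffer and its methods are hypothetical, the sketch is single-threaded, and it assumes stored values are immutable and never None, since None is used here to signal that nothing is readable.

```python
class SharedBuffer:
    """Bounded buffer of mature (immutable) values with commit points.

    Hypothetical sketch: names and capacity handling are assumptions.
    """

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = []      # written content not yet released by the reader
        self._committed = 0   # _items[:_committed] are frozen and readable
        self._consumed = 0    # _items[:_consumed] were handed to the reader

    def write(self, value):
        """Writer end: append a value; returns False when the buffer is full."""
        if len(self._items) >= self._capacity:
            return False
        self._items.append(value)
        return True

    def commit(self):
        """Writer promises never to modify anything written so far."""
        self._committed = len(self._items)

    def read(self):
        """Reader end: next committed value, or None if none is available."""
        if self._consumed >= self._committed:
            return None
        value = self._items[self._consumed]
        self._consumed += 1
        return value

    def release(self):
        """Reader promises never to need what it has read; frees that space."""
        del self._items[:self._consumed]
        self._committed -= self._consumed
        self._consumed = 0
```

Uncommitted writes stay invisible to the reader, and space only becomes writable again once the reader releases the prefix it has consumed.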
Efficient string processing often requires segmenting strings.
Doing this with random-access operators makes the processing code dependent on the native string representation, which differs per environment:
|bytes||C++ (usually), Go, Rust, Ocaml, Python|
|various depending on gcc wchar flags||C++, CPython2|
Further, environments that use byte strings differ in whether there is a type that is reliably well-formed UTF-8, and if so, whether it is restricted to scalar values or codepoints.
The common language will discourage random-access of strings, instead providing input and output buffer types to encourage left-to-right processing of buffers by chunks with minimal use of lookbehind.
Buffers that contain text content will be parameterized with the code-unit type, so that backends can take advantage of efficiencies when the input code-unit type is easily chunked with the native code-unit type.
Failing to undo speculative changes when abandoning one strategy and starting another is a recurring source of errors and complicates reviews.
The common language will encourage a style of problem solving that focuses on finding a valid interpretation of a message on an input buffer and composing an output on an output buffer.
Taking a cue from the Icon family of string processing languages, especially Converge:
expressions either succeed - in which case they also return a value - or they fail - in which case no value is produced
We will attempt to craft semantics such that, by default, statements and expressions fail pure; by default, there are no visible side effects from failing branches, e.g. in lieu of STM, by storing the length of append-only output buffers on entry and truncating to that length on failure. This will require extra restriction on mutable closed-over state and may complicate builtin datatypes; e.g. efficiently checkpointing a hash table. This shouldn't complicate specification; CPS that takes one continuation for success and another for failure should suffice.
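The checkpoint-and-truncate idea can be sketched as follows. This is a minimal Python illustration, not a statement of how the common language will implement it: OutputBuffer, FAIL, attempt, first_of and the two example branches are all hypothetical names introduced here.

```python
class OutputBuffer:
    """Append-only output buffer that can be truncated back to a checkpoint."""

    def __init__(self):
        self._chunks = []

    def append(self, text):
        self._chunks.append(text)

    def checkpoint(self):
        return len(self._chunks)       # length on entry to a branch

    def rollback(self, mark):
        del self._chunks[mark:]        # discard everything written since mark

    def value(self):
        return "".join(self._chunks)


FAIL = object()  # hypothetical sentinel: "this branch produced no value"


def attempt(out, branch, *args):
    """Run one branch; if it fails, undo its writes to the output buffer."""
    mark = out.checkpoint()
    result = branch(out, *args)
    if result is FAIL:
        out.rollback(mark)
    return result


def first_of(out, branches, *args):
    """Try branches in order; only the first succeeding one leaves output."""
    for branch in branches:
        result = attempt(out, branch, *args)
        if result is not FAIL:
            return result
    return FAIL


def as_integer(out, text):
    out.append("int:")                 # speculative write before validation
    if not text.isdigit():
        return FAIL
    out.append(text)
    return int(text)


def as_word(out, text):
    out.append("word:")
    if not text.isalpha():
        return FAIL
    out.append(text)
    return text
```

Calling first_of(out, [as_integer, as_word], "abc") leaves only the output of as_word on the buffer, even though as_integer wrote to it before failing.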
Authors may explicitly mark side-effects that need to outlive failing paths, e.g. memo-tables for packrat parsers, logging.
Host languages differ widely re type systems, so backends will need to use a variety of strategies to deal with values as they cross between the host environment and code written in the common language. For example:
In all these cases, backends benefit from having more information about the ways in which the common language author expects their exported APIs to be called.
The common language will have a static, sound (see mem safety), nominal, probably trait-based type system. Type inference may be fine for internal declarations, but signatures for exported APIs will allow, at a minimum: semantically significant formal parameter names (a la Python), parameter optionality, parameter defaults, extensible configuration bundles, variadic parameters.
Go's and TypeScript's structural type systems allow declaring an interface type that must be a subtype of multiple other interfaces, so there is no need for explicit intersection types.
The trend across many languages is towards using large numbers of modules authored by unknown third-parties.
Supply-chain security is important but out of scope for this proposal. That said, backends should resist tampering where possible, e.g. via selectively opaque values.
It would be nice to check, when linking multiple separately compiled common language outputs, that they are compatible. (“Linking” here is used loosely, and involves any attempt to reconcile differences between compiler outputs that may overlap in the API elements they provide.)
Module systems for the various host environments differ in most ways they reasonably could, but no one benefits from having multiple versions of the same library.
Ideally, the common language compiler would be integrated into the application's build process to produce a single compiled bundle with just what the application needs. This would make it hard to speed adoption by publishing to Maven, npm, RubyGems and probably other widely used package managers. So we need some way to publish separately compiled bundles that reliably link together.
It's not practical to prevent having two or more versions, because at some point packages diverge; (perl5, perl6) are not two versions of the same thing. “Semantic Versioning” notes:
Given a version number MAJOR.MINOR.PATCH, increment the:
- MAJOR version when you make incompatible API changes,
Backends will receive the major versions of source files which they must incorporate into any metadata used to link multiple, separately compiled bundles. Bundles with different embedded (name, major version) must be considered distinct. Within a major version, linkers must require that APIs be the same, for example, by comparing hashes of intermediate forms.
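A link-time check along these lines might look like the sketch below. The Bundle type, its fields, and the choice of SHA-256 over a textual API description are assumptions made for illustration. Keeping the newest bundle when APIs match is also an assumption, not something the rule above requires.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    """Hypothetical metadata a backend might embed in a compiled bundle."""
    name: str
    version: tuple          # (major, minor, patch)
    api_description: str    # stand-in for an intermediate form of the API

    @property
    def link_key(self):
        # Bundles with different (name, major version) are distinct.
        return (self.name, self.version[0])

    @property
    def api_hash(self):
        return hashlib.sha256(self.api_description.encode("utf-8")).hexdigest()


def link(bundles):
    """Group bundles by (name, major); reject API mismatches within a group."""
    linked = {}
    for bundle in bundles:
        existing = linked.get(bundle.link_key)
        if existing is None:
            linked[bundle.link_key] = bundle
        elif existing.api_hash != bundle.api_hash:
            raise ValueError(f"incompatible APIs for {bundle.link_key}")
        elif bundle.version > existing.version:
            linked[bundle.link_key] = bundle   # assumption: keep the newest
    return linked
```

Two bundles of the same library at major versions 1 and 2 link side by side, while two major-version-1 bundles with differing API hashes are rejected.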
Advice to application developers will include using half-open version ranges [1.3.0,) so that the dependency resolver can settle on a single stable version across all dependers of a major version of a particular library. Version stamping for reproducibility and archival is properly the domain of auto-generated lockfiles.
Host environments use a variety of idioms for communicating failure:
|Error codes||Exceptions||Options||Promises||Result pair||null result|
|C++||✓||✓ % caveats||✓||✓||✓|
|Swift||✓||✓||✓ × 4|
The common language will not use exceptions internally; it is not an exceptional condition when an input crafted by an attacker fails to conform to expectations.
As discussed above, operators either succeed in producing a result or fail to produce one. The same holds for functions: unless a function's signature specifies otherwise, a call to it can fail to produce a result.
The common language will use Wuffs-style facts about previously checked boundary conditions to eliminate failure branches around input buffer accesses and this could extend to propagating type guards.
Backends must pick an idiomatic way to communicate such failure to produce a value as control returns to the host environment. Backends must also ensure that stack overflows and failures to allocate memory do not take the failure path; attackers have successfully exploited boundary conditions to cause failover from a special case handling branch to a laxer, general case branch via unwise exception catchalls.
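As an illustration of the boundary rule above, here is a sketch of what a Python backend's wrapper might do, assuming a hypothetical FAIL sentinel for "no result" and choosing None as the host-facing idiom. The wrapper deliberately has no try/except, so RecursionError and MemoryError reach the host as themselves instead of being folded into the failure path.

```python
FAIL = object()  # hypothetical sentinel for "failed to produce a value"


def export_for_python(fn):
    """Wrap a function so ordinary failure surfaces to the host as None."""
    def wrapper(*args):
        # No catch-all here: resource exhaustion (RecursionError,
        # MemoryError) must propagate and never look like a parse failure.
        result = fn(*args)
        return None if result is FAIL else result
    return wrapper


def nested_depth(text):
    """Depth of a string like "[[]]"; FAIL unless brackets nest exactly."""
    if text == "":
        return 0
    if len(text) < 2 or text[0] != "[" or text[-1] != "]":
        return FAIL
    inner = nested_depth(text[1:-1])
    return FAIL if inner is FAIL else inner + 1


safe_nested_depth = export_for_python(nested_depth)
```

A malformed input such as "[[]" yields None, while an input nested deeply enough to exhaust the stack raises RecursionError, so the caller cannot mistake it for a merely invalid message.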
Application developers are the primary consumers of libraries implemented in the common language.
They understand the good parts of the application languages they use, but often not the quirks that attackers can abuse. They may receive periodic compsec training but are rarely up-to-date on compsec literature.
They would benefit from having high-quality libraries and tools that provide strong security guarantees when presented with crafted inputs.
They have a low tolerance for code review and tools that “get in their way.”
Blue teamers advise application development teams on how to produce secure systems and/or audit the product for vulnerabilities.
They understand the application's threat environment, risk tolerance, and security needs.
They may not be programmers themselves so may rely on code scanners, and pentester tools to identify problems.
They are often heavily outnumbered by application developers, so where they audit code, they have an interest in measures that bound the portion of a codebase that might contribute to a vulnerability.
They have an interest in application development teams using the kinds of hardened toolchains that security engineering enables.
Security engineers are the primary authors of code in the common language.
They are domain experts in a class of vulnerabilities, familiar with secure coding techniques, formalizing security requirements, and in negative testing.
They have a high tolerance for code quality tools that block compilation and for redrafting code to be uncontroversial in code review.
Security researchers focus on finding flaws in security-critical code, sometimes for a bounty.
When they find a new, exploitable quirk in internet infrastructure, it'd be nice if they didn't have to manage reporting embargoes with a maintainer per language, and scour many toolchains' divergent libraries to cobble together a list of recommended mitigations.
External services respond to network messages. Even where the message includes end-user credentials, and narrowly scoped security metadata, the service maintainers have an interest in how the message was constructed.
Service providers might benefit from signals like “this input was constructed using message integrity best practices” where such practices include (per “Securing the Tangled Web”):
response.write(…) only accept inputs marked safe for the output type.
This project would be successful if it allowed a small community of security engineers to focus their efforts to produce high-quality, usable, analyzable outputs to many frameworks.
The project succeeds when security engineers can combine their work on one codebase, instead of fragmenting efforts across multiple toolchain-specific codebases.
The project would have failed if security engineers cannot ramp down current maintenance obligations in favor of common language projects, or are unable to support new problem domains.
The project succeeds when security researchers have a few tools to try to break and the reported bug rate slows to a crawl after some period of scrutiny.
The project would have failed if relevant organizations (e.g. OWASP, CII) or partner companies are unwilling to offer bounties covering documented security properties of outputs or security researchers can trivially get payoffs within a year of offering bounties.
The project succeeds when automated tools like fuzzers can find flaws. We might assume that a codebase is solid if fuzzers find intentionally injected flaws, but find nothing when the injected flaws are removed.
The project would have failed if most outputs are not transparent to fuzzers by the fault injection criterion.
The project succeeds when the interfaces to outputs feel idiomatic to users of that language:
The project would have failed if outputs do not replace substandard offerings in new code some time after release.
The project succeeds when the effort required to write and maintain the nth backend is small compared to the project as a whole.
The project would have failed if any of the backends above prove unimplementable or unmaintainable.
Some examples of broad classes of message integrity tools.
(trustworthy α × untrusted β) template → trustworthy α
Template languages mix untrusted inputs among strings from trusted authors. “Using type inference to make web templates robust against XSS” explains how template languages can be made aware of the structure of their output language, and preserve the authors' intent even given crafted inputs.
Such safe-by-construction techniques can be producers of trusted strings of a particular language, and can be consumers of the same, for example by declining to re-escape a string of trusted HTML where an HTML document fragment is allowed.
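The producer and consumer roles can be sketched with a small Python example. TrustedHtml, to_html and render are hypothetical names. The sketch escapes only for HTML text and quoted attribute values, whereas a real contextually aware template system would track the output context at each interpolation point.

```python
import html


class TrustedHtml:
    """Marks a string as HTML produced by a trusted process."""

    def __init__(self, markup):
        self.markup = markup


def to_html(value):
    """Consume a value: trusted HTML passes through, all else is escaped."""
    if isinstance(value, TrustedHtml):
        return value.markup
    return html.escape(str(value), quote=True)


def render(trusted_parts, *untrusted_values):
    """Interleave author-written literal parts with interpolated values.

    Produces TrustedHtml, so the result can be interpolated into another
    template without being escaped a second time.
    """
    if len(trusted_parts) != len(untrusted_values) + 1:
        raise ValueError("need one more literal part than values")
    out = [trusted_parts[0]]
    for value, literal in zip(untrusted_values, trusted_parts[1:]):
        out.append(to_html(value))
        out.append(literal)
    return TrustedHtml("".join(out))
```

Here render(["<p>", "</p>"], "<b>x</b>") escapes the untrusted markup, and passing that result into a second render call embeds it unchanged.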
α → trustworthy α
Sanitizers take an untrusted string in a language and remove or defang high-privilege instructions to produce an output in the same language that is safe in many contexts.
Sanitizers can produce trusted strings in their output language, and could decline to remove instructions from known trusted strings.
trustedness α → trustedness β
Translators are like sanitizers, but the output language need not be the same as the input language. Poorly written markdown to HTML converters are a consistent source of security vulnerabilities.
The same applies to translators between (surprisingly complex) formats like CSV and better standardized formats like JSON.
α → uncontroversial α
A semantics-preserving transform from a language to a subset of that language that is reliably parsed to reduce the attack surface downstream.
For example, JSON-like input to JSON that does not have orphaned UTF-16 surrogates, numbers that lose precision, or syntactic quirks like trailing commas.
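A rough Python sketch of the checking half of such a transform follows, built on the standard json module. It is stricter than the description above: instead of accepting JSON-like input and repairing it, it rejects trailing commas and other quirks outright. The function names and the 2**53 integer limit, chosen to match IEEE 754 doubles, are assumptions.

```python
import json
import math


def _exact_int(text):
    number = int(text)
    if abs(number) > 2 ** 53:      # would lose precision as a double
        raise ValueError("integer loses precision: " + text)
    return number


def _finite_float(text):
    number = float(text)
    if not math.isfinite(number):  # e.g. 1e400 overflows to infinity
        raise ValueError("number out of range: " + text)
    return number


def _reject_constant(text):
    raise ValueError("not valid JSON: " + text)   # NaN, Infinity, -Infinity


def _check_strings(value):
    """Reject strings and object keys that contain orphaned surrogates."""
    if isinstance(value, str):
        if any(0xD800 <= ord(ch) <= 0xDFFF for ch in value):
            raise ValueError("orphaned UTF-16 surrogate")
    elif isinstance(value, list):
        for item in value:
            _check_strings(item)
    elif isinstance(value, dict):
        for key, item in value.items():
            _check_strings(key)
            _check_strings(item)


def canonicalize(text):
    """Return compact, ASCII-only JSON for text, or raise ValueError."""
    value = json.loads(text, parse_int=_exact_int,
                       parse_float=_finite_float,
                       parse_constant=_reject_constant)
    _check_strings(value)
    return json.dumps(value, separators=(",", ":"), ensure_ascii=True)
```

Downstream consumers then only ever see the compact, re-serialized form, not the original bytes.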
structured-but-untyped → domain-value
Deserialization schemes based on reflection are a consistent source of vulnerabilities.
A schema describes the expected structure of an input, like a type signature for a web service.
Schema checkers can check that an untyped input conforms to expectations before trying to use it to create domain objects which might include optional fields that attackers should not be able to specify.
Additionally, schema checkers might grant privileges to signed content that should be exempt from such requirements.
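A minimal sketch of such a check, assuming a hypothetical schema format that maps each field name to an (expected type, required) pair. Fields that are absent from the schema, for example a privileged flag an attacker might try to set, are rejected, not silently copied through. Signed-content exemptions are not modeled.

```python
# Hypothetical schema: field name -> (expected type, required?)
USER_SCHEMA = {
    "name": (str, True),
    "email": (str, True),
    "nickname": (str, False),
}


def check_against_schema(schema, untyped):
    """Return a dict holding only declared fields, or raise ValueError."""
    if not isinstance(untyped, dict):
        raise ValueError("expected an object")
    unexpected = set(untyped) - set(schema)
    if unexpected:
        raise ValueError(f"unexpected fields: {sorted(unexpected)}")
    checked = {}
    for field, (expected_type, required) in schema.items():
        if field not in untyped:
            if required:
                raise ValueError(f"missing field: {field}")
            continue
        # Exact type match, so that e.g. a bool is not accepted as an int.
        if type(untyped[field]) is not expected_type:
            raise ValueError(f"wrong type for field: {field}")
        checked[field] = untyped[field]
    return checked
```

Only the checked dict would then be used to construct domain objects, so reflection never sees attacker-chosen field names.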
domain-value → structured-but-untyped
Serialization schemes based on type introspection are a source of unintended disclosure.
For example, a developer adds a
private String password;
to a class, but a JSON library breaks Java visibility to helpfully serialize it.
Given a schema that describes the expected structure of an output, serialize domain objects to an untyped language like JSON without unintended content, and possibly encrypting PII.
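To illustrate, a schema-driven serializer can list the fields it is allowed to emit and ignore whatever else the type happens to contain. The Account type and ACCOUNT_OUTPUT_SCHEMA below are hypothetical, and encryption of PII is left out of this sketch.

```python
import json
from dataclasses import dataclass


@dataclass
class Account:
    user: str
    email: str
    password: str   # added later by a developer; must never be serialized


# Hypothetical output schema: only these fields may appear in the output.
ACCOUNT_OUTPUT_SCHEMA = ("user", "email")


def serialize(schema, obj):
    """Serialize only the fields the schema names, in schema order."""
    return json.dumps({field: getattr(obj, field) for field in schema})
```

Adding a new field to Account changes nothing in the output until someone deliberately adds it to the schema, which is a reviewable change.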
csv-ish list × (freename → binding) → csv-ish list
Ad-hoc reporting involves generating reports based on user-supplied queries and/or arithmetic expressions. Applications often use eval or other string→code operators when trying to evaluate a user-supplied mathematical expression.
Developers understand spreadsheets like Excel and Google Sheets, but shipping sets of equations containing sensitive data to these as services raises a host of problems: confidentiality, macro-injection, latency, worst-case assumptions about data needed.
Provide a transform from a set of values in the CSV-value-domain and a resolver for free variables which may load further tables, to values in the CSV-value-domain but with expressions computed, that handles reference cycles, and allows quotas for expression evaluation.
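A small Python sketch of such an evaluator follows, under several assumptions: the Evaluator class is hypothetical, Python's expression grammar (via ast.parse) stands in for a spreadsheet formula grammar, only +, -, * and / over numbers are supported, and the quota counts visited expression nodes. It never calls eval. A hardened version would also need to bound input size and nesting before parsing.

```python
import ast
import operator

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


class Evaluator:
    def __init__(self, cells, resolve_free, quota=1000):
        self.cells = cells                # name -> expression text
        self.resolve_free = resolve_free  # name -> value, for other names
        self.quota = quota                # max expression nodes to visit
        self.values = {}                  # memoized results
        self.in_progress = set()          # names currently being evaluated

    def cell(self, name):
        if name in self.values:
            return self.values[name]
        if name not in self.cells:
            return self.resolve_free(name)
        if name in self.in_progress:
            raise ValueError(f"reference cycle through {name}")
        self.in_progress.add(name)
        tree = ast.parse(self.cells[name], mode="eval")
        value = self._eval(tree.body)
        self.in_progress.discard(name)
        self.values[name] = value
        return value

    def _eval(self, node):
        self.quota -= 1
        if self.quota < 0:
            raise ValueError("evaluation quota exceeded")
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.Name):
            return self.cell(node.id)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            left = self._eval(node.left)
            right = self._eval(node.right)
            return _OPS[type(node.op)](left, right)
        raise ValueError("unsupported expression")
```

Anything outside the supported node types, including function calls and attribute access, is rejected and not executed.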
Unrelated to security, ICU & PluralForm embed knowledge about how to generate high-quality human language outputs that take into account complex rules around, e.g. plurals in Arabic. Many toolchains would benefit from better support for human languages, providing incentive to integrate. There is also crossover with security relevant tools like templates for content composition.