Fyi, this is a design document for Temper, a programming language for high-assurance libraries that run inside anything.

Functions are easy, applying them not so much

How can we produce idiomatic APIs for many languages each with their own subtly different notion of function while using a single notion of function application?

    Abstract

    Functions, lambdas, macros, methods, procedures seem to be different names for the same thing: parameterized code. Differences around dispatch, overloading, overriding, and timing though lead to a spicy soup of subtle semantics.

    How can we craft a single notion of function that satisfies a JavaScript code generator's need to produce a function that dynamically inspects its arguments to unpack them, and one that works well with C++ static dispatch.

    Comparative Linguistics

    Consider a simple API that computes modulus. This example is contrived to involve overloading, overriding, and implicit widening. This syntax is made up for explanatory purposes. Imagine i32 is a 32b signed integer, and i64 is twice as wide.

    // Functions to compute integer modulus.
    
    namespace ModdyBits {  // Let's just give the API a name.
    
    // Global functions that take two operands.
    fn mod(x: i32, y: i32): i32 { … }  // Takes 32b ints
    fn mod(x: i64, y: i64): i64 { … }  // Takes 64b ints
    
    // Called with 2 i32s, invokes the first
    assert mod(3I, 2I) === 1I                  // Imagine `===` never does implicit converion
    // Called with at least one i64, invokes the second
    assert mod(3L, 2L) === 1L
    assert mod(3I, 2L) === 1L
    assert mod(3L, 2I) === 1L
    
    
    // Alternatively, we might want to allow calling via dot syntax
    // Imagine, since the zero-th argument is named `this`, it's a method
    // dispatched by dot syntax.
    fn mod(this: i32, y: i32): i32 { … }
    // So with two i32s, invokes the first version
    assert (3I).mod(2I) === 1I
    
    
    // If we define a method on i64s,
    fn mod(this: i64, y: i64): i64 { … }
    // we might expect the second to apply when either this or the
    // actual parameter is an i64
    assert (3I).mod(2L) === 1L
    assert (3L).mod(2I) === 1L
    assert (3L).mod(2L) === 1L
    
    }  // namespace
    

    Next, let's look at how we might shoehorn enough of this interface into each of several application languages to cover the functionality.

    Java

    // In Java, everything must be part of a “class”
    // so using a class or interface as a namespace is idiomatic.
    public interface ModdyBits {
      public static int mod(int x, int y) { … }
      public static long mod(long x, long y) { … }
    }
    
    // int and long box to Integer and Long so Number is a common
    // super-type, but we can't add methods to Number so there's no way,
    // in-general, to do dot syntax for pre-existing types.
    //class Number {
    //  public int mod(int y) { … }
    //  public long mod(long y) { … }
    //}
    
    // Java does primitive-type promotion, so the following works
    public class Main {
      public static void main(String... argv) {
        mod(3I, 2I);  // int  * int  version
        mod(3L, 2I);  // long * long version
        mod(3I, 2L);  // long * long version
        mod(3L, 2L);  // long * long version
      }
    }
    

    So Java is single dispatch, but with implicit type promotion. It supports overriding (modulo type-erased ambiguity), and dot syntax for user defined classes.

    There is an impedence mismatch between traits and Java intefaces though, again related to type erasure ambiguity.

    // Imagine a type that has the same trait with two different
    // parameterizations: Append[String] + Append[Int]
    
    interface Append<T> {
      void append(T x);
    }
    
    class MyType implements Append<String>, Append<Integer> {
      …
    }
    
    // This is disallowed because under the hood, Java generates
    // “bridge” methods.
    class MyType implements Append<String>, Append<Integer> {
    
      //// For Append<String>
      public void append(String x) { this.append((Object) x); }  // a bridge
      // Object is the type bound for <T>   ^^^^^^
      public void append(Object x) {
        /* code that implements append when T is String but with types erased */
      }
    
      //// For Append<Integer>
      public void append(Integer x) { this.append((Object) x); }  // a bridge
      public void append(Object x) {
        /* code that implements append when T is Integer but with types erased */
      }
    
      // Allowing two different parameterizations of an interface causes an
      // ambiguity in the type-erased methods that the JVM requires that
      // prevents the auto-generated bridge methods from properly routing
      // method calls.
      // TODO: link to JLS
    }
    
    // Alternatively, we might provide a view of the value for each
    // ambiguous trait.
    class MyType {
      public Append<Integer> asAppendInt() { … }
      public Append<String> asAppendString() { … }
    
      /* which use light-weight wrappers that name-mangle
       * to non-public APIs.
       * Issues:
       * 1. should an instance of MyType be hashCode/equals equivalent
       *    to these wrappers?
       * 2. no way for MyType to be == equivalent to these wrappers.
       * 3. can Java code get the underlying MyType from the wrapper?
       *    that might encourage conflating MyType with the wrapper
       *    instead of using them as short-lived special purpose values.
       */
    }
    

    JavaScript

    // For JavaScript, it's possible to define namespace objects.
    const ModdyBits = Object.freeze({ mod });
    // But where a namespace can correspond to an ES6 module, it's unnecessary.
    export mod;
    // because importers can import as a namespace
    import * as ModdyBits from '…';
    // or not
    import { mod } from '…';
    
    // There is no overloading in Java, so it's idiomatic to use
    // runtime code to inspect arguments.
    // We can generate a “cover” function that picks an
    // appropriate overloading and dispatches to its implementation.
    function mod(x, y) {
      if (isI32(x) && isI32(y)) {
        return mod$i32$i32(x, y);
      }
      // Generated code that tries to coerce, but fails if given
      // inconvertible arguments.
      let x$i64 = isI64(x) ? x : isI32(x) ? i32ToI64(x) : illegalArg(x);
      let y$i64 = isI64(y) ? y : isI32(y) ? i32ToI64(y) : illegalArg(y);
      return mod$i64$i64(x$i64, y$i64);
    }
    // Implementations that internal clients of mod might link
    // directly to.
    function mod$i32$i32(x, y) { … }
    function mod$i64$i64(x, y) { … }
    
    
    // For dot syntax, it's possible to similarly define a cover method.
    // For i32 and i64, there is a common super-type: the builtin Number.
    // (though it's not an ideal match for i64).  TODO: link bigint proposal.
    (() => {
      const method = function (y) {
        /* uses this and y to dispatch to variants */
       };
    
      // and it can be attached to existing types.
      Object.defineProperty(
        Number.prototype,
        'mod',
        { value: method });
    
      // Mutating built-in prototypes is considered bad practice.
    })();
    
    
    // There is no impedence mismatch (TODO double-check TypeScript) with
    // multiply parameterized traits.
    

    Go

    // The module system is sufficient for namespaces and is the only way.
    package moddy_bits   // TODO: check naming conventions, quoting
    
    import (
      "fmt"
    )
    
    /*
    func Mod(x int32, y int32) int32 {
      return x % y
    }
    func Mod(x int64, y int64) int64 {
      return x % y
    }
    // Doesn't work.  Go does not have overloading.
    // Declarations and scope says:
    // “No identifier may be declared twice in the same block”
    // Function declarations says:
    // “A function declaration binds an identifier, the function name, to a function.”
    */
    func Mod32(x int32, y int32) int32 {
      return x % y
    }
    func Mod64(x int64, y int64) int64 {
      return x % y
    }
    // Instead it's idiomatic to just have multiple functions
    // whose name embeds the type.
    // See Set* in the Rat (Rational number) type in
    // the bignum library.
    // The language designers promote alternatives to
    // embedding parameter types in function names too often.
    
    type Modable interface {
      // No member overloading either.
      // "each method must have a unique non-blank name"
      Mod32(y int32) int32
      Mod64(y int64) int64
    }
    /*
    func (x int32) Mod32(y int32) int32 { return Mod32(x, y) }
    func (x int32) Mod64(y int64) int64 { return Mod64(x, y) }
    func (x int64) Mod32(y int32) int32 { return Mod32(x, y) }
    func (x int64) Mod64(y int32) int32 { return Mod64(x, y) }
    */
    // None of this works because it's not possible to extend types
    // defined elsewhere like int32 or int32* with new behaviors.
    
    
    const L3 int64 = 3
    const L2 int64 = 2
    const I3 int32 = 3
    const I2 int32 = 2
    
    
    func main() {
      fmt.Printf("fn  %d %d %d %d",
          Mod32(I3, I2),
          Mod64(L3, L2),
          Mod64(int64(I3), L2),  // need to be explicit about conversion
          Mod64(L3, int64(I2)));
    }
    

    C++

    C++ is notable in that it has no common super-type for reference types. Other than that, it's pretty nice to generate code for.

    // In .h
    #include <cstdint>
    
    // In C++ it's best to declare a namespace even if you control
    // the header file name.
    namespace moddybits {
      // Overloading works
      int32_t mod(int32_t x, int32_t y);
      int64_t mod(int64_t x, int64_t y);
    
      // Zero-abstraction cost wrapper types with implicit constructors
      // can enable opt-in dot syntax for primitive types.
      class Modable32 {
        private: int32_t i;
        public:
        inline Modable32() : i(0) {}
        /* noexplicit */ inline Modable32(const int32_t x) : i(x) {}
        /* noexplicit */ inline Modable32(const Modable32 &x) : i(x.i) {}
    
        inline moddybits::Modable32 operator=(const int32_t& x) {
          this->i = x;
          return *this;
        }
    
        int32_t mod(int32_t y) const;
      };
    
    }  // namespace moddybits
    
    
    // In .cpp #include <stdio.h> namespace moddybits { int32_t mod(int32_t x, int32_t y) { return x % y; } int64_t mod(int64_t x, int64_t y) { return x % y; } int32_t Modable32::mod(int32_t y) const { return mod(this->i, y); } } int main() { int32_t i3 = 3; int32_t i2 = 2; int64_t l3 = 3; int64_t l2 = 2; printf("fn %d %ld %ld %ld\n", moddybits::mod(i3, i2), moddybits::mod(l3, l2), // Unlike Java, primitive promotion is not silent in presence of overloading. moddybits::mod((long) i3, l2), moddybits::mod(l3, (long) i2)); moddybits::Modable32 m3 = i3; printf("dot %d\n", m3.mod(i2)); return 0; }

    If traits were to correspond to type templates, C++ would have no problem allowing type definitions to have multiple, differently parameterized, realizations of the same trait.

    #include <stdio.h>
    #include <string>
    
    template <class T>
    class Append {
    public:
      virtual void append(const T& x) = 0;
    };
    
    class C: public Append<int>, public Append<std::string> {
    public:
      virtual void append(const int& x) {
        printf("appended int\n");
      }
    
      virtual void append(const std::string& x) {
        printf("appended string\n");
      }
    };
    
    
    #include "bar.h" int main() { C c; c.append(42); c.append("42"); return 0; }

    Summary

    Language Namespaces Overrides Overloads Open builtin types Trait Impedence Dynamic property syntax
    Java Required Yes. Modulo erasure Single static No No No
    JavaScript Possible, but modules preferable No Dynamic within a single function Yes Sort of Yes
    Go No No Single static No Sort of No
    C++ Possible. Good for statics. Yes Yes! No Yes Yes

    Conclusions

    It is not worth trying to create concepts in Temper like namespaces that directly mirror concepts in the output language. Even among a small subset of the languages we want to output there's nothing remotely resembling a lowest-common-denominator set of comparable interface elements.

    Simply specifying a function of a single argument that returns a single value requires some knowledge of the structure of the language. Dot access works best for some, static methods others, namespaces are required for some and discouraged for others.

    Instead, Temper will focus on making it easy to implement functionality using internal methods and explain to each backend how to export each required piece of functionality.

    API element
    A function, method, type, property, constant value or grouping of the same like an enum declaration.
    Exportable
    An API element that is simple enough that many backends can handle.
    Exported
    An exportable API element that is meant to be part of the public interface of a library generated by a backend.
    Feature Set
    A set of exported API elements with the same feature label. They should be so marked because they provide equivalent functionality. For example, in the strawman code the function fn mod(x: i32, y: i32): i32 and the method fn mod(this: i32, y: i32): i32 provide equivalent functionality. Backends may make different choices as to whether to export one, the other, or both, as long as they export at least one.
    Feature Label
    A label that defines a feature set. Any exportable API element annotated with a particular label is in the corresponding feature set. The label is diagnostic only, is not available to backends, so cannot be used to derive identifiers that appear in backend generated code.
    TODO: no standard API shape. Need to build in a lot of flexibility. TODO: define "feature labels". Annotate externalizable APIs with the feature labels it satisfies. For example: "can-mod-two-int32s", "can-mod-two-int64s" Backends responsibility to issue a fatal error, or externalize at least one API element from each distinct feature set. TODO: Externalizable functions may have restrictions on signature not present on other fns. TODO: This defines a perimiter where we can be more careful with inputs. For example, check that byte strings that represent text are well-formed UTF-8.

    Use case: Builders based on satisfiability

    Can we eliminate run-time parsers if we know DSL at compile-time? Goal: move dynamism to compile-time. TODO

    Concepts

    Thanks for reading!