Fyi, this is a design document for Temper, a programming language for high-assurance libraries that run inside anything.

What's in a function?

Crafting a syntax for functions based on least surprise, use cases, and parties' interests.

    Abstract

    Many languages seem to accumulate syntaxes for functions as they age. This document tries to identify a way for the Temper language to reserve some grammatical surface area for functions.

    Comparative Linguistics

    Looking at the breadth of syntax across a selection of languages may help clarify the design space.

    The table below results from an informal survey of a number of languages' grammars for specifying functions/methods/procedures. The languages were chosen because they have a large enough community or long enough history to have gone through multiple iterations, and because they have a C-like grammar (yes, even Python per GVR).

    The y-axis tries to group the syntaxes based on their effect on scopes, their expressiveness (whether recursion is easy), and how syntactic-sugary they are. It also explicitly includes a section on method references, which captures how a language that may not have made it easy to treat functions as first-class (cough Java) has evolved to enable that.

    In the cells, the yellow-backgrounded portion is the function declaration and the non-backgrounded portion distinguishes that case from others where the same syntactic construct has different meanings depending on context.

    Languages surveyed: Go, Java, JS / TS, Kotlin, Python, Rust, Scala

    As value
      Go:      func (x T) { return x }
      JS / TS: function (x) { return x }
      Kotlin:  fun (x: T) { return x }

    As value, may recurse
      JS / TS: (function f(x) { return +x && f(x/2); })

    Declaration
      Go:      func name(x T) T { return x }
      Java:    Function<T, T> f = (x) -> x;
      JS / TS: function f(x: T): T { return x; }
      Kotlin:  fun name(x: T) { return x; }
      Python:  def name(x): return x
      Rust:    fn name(x:T) -> T { x }
      Scala:   def name(x: T): T = x

    Method
      Go:      func (me *T) name() int { return me.x }
      Java:    class C { T name() { return this.x; } }
      JS / TS: class C { f() { return this.x } }
      Kotlin:  class C { fun name() { return this.x; } }
      Python:  class C: def name(self): return self.x
      Rust:    impl C { fn name(&self) -> T { self.x } }
      Scala:   class C { def name(): T { return this.x } }

    Typeless method
      JS / TS: ({ f() { return this.x; } })

    Succinct expression
      Java:    (T x) -> x
      JS / TS: (x) => x
      Kotlin:  { x -> x }
      Python:  lambda x: x
      Rust:    |x| x
      Scala:   (x: T) => x

    Succinct statement
      Java:    (x) -> { return x; }
      JS / TS: (x) => { return x }
      Rust:    |x| { x }
      Scala:   { (x: T) => return x }

    Implied formal names
      JS / TS: (function () { return arguments[0] + 1 })
      Kotlin:  { it + 1 }
      Scala:   (_ + _)

    Static Method reference T::f T.f T::f T.f T.f T.f
    Bound method o.f o::f o.f.bind(o) o::f o.f o.f

    From this table, we conclude:

    † Rust's closure syntax is not just syntactic sugar. Aspects of Rust's borrow checking benefit from varieties of closures that guarantee that the closure may be called at most once and/or that any call happens before the containing function exits.

    Interested Parties

    It's worth keeping in mind who has an interest in a language design decision. We spell this out, even where it's obvious.

    New learners

    Someone who is new to the language has an interest in their intuitions from other languages translating over.

    Code authors

    Code authors have an interest in syntax that is easy to enter. “λ” may be harder to enter than “lambda” on some keyboards or input devices.

    Code authors have an interest in syntax that lets them craft uncontroversial patches. Obscure, verbose syntax may require a reader to do more work to reach the same level of confidence in a patch's suitability.

    Code readers

    Code readers, including code maintainers, need to be able to quickly understand what code is intended to do, and what it actually does.

    Syntax that has higher cognitive load in common contexts may cause problems. For example, readers might have to do extra work to figure out where functions start and end when commas can appear both within a formal parameter list and between the function and its sibling arguments.

    # Python
    f(lambda a, b: a + b, lambda a, b: a * b)
    
    // Imaginary arrow syntax
    f(a, b => a + b, a, b => a * b)
    

    Tool authors

    Authors of simple tools may want to accurately find all function bodies, while ignoring details not relevant to their tool like whether the function definition declares a symbol in its containing scope or not.

    So tool authors have an interest in fewer syntactic variants, and in the simplicity of contextual rules that distinguish use cases like those in the y-axis of the table above.

    Tool authors also have an interest in measures that cause developers to converge on a single idiom for common tasks. JavaScript tools suffer additional complexity because they have to recognize both of the forms below as equivalent ways to define a function and bind it to a name:

    let f = () => 42
    function f() { return 42; }
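
    As a rough illustration of that cost, here is a deliberately naive, regex-based name finder. The function name findBoundFunctionNames and both patterns are hypothetical and not taken from any real tool. The sketch ignores comments, strings, and many other forms (function expressions assigned to variables, methods, async functions), each of which would need yet another pattern.

```javascript
// Naive sketch: find names that JavaScript source text binds to functions.
// Each idiom needs its own pattern; a single idiom would need only one.
const DECLARATION = /\bfunction\s+([A-Za-z_$][\w$]*)\s*\(/g;
const ARROW_BINDING =
  /\b(?:let|const|var)\s+([A-Za-z_$][\w$]*)\s*=\s*(?:\([^)]*\)|[A-Za-z_$][\w$]*)\s*=>/g;

function findBoundFunctionNames(source) {
  const names = [];
  for (const pattern of [DECLARATION, ARROW_BINDING]) {
    // matchAll works on a copy of the regex, so the shared patterns
    // can be reused across calls.
    for (const match of source.matchAll(pattern)) {
      names.push(match[1]);
    }
  }
  return names;
}

// Finds both f (arrow binding) and g (declaration).
console.log(findBoundFunctionNames('let f = () => 42\nfunction g() { return 42; }'));
```

    With one idiom for binding a function to a name, the second pattern and the loop over patterns would disappear.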
    

    Non-native and non-English speakers

    Using multiple, different, English-meaning-laden keywords increases the load on non-native English speakers and English learners.

    The difference between “function” and “functoin” may not be as apparent to someone not long schooled in English's complex spelling rules.

    Dimensions

    Declaration vs Expression Syntax

    Many languages use similar syntax for function declarations, which give the function a name in the enclosing scope, and function expressions, which do not.

    // JS: Name f declared in enclosing scope
     function f(x) { return x; }
    
    // JS: Do not declare a name.
    (function f(x) { return x; });
    (function  (x) { return x; });
    
    // JS: SyntaxError
     function  (x) { return x; } ;
    
    
    // Go: Name f declared in enclosing scope
    func f(x int) int { return x }
    
    // Go: Does not declare a name.
    func  (x int) int { return x }(1)
    

    It would be nice if a declaration were obviously a declaration regardless of whether it is a variable or function being declared.

    It would be nice if a function definition is obviously a function definition regardless of whether that function gets a name in the enclosing scope.

    If certain keywords, for example “let”, always start a declaration, and a different keyword, for example “fn” always precedes a function used as an expression, then we can encourage readable code patterns.

    // Let indicates a declaration.  The name follows the let.
    // The presence of a block makes it a declaration.
    let f(x) { return x; }
    
    // The keyword fn declares a value consistently.
    // This is an immediately called function.
    fn  (x) { return x; }(1);
    
    // If there's a name after `fn`, that name is available
    // for recursive use in the function body.
    // The name may also be useful for self-documentation / debugging.
    fn  f(x) { return x > 0 ? f(x-1) : 0; }(4);
    
    // SyntaxError: declarations require names.
    let  (x) { return x; }
    

    Returns Value

    Some functions are used for their side-effects (sometimes called “procedures”), some for a result, and some for a little of each.

    It'd be nice to have a short, succinct syntax for the result-only case, since return is longer than many simple expressions.

    // JS: Verbose
    (function (x, y) { return x/y + y/x; })
    
    // JS: Succinct
    (x, y) => x/y + y/x
    

    Sometimes an expression starts out small, and then grows a bit larger. It'd be nice to not have to switch to a different syntax for an expression that grows a bit.

    // JS: Verbose Before
    (function (x, y) { return x/y + y/x; })
    //             After
    (function (x, y) {
      let r = x/y;
      return r + 1/r;
    })
    
    // JS: Succinct Before
    (x, y) => x/y + y/x
    //              After
    (x, y) => {
      let r = x/y;
      return r + 1/r;
    //⬑Why do we need `return`?
    }
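
    The cost of switching syntaxes can be seen by running the variants side by side. This is a small sketch in plain JavaScript; the names succinct, grown, and forgotten are made up for illustration.

```javascript
// Expression-bodied arrow: the value of the expression is the result.
const succinct = (x, y) => x / y + y / x;

// Block-bodied arrow: `return` is required.
const grown = (x, y) => {
  const r = x / y;
  return r + 1 / r;
};

// Block-bodied arrow with `return` forgotten: the final expression
// statement is evaluated and discarded, so the call yields undefined.
const forgotten = (x, y) => {
  const r = x / y;
  r + 1 / r;
};

console.log(succinct(2, 1));  // 2.5
console.log(grown(2, 1));     // 2.5
console.log(forgotten(2, 1)); // undefined
```

    The third variant is the easy mistake to make when an expression-bodied arrow grows into a block.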
    

    Rust treats the final expression in a function body, when written without a trailing semicolon, as the function's result if the function type shows it has a return value.

    // Rust: returns value
    fn f() -> i32 {
      42   // ⬑return type
    }
    
    // Rust
    fn g() {  // No type.  Implicitly -> ()
      // With a unit return type there is nothing to return implicitly;
      // the trailing `;` discards the result of h.
      h(42);
    }
    

    We can adopt this convention to avoid dedicating syntax to succinct, for-value functions.

    // Explicit return type that is not void/unit
    fn () : int { 42 }
    
    // Return type inferred from expression.
    fn () { 42 }
    
    // Not returned.
    fn () : void { f() }
    

    Visual Containment

    As mentioned, function boundaries can be hard to identify if a function can lexically end in an arbitrary expression while nesting inside an expression.

    Ideally, a reader should be able to identify the end of a function using simple rules, and without knowing operator precedence tables, regardless of the expression in which it embeds.

    In non-succinct syntaxes, functions typically end in a close bracket (‘}’) or in the case of Python at the start of a line with lower indentation.

    // Keyword reliably starts function
    // ↓
       fn () { 42 }
    //            ↑
    //            Close bracket that matches open ends function
    

    This assumes good tools for pairing brackets in IDEs, syntax highlighters, and code review tools. We should invest in that.
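
    To give a sense of how little such a tool needs when functions always end at a matching close bracket, here is a minimal bracket-pairing sketch. The name findMatchingClose is hypothetical, and the sketch assumes brackets never appear inside string literals or comments, which a real tool would have to handle.

```javascript
// Sketch: scanning from the index of an opening bracket, return the index
// of its matching close bracket, or -1 if the brackets do not pair up.
// Assumption: brackets inside string literals and comments are not handled.
const PAIRS = new Map([['(', ')'], ['[', ']'], ['{', '}']]);
const CLOSERS = new Set(PAIRS.values());

function findMatchingClose(source, openIndex) {
  const expected = []; // stack of close brackets still owed
  for (let i = openIndex; i < source.length; i++) {
    const ch = source[i];
    if (PAIRS.has(ch)) {
      expected.push(PAIRS.get(ch));
    } else if (CLOSERS.has(ch)) {
      if (expected.pop() !== ch) return -1; // mismatched bracket
      if (expected.length === 0) return i;
    }
  }
  return -1; // ran out of input
}

const sample = 'f(fn { 42 }, fn { g(43) })';
const firstBody = sample.indexOf('{');
console.log(sample.slice(firstBody, findMatchingClose(sample, firstBody) + 1)); // { 42 }
```

    No operator precedence table is consulted; the keyword gives the start and the bracket stack gives the end.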

    Decorations

    Function declarations often carry other information:

    // Java allows an arbitrarily long soup of declaration stuff
    // before it becomes apparent what's being declared.
    public @Memoize final @NoIgnoreResult <T extends Object> String f(T x)[] {
      //                                  ↑Oh, cool!  It's a method declaration.
      return new String[] { String.valueOf(x) };
    }
    

    This syntax is brittle. It's the presence of a type parameter list (arrow above) or return type that lets the parser know that it's parsing a method declaration.

    Ideally, adding new keywords or other decorations would not make function syntax ambiguous with other syntaxes. Moving decorations after the keyword that starts a function declaration should solve this.

    
    let mut f() { … }
    //  ⬑keyword indicates declaration may be reassigned.
    //  This borrows Rust keywords for explanatory purposes.
    //  This document does not commit to declaration keywords.
    
    let @(memoize) g() { … }
    //  ⬑group annotations.
    //  This document does not define an annotation mechanism.
    //  It merely reserves syntactic space.
    
    let [T] h(T x) { … }
    //  ⬑formal type parameters.
    

    Boilerplate

    Some bits of function declarations are unnecessary. As discussed, we can drop the return keyword based on syntactic cues.

    The only part a function needs in order to function is a body. The leading keyword is needed for visual containment. It'd be nice if all other parts were optional.

    // Takes zero parameters
    fn () { 42 }
    
    // Takes zero parameters
    fn { 42 }
    

    Eliminating the () for a zero-parameter function is powerful because zero-parameter functions are an especially common case: they can encapsulate delayed actions and delayed computations.

    Both fn{42} and ()=>42 have length 6 so the syntactic overhead seems roughly comparable, at least in the zero-argument case.
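
    As a reminder of why zero-parameter functions are so common, here is a sketch of a delayed computation in JavaScript. The helper name lazy is hypothetical; it wraps a zero-parameter function and caches the result so the computation runs at most once.

```javascript
// Sketch: a zero-parameter function ("thunk") delays a computation until
// its value is needed. `lazy` also caches the result of the first call.
function lazy(thunk) {
  let done = false;
  let value;
  return () => {
    if (!done) {
      value = thunk();
      done = true;
    }
    return value;
  };
}

let calls = 0;
const answer = lazy(() => { calls += 1; return 6 * 7; });

console.log(calls);    // 0: nothing has run yet
console.log(answer()); // 42
console.log(answer()); // 42, served from the cache
console.log(calls);    // 1
```

    Under the proposed syntax, the argument to such a helper could presumably be written as fn { 6 * 7 }.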

    Recursion

    Recursion can be overdone, but most of the succinct function syntaxes seem to unnecessarily restrict recursion.

    It'd be nice if authors did not need to jump through extra hoops to define a recursive or self-referential function.

    If function syntax starts with a keyword, there's always room for a name, regardless of whether the function is part of a declaration.

    let f() { /* Code here may refer to f. */ }
    /* code here may refer to f. */
    
    g(fn h { /* Code here may refer to h. */ });
    /* h is not visible here. */
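
    JavaScript's named function expressions already scope names this way, so they make a convenient runnable analogy for the two cases above. The names countdown and declared are made up for this sketch.

```javascript
// A named function expression: the name `countdown` is bound only
// inside the function body, which is enough for recursion.
const steps = (function countdown(n) {
  return n > 0 ? 1 + countdown(n - 1) : 0;
})(4);

console.log(steps);            // 4
console.log(typeof countdown); // "undefined": the name did not leak

// A declaration, by contrast, binds its name in the enclosing scope.
function declared(n) {
  return n > 0 ? 1 + declared(n - 1) : 0;
}
console.log(typeof declared);  // "function"
```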
    

    Referenceability

    As seen in the table above, the prevailing syntax for referencing a static method is T.f where T is the containing type and f is the method name. There's more variance in bound method syntax, but the statistical mode seems to be o.f where o is the instance.

    It'd be nice to use this syntax, and it'd be nice if this syntax were stable should the language support overloading, reordering of actual parameters based on parameter names, variadic calls, etc.

    Consider a type declaration with an overloaded method:

    class C {
      let f()            :int { 0     }
      let f(x:int)       :int { x     }
      let f(x:int, y:int):int { x + y }
    }
    

    The type of new C().f may be represented as a union type since there are multiple variants: (() → int | int → int | (int × int) → int).

    One way of handling overloading is to define a dispatching function that receives a list of actual parameters and enough type and name metadata to dispatch to the matching overload.

    Ideally, most dynamic dispatch and bundling of arguments would be optimized out in practice. An optimizer could take advantage of internal type metadata attached to union types that maps subtypes to indices into a dispatch table so that call operations can avoid bundling arguments and skip the dispatch step when the actual argument list type matches exactly one variant. The internal type representation could look like

    (
     /*variant#0*/ () → int 
    |/*variant#1*/ int → int 
    |/*variant#2*/ (int × int) → int 
    )
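
    A dispatching function over those three variants might be sketched as follows. This is an illustration in JavaScript, not a description of how Temper implements dispatch: it assumes the argument count alone identifies the variant, and the names variants and f are chosen to mirror the class C example.

```javascript
// Sketch: one name bound to a dispatcher over three variants.
// Assumption: the number of actual parameters alone selects the variant.
const variants = [
  /* variant#0 */ () => 0,
  /* variant#1 */ (x) => x,
  /* variant#2 */ (x, y) => x + y,
];

function f(...args) {
  // The bundled argument list indexes into the dispatch table.
  const variant = variants[args.length];
  if (variant === undefined) {
    throw new TypeError(`no variant of f accepts ${args.length} arguments`);
  }
  return variant(...args);
}

console.log(f());     // 0
console.log(f(3));    // 3
console.log(f(3, 4)); // 7
```

    When a call site's argument types match exactly one variant, an optimizer could call that table entry directly and skip both the bundling into args and the lookup.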

    It'd be nice to reuse the machinery that enables method references to do pattern-based decomposition.

    fn f(0)      { /* special case code */ }
    ||  (i: int) { /* general case code */ }
    

    This establishes the precedence of the “fn” prefix operator as slightly lower than the infix “||” operator.

    A function declaration with multiple bodies may declare multiple functions behind the scenes, but the name binds to a function which inspects its arguments and dispatches based on signatures.
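
    One way to picture that behind-the-scenes function is the JavaScript sketch below. The helper multi and the { accepts, body } shape are hypothetical; the sketch tries the bodies in order and treats the pattern 0 as an equality test and the pattern (i: int) as an integer check.

```javascript
// Sketch: a function with multiple bodies, tried in declaration order.
function multi(...cases) {
  return (...args) => {
    for (const { accepts, body } of cases) {
      if (accepts(...args)) return body(...args);
    }
    throw new TypeError('no body matches the given arguments');
  };
}

// Roughly: fn (0) { special case } || (i: int) { general case }
const describe = multi(
  { accepts: (i) => i === 0, body: () => 'special case' },
  { accepts: (i) => Number.isInteger(i), body: (i) => `general case for ${i}` },
);

console.log(describe(0)); // special case
console.log(describe(5)); // general case for 5
```

    Ordering matters in this sketch: the special case must come first because the general pattern would also accept 0.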

    Flow of higher order functions

    In functional languages like OCaml, these two are equivalent.

    (* A function of two arguments *)
    fun a b -> a + b
    
    (* A function of one argument that returns a function of one argument. *)
    fun a -> fun b -> a + b
    

    Functions that return functions flow nicely using succinct syntax like that for JS and Java.

    // JS
    (a) => (b) => a + b
    
    // Java
    (int a) -> (int b) -> a + b
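
    A short runnable sketch of the two shapes in JavaScript follows. The helpers curry2 and uncurry2 are hypothetical names for the conversions between them.

```javascript
// Two-argument form.
const add = (a, b) => a + b;

// Curried form: a function of one argument that returns
// a function of one argument.
const addCurried = (a) => (b) => a + b;

// Hypothetical helpers converting between the two shapes.
const curry2 = (fn) => (a) => (b) => fn(a, b);
const uncurry2 = (fn) => (a, b) => fn(a)(b);

console.log(add(1, 2));                  // 3
console.log(addCurried(1)(2));           // 3
console.log(curry2(add)(1)(2));          // 3
console.log(uncurry2(addCurried)(1, 2)); // 3
```

    Unlike in OCaml, the two shapes are not interchangeable at JavaScript call sites: addCurried(1, 2) ignores the second argument and returns a function.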
    

    It'd be nice if higher-order functions flowed without nesting brackets. This is not the case for the proposed syntax.

    fn(a:int) { fn(b:int) { a + b } }
    // Nesting apparent at end    ↑ ↑
    

    Nice visual flow may be reconcilable with visual containment via a syntax like

    fn (a:int) (b:int) { a + b }
    

    If this is done, it's merely a convenient syntax for a special case. It's not possible to reconcile this while allowing any of the member functions to have variants via “||” so there is no non-lexically nested syntax for:

    fn (0)     { fn (0) { 0 } || (b:int) { b }     }
    || (a:int) { fn (0) { a } || (b:int) { a + b } }
    

    That would require mandating that there is at most one implied argument list () (see boilerplate), and never when there is an explicit argument list. This document reserves enough syntax to realize that but does not advocate it.

    Decisions

    This document makes no decision on:

    Examples

    Some function values:

    fn { 42 }            // Zero parameters returns 42
    fn () { 42 }         // Ditto
    fn:int { 42 }        // Explicit return type
    fn (): int { 42 }    // Ditto
    fn (x:int):int { x } // Int identity
    fn [T] (x:T):T { x } // Parameterized identity function with every bell and whistle
    

    If instead of “fn” a declaration keyword is used, then it's a declaration.

    let fortyTwo { 42 }
    // equivalent to
    //   let fortyTwo = fn { 42 };
    // outside a method declaration context.
    const ident [T] (x:T):T { x }
    // Assuming `const` is a declaration keyword, this is
    // equivalent to
    //   const ident = fn [T] (x:T):T { x };
    // outside a method declaration context.
    

    The grammar allows for multiple function bodies.

    let f(x: int)        { … }
    ||   (x: int, y:int) { … }
    

    Since “_” is disallowed as a declaration name, we could, in the future, support unnamed positional parameters like Scala's { _ + _ } via a syntax like

    fn:int { _ + _ }  // has arity 2
    

    Curly brackets are always present, so the cognitive load is low when anonymous functions appear in a comma-separated sequence, as in

    f(fn { 42 }, fn { 43 })
    
    Thanks for reading!