Output Grammars¶
How this fits in¶
The Frontend delivers Temper ASTs to backends which are responsible for turning those into files that make up a library in their target language.
For example, a Java backend gets a module set and needs to produce one or more Jar files because that is how Java libraries are consumed.
An output grammar (or out-grammar) works between the Temper frontend and its backends.
Specifically, they make it easy for a backend to construct a target language AST, for example, a tree of Java code, that can then be formatted to a stream of Java tokens and hence to a Java source files with associated source maps.
As such, output grammars help define both: - The Layered ASTs that most backends consume as it is simpler to work on than Lispy ASTs. - The Target language ASTs that most backends produce.
Each output grammar defines two kinds of nodes:
- AST nodes that form a syntax tree for the target language
- Data nodes that allow persisting information about a translation so that it can be used for later translation of dependencies.
For both kinds of nodes, there are also syntax declarations that control how nodes are converted to sequences of OutputTokens.
It's possible to derive one kind of node from another. For example, define "type" and/or "name" node types for use as persisted declaration metadata, and then auto-derive AST nodes to represent the same in the translated output.
Tips for maintainers¶
$ gradle kcodegen:updateGeneratedCode
will run code generators including the output grammar processor which looks under be*/**/*.out-grammar files and converts them each to a Kotlin file.
Output grammar file format by example¶
Output grammar files should end with .out-grammar so that they are
automatically found by the code generator task.
All token conventions are the same as those for the Temper language including string and C-style comment syntax.
```temper inert // This is a comment
### Metadata pairs
Metadata pairs can be specified via `let`
```temper inert
let namespaceName = "Foo";
C-style escapes like \n work inside quote strings.
The value for namespaceName is used to create a Kotlin object that holds the
generated node classes. In this case, Kotlin code could import Foo and then
refer to node types thus: Foo.MyNodeType.
```temper inert let imports = """ "lang.temper.foo.Bar "lang.temper.foo.Baz ;
similarly defines extra Kotlin `import` statements that should be part of the
file. This comes in handy when referring to names in Kotlin code blocks.
### Syntax Declarations
The `::=` operator allows defining syntax for a node type.
```temper inert
Program ::= topLevels%TopLevel*"\n";
This says that the Program node type consists of zero or more TopLevel nodes, collected in a property named topLevels.
topLevels%TopLevelis a declaration of a property namedtopLevelswhose type isTopLevels- the
*is a Kleene star, and means zero-or-more as in regular expression syntax - the
"\n"after the star is the joiner string. When formatting a Program, the formatter will insert a newline between each pair of adjacent TopLevels.
After processing this, the generated Kotlin code will have something like
object Foo { // From namespaceName
// Super-type for all node types
sealed interface Tree ...
sealed class BaseTree ...
class Program(
override val pos: Position,
topLevels: Iterable<TopLevel>
) : BaseTree(pos) {
override val formatString get() = "{{0*:\n}}"
...
}
...
}
Program corresponds to a class, not an interface because it has no
sub-types defined.
Basics of Node Types¶
As can be seen above, each node takes file Position metadata as its first argument.
Lists of children like topLevels are defensively, shallowly copied by the generated constructor.
Each AST node type also gets a structural .equals and .hashCode override
so that nodes compare structurally ignoring position metadata. Use === to
compare for reference equality. Data node types get structural equality because
their generated classes are
Kotlin data classes.
``temper inert
Foo(); // Declares an AST node type
data BarData(); // Thedata` keyword declares it as a data node type
data FooData from Foo; // Derives a data node type from the Foo node type
### Auto-parenthesizing
In many C-like languages, it's important to insert parentheses when, for
example, embedding a `+` operation inside a `/` operation.
```temper inert
AssignmentPattern.operatorDefinition = `JsOperatorDefinition.Eq`;
allows specifying the operatorDefinition for a node type.
The right-hand side is in back-ticks because it is direct Kotlin code that should, in this case, refer to a sub-type of OperatorDefinition which is used by CodeFormatter to decide when to parenthesize.
TODO: relate parenthesizing to <namespaceName>FormattingHints.
Property definitions¶
Property definitions are of the form
```temper inert propertyName%PropertyType
In the definition of *MyNodeType* it means that *MyNodeType* has a property
named `propertyName` and that its type is *PropertyType*.
Inside a syntax definition (following `::=`) it also contributes a use of that
property.
- if in a condition (left of `=>`, see below) it means the property's value is
being compared to the desired value.
- otherwise when generating the token stream, the property's values
tokens should appear there.
### Property counts
Defining a property `propertyName%PropertyType` tells us that the containing
node type needs that property and tells us about the type, but it doesn't give
enough information to define a Kotlin type for that property.
We infer a *property count* based on context.
```temper inert
Program ::= topLevels%TopLevel*"\n";
Here we know that topLevels contains multiple TopLevel nodes, so the
corresponding Kotlin type will be MutableList<TopLevel>.
```temper inert ReturnStatement ::= "return" & (expression%Expression || ()) & ";";
Here we have a definition of a *ReturnStatement* node type which uses `||` to
represent two possible syntaxes
```js
// return with an expression
return 42;
// return without an expression
return;
Breaking down (expression%Expression || ())
- the parentheses
(...)group since||is lower-precedence than&(concatenation) expression%Expressiondefines and uses a propertyexpression- Since not all paths through the
||useexpression, there is an implicit condition on it, and the left only activates whenexpressionis truthy. ()is shorthand for the empty token list.
Since there is a path through the syntax format that does not use expression
and no path that uses it with *, the Kotlin type will be nullable:
Expression?.
Finally, sometimes a property is required for formatting and always has one node.
```temper inert LetDeclaration ::= "let" & name%Identifier & ";";
Here, every path through the syntax declaration uses `name` once, so its
corresponding Kotlin type is just *Identifier*.
### Conditions
In the *ReturnStatement* example above, the implicit condition on `expression`
could have been written using explicit condition syntax:
```temper inert
((expression => expression%Expression) || ())
There are several kinds of conditions:
Truthiness checks. Checking that a property holds a node like expression =>.
Enum value checks. Checking that a property's value is a particular enumerated value.
```temper inert // When propertyName is elsewhere declared to have a type that is an enum type propertyName == MemberName => // When the enum type is explicit propertyName == EnumName.MemberName => // When the property is immediately declared with an enum type propertyName%EnumName == MemberName =>
Explicit boolean checks.
```temper inert
propertyName == true =>
If there is no other declaration for propertyName its type will be inferred
based on comparison to an enum value or a boolean.
Enum declarations¶
An enum can be declared thus:
```temper inert enum DeclarationKind = Let | Const;
which declares an enum type *DeclarationKind* with two members: *Let* and
*Const*. This corresponds to Kotlin code like:
```kotlin
enum class DeclarationKind { Let, Const; }
Syntax Operators¶
The full set of syntax operators are:
| Operator | Meaning | Example | Restrictions |
|---|---|---|---|
"text" |
Literal token | "return" (expression || ()) |
|
& |
Concatenation of tokens | "foo" & "bar" |
|
() |
Empty string | () |
|
(...) |
Group other operators | (isAsync => "async") || () |
|
* |
Zero or more joined with text | property*", " |
Left must be property, right string |
+ |
One or more joined with text | property+", " |
Ditto |
|| |
Alternation. Try the left, otherwise the right | (optionalThing || otherOptionalThing) |
Rightmost in chain must always succeed |
=> |
Condition | isAsync==true => "async" |
Make no sense outside || |
`expr` |
Custom token expression | `SpecialTokens.indent` |
Kotlin code with type OutputToken |
The & operator concatenates token lists, not strings, so "foo" & "bar"
corresponds to the tokens
foo bar
not the single token
foobar
Grouping node types¶
Some node types are umbrella terms. Rather than having their own syntax definitions, they serve to group other node types that may be syntactically quite distinct, but which are used in the same way.
```temper inert Statement = ExpressionStatement | BlockStatement | Declaration | IfStatement | WhileStatement | ThrowStatement | ReturnStatement | etc ;
This specifies that *Statement* is a node type and a super-type for
*ExpressionStatement*, *BlockStatement*, *Declaration*, and the others
mentioned.
So the generated Kotlin code will contain something like
```kotlin
sealed interface Statement : Tree
class ExpressionStatement(...) : BaseTree(pos), Statement
The super-types do not have to form an inheritance tree. One node type can be in multiple groupings, multiple parts of speech.
```temper inert // Expressions can be evaluated to a value. Expression = Identifier | etc ;
// Patterns can be assigned to. Pattern = Identifier | etc ;
because one way to represent `foo = bar` is by using two *Identifiers*, one of
which is read from, and one of which is written to.
The corresponding Kotlin code looks like
```kotlin
sealed interface Expression : Tree
sealed interface Pattern : Tree
class Identifier(...) : BaseTree, Expression, Pattern { ... }
You can also declare a super-interface using implements, and that allows
specifying a Kotlin type as the super type.
``temper inert
MyNodeType extendsSomeKotlinInterface`;
If you declare a Kotlin supertype you may need to explicitly override properties
or methods.
```temper inert
override MyNodeType.someProp = `implemention in Kotlin`;
Extra properties¶
Sometimes its useful to define properties outside the syntax specification, possibly abstract properties in a grouping node type that does not have its own syntax.
This includes properties that don't hold nodes as mentioned in conditions.
```temper inert isAsync==true => "async"
but may also include properties that do hold nodes.
You can also declare properties outside a `::=` declaration thus:
```temper inert
Identifier(name%`JsIdentifierName`, sourceIdentifier%`TemperName?`) {
`
init {
require(name.text !in jsReservedWords) { "\$pos: \`\${name.text}\` is a JS reserved word" }
}
`
};
This specifies:
- Identifier is a node type
- It has a property named name whose type is JsIdentifierName. The
backticks around the type mean that it is a reference to a type defined in
Kotlin, not the grammar.
- It has another property named sourceIdentifier, also with a direct Kotlin
type.
- The {...} means the body of the Kotlin class definition should include some
extra code, in this case which checks the invariant that an identifier cannot
be a reserved word.
If the property type does not use backticks it must correspond to a node or enum type.
If the extra code block wasn't desired, this could have been shortened by dropping the curlies as in
``temper inert
NumericLiteral(value%Number`);
If you want to specify a type of *NodeType* that is optional or that can appear
many times, you can use `?` and `*` as in:
```temper inert
Foo(
optionalProperty%Bar?,
repeatedProperty%Baz*"",
);
If using a Kotlin type, just specify Iterable<T> or T? inside the
back-quotes.
Default property values¶
Especially for properties that hold boolean or enum values, there's often a sensible default so that every line of code that creates a node of that type doesn't need to specify every bell & whistle.
``temper inert
Function.async.default =false`;
means that the default expression for node type *Function*'s property named
`async` is `false`.
The default expression is direct Kotlin code, so it's in backticks.
### Custom computed properties
Sometimes it's useful to have a computed property in your node type.
In a nested `if` statement, we might not want to put curly brackets around the
`else` clause if it is also an `if` statement to produce output like
```js
if (a) {
...
} else if (b) {
...
} else {
...
}
instead of
if (a) {
...
} else {
if (b) {
...
} else {
...
}
}
We can do this with conditions involving computed properties. First, define the syntax:
```temper inert IfStatement ::= "if" & "(" & test%Expression & ")" & "{" & "\n" & consequent%Statement & "\n" & "}" & ( (isElseIf==true => "else" & alternate%Statement) || (hasElse==true => "else" & "{" & "\n" & alternate & "\n" & "}") || ());
then augment it with overrides for the properties, so that they are not treated
as things to be passed into the constructor.
```temper inert
IfStatement.isElseIf = `alternate is IfStatement`;
IfStatement.hasElse = `
alternate != null &&
(alternate !is BlockStatement || alternate?.childCount != 0)
`;
The NodeName.propertyName=KotlinCode; syntax specifies an override,
where the right-hand side is an on-demand evaluated expression.
If the type of the computed property is not known, you can specify it explicitly.
``temper inert
IfStatement.isElseIf%Boolean=alternate is IfStatement`;
These correspond to Kotlin code like
```kotlin
class IfStatement(...) : BaseTree, Statement, ... {
...
val isElseIf: Boolean
get() = alternate is IfStatement
val hasElse: Boolean
get() =
alternate != null &&
(alternate !is BlockStatement || alternate?.childCount != 0)
}
(The property types of Boolean are inferred based on comparison to true in
comparisons in the format string)
Properties that are computing in a grouping node type may be overridden in sub-types to be constructor properties via extra property syntax:
```temper inert SubType(propertyName%PropertyType);
### Overriding properties
You can have a computed property in a super-type, and then override it to be
passed in as a constructor property in a sub-type.
```temper inert
override SubType.propertyName;
Now, even if SubType inherits propertyName's definition from one or more super-types, we only consider definitions local to SubType when deciding whether propertyName is computed or supplied.
Formatting grammar leaves¶
The simplest node types often are the hardest to format since they're not formatted by splicing together token lists from other node types.
``temper inert
// Define a node type that holds a string
StringLiteral(value%String);
// Override so that rendering doesn't use a format string, and instead
// the typeimplements TokenSerializable.
StringLiteral.renderTo =tokenSink.emit(
OutputToken(
stringTokenText(value),
OutputTokenType.QuotedValue
)
)
`;
### Type derivations
It can be useful to derive a data node type from an AST node type or vice versa.
```temper inert
data FooData from Foo;
That specifies that there is a data node type, FooData, derived from the Foo node type. And that FooData is the data node type that corresponds to Foo.
```temper inert data FooData =~ Foo;
That specifies only that *FooData* is the data node type that *corresponds* to
*Foo*, and does not imply any derivation.
When a node type *Target* is derived from a node type *Source*, we copy some
information over:
All local properties are copied over. So if *Source* has a property *p* of type
*T*, and *Target* does not declare such a local property, *Target* will get a
*p* whose type is the corresponding type of *T*. For example, if *T* is a tree
node type and *Target* is a data node type, then *Target.p* might have type
*TData*, the data node type that corresponds to *T*.
If *Source* has a subtype list like `Source = Sub1 | Sub2 | Sub3;` then
*Target* will get corresponding sub-types as if
`Target = Sub1Data | Sub2Data | Sub3Data;` were declared.
Some of this copying requires creating *implied* corresponding types by deriving
them from existing types. For example, for the sub-type list above, if
*Sub1Data* were not declared, it would be auto-derived as a data node type from
the *Sub1* tree node type.
Auto-derivation is never performed if there is a corresponding type as declared
using a `from` or `=~` declaration as shown above.
Auto-deriving requires using a naming convention:
- When auto-deriving a data node type from a tree node type, the tree node type
has "Tree" removed if it is a suffixm and then "Data" is appended to the end.
- When auto-deriving a tree node type from a data node type, any "Data" suffix
is removed from the end.
If *Source* has a super type, including transitively, that has a corresponding
type then *Target* will have that corresponding type as a super type. Node
types are never auto-derived just to provide a corresponding super-type.
### Extra Kotlin Code
Sometimes one needs to specify extra Kotlin code, for example to define a static
table that relates operator strings to definitions in an operator precedence
table:
```temper inert
UpdateExpression {
`
companion object {
val updateOperator = mapOf(
"++" to listOf(JsOperatorDefinition.PreIncr, JsOperatorDefinition.PostIncr),
"--" to listOf(JsOperatorDefinition.PreDecr, JsOperatorDefinition.PostDecr)
)
}
`
};
This is similar to extra property syntax but dropping the parentheses since this contributes no extra properties.