Conventions¶
WebAssembly is a programming language that has multiple concrete representations (its binary format and the text format). Both map to a common structure. For conciseness, this structure is described in the form of an abstract syntax. All parts of this specification are defined in terms of this abstract syntax.
Grammar Notation¶
The following conventions are adopted in defining grammar rules for abstract syntax.
Terminal symbols (atoms) are written in sans-serif font or in symbolic form: \(\mathsf{i{\scriptstyle 32}}\), \(\mathsf{nop}\), \(\rightarrow\), \({}[ , ]\).
Nonterminal symbols are written in italic font: \({\href{../syntax/types.html#syntax-valtype}{\mathit{valtype}}}\), \({\href{../syntax/instructions.html#syntax-instr}{\mathit{instr}}}\).
\({A^{n}}\) is a sequence of \(n \geq 0\) iterations of \(A\).
\({A^\ast}\) is a possibly empty sequence of iterations of \(A\). (This is a shorthand for \({A^{n}}\) used where \(n\) is not relevant.)
\({A^{+}}\) is a non-empty sequence of iterations of \(A\). (This is a shorthand for \({A^{n}}\) where \(n \geq 1\).)
\({A^?}\) is an optional occurrence of \(A\). (This is a shorthand for \({A^{n}}\) where \(n \leq 1\).)
Productions are written \(\begin{array}[t]{@{}l@{}r@{~}r@{~}l@{}l@{}} & {\mathit{sym}} & ::= & A_1 ~|~ \ldots ~|~ A_n ~|~ \\ \end{array}\).
Large productions may be split into multiple definitions, indicated by ending the first one with explicit ellipses, \(\begin{array}[t]{@{}l@{}r@{~}r@{~}l@{}l@{}} & {\mathit{sym}} & ::= & A_1 ~|~ \dots ~|~ \\ \end{array}\), and starting continuations with ellipses, \(\begin{array}[t]{@{}l@{}r@{~}r@{~}l@{}l@{}} & {\mathit{sym}} & ::= & \dots ~|~ A_2 ~|~ \\ \end{array}\).
Some productions are augmented with side conditions, “\((\mathrel{\mbox{if}} \mathit{condition})\)”, that provide a shorthand for a combinatorial expansion of the production into many separate cases.
If the same meta variable or non-terminal symbol appears multiple times in a production, then all those occurrences must have the same instantiation. (This is a shorthand for a side condition requiring multiple different variables to be equal.)
Auxiliary Notation¶
When dealing with syntactic constructs the following notation is also used:
\(\epsilon\) denotes the empty sequence.
\({|s|}\) denotes the length of a sequence \(s\).
\(s{}[i]\) denotes the \(i\)-th element of a sequence \(s\), starting from \(0\).
\(s{}[i : n]\) denotes the sub-sequence \(s{}[i] \ldots s{}[i + n - 1]\) of a sequence \(s\).
\(s{}[{}[i] = A]\) denotes the same sequence as \(s\), except that the \(i\)-th element is replaced with \(A\).
\(s{}[{}[i : n] = {A^{n}}]\) denotes the same sequence as \(s\), except that the sub-sequence \(s{}[i : n]\) is replaced with \({A^{n}}\).
\(s_1 \oplus s_2\) denotes the sequence \(s_1\) concatenated with \(s_2\); this is equivalent to \(s_1~s_2\), but used for clarity.
\({\bigoplus}\, {s^\ast}\) denotes the flat sequence formed by concatenating all sequences \(s_i\) in \({s^\ast}\).
\(A \in s\) denotes that \(A\) is contained in the sequence \(s\), that is, \(s\) is of the form \(s_1~A~s_2\) for some sequences \(s_1\), \(s_2\).
Moreover, the following conventions are employed:
The notation \({x^{n}}\), where \(x\) is a non-terminal symbol, is treated as a meta variable ranging over respective sequences of \(x\) (similarly for \({x^\ast}\), \({x^{+}}\), \({x^?}\)).
When given a sequence \({x^{n}}\), then the occurrences of \(x\) in an iterated sequence \({( \ldots x \ldots )^{n}}\) are assumed to be in point-wise correspondence with \({x^{n}}\) (similarly for \({x^\ast}\), \({x^{+}}\), \({x^?}\)). This implicitly expresses a form of mapping syntactic constructions over a sequence.
Productions of the following form are interpreted as records that map a fixed set of fields \({\mathsf{field}}_{i}\) to “values” \(A_i\), respectively:
The following notation is adopted for manipulating such records:
Where the type of a record is clear from context, empty fields with value \(\epsilon\) are often omitted.
\(r{.}\mathsf{field}\) denotes the contents of the \(\mathsf{field}\) component of \(r\).
\(r{}[{.}\mathsf{field} = A]\) denotes the same record as \(r\), except that the value of the \(\mathsf{field}\) component is replaced with \(A\).
\(r{}[{.}\mathsf{field} \mathrel{{=}{\oplus}} {A^\ast}]\) denotes the same record as \(r\), except that \({A^\ast}\) is appended to the sequence value of the \(\mathsf{field}\) component, that is, it is short for \(r{}[{.}\mathsf{field} = r{.}\mathsf{field} \oplus {A^\ast}]\).
\(r_1 \oplus r_2\) denotes the composition of two identically shaped records by concatenating each field of sequences point-wise:
\[\{ {\mathsf{field}}_{1}\,{A_1^\ast}, {\mathsf{field}}_{2}\,{A_2^\ast}, \ldots \} \oplus \{ {\mathsf{field}}_{1}\,{B_1^\ast}, {\mathsf{field}}_{2}\,{B_2^\ast}, \ldots \} = \{ {\mathsf{field}}_{1}\,({A_1^\ast} \oplus {B_1^\ast}), {\mathsf{field}}_{2}\,({A_2^\ast} \oplus {B_2^\ast}), \ldots \}\]\({\bigoplus}\, {r^\ast}\) denotes the composition of a sequence of records, respectively; if the sequence is empty, then all fields of the resulting record are empty.
The update notation for sequences and records generalizes recursively to nested components accessed by “paths” \(\begin{array}[t]{@{}l@{}r@{~}r@{~}l@{}l@{}} & {\mathit{pth}} & ::= & {({}[ i ]~\mid~{.}\mathsf{field})^{+}} ~|~ \\ \end{array}\):
\(s{}[{{}[ i ]}{{\mathit{pth}}} = A]\) is short for \(s{}[{}[i] = s{}[i]{}[{\mathit{pth}} = A]]\),
\(r{}[{.}\mathsf{field}~{\mathit{pth}} = A]\) is short for \(r{}[{.}\mathsf{field} = r{.}\mathsf{field}{}[{\mathit{pth}} = A]]\).
Lists¶
Lists are bounded sequences of the form \({A^{n}}\) (or \({A^\ast}\)), where the \(A\) can either be values or complex constructions. A list can have at most \({2^{32}} - 1\) elements.