Type SoundnessΒΆ

The type system of WebAssembly is sound, implying both type safety and memory safety with respect to the WebAssembly semantics. For example:

  • All types declared and derived during validation are respected at run time; e.g., every local or global variable will only contain type-correct values, every instruction will only be applied to operands of the expected type, and every function invocation always evaluates to a result of the right type (if it does not diverge, throw an exception, or trap).

  • No memory location will be read or written except those explicitly defined by the program, i.e., as a local, a global, an element in a table, or a location within a linear memory.

  • There is no undefined behavior, i.e., the execution rules cover all possible cases that can occur in a valid program, and the rules are mutually consistent.

Soundness also is instrumental in ensuring additional properties, most notably, encapsulation of function and module scopes: no locals can be accessed outside their own function and no module components can be accessed outside their own module unless they are explicitly exported or imported.

The typing rules defining WebAssembly validation only cover the static components of a WebAssembly program. In order to state and prove soundness precisely, the typing rules must be extended to the dynamic components of the abstract runtime, that is, the store, configurations, and administrative instructions. [1]

ContextsΒΆ

In order to check rolled up recursive types, the context is locally extended with an additional component that records the sub type corresponding to each recursive type index within the current recursive type:

𝐢::={ β€¦,π—‹π–Ύπ–Όπ—Œ subtypeβˆ— }

TypesΒΆ

Well-formedness for extended type forms is defined as follows.

Heap Type π–»π—ˆπ—ΒΆ

  • The heap type is valid.

Heap Type 𝗋𝖾𝖼 π‘–ΒΆ

Value Type π–»π—ˆπ—ΒΆ

  • The value type is valid.

Recursive Types 𝗋𝖾𝖼 subtypeβˆ—ΒΆ

πΆβŠ’π—‹π–Ύπ–Ό πœ–:π—ˆπ—„β‘(π‘₯,𝑖)𝐢⊒subtype:π—ˆπ—„β‘(π‘₯,𝑖)πΆβŠ’π—‹π–Ύπ–Ό subtypeβ€²βˆ—:π—ˆπ—„β‘(π‘₯+1,𝑖+1)πΆβŠ’π—‹π–Ύπ–Ό subtype subtypeβ€²βˆ—:π—ˆπ—„β‘(π‘₯,𝑖)

Note

These rules are a generalisation of the ones previously given.

Sub types π—Œπ—Žπ–» π–Ώπ—‚𝗇𝖺𝗅? htβˆ— comptypeΒΆ

|htβˆ—|≀1(htβ‰Ίπ‘₯,𝑖)βˆ—(unroll𝐢⁒(ht)=π—Œπ—Žπ–» htβ€²βˆ— comptypeβ€²)βˆ—πΆβŠ’comptype:π—ˆπ—„(𝐢⊒comptype≀comptypeβ€²)βˆ—πΆβŠ’π—Œπ—Žπ–» π–Ώπ—‚𝗇𝖺𝗅? htβˆ— comptype:π—ˆπ—„β‘(π‘₯,𝑖)

where:

(deftypeβ‰Ίπ‘₯,𝑖)=true(𝑦≺π‘₯,𝑖)=𝑦<π‘₯(𝗋𝖾𝖼 π‘—β‰Ίπ‘₯,𝑖)=𝑗<𝑖[2⁒𝑒⁒π‘₯]⁒unroll𝐢⁑(deftype)=unroll⁑(deftype)unroll𝐢⁑(𝑦)=unroll⁑(𝐢.π—π—’π—‰π–Ύπ—Œβ‘[𝑦])unroll𝐢⁑(𝗋𝖾𝖼 π‘—)=𝐢.π—‹π–Ύπ–Όπ—Œβ‘[𝑗]

Note

This rule is a generalisation of the ones previously given, which only allowed type indices as supertypes.

Defined types rectype.𝑖¢

The defined type (rectype . π‘–) is valid if:

𝐢⊒rectype:π—ˆπ—„β‘(π‘₯)rectype=𝗋𝖾𝖼 subtype𝑛𝑖<π‘›πΆβŠ’rectype . π‘–:π—ˆπ—„

SubtypingΒΆ

In a rolled-up recursive type, a recursive type indices 𝗋𝖾𝖼 π‘– matches another heap type ht if:

Note

This rule is only invoked when checking validity of rolled-up recursive types.

ResultsΒΆ

Results can be classified by result types as follows.

Results valβˆ—ΒΆ

  • For each value val𝑖 in valβˆ—:

  • Let π‘‘βˆ— be the concatenation of all 𝑑𝑖.

  • Then the result is valid with result type [π‘‘βˆ—].

(π‘†βŠ’val:𝑑)βˆ—π‘†βŠ’valβˆ—:[π‘‘βˆ—]

Results (𝗋𝖾𝖿.𝖾𝗑𝗇 π‘Ž) π—π—π—‹π—ˆπ—_𝗋𝖾𝖿¢

Results 𝗍𝗋𝖺𝗉¢

⊒[π‘‘βˆ—]:π—ˆπ—„π‘†βŠ’π—π—‹π–Ίπ—‰:[π‘‘βˆ—]

Store ValidityΒΆ

The following typing rules specify when a runtime store 𝑆 is valid. A valid store must consist of tag, global, memory, table, function, data, element, structure, array, exception, and module instances that are themselves valid, relative to 𝑆.

To that end, each kind of instance is classified by a respective tag, global, memory, table, function, or element, type, or just π—ˆπ—„ in the case of data structures, arrays, or exceptions. Module instances are classified by module contexts, which are regular contexts repurposed as module types describing the index spaces defined by a module.

Store 𝑆¢

 (π‘†βŠ’taginst:tagtype)βˆ—(π‘†βŠ’globalinst:globaltype)βˆ—(π‘†βŠ’meminst:memtype)βˆ—(π‘†βŠ’tableinst:tabletype)βˆ—(π‘†βŠ’funcinst:deftype)βˆ—(π‘†βŠ’datainst:π—ˆπ—„)βˆ—(π‘†βŠ’eleminst:reftype)βˆ—(π‘†βŠ’structinst:π—ˆπ—„)βˆ—(π‘†βŠ’arrayinst:π—ˆπ—„)βˆ—(π‘†βŠ’exninst:π—ˆπ—„)βˆ—π‘†={π—π–Ίπ—€π—Œ taginstβˆ—,π—€π—…π—ˆπ–»π–Ίπ—…π—Œ globalinstβˆ—,π—†π–Ύπ—†π—Œ meminstβˆ—,π—π–Ίπ–»π—…π–Ύπ—Œ tableinstβˆ—,π–Ώπ—Žπ—‡π–Όπ—Œ funcinstβˆ—,π–½π–Ίπ—π–Ίπ—Œ datainstβˆ—,π–Ύπ—…π–Ύπ—†π—Œ eleminstβˆ—,π—Œπ—π—‹π—Žπ–Όπ—π—Œ structinstβˆ—,π–Ίπ—‹π—‹π–Ίπ—’π—Œ arrayinstβˆ—,π–Ύπ—‘π—‡π—Œ exninstβˆ—}(𝑆.π—Œπ—π—‹π—Žπ–Όπ—π—Œβ‘[π‘Žs]=structinst)βˆ—((𝗋𝖾𝖿.π—Œπ—π—‹π—Žπ–Όπ— π‘Žs)≫̸+𝑆(𝗋𝖾𝖿.π—Œπ—π—‹π—Žπ–Όπ— π‘Žs))βˆ—(𝑆.π–Ίπ—‹π—‹π–Ίπ—’π—Œβ‘[π‘Ža]=arrayinst)βˆ—((𝗋𝖾𝖿.𝖺𝗋𝗋𝖺𝗒 π‘Ža)≫̸+𝑆(𝗋𝖾𝖿.𝖺𝗋𝗋𝖺𝗒 π‘Ža))βˆ—(𝑆.π–Ύπ—‘π—‡π—Œβ‘[π‘Že]=exninst)βˆ—((𝗋𝖾𝖿.𝖾𝗑𝗇 π‘Že)≫̸+𝑆(𝗋𝖾𝖿.𝖾𝗑𝗇 π‘Že))βˆ—βŠ’π‘†:π—ˆπ—„

where val1 ≫+𝑆val2 denotes the transitive closure of the following immutable reachability relation on values:

Note

The constraint on reachability through immutable fields prevents the presence of cyclic data structures that can not be constructed in the language. Cycles can only be formed using mutation.

Tag Instances {𝗍𝗒𝗉𝖾 tagtype}ΒΆ

Global Instances {𝗍𝗒𝗉𝖾 mut π‘‘,π—π–Ίπ—…π—Žπ–Ύ val}ΒΆ

⊒mut π‘‘:π—ˆπ—„π‘†βŠ’val:π‘‘β€²βŠ’π‘‘β€²β‰€π‘‘π‘†βŠ’{𝗍𝗒𝗉𝖾 mut π‘‘,π—π–Ίπ—…π—Žπ–Ύ val}:mut π‘‘

Memory Instances {𝗍𝗒𝗉𝖾 (addrtype limits),π–»π—’π—π–Ύπ—Œ π‘βˆ—}ΒΆ

⊒addrtype [𝑛..π‘š]:π—ˆπ—„|π‘βˆ—|=𝑛⋅64Kiπ‘†βŠ’{𝗍𝗒𝗉𝖾 (addrtype [𝑛..π‘š]),π–»π—’π—π–Ύπ—Œ π‘βˆ—}:addrtype [𝑛..π‘š]

Table Instances {𝗍𝗒𝗉𝖾 (addrtype limits π‘‘),𝖾𝗅𝖾𝗆 refβˆ—}ΒΆ

⊒addrtype [𝑛..π‘š] π‘‘:π—ˆπ—„|refβˆ—|=𝑛(π‘†βŠ’ref:𝑑′)βˆ—(βŠ’π‘‘β€²β‰€π‘‘)βˆ—π‘†βŠ’{𝗍𝗒𝗉𝖾 (addrtype [𝑛..π‘š] π‘‘),𝖾𝗅𝖾𝗆 refβˆ—}:addrtype [𝑛..π‘š] π‘‘

Function Instances {𝗍𝗒𝗉𝖾 deftype,π—†π—ˆπ–½π—Žπ—…π–Ύ moduleinst,π–Όπ—ˆπ–½π–Ύ func}ΒΆ

Host Function Instances {𝗍𝗒𝗉𝖾 deftype,π—π—ˆπ—Œπ—π–Ώπ—Žπ—‡π–Ό hf}ΒΆ

⊒deftype:π—ˆπ—„deftypeβ‰ˆπ–Ώπ—Žπ—‡π–Ό [π‘‘βˆ—1]β†’[π‘‘βˆ—2]βˆ€π‘†1,valβˆ—, βŠ’𝑆1:π—ˆπ—„βˆ§βŠ’π‘†βͺ―𝑆1βˆ§π‘†1⊒valβˆ—:[π‘‘βˆ—1]⟹hf⁑(𝑆1;valβˆ—)βŠƒβˆ…βˆ§βˆ€π‘…βˆˆhf⁑(𝑆1;valβˆ—), π‘…=βŠ₯βˆ¨βˆƒπ‘†2,result, βŠ’𝑆2:π—ˆπ—„βˆ§βŠ’π‘†1βͺ―𝑆2βˆ§π‘†2⊒result:[π‘‘βˆ—2]βˆ§π‘…=(𝑆2;result)π‘†βŠ’{𝗍𝗒𝗉𝖾 deftype,π—π—ˆπ—Œπ—π–Ώπ—Žπ—‡π–Ό hf}:deftype

Note

This rule states that, if appropriate pre-conditions about store and arguments are satisfied, then executing the host function must satisfy appropriate post-conditions about store and results. The post-conditions match the ones in the execution rule for invoking host functions.

Any store under which the function is invoked is assumed to be an extension of the current store. That way, the function itself is able to make sufficient assumptions about future stores.

Data Instances {π–»π—’π—π–Ύπ—Œ π‘βˆ—}ΒΆ

  • The data instance is valid.

Element Instances {𝗍𝗒𝗉𝖾 π‘‘,𝖾𝗅𝖾𝗆 refβˆ—}ΒΆ

βŠ’π‘‘:π—ˆπ—„(π‘†βŠ’ref:𝑑′)βˆ—(βŠ’π‘‘β€²β‰€π‘‘)βˆ—π‘†βŠ’{𝗍𝗒𝗉𝖾 π‘‘,𝖾𝗅𝖾𝗆 refβˆ—}:𝑑

Structure Instances {𝗍𝗒𝗉𝖾 deftype,π–Ώπ—‚π–Ύπ—…π–½π—Œ fieldvalβˆ—}ΒΆ

Array Instances {𝗍𝗒𝗉𝖾 deftype,π–Ώπ—‚π–Ύπ—…π–½π—Œ fieldvalβˆ—}ΒΆ

Field Values fieldvalΒΆ

Exception Instances {𝗍𝖺𝗀 π‘Ž,π–Ώπ—‚π–Ύπ—…π–½π—Œ valβˆ—}ΒΆ

Export Instances {𝗇𝖺𝗆𝖾 name,𝖺𝖽𝖽𝗋 externaddr}ΒΆ

Module Instances moduleinstΒΆ

 (⊒deftype:π—ˆπ—„)βˆ—(π‘†βŠ’π—π–Ίπ—€ tagaddr:𝗍𝖺𝗀 tagtype)βˆ—(π‘†βŠ’π—€π—…π—ˆπ–»π–Ίπ—… globaladdr:π—€π—…π—ˆπ–»π–Ίπ—… globaltype)βˆ—(π‘†βŠ’π–Ώπ—Žπ—‡π–Ό funcaddr:π–Ώπ—Žπ—‡π–Ό deftypeπ–₯)βˆ—(π‘†βŠ’π—†π–Ύπ—† memaddr:𝗆𝖾𝗆 memtype)βˆ—(π‘†βŠ’π—π–Ίπ–»π—…π–Ύ tableaddr:𝗍𝖺𝖻𝗅𝖾 tabletype)βˆ—(π‘†βŠ’π‘†.π–½π–Ίπ—π–Ίπ—Œβ‘[dataaddr]:ok)βˆ—(π‘†βŠ’π‘†.π–Ύπ—…π–Ύπ—†π—Œβ‘[elemaddr]:reftype)βˆ—(π‘†βŠ’exportinst:π—ˆπ—„)βˆ—(exportinst.𝗇𝖺𝗆𝖾)βˆ— disjointπ‘†βŠ’{π—π—’π—‰π–Ύπ—Œ deftypeβˆ—,π—π–Ίπ—€π—Œ tagaddrβˆ—,π—€π—…π—ˆπ–»π–Ίπ—…π—Œ globaladdrβˆ—,π—†π–Ύπ—†π—Œ memaddrβˆ—,π—π–Ίπ–»π—…π–Ύπ—Œ tableaddrβˆ—,π–Ώπ—Žπ—‡π–Όπ—Œ funcaddrβˆ—,π–½π–Ίπ—π–Ίπ—Œ dataaddrβˆ—,π–Ύπ—…π–Ύπ—†π—Œ elemaddrβˆ—,π–Ύπ—‘π—‰π—ˆπ—‹π—π—Œ exportinstβˆ— }:{π—π—’π—‰π–Ύπ—Œ deftypeβˆ—,π—π–Ίπ—€π—Œ tagtypeβˆ—,π—€π—…π—ˆπ–»π–Ίπ—…π—Œ globaltypeβˆ—,π—†π–Ύπ—†π—Œ memtypeβˆ—,π—π–Ίπ–»π—…π–Ύπ—Œ tabletypeβˆ—,π–Ώπ—Žπ—‡π–Όπ—Œ deftypeβˆ—π–₯,π–½π–Ίπ—π–Ίπ—Œ okβˆ—,π–Ύπ—…π–Ύπ—†π—Œ reftypeβˆ—,π—‹π–Ύπ–Ώπ—Œ 0…(|funcaddrβˆ—|βˆ’1) }

Configuration ValidityΒΆ

To relate the WebAssembly type system to its execution semantics, the typing rules for instructions must be extended to configurations 𝑆;𝑇, which relates the store to execution threads.

Configurations and threads are classified by their result type. In addition to the store 𝑆, threads are typed under a return type resulttype?, which controls whether and with which type a π—‹π–Ύπ—π—Žπ—‹π—‡ instruction is allowed. This type is absent (πœ–) except for instruction sequences inside an administrative 𝖿𝗋𝖺𝗆𝖾 instruction.

Finally, frames are classified with frame contexts, which extend the module contexts of a frame’s associated module instance with the locals that the frame contains.

Configurations 𝑆;𝑇¢

βŠ’π‘†:π—ˆπ—„π‘†;πœ–βŠ’π‘‡:[π‘‘βˆ—]βŠ’π‘†;𝑇:[π‘‘βˆ—]

Threads 𝐹;instrβˆ—ΒΆ

π‘†βŠ’πΉ:𝐢𝑆;𝐢,π—‹π–Ύπ—π—Žπ—‹π—‡ resulttype?⊒instrβˆ—:[]β†’[π‘‘βˆ—]𝑆;resulttype?⊒𝐹;instrβˆ—:[π‘‘βˆ—]

Frames {π—…π—ˆπ–Όπ–Ίπ—…π—Œ valβˆ—,π—†π—ˆπ–½π—Žπ—…π–Ύ moduleinst}ΒΆ

Administrative InstructionsΒΆ

Typing rules for administrative instructions are specified as follows. In addition to the context 𝐢, typing of these instructions is defined under a given store 𝑆.

To that end, all previous typing judgements 𝐢 ⊒prop are generalized to include the store, as in 𝑆;𝐢 ⊒prop, by implicitly adding 𝑆 to all rules – 𝑆 is never modified by the pre-existing rules, but it is accessed in the extra rules for administrative instructions given below.

𝗍𝗋𝖺𝗉¢

𝐢⊒[π‘‘βˆ—1]β†’[π‘‘βˆ—2]:π—ˆπ—„π‘†;πΆβŠ’π—π—‹π–Ίπ—‰:[π‘‘βˆ—1]β†’[π‘‘βˆ—2]

valΒΆ

  • The value val must be valid with value type 𝑑.

  • Then it is valid as an instruction with type [] β†’[𝑑].

π‘†βŠ’val:𝑑𝑆;𝐢⊒val:[]β†’[𝑑]

π—‚π—‡π—π—ˆπ—„π–Ύ funcaddrΒΆ

𝗅𝖺𝖻𝖾𝗅𝑛⁒{instrβˆ—0} instrβˆ—ΒΆ

𝑆;𝐢⊒instrβˆ—0:[𝑑𝑛1]β†’π‘₯βˆ—[π‘‘βˆ—2]𝑆;𝐢,π—…π–Ίπ–»π–Ύπ—…π—Œβ‘[𝑑𝑛1]⊒instrβˆ—:[]β†’π‘₯β€²βˆ—[π‘‘βˆ—2]𝑆;πΆβŠ’π—…π–Ίπ–»π–Ύπ—…π‘›β’{instrβˆ—0} instrβˆ—:[]β†’[π‘‘βˆ—2]

𝖿𝗋𝖺𝗆𝖾𝑛⁒{𝐹} instrβˆ—ΒΆ

  • Under the valid return type [𝑑𝑛], the thread 𝐹;instrβˆ— must be valid with result type [𝑑𝑛].

  • Then the compound instruction is valid with type [] β†’[𝑑𝑛].

𝐢⊒[𝑑𝑛]:π—ˆπ—„π‘†;[𝑑𝑛]⊒𝐹;instrβˆ—:[𝑑𝑛]𝑆;πΆβŠ’π–Ώπ—‹π–Ίπ—†π–Ύπ‘›β’{𝐹} instrβˆ—:[]β†’[𝑑𝑛]

𝗁𝖺𝗇𝖽𝗅𝖾𝗋𝑛⁒{catchβˆ—} instrβˆ—ΒΆ

  • For every catch clause catch𝑖 in catchβˆ—, catch𝑖 must be valid.

  • The instruction sequence instrβˆ— must be valid with some type [π‘‘βˆ—1] β†’[π‘‘βˆ—2].

  • Then the compound instruction is valid with type [π‘‘βˆ—1] β†’[π‘‘βˆ—2].

(𝐢⊒catch:π—ˆπ—„)βˆ—π‘†;𝐢⊒instrβˆ—:[π‘‘βˆ—1]β†’[π‘‘βˆ—2]𝑆;πΆβŠ’π—π–Ίπ—‡π–½π—…π–Ύπ—‹π‘›β’{catchβˆ—} instrβˆ—:[π‘‘βˆ—1]β†’[π‘‘βˆ—2]

Store ExtensionΒΆ

Programs can mutate the store and its contained instances. Any such modification must respect certain invariants, such as not removing allocated instances or changing immutable definitions. While these invariants are inherent to the execution semantics of WebAssembly instructions and modules, host functions do not automatically adhere to them. Consequently, the required invariants must be stated as explicit constraints on the invocation of host functions. Soundness only holds when the embedder ensures these constraints.

The necessary constraints are codified by the notion of store extension: a store state 𝑆′ extends state 𝑆, written 𝑆 βͺ―𝑆′, when the following rules hold.

Note

Extension does not imply that the new store is valid, which is defined separately above.

Store 𝑆¢

𝑆1.π—π–Ίπ—€π—Œ=taginstβˆ—1𝑆2.π—π–Ίπ—€π—Œ=taginstβ€²1βˆ— taginstβˆ—2(⊒taginst1βͺ―taginstβ€²1)βˆ—π‘†1.π—€π—…π—ˆπ–»π–Ίπ—…π—Œ=globalinstβˆ—1𝑆2.π—€π—…π—ˆπ–»π–Ίπ—…π—Œ=globalinstβ€²1βˆ— globalinstβˆ—2(⊒globalinst1βͺ―globalinstβ€²1)βˆ—π‘†1.π—†π–Ύπ—†π—Œ=meminstβˆ—1𝑆2.π—†π–Ύπ—†π—Œ=meminstβ€²1βˆ— meminstβˆ—2(⊒meminst1βͺ―meminstβ€²1)βˆ—π‘†1.π—π–Ίπ–»π—…π–Ύπ—Œ=tableinstβˆ—1𝑆2.π—π–Ίπ–»π—…π–Ύπ—Œ=tableinstβ€²1βˆ— tableinstβˆ—2(⊒tableinst1βͺ―tableinstβ€²1)βˆ—π‘†1.π–Ώπ—Žπ—‡π–Όπ—Œ=funcinstβˆ—1𝑆2.π–Ώπ—Žπ—‡π–Όπ—Œ=funcinstβ€²1βˆ— funcinstβˆ—2(⊒funcinst1βͺ―funcinstβ€²1)βˆ—π‘†1.π–½π–Ίπ—π–Ίπ—Œ=datainstβˆ—1𝑆2.π–½π–Ίπ—π–Ίπ—Œ=datainstβ€²1βˆ— datainstβˆ—2(⊒datainst1βͺ―datainstβ€²1)βˆ—π‘†1.π–Ύπ—…π–Ύπ—†π—Œ=eleminstβˆ—1𝑆2.π–Ύπ—…π–Ύπ—†π—Œ=eleminstβ€²1βˆ— eleminstβˆ—2(⊒eleminst1βͺ―eleminstβ€²1)βˆ—π‘†1.π—Œπ—π—‹π—Žπ–Όπ—π—Œ=structinstβˆ—1𝑆2.π—Œπ—π—‹π—Žπ–Όπ—π—Œ=structinstβ€²1βˆ— structinstβˆ—2(⊒structinst1βͺ―structinstβ€²1)βˆ—π‘†1.π–Ίπ—‹π—‹π–Ίπ—’π—Œ=arrayinstβˆ—1𝑆2.π–Ίπ—‹π—‹π–Ίπ—’π—Œ=arrayinstβ€²1βˆ— arrayinstβˆ—2(⊒arrayinst1βͺ―arrayinstβ€²1)βˆ—π‘†1.π–Ύπ—‘π—‡π—Œ=exninstβˆ—1𝑆2.π–Ύπ—‘π—‡π—Œ=exninstβ€²1βˆ— exninstβˆ—2(⊒exninst1βͺ―exninstβ€²1)βˆ—βŠ’π‘†1βͺ―𝑆2

Tag Instance taginstΒΆ

  • A tag instance must remain unchanged.

Global Instance globalinstΒΆ

Memory Instance meminstΒΆ

Table Instance tableinstΒΆ

Function Instance funcinstΒΆ

  • A function instance must remain unchanged.

Data Instance datainstΒΆ

Element Instance eleminstΒΆ

Structure Instance structinstΒΆ

Array Instance arrayinstΒΆ

Exception Instance exninstΒΆ

  • An exception instance must remain unchanged.

TheoremsΒΆ

Given the definition of valid configurations, the standard soundness theorems hold. [2] [3]

Theorem (Preservation). If a configuration 𝑆;𝑇 is valid with result type [π‘‘βˆ—] (i.e., βŠ’π‘†;𝑇 :[π‘‘βˆ—]), and steps to 𝑆′;𝑇′ (i.e., 𝑆;𝑇 β†ͺ𝑆′;𝑇′), then 𝑆′;𝑇′ is a valid configuration with the same result type (i.e., βŠ’π‘†β€²;𝑇′ :[π‘‘βˆ—]). Furthermore, 𝑆′ is an extension of 𝑆 (i.e., βŠ’π‘† βͺ―𝑆′).

A terminal thread is one whose sequence of instructions is a result. A terminal configuration is a configuration whose thread is terminal.

Theorem (Progress). If a configuration 𝑆;𝑇 is valid (i.e., βŠ’π‘†;𝑇 :[π‘‘βˆ—] for some result type [π‘‘βˆ—]), then either it is terminal, or it can step to some configuration 𝑆′;𝑇′ (i.e., 𝑆;𝑇 β†ͺ𝑆′;𝑇′).

From Preservation and Progress the soundness of the WebAssembly type system follows directly.

Corollary (Soundness). If a configuration 𝑆;𝑇 is valid (i.e., βŠ’π‘†;𝑇 :[π‘‘βˆ—] for some result type [π‘‘βˆ—]), then it either diverges or takes a finite number of steps to reach a terminal configuration 𝑆′;𝑇′ (i.e., 𝑆;𝑇 β†ͺβˆ—π‘†β€²;𝑇′) that is valid with the same result type (i.e., βŠ’π‘†β€²;𝑇′ :[π‘‘βˆ—]) and where 𝑆′ is an extension of 𝑆 (i.e., βŠ’π‘† βͺ―𝑆′).

In other words, every thread in a valid configuration either runs forever, traps, throws an exception, or terminates with a result that has the expected type. Consequently, given a valid store, no computation defined by instantiation or invocation of a valid module can β€œcrash” or otherwise (mis)behave in ways not covered by the execution semantics given in this specification.

Type System PropertiesΒΆ

Principal TypesΒΆ

The type system of WebAssembly features both subtyping and simple forms of polymorphism for instruction types. That has the effect that every instruction or instruction sequence can be classified with multiple different instruction types.

However, the typing rules still allow deriving principal types for instruction sequences. That is, every valid instruction sequence has one particular type scheme, possibly containing some unconstrained place holder type variables, that is a subtype of all its valid instruction types, after substituting its type variables with suitable specific types.

Moreover, when deriving an instruction type in a β€œforward” manner, i.e., the input of the instruction sequence is already fixed to specific types, then it has a principal output type expressible without type variables, up to a possibly polymorphic stack bottom representable with one single variable. In other words, β€œforward” principal types are effectively closed.

Note

For example, in isolation, the instruction 𝗋𝖾𝖿.π–Ίπ—Œ_π—‡π—ˆπ—‡_π—‡π—Žπ—…π—… has the type [(𝗋𝖾𝖿 π—‡π—Žπ—…𝗅 ht)] β†’[(𝗋𝖾𝖿 ht)] for any choice of valid heap type ht. Moreover, if the input type [(𝗋𝖾𝖿 π—‡π—Žπ—…𝗅 ht)] is already determined, i.e., a specific ht is given, then the output type [(𝗋𝖾𝖿 ht)] is fully determined as well.

The implication of the latter property is that a validator for complete instruction sequences (as they occur in valid modules) can be implemented with a simple left-to-right algorithm that does not require the introduction of type variables.

A typing algorithm capable of handling partial instruction sequences (as might be considered for program analysis or program manipulation) needs to introduce type variables and perform substitutions, but it does not need to perform backtracking or record any non-syntactic constraints on these type variables.

Technically, the syntax of heap, value, and result types can be enriched with type variables as follows:

null::=π—‡π—Žπ—…π—…? | π›Όnullheaptype::=… | π›Όheaptypereftype::=𝗋𝖾𝖿 null heaptypevaltype::=… | π›Όvaltype | π›Όnumvectyperesulttype::=[𝛼?valtypeβˆ— valtypeβˆ—]

where each 𝛼xyz ranges over a set of type variables for syntactic class xyz, respectively. The special class numvectype is defined as numtype | vectype | π–»π—ˆπ—, and is only needed to handle unannotated π—Œπ–Ύπ—…π–Ύπ–Όπ— instructions.

A type is closed when it does not contain any type variables, and open otherwise. A type substitution 𝜎 is a finite mapping from type variables to closed types of the respective syntactic class. When applied to an open type, it replaces the type variables 𝛼 from its domain with the respective 𝜎⁑(𝛼).

Theorem (Principal Types). If an instruction sequence instrβˆ— is valid with some closed instruction type instrtype (i.e., 𝐢 ⊒instrβˆ— :instrtype), then it is also valid with a possibly open instruction type instrtypemin (i.e., 𝐢 ⊒instrβˆ— :instrtypemin), such that for every closed type instrtypeβ€² with which instrβˆ— is valid (i.e., for all 𝐢 ⊒instrβˆ— :instrtypeβ€²), there exists a substitution 𝜎, such that 𝜎⁑(instrtypemin) is a subtype of instrtypeβ€² (i.e., 𝐢 ⊒𝜎⁑(instrtypemin) ≀instrtypeβ€²). Furthermore, instrtypemin is unique up to the choice of type variables.

Theorem (Closed Principal Forward Types). If closed input type [π‘‘βˆ—1] is given and the instruction sequence instrβˆ— is valid with instruction type [π‘‘βˆ—1] β†’π‘₯βˆ—[π‘‘βˆ—2] (i.e., 𝐢 ⊒instrβˆ— :[π‘‘βˆ—1] β†’π‘₯βˆ—[π‘‘βˆ—2]), then it is also valid with instruction type [π‘‘βˆ—1] β†’π‘₯βˆ—[𝛼valtypeβˆ— π‘‘βˆ—] (i.e., 𝐢 ⊒instrβˆ— :[π‘‘βˆ—1] β†’π‘₯βˆ—[𝛼valtypeβˆ— π‘‘βˆ—]), where all π‘‘βˆ— are closed, such that for every closed result type [𝑑′2βˆ—] with which instrβˆ— is valid (i.e., for all 𝐢 ⊒instrβˆ— :[π‘‘βˆ—1] β†’π‘₯βˆ—[𝑑′2βˆ—]), there exists a substitution 𝜎, such that [𝑑′2βˆ—] =[𝜎⁑(𝛼valtypeβˆ—) π‘‘βˆ—].

Type LatticeΒΆ

The Principal Types property depends on the existence of a greatest lower bound for any pair of types.

Theorem (Greatest Lower Bounds for Value Types). For any two value types 𝑑1 and 𝑑2 that are valid (i.e., 𝐢 βŠ’π‘‘1 :π—ˆπ—„ and 𝐢 βŠ’π‘‘2 :π—ˆπ—„), there exists a valid value type 𝑑 that is a subtype of both 𝑑1 and 𝑑2 (i.e., 𝐢 βŠ’π‘‘ :π—ˆπ—„ and 𝐢 βŠ’π‘‘ ≀𝑑1 and 𝐢 βŠ’π‘‘ ≀𝑑2), such that every valid value type 𝑑′ that also is a subtype of both 𝑑1 and 𝑑2 (i.e., for all 𝐢 βŠ’π‘‘β€² :π—ˆπ—„ and 𝐢 βŠ’π‘‘β€² ≀𝑑1 and 𝐢 βŠ’π‘‘β€² ≀𝑑2), is a subtype of 𝑑 (i.e., 𝐢 βŠ’π‘‘β€² ≀𝑑).

Note

The greatest lower bound of two types may be π–»π—ˆπ—.

Theorem (Conditional Least Upper Bounds for Value Types). Any two value types 𝑑1 and 𝑑2 that are valid (i.e., 𝐢 βŠ’π‘‘1 :π—ˆπ—„ and 𝐢 βŠ’π‘‘2 :π—ˆπ—„) either have no common supertype, or there exists a valid value type 𝑑 that is a supertype of both 𝑑1 and 𝑑2 (i.e., 𝐢 βŠ’π‘‘ :π—ˆπ—„ and 𝐢 βŠ’π‘‘1 ≀𝑑 and 𝐢 βŠ’π‘‘2 ≀𝑑), such that every valid value type 𝑑′ that also is a supertype of both 𝑑1 and 𝑑2 (i.e., for all 𝐢 βŠ’π‘‘β€² :π—ˆπ—„ and 𝐢 βŠ’π‘‘1 ≀𝑑′ and 𝐢 βŠ’π‘‘2 ≀𝑑′), is a supertype of 𝑑 (i.e., 𝐢 βŠ’π‘‘ ≀𝑑′).

Note

If a top type was added to the type system, a least upper bound would exist for any two types.

Corollary (Type Lattice). Assuming the addition of a provisional top type, value types form a lattice with respect to their subtype relation.

Finally, value types can be partitioned into multiple disjoint hierarchies that are not related by subtyping, except through π–»π—ˆπ—.

Theorem (Disjoint Subtype Hierarchies). The greatest lower bound of two value types is π–»π—ˆπ— or 𝗋𝖾𝖿 π–»π—ˆπ— if and only if they do not have a least upper bound.

In other words, types that do not have common supertypes, do not have common subtypes either (other than π–»π—ˆπ— or 𝗋𝖾𝖿 π–»π—ˆπ—), and vice versa.

Note

Types from disjoint hierarchies can safely be represented in mutually incompatible ways in an implementation, because their values can never flow to the same place.

CompositionalityΒΆ

Valid instruction sequences can be freely composed, as long as their types match up.

Theorem (Composition). If two instruction sequences instrβˆ—1 and instrβˆ—2 are valid with types [π‘‘βˆ—1] β†’π‘₯βˆ—1[π‘‘βˆ—] and [π‘‘βˆ—] β†’π‘₯βˆ—2[π‘‘βˆ—2], respectively (i.e., 𝐢 ⊒instrβˆ—1 :[π‘‘βˆ—1] β†’π‘₯βˆ—1[π‘‘βˆ—] and 𝐢 ⊒instrβˆ—1 :[π‘‘βˆ—] β†’π‘₯βˆ—2[π‘‘βˆ—2]), then the concatenated instruction sequence (instrβˆ—1 instrβˆ—2) is valid with type [π‘‘βˆ—1] β†’π‘₯βˆ—1π‘₯βˆ—2[π‘‘βˆ—2] (i.e., 𝐢 ⊒instrβˆ—1 instrβˆ—2 :[π‘‘βˆ—1] β†’π‘₯βˆ—1π‘₯βˆ—2[π‘‘βˆ—2]).

Note

More generally, instead of a shared type [π‘‘βˆ—], it suffices if the output type of instrβˆ—1 is a subtype of the input type of instrβˆ—1, since the subtype can always be weakened to its supertype by subsumption.

Inversely, valid instruction sequences can also freely be decomposed, that is, splitting them anywhere produces two instruction sequences that are both valid.

Theorem (Decomposition). If an instruction sequence instrβˆ— that is valid with type [π‘‘βˆ—1] β†’π‘₯βˆ—[π‘‘βˆ—2] (i.e., 𝐢 ⊒instrβˆ— :[π‘‘βˆ—1] β†’π‘₯βˆ—[π‘‘βˆ—2]) is split into two instruction sequences instrβˆ—1 and instrβˆ—2 at any point (i.e., instrβˆ— =instrβˆ—1 instrβˆ—2), then these are separately valid with some types [π‘‘βˆ—1] β†’π‘₯βˆ—1[π‘‘βˆ—] and [π‘‘βˆ—] β†’π‘₯βˆ—2[π‘‘βˆ—2], respectively (i.e., 𝐢 ⊒instrβˆ—1 :[π‘‘βˆ—1] β†’π‘₯βˆ—1[π‘‘βˆ—] and 𝐢 ⊒instrβˆ—1 :[π‘‘βˆ—] β†’π‘₯βˆ—2[π‘‘βˆ—2]), where π‘₯βˆ— =π‘₯βˆ—1 π‘₯βˆ—2.

Note

This property holds because validation is required even for unreachable code. Without that, instrβˆ—2 might not be valid in isolation.