Numerics

Numeric primitives are defined in a generic manner, by operators indexed over a bit width \(N\).

Some operators are non-deterministic, because they can return one of several possible results (such as different NaN values). Technically, each operator thus returns a set of allowed values. For convenience, deterministic results are expressed as plain values, which are assumed to be identified with a respective singleton set.

Some operators are partial, because they are not defined on certain inputs. Technically, an empty set of results is returned for these inputs.

In formal notation, each operator is defined by equational clauses that apply in decreasing order of precedence. That is, the first clause that is applicable to the given arguments defines the result. In some cases, similar clauses are combined into one by using the notation \(\pm\) or \(\mp\). When several of these placeholders occur in a single clause, then they must be resolved consistently: either the upper sign is chosen for all of them or the lower sign.

Note

For example, the \(\href{../exec/numerics.html#op-fcopysign}{\mathrm{fcopysign}}\) operator is defined as follows:

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fcopysign}{\mathrm{fcopysign}}_N(\pm p_1, \pm p_2) &=& \pm p_1 \\ \href{../exec/numerics.html#op-fcopysign}{\mathrm{fcopysign}}_N(\pm p_1, \mp p_2) &=& \mp p_1 \\ \end{array}\end{split}\]

This definition is to be read as a shorthand for the following expansion of each clause into two separate ones:

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fcopysign}{\mathrm{fcopysign}}_N(+ p_1, + p_2) &=& + p_1 \\ \href{../exec/numerics.html#op-fcopysign}{\mathrm{fcopysign}}_N(- p_1, - p_2) &=& - p_1 \\ \href{../exec/numerics.html#op-fcopysign}{\mathrm{fcopysign}}_N(+ p_1, - p_2) &=& - p_1 \\ \href{../exec/numerics.html#op-fcopysign}{\mathrm{fcopysign}}_N(- p_1, + p_2) &=& + p_1 \\ \end{array}\end{split}\]
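As a cross-check of the expansion, the following minimal Python sketch implements fcopysign for 64-bit floats at the bit level. It assumes Python's `float` is IEEE 754 binary64 (true on all common platforms); the names `f64_bits`, `f64_from_bits`, and `fcopysign64` are hypothetical helpers, not part of the specification. Keeping the magnitude bits of the first operand and only the sign bit of the second covers all four expanded clauses at once.

```python
import struct

SIGN_BIT = 1 << 63  # most significant bit of a binary64 encoding


def f64_bits(z: float) -> int:
    """Reinterpret a binary64 float as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", z))[0]


def f64_from_bits(b: int) -> float:
    """Reinterpret an unsigned 64-bit integer as a binary64 float."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]


def fcopysign64(z1: float, z2: float) -> float:
    """Combine the magnitude bits of z1 with the sign bit of z2."""
    magnitude = f64_bits(z1) & ~SIGN_BIT  # clear the sign of z1
    sign = f64_bits(z2) & SIGN_BIT        # keep only the sign of z2
    return f64_from_bits(magnitude | sign)
```

For example, `fcopysign64(-1.5, 2.0)` matches the clause \(\mathrm{fcopysign}_N(-p_1, +p_2) = +p_1\) and yields `1.5`.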

Numeric operators are lifted to input sequences by applying the operator element-wise, returning a sequence of results. When there are multiple inputs, they must be of equal length.

\[\begin{array}{lll@{\qquad}l} op(c_1^n, \dots, c_k^n) &=& op(c_1^n[0], \dots, c_k^n[0])~\dots~op(c_1^n[n-1], \dots, c_k^n[n-1]) \end{array}\]

Note

For example, the unary operator \(\href{../exec/numerics.html#op-fabs}{\mathrm{fabs}}\), when given a sequence of floating-point values, returns a sequence of floating-point results:

\[\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-fabs}{\mathrm{fabs}}_N(z^n) &=& \href{../exec/numerics.html#op-fabs}{\mathrm{fabs}}_N(z[0])~\dots~\href{../exec/numerics.html#op-fabs}{\mathrm{fabs}}_N(z[n-1]) \end{array}\]

The binary operator \(\href{../exec/numerics.html#op-iadd}{\mathrm{iadd}}\), when given two sequences of integers of the same length \(n\), returns a sequence of integer results:

\[\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-iadd}{\mathrm{iadd}}_N(i_1^n, i_2^n) &=& \href{../exec/numerics.html#op-iadd}{\mathrm{iadd}}_N(i_1[0], i_2[0])~\dots~\href{../exec/numerics.html#op-iadd}{\mathrm{iadd}}_N(i_1[n-1], i_2[n-1]) \end{array}\]
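The element-wise lifting can be sketched in Python as a higher-order helper. This is an illustration under stated assumptions: `lift` is a hypothetical name, Python integers model unsigned \(N\)-bit values, and a mismatch in sequence lengths is reported as an error rather than an empty result set.

```python
from typing import Callable, List, Sequence


def lift(op: Callable[..., int], *seqs: Sequence[int]) -> List[int]:
    """Apply a scalar operator element-wise to one or more equal-length sequences."""
    if not seqs:
        raise ValueError("at least one input sequence is required")
    n = len(seqs[0])
    if any(len(s) != n for s in seqs):
        raise ValueError("input sequences must all have the same length")
    # Result j is op applied to the j-th element of every input (indices 0 .. n-1).
    return [op(*(s[j] for s in seqs)) for j in range(n)]


def iadd(N: int, i1: int, i2: int) -> int:
    """Scalar iadd_N: addition modulo 2^N."""
    return (i1 + i2) % (2 ** N)
```

For instance, `lift(lambda a, b: iadd(8, a, b), [1, 200], [2, 100])` gives `[3, 44]`, since \(300 \mathbin{\mathrm{mod}} 2^8 = 44\).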

Conventions:

  • The meta variable \(d\) is used to range over single bits.

  • The meta variable \(p\) is used to range over (signless) magnitudes of floating-point values, including \(\href{../syntax/values.html#syntax-float}{\mathsf{nan}}\) and \(\infty\).

  • The meta variable \(q\) is used to range over (signless) rational magnitudes, excluding \(\href{../syntax/values.html#syntax-float}{\mathsf{nan}}\) and \(\infty\).

  • The notation \(f^{-1}\) denotes the inverse of a bijective function \(f\).

  • Truncation of rational values is written \(\href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(\pm q)\), with the usual mathematical definition:

    \[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(\pm q) &=& \pm i & (\mathrel{\mbox{if}} i \in \mathbb{N} \wedge +q - 1 < i \leq +q) \\ \end{array}\end{split}\]
  • Saturation of integers is written \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(i)\) and \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(i)\). The arguments to these two functions range over arbitrary signed integers.

    • Unsigned saturation, \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(i)\), clamps \(i\) to lie between \(0\) and \(2^N-1\):

      \[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(i) &=& 0 & (\mathrel{\mbox{if}} i < 0) \\ \href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(i) &=& 2^N-1 & (\mathrel{\mbox{if}} i > 2^N-1)\\ \href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(i) &=& i & (\mathrel{\mbox{otherwise}}) \\ \end{array}\end{split}\]
    • Signed saturation, \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(i)\), clamps \(i\) to lie between \(-2^{N-1}\) and \(2^{N-1}-1\):

      \[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(i) &=& -2^{N-1} & (\mathrel{\mbox{if}} i < -2^{N-1})\\ \href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(i) &=& 2^{N-1}-1 & (\mathrel{\mbox{if}} i > 2^{N-1}-1)\\ \href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(i) &=& i & (\mathrel{\mbox{otherwise}}) \end{array}\end{split}\]
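The truncation and saturation conventions translate directly into Python. In this minimal sketch, `fractions.Fraction` stands in for exact rationals and the function names mirror the notation but are otherwise hypothetical.

```python
from fractions import Fraction


def trunc(q: Fraction) -> int:
    """Truncate toward zero: the integer i with |q| - 1 < |i| <= |q|, carrying the sign of q."""
    magnitude = abs(q)
    i = magnitude.numerator // magnitude.denominator  # floor of a non-negative rational
    return i if q >= 0 else -i


def sat_u(N: int, i: int) -> int:
    """Clamp an arbitrary integer into [0, 2^N - 1]."""
    return max(0, min(i, 2 ** N - 1))


def sat_s(N: int, i: int) -> int:
    """Clamp an arbitrary integer into [-2^(N-1), 2^(N-1) - 1]."""
    return max(-(2 ** (N - 1)), min(i, 2 ** (N - 1) - 1))
```

Note that `trunc(Fraction(-7, 2))` is `-3`, whereas Python's floor division `-7 // 2` would give `-4`.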

Representations

Numbers and numeric vectors have an underlying binary representation as a sequence of bits:

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#aux-bits}{\mathrm{bits}}_{\href{../syntax/types.html#syntax-numtype}{\mathsf{i}\scriptstyle\kern-0.1emN}}(i) &=& \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i) \\ \href{../exec/numerics.html#aux-bits}{\mathrm{bits}}_{\href{../syntax/types.html#syntax-numtype}{\mathsf{f}\scriptstyle\kern-0.1emN}}(z) &=& \href{../exec/numerics.html#aux-fbits}{\mathrm{fbits}}_N(z) \\ \href{../exec/numerics.html#aux-bits}{\mathrm{bits}}_{\href{../syntax/types.html#syntax-vectype}{\mathsf{v}\scriptstyle\kern-0.1emN}}(i) &=& \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i) \\ \end{array}\end{split}\]

The first case of these applies to representations of both integer value types and packed types.

Each of these functions is a bijection, hence they are invertible.

Integers

Integers are represented as base two unsigned numbers:

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i) &=& d_{N-1}~\dots~d_0 & (i = 2^{N-1}\cdot d_{N-1} + \dots + 2^0\cdot d_0) \\ \end{array}\end{split}\]

Boolean operators like \(\wedge\), \(\vee\), or \(\veebar\) are lifted to bit sequences of equal length by applying them pointwise.
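A minimal Python sketch of \(\mathrm{ibits}_N\), its inverse, and the pointwise lifting of Boolean operators follows. Bit sequences are modeled as lists ordered from \(d_{N-1}\) to \(d_0\); the function names are illustrative only.

```python
from typing import Callable, List


def ibits(N: int, i: int) -> List[int]:
    """Bits d_{N-1} ... d_0 of an unsigned N-bit integer, most significant first."""
    if not 0 <= i < 2 ** N:
        raise ValueError("value does not fit in N bits")
    return [(i >> k) & 1 for k in range(N - 1, -1, -1)]


def ibits_inv(bits: List[int]) -> int:
    """Inverse of ibits: recompute the sum of 2^k * d_k from a most-significant-first list."""
    i = 0
    for d in bits:
        i = (i << 1) | d
    return i


def pointwise(op: Callable[[int, int], int], bs1: List[int], bs2: List[int]) -> List[int]:
    """Lift a Boolean operator on single bits to two bit sequences of equal length."""
    if len(bs1) != len(bs2):
        raise ValueError("bit sequences must have equal length")
    return [op(a, b) for a, b in zip(bs1, bs2)]
```

For example, `ibits(4, 6)` is `[0, 1, 1, 0]`, and XOR-ing it pointwise with `ibits(4, 3)` gives the bits of \(5\).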

Floating-Point

Floating-point values are represented in the respective binary format defined by IEEE 754 (Section 3.4):

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#aux-fbits}{\mathrm{fbits}}_N(\pm (1+m\cdot 2^{-M})\cdot 2^e) &=& \href{../exec/numerics.html#aux-fsign}{\mathrm{fsign}}({\pm})~\href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_E(e+\href{../exec/numerics.html#aux-fbias}{\mathrm{fbias}}_N)~\href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_M(m) \\ \href{../exec/numerics.html#aux-fbits}{\mathrm{fbits}}_N(\pm (0+m\cdot 2^{-M})\cdot 2^e) &=& \href{../exec/numerics.html#aux-fsign}{\mathrm{fsign}}({\pm})~(0)^E~\href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_M(m) \\ \href{../exec/numerics.html#aux-fbits}{\mathrm{fbits}}_N(\pm \infty) &=& \href{../exec/numerics.html#aux-fsign}{\mathrm{fsign}}({\pm})~(1)^E~(0)^M \\ \href{../exec/numerics.html#aux-fbits}{\mathrm{fbits}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-fsign}{\mathrm{fsign}}({\pm})~(1)^E~\href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_M(n) \\[1ex] \href{../exec/numerics.html#aux-fbias}{\mathrm{fbias}}_N &=& 2^{E-1}-1 \\ \href{../exec/numerics.html#aux-fsign}{\mathrm{fsign}}({+}) &=& 0 \\ \href{../exec/numerics.html#aux-fsign}{\mathrm{fsign}}({-}) &=& 1 \\ \end{array}\end{split}\]

where \(M = \href{../syntax/values.html#aux-signif}{\mathrm{signif}}(N)\) and \(E = \href{../syntax/values.html#aux-expon}{\mathrm{expon}}(N)\).
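For \(N = 32\), where \(M = 23\) and \(E = 8\), the three fields of the encoding can be inspected with Python's `struct` module. This sketch assumes the platform packs `"f"` as IEEE 754 binary32 (true on common platforms); `fbits32_fields` is a hypothetical helper that returns the fields as integers rather than a bit sequence.

```python
import struct
from typing import Tuple

M, E = 23, 8                 # signif(32) and expon(32)
FBIAS = 2 ** (E - 1) - 1     # fbias_32 = 127


def fbits32_fields(z: float) -> Tuple[int, int, int]:
    """Split the binary32 encoding of z into (sign bit, exponent field, significand field)."""
    (b,) = struct.unpack("<I", struct.pack("<f", z))  # z is first rounded to binary32
    sign = b >> (E + M)
    exponent = (b >> M) & ((1 << E) - 1)
    significand = b & ((1 << M) - 1)
    return sign, exponent, significand
```

For instance, \(-2.5 = -(1 + 2^{21}\cdot 2^{-23})\cdot 2^1\), so the first clause predicts the fields \((1,\ 1 + 127,\ 2^{21})\); a subnormal such as \(2^{-149}\) has an all-zero exponent field, matching the second clause. Values too large for binary32 make `struct.pack` raise `OverflowError`.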

Vectors

Numeric vectors of type \(\href{../syntax/types.html#syntax-vectype}{\mathsf{v}\scriptstyle\kern-0.1emN}\) have the same underlying representation as an \(\href{../syntax/types.html#syntax-numtype}{\mathsf{i}\scriptstyle\kern-0.1emN}\). They can also be interpreted as a sequence of numeric values packed into a \(\href{../syntax/types.html#syntax-vectype}{\mathsf{v}\scriptstyle\kern-0.1emN}\) with a particular \(\href{../syntax/instructions.html#syntax-shape}{\mathit{shape}}\) \(t\mathsf{x}M\), provided that \(N = |t|\cdot M\).

\[\begin{split}\begin{array}{l} \begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#aux-lanes}{\mathrm{lanes}}_{t\mathsf{x}M}(c) &=& c_0~\dots~c_{M-1} \\ \end{array} \\ \qquad \begin{array}[t]{@{}r@{~}l@{}l@{~}l@{~}l} (\mathrel{\mbox{where}} & w &=& |t| / 8 \\ \wedge & b^\ast &=& \href{../exec/numerics.html#aux-bytes}{\mathrm{bytes}}_{\href{../syntax/types.html#syntax-numtype}{\mathsf{i}\scriptstyle\kern-0.1emN}}(c) \\ \wedge & c_i &=& \href{../exec/numerics.html#aux-bytes}{\mathrm{bytes}}_{t}^{-1}(b^\ast[i \cdot w \href{../syntax/conventions.html#notation-slice}{\mathrel{\mathbf{:}}} w])) \end{array} \end{array}\end{split}\]

This function is a bijection on \(\href{../syntax/types.html#syntax-numtype}{\mathsf{i}\scriptstyle\kern-0.1emN}\), hence it is invertible.
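The following Python sketch mirrors the definition of \(\mathrm{lanes}\) for integer lane types only (floating-point lanes would additionally decode each chunk as a float, which is not shown). The names `lanes` and `lanes_inv` and the `lane_bits` parameter (standing for \(|t|\)) are assumptions for illustration.

```python
from typing import List


def lanes(N: int, lane_bits: int, c: int) -> List[int]:
    """Split an N-bit vector c into N // lane_bits integer lanes, lane 0 first."""
    if lane_bits % 8 != 0 or N % lane_bits != 0:
        raise ValueError("lane width must be a whole number of bytes dividing N")
    w = lane_bits // 8                    # bytes per lane
    b = c.to_bytes(N // 8, "little")      # bytes_{iN}(c), little-endian
    # Each w-byte slice is decoded as a little-endian integer (bytes_t^{-1}).
    return [int.from_bytes(b[i * w:(i + 1) * w], "little") for i in range(N // lane_bits)]


def lanes_inv(lane_bits: int, cs: List[int]) -> int:
    """Inverse of lanes for in-range lanes: pack them back into one vector value."""
    return sum(ci << (i * lane_bits) for i, ci in enumerate(cs))
```

Because storage is little-endian, lane \(0\) ends up in the least significant bits: for shape \(\mathsf{i32}\mathsf{x}4\), `lanes(128, 32, c)` equals `[(c >> (32 * i)) & 0xFFFFFFFF for i in range(4)]`.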

Storage

When a number is stored into memory, it is converted into a sequence of bytes in little endian byte order:

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#aux-bytes}{\mathrm{bytes}}_t(i) &=& \href{../exec/numerics.html#aux-littleendian}{\mathrm{littleendian}}(\href{../exec/numerics.html#aux-bits}{\mathrm{bits}}_t(i)) \\[1ex] \href{../exec/numerics.html#aux-littleendian}{\mathrm{littleendian}}(\epsilon) &=& \epsilon \\ \href{../exec/numerics.html#aux-littleendian}{\mathrm{littleendian}}(d^8~{d'}^\ast~) &=& \href{../exec/numerics.html#aux-littleendian}{\mathrm{littleendian}}({d'}^\ast)~\href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_8^{-1}(d^8) \\ \end{array}\end{split}\]

Again, these functions are bijections, hence they are invertible.
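The recursive definition of \(\mathrm{littleendian}\) can be transcribed almost literally into Python. This minimal sketch handles integer types only; `bytes_i` is a hypothetical name for \(\mathrm{bytes}_{\mathsf{i}N}\).

```python
from typing import List


def ibits(N: int, i: int) -> List[int]:
    """Most-significant-first bits of an unsigned N-bit integer."""
    return [(i >> k) & 1 for k in range(N - 1, -1, -1)]


def littleendian(bits: List[int]) -> List[int]:
    """Decode the first 8 bits as a byte and place it after the bytes of the rest,
    so the least significant byte ends up first."""
    if len(bits) % 8 != 0:
        raise ValueError("bit sequence length must be a multiple of 8")
    if not bits:
        return []
    byte = 0
    for d in bits[:8]:                     # ibits_8^{-1}(d^8)
        byte = (byte << 1) | d
    return littleendian(bits[8:]) + [byte]


def bytes_i(N: int, i: int) -> List[int]:
    """bytes_{iN}(i) = littleendian(ibits_N(i))."""
    return littleendian(ibits(N, i))
```

The result agrees with Python's built-in `int.to_bytes(..., "little")`; for example, `bytes_i(32, 0x12345678)` is `[0x78, 0x56, 0x34, 0x12]`.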

Integer Operations

Sign Interpretation

Integer operators are defined on \(\href{../syntax/values.html#syntax-int}{\mathit{i}\scriptstyle\kern-0.1emN}\) values. Operators that use a signed interpretation convert the value using the following definition, which takes the two’s complement when the value lies in the upper half of the value range (i.e., its most significant bit is \(1\)):

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i) &=& i & (0 \leq i < 2^{N-1}) \\ \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i) &=& i - 2^N & (2^{N-1} \leq i < 2^N) \\ \end{array}\end{split}\]

This function is bijective, and hence invertible.
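A minimal Python sketch of \(\mathrm{signed}_N\) and its inverse, using unbounded Python integers (the names are illustrative):

```python
def signed(N: int, i: int) -> int:
    """signed_N: two's-complement reading of an unsigned N-bit value."""
    if not 0 <= i < 2 ** N:
        raise ValueError("i must lie in [0, 2^N)")
    return i if i < 2 ** (N - 1) else i - 2 ** N


def signed_inv(N: int, j: int) -> int:
    """signed_N^{-1}: map a signed value in [-2^(N-1), 2^(N-1)) back to [0, 2^N)."""
    if not -(2 ** (N - 1)) <= j < 2 ** (N - 1):
        raise ValueError("j must lie in [-2^(N-1), 2^(N-1))")
    return j % 2 ** N  # Python's % with a positive modulus is never negative
```

For \(N = 8\), the value \(255\) reads as \(-1\) and \(128\) reads as \(-128\); round-tripping every value through both functions returns it unchanged, which reflects the bijection.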

Boolean Interpretation

The integer result of predicates – i.e., tests and relational operators – is defined with the help of the following auxiliary function producing the value \(1\) or \(0\) depending on a condition.

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(C) &=& 1 & (\mathrel{\mbox{if}} C) \\ \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(C) &=& 0 & (\mathrel{\mbox{otherwise}}) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-iadd}{\mathrm{iadd}}_N(i_1, i_2)\)

  • Return the result of adding \(i_1\) and \(i_2\) modulo \(2^N\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-iadd}{\mathrm{iadd}}_N(i_1, i_2) &=& (i_1 + i_2) \mathbin{\mathrm{mod}} 2^N \end{array}\]

\(\href{../exec/numerics.html#op-isub}{\mathrm{isub}}_N(i_1, i_2)\)

  • Return the result of subtracting \(i_2\) from \(i_1\) modulo \(2^N\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-isub}{\mathrm{isub}}_N(i_1, i_2) &=& (i_1 - i_2 + 2^N) \mathbin{\mathrm{mod}} 2^N \end{array}\]

\(\href{../exec/numerics.html#op-imul}{\mathrm{imul}}_N(i_1, i_2)\)

  • Return the result of multiplying \(i_1\) and \(i_2\) modulo \(2^N\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-imul}{\mathrm{imul}}_N(i_1, i_2) &=& (i_1 \cdot i_2) \mathbin{\mathrm{mod}} 2^N \end{array}\]
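The three modular operators map directly onto Python integer arithmetic. In this sketch, operands are assumed to already lie in \([0, 2^N)\).

```python
def iadd(N: int, i1: int, i2: int) -> int:
    return (i1 + i2) % 2 ** N


def isub(N: int, i1: int, i2: int) -> int:
    # Adding 2^N keeps the intermediate non-negative, as in the formal rule;
    # Python's % would already return a non-negative result without it.
    return (i1 - i2 + 2 ** N) % 2 ** N


def imul(N: int, i1: int, i2: int) -> int:
    return (i1 * i2) % 2 ** N
```

For example, with \(N = 8\), `isub(8, 1, 2)` wraps around to `255`.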

\(\href{../exec/numerics.html#op-idiv}{\mathrm{idiv\_u}}_N(i_1, i_2)\)

  • If \(i_2\) is \(0\), then the result is undefined.

  • Else, return the result of dividing \(i_1\) by \(i_2\), truncated toward zero.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-idiv}{\mathrm{idiv\_u}}_N(i_1, 0) &=& \{\} \\ \href{../exec/numerics.html#op-idiv}{\mathrm{idiv\_u}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(i_1 / i_2) \\ \end{array}\end{split}\]

Note

This operator is partial.

\(\href{../exec/numerics.html#op-idiv}{\mathrm{idiv\_s}}_N(i_1, i_2)\)

  • Let \(j_1\) be the signed interpretation of \(i_1\).

  • Let \(j_2\) be the signed interpretation of \(i_2\).

  • If \(j_2\) is \(0\), then the result is undefined.

  • Else if \(j_1\) divided by \(j_2\) is \(2^{N-1}\), then the result is undefined.

  • Else, return the result of dividing \(j_1\) by \(j_2\), truncated toward zero.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-idiv}{\mathrm{idiv\_s}}_N(i_1, 0) &=& \{\} \\ \href{../exec/numerics.html#op-idiv}{\mathrm{idiv\_s}}_N(i_1, i_2) &=& \{\} \qquad\qquad (\mathrel{\mbox{if}} \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_1) / \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_2) = 2^{N-1}) \\ \href{../exec/numerics.html#op-idiv}{\mathrm{idiv\_s}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N^{-1}(\href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_1) / \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_2))) \\ \end{array}\end{split}\]

Note

This operator is partial. Besides division by \(0\), the result of \((-2^{N-1})/(-1) = +2^{N-1}\) is not representable as an \(N\)-bit signed integer.

\(\href{../exec/numerics.html#op-irem}{\mathrm{irem\_u}}_N(i_1, i_2)\)

  • If \(i_2\) is \(0\), then the result is undefined.

  • Else, return the remainder of dividing \(i_1\) by \(i_2\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-irem}{\mathrm{irem\_u}}_N(i_1, 0) &=& \{\} \\ \href{../exec/numerics.html#op-irem}{\mathrm{irem\_u}}_N(i_1, i_2) &=& i_1 - i_2 \cdot \href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(i_1 / i_2) \\ \end{array}\end{split}\]

Note

This operator is partial.

As long as both operators are defined, it holds that \(i_1 = i_2\cdot\href{../exec/numerics.html#op-idiv}{\mathrm{idiv\_u}}(i_1, i_2) + \href{../exec/numerics.html#op-irem}{\mathrm{irem\_u}}(i_1, i_2)\).

\(\href{../exec/numerics.html#op-irem}{\mathrm{irem\_s}}_N(i_1, i_2)\)

  • Let \(j_1\) be the signed interpretation of \(i_1\).

  • Let \(j_2\) be the signed interpretation of \(i_2\).

  • If \(i_2\) is \(0\), then the result is undefined.

  • Else, return the remainder of dividing \(j_1\) by \(j_2\), with the sign of the dividend \(j_1\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-irem}{\mathrm{irem\_s}}_N(i_1, 0) &=& \{\} \\ \href{../exec/numerics.html#op-irem}{\mathrm{irem\_s}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N^{-1}(j_1 - j_2 \cdot \href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(j_1 / j_2)) \\ && (\mathrel{\mbox{where}} j_1 = \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_1) \wedge j_2 = \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_2)) \\ \end{array}\end{split}\]

Note

This operator is partial.

As long as both operators are defined, it holds that \(i_1 = i_2\cdot\href{../exec/numerics.html#op-idiv}{\mathrm{idiv\_s}}(i_1, i_2) + \href{../exec/numerics.html#op-irem}{\mathrm{irem\_s}}(i_1, i_2)\).
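The four division and remainder operators can be sketched in Python, modeling the empty result set as `None` (an assumption of this sketch; an implementation would trap instead). Python's `//` rounds toward negative infinity, so the hypothetical helper `trunc_div` is needed for the signed cases.

```python
from typing import Optional


def signed(N: int, i: int) -> int:
    return i - 2 ** N if i >= 2 ** (N - 1) else i


def signed_inv(N: int, j: int) -> int:
    return j % 2 ** N


def trunc_div(a: int, b: int) -> int:
    """Quotient a / b truncated toward zero."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def idiv_u(N: int, i1: int, i2: int) -> Optional[int]:
    if i2 == 0:
        return None                 # empty result set
    return i1 // i2                 # non-negative operands, so // already truncates


def idiv_s(N: int, i1: int, i2: int) -> Optional[int]:
    j1, j2 = signed(N, i1), signed(N, i2)
    if j2 == 0:
        return None
    if j1 == 2 ** (N - 1) * j2:     # exact quotient 2^(N-1) is not representable
        return None
    return signed_inv(N, trunc_div(j1, j2))


def irem_u(N: int, i1: int, i2: int) -> Optional[int]:
    if i2 == 0:
        return None
    return i1 - i2 * (i1 // i2)


def irem_s(N: int, i1: int, i2: int) -> Optional[int]:
    j1, j2 = signed(N, i1), signed(N, i2)
    if j2 == 0:
        return None
    return signed_inv(N, j1 - j2 * trunc_div(j1, j2))  # takes the sign of j1
```

With \(N = 8\), dividing \(-7\) (encoded as \(249\)) by \(2\) gives \(-3\) (encoded as \(253\)) with remainder \(-1\) (encoded as \(255\)). The case \((-2^{7})/(-1)\) is undefined for \(\mathrm{idiv\_s}\), while \(\mathrm{irem\_s}\) returns \(0\) there.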

\(\href{../exec/numerics.html#op-inot}{\mathrm{inot}}_N(i)\)

  • Return the bitwise negation of \(i\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-inot}{\mathrm{inot}}_N(i) &=& \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N^{-1}(\href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i) \veebar \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(2^N-1)) \end{array}\]

\(\href{../exec/numerics.html#op-iand}{\mathrm{iand}}_N(i_1, i_2)\)

  • Return the bitwise conjunction of \(i_1\) and \(i_2\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-iand}{\mathrm{iand}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N^{-1}(\href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i_1) \wedge \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i_2)) \end{array}\]

\(\href{../exec/numerics.html#op-iandnot}{\mathrm{iandnot}}_N(i_1, i_2)\)

  • Return the bitwise conjunction of \(i_1\) and the bitwise negation of \(i_2\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-iandnot}{\mathrm{iandnot}}_N(i_1, i_2) &=& \href{../exec/numerics.html#op-iand}{\mathrm{iand}}_N(i_1, \href{../exec/numerics.html#op-inot}{\mathrm{inot}}_N(i_2)) \end{array}\]

\(\href{../exec/numerics.html#op-ior}{\mathrm{ior}}_N(i_1, i_2)\)

  • Return the bitwise disjunction of \(i_1\) and \(i_2\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ior}{\mathrm{ior}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N^{-1}(\href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i_1) \vee \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i_2)) \end{array}\]

\(\href{../exec/numerics.html#op-ixor}{\mathrm{ixor}}_N(i_1, i_2)\)

  • Return the bitwise exclusive disjunction of \(i_1\) and \(i_2\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ixor}{\mathrm{ixor}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N^{-1}(\href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i_1) \veebar \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i_2)) \end{array}\]
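In Python, the bitwise operators above reduce to the built-in integer operators, with one caveat: `~i` produces a negative number, so \(\mathrm{inot}\) is written as an XOR with the all-ones value, exactly as in its formal rule. The helper `ones` is a hypothetical name.

```python
def ones(N: int) -> int:
    """The all-ones N-bit value 2^N - 1."""
    return (1 << N) - 1


def inot(N: int, i: int) -> int:
    return i ^ ones(N)  # XOR with all ones; ~i would be negative in Python


def iand(N: int, i1: int, i2: int) -> int:
    return i1 & i2


def iandnot(N: int, i1: int, i2: int) -> int:
    return iand(N, i1, inot(N, i2))


def ior(N: int, i1: int, i2: int) -> int:
    return i1 | i2


def ixor(N: int, i1: int, i2: int) -> int:
    return i1 ^ i2
```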

\(\href{../exec/numerics.html#op-ishl}{\mathrm{ishl}}_N(i_1, i_2)\)

  • Let \(k\) be \(i_2\) modulo \(N\).

  • Return the result of shifting \(i_1\) left by \(k\) bits, modulo \(2^N\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ishl}{\mathrm{ishl}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N^{-1}(d_2^{N-k}~0^k) & (\mathrel{\mbox{if}} \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i_1) = d_1^k~d_2^{N-k} \wedge k = i_2 \mathbin{\mathrm{mod}} N) \end{array}\]

\(\href{../exec/numerics.html#op-ishr}{\mathrm{ishr\_u}}_N(i_1, i_2)\)

  • Let \(k\) be \(i_2\) modulo \(N\).

  • Return the result of shifting \(i_1\) right by \(k\) bits, extended with \(0\) bits.

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ishr}{\mathrm{ishr\_u}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N^{-1}(0^k~d_1^{N-k}) & (\mathrel{\mbox{if}} \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i_1) = d_1^{N-k}~d_2^k \wedge k = i_2 \mathbin{\mathrm{mod}} N) \end{array}\]

\(\href{../exec/numerics.html#op-ishr}{\mathrm{ishr\_s}}_N(i_1, i_2)\)

  • Let \(k\) be \(i_2\) modulo \(N\).

  • Return the result of shifting \(i_1\) right by \(k\) bits, extended with the most significant bit of the original value.

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ishr}{\mathrm{ishr\_s}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N^{-1}(d_0^{k+1}~d_1^{N-k-1}) & (\mathrel{\mbox{if}} \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i_1) = d_0~d_1^{N-k-1}~d_2^k \wedge k = i_2 \mathbin{\mathrm{mod}} N) \end{array}\]

\(\href{../exec/numerics.html#op-irotl}{\mathrm{irotl}}_N(i_1, i_2)\)

  • Let \(k\) be \(i_2\) modulo \(N\).

  • Return the result of rotating \(i_1\) left by \(k\) bits.

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-irotl}{\mathrm{irotl}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N^{-1}(d_2^{N-k}~d_1^k) & (\mathrel{\mbox{if}} \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i_1) = d_1^k~d_2^{N-k} \wedge k = i_2 \mathbin{\mathrm{mod}} N) \end{array}\]

\(\href{../exec/numerics.html#op-irotr}{\mathrm{irotr}}_N(i_1, i_2)\)

  • Let \(k\) be \(i_2\) modulo \(N\).

  • Return the result of rotating \(i_1\) right by \(k\) bits.

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-irotr}{\mathrm{irotr}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N^{-1}(d_2^k~d_1^{N-k}) & (\mathrel{\mbox{if}} \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i_1) = d_1^{N-k}~d_2^k \wedge k = i_2 \mathbin{\mathrm{mod}} N) \end{array}\]
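The shift and rotate rules can be sketched with Python's shift operators, masking the result back to \(N\) bits. Note that the shift count is always reduced modulo \(N\) first. Python's `>>` on a negative integer replicates the sign, which this sketch relies on for \(\mathrm{ishr\_s}\); the helper `_mask` is hypothetical.

```python
def _mask(N: int) -> int:
    return (1 << N) - 1


def ishl(N: int, i1: int, i2: int) -> int:
    k = i2 % N
    return (i1 << k) & _mask(N)              # bits shifted past position N-1 are dropped


def ishr_u(N: int, i1: int, i2: int) -> int:
    k = i2 % N
    return i1 >> k                           # vacated high bits become 0


def ishr_s(N: int, i1: int, i2: int) -> int:
    k = i2 % N
    j = i1 - (1 << N) if i1 >> (N - 1) else i1   # signed interpretation of i1
    return (j >> k) & _mask(N)               # arithmetic shift copies the top bit


def irotl(N: int, i1: int, i2: int) -> int:
    k = i2 % N
    return ((i1 << k) | (i1 >> (N - k))) & _mask(N)


def irotr(N: int, i1: int, i2: int) -> int:
    k = i2 % N
    return ((i1 >> k) | (i1 << (N - k))) & _mask(N)
```

For example, with \(N = 8\), shifting by \(9\) behaves like shifting by \(1\), and rotating \(10000001_2\) left by one gives \(00000011_2\).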

\(\href{../exec/numerics.html#op-iclz}{\mathrm{iclz}}_N(i)\)

  • Return the count of leading zero bits in \(i\); all bits are considered leading zeros if \(i\) is \(0\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-iclz}{\mathrm{iclz}}_N(i) &=& k & (\mathrel{\mbox{if}} \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i) = 0^k~(1~d^\ast)^?) \end{array}\]

\(\href{../exec/numerics.html#op-ictz}{\mathrm{ictz}}_N(i)\)

  • Return the count of trailing zero bits in \(i\); all bits are considered trailing zeros if \(i\) is \(0\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ictz}{\mathrm{ictz}}_N(i) &=& k & (\mathrel{\mbox{if}} \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i) = (d^\ast~1)^?~0^k) \end{array}\]

\(\href{../exec/numerics.html#op-ipopcnt}{\mathrm{ipopcnt}}_N(i)\)

  • Return the count of non-zero bits in \(i\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ipopcnt}{\mathrm{ipopcnt}}_N(i) &=& k & (\mathrel{\mbox{if}} \href{../exec/numerics.html#aux-ibits}{\mathrm{ibits}}_N(i) = (0^\ast~1)^k~0^\ast) \end{array}\]
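The three bit-counting operators can be sketched with standard Python integer methods; this assumes \(0 \leq i < 2^N\).

```python
def iclz(N: int, i: int) -> int:
    """Leading zeros of an N-bit value; N when i is 0."""
    return N - i.bit_length()


def ictz(N: int, i: int) -> int:
    """Trailing zeros of an N-bit value; N when i is 0."""
    if i == 0:
        return N
    return (i & -i).bit_length() - 1  # i & -i isolates the lowest set bit


def ipopcnt(N: int, i: int) -> int:
    """Number of 1 bits in i."""
    return bin(i).count("1")
```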

\(\href{../exec/numerics.html#op-ieqz}{\mathrm{ieqz}}_N(i)\)

  • Return \(1\) if \(i\) is zero, \(0\) otherwise.

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ieqz}{\mathrm{ieqz}}_N(i) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(i = 0) \end{array}\]

\(\href{../exec/numerics.html#op-inez}{\mathrm{inez}}_N(i)\)

  • Return \(0\) if \(i\) is zero, \(1\) otherwise.

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-inez}{\mathrm{inez}}_N(i) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(i \neq 0) \end{array}\]

\(\href{../exec/numerics.html#op-ieq}{\mathrm{ieq}}_N(i_1, i_2)\)

  • Return \(1\) if \(i_1\) equals \(i_2\), \(0\) otherwise.

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ieq}{\mathrm{ieq}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(i_1 = i_2) \end{array}\]

\(\href{../exec/numerics.html#op-ine}{\mathrm{ine}}_N(i_1, i_2)\)

  • Return \(1\) if \(i_1\) does not equal \(i_2\), \(0\) otherwise.

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ine}{\mathrm{ine}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(i_1 \neq i_2) \end{array}\]

\(\href{../exec/numerics.html#op-ilt}{\mathrm{ilt\_u}}_N(i_1, i_2)\)

  • Return \(1\) if \(i_1\) is less than \(i_2\), \(0\) otherwise.

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ilt}{\mathrm{ilt\_u}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(i_1 < i_2) \end{array}\]

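\(\href{../exec/numerics.html#op-ilt}{\mathrm{ilt\_s}}_N(i_1, i_2)\)

  • Let \(j_1\) be the signed interpretation of \(i_1\).

  • Let \(j_2\) be the signed interpretation of \(i_2\).

  • Return \(1\) if \(j_1\) is less than \(j_2\), \(0\) otherwise.
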
\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ilt}{\mathrm{ilt\_s}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_1) < \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_2)) \end{array}\]

\(\href{../exec/numerics.html#op-igt}{\mathrm{igt\_u}}_N(i_1, i_2)\)

  • Return \(1\) if \(i_1\) is greater than \(i_2\), \(0\) otherwise.

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-igt}{\mathrm{igt\_u}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(i_1 > i_2) \end{array}\]

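\(\href{../exec/numerics.html#op-igt}{\mathrm{igt\_s}}_N(i_1, i_2)\)

  • Let \(j_1\) be the signed interpretation of \(i_1\).

  • Let \(j_2\) be the signed interpretation of \(i_2\).

  • Return \(1\) if \(j_1\) is greater than \(j_2\), \(0\) otherwise.
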
\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-igt}{\mathrm{igt\_s}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_1) > \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_2)) \end{array}\]

\(\href{../exec/numerics.html#op-ile}{\mathrm{ile\_u}}_N(i_1, i_2)\)

  • Return \(1\) if \(i_1\) is less than or equal to \(i_2\), \(0\) otherwise.

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ile}{\mathrm{ile\_u}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(i_1 \leq i_2) \end{array}\]

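\(\href{../exec/numerics.html#op-ile}{\mathrm{ile\_s}}_N(i_1, i_2)\)

  • Let \(j_1\) be the signed interpretation of \(i_1\).

  • Let \(j_2\) be the signed interpretation of \(i_2\).

  • Return \(1\) if \(j_1\) is less than or equal to \(j_2\), \(0\) otherwise.
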
\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ile}{\mathrm{ile\_s}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_1) \leq \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_2)) \end{array}\]

\(\href{../exec/numerics.html#op-ige}{\mathrm{ige\_u}}_N(i_1, i_2)\)

  • Return \(1\) if \(i_1\) is greater than or equal to \(i_2\), \(0\) otherwise.

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ige}{\mathrm{ige\_u}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(i_1 \geq i_2) \end{array}\]

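\(\href{../exec/numerics.html#op-ige}{\mathrm{ige\_s}}_N(i_1, i_2)\)

  • Let \(j_1\) be the signed interpretation of \(i_1\).

  • Let \(j_2\) be the signed interpretation of \(i_2\).

  • Return \(1\) if \(j_1\) is greater than or equal to \(j_2\), \(0\) otherwise.
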
\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ige}{\mathrm{ige\_s}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_1) \geq \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_2)) \end{array}\]
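The difference between the unsigned and signed relational operators is easiest to see on a concrete bit pattern. This Python sketch shows the "less than" and "greater or equal" pairs; the "greater than" and "less or equal" variants follow the same pattern. `bool_` is a hypothetical stand-in for the auxiliary \(\mathrm{bool}\) function.

```python
def signed(N: int, i: int) -> int:
    return i - (1 << N) if i >> (N - 1) else i


def bool_(c: bool) -> int:
    """The auxiliary bool(C): 1 if the condition holds, else 0."""
    return 1 if c else 0


def ilt_u(N: int, i1: int, i2: int) -> int:
    return bool_(i1 < i2)


def ilt_s(N: int, i1: int, i2: int) -> int:
    return bool_(signed(N, i1) < signed(N, i2))


def ige_u(N: int, i1: int, i2: int) -> int:
    return bool_(i1 >= i2)


def ige_s(N: int, i1: int, i2: int) -> int:
    return bool_(signed(N, i1) >= signed(N, i2))
```

With \(N = 8\), the pattern \(\mathtt{0xFF}\) is \(255\) unsigned but \(-1\) signed, so `ilt_u(8, 0xFF, 1)` is `0` while `ilt_s(8, 0xFF, 1)` is `1`.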

\(\href{../exec/numerics.html#op-iextendn}{\mathrm{iextend}M\mathrm{\_s}}_N(i)\)

  • Let \(j\) be the result of computing \(\href{../exec/numerics.html#op-wrap}{\mathrm{wrap}}_{N,M}(i)\).

  • Return \(\href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{s}}}_{M,N}(j)\).

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-iextendn}{\mathrm{iextend}M\mathrm{\_s}}_{N}(i) &=& \href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{s}}}_{M,N}(\href{../exec/numerics.html#op-wrap}{\mathrm{wrap}}_{N,M}(i)) \\ \end{array}\end{split}\]
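A minimal Python sketch of \(\mathrm{iextend}M\mathrm{\_s}_N\), assuming that \(\mathrm{wrap}_{N,M}\) keeps the low \(M\) bits and that \(\mathrm{extend}^{\mathsf{s}}_{M,N}\) reinterprets an \(M\)-bit value as signed and re-encodes it in \(N\) bits (that is, \(\mathrm{signed}_N^{-1}\circ\mathrm{signed}_M\)). The function names are illustrative.

```python
def wrap(N: int, M: int, i: int) -> int:
    """Keep the low M bits of an N-bit value."""
    return i % (1 << M)


def extend_s(M: int, N: int, j: int) -> int:
    """Sign-extend an M-bit value to N bits: signed_N^{-1}(signed_M(j))."""
    sj = j - (1 << M) if j >> (M - 1) else j
    return sj % (1 << N)


def iextend_s(M: int, N: int, i: int) -> int:
    """Reinterpret the low M bits of i as signed and widen back to N bits."""
    return extend_s(M, N, wrap(N, M, i))
```

For instance, with \(M = 8\) and \(N = 32\), the input \(\mathtt{0x000000FF}\) becomes \(\mathtt{0xFFFFFFFF}\), while \(\mathtt{0x1234567F}\) becomes \(\mathtt{0x7F}\) because the upper bits are discarded first.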

\(\href{../exec/numerics.html#op-ibitselect}{\mathrm{ibitselect}}_N(i_1, i_2, i_3)\)

  • Let \(j_1\) be the bitwise conjunction of \(i_1\) and \(i_3\).

  • Let \(j_3'\) be the bitwise negation of \(i_3\).

  • Let \(j_2\) be the bitwise conjunction of \(i_2\) and \(j_3'\).

  • Return the bitwise disjunction of \(j_1\) and \(j_2\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ibitselect}{\mathrm{ibitselect}}_N(i_1, i_2, i_3) &=& \href{../exec/numerics.html#op-ior}{\mathrm{ior}}_N(\href{../exec/numerics.html#op-iand}{\mathrm{iand}}_N(i_1, i_3), \href{../exec/numerics.html#op-iand}{\mathrm{iand}}_N(i_2, \href{../exec/numerics.html#op-inot}{\mathrm{inot}}_N(i_3))) \end{array}\]
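The steps of \(\mathrm{ibitselect}\) can be followed one by one in Python; \(i_3\) acts as a mask that chooses, bit by bit, between \(i_1\) (where the mask bit is \(1\)) and \(i_2\) (where it is \(0\)). The function name mirrors the notation but is otherwise illustrative.

```python
def ibitselect(N: int, i1: int, i2: int, i3: int) -> int:
    """Take each bit from i1 where i3 has a 1, and from i2 where i3 has a 0."""
    ones = (1 << N) - 1
    j1 = i1 & i3          # iand(i1, i3)
    j3_not = i3 ^ ones    # inot(i3)
    j2 = i2 & j3_not      # iand(i2, inot(i3))
    return j1 | j2        # ior(j1, j2)
```

For example, `ibitselect(8, 0xAA, 0x55, 0xF0)` takes the high nibble from \(\mathtt{0xAA}\) and the low nibble from \(\mathtt{0x55}\), giving \(\mathtt{0xA5}\).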

\(\href{../exec/numerics.html#op-iabs}{\mathrm{iabs}}_N(i)\)

  • Let \(j\) be the signed interpretation of \(i\).

  • If \(j\) is greater than or equal to \(0\), then return \(i\).

  • Else, return the negation of \(j\), modulo \(2^N\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-iabs}{\mathrm{iabs}}_N(i) &=& i & (\mathrel{\mbox{if}} \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i) \ge 0) \\ \href{../exec/numerics.html#op-iabs}{\mathrm{iabs}}_N(i) &=& -\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i) \mathbin{\mathrm{mod}} 2^N & (\mathrel{\mbox{otherwise}}) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-ineg}{\mathrm{ineg}}_N(i)\)

  • Return the result of negating \(i\), modulo \(2^N\).

\[\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ineg}{\mathrm{ineg}}_N(i) &=& (2^N - i) \mathbin{\mathrm{mod}} 2^N \end{array}\]
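A minimal Python sketch of \(\mathrm{iabs}\) and \(\mathrm{ineg}\), with operands assumed to lie in \([0, 2^N)\):

```python
def ineg(N: int, i: int) -> int:
    return ((1 << N) - i) % (1 << N)


def iabs(N: int, i: int) -> int:
    j = i - (1 << N) if i >> (N - 1) else i  # signed interpretation
    return i if j >= 0 else (-j) % (1 << N)
```

One consequence of the modular definition: the most negative value maps to itself under \(\mathrm{iabs}\), since \(+2^{N-1}\) wraps back to the same bit pattern (for \(N = 8\), `iabs(8, 128)` is `128`).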

\(\href{../exec/numerics.html#op-imin}{\mathrm{imin\_u}}_N(i_1, i_2)\)

  • Return \(i_1\) if \(\href{../exec/numerics.html#op-ilt}{\mathrm{ilt\_u}}_N(i_1, i_2)\) is \(1\), return \(i_2\) otherwise.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-imin}{\mathrm{imin\_u}}_N(i_1, i_2) &=& i_1 & (\mathrel{\mbox{if}} \href{../exec/numerics.html#op-ilt}{\mathrm{ilt\_u}}_N(i_1, i_2) = 1)\\ \href{../exec/numerics.html#op-imin}{\mathrm{imin\_u}}_N(i_1, i_2) &=& i_2 & (\mathrel{\mbox{otherwise}}) \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-imin}{\mathrm{imin\_s}}_N(i_1, i_2)\)

  • Return \(i_1\) if \(\href{../exec/numerics.html#op-ilt}{\mathrm{ilt\_s}}_N(i_1, i_2)\) is \(1\), return \(i_2\) otherwise.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-imin}{\mathrm{imin\_s}}_N(i_1, i_2) &=& i_1 & (\mathrel{\mbox{if}} \href{../exec/numerics.html#op-ilt}{\mathrm{ilt\_s}}_N(i_1, i_2) = 1)\\ \href{../exec/numerics.html#op-imin}{\mathrm{imin\_s}}_N(i_1, i_2) &=& i_2 & (\mathrel{\mbox{otherwise}}) \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-imax}{\mathrm{imax\_u}}_N(i_1, i_2)\)

  • Return \(i_1\) if \(\href{../exec/numerics.html#op-igt}{\mathrm{igt\_u}}_N(i_1, i_2)\) is \(1\), return \(i_2\) otherwise.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-imax}{\mathrm{imax\_u}}_N(i_1, i_2) &=& i_1 & (\mathrel{\mbox{if}} \href{../exec/numerics.html#op-igt}{\mathrm{igt\_u}}_N(i_1, i_2) = 1)\\ \href{../exec/numerics.html#op-imax}{\mathrm{imax\_u}}_N(i_1, i_2) &=& i_2 & (\mathrel{\mbox{otherwise}}) \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-imax}{\mathrm{imax\_s}}_N(i_1, i_2)\)

  • Return \(i_1\) if \(\href{../exec/numerics.html#op-igt}{\mathrm{igt\_s}}_N(i_1, i_2)\) is \(1\), return \(i_2\) otherwise.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-imax}{\mathrm{imax\_s}}_N(i_1, i_2) &=& i_1 & (\mathrel{\mbox{if}} \href{../exec/numerics.html#op-igt}{\mathrm{igt\_s}}_N(i_1, i_2) = 1)\\ \href{../exec/numerics.html#op-imax}{\mathrm{imax\_s}}_N(i_1, i_2) &=& i_2 & (\mathrel{\mbox{otherwise}}) \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-iadd-sat}{\mathrm{iadd\_sat\_u}}_N(i_1, i_2)\)

  • Let \(i\) be the result of adding \(i_1\) and \(i_2\).

  • Return \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(i)\).

\[\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-iadd-sat}{\mathrm{iadd\_sat\_u}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(i_1 + i_2) \end{array}\]

\(\href{../exec/numerics.html#op-iadd-sat}{\mathrm{iadd\_sat\_s}}_N(i_1, i_2)\)

  • Let \(j_1\) be the signed interpretation of \(i_1\).

  • Let \(j_2\) be the signed interpretation of \(i_2\).

  • Let \(j\) be the result of adding \(j_1\) and \(j_2\).

  • Return the value whose signed interpretation is \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(j)\).

\[\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-iadd-sat}{\mathrm{iadd\_sat\_s}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N^{-1}(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_1) + \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_2))) \end{array}\]

\(\href{../exec/numerics.html#op-isub-sat}{\mathrm{isub\_sat\_u}}_N(i_1, i_2)\)

  • Let \(i\) be the result of subtracting \(i_2\) from \(i_1\).

  • Return \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(i)\).

\[\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-isub-sat}{\mathrm{isub\_sat\_u}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(i_1 - i_2) \end{array}\]

\(\href{../exec/numerics.html#op-isub-sat}{\mathrm{isub\_sat\_s}}_N(i_1, i_2)\)

  • Let \(j_1\) be the signed interpretation of \(i_1\).

  • Let \(j_2\) be the signed interpretation of \(i_2\).

  • Let \(j\) be the result of subtracting \(j_2\) from \(j_1\).

  • Return the value whose signed interpretation is \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(j)\).

\[\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-isub-sat}{\mathrm{isub\_sat\_s}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N^{-1}(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_1) - \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N(i_2))) \end{array}\]
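A matching Python sketch of the saturating subtractions. As above, the helper names and the width parameter `n` are assumptions, and inputs are assumed to be unsigned \(N\)-bit values.

```python
def signed(n: int, i: int) -> int:
    """signed_N: two's-complement reading of an unsigned N-bit value."""
    return i - (1 << n) if i >= (1 << (n - 1)) else i

def signed_inv(n: int, j: int) -> int:
    """signed_N^{-1}: the unsigned N-bit value whose signed reading is j."""
    return j + (1 << n) if j < 0 else j

def sat_u(n: int, i: int) -> int:
    return max(0, min(i, (1 << n) - 1))

def sat_s(n: int, j: int) -> int:
    return max(-(1 << (n - 1)), min(j, (1 << (n - 1)) - 1))

def isub_sat_u(n: int, i1: int, i2: int) -> int:
    # A negative difference saturates to 0.
    return sat_u(n, i1 - i2)

def isub_sat_s(n: int, i1: int, i2: int) -> int:
    # Subtract the signed interpretations, saturate, and map back to a bit pattern.
    return signed_inv(n, sat_s(n, signed(n, i1) - signed(n, i2)))
```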

\(\href{../exec/numerics.html#op-iavgr}{\mathrm{iavgr\_u}}_N(i_1, i_2)\)

  • Let \(j\) be the result of adding \(i_1\), \(i_2\), and \(1\).

  • Return the result of dividing \(j\) by \(2\), truncated toward zero.

\[\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-iavgr}{\mathrm{iavgr\_u}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}((i_1 + i_2 + 1) / 2) \end{array}\]
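A direct Python transcription is shown below, together with a variant that never needs more than \(N\) bits for intermediates, which matters in a fixed-width implementation where \(i_1 + i_2 + 1\) could wrap. The variant relies on the standard identity \(\lceil (a+b)/2 \rceil = (a \mathbin{|} b) - ((a \oplus b) \gg 1)\); it is an illustration, not part of the definition, and both function names are hypothetical.

```python
def iavgr_u(i1: int, i2: int) -> int:
    """Direct transcription: the sum is exact (unbounded ints) and non-negative,
    so floor division by 2 coincides with truncation toward zero."""
    return (i1 + i2 + 1) // 2

def iavgr_u_no_widening(i1: int, i2: int) -> int:
    """Same result using only N-bit intermediates: (i1 | i2) - ((i1 ^ i2) >> 1)."""
    return (i1 | i2) - ((i1 ^ i2) >> 1)
```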

\(\href{../exec/numerics.html#op-iq15mulrsat}{\mathrm{iq15mulrsat\_s}}_N(i_1, i_2)\)

  • Return the value whose signed interpretation is the result of \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(\href{../exec/numerics.html#op-ishr}{\mathrm{ishr\_s}}_N(i_1 \cdot i_2 + 2^{14}, 15))\).

\[\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-iq15mulrsat}{\mathrm{iq15mulrsat\_s}}_N(i_1, i_2) &=& \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N^{-1}(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(\href{../exec/numerics.html#op-ishr}{\mathrm{ishr\_s}}_N(i_1 \cdot i_2 + 2^{14}, 15))) \end{array}\]
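The following Python sketch illustrates this Q15 rounding multiplication. It assumes, as for the SIMD instruction i16x8.q15mulr_sat_s, that \(N = 16\) and that the operands are multiplied by their signed interpretations; it also assumes the product is shifted without first being wrapped to \(N\) bits. Python's `>>` on negative integers is an arithmetic (sign-propagating) shift. The function and helper names are hypothetical.

```python
def signed(n: int, i: int) -> int:
    return i - (1 << n) if i >= (1 << (n - 1)) else i

def signed_inv(n: int, j: int) -> int:
    return j + (1 << n) if j < 0 else j

def sat_s(n: int, j: int) -> int:
    return max(-(1 << (n - 1)), min(j, (1 << (n - 1)) - 1))

def iq15mulrsat_s(i1: int, i2: int, n: int = 16) -> int:
    j1, j2 = signed(n, i1), signed(n, i2)
    # Adding 2^14 before shifting right by 15 rounds the Q30 product to Q15.
    j = (j1 * j2 + (1 << 14)) >> 15
    return signed_inv(n, sat_s(n, j))
```

The only input pair that actually saturates is \(-1.0 \cdot -1.0\) (0x8000 times 0x8000), whose exact result \(+1.0\) is not representable in Q15.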

Floating-Point Operations

Floating-point arithmetic follows the IEEE 754 standard, with the following qualifications:

  • All operators use round-to-nearest ties-to-even, except where otherwise specified. Non-default directed rounding attributes are not supported.

  • Following the recommendation that operators propagate NaN payloads from their operands is permitted but not required.

  • All operators use “non-stop” mode, and floating-point exceptions are not otherwise observable. In particular, neither alternate floating-point exception handling attributes nor operators on status flags are supported. There is no observable difference between quiet and signalling NaNs.

Note

Some of these limitations may be lifted in future versions of WebAssembly.

Rounding

Rounding is always round-to-nearest ties-to-even, in accordance with IEEE 754 (Section 4.3.1).

An exact floating-point number is a rational number that is exactly representable as a floating-point number of given bit width \(N\).

A limit number for a given floating-point bit width \(N\) is a positive or negative number whose magnitude is the smallest power of \(2\) that is not exactly representable as a floating-point number of width \(N\) (that magnitude is \(2^{128}\) for \(N = 32\) and \(2^{1024}\) for \(N = 64\)).

A candidate number is either an exact floating-point number or a positive or negative limit number for the given bit width \(N\).

A candidate pair is a pair \(z_1,z_2\) of candidate numbers, such that no candidate number exists that lies between the two.

A real number \(r\) is converted to a floating-point value of bit width \(N\) as follows:

  • If \(r\) is \(0\), then return \(+0\).

  • Else if \(r\) is an exact floating-point number, then return \(r\).

  • Else if \(r\) is greater than or equal to the positive limit, then return \(+\infty\).

  • Else if \(r\) is less than or equal to the negative limit, then return \(-\infty\).

  • Else if \(z_1\) and \(z_2\) are a candidate pair such that \(z_1 < r < z_2\), then:

    • If \(|r - z_1| < |r - z_2|\), then let \(z\) be \(z_1\).

    • Else if \(|r - z_1| > |r - z_2|\), then let \(z\) be \(z_2\).

    • Else if \(|r - z_1| = |r - z_2|\) and the significand of \(z_1\) is even, then let \(z\) be \(z_1\).

    • Else, let \(z\) be \(z_2\).

  • If \(z\) is \(0\), then:

    • If \(r < 0\), then return \(-0\).

    • Else, return \(+0\).

  • Else if \(z\) is a limit number, then:

    • If \(r < 0\), then return \(-\infty\).

    • Else, return \(+\infty\).

  • Else, return \(z\).

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(0) &=& +0 \\ \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(r) &=& r & (\mathrel{\mbox{if}} r \in \mathrm{exact}_N) \\ \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(r) &=& +\infty & (\mathrel{\mbox{if}} r \geq +\mathrm{limit}_N) \\ \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(r) &=& -\infty & (\mathrel{\mbox{if}} r \leq -\mathrm{limit}_N) \\ \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(r) &=& \mathrm{closest}_N(r, z_1, z_2) & (\mathrel{\mbox{if}} z_1 < r < z_2 \wedge (z_1,z_2) \in \mathrm{candidatepair}_N) \\[1ex] \mathrm{closest}_N(r, z_1, z_2) &=& \mathrm{rectify}_N(r, z_1) & (\mathrel{\mbox{if}} |r-z_1|<|r-z_2|) \\ \mathrm{closest}_N(r, z_1, z_2) &=& \mathrm{rectify}_N(r, z_2) & (\mathrel{\mbox{if}} |r-z_1|>|r-z_2|) \\ \mathrm{closest}_N(r, z_1, z_2) &=& \mathrm{rectify}_N(r, z_1) & (\mathrel{\mbox{if}} |r-z_1|=|r-z_2| \wedge \mathrm{even}_N(z_1)) \\ \mathrm{closest}_N(r, z_1, z_2) &=& \mathrm{rectify}_N(r, z_2) & (\mathrel{\mbox{if}} |r-z_1|=|r-z_2| \wedge \mathrm{even}_N(z_2)) \\[1ex] \mathrm{rectify}_N(r, \pm \mathrm{limit}_N) &=& \pm \infty \\ \mathrm{rectify}_N(r, 0) &=& +0 \qquad (r \geq 0) \\ \mathrm{rectify}_N(r, 0) &=& -0 \qquad (r < 0) \\ \mathrm{rectify}_N(r, z) &=& z \\ \end{array}\end{split}\]

where:

\[\begin{split}\begin{array}{lll@{\qquad}l} \mathrm{exact}_N &=& \href{../syntax/values.html#syntax-float}{\mathit{f}\scriptstyle\kern-0.15emN} \cap \mathbb{Q} \\ \mathrm{limit}_N &=& 2^{2^{\href{../syntax/values.html#aux-expon}{\mathrm{expon}}(N)-1}} \\ \mathrm{candidate}_N &=& \mathrm{exact}_N \cup \{+\mathrm{limit}_N, -\mathrm{limit}_N\} \\ \mathrm{candidatepair}_N &=& \{ (z_1, z_2) \in \mathrm{candidate}_N^2 ~|~ z_1 < z_2 \wedge \forall z \in \mathrm{candidate}_N, z \leq z_1 \vee z \geq z_2\} \\[1ex] \mathrm{even}_N((d + m\cdot 2^{-M}) \cdot 2^e) &\Leftrightarrow& m \mathbin{\mathrm{mod}} 2 = 0 \\ \mathrm{even}_N(\pm \mathrm{limit}_N) &\Leftrightarrow& \mathrm{true} \\ \end{array}\end{split}\]
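The conversion can be sketched in Python for a binary format with `M` fraction bits and `E` exponent bits (\(M = 23, E = 8\) for \(N = 32\); \(M = 52, E = 11\) for \(N = 64\)), taking the real \(r\) as an exact `Fraction`. Rather than enumerating candidate pairs, the sketch scales \(r\) by the spacing of representable values in its binade and rounds that quotient half-to-even, which selects the same member of the candidate pair; a result equal to the limit number becomes an infinity. The name `float_n` and its parameters are assumptions; results are returned as Python floats, which represent every binary32 and binary64 value exactly.

```python
from fractions import Fraction
import math

def float_n(r: Fraction, M: int, E: int) -> float:
    """Round r to nearest, ties to even, in a format with M fraction and E exponent bits."""
    bias = 2 ** (E - 1) - 1
    emin = 1 - bias
    limit = Fraction(2) ** (bias + 1)       # 2^(2^(E-1)), the positive limit number
    if r == 0:
        return 0.0                          # float_N(0) = +0
    sign = -1.0 if r < 0 else 1.0
    a = abs(r)
    if a >= limit:
        return sign * math.inf
    # Exponent e with 2^e <= a < 2^(e+1), clamped to emin for subnormals.
    e = a.numerator.bit_length() - a.denominator.bit_length()
    if Fraction(2) ** e > a:
        e -= 1
    e = max(e, emin)
    ulp = Fraction(2) ** (e - M)            # spacing of representable values near a
    scaled = a / ulp
    q = math.floor(scaled)
    rem = scaled - q
    # Round up past the midpoint; on a tie, pick the even significand.
    if rem > Fraction(1, 2) or (rem == Fraction(1, 2) and q % 2 == 1):
        q += 1
    z = q * ulp
    if z == 0:
        return math.copysign(0.0, sign)     # rectify: zero keeps the sign of r
    if z >= limit:
        return sign * math.inf              # rectify: limit number becomes infinity
    return sign * float(z)                  # exact, since z is representable
```

For example, \(1 + 2^{-24}\) lies exactly halfway between two binary32 neighbours and rounds down to \(1\), whose significand is even.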

NaN Propagation

When the result of a floating-point operator other than \(\href{../exec/numerics.html#op-fneg}{\mathrm{fneg}}\), \(\href{../exec/numerics.html#op-fabs}{\mathrm{fabs}}\), or \(\href{../exec/numerics.html#op-fcopysign}{\mathrm{fcopysign}}\) is a NaN, then its sign is non-deterministic and the payload is computed as follows:

  • If the payload of all NaN inputs to the operator is canonical (including the case that there are no NaN inputs), then the payload of the output is canonical as well.

  • Otherwise the payload is picked non-deterministically among all arithmetic NaNs; that is, its most significant bit is \(1\) and all others are unspecified.

  • In the deterministic profile, however, a positive canonical NaN is reliably produced in the latter case.

The non-deterministic result is expressed by the following auxiliary function producing a set of allowed outputs from a set of inputs:

\[\begin{split}\begin{array}{llcl@{\qquad}l} & \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z^\ast\} &=& \{ + \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(\href{../syntax/values.html#aux-canon}{\mathrm{canon}}_N) \} \\ \def\mathdef2137#1{{^{\small{[!#1]}}}}\mathdef2137{\href{../appendix/profiles.html#profile-deterministic}{\mathrm{DET}}} & \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z^\ast\} &=& \{ + \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), - \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n) ~|~ n = \href{../syntax/values.html#aux-canon}{\mathrm{canon}}_N \} & (\mathrel{\mbox{if}} \{z^\ast\} \subseteq \{ + \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(\href{../syntax/values.html#aux-canon}{\mathrm{canon}}_N), - \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(\href{../syntax/values.html#aux-canon}{\mathrm{canon}}_N) \} \\ \def\mathdef2138#1{{^{\small{[!#1]}}}}\mathdef2138{\href{../appendix/profiles.html#profile-deterministic}{\mathrm{DET}}} & \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z^\ast\} &=& \{ + \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), - \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n) ~|~ n \geq \href{../syntax/values.html#aux-canon}{\mathrm{canon}}_N \} & (\mathrel{\mbox{if}} \{z^\ast\} \not\subseteq \{ + \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(\href{../syntax/values.html#aux-canon}{\mathrm{canon}}_N), - \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(\href{../syntax/values.html#aux-canon}{\mathrm{canon}}_N) \} \\ \end{array}\end{split}\]
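For \(N = 64\), the two profiles can be illustrated by a membership test on raw bit patterns, where \(\mathrm{canon}_{64} = 2^{51}\) is the payload with only its most significant bit set. The sketch counts only the NaN operands when deciding whether all inputs are canonical, following the first bullet above. Working on integers avoids relying on how Python floats carry NaN payloads. All names are hypothetical.

```python
import struct

EXP_MASK = 0x7FF0000000000000    # binary64 exponent field
FRAC_MASK = 0x000FFFFFFFFFFFFF   # binary64 payload (fraction) field
CANON = 1 << 51                  # canon_64 = 2^(M-1) with M = 52

def bits(x: float) -> int:
    """Raw binary64 bit pattern of a Python float."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def is_nan(b: int) -> bool:
    return (b & EXP_MASK) == EXP_MASK and (b & FRAC_MASK) != 0

def in_nans(inputs: list[int], out: int) -> bool:
    """Non-deterministic profile: may `out` be produced for these operand bit patterns?"""
    if not is_nan(out):
        return False
    if all((b & FRAC_MASK) == CANON for b in inputs if is_nan(b)):
        return (out & FRAC_MASK) == CANON      # canonical payload, either sign
    return (out & FRAC_MASK) >= CANON          # any arithmetic NaN, either sign

def nans_det() -> int:
    """Deterministic profile: always the positive canonical NaN."""
    return EXP_MASK | CANON
```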

\(\href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(z_1, z_2)\)

  • If either \(z_1\) or \(z_2\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z_1, z_2\}\).

  • Else if both \(z_1\) and \(z_2\) are infinities of opposite signs, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\).

  • Else if both \(z_1\) and \(z_2\) are infinities of equal sign, then return that infinity.

  • Else if either \(z_1\) or \(z_2\) is an infinity, then return that infinity.

  • Else if both \(z_1\) and \(z_2\) are zeroes of opposite sign, then return positive zero.

  • Else if both \(z_1\) and \(z_2\) are zeroes of equal sign, then return that zero.

  • Else if either \(z_1\) or \(z_2\) is a zero, then return the other operand.

  • Else if both \(z_1\) and \(z_2\) are values with the same magnitude but opposite signs, then return positive zero.

  • Else return the result of adding \(z_1\) and \(z_2\), rounded to the nearest representable value.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2\} \\ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_1\} \\ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(\pm \infty, \mp \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(\pm \infty, \pm \infty) &=& \pm \infty \\ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(z_1, \pm \infty) &=& \pm \infty \\ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(\pm \infty, z_2) &=& \pm \infty \\ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(\pm 0, \mp 0) &=& +0 \\ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(\pm 0, \pm 0) &=& \pm 0 \\ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(z_1, \pm 0) &=& z_1 \\ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(\pm 0, z_2) &=& z_2 \\ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(\pm q, \mp q) &=& +0 \\ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(z_1, z_2) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(z_1 + z_2) \\ \end{array}\end{split}\]
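A clause-by-clause Python sketch of \(\mathrm{fadd}_{64}\), assuming Python floats are IEEE 754 binary64 with round-to-nearest ties-to-even (true on mainstream platforms). NaN results are returned as `math.nan`; the payload and sign rules of \(\mathrm{nans}_N\) are not modelled. The name `fadd64` is hypothetical.

```python
import math

def fadd64(z1: float, z2: float) -> float:
    if math.isnan(z1) or math.isnan(z2):
        return math.nan                      # some element of nans_N{z1, z2}
    if math.isinf(z1) and math.isinf(z2):
        return z1 if z1 == z2 else math.nan  # equal signs: that infinity; opposite: nans_N{}
    if math.isinf(z1):
        return z1
    if math.isinf(z2):
        return z2
    if z1 == 0 and z2 == 0:
        # -0 + -0 = -0; every other combination of zeroes gives +0
        both_neg = math.copysign(1.0, z1) < 0 and math.copysign(1.0, z2) < 0
        return -0.0 if both_neg else 0.0
    if z1 == 0:
        return z2
    if z2 == 0:
        return z1
    if z1 == -z2:
        return 0.0                           # same magnitude, opposite signs: +0
    return z1 + z2                           # remaining finite case, rounded to nearest even
```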

\(\href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(z_1, z_2)\)

  • If either \(z_1\) or \(z_2\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z_1, z_2\}\).

  • Else if both \(z_1\) and \(z_2\) are infinities of equal signs, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\).

  • Else if both \(z_1\) and \(z_2\) are infinities of opposite sign, then return \(z_1\).

  • Else if \(z_1\) is an infinity, then return that infinity.

  • Else if \(z_2\) is an infinity, then return that infinity negated.

  • Else if both \(z_1\) and \(z_2\) are zeroes of equal sign, then return positive zero.

  • Else if both \(z_1\) and \(z_2\) are zeroes of opposite sign, then return \(z_1\).

  • Else if \(z_2\) is a zero, then return \(z_1\).

  • Else if \(z_1\) is a zero, then return \(z_2\) negated.

  • Else if both \(z_1\) and \(z_2\) are the same value, then return positive zero.

  • Else return the result of subtracting \(z_2\) from \(z_1\), rounded to the nearest representable value.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2\} \\ \href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_1\} \\ \href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(\pm \infty, \pm \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ \href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(\pm \infty, \mp \infty) &=& \pm \infty \\ \href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(z_1, \pm \infty) &=& \mp \infty \\ \href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(\pm \infty, z_2) &=& \pm \infty \\ \href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(\pm 0, \pm 0) &=& +0 \\ \href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(\pm 0, \mp 0) &=& \pm 0 \\ \href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(z_1, \pm 0) &=& z_1 \\ \href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(\pm 0, \pm q_2) &=& \mp q_2 \\ \href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(\pm q, \pm q) &=& +0 \\ \href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(z_1, z_2) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(z_1 - z_2) \\ \end{array}\end{split}\]

Note

Up to the non-determinism regarding NaNs, it always holds that \(\href{../exec/numerics.html#op-fsub}{\mathrm{fsub}}_N(z_1, z_2) = \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(z_1, \href{../exec/numerics.html#op-fneg}{\mathrm{fneg}}_N(z_2))\).
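This identity can be spot-checked on binary64 using Python's native float subtraction and negation, assuming they follow IEEE 754 (unary minus flips the sign bit). The helper `same` is hypothetical: it treats any two NaNs as equal and distinguishes \(+0\) from \(-0\), mirroring "up to the non-determinism regarding NaNs".

```python
import math

def same(x: float, y: float) -> bool:
    """Equality that matches NaN with NaN and distinguishes +0 from -0."""
    if math.isnan(x) or math.isnan(y):
        return math.isnan(x) and math.isnan(y)
    return x == y and math.copysign(1.0, x) == math.copysign(1.0, y)

SAMPLES = [0.0, -0.0, 1.0, -1.0, 0.1, 5e-324, 1.7976931348623157e308,
           math.inf, -math.inf, math.nan]

def check_fsub_identity() -> bool:
    # fsub(z1, z2) versus fadd(z1, fneg(z2)) on every pair of samples
    return all(same(a - b, a + (-b)) for a in SAMPLES for b in SAMPLES)
```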

\(\href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(z_1, z_2)\)

  • If either \(z_1\) or \(z_2\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z_1, z_2\}\).

  • Else if one of \(z_1\) and \(z_2\) is a zero and the other an infinity, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\).

  • Else if both \(z_1\) and \(z_2\) are infinities of equal sign, then return positive infinity.

  • Else if both \(z_1\) and \(z_2\) are infinities of opposite sign, then return negative infinity.

  • Else if either \(z_1\) or \(z_2\) is an infinity and the other a value with equal sign, then return positive infinity.

  • Else if either \(z_1\) or \(z_2\) is an infinity and the other a value with opposite sign, then return negative infinity.

  • Else if both \(z_1\) and \(z_2\) are zeroes of equal sign, then return positive zero.

  • Else if both \(z_1\) and \(z_2\) are zeroes of opposite sign, then return negative zero.

  • Else return the result of multiplying \(z_1\) and \(z_2\), rounded to the nearest representable value.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2\} \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_1\} \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm \infty, \pm 0) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm \infty, \mp 0) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm 0, \pm \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm 0, \mp \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm \infty, \pm \infty) &=& +\infty \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm \infty, \mp \infty) &=& -\infty \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm q_1, \pm \infty) &=& +\infty \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm q_1, \mp \infty) &=& -\infty \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm \infty, \pm q_2) &=& +\infty \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm \infty, \mp q_2) &=& -\infty \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm 0, \pm 0) &=& + 0 \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(\pm 0, \mp 0) &=& - 0 \\ \href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(z_1, z_2) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(z_1 \cdot z_2) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(z_1, z_2)\)

  • If either \(z_1\) or \(z_2\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z_1, z_2\}\).

  • Else if both \(z_1\) and \(z_2\) are infinities, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\).

  • Else if both \(z_1\) and \(z_2\) are zeroes, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\).

  • Else if \(z_1\) is an infinity and \(z_2\) a value with equal sign, then return positive infinity.

  • Else if \(z_1\) is an infinity and \(z_2\) a value with opposite sign, then return negative infinity.

  • Else if \(z_2\) is an infinity and \(z_1\) a value with equal sign, then return positive zero.

  • Else if \(z_2\) is an infinity and \(z_1\) a value with opposite sign, then return negative zero.

  • Else if \(z_1\) is a zero and \(z_2\) a value with equal sign, then return positive zero.

  • Else if \(z_1\) is a zero and \(z_2\) a value with opposite sign, then return negative zero.

  • Else if \(z_2\) is a zero and \(z_1\) a value with equal sign, then return positive infinity.

  • Else if \(z_2\) is a zero and \(z_1\) a value with opposite sign, then return negative infinity.

  • Else return the result of dividing \(z_1\) by \(z_2\), rounded to the nearest representable value.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2\} \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_1\} \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm \infty, \pm \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm \infty, \mp \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm 0, \pm 0) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm 0, \mp 0) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm \infty, \pm q_2) &=& +\infty \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm \infty, \mp q_2) &=& -\infty \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm q_1, \pm \infty) &=& +0 \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm q_1, \mp \infty) &=& -0 \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm 0, \pm q_2) &=& +0 \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm 0, \mp q_2) &=& -0 \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm q_1, \pm 0) &=& +\infty \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(\pm q_1, \mp 0) &=& -\infty \\ \href{../exec/numerics.html#op-fdiv}{\mathrm{fdiv}}_N(z_1, z_2) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(z_1 / z_2) \\ \end{array}\end{split}\]
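A Python sketch of \(\mathrm{fdiv}_{64}\). Python's float `/` raises `ZeroDivisionError` for a zero divisor, so the zero clauses must be spelled out explicitly. In every non-NaN special case, the result sign is positive for operands of equal sign and negative otherwise. NaN results are returned as `math.nan` without modelling \(\mathrm{nans}_N\), and the name `fdiv64` is hypothetical.

```python
import math

def fdiv64(z1: float, z2: float) -> float:
    if math.isnan(z1) or math.isnan(z2):
        return math.nan
    if math.isinf(z1) and math.isinf(z2):
        return math.nan                              # inf / inf
    if z1 == 0 and z2 == 0:
        return math.nan                              # 0 / 0
    negative = (math.copysign(1.0, z1) < 0) != (math.copysign(1.0, z2) < 0)
    if math.isinf(z1):
        return -math.inf if negative else math.inf   # inf / value
    if math.isinf(z2) or z1 == 0:
        return -0.0 if negative else 0.0             # value / inf and 0 / value
    if z2 == 0:
        return -math.inf if negative else math.inf   # value / 0
    return z1 / z2                                   # finite, nonzero: rounded to nearest even
```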

\(\href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(z_1, z_2, z_3)\)

The function \(\href{../exec/numerics.html#op-fma}{\mathrm{fma}}\) is the same as fusedMultiplyAdd defined by IEEE 754 (Section 5.4.1). It computes \((z_1 \cdot z_2) + z_3\) as if with unbounded range and precision, rounding only once for the final result.

  • If any of \(z_1\), \(z_2\), or \(z_3\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z_1, z_2, z_3\}\).

  • Else if either \(z_1\) or \(z_2\) is a zero and the other is an infinity, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\).

  • Else if both \(z_1\) and \(z_2\) are infinities of equal sign, and \(z_3\) is a negative infinity, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\).

  • Else if both \(z_1\) and \(z_2\) are infinities of opposite sign, and \(z_3\) is a positive infinity, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\).

  • Else if either \(z_1\) or \(z_2\) is an infinity and the other is a value of the same sign, and \(z_3\) is a negative infinity, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\).

  • Else if either \(z_1\) or \(z_2\) is an infinity and the other is a value of the opposite sign, and \(z_3\) is a positive infinity, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\).

  • Else if both \(z_1\) and \(z_2\) are zeroes of the same sign and \(z_3\) is a zero, then return positive zero.

  • Else if both \(z_1\) and \(z_2\) are zeroes of opposite sign and \(z_3\) is a positive zero, then return positive zero.

  • Else if both \(z_1\) and \(z_2\) are zeroes of opposite sign and \(z_3\) is a negative zero, then return negative zero.

  • Else return the result of multiplying \(z_1\) and \(z_2\), adding \(z_3\) to the intermediate product, and rounding the final result to the nearest representable value.

\[\begin{split}\begin{array}{@{}llcll} & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2, z_3) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2, z_3\} \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_3) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_1, z_3\} \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(z_1, z_2, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_1, z_2\} \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm \infty, \pm 0, z_3) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm \infty, \mp 0, z_3) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm \infty, \pm \infty, - \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm \infty, \mp \infty, + \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm q_1, \pm \infty, - \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm q_1, \mp \infty, + \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm \infty, \pm q_2, - \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm \infty, \mp q_2, + \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm 0, \pm 0, \mp 0) &=& + 0 \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm 0, \pm 0, \pm 0) &=& + 0 \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm 0, \mp 0, + 0) &=& + 0 \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(\pm 0, \mp 0, - 0) &=& - 0 \\ & \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(z_1, z_2, z_3) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(z_1 \cdot z_2 + z_3) \\ \end{array}\end{split}\]
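The single-rounding behaviour of the final clause can be sketched in Python for finite binary64 operands: `Fraction` holds \(z_1 \cdot z_2 + z_3\) exactly, and converting it to `float` rounds once, to nearest with ties to even (an assumption about CPython's exact integer division, which backs `Fraction.__float__`). The NaN, infinity, and signed-zero clauses are not modelled, and the name `fma64` is hypothetical.

```python
from fractions import Fraction
import math

def fma64(z1: float, z2: float, z3: float) -> float:
    """Finite operands only: exact z1*z2 + z3, rounded once."""
    exact = Fraction(z1) * Fraction(z2) + Fraction(z3)   # no intermediate rounding
    try:
        return float(exact)
    except OverflowError:
        # The exact value rounds beyond the largest finite binary64 number.
        return math.inf if exact > 0 else -math.inf
```

With \(a = 1 + 2^{-30}\) and \(c = -(1 + 2^{-29})\), the separately rounded `a * a + c` is \(0\), because the \(2^{-60}\) term of \(a^2\) is lost when the product is rounded, whereas the fused result keeps it and returns \(2^{-60}\).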

\(\href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(z_1, z_2)\)

  • If either \(z_1\) or \(z_2\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z_1, z_2\}\).

  • Else if either \(z_1\) or \(z_2\) is a negative infinity, then return negative infinity.

  • Else if either \(z_1\) or \(z_2\) is a positive infinity, then return the other value.

  • Else if both \(z_1\) and \(z_2\) are zeroes of opposite signs, then return negative zero.

  • Else return the smaller value of \(z_1\) and \(z_2\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2\} \\ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_1\} \\ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(+ \infty, z_2) &=& z_2 \\ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(- \infty, z_2) &=& - \infty \\ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(z_1, + \infty) &=& z_1 \\ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(z_1, - \infty) &=& - \infty \\ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(\pm 0, \mp 0) &=& -0 \\ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(z_1, z_2) &=& z_1 & (\mathrel{\mbox{if}} z_1 \leq z_2) \\ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(z_1, z_2) &=& z_2 & (\mathrel{\mbox{if}} z_2 \leq z_1) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(z_1, z_2)\)

  • If either \(z_1\) or \(z_2\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z_1, z_2\}\).

  • Else if either \(z_1\) or \(z_2\) is a positive infinity, then return positive infinity.

  • Else if either \(z_1\) or \(z_2\) is a negative infinity, then return the other value.

  • Else if both \(z_1\) and \(z_2\) are zeroes of opposite signs, then return positive zero.

  • Else return the larger value of \(z_1\) and \(z_2\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2\} \\ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_1\} \\ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(+ \infty, z_2) &=& + \infty \\ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(- \infty, z_2) &=& z_2 \\ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(z_1, + \infty) &=& + \infty \\ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(z_1, - \infty) &=& z_1 \\ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(\pm 0, \mp 0) &=& +0 \\ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(z_1, z_2) &=& z_1 & (\mathrel{\mbox{if}} z_1 \geq z_2) \\ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(z_1, z_2) &=& z_2 & (\mathrel{\mbox{if}} z_2 \geq z_1) \\ \end{array}\end{split}\]
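These two operators differ from naive comparison-based minimum and maximum in their handling of NaNs and signed zeroes. For instance, Python's built-in `min(0.0, -0.0)` returns `0.0`, and `min(1.0, nan)` returns `1.0`. The following sketch of \(\mathrm{fmin}_{64}\) and \(\mathrm{fmax}_{64}\) uses hypothetical names and returns `math.nan` without modelling \(\mathrm{nans}_N\). The infinity clauses need no special code here, because ordinary comparison already yields the same results.

```python
import math

def fmin64(z1: float, z2: float) -> float:
    if math.isnan(z1) or math.isnan(z2):
        return math.nan
    if z1 == 0 and z2 == 0:
        # equal zeroes return themselves; mixed signs give -0
        return z1 if math.copysign(1.0, z1) < 0 else z2
    return z1 if z1 <= z2 else z2

def fmax64(z1: float, z2: float) -> float:
    if math.isnan(z1) or math.isnan(z2):
        return math.nan
    if z1 == 0 and z2 == 0:
        # equal zeroes return themselves; mixed signs give +0
        return z1 if math.copysign(1.0, z1) > 0 else z2
    return z1 if z1 >= z2 else z2
```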

\(\href{../exec/numerics.html#op-fcopysign}{\mathrm{fcopysign}}_N(z_1, z_2)\)

  • If \(z_1\) and \(z_2\) have the same sign, then return \(z_1\).

  • Else return \(z_1\) with negated sign.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fcopysign}{\mathrm{fcopysign}}_N(\pm p_1, \pm p_2) &=& \pm p_1 \\ \href{../exec/numerics.html#op-fcopysign}{\mathrm{fcopysign}}_N(\pm p_1, \mp p_2) &=& \mp p_1 \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-fabs}{\mathrm{fabs}}_N(z)\)

  • If \(z\) is a NaN, then return \(z\) with positive sign.

  • Else if \(z\) is an infinity, then return positive infinity.

  • Else if \(z\) is a zero, then return positive zero.

  • Else if \(z\) is a positive value, then return \(z\).

  • Else return \(z\) negated.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fabs}{\mathrm{fabs}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& +\href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n) \\ \href{../exec/numerics.html#op-fabs}{\mathrm{fabs}}_N(\pm \infty) &=& +\infty \\ \href{../exec/numerics.html#op-fabs}{\mathrm{fabs}}_N(\pm 0) &=& +0 \\ \href{../exec/numerics.html#op-fabs}{\mathrm{fabs}}_N(\pm q) &=& +q \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-fneg}{\mathrm{fneg}}_N(z)\)

  • If \(z\) is a NaN, then return \(z\) with negated sign.

  • Else if \(z\) is an infinity, then return that infinity negated.

  • Else if \(z\) is a zero, then return that zero negated.

  • Else return \(z\) negated.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fneg}{\mathrm{fneg}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \mp \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n) \\ \href{../exec/numerics.html#op-fneg}{\mathrm{fneg}}_N(\pm \infty) &=& \mp \infty \\ \href{../exec/numerics.html#op-fneg}{\mathrm{fneg}}_N(\pm 0) &=& \mp 0 \\ \href{../exec/numerics.html#op-fneg}{\mathrm{fneg}}_N(\pm q) &=& \mp q \\ \end{array}\end{split}\]
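On binary64, \(\mathrm{fcopysign}\), \(\mathrm{fabs}\), and \(\mathrm{fneg}\) reduce to operations on the sign bit alone, so NaN payloads pass through unchanged, which is consistent with their exemption from the NaN propagation rules. The following sketch operates on raw 64-bit patterns; all names are hypothetical.

```python
import struct

SIGN = 1 << 63   # sign bit of a binary64 value

def to_bits(x: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def from_bits(b: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", b))[0]

def fcopysign_bits(b1: int, b2: int) -> int:
    # magnitude (exponent and payload) of b1, sign of b2
    return (b1 & ~SIGN) | (b2 & SIGN)

def fabs_bits(b: int) -> int:
    return b & ~SIGN     # clear the sign bit

def fneg_bits(b: int) -> int:
    return b ^ SIGN      # flip the sign bit
```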

\(\href{../exec/numerics.html#op-fsqrt}{\mathrm{fsqrt}}_N(z)\)

  • If \(z\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z\}\).

  • Else if \(z\) is negative infinity, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\).

  • Else if \(z\) is positive infinity, then return positive infinity.

  • Else if \(z\) is a zero, then return that zero.

  • Else if \(z\) has a negative sign, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\).

  • Else return the square root of \(z\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fsqrt}{\mathrm{fsqrt}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)\} \\ \href{../exec/numerics.html#op-fsqrt}{\mathrm{fsqrt}}_N(- \infty) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ \href{../exec/numerics.html#op-fsqrt}{\mathrm{fsqrt}}_N(+ \infty) &=& + \infty \\ \href{../exec/numerics.html#op-fsqrt}{\mathrm{fsqrt}}_N(\pm 0) &=& \pm 0 \\ \href{../exec/numerics.html#op-fsqrt}{\mathrm{fsqrt}}_N(- q) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} \\ \href{../exec/numerics.html#op-fsqrt}{\mathrm{fsqrt}}_N(+ q) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N\left(\sqrt{q}\right) \\ \end{array}\end{split}\]
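A Python sketch of \(\mathrm{fsqrt}_{64}\). Note that `math.sqrt` raises `ValueError` for negative arguments instead of producing a NaN, so that clause is explicit. The sketch assumes that `math.sqrt` is correctly rounded, as IEEE 754 requires of the platform square root, and that it preserves the sign of zero. NaN results are returned as `math.nan`, and the name `fsqrt64` is hypothetical.

```python
import math

def fsqrt64(z: float) -> float:
    if math.isnan(z) or z == -math.inf:
        return math.nan
    if z == math.inf or z == 0:
        return z                  # +inf stays +inf; +0 and -0 keep their sign
    if z < 0:
        return math.nan           # element of nans_N{}; math.sqrt would raise here
    return math.sqrt(z)
```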

\(\href{../exec/numerics.html#op-fceil}{\mathrm{fceil}}_N(z)\)

  • If \(z\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z\}\).

  • Else if \(z\) is an infinity, then return \(z\).

  • Else if \(z\) is a zero, then return \(z\).

  • Else if \(z\) is smaller than \(0\) but greater than \(-1\), then return negative zero.

  • Else return the smallest integral value that is not smaller than \(z\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fceil}{\mathrm{fceil}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)\} \\ \href{../exec/numerics.html#op-fceil}{\mathrm{fceil}}_N(\pm \infty) &=& \pm \infty \\ \href{../exec/numerics.html#op-fceil}{\mathrm{fceil}}_N(\pm 0) &=& \pm 0 \\ \href{../exec/numerics.html#op-fceil}{\mathrm{fceil}}_N(- q) &=& -0 & (\mathrel{\mbox{if}} -1 < -q < 0) \\ \href{../exec/numerics.html#op-fceil}{\mathrm{fceil}}_N(\pm q) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(i) & (\mathrel{\mbox{if}} \pm q \leq i < \pm q + 1) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-ffloor}{\mathrm{ffloor}}_N(z)\)

  • If \(z\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z\}\).

  • Else if \(z\) is an infinity, then return \(z\).

  • Else if \(z\) is a zero, then return \(z\).

  • Else if \(z\) is greater than \(0\) but smaller than \(1\), then return positive zero.

  • Else return the largest integral value that is not larger than \(z\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ffloor}{\mathrm{ffloor}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)\} \\ \href{../exec/numerics.html#op-ffloor}{\mathrm{ffloor}}_N(\pm \infty) &=& \pm \infty \\ \href{../exec/numerics.html#op-ffloor}{\mathrm{ffloor}}_N(\pm 0) &=& \pm 0 \\ \href{../exec/numerics.html#op-ffloor}{\mathrm{ffloor}}_N(+ q) &=& +0 & (\mathrel{\mbox{if}} 0 < +q < 1) \\ \href{../exec/numerics.html#op-ffloor}{\mathrm{ffloor}}_N(\pm q) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(i) & (\mathrel{\mbox{if}} \pm q - 1 < i \leq \pm q) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-ftrunc}{\mathrm{ftrunc}}_N(z)\)

  • If \(z\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z\}\).

  • Else if \(z\) is an infinity, then return \(z\).

  • Else if \(z\) is a zero, then return \(z\).

  • Else if \(z\) is greater than \(0\) but smaller than \(1\), then return positive zero.

  • Else if \(z\) is smaller than \(0\) but greater than \(-1\), then return negative zero.

  • Else return the integral value with the same sign as \(z\) and the largest magnitude that is not larger than the magnitude of \(z\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ftrunc}{\mathrm{ftrunc}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)\} \\ \href{../exec/numerics.html#op-ftrunc}{\mathrm{ftrunc}}_N(\pm \infty) &=& \pm \infty \\ \href{../exec/numerics.html#op-ftrunc}{\mathrm{ftrunc}}_N(\pm 0) &=& \pm 0 \\ \href{../exec/numerics.html#op-ftrunc}{\mathrm{ftrunc}}_N(+ q) &=& +0 & (\mathrel{\mbox{if}} 0 < +q < 1) \\ \href{../exec/numerics.html#op-ftrunc}{\mathrm{ftrunc}}_N(- q) &=& -0 & (\mathrel{\mbox{if}} -1 < -q < 0) \\ \href{../exec/numerics.html#op-ftrunc}{\mathrm{ftrunc}}_N(\pm q) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(\pm i) & (\mathrel{\mbox{if}} +q - 1 < i \leq +q) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-fnearest}{\mathrm{fnearest}}_N(z)\)

  • If \(z\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{z\}\).

  • Else if \(z\) is an infinity, then return \(z\).

  • Else if \(z\) is a zero, then return \(z\).

  • Else if \(z\) is greater than \(0\) but smaller than or equal to \(0.5\), then return positive zero.

  • Else if \(z\) is smaller than \(0\) but greater than or equal to \(-0.5\), then return negative zero.

  • Else return the integral value that is nearest to \(z\); if two values are equally near, return the even one.

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fnearest}{\mathrm{fnearest}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)\} \\ \href{../exec/numerics.html#op-fnearest}{\mathrm{fnearest}}_N(\pm \infty) &=& \pm \infty \\ \href{../exec/numerics.html#op-fnearest}{\mathrm{fnearest}}_N(\pm 0) &=& \pm 0 \\ \href{../exec/numerics.html#op-fnearest}{\mathrm{fnearest}}_N(+ q) &=& +0 & (\mathrel{\mbox{if}} 0 < +q \leq 0.5) \\ \href{../exec/numerics.html#op-fnearest}{\mathrm{fnearest}}_N(- q) &=& -0 & (\mathrel{\mbox{if}} -0.5 \leq -q < 0) \\ \href{../exec/numerics.html#op-fnearest}{\mathrm{fnearest}}_N(\pm q) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(\pm i) & (\mathrel{\mbox{if}} |i - q| < 0.5) \\ \href{../exec/numerics.html#op-fnearest}{\mathrm{fnearest}}_N(\pm q) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(\pm i) & (\mathrel{\mbox{if}} |i - q| = 0.5 \wedge i~\mbox{even}) \\ \end{array}\end{split}\]
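The four rounding operators share one shape. NaNs, infinities and zeros are handled first, and every remaining result carries the sign of z. This rule is what produces the signed zeros in the cases between −1 and 0 and between 0 and 1. A Python sketch for N = 64 follows. It relies on math.ceil, math.floor, math.trunc and the built-in round, which rounds halfway cases to the even integer. These functions return Python integers, which have no signed zero, so math.copysign restores the sign. As before, a NaN input simply yields a canonical NaN, which is one permitted choice.

```python
import math
from typing import Callable


def _round_like(to_int: Callable[[float], int], z: float) -> float:
    """Shared skeleton of fceil, ffloor, ftrunc and fnearest for N = 64."""
    if math.isnan(z):
        return math.nan                      # an element of nans_N{z}
    if math.isinf(z) or z == 0.0:
        return z                             # infinities and zeros are returned unchanged
    # The integral result always has the sign of z (e.g. fceil(-0.5) = -0),
    # so copy that sign onto the sign-less integer result.
    return math.copysign(float(to_int(z)), z)


def fceil(z: float) -> float:
    return _round_like(math.ceil, z)


def ffloor(z: float) -> float:
    return _round_like(math.floor, z)


def ftrunc(z: float) -> float:
    return _round_like(math.trunc, z)


def fnearest(z: float) -> float:
    return _round_like(round, z)             # round() breaks ties towards the even integer
```

For example, math.ceil(-0.5) evaluates to the integer 0. Without the final copysign, the negative zero required by fceil would be lost.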

\(\href{../exec/numerics.html#op-feq}{\mathrm{feq}}_N(z_1, z_2)\)

  • If either \(z_1\) or \(z_2\) is a NaN, then return \(0\).

  • Else if both \(z_1\) and \(z_2\) are zeroes, then return \(1\).

  • Else if both \(z_1\) and \(z_2\) are the same value, then return \(1\).

  • Else return \(0\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-feq}{\mathrm{feq}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& 0 \\ \href{../exec/numerics.html#op-feq}{\mathrm{feq}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& 0 \\ \href{../exec/numerics.html#op-feq}{\mathrm{feq}}_N(\pm 0, \mp 0) &=& 1 \\ \href{../exec/numerics.html#op-feq}{\mathrm{feq}}_N(z_1, z_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(z_1 = z_2) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-fne}{\mathrm{fne}}_N(z_1, z_2)\)

  • If either \(z_1\) or \(z_2\) is a NaN, then return \(1\).

  • Else if both \(z_1\) and \(z_2\) are zeroes, then return \(0\).

  • Else if both \(z_1\) and \(z_2\) are the same value, then return \(0\).

  • Else return \(1\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fne}{\mathrm{fne}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& 1 \\ \href{../exec/numerics.html#op-fne}{\mathrm{fne}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& 1 \\ \href{../exec/numerics.html#op-fne}{\mathrm{fne}}_N(\pm 0, \mp 0) &=& 0 \\ \href{../exec/numerics.html#op-fne}{\mathrm{fne}}_N(z_1, z_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(z_1 \neq z_2) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-flt}{\mathrm{flt}}_N(z_1, z_2)\)

  • If either \(z_1\) or \(z_2\) is a NaN, then return \(0\).

  • Else if \(z_1\) and \(z_2\) are the same value, then return \(0\).

  • Else if \(z_1\) is positive infinity, then return \(0\).

  • Else if \(z_1\) is negative infinity, then return \(1\).

  • Else if \(z_2\) is positive infinity, then return \(1\).

  • Else if \(z_2\) is negative infinity, then return \(0\).

  • Else if both \(z_1\) and \(z_2\) are zeroes, then return \(0\).

  • Else if \(z_1\) is smaller than \(z_2\), then return \(1\).

  • Else return \(0\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-flt}{\mathrm{flt}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& 0 \\ \href{../exec/numerics.html#op-flt}{\mathrm{flt}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& 0 \\ \href{../exec/numerics.html#op-flt}{\mathrm{flt}}_N(z, z) &=& 0 \\ \href{../exec/numerics.html#op-flt}{\mathrm{flt}}_N(+ \infty, z_2) &=& 0 \\ \href{../exec/numerics.html#op-flt}{\mathrm{flt}}_N(- \infty, z_2) &=& 1 \\ \href{../exec/numerics.html#op-flt}{\mathrm{flt}}_N(z_1, + \infty) &=& 1 \\ \href{../exec/numerics.html#op-flt}{\mathrm{flt}}_N(z_1, - \infty) &=& 0 \\ \href{../exec/numerics.html#op-flt}{\mathrm{flt}}_N(\pm 0, \mp 0) &=& 0 \\ \href{../exec/numerics.html#op-flt}{\mathrm{flt}}_N(z_1, z_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(z_1 < z_2) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-fgt}{\mathrm{fgt}}_N(z_1, z_2)\)

  • If either \(z_1\) or \(z_2\) is a NaN, then return \(0\).

  • Else if \(z_1\) and \(z_2\) are the same value, then return \(0\).

  • Else if \(z_1\) is positive infinity, then return \(1\).

  • Else if \(z_1\) is negative infinity, then return \(0\).

  • Else if \(z_2\) is positive infinity, then return \(0\).

  • Else if \(z_2\) is negative infinity, then return \(1\).

  • Else if both \(z_1\) and \(z_2\) are zeroes, then return \(0\).

  • Else if \(z_1\) is larger than \(z_2\), then return \(1\).

  • Else return \(0\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fgt}{\mathrm{fgt}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& 0 \\ \href{../exec/numerics.html#op-fgt}{\mathrm{fgt}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& 0 \\ \href{../exec/numerics.html#op-fgt}{\mathrm{fgt}}_N(z, z) &=& 0 \\ \href{../exec/numerics.html#op-fgt}{\mathrm{fgt}}_N(+ \infty, z_2) &=& 1 \\ \href{../exec/numerics.html#op-fgt}{\mathrm{fgt}}_N(- \infty, z_2) &=& 0 \\ \href{../exec/numerics.html#op-fgt}{\mathrm{fgt}}_N(z_1, + \infty) &=& 0 \\ \href{../exec/numerics.html#op-fgt}{\mathrm{fgt}}_N(z_1, - \infty) &=& 1 \\ \href{../exec/numerics.html#op-fgt}{\mathrm{fgt}}_N(\pm 0, \mp 0) &=& 0 \\ \href{../exec/numerics.html#op-fgt}{\mathrm{fgt}}_N(z_1, z_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(z_1 > z_2) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-fle}{\mathrm{fle}}_N(z_1, z_2)\)

  • If either \(z_1\) or \(z_2\) is a NaN, then return \(0\).

  • Else if \(z_1\) and \(z_2\) are the same value, then return \(1\).

  • Else if \(z_1\) is positive infinity, then return \(0\).

  • Else if \(z_1\) is negative infinity, then return \(1\).

  • Else if \(z_2\) is positive infinity, then return \(1\).

  • Else if \(z_2\) is negative infinity, then return \(0\).

  • Else if both \(z_1\) and \(z_2\) are zeroes, then return \(1\).

  • Else if \(z_1\) is smaller than or equal to \(z_2\), then return \(1\).

  • Else return \(0\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fle}{\mathrm{fle}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& 0 \\ \href{../exec/numerics.html#op-fle}{\mathrm{fle}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& 0 \\ \href{../exec/numerics.html#op-fle}{\mathrm{fle}}_N(z, z) &=& 1 \\ \href{../exec/numerics.html#op-fle}{\mathrm{fle}}_N(+ \infty, z_2) &=& 0 \\ \href{../exec/numerics.html#op-fle}{\mathrm{fle}}_N(- \infty, z_2) &=& 1 \\ \href{../exec/numerics.html#op-fle}{\mathrm{fle}}_N(z_1, + \infty) &=& 1 \\ \href{../exec/numerics.html#op-fle}{\mathrm{fle}}_N(z_1, - \infty) &=& 0 \\ \href{../exec/numerics.html#op-fle}{\mathrm{fle}}_N(\pm 0, \mp 0) &=& 1 \\ \href{../exec/numerics.html#op-fle}{\mathrm{fle}}_N(z_1, z_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(z_1 \leq z_2) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-fge}{\mathrm{fge}}_N(z_1, z_2)\)

  • If either \(z_1\) or \(z_2\) is a NaN, then return \(0\).

  • Else if \(z_1\) and \(z_2\) are the same value, then return \(1\).

  • Else if \(z_1\) is positive infinity, then return \(1\).

  • Else if \(z_1\) is negative infinity, then return \(0\).

  • Else if \(z_2\) is positive infinity, then return \(0\).

  • Else if \(z_2\) is negative infinity, then return \(1\).

  • Else if both \(z_1\) and \(z_2\) are zeroes, then return \(1\).

  • Else if \(z_1\) is larger than or equal to \(z_2\), then return \(1\).

  • Else return \(0\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fge}{\mathrm{fge}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& 0 \\ \href{../exec/numerics.html#op-fge}{\mathrm{fge}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& 0 \\ \href{../exec/numerics.html#op-fge}{\mathrm{fge}}_N(z, z) &=& 1 \\ \href{../exec/numerics.html#op-fge}{\mathrm{fge}}_N(+ \infty, z_2) &=& 1 \\ \href{../exec/numerics.html#op-fge}{\mathrm{fge}}_N(- \infty, z_2) &=& 0 \\ \href{../exec/numerics.html#op-fge}{\mathrm{fge}}_N(z_1, + \infty) &=& 0 \\ \href{../exec/numerics.html#op-fge}{\mathrm{fge}}_N(z_1, - \infty) &=& 1 \\ \href{../exec/numerics.html#op-fge}{\mathrm{fge}}_N(\pm 0, \mp 0) &=& 1 \\ \href{../exec/numerics.html#op-fge}{\mathrm{fge}}_N(z_1, z_2) &=& \href{../exec/numerics.html#aux-tobool}{\mathrm{bool}}(z_1 \geq z_2) \\ \end{array}\end{split}\]
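The comparison clauses spell out the IEEE 754 ordering, in which NaNs are unordered and the two zeros compare equal. A short Python sketch for N = 64 transcribes feq and flt clause by clause and checks them against Python's built-in float operators, which implement the IEEE 754 comparisons. The helper same_value is illustrative. It stands for the structural equality \(z_1 = z_2\), under which +0 and −0 are different values.

```python
import math
from itertools import product


def same_value(z1: float, z2: float) -> bool:
    """Structural equality for non-NaN floats: +0 and -0 are different values."""
    return z1 == z2 and math.copysign(1.0, z1) == math.copysign(1.0, z2)


def feq(z1: float, z2: float) -> int:
    if math.isnan(z1) or math.isnan(z2):
        return 0
    if z1 == 0.0 and z2 == 0.0:
        return 1                      # +0 and -0 compare equal
    return int(same_value(z1, z2))


def flt(z1: float, z2: float) -> int:
    if math.isnan(z1) or math.isnan(z2):
        return 0
    if same_value(z1, z2):
        return 0
    if z1 == math.inf:
        return 0
    if z1 == -math.inf:
        return 1
    if z2 == math.inf:
        return 1
    if z2 == -math.inf:
        return 0
    if z1 == 0.0 and z2 == 0.0:
        return 0                      # -0 is not less than +0
    return int(z1 < z2)


# The transcriptions agree with Python's IEEE 754 operators on these special values.
SAMPLES = [math.nan, -math.inf, -1.5, -0.0, 0.0, 2.0, math.inf]
for a, b in product(SAMPLES, repeat=2):
    assert feq(a, b) == int(a == b)
    assert flt(a, b) == int(a < b)
```

The remaining operators follow the same pattern. For instance, fgt(z_1, z_2) coincides with flt(z_2, z_1).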

\(\href{../exec/numerics.html#op-fpmin}{\mathrm{fpmin}}_N(z_1, z_2)\)

  • If \(z_2\) is less than \(z_1\) then return \(z_2\).

  • Else return \(z_1\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fpmin}{\mathrm{fpmin}}_N(z_1, z_2) &=& z_2 & (\mathrel{\mbox{if}} \href{../exec/numerics.html#op-flt}{\mathrm{flt}}_N(z_2, z_1) = 1) \\ \href{../exec/numerics.html#op-fpmin}{\mathrm{fpmin}}_N(z_1, z_2) &=& z_1 & (\mathrel{\mbox{otherwise}}) \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-fpmax}{\mathrm{fpmax}}_N(z_1, z_2)\)

  • If \(z_1\) is less than \(z_2\) then return \(z_2\).

  • Else return \(z_1\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-fpmax}{\mathrm{fpmax}}_N(z_1, z_2) &=& z_2 & (\mathrel{\mbox{if}} \href{../exec/numerics.html#op-flt}{\mathrm{flt}}_N(z_1, z_2) = 1) \\ \href{../exec/numerics.html#op-fpmax}{\mathrm{fpmax}}_N(z_1, z_2) &=& z_1 & (\mathrel{\mbox{otherwise}}) \end{array}\end{split}\]
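Unlike fmin and fmax, the pseudo operators never produce a new NaN. They only select an operand, so a NaN is returned only when it is \(z_1\). The Python sketch below relies on Python's < on floats following the IEEE 754 ordering, which matches flt. The selection rule is the same as that of C++ std::min and std::max, which return (b < a) ? b : a and (a < b) ? b : a, respectively.

```python
def fpmin(z1: float, z2: float) -> float:
    """Pseudo-minimum: z2 only if it is strictly less than z1, otherwise z1."""
    return z2 if z2 < z1 else z1


def fpmax(z1: float, z2: float) -> float:
    """Pseudo-maximum: z2 only if z1 is strictly less than z2, otherwise z1."""
    return z2 if z1 < z2 else z1
```

Because neither NaN comparisons nor comparisons between +0 and −0 are ever strictly less, both operators return \(z_1\) in those cases.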

Conversions

\(\href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{u}}}_{M,N}(i)\)

  • Return \(i\).

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{u}}}_{M,N}(i) &=& i \\ \end{array}\end{split}\]

Note

In the abstract syntax, unsigned extension just reinterprets the same value.

\(\href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{s}}}_{M,N}(i)\)

  • Let \(j\) be the signed interpretation of \(i\) of size \(M\).

  • Return the two’s complement of \(j\) relative to size \(N\).

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{s}}}_{M,N}(i) &=& \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N^{-1}(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_M(i)) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-wrap}{\mathrm{wrap}}_{M,N}(i)\)

  • Return \(i\) modulo \(2^N\).

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-wrap}{\mathrm{wrap}}_{M,N}(i) &=& i \mathbin{\mathrm{mod}} 2^N \\ \end{array}\end{split}\]
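Integers in the abstract syntax are unsigned, so sign extension goes through the signed interpretation and back again. The following is a minimal Python sketch using arbitrary-precision ints. The functions signed and signed_inv are illustrative stand-ins for \(\mathrm{signed}_M\) and \(\mathrm{signed}_N^{-1}\).

```python
def signed(n: int, i: int) -> int:
    """Two's complement reading of the n-bit unsigned value i (0 <= i < 2**n)."""
    return i - (1 << n) if i >= (1 << (n - 1)) else i


def signed_inv(n: int, j: int) -> int:
    """Inverse of signed: the n-bit unsigned value whose signed reading is j."""
    return j % (1 << n)


def extend_u(m: int, n: int, i: int) -> int:
    return i                                  # same value, wider type


def extend_s(m: int, n: int, i: int) -> int:
    return signed_inv(n, signed(m, i))        # replicates the sign bit of the M-bit input


def wrap(m: int, n: int, i: int) -> int:
    return i % (1 << n)                       # keeps the low N bits


print(hex(extend_s(8, 32, 0xFF)))             # 0xffffffff
print(hex(extend_u(8, 32, 0xFF)))             # 0xff
print(hex(wrap(64, 32, 0x1_0000_0005)))       # 0x5
```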

\(\href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{u}}}_{M,N}(z)\)

  • If \(z\) is a NaN, then the result is undefined.

  • Else if \(z\) is an infinity, then the result is undefined.

  • Else if \(z\) is a number and \(\href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(z)\) is a value within range of the target type, then return that value.

  • Else the result is undefined.

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{u}}}_{M,N}(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \{\} \\ \href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{u}}}_{M,N}(\pm \infty) &=& \{\} \\ \href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{u}}}_{M,N}(\pm q) &=& \href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(\pm q) & (\mathrel{\mbox{if}} -1 < \href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(\pm q) < 2^N) \\ \href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{u}}}_{M,N}(\pm q) &=& \{\} & (\mathrel{\mbox{otherwise}}) \\ \end{array}\end{split}\]

Note

This operator is partial. It is not defined for NaNs, infinities, or values for which the result is out of range.

\(\href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{s}}}_{M,N}(z)\)

  • If \(z\) is a NaN, then the result is undefined.

  • Else if \(z\) is an infinity, then the result is undefined.

  • Else if \(z\) is a number and \(\href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(z)\) is a value within range of the target type, then return that value.

  • Else the result is undefined.

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{s}}}_{M,N}(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \{\} \\ \href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{s}}}_{M,N}(\pm \infty) &=& \{\} \\ \href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{s}}}_{M,N}(\pm q) &=& \href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(\pm q) & (\mathrel{\mbox{if}} -2^{N-1} - 1 < \href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(\pm q) < 2^{N-1}) \\ \href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{s}}}_{M,N}(\pm q) &=& \{\} & (\mathrel{\mbox{otherwise}}) \\ \end{array}\end{split}\]

Note

This operator is partial. It is not defined for NaNs, infinities, or values for which the result is out of range.
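A minimal Python sketch of both partial conversions follows, in which None stands for the empty result set. The range conditions are copied from the clauses above, and trunc(z) is realised by math.trunc, which rounds towards zero. As in the clauses, the results are returned as mathematical integers, so trunc_s may return a negative number. The source size M only determines the float format and is therefore omitted.

```python
import math
from typing import Optional


def trunc_u(n: int, z: float) -> Optional[int]:
    """trunc^u: None for NaNs, infinities and out-of-range results."""
    if math.isnan(z) or math.isinf(z):
        return None
    t = math.trunc(z)
    return t if -1 < t < 2**n else None


def trunc_s(n: int, z: float) -> Optional[int]:
    """trunc^s: None for NaNs, infinities and out-of-range results."""
    if math.isnan(z) or math.isinf(z):
        return None
    t = math.trunc(z)
    return t if -(2**(n - 1)) - 1 < t < 2**(n - 1) else None
```

Note that trunc_u(32, -0.9) is defined: truncation towards zero yields 0, which lies in range.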

\(\href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_u}}_{M,N}(z)\)

  • If \(z\) is a NaN, then return \(0\).

  • Else if \(z\) is negative infinity, then return \(0\).

  • Else if \(z\) is positive infinity, then return \(2^N - 1\).

  • Else, return \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(\href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(z))\).

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_u}}_{M,N}(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& 0 \\ \href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_u}}_{M,N}(- \infty) &=& 0 \\ \href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_u}}_{M,N}(+ \infty) &=& 2^N - 1 \\ \href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_u}}_{M,N}(z) &=& \href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(\href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(z)) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_s}}_{M,N}(z)\)

  • If \(z\) is a NaN, then return \(0\).

  • Else if \(z\) is negative infinity, then return \(-2^{N-1}\).

  • Else if \(z\) is positive infinity, then return \(2^{N-1} - 1\).

  • Else, return the value whose signed interpretation is \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(\href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(z))\).

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_s}}_{M,N}(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& 0 \\ \href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_s}}_{M,N}(- \infty) &=& -2^{N-1} \\ \href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_s}}_{M,N}(+ \infty) &=& 2^{N-1}-1 \\ \href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_s}}_{M,N}(z) &=& \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N^{-1}(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(\href{../exec/numerics.html#aux-trunc}{\mathrm{trunc}}(z))) \\ \end{array}\end{split}\]
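The saturating variants are total. The following is a minimal Python sketch. The helpers sat_u and sat_s are illustrative clamps to the N-bit unsigned and signed ranges, modelled on the auxiliary \(\mathrm{sat\_u}_N\) and \(\mathrm{sat\_s}_N\). For readability, trunc_sat_s returns the signed interpretation of its result rather than the underlying unsigned value.

```python
import math


def sat_u(n: int, i: int) -> int:
    return max(0, min(i, 2**n - 1))


def sat_s(n: int, i: int) -> int:
    return max(-(2**(n - 1)), min(i, 2**(n - 1) - 1))


def trunc_sat_u(n: int, z: float) -> int:
    if math.isnan(z) or z == -math.inf:
        return 0
    if z == math.inf:
        return 2**n - 1
    return sat_u(n, math.trunc(z))


def trunc_sat_s(n: int, z: float) -> int:
    """Returns the signed interpretation of the N-bit result."""
    if math.isnan(z):
        return 0
    if z == -math.inf:
        return -(2**(n - 1))
    if z == math.inf:
        return 2**(n - 1) - 1
    return sat_s(n, math.trunc(z))
```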

\(\href{../exec/numerics.html#op-promote}{\mathrm{promote}}_{M,N}(z)\)

  • If \(z\) is a canonical NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\) (i.e., a canonical NaN of size \(N\)).

  • Else if \(z\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(1)\}\) (i.e., any arithmetic NaN of size \(N\)).

  • Else, return \(z\).

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-promote}{\mathrm{promote}}_{M,N}(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} & (\mathrel{\mbox{if}} n = \href{../syntax/values.html#aux-canon}{\mathrm{canon}}_N) \\ \href{../exec/numerics.html#op-promote}{\mathrm{promote}}_{M,N}(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{+ \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(1)\} & (\mathrel{\mbox{otherwise}}) \\ \href{../exec/numerics.html#op-promote}{\mathrm{promote}}_{M,N}(z) &=& z \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-demote}{\mathrm{demote}}_{M,N}(z)\)

  • If \(z\) is a canonical NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\}\) (i.e., a canonical NaN of size \(N\)).

  • Else if \(z\) is a NaN, then return an element of \(\href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(1)\}\) (i.e., any arithmetic NaN of size \(N\)).

  • Else if \(z\) is an infinity, then return that infinity.

  • Else if \(z\) is a zero, then return that zero.

  • Else, return \(\href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(z)\).

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-demote}{\mathrm{demote}}_{M,N}(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{\} & (\mathrel{\mbox{if}} n = \href{../syntax/values.html#aux-canon}{\mathrm{canon}}_N) \\ \href{../exec/numerics.html#op-demote}{\mathrm{demote}}_{M,N}(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-nans}{\mathrm{nans}}_N\{+ \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(1)\} & (\mathrel{\mbox{otherwise}}) \\ \href{../exec/numerics.html#op-demote}{\mathrm{demote}}_{M,N}(\pm \infty) &=& \pm \infty \\ \href{../exec/numerics.html#op-demote}{\mathrm{demote}}_{M,N}(\pm 0) &=& \pm 0 \\ \href{../exec/numerics.html#op-demote}{\mathrm{demote}}_{M,N}(\pm q) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(\pm q) \\ \end{array}\end{split}\]
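Promotion of a number is exact, because every binary32 value is representable in binary64. Only demotion needs rounding. A minimal Python sketch of demote for M = 64, N = 32 follows. It assumes that the struct module's "<f" format rounds to the nearest binary32 value with ties to even, which is the default rounding mode on common platforms. CPython raises OverflowError instead of producing an infinity when a finite value rounds out of the binary32 range. The sketch maps that case to an infinity of the same sign, which is what round-to-nearest yields. The binary32 result is returned as a Python float, which holds it exactly.

```python
import math
import struct


def demote(z: float) -> float:
    """Round a binary64 value to binary32 (held exactly in a Python float)."""
    if math.isnan(z):
        return math.nan                      # a canonical NaN is always permitted
    try:
        # Packing with "<f" rounds to binary32; unpacking widens it back exactly.
        return struct.unpack("<f", struct.pack("<f", z))[0]
    except OverflowError:
        # CPython refuses to pack finite values that round beyond the binary32
        # range; round-to-nearest turns them into an infinity of the same sign.
        return math.copysign(math.inf, z)


print(demote(0.1))     # 0.10000000149011612
print(demote(1e300))   # inf
```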

\(\href{../exec/numerics.html#op-convert}{\mathrm{convert}^{\mathsf{u}}}_{M,N}(i)\)

  • Return \(\href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(i)\).

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-convert}{\mathrm{convert}^{\mathsf{u}}}_{M,N}(i) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(i) \\ \end{array}\end{split}\]

\(\href{../exec/numerics.html#op-convert}{\mathrm{convert}^{\mathsf{s}}}_{M,N}(i)\)

  • Let \(j\) be the signed interpretation of \(i\) of size \(M\).

  • Return \(\href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(j)\).

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-convert}{\mathrm{convert}^{\mathsf{s}}}_{M,N}(i) &=& \href{../exec/numerics.html#aux-ieee}{\mathrm{float}}_N(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_M(i)) \\ \end{array}\end{split}\]
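A minimal Python sketch of both conversions for a binary64 target follows. It relies on Python's int-to-float conversion, which rounds to nearest with ties to even. The sketch deliberately does not cover a binary32 target, where converting through binary64 first could round twice and give a different result. The helper signed is illustrative.

```python
def signed(m: int, i: int) -> int:
    """Two's complement reading of the m-bit unsigned value i."""
    return i - (1 << m) if i >= (1 << (m - 1)) else i


def convert_u_f64(i: int) -> float:
    return float(i)                    # rounds to nearest, ties to even


def convert_s_f64(m: int, i: int) -> float:
    return float(signed(m, i))


print(convert_u_f64(2**53 + 1))        # 9007199254740992.0 (the tie goes to the even 2**53)
print(convert_s_f64(64, 2**64 - 1))    # -1.0
```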

\(\href{../exec/numerics.html#op-reinterpret}{\mathrm{reinterpret}}_{t_1,t_2}(c)\)

  • Let \(d^\ast\) be the bit sequence \(\href{../exec/numerics.html#aux-bits}{\mathrm{bits}}_{t_1}(c)\).

  • Return the constant \(c'\) for which \(\href{../exec/numerics.html#aux-bits}{\mathrm{bits}}_{t_2}(c') = d^\ast\).

\[\begin{split}\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-reinterpret}{\mathrm{reinterpret}}_{t_1,t_2}(c) &=& \href{../exec/numerics.html#aux-bits}{\mathrm{bits}}_{t_2}^{-1}(\href{../exec/numerics.html#aux-bits}{\mathrm{bits}}_{t_1}(c)) \\ \end{array}\end{split}\]
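In practice, reinterpretation is a change of view on the same bits. The following Python sketch uses the struct module for the i32/f32 and i64/f64 pairs. It assumes that f32 inputs are Python floats holding an exact binary32 value. NaN bit patterns may not survive the f32 path through a binary64 Python float exactly on every platform, so the sketch is only reliable for non-NaN values there.

```python
import struct


def reinterpret_f32_i32(z: float) -> int:
    return struct.unpack("<I", struct.pack("<f", z))[0]


def reinterpret_i32_f32(i: int) -> float:
    return struct.unpack("<f", struct.pack("<I", i))[0]


def reinterpret_f64_i64(z: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", z))[0]


def reinterpret_i64_f64(i: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", i))[0]


print(hex(reinterpret_f32_i32(1.0)))   # 0x3f800000
```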

\(\href{../exec/numerics.html#op-narrow}{\mathrm{narrow}^{\mathsf{s}}}_{M,N}(i)\)

  • Let \(j\) be the signed interpretation of \(i\) of size \(M\).

  • Return the value whose signed interpretation is \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(j)\).

\[\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-narrow}{\mathrm{narrow}^{\mathsf{s}}}_{M,N}(i) &=& \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_N^{-1}(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_s}}_N(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_M(i))) \end{array}\]

\(\href{../exec/numerics.html#op-narrow}{\mathrm{narrow}^{\mathsf{u}}}_{M,N}(i)\)

  • Let \(j\) be the signed interpretation of \(i\) of size \(M\).

  • Return \(\href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(j)\).

\[\begin{array}{lll@{\qquad}l} \href{../exec/numerics.html#op-narrow}{\mathrm{narrow}^{\mathsf{u}}}_{M,N}(i) &=& \href{../exec/numerics.html#aux-sat}{\mathrm{sat\_u}}_N(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_M(i)) \end{array}\]
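Both narrowing operators read the input as signed and differ only in the saturation target. The following is a minimal Python sketch with illustrative helpers modelled on \(\mathrm{signed}\), \(\mathrm{signed}^{-1}\), \(\mathrm{sat\_s}\) and \(\mathrm{sat\_u}\).

```python
def signed(m: int, i: int) -> int:
    """Two's complement reading of the m-bit unsigned value i."""
    return i - (1 << m) if i >= (1 << (m - 1)) else i


def signed_inv(n: int, j: int) -> int:
    """The n-bit unsigned value whose signed reading is j."""
    return j % (1 << n)


def sat_s(n: int, j: int) -> int:
    return max(-(1 << (n - 1)), min(j, (1 << (n - 1)) - 1))


def sat_u(n: int, j: int) -> int:
    return max(0, min(j, (1 << n) - 1))


def narrow_s(m: int, n: int, i: int) -> int:
    return signed_inv(n, sat_s(n, signed(m, i)))


def narrow_u(m: int, n: int, i: int) -> int:
    return sat_u(n, signed(m, i))
```

Because narrow^u also reads its input as signed, narrow_u(16, 8, 0xFFFF), whose input is −1, saturates to 0 rather than 0xFF.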

Relaxed Operations

The results of relaxed operators are implementation-dependent, because the set of possible results may depend on properties of the host environment, such as its hardware. Technically, their behaviour is controlled by a set of global parameters to the semantics that an implementation can instantiate in different ways. These choices are fixed, that is, parameters are constant during the execution of any given program.

Every such parameter is an index into a sequence of possible sets of results and must be instantiated to a defined index. In the deterministic profile, every parameter is prescribed to be 0. This behaviour is expressed by the following auxiliary function, where \(R\) is a global parameter selecting one of the allowed outcomes:

\[\begin{split}\begin{array}{@{}lcll} {}^{[!\href{../appendix/profiles.html#profile-deterministic}{\mathrm{DET}}]} & \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R)[ A_0, \dots, A_n ] = A_R \\ & \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R)[ A_0, \dots, A_n ] = A_0 \\ \end{array}\end{split}\]

Note

Each parameter can be thought of as inducing a family of operations that is fixed to one particular choice by an implementation. The fixed operation itself can still be non-deterministic or partial.

Implementations are expected to choose either the behaviour that is most efficient on the underlying hardware or the behaviour of the deterministic profile.

\(\href{../exec/numerics.html#op-frelaxed-madd}{\mathrm{frelaxed\_madd}}_N(z_1, z_2, z_3)\)

The implementation-specific behaviour of this operation is determined by the global parameter \(R_{\mathrm{fmadd}} \in \{0, 1\}\).

  • Return \(\href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmadd}})[ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(\href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(z_1, z_2), z_3), \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(z_1, z_2, z_3) ]\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-frelaxed-madd}{\mathrm{frelaxed\_madd}}_N(z_1, z_2, z_3) &=& \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmadd}})[ \href{../exec/numerics.html#op-fadd}{\mathrm{fadd}}_N(\href{../exec/numerics.html#op-fmul}{\mathrm{fmul}}_N(z_1, z_2), z_3), \href{../exec/numerics.html#op-fma}{\mathrm{fma}}_N(z_1, z_2, z_3) ] \\ \end{array}\end{split}\]

Note

Relaxed multiply-add allows for fused or unfused results, which leads to implementation-dependent rounding behaviour. In the deterministic profile, the unfused behaviour is used.
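The two permitted outcomes can differ. The following Python sketch assumes finite inputs and a finite result. It computes the fused result exactly with fractions.Fraction and rounds once on the conversion back to float. Python 3.13 and later also provide math.fma. The names relaxed, fma_exact and frelaxed_madd are illustrative, and the sign of an exactly zero fused result is not modelled.

```python
from fractions import Fraction
from typing import Sequence


def relaxed(r: int, choices: Sequence[float]) -> float:
    """Pick outcome number r; the deterministic profile fixes r = 0."""
    return choices[r]


def fma_exact(a: float, b: float, c: float) -> float:
    """Fused multiply-add for finite values: exact a*b + c, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def frelaxed_madd(r_fmadd: int, z1: float, z2: float, z3: float) -> float:
    unfused = z1 * z2 + z3             # fmul, then fadd: two roundings
    fused = fma_exact(z1, z2, z3)      # one rounding
    return relaxed(r_fmadd, [unfused, fused])


x = 1.0 + 2.0**-27                     # x*x = 1 + 2**-26 + 2**-54 exactly
c = -(1.0 + 2.0**-26)
print(frelaxed_madd(0, x, x, c))       # 0.0: the 2**-54 term is lost when x*x is rounded
print(frelaxed_madd(1, x, x, c) == 2.0**-54)  # True
```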

\(\href{../exec/numerics.html#op-frelaxed-nmadd}{\mathrm{frelaxed\_nmadd}}_N(z_1, z_2, z_3)\)

  • Return \(\href{../exec/numerics.html#op-frelaxed-madd}{\mathrm{frelaxed\_madd}}_N(-z_1, z_2, z_3)\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-frelaxed-nmadd}{\mathrm{frelaxed\_nmadd}}_N(z_1, z_2, z_3) &=& \href{../exec/numerics.html#op-frelaxed-madd}{\mathrm{frelaxed\_madd}}_N(-z_1, z_2, z_3) \\ \end{array}\end{split}\]

Note

This operation is implementation-dependent because \(\href{../exec/numerics.html#op-frelaxed-madd}{\mathrm{frelaxed\_madd}}\) is implementation-dependent.

\(\href{../exec/numerics.html#op-frelaxed-min}{\mathrm{frelaxed\_min}}_N(z_1, z_2)\)

The implementation-specific behaviour of this operation is determined by the global parameter \(R_{\mathrm{fmin}} \in \{0, 1, 2, 3\}\).

  • If \(z_1\) is a NaN with payload \(n\), then return \(\href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmin}})[ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(z_1, z_2), \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2, z_2 ]\).

  • Else if \(z_2\) is a NaN with payload \(n\), then return \(\href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmin}})[ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(z_1, z_2), z_1, \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_1 ]\).

  • Else if both \(z_1\) and \(z_2\) are zeroes of opposite sign, then return \(\href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmin}})[ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(z_1, z_2), z_1, z_2, -0 ]\).

  • Else return \(\href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(z_1, z_2)\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-frelaxed-min}{\mathrm{frelaxed\_min}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmin}})[ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2), \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2, z_2 ] \\ \href{../exec/numerics.html#op-frelaxed-min}{\mathrm{frelaxed\_min}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmin}})[ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)), z_1, \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_1 ] \\ \href{../exec/numerics.html#op-frelaxed-min}{\mathrm{frelaxed\_min}}_N(\pm 0, \mp 0) &=& \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmin}})[ \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(\pm 0, \mp 0), \pm 0, \mp 0, -0 ] \\ \href{../exec/numerics.html#op-frelaxed-min}{\mathrm{frelaxed\_min}}_N(z_1, z_2) &=& \href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}_N(z_1, z_2) & (\mathrel{\mbox{otherwise}}) \\ \end{array}\end{split}\]

Note

Relaxed minimum is implementation-dependent for NaNs and for zeroes with different signs. In the deterministic profile, it behaves like regular \(\href{../exec/numerics.html#op-fmin}{\mathrm{fmin}}\).

\(\href{../exec/numerics.html#op-frelaxed-max}{\mathrm{frelaxed\_max}}_N(z_1, z_2)\)

The implementation-specific behaviour of this operation is determined by the global parameter \(R_{\mathrm{fmax}} \in \{0, 1, 2, 3\}\).

  • If \(z_1\) is a NaN with payload \(n\), then return \(\href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmax}})[ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(z_1, z_2), \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2, z_2 ]\).

  • Else if \(z_2\) is a NaN with payload \(n\), then return \(\href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmax}})[ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(z_1, z_2), z_1, \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_1 ]\).

  • Else if both \(z_1\) and \(z_2\) are zeroes of opposite sign, then return \(\href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmax}})[ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(z_1, z_2), z_1, z_2, +0 ]\).

  • Else return \(\href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(z_1, z_2)\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-frelaxed-max}{\mathrm{frelaxed\_max}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2) &=& \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmax}})[ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(\pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2), \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_2, z_2 ] \\ \href{../exec/numerics.html#op-frelaxed-max}{\mathrm{frelaxed\_max}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)) &=& \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmax}})[ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(z_1, \pm \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n)), z_1, \href{../syntax/values.html#syntax-float}{\mathsf{nan}}(n), z_1 ] \\ \href{../exec/numerics.html#op-frelaxed-max}{\mathrm{frelaxed\_max}}_N(\pm 0, \mp 0) &=& \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{fmax}})[ \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(\pm 0, \mp 0), \pm 0, \mp 0, +0 ] \\ \href{../exec/numerics.html#op-frelaxed-max}{\mathrm{frelaxed\_max}}_N(z_1, z_2) &=& \href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}_N(z_1, z_2) & (\mathrel{\mbox{otherwise}}) \\ \end{array}\end{split}\]

Note

Relaxed maximum is implementation-dependent for NaNs and for zeroes with different signs. In the deterministic profile, it behaves like regular \(\href{../exec/numerics.html#op-fmax}{\mathrm{fmax}}\).
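A minimal Python sketch of the maximum case, assuming binary64 inputs, lists the candidates that \(\mathrm{relaxed}(R_{\mathrm{fmax}})\) chooses from instead of picking one. The names `fmax` and `frelaxed_max_candidates` are hypothetical. NaN payloads are not modeled, and \(\mathsf{nan}(n)\) is approximated by the input NaN with its sign cleared.

```python
import math

def fmax(z1, z2):
    # Simplified fmax on Python floats (binary64): any NaN input yields a NaN
    # (payload and sign not modeled), and +0 is treated as larger than -0.
    if math.isnan(z1) or math.isnan(z2):
        return math.nan
    if z1 == 0.0 and z2 == 0.0:
        pos = math.copysign(1.0, z1) > 0 or math.copysign(1.0, z2) > 0
        return 0.0 if pos else -0.0
    return max(z1, z2)

def frelaxed_max_candidates(z1, z2):
    """Return the list [A_0, A_1, A_2, A_3] that relaxed(R_fmax) indexes into,
    or a one-element list when the result is deterministic."""
    if math.isnan(z1):
        # nan(n) is modeled as the input NaN with its sign cleared.
        return [fmax(z1, z2), math.copysign(z1, 1.0), z2, z2]
    if math.isnan(z2):
        return [fmax(z1, z2), z1, math.copysign(z2, 1.0), z1]
    if z1 == 0.0 and z2 == 0.0 and math.copysign(1.0, z1) != math.copysign(1.0, z2):
        # Zeroes of opposite sign.
        return [fmax(z1, z2), z1, z2, 0.0]
    return [fmax(z1, z2)]
```

Each value of \(R_{\mathrm{fmax}}\) picks one entry of the returned list; under the assumption above, the deterministic profile always picks the first, which is plain \(\mathrm{fmax}\).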

\(\href{../exec/numerics.html#op-irelaxed-dot-mul}{\mathrm{irelaxed\_dot\_mul}}_{M,N}(i_1, i_2)\)

This is an auxiliary operator for the specification of \(\href{../syntax/instructions.html#syntax-instr-vec}{\mathsf{relaxed\_dot}}\) and \(\href{../syntax/instructions.html#syntax-instr-vec}{\mathsf{relaxed\_dot\_add}}\).

The implementation-specific behaviour of this operation is determined by the global parameter \(R_{\mathrm{idot}} \in \{0, 1\}\).

  • Return \(\href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{idot}})[ \href{../exec/numerics.html#op-imul}{\mathrm{imul}}_N(\href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{s}}}_{M,N}(i_1), \href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{s}}}_{M,N}(i_2)), \href{../exec/numerics.html#op-imul}{\mathrm{imul}}_N(\href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{s}}}_{M,N}(i_1), \href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{u}}}_{M,N}(i_2)) ]\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-irelaxed-dot-mul}{\mathrm{irelaxed\_dot\_mul}}_{M,N}(i_1, i_2) &=& \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{idot}})[ \href{../exec/numerics.html#op-imul}{\mathrm{imul}}_N(\href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{s}}}_{M,N}(i_1), \href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{s}}}_{M,N}(i_2)), \href{../exec/numerics.html#op-imul}{\mathrm{imul}}_N(\href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{s}}}_{M,N}(i_1), \href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{u}}}_{M,N}(i_2)) ] \\ \end{array}\end{split}\]

Note

Relaxed dot product is implementation-dependent when the second operand is negative in a signed interpretation. In the deterministic profile, it behaves like signed dot product.
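The two candidates can be illustrated with a small Python sketch that works on unsigned bit patterns, for instance with \(M = 8\) and \(N = 16\). The function names are hypothetical. The sketch assumes that \(R_{\mathrm{idot}}\) simply indexes the candidate list.

```python
def signed(m, i):
    # Signed interpretation of an m-bit pattern i.
    return i - (1 << m) if i >= (1 << (m - 1)) else i

def extend_s(m, n, i):
    # Sign-extend an m-bit pattern to an n-bit pattern.
    return signed(m, i) % (1 << n)

def extend_u(m, n, i):
    # Zero-extend: the pattern is unchanged because i < 2^m <= 2^n.
    return i

def imul(n, a, b):
    # Multiplication modulo 2^n.
    return (a * b) % (1 << n)

def irelaxed_dot_mul(m, n, i1, i2, r_idot):
    candidates = [
        imul(n, extend_s(m, n, i1), extend_s(m, n, i2)),  # signed x signed
        imul(n, extend_s(m, n, i1), extend_u(m, n, i2)),  # signed x unsigned
    ]
    return candidates[r_idot]
```

With \(i_1 = 2\) and \(i_2 = \mathtt{0xFF}\) (that is, \(-1\) when signed), `irelaxed_dot_mul(8, 16, 2, 0xFF, 0)` gives `0xFFFE` (\(-2\)), while \(R_{\mathrm{idot}} = 1\) gives `0x01FE` (\(510\)). When \(i_2\) is non-negative as a signed value, both candidates agree.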

\(\href{../exec/numerics.html#op-irelaxed-q15mulr-s}{\mathrm{irelaxed\_q15mulr\_s}}_N(i_1, i_2)\)

The implementation-specific behaviour of this operation is determined by the global parameter \(R_{\mathrm{iq15mulr}} \in \{0, 1\}\).

  • If both \(i_1\) and \(i_2\) equal \(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}^{-1}_N(-2^{N-1})\), then return \(\href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{iq15mulr}})[ 2^{N-1}-1, \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}^{-1}_N(-2^{N-1}) ]\).

  • Return \(\href{../exec/numerics.html#op-iq15mulrsat}{\mathrm{iq15mulrsat\_s}}(i_1, i_2)\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-irelaxed-q15mulr-s}{\mathrm{irelaxed\_q15mulr\_s}}_N(\href{../exec/numerics.html#aux-signed}{\mathrm{signed}}^{-1}_N(-2^{N-1}), \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}^{-1}_N(-2^{N-1})) &=& \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{iq15mulr}})[ 2^{N-1}-1, \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}^{-1}_N(-2^{N-1}) ] & \\ \href{../exec/numerics.html#op-irelaxed-q15mulr-s}{\mathrm{irelaxed\_q15mulr\_s}}_N(i_1, i_2) &=& \href{../exec/numerics.html#op-iq15mulrsat}{\mathrm{iq15mulrsat\_s}}(i_1, i_2) \end{array}\end{split}\]

Note

Relaxed Q15 multiplication is implementation-dependent when the result overflows. In the deterministic profile, it behaves like regular \(\href{../exec/numerics.html#op-iq15mulrsat}{\mathrm{iq15mulrsat\_s}}\).
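As a sketch for \(N = 16\), the only overflowing input pair is \(\mathtt{0x8000} \times \mathtt{0x8000}\), i.e. \(-1.0 \times -1.0\) in Q15. This Python model is an assumption, not a normative definition. It treats \(\mathrm{iq15mulrsat\_s}\) as: multiply the signed values, add \(2^{14}\) for rounding, shift arithmetically right by 15, then saturate. The function names are hypothetical.

```python
def to_signed16(i):
    # Signed interpretation of a 16-bit pattern.
    return i - 0x10000 if i & 0x8000 else i

def iq15mulrsat_s(i1, i2):
    # Assumed model: round by adding 2^14, arithmetic shift right by 15
    # (Python's >> floors on negative ints), then saturate to 16 bits.
    r = (to_signed16(i1) * to_signed16(i2) + (1 << 14)) >> 15
    r = max(-0x8000, min(0x7FFF, r))
    return r & 0xFFFF

def irelaxed_q15mulr_s(i1, i2, r_iq15mulr):
    # Only (-2^15) * (-2^15) overflows; that single case is relaxed.
    if i1 == 0x8000 and i2 == 0x8000:
        return [0x7FFF, 0x8000][r_iq15mulr]
    return iq15mulrsat_s(i1, i2)
```

Candidate \(0\) (`0x7FFF`) coincides with the saturated result, which is consistent with the deterministic profile behaving like \(\mathrm{iq15mulrsat\_s}\); candidate \(1\) wraps around to `0x8000`.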

\(\href{../exec/numerics.html#op-relaxed-trunc}{\mathrm{relaxed\_trunc}}^u_{M,N}(z)\)

The implementation-specific behaviour of this operation is determined by the global parameter \(R_{\mathrm{trunc\_u}} \in \{0, 1, 2, 3\}\).

  • If \(z\) is normal or subnormal and \(\href{../exec/numerics.html#op-trunc}{\mathrm{trunc}}(z)\) is non-negative and less than \(2^N\), then return \(\href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{u}}}_{M,N}(z)\).

  • Else, return \(\href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{trunc\_u}})[ \href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_u}}_{M,N}(z), 2^{N}-1, 2^{N}-2, 2^{N-1} ]\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-relaxed-trunc}{\mathrm{relaxed\_trunc}}^u_{M,N}(\pm q) &=& \href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{u}}}_{M,N}(\pm q) & (\mathrel{\mbox{if}} 0 \leq \href{../exec/numerics.html#op-trunc}{\mathrm{trunc}}(\pm q) < 2^N) \\ \href{../exec/numerics.html#op-relaxed-trunc}{\mathrm{relaxed\_trunc}}^u_{M,N}(z) &=& \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{trunc\_u}})[ \href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_u}}_{M,N}(z), 2^{N}-1, 2^{N}-2, 2^{N-1}] & (\mathrel{\mbox{otherwise}}) \\ \end{array}\end{split}\]

Note

Relaxed unsigned truncation is implementation-dependent for NaNs and out-of-range values. In the deterministic profile, it behaves like regular \(\href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_u}}\).
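A Python sketch of the unsigned case follows. It uses Python floats (binary64) for every source width \(M\), and treats zeroes as finite in-range inputs. It assumes that \(R_{\mathrm{trunc\_u}}\) indexes the candidate list. The function names are hypothetical.

```python
import math

def trunc_sat_u(n, z):
    # Saturating unsigned truncation: NaN -> 0, clamp to [0, 2^n - 1].
    if math.isnan(z):
        return 0
    if math.isinf(z):
        return (1 << n) - 1 if z > 0 else 0
    return max(0, min((1 << n) - 1, math.trunc(z)))

def relaxed_trunc_u(n, z, r_trunc_u):
    # Finite inputs whose truncation lies in [0, 2^n) are deterministic.
    if math.isfinite(z) and 0 <= math.trunc(z) < (1 << n):
        return math.trunc(z)
    # NaN or out of range: the R-th candidate is chosen.
    candidates = [trunc_sat_u(n, z), (1 << n) - 1, (1 << n) - 2, 1 << (n - 1)]
    return candidates[r_trunc_u]
```

For example, `relaxed_trunc_u(32, -1.0, 0)` saturates to \(0\), whereas \(R_{\mathrm{trunc\_u}} = 3\) yields \(2^{31}\).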

\(\href{../exec/numerics.html#op-relaxed-trunc}{\mathrm{relaxed\_trunc}}^s_{M,N}(z)\)

The implementation-specific behaviour of this operation is determined by the global parameter \(R_{\mathrm{trunc\_s}} \in \{0, 1\}\).

  • If \(z\) is normal or subnormal and \(\href{../exec/numerics.html#op-trunc}{\mathrm{trunc}}(z)\) is greater than or equal to \(-2^{N-1}\) and less than \(2^{N-1}\), then return \(\href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{s}}}_{M,N}(z)\).

  • Else, return \(\href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{trunc\_s}})[ \href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_s}}_{M,N}(z), \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}^{-1}_N(-2^{N-1}) ]\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-relaxed-trunc}{\mathrm{relaxed\_trunc}}^s_{M,N}(\pm q) &=& \href{../exec/numerics.html#op-trunc}{\mathrm{trunc}^{\mathsf{s}}}_{M,N}(\pm q) & (\mathrel{\mbox{if}} -2^{N-1} \leq \href{../exec/numerics.html#op-trunc}{\mathrm{trunc}}(\pm q) < 2^{N-1}) \\ \href{../exec/numerics.html#op-relaxed-trunc}{\mathrm{relaxed\_trunc}}^s_{M,N}(z) &=& \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{trunc\_s}})[ \href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_s}}_{M,N}(z), \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}^{-1}_N(-2^{N-1})] & (\mathrel{\mbox{otherwise}}) \\ \end{array}\end{split}\]

Note

Relaxed signed truncation is implementation-dependent for NaNs and out-of-range values. In the deterministic profile, it behaves like regular \(\href{../exec/numerics.html#op-trunc-sat}{\mathrm{trunc\_sat\_s}}\).
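The signed case has only two candidates. In this Python sketch the results are returned as signed integers rather than as \(N\)-bit patterns, so \(\mathrm{signed}^{-1}_N(-2^{N-1})\) appears as \(-2^{N-1}\). Inputs are binary64 floats, and the function names are hypothetical.

```python
import math

def trunc_sat_s(n, z):
    # Saturating signed truncation: NaN -> 0, clamp to [-2^(n-1), 2^(n-1) - 1].
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if math.isnan(z):
        return 0
    if math.isinf(z):
        return hi if z > 0 else lo
    return max(lo, min(hi, math.trunc(z)))

def relaxed_trunc_s(n, z, r_trunc_s):
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    # Finite inputs whose truncation fits in n signed bits are deterministic.
    if math.isfinite(z) and lo <= math.trunc(z) <= hi:
        return math.trunc(z)
    # NaN or out of range: either saturate or return the minimum value.
    return [trunc_sat_s(n, z), lo][r_trunc_s]
```

With \(R_{\mathrm{trunc\_s}} = 1\), every NaN or out-of-range input maps to \(-2^{N-1}\), including large positive values such as `1e10`.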

\(\href{../exec/numerics.html#op-ivrelaxed-swizzle}{\mathrm{ivrelaxed\_swizzle}}(i^n, j^n)\)

The implementation-specific behaviour of this operation is determined by the global parameter \(R_{\mathrm{swizzle}} \in \{0, 1\}\).

  • For each \(j_k\) in \(j^n\), let \(r_k\) be the value \(\href{../exec/numerics.html#op-ivrelaxed-swizzle-lane}{\mathrm{ivrelaxed\_swizzle\_lane}}(i^n, j_k)\).

  • Let \(r^n\) be the concatenation of all \(r_k\).

  • Return \(r^n\).

\[\begin{split}\begin{array}{@{}lcl} \href{../exec/numerics.html#op-ivrelaxed-swizzle}{\mathrm{ivrelaxed\_swizzle}}(i^n, j^n) &=& \href{../exec/numerics.html#op-ivrelaxed-swizzle-lane}{\mathrm{ivrelaxed\_swizzle\_lane}}(i^n, j)^n \\ \end{array}\end{split}\]

where:

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-ivrelaxed-swizzle-lane}{\mathrm{ivrelaxed\_swizzle\_lane}}(i^n, j) &=& i[j] & (\mathrel{\mbox{if}} j < 16) \\ \href{../exec/numerics.html#op-ivrelaxed-swizzle-lane}{\mathrm{ivrelaxed\_swizzle\_lane}}(i^n, j) &=& 0 & (\mathrel{\mbox{if}} \href{../exec/numerics.html#aux-signed}{\mathrm{signed}}_8(j) < 0) \\ \href{../exec/numerics.html#op-ivrelaxed-swizzle-lane}{\mathrm{ivrelaxed\_swizzle\_lane}}(i^n, j) &=& \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{swizzle}})[ 0, i^n[j \mathbin{\mathrm{mod}} n] ] & (\mathrel{\mbox{otherwise}}) \\ \end{array}\end{split}\]

Note

Relaxed swizzle is implementation-dependent if the signed interpretation of any of the 8-bit indices in \(j^n\) is larger than or equal to 16. In the deterministic profile, it behaves like regular \(\href{../syntax/instructions.html#syntax-instr-vec}{\mathsf{swizzle}}\).
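A Python sketch of the lane rule follows. Lanes and indices are modeled as lists of unsigned byte values, and \(R_{\mathrm{swizzle}}\) is assumed to index the candidate list. The function names are hypothetical.

```python
def ivrelaxed_swizzle_lane(i, j, r_swizzle):
    # i: list of byte lanes; j: an index byte in 0..255.
    if j < 16:
        return i[j]
    if j >= 0x80:  # signed_8(j) < 0
        return 0
    # 16 <= j <= 127: either 0 or the lane selected modulo the lane count.
    return [0, i[j % len(i)]][r_swizzle]

def ivrelaxed_swizzle(i, js, r_swizzle):
    # Apply the lane rule to every index and collect the results.
    return [ivrelaxed_swizzle_lane(i, j, r_swizzle) for j in js]
```

With lanes `100..115`, the indices `[0, 15, 16, 0x80, 0x11]` give `[100, 115, 0, 0, 0]` for \(R_{\mathrm{swizzle}} = 0\) and `[100, 115, 100, 0, 101]` for \(R_{\mathrm{swizzle}} = 1\). Only the indices 16 and `0x11` differ between the two.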

\(\href{../exec/numerics.html#op-irelaxed-laneselect}{\mathrm{irelaxed\_laneselect}}_N(i_1, i_2, i_3)\)

The implementation-specific behaviour of this operation is determined by the global parameter \(R_{\mathrm{laneselect}} \in \{0, 1\}\).

  • Let \(i'_3\) be \(0\) if \(i_3\) is less than \(2^{N-1}\), and \(2^N-1\) otherwise.

  • Let \(i''_3\) be \(\href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{laneselect}})[i_3, i'_3]\).

  • Return \(\href{../exec/numerics.html#op-ibitselect}{\mathrm{ibitselect}}_N(i_1, i_2, i''_3)\).

\[\begin{split}\begin{array}{@{}lcll} \href{../exec/numerics.html#op-irelaxed-laneselect}{\mathrm{irelaxed\_laneselect}}_N(i_1, i_2, i_3) &=& \href{../exec/numerics.html#op-ibitselect}{\mathrm{ibitselect}}_N(i_1, i_2, \href{../exec/numerics.html#aux-relaxed}{\mathrm{relaxed}}(R_{\mathrm{laneselect}})[ i_3, \href{../exec/numerics.html#op-ext}{\mathrm{extend}^{\mathsf{s}}}_{1,N}(\href{../exec/numerics.html#op-ishr}{\mathrm{ishr\_u}}_N(i_3, N-1)) ]) \\ \end{array}\end{split}\]

Note

Relaxed lane selection is non-deterministic when the mask mixes set and cleared bits, since the value of the high bit may or may not be expanded to all bits. In the deterministic profile, it behaves like \(\href{../exec/numerics.html#op-ibitselect}{\mathrm{ibitselect}}\).
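The effect of the mask choice can be seen in a Python sketch. Values are modeled as \(N\)-bit unsigned integers, and \(R_{\mathrm{laneselect}}\) is assumed to index the candidate list. The function names are hypothetical.

```python
def ibitselect(n, i1, i2, i3):
    # Take bits from i1 where i3 has 1s and from i2 where i3 has 0s.
    mask = (1 << n) - 1
    return (i1 & i3) | (i2 & ~i3 & mask)

def irelaxed_laneselect(n, i1, i2, i3, r_laneselect):
    # extend_s_{1,N}(ishr_u_N(i3, N-1)): replicate the top bit of i3.
    expanded = (1 << n) - 1 if i3 >> (n - 1) else 0
    return ibitselect(n, i1, i2, [i3, expanded][r_laneselect])
```

For \(N = 8\), \(i_1 = \mathtt{0xAA}\), \(i_2 = \mathtt{0x55}\) and the mixed mask \(i_3 = \mathtt{0x80}\), candidate \(0\) gives `0xD5` (bitwise selection) while candidate \(1\) gives `0xAA` (the whole lane from \(i_1\)). For the uniform mask `0xFF`, both give `0xAA`.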