Parsed Expressions and Tokens

Microsoft Excel uses a modified reverse-Polish technique to store parsed expressions. A parsed expression contains a sequence of parse tokens, each of which is an operand token, an operator token, or a control token. Operand tokens push operands onto the stack. Operator tokens perform arithmetic operations on the operands on the stack. Control tokens assist in formula evaluation by describing properties of the formula.

A token consists of two parts: a token type and a token value. A token type is called a ptg (parse thing) in Microsoft Excel. A ptg is 1 byte long and has a value from 01h to 7Fh. The ptgs above 7Fh are reserved.

The ptg specifies only what kind of information a token contains. The information itself is stored in the token value, which immediately follows the ptg. Some tokens consist of only a ptg, without an accompanying token value. For example, to specify an addition operation, only the token type ptgAdd is required. But to specify an integer operand, you must specify both ptgInt and the token value, which is an integer.

For example, assume that the formula =5+6 is in cell A1. The parsed expression for this formula consists of three tokens: two integer operand tokens (<token 1> and <token 2>) and an operator token (<token 3>), as shown in the following table.

<token 1>       <token 2>       <token 3>
ptgInt 0005h    ptgInt 0006h    ptgAdd


Notice that each ptgInt is immediately followed by the integer token value.
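
The token sequence above can be sketched in Python. This is a minimal illustration, not Excel's own evaluator: it handles only ptgInt, ptgAdd, and ptgMul, using the token type values 1Eh, 03h, and 05h given in this section. The helper names encode_int and evaluate are made up for the example, and the 2-byte, low-byte-first integer layout is an assumption based on the FORMULA record dump.

```python
import struct

# Token types taken from the text: ptgAdd = 03h, ptgMul = 05h, ptgInt = 1Eh.
PTG_ADD = 0x03
PTG_MUL = 0x05
PTG_INT = 0x1E

def encode_int(value: int) -> bytes:
    """Return a ptgInt token: the ptg byte followed by its token value."""
    # Assumption: the value is an unsigned 16-bit integer, low byte first.
    return bytes([PTG_INT]) + struct.pack("<H", value)

def evaluate(expr: bytes) -> int:
    """Evaluate a parsed expression made only of ptgInt, ptgAdd, ptgMul."""
    stack = []
    pos = 0
    while pos < len(expr):
        ptg = expr[pos]
        pos += 1
        if ptg == PTG_INT:
            # Operand token: read the 2-byte token value and push it.
            (value,) = struct.unpack_from("<H", expr, pos)
            stack.append(value)
            pos += 2
        elif ptg in (PTG_ADD, PTG_MUL):
            # Operator token: no token value; pop two operands, push result.
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if ptg == PTG_ADD else left * right)
        else:
            raise ValueError(f"unsupported ptg {ptg:02X}h")
    if len(stack) != 1:
        raise ValueError("malformed expression")
    return stack[0]

expr = encode_int(5) + encode_int(6) + bytes([PTG_ADD])
print(expr.hex(" "))   # 1e 05 00 1e 06 00 03
print(evaluate(expr))  # 11
```

Replacing the final byte with PTG_MUL gives the expression for =5*6, which this sketch evaluates to 30.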

If you type this formula in cell A1 and then examine the FORMULA record (using the BiffView utility), you'll see the following:

00000  06 00 1d 00 00 00 00 00 0f 00 00 00 00 00 00 00
00010  26 40 00 00 00 00 e0 fc 07 00 1e 05 00 1e 06 00
00020  03 -- -- -- -- -- -- -- -- -- -- -- -- -- -- --

The first 26 bytes of the hex dump contain the record number, record length, rw, col, ixfe, num, grbit, chn, and cce fields. The remaining 7 bytes contain the two ptgInt (1Eh) tokens, whose token values represent the integers 5 and 6 (0005h and 0006h, stored low byte first as 05 00 and 06 00), and the ptgAdd (03h) token. If the formula were changed to =5*6, the third token would be ptgMul (05h). For more information about the FORMULA record, see "FORMULA" on page 317.
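
The split between the 26 header bytes and the 7 token bytes can be checked with a short Python sketch. The field names come from the text, but the field widths in the format string are an assumption chosen so that they add up to 26 bytes; see the FORMULA record description for the authoritative layout.

```python
import struct

# The 33 bytes of the FORMULA record, copied from the hex dump.
RECORD = bytes.fromhex(
    "06 00 1d 00 00 00 00 00 0f 00 00 00 00 00 00 00 "
    "26 40 00 00 00 00 e0 fc 07 00 1e 05 00 1e 06 00 "
    "03"
)

# Assumed little-endian widths: five 2-byte words (record number, length,
# rw, col, ixfe), an 8-byte float (num), a 2-byte grbit, a 4-byte chn,
# and a 2-byte cce.
HEADER = struct.Struct("<HHHHHdHIH")
NAMES = ("record", "length", "rw", "col", "ixfe", "num", "grbit", "chn", "cce")

fields = dict(zip(NAMES, HEADER.unpack_from(RECORD, 0)))

# The parsed expression starts right after the header and is cce bytes long.
tokens = RECORD[HEADER.size:HEADER.size + fields["cce"]]

print(HEADER.size)                        # 26
print(fields["length"], fields["cce"])    # 29 7
print(fields["num"])                      # 11.0
print(tokens.hex(" "))                    # 1e 05 00 1e 06 00 03
```

Under these assumptions the record length (29) plus the 4 bytes of record number and length accounts for all 33 bytes, and num decodes to 11.0, which agrees with the value of 5+6.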

In many cases, the token value consists of a structure of two or more fields. In these cases, offset 0 (zero) is assumed to be the first byte of the token value, that is, the byte immediately following the token type.
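
The offset convention can be sketched as follows. The helper read_token_value is a hypothetical name, and the two-field layout in the second half is invented purely for illustration; it does not describe any documented token.

```python
import struct

def read_token_value(expr: bytes, ptg_pos: int, fmt: str) -> tuple:
    """Unpack the token value of the token whose ptg sits at ptg_pos."""
    value_start = ptg_pos + 1  # offset 0 is the first byte after the ptg
    return struct.unpack_from(fmt, expr, value_start)

# The parsed expression for =5+6; the second ptgInt is at position 3,
# so offset 0 of its token value is byte 4 of the expression.
expr = bytes.fromhex("1e 05 00 1e 06 00 03")
print(read_token_value(expr, 3, "<H"))  # (6,)

# Hypothetical two-field token value, for illustration only:
# a 2-byte field at offset 0 and a 1-byte field at offset 2.
PLACEHOLDER_PTG = 0x7F  # stand-in byte, not a documented token type
demo = bytes([PLACEHOLDER_PTG]) + struct.pack("<HB", 0x1234, 0x56)
first, second = read_token_value(demo, 0, "<HB")
print(hex(first), hex(second))  # 0x1234 0x56
```

Measuring field offsets from the byte after the ptg means a structure description stays the same wherever the token appears in the expression.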