Unicode Strings in BIFF8

Microsoft Excel 97 uses Unicode strings. In BIFF8, these strings can be stored in a compressed format. Each string contains the following fields:

Offset	Name	Size	Contents

0	cch	2	Count of characters in the string (notice that this is the number of characters, NOT the number of bytes)
2	grbit	1	Option flags
3	rgb	var	Array of string characters and formatting runs

Unicode strings usually require 2 bytes of storage per character. Because most strings in USA/English Microsoft Excel have 00h as the high byte of every Unicode character, these strings can be saved in a compressed Unicode format that stores only the low bytes. The grbit field specifies the compression encoding as shown in the following table.

Bits	Mask	Name	Contents

0	01h	fHighByte	= 0 if all the characters in the string have a high byte of 00h and only the low bytes are saved in the file (compressed); = 1 if at least one character in the string has a nonzero high byte and therefore all characters in the string are saved as double-byte characters (not compressed)
1	02h	(Reserved)	Reserved; must be 0 (zero)
2	04h	fExtSt	Extended string follows (Far East versions, see text)
3	08h	fRichSt	Rich string follows
7 – 4	F0h	(Reserved)	Reserved; must be 0 (zero)
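
The masks in the table above can be applied as in the following minimal Python sketch. The function name decode_grbit and the constant names are hypothetical, and rejecting a byte with reserved bits set is a design choice of this sketch (the table only says those bits must be 0), not documented reader behavior.

```python
# Bit masks copied from the grbit table.
F_HIGH_BYTE = 0x01    # bit 0
F_EXT_ST = 0x04       # bit 2
F_RICH_ST = 0x08      # bit 3
RESERVED_MASK = 0xF2  # bit 1 and bits 7-4, which must be 0


def decode_grbit(grbit):
    """Return the three documented flags as a dict of booleans.

    Hypothetical helper: raises ValueError when a reserved bit is set.
    """
    if grbit & RESERVED_MASK:
        raise ValueError("reserved grbit bits set: %02Xh" % grbit)
    return {
        "fHighByte": bool(grbit & F_HIGH_BYTE),
        "fExtSt": bool(grbit & F_EXT_ST),
        "fRichSt": bool(grbit & F_RICH_ST),
    }


flags = decode_grbit(0x08)
print(flags["fRichSt"])    # True
print(flags["fHighByte"])  # False
```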

An unformatted string with all high bytes = 00h has grbit = 00h. Because fRichSt is not set, the string has no formatting runs, and the runs count field does not exist.

An unformatted string that has at least one character with a nonzero high byte has grbit = 01h.

A formatted string (one that has several different character formats applied) with all high bytes = 00h has grbit = 08h.
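
As an illustration of the two unformatted cases, the following Python sketch chooses grbit = 00h or 01h and writes cch, grbit, and the characters. It is a sketch under stated assumptions, not a definitive writer: the name encode_unformatted is hypothetical, multi-byte fields are assumed to be little-endian, and characters outside the 16-bit range are rejected so that cch always equals the number of stored characters.

```python
import struct


def encode_unformatted(text):
    """Encode text as cch + grbit + characters (no runs, no ExtRst).

    Hypothetical helper; assumes little-endian fields and 16-bit characters.
    """
    if len(text) > 0xFFFF:
        raise ValueError("cch is a 2-byte field; string too long")
    if any(ord(ch) > 0xFFFF for ch in text):
        raise ValueError("characters above FFFFh are not handled here")

    if all(ord(ch) <= 0xFF for ch in text):
        # Every high byte is 00h: store only the low bytes (compressed).
        grbit = 0x00
        body = bytes(ord(ch) for ch in text)
    else:
        # At least one nonzero high byte: store every character as 2 bytes.
        grbit = 0x01
        body = text.encode("utf-16-le")

    return struct.pack("<HB", len(text), grbit) + body


print(encode_unformatted("ink").hex(" "))       # 03 00 00 69 6e 6b
print(encode_unformatted("\u20ac1").hex(" "))   # 02 00 01 ac 20 31 00
```

In both cases cch counts characters, so the second string has cch = 2 even though its character data occupies 4 bytes.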

The easiest way to understand the contents of BIFF8 strings is to look at an example. Suppose the string "this is red ink" is in a cell, and is formatted so that the word "red" is red. The rgb field of the SST record appears as follows:

0f 00 08 02 00 74 68 69 73 20 69 73 20 72 65 64 20 69 6e 6b 08 00 06 00 0b 00 05 00

Swapping bytes and reorganizing:

000F  08  0002  74 68 69 73 20 69 73 20 72 65 64 20 69 6E 6B
    0008  0006  000B  0005

This data parses as shown in the following table:

Data	Description

000F	String contains 15 characters.
08	The grbit is 08h, which indicates a rich string.
0002	Count of formatting runs (runs follow the string and are not included in the character count; if there are no formatting runs, this field does not exist).
74 68 69 73 20 69 73 20 72 65 64 20 69 6E 6B	The string characters; note that in this case, each character is one byte.
0008 0006	Run number 1: index to FONT record 6 (ifnt, 0-based) for characters beginning with character number 8 (0-based).
000B 0005	Run number 2: index to FONT record 5 (ifnt, 0-based) for characters beginning with character number 0Bh (11 decimal, 0-based).
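
The parse in the preceding table can be reproduced with a short Python sketch. The function name parse_rich_string and the variable name ich (character index) are hypothetical; the sketch assumes little-endian fields, as the byte swapping above suggests, and does not handle extended strings (fExtSt).

```python
import struct


def parse_rich_string(data):
    """Parse cch, grbit, optional run count, characters, and runs.

    Hypothetical helper; returns (text, runs) where runs is a list of
    (ich, ifnt) pairs. Extended strings (fExtSt) are not handled.
    """
    cch, grbit = struct.unpack_from("<HB", data, 0)
    if grbit & 0x04:
        raise ValueError("fExtSt is set; not handled in this sketch")
    pos = 3

    crun = 0
    if grbit & 0x08:  # fRichSt: the runs count field exists
        (crun,) = struct.unpack_from("<H", data, pos)
        pos += 2

    if grbit & 0x01:  # fHighByte: 2 bytes per character
        text = data[pos:pos + 2 * cch].decode("utf-16-le")
        pos += 2 * cch
    else:             # compressed: 1 byte per character, high byte is 00h
        text = data[pos:pos + cch].decode("latin-1")
        pos += cch

    runs = []
    for _ in range(crun):
        ich, ifnt = struct.unpack_from("<HH", data, pos)
        runs.append((ich, ifnt))
        pos += 4
    return text, runs


example = bytes.fromhex(
    "0f 00 08 02 00 74 68 69 73 20 69 73 20 72 65 64 20 69 6e 6b "
    "08 00 06 00 0b 00 05 00"
)
text, runs = parse_rich_string(example)
print(text)  # this is red ink
print(runs)  # [(8, 6), (11, 5)]
```

Character 8 is the "r" of "red" and character 11 is the space after it, so the first run switches to FONT record 6 for the red word and the second run switches to FONT record 5 for the rest of the string.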

Extended Strings in Far East Versions

In Far East versions (for example, Japanese Microsoft Excel), extended strings may appear in the SST record (fExtSt is set in the grbit field). These strings store additional fields that contain phonetic, language ID, or keyboard ID information. The first two fields of extended strings (cch and grbit) are identical to the nonextended strings described in the preceding text.

Extended strings contain the fields shown in the following tables.

Extended strings (not rich: fRichSt is not set)

Offset	Name	Size	Contents

0	cch	2	Count of characters in the string data (notice that this is the number of characters, NOT the number of bytes)
2	grbit	1	Option flags (see preceding table)
3	cchExtRst	4	Length of ExtRst data
7	rgb	var	String data
var	ExtRst	var	ExtRst data (not documented; length of this field is given by cchExtRst)

Extended strings (rich: fRichSt is set)

Offset	Name	Size	Contents

0	cch	2	Count of characters in the string data (notice that this is the number of characters, NOT the number of bytes)
2	grbit	1	Option flags (see preceding table)
3	crun	2	Count of formatting runs
5	cchExtRst	4	Length of ExtRst data
9	rgb	var	String data
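var	rgSTRUN	var	Array of formatting run structures; length is equal to (crun x 4) bytes, because each run is a 2-byte character index followed by a 2-byte FONT record index, as in the example earlier in this section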
var	ExtRst	var	ExtRst data (not documented; length of this field is given by cchExtRst)
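
The two layouts above differ only in whether crun is present, so one routine can walk both. The following Python sketch is illustrative only: the name parse_extended_string and the returned keys are hypothetical, cchExtRst is assumed to be an unsigned little-endian byte count, and the size of one formatting run is a parameter that defaults to 4 bytes to match the worked example earlier in this section. ExtRst is returned as raw bytes because its contents are not documented here.

```python
import struct


def parse_extended_string(data, run_size=4):
    """Split a BIFF8 string into text, raw runs, and raw ExtRst bytes.

    Hypothetical helper. run_size is an assumption (see lead-in), and
    cchExtRst is assumed to be an unsigned byte count.
    """
    cch, grbit = struct.unpack_from("<HB", data, 0)
    pos = 3

    crun = 0
    if grbit & 0x08:  # fRichSt: crun comes first
        (crun,) = struct.unpack_from("<H", data, pos)
        pos += 2

    cch_ext_rst = 0
    if grbit & 0x04:  # fExtSt: cchExtRst follows
        (cch_ext_rst,) = struct.unpack_from("<I", data, pos)
        pos += 4

    # String data: 1 or 2 bytes per character depending on fHighByte.
    nbytes = 2 * cch if grbit & 0x01 else cch
    encoding = "utf-16-le" if grbit & 0x01 else "latin-1"
    text = data[pos:pos + nbytes].decode(encoding)
    pos += nbytes

    runs = data[pos:pos + crun * run_size]
    pos += crun * run_size

    ext_rst = data[pos:pos + cch_ext_rst]
    pos += cch_ext_rst

    return {"text": text, "runs": runs, "ext_rst": ext_rst, "size": pos}


# Synthetic (made-up) extended, non-rich string: "hi" plus 3 ExtRst bytes.
sample = struct.pack("<HBI", 2, 0x04, 3) + b"hi" + b"\xaa\xbb\xcc"
result = parse_extended_string(sample)
print(result["text"])  # hi
print(result["size"])  # 12
```

Returning the total size lets a caller step to the next string in a sequence without interpreting the ExtRst bytes.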