Microsoft Excel 97 uses Unicode strings. In BIFF8, strings can be stored in a compressed format. Each string contains the following fields:
Offset | Name | Size | Contents
0 | cch | 2 | Count of characters in the string (notice that this is the number of characters, NOT the number of bytes)
2 | grbit | 1 | Option flags
3 | rgb | var | Array of string characters and formatting runs
Unicode strings usually require 2 bytes of storage per character. Because the high byte of every Unicode character is 00h in most strings in USA/English Microsoft Excel, these strings can be saved in a compressed Unicode format that stores only the low bytes. The grbit field specifies the compression encoding, as shown in the following table.
Bits | Mask | Name | Contents
0 | 01h | fHighByte | = 0 if all the characters in the string have a high byte of 00h and only the low bytes are saved in the file (compressed); = 1 if at least one character has a nonzero high byte, so every character is saved as 2 bytes (not compressed)
1 | 02h | (Reserved) | Reserved; must be 0 (zero)
2 | 04h | fExtSt | Extended string follows (Far East versions, see text)
3 | 08h | fRichSt | Rich string follows
7 – 4 | F0h | (Reserved) | Reserved; must be 0 (zero)
An unformatted string with all high bytes = 00h has grbit = 00h. Because fRichSt is not set, the string has no formatting runs, and the runs count field does not exist.
An unformatted string that has at least one character with a nonzero high byte has grbit = 01h.
A formatted string with all high bytes = 00h has grbit = 08h if the string has several different character formats applied.
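The grbit cases above can be sketched in Python. This is a minimal illustration, not Microsoft Excel's own code: the names describe_grbit and encode_unformatted are hypothetical, and the sketch assumes little-endian field order (as the byte swapping in the example that follows suggests) and that the compressed form is the low byte of each character, which is what Python's latin-1 codec produces for characters up to FFh.

```python
import struct

# Bit masks from the grbit table above.
F_HIGH_BYTE = 0x01  # characters stored as 2 bytes each (not compressed)
F_EXT_ST = 0x04     # extended string
F_RICH_ST = 0x08    # rich string (formatting runs follow)


def describe_grbit(grbit):
    """Return the documented option flags of a grbit byte as booleans."""
    return {
        "fHighByte": bool(grbit & F_HIGH_BYTE),
        "fExtSt": bool(grbit & F_EXT_ST),
        "fRichSt": bool(grbit & F_RICH_ST),
    }


def encode_unformatted(text):
    """Build cch + grbit + rgb for an unformatted, nonextended string.

    Assumption: if every character fits in one byte (high byte 00h),
    only the low bytes are written; otherwise all characters are
    written as 2-byte little-endian values and fHighByte is set.
    """
    try:
        rgb = text.encode("latin-1")      # low bytes only (compressed)
        grbit = 0x00
        cch = len(rgb)
    except UnicodeEncodeError:
        rgb = text.encode("utf-16-le")    # 2 bytes per character
        grbit = F_HIGH_BYTE
        cch = len(rgb) // 2
    # cch is a 2-byte field, so struct raises an error above 65535.
    return struct.pack("<HB", cch, grbit) + rgb
```

For instance, encode_unformatted("red") yields the three header bytes 03 00 00 followed by the three character bytes, while a string containing a Japanese character is written with grbit = 01h and two bytes per character.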
The easiest way to understand the contents of BIFF8 strings is to look at an example. Suppose the string "this is red ink" is in a cell, and is formatted so that the word "red" is red. The rgb field of the SST record appears as follows:
0f 00 08 02 00 74 68 69 73 20 69 73 20 72 65 64 20 69 6e 6b 08 00 06 00 0b 00 05 00
Swapping bytes and reorganizing:
000F 08 0002 74 68 69 73 20 69 73 20 72 65 64 20 69 6E 6B
0008 0006 000B 0005
This data parses as shown in the following table:
Data | Description
000F | String contains 15 characters.
08 | The grbit is 08h, which indicates a rich string.
0002 | Count of formatting runs (runs follow the string and are not included in the character count; if there are no formatting runs, this field does not exist).
74 68 69 73 20 69 73 20 72 65 64 20 69 6E 6B | The string characters; note that in this case, each character is one byte.
0008 0006 | Run number 1: index to FONT record 6 (ifnt, 0-based) for characters beginning with character number 8 (0-based).
000B 0005 | Run number 2: index to FONT record 5 (ifnt, 0-based) for characters beginning with character number B (0-based).
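The parse shown in the preceding table can be reproduced with a short Python sketch. The function name parse_biff8_string is hypothetical, and the sketch makes two simplifying assumptions: the whole string lies in one contiguous buffer, and extended strings (fExtSt set) are rejected rather than parsed.

```python
import struct


def parse_biff8_string(buf, pos=0):
    """Parse one nonextended BIFF8 string starting at buf[pos].

    Returns (text, runs, next_pos), where runs is a list of
    (first character index, FONT record index) pairs.
    """
    cch, grbit = struct.unpack_from("<HB", buf, pos)
    pos += 3
    if grbit & 0x04:
        raise NotImplementedError("extended strings are not handled here")

    crun = 0
    if grbit & 0x08:  # fRichSt: the runs count field exists
        (crun,) = struct.unpack_from("<H", buf, pos)
        pos += 2

    if grbit & 0x01:  # fHighByte: 2 bytes per character
        nbytes = cch * 2
        text = buf[pos:pos + nbytes].decode("utf-16-le")
    else:             # compressed: 1 byte per character
        nbytes = cch
        text = buf[pos:pos + nbytes].decode("latin-1")
    pos += nbytes

    runs = []
    for _ in range(crun):  # each run: 2-byte index, 2-byte ifnt
        ich, ifnt = struct.unpack_from("<HH", buf, pos)
        runs.append((ich, ifnt))
        pos += 4
    return text, runs, pos


# The example bytes from this section.
raw = bytes.fromhex(
    "0f 00 08 02 00 74 68 69 73 20 69 73 20 72 65 64 20 69 6e 6b "
    "08 00 06 00 0b 00 05 00"
)
text, runs, end = parse_biff8_string(raw)
print(text)  # this is red ink
print(runs)  # [(8, 6), (11, 5)]
```

The two runs match the table: FONT record 6 applies from character 8 (the word "red"), and FONT record 5 applies from character 11 (0Bh) onward.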
In Far East versions (for example, Japanese Microsoft Excel), extended strings may appear in the SST record (fExtSt is set in the grbit field). These strings store additional fields that contain phonetic, language ID, or keyboard ID information. The first two fields of extended strings (cch and grbit) are identical to the nonextended strings described in the preceding text.
Extended strings contain the fields shown in the following tables.
Extended strings (not rich: fRichSt is not set)
Offset | Name | Size | Contents
0 | cch | 2 | Count of characters in the string data (notice that this is the number of characters, NOT the number of bytes)
2 | grbit | 1 | Option flags (see preceding table)
3 | cchExtRst | 4 | Length of ExtRst data
7 | rgb | var | String data
var | ExtRst | var | ExtRst data (not documented; length of this field is given by cchExtRst)
Extended strings (rich: fRichSt is set)
Offset | Name | Size | Contents
0 | cch | 2 | Count of characters in the string data (notice that this is the number of characters, NOT the number of bytes)
2 | grbit | 1 | Option flags (see preceding table)
3 | crun | 2 | Count of formatting runs
5 | cchExtRst | 4 | Length of ExtRst data
9 | rgb | var | String data
var | rgSTRUN | var | Array of formatting run structures; length is equal to (crun x 4) bytes, because each run is two 2-byte values, as in the preceding example
var | ExtRst | var | ExtRst data (not documented; length of this field is given by cchExtRst)
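The two extended layouts can be read with one routine that consults the grbit flags. The following Python sketch is illustrative only: parse_extended_string is a hypothetical name, cchExtRst is assumed to be an unsigned little-endian value counted in bytes, each formatting run is taken to be 4 bytes (two 2-byte values, matching the worked example earlier in this section), and the ExtRst data is returned as raw bytes because its contents are not documented here. The sample bytes are synthetic, made up for illustration.

```python
import struct


def parse_extended_string(buf, pos=0):
    """Parse a BIFF8 string that may be rich, extended, or both.

    Returns (text, runs, ext_rst, next_pos).
    """
    cch, grbit = struct.unpack_from("<HB", buf, pos)
    pos += 3

    crun = 0
    cch_ext_rst = 0
    if grbit & 0x08:  # fRichSt: crun comes first
        (crun,) = struct.unpack_from("<H", buf, pos)
        pos += 2
    if grbit & 0x04:  # fExtSt: cchExtRst follows
        (cch_ext_rst,) = struct.unpack_from("<I", buf, pos)
        pos += 4

    # String data: 2 bytes per character if fHighByte is set, else 1.
    if grbit & 0x01:
        nbytes = cch * 2
        text = buf[pos:pos + nbytes].decode("utf-16-le")
    else:
        nbytes = cch
        text = buf[pos:pos + nbytes].decode("latin-1")
    pos += nbytes

    # Formatting runs (assumed 4 bytes each, see lead-in).
    runs = [struct.unpack_from("<HH", buf, pos + 4 * i) for i in range(crun)]
    pos += 4 * crun

    # ExtRst data is skipped over and handed back uninterpreted.
    ext_rst = bytes(buf[pos:pos + cch_ext_rst])
    pos += cch_ext_rst
    return text, runs, ext_rst, pos


# Synthetic sample: rich + extended, 2 characters, 1 run, 2 ExtRst bytes.
sample = (
    struct.pack("<HBHI", 2, 0x0C, 1, 2)  # cch, grbit, crun, cchExtRst
    + b"ab"                              # rgb (compressed characters)
    + struct.pack("<HH", 1, 5)           # run: character 1, FONT record 5
    + b"\xaa\xbb"                        # opaque ExtRst bytes
)
print(parse_extended_string(sample))
```

Returning next_pos lets a caller step through consecutive strings in a buffer without interpreting the ExtRst contents.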