Unicode Strings in BIFF8

Microsoft Excel 97 uses Unicode strings. In BIFF8, strings can be stored in a compressed format (described below). Each string contains the following fields:

Offset  Name   Size  Contents
0       cch    2     Count of characters in the string (notice that this is the number of characters, NOT the number of bytes)
2       grbit  1     Option flags
3       rgb    var   Array of string characters and formatting runs


Unicode strings usually require 2 bytes of storage per character. Because most strings in USA/English Microsoft Excel have a high byte of 00h in every Unicode character, such strings can be saved in a compressed Unicode format that stores only the low byte of each character. The grbit field specifies the compression encoding, as shown in the following table.
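A minimal Python sketch of reading the character array, assuming the caller has already extracted cch and the fHighByte bit and that the bytes are contiguous in one buffer; the name decode_chars is hypothetical. Because a compressed character's high byte is 00h, each stored byte maps to the code point with the same value, which is exactly what Python's "latin-1" codec does. Uncompressed characters are decoded here as little-endian UTF-16.

```python
def decode_chars(buf: bytes, offset: int, cch: int, high_byte: bool):
    """Decode cch characters starting at buf[offset].

    Returns (text, offset just past the characters).
    high_byte mirrors the fHighByte bit of grbit:
      False -> 1 byte per character (low byte only; high byte is 00h)
      True  -> 2 bytes per character, little-endian
    """
    width = 2 if high_byte else 1
    raw = buf[offset:offset + cch * width]
    if len(raw) != cch * width:
        raise ValueError("buffer too short for cch characters")
    # latin-1 maps byte value N to code point U+00NN, i.e. high byte 00h.
    text = raw.decode("utf-16-le" if high_byte else "latin-1")
    return text, offset + cch * width
```

For example, the same 3-character string occupies 3 bytes compressed but 6 bytes uncompressed, which is why cch counts characters rather than bytes.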

Bits   Mask  Name        Contents
0      01h   fHighByte   = 0 if all the characters in the string have a high byte of 00h and only the low bytes are saved in the file (compressed)
                         = 1 if at least one character in the string has a nonzero high byte and therefore all characters in the string are saved as double-byte characters (not compressed)
1      02h   (Reserved)  Reserved; must be 0 (zero)
2      04h   fExtSt      Extended string follows (Far East versions, see text)
3      08h   fRichSt     Rich string follows
7 – 4  F0h   (Reserved)  Reserved; must be 0 (zero)
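The flag layout in the table can be decoded as in the following Python sketch. The constant and function names are hypothetical. Rejecting a grbit that has reserved bits set is a design choice for a strict reader; a lenient reader might simply ignore those bits.

```python
# Bit masks for the grbit option flags listed in the table.
F_HIGH_BYTE = 0x01   # fHighByte: characters stored as 2 bytes each
F_EXT_ST = 0x04      # fExtSt: extended (Far East) string data follows
F_RICH_ST = 0x08     # fRichSt: formatting runs follow
RESERVED_MASK = 0x02 | 0xF0  # bit 1 and bits 7-4 must be zero


def decode_grbit(grbit: int) -> dict:
    """Split a grbit byte into its named flags; reject set reserved bits."""
    if grbit & RESERVED_MASK:
        raise ValueError(f"reserved grbit bits set: {grbit:#04x}")
    return {
        "fHighByte": bool(grbit & F_HIGH_BYTE),
        "fExtSt": bool(grbit & F_EXT_ST),
        "fRichSt": bool(grbit & F_RICH_ST),
    }
```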


An unformatted string with all high bytes = 00h has grbit = 00h. Because fRichSt is not set, the string has no formatting runs, so the runs count field does not exist.

An unformatted string that has at least one character with a nonzero high byte has grbit = 01h.

A formatted string with all high bytes = 00h has grbit = 08h if the string has several different character formats applied.
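When writing strings, the same rules apply in reverse. A minimal Python sketch, assuming a non-extended string (fExtSt is never set here); grbit_for is a hypothetical name, and testing ord(ch) > 0xFF assumes each Python character corresponds to one 16-bit character in the file.

```python
def grbit_for(text: str, has_runs: bool) -> int:
    """Choose grbit for a non-extended BIFF8 string (hypothetical helper)."""
    grbit = 0x00
    # fHighByte is required as soon as one character has a nonzero high byte.
    if any(ord(ch) > 0xFF for ch in text):
        grbit |= 0x01
    # fRichSt marks that a runs count and formatting runs are present.
    if has_runs:
        grbit |= 0x08
    return grbit
```

With these rules, grbit_for("this is red ink", has_runs=True) returns 0x08, and a formatted string containing a character with a nonzero high byte gets both flags (09h).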

The easiest way to understand the contents of BIFF8 strings is to look at an example. Suppose the string "this is red ink" is in a cell and is formatted so that the word "red" is red. The rgb field of the SST record contains the following bytes for this string:

0f 00 08 02 00 74 68 69 73 20 69 73 20 72 65 64 20 69 6e 6b 08 00 06 00 0b 00 05 00

Because multibyte fields are stored little-endian (low byte first), swapping bytes and regrouping the fields gives:

000F  08  0002  74 68 69 73 20 69 73 20 72 65 64 20 69 6E 6B
    0008  0006  000B  0005
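The regrouping above can be done mechanically with Python's struct module, where "<" selects little-endian byte order. This is a sketch for non-extended strings only, assuming the whole string sits contiguously in one buffer; parse_biff8_string is a hypothetical name.

```python
import struct


def parse_biff8_string(buf: bytes, offset: int = 0):
    """Parse one non-extended BIFF8 string that starts at buf[offset].

    Returns (text, runs, next_offset); runs is a list of
    (character number, ifnt) pairs. Assumes fExtSt is clear and that
    the whole string is contiguous in buf.
    """
    # Fixed header: cch (2 bytes) and grbit (1 byte), little-endian.
    cch, grbit = struct.unpack_from("<HB", buf, offset)
    offset += 3
    if grbit & 0x04:
        raise NotImplementedError("extended (fExtSt) strings are not handled")
    crun = 0
    if grbit & 0x08:
        # fRichSt: a 2-byte count of formatting runs precedes the characters.
        (crun,) = struct.unpack_from("<H", buf, offset)
        offset += 2
    # fHighByte selects 2 bytes per character; otherwise only low bytes are stored.
    width = 2 if grbit & 0x01 else 1
    raw = buf[offset:offset + cch * width]
    if len(raw) != cch * width:
        raise ValueError("buffer ends inside the character data")
    text = raw.decode("utf-16-le" if width == 2 else "latin-1")
    offset += cch * width
    runs = []
    for _ in range(crun):
        # Each run: first character it applies to, then the FONT index.
        ich, ifnt = struct.unpack_from("<HH", buf, offset)
        runs.append((ich, ifnt))
        offset += 4
    return text, runs, offset
```

Applied to the dump above, bytes.fromhex("0f 00 08 02 00 74 68 69 73 20 69 73 20 72 65 64 20 69 6e 6b 08 00 06 00 0b 00 05 00") parses to ("this is red ink", [(8, 6), (11, 5)], 28).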

This data parses as shown in the following table:

Data       Description
000F       String contains 15 characters.
08         The grbit is 08h, which indicates a rich string (fRichSt is set); because fHighByte is clear, the characters are stored compressed.
0002       Count of formatting runs (runs follow the string and are not included in the character count; if there are no formatting runs, this field does not exist).
74 68 69 73 20 69 73 20 72 65 64 20 69 6E 6B
           The string characters ("this is red ink"); note that in this case, each character is one byte.
0008 0006  Run number 1: index to FONT record 6 (ifnt, 0-based) for characters beginning with character number 8 (0-based), the "r" of "red".
000B 0005  Run number 2: index to FONT record 5 (ifnt, 0-based) for characters beginning with character number 0Bh (11 decimal, 0-based), the space after "red".
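Each run applies from its starting character up to the start of the next run, or to the end of the string. The following Python sketch turns the (character number, ifnt) pairs into formatted segments. apply_runs is a hypothetical helper, and base_ifnt, the font for characters before the first run, is an assumption supplied by the caller (for example, the cell's own font).

```python
def apply_runs(text: str, runs, base_ifnt: int):
    """Split text into (substring, ifnt) segments according to formatting runs.

    runs: (character number, ifnt) pairs sorted by character number.
    base_ifnt: font assumed for characters before the first run.
    """
    boundaries = [(0, base_ifnt)] + list(runs)
    segments = []
    for i, (start, ifnt) in enumerate(boundaries):
        # A run ends where the next one starts, or at the end of the string.
        end = boundaries[i + 1][0] if i + 1 < len(boundaries) else len(text)
        if end > start:  # skip empty pieces, e.g. when a run starts at 0
            segments.append((text[start:end], ifnt))
    return segments
```

For the example, apply_runs("this is red ink", [(8, 6), (11, 5)], base_ifnt=0) yields [("this is ", 0), ("red", 6), (" ink", 5)], so only "red" uses FONT record 6.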


Extended Strings in Far East Versions

In Far East versions (for example, Japanese Microsoft Excel), extended strings may appear in the SST record (fExtSt is set in the grbit field). These strings store additional fields that contain phonetic, language ID, or keyboard ID information. The first two fields of extended strings (cch and grbit) are identical to the nonextended strings described in the preceding text.

Extended strings contain the fields shown in the following tables.

Extended strings (not rich: fRichSt is not set)

Offset  Name       Size  Contents
0       cch        2     Count of characters in the string data (notice that this is the number of characters, NOT the number of bytes)
2       grbit      1     Option flags (see preceding table)
3       cchExtRst  4     Length of ExtRst data
7       rgb        var   String data
var     ExtRst     var   ExtRst data (not documented; length of this field is given by cchExtRst)
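Reading the non-rich extended layout follows the offsets in the table directly. A Python sketch under stated assumptions: parse_ext_string is a hypothetical name, the ExtRst bytes are returned unparsed because their format is not documented here, and the string is assumed to be contiguous in the buffer.

```python
import struct


def parse_ext_string(buf: bytes, offset: int = 0):
    """Parse an extended, non-rich BIFF8 string (fExtSt set, fRichSt clear).

    Returns (text, ext_rst, next_offset); ext_rst is the raw ExtRst bytes.
    """
    # cch (2), grbit (1), cchExtRst (4) at offsets 0, 2 and 3.
    cch, grbit, cch_ext_rst = struct.unpack_from("<HBI", buf, offset)
    if not (grbit & 0x04) or (grbit & 0x08):
        raise ValueError("expected fExtSt set and fRichSt clear")
    offset += 7
    # String data starts at offset 7; fHighByte picks the character width.
    width = 2 if grbit & 0x01 else 1
    raw = buf[offset:offset + cch * width]
    if len(raw) != cch * width:
        raise ValueError("buffer ends inside the string data")
    text = raw.decode("utf-16-le" if width == 2 else "latin-1")
    offset += cch * width
    # ExtRst is undocumented, so keep it as raw bytes of length cchExtRst.
    ext_rst = bytes(buf[offset:offset + cch_ext_rst])
    if len(ext_rst) != cch_ext_rst:
        raise ValueError("buffer ends inside the ExtRst data")
    return text, ext_rst, offset + cch_ext_rst
```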


Extended strings (rich: fRichSt is set)

Offset  Name       Size  Contents
0       cch        2     Count of characters in the string data (notice that this is the number of characters, NOT the number of bytes)
2       grbit      1     Option flags (see preceding table)
3       crun       2     Count of formatting runs
5       cchExtRst  4     Length of ExtRst data
9       rgb        var   String data
var     rgSTRUN    var   Array of formatting run structures; length is equal to (crun x 4) bytes, since each run is a 2-byte character number plus a 2-byte ifnt, as in the example above
var     ExtRst     var   ExtRst data (not documented; length of this field is given by cchExtRst)
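Putting the layouts together, the total size of a string follows from cch, grbit, crun, and cchExtRst, which is useful for stepping from one string to the next without decoding it. The Python sketch below (biff8_string_size is a hypothetical name) counts 4 bytes per formatting run, consistent with the earlier example, where two runs occupy 8 bytes.

```python
def biff8_string_size(cch: int, grbit: int, crun: int = 0, cch_ext_rst: int = 0) -> int:
    """Total byte length of one BIFF8 string, following the field layouts.

    crun and cch_ext_rst only count when fRichSt / fExtSt are set in grbit.
    """
    rich = bool(grbit & 0x08)  # fRichSt
    ext = bool(grbit & 0x04)   # fExtSt
    size = 3  # cch (2) + grbit (1)
    if rich:
        size += 2  # crun
    if ext:
        size += 4  # cchExtRst
    size += cch * (2 if grbit & 0x01 else 1)  # rgb: character data
    if rich:
        size += 4 * crun  # rgSTRUN: 4 bytes per run
    if ext:
        size += cch_ext_rst  # ExtRst
    return size
```

For the "this is red ink" example, biff8_string_size(15, 0x08, crun=2) gives 28, matching the length of the byte dump.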