Stylesheet
A stylesheet is a collection of styles. In Word, each document has its own stylesheet.
A style is a set of formatting information collected together and given a name. Word 6.0 supports paragraph and character styles; previous versions supported only paragraph styles. Character styles have just character formatting, while paragraph styles have both character and paragraph formatting. The style sheet establishes a correspondence between a style code and a style definition.
Note that the storage and behavior of styles have changed radically since Word 2 for Windows, beginning with nFib 63. Some of the differences are:
- Character styles are supported.
- The style code is called an istd, rather than an stc.
- The istd is a short, where the stc was a byte.
- The range of the istd is 0-4095, where 4095 is the null style. The range of the stc was 0-255 (the full range of a byte), with 222 as the null style.
- PAPX's have a short istd at the beginning, rather than a byte stc.
- CHPX's are a grpprl, not a CHP.
- Many other changes...
This document describes only the final Word 6.0 version of the stylesheet, not the Word 2.x version.
The styles for a document (both paragraph and character styles) are stored in an array in each document. When new styles are created, they are added to the end of the array. The array can have unused slots. Some slots at the beginning of the array are reserved for specific styles, whether they have been created yet or not. Paragraph and character styles are stored in the same array. Each document has a separate array, so the same style will usually have a different istd in two different documents. Thus style matching between documents must be done by name (or by sti if the styles are built-in.)
Styles are usually referred to using an istd. The istd is an index into an array of STD's (STyle Descriptions). A (doc, istd) pair uniquely identifies a style because it tells which style in which array.
Parts of a style (for more information, see the STD structure below):
- sti: A style identifier. Built-in styles have an sti that indicates which built-in style they are. User-defined styles all have stiUser.
- sgc: The type of style, either paragraph or character.
- istdBase: The style that this style is based on.
- istdNext: The style that should be applied after this one.
- stzName: The name of a style, unique within its stylesheet.
- UPX: The difference between this style and the one it is based on.
- UPE: The properties of this style (a PAP, CHP, and/or grpprl).
Every paragraph has a paragraph style. Every character has a character style. The default paragraph style is Normal (stiNormal, istdNormal). The default character style is Default Paragraph Font (stiNormalChar, istdNormalChar).
The formatting of a paragraph (the PAP) and a character (the CHP) depend on the paragraph and character styles applied to them, as well as any additional formatting stored in the FKPs. The PAP and CHP are constructed in a layered fashion:
For a PAP:
An initial PAP is determined by getting the PAP from the paragraph's style.
Any paragraph formatting stored in the file (the FKP papx's) is then applied to that PAP.
For a CHP:
An initial CHP is determined by getting the CHP from the paragraph's style.
Properties from the character's style (the UPX.chpx.grpprl) are then applied to that CHP.
Any character formatting stored in the file (the FKP chpx's) is then applied to that CHP.
Note that the resulting PAP and CHP have fields that indicate what style was applied: PAP.istd, CHP.istd.
Stylesheet File Format
The style sheet (STSH) is stored in the file in two parts, a STSHI and then an array of STDs. The STSHI contains general information about the following stylesheet, including how many styles are in it. After the STSHI, each style is written as an STD. Both the STSHI and each STD are preceded by a ushort that indicates their length.
Field | Size | Comment |
cbStshi | 2 bytes | size of the following STSHI structure |
STSHI | (cbStshi) | Stylesheet Information |
Then for each style in the stylesheet (stshi.cstd), the following is stored:
cbStd | 2 bytes | size of the following STD structure |
STD | (cbStd) | the style description |
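The layout above can be walked with a single pass over the bytes. The following is a minimal sketch, not the reader Word itself uses: it assumes the STSH has already been loaded into memory, that ushorts are stored little-endian, and that each (cbStd, STD) pair immediately follows the previous one. The names rd_u16 and stsh_count_used are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a ushort, assuming little-endian storage in the file. */
static uint16_t rd_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Walk an in-memory STSH image: cbStshi, STSHI, then stshi.cstd
 * (cbStd, STD) pairs. Returns the number of non-empty styles
 * (cbStd != 0), or -1 if the data is truncated. On success,
 * *cstd_out receives stshi.cstd.
 */
static int stsh_count_used(const uint8_t *stsh, size_t cb, uint16_t *cstd_out)
{
    assert(stsh != NULL && cstd_out != NULL);
    if (cb < 2)
        return -1;

    size_t cbStshi = rd_u16(stsh);
    size_t pos = 2;
    if (cbStshi < 2 || cb - pos < cbStshi)
        return -1;

    uint16_t cstd = rd_u16(stsh + pos);   /* cstd is the first STSHI field */
    pos += cbStshi;                       /* skip the whole STSHI as stored */

    int used = 0;
    for (uint16_t i = 0; i < cstd; i++) {
        if (cb - pos < 2)
            return -1;
        size_t cbStd = rd_u16(stsh + pos);
        pos += 2;
        if (cb - pos < cbStd)
            return -1;
        if (cbStd != 0)
            used++;                       /* cbStd == 0 marks an empty slot */
        pos += cbStd;
    }
    *cstd_out = cstd;
    return used;
}
```

Skipping the STSHI by cbStshi rather than by the size of a compiled structure is what lets the same loop handle files written with a longer or shorter STSHI.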
STSHI:
The STSHI structure has the following format:
// STSHI: STyleSHeet Information, as stored in a file
// Note that new fields can be added to the STSHI without invalidating
// the file format, because it is stored preceded by its length.
// When reading a STSHI from an older version, new fields will be zero.
typedef struct _STSHI
{
ushort cstd; // Count of styles in stylesheet
ushort cbSTDBaseInFile; // Length of STD Base as stored in a file
BF fStdStylenamesWritten : 1; // Are built-in stylenames stored?
BF : 15; // Spare flags
ushort stiMaxWhenSaved; // Max sti known when this file was written
ushort istdMaxFixedWhenSaved; // How many fixed-index istds are there?
ushort nVerBuiltInNamesWhenSaved; // Current version of built-in stylenames
FTC rgftcStandardChpStsh[3]; // ftc used by StandardChpStsh for this document
} STSHI;
The cb preceding the STSHI in the file is the length of the STSHI as stored in the file. The current definition of the STSHI structure might be longer or shorter than the one stored in the file, so the stylesheet reader routine needs to take this into account.
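One way a reader might tolerate a shorter or longer stored STSHI is to decode only the fields that are present and leave the rest zero. The sketch below is an illustration under assumptions: ushorts are taken to be little-endian, only the six leading ushort fields are decoded (rgftcStandardChpStsh is left out because the width of FTC is not given here), and the names StshiHead and stshi_read are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* In-memory copy of the leading ushort fields of the STSHI. */
typedef struct {
    uint16_t cstd;
    uint16_t cbSTDBaseInFile;
    uint16_t flags;                      /* bit 0: fStdStylenamesWritten */
    uint16_t stiMaxWhenSaved;
    uint16_t istdMaxFixedWhenSaved;
    uint16_t nVerBuiltInNamesWhenSaved;
} StshiHead;

#define STSHI_HEAD_FIELDS 6

/*
 * Decode up to cbStshi bytes of a stored STSHI. Fields that the file
 * does not contain stay zero; bytes past the known fields are ignored.
 */
static void stshi_read(const uint8_t *src, size_t cbStshi, StshiHead *out)
{
    assert(src != NULL && out != NULL);

    uint16_t f[STSHI_HEAD_FIELDS] = {0};
    for (size_t i = 0; i < STSHI_HEAD_FIELDS && 2 * i + 2 <= cbStshi; i++)
        f[i] = (uint16_t)(src[2 * i] | (src[2 * i + 1] << 8));

    out->cstd                      = f[0];
    out->cbSTDBaseInFile           = f[1];
    out->flags                     = f[2];
    out->stiMaxWhenSaved           = f[3];
    out->istdMaxFixedWhenSaved     = f[4];
    out->nVerBuiltInNamesWhenSaved = f[5];
}
```

Decoding field by field, rather than copying bytes over a C structure, also avoids depending on the compiler's padding and bitfield layout.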
stshi.cstd: The number of styles in this stylesheet. There will be stshi.cstd (cbSTD, STD) pairs in the file following the STSHI. Note that styles can be empty, i.e. cbSTD == 0.
stshi.cbSTDBaseInFile: The STD structure (see below) is divided into a fixed-length "base", and a variable length part. The stshi.cbSTDBaseInFile indicates the size in bytes of the fixed-length base of the STD as it was written in this file. If the STD base is grown in a future version, the file format doesn't change, because the stylesheet reader can discard parts it doesn't know about, or use defaults if the file's STD is not as large as it was expecting. (Word 6.0 wrote a stshi.cbSTDBaseInFile of 8; the base declared below, which adds a final word of flags, occupies 10 bytes.)
stshi.fStdStylenamesWritten: Previous versions of Word did not store the style name if the style was a built-in style; Word 6.0 does, for compatibility with future versions. Note that the built-in stylenames may need to be "regenerated" if the file is opened in a different language or if stshi.nVerBuiltInNamesWhenSaved doesn't match the expected value.
stshi.stiMaxWhenSaved: This indicates the last built-in style known to the version of Word that saved this file.
stshi.istdMaxFixedWhenSaved: Each array of styles has some fixed-index styles at the beginning. This indicates the number of fixed-index positions reserved in the stylesheet when it was saved.
stshi.nVerBuiltInNamesWhenSaved: Since built-in stylenames are saved with the document, this provides a way to see if the saved names are the same "version" as the names in the version of Word that is loading the file. If not, the built-in stylenames need to be "regenerated", i.e. the old names need to be replaced with the new.
stshi.rgftcStandardChpStsh: These are the default fonts for this stylesheet. The first is for ASCII characters (0-127), the second is for Far East characters, and the third is the default font for non-Far East, non-ASCII text. See notes on sprmCRgftcX for details.
STD:
The style description is stored in an STD structure as follows:
// STD: STyle Definition
// The STD contains the entire definition of a style.
// It has two parts, a fixed-length base (cbSTDBase bytes long)
// and a variable length remainder holding the name, and the upx and upe
// arrays (a upx and upe for each type stored in the style, std.cupx)
// Note that new fields can be added to the BASE of the STD without
// invalidating the file format, because the STSHI contains the length
// that is stored in the file. When reading STDs from an older version,
// new fields will be zero.
typedef struct _STD
{
// Base part of STD:
ushort sti : 12; /* invariant style identifier */
ushort fScratch : 1; /* spare field for any temporary use,
always reset back to zero! */
ushort fInvalHeight : 1; /* PHEs of all text with this style are wrong */
ushort fHasUpe : 1; /* UPEs have been generated */
ushort fMassCopy : 1; /* std has been mass-copied; if unused at
save time, style should be deleted */
ushort sgc : 4; /* style type code */
ushort istdBase : 12; /* base style */
ushort cupx : 4; /* # of UPXs (and UPEs) */
ushort istdNext : 12; /* next style */
ushort bchUpe; /* offset to end of upx's, start of upe's */
ushort fAutoRedef : 1; /* auto redefine style when appropriate */
ushort fHidden : 1; /* hidden from UI? */
ushort : 14; /* unused bits */
// Variable length part of STD:
XCHAR xstzName[2]; /* sub-names are separated by chDelimStyle */
/* char grupx[]; */
/* the UPEs are not stored on the file; they are a cache of the based-on
chain */
/* char grupe[]; */
} STD;
The cb preceding each STD is the length of the data, which includes all of the STD except the grupe array (which is derived after the file is read in, by building each UPE from the base style UPE plus the exceptions in the UPX). A cb of zero indicates an empty slot in the style array, i.e. no style has that istd. Note that the STD structure may be longer or shorter than the one stored in the file; stshi.cbSTDBaseInFile indicates the length of the base of the STD (up to xstzName) as stored in the file. The stylesheet reader routine has to take this into account.
The variable-length part of the STD actually has three variable-length subparts, the xstzName, the grupx, and the grupe. Since this doesn't fit well into a C structure declaration, some processing is needed to figure out where one part stops and the next part begins. An important note is that all variable-length parts and subparts of the STD begin on EVEN-BYTE OFFSETS within the STD, even if the length of the preceding variable-length part was odd.
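The base of the STD can be unpacked from its ushort words without relying on C bitfields. This is a sketch under two assumptions that the declaration above does not state outright: ushorts are little-endian, and bitfields are allocated starting from the low-order bit of each word. The names StdBase, std_base_decode and even_offset are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Unpacked copy of the fixed-length base of an STD. */
typedef struct {
    uint16_t sti, sgc, istdBase, cupx, istdNext, bchUpe;
    uint8_t  fScratch, fInvalHeight, fHasUpe, fMassCopy;
    uint8_t  fAutoRedef, fHidden;
} StdBase;

/* Round an offset within the STD up to the next even-byte offset. */
static size_t even_offset(size_t off)
{
    return (off + 1) & ~(size_t)1;
}

/*
 * Decode cbBase bytes (stshi.cbSTDBaseInFile) of a stored STD base.
 * Words missing from a shorter base are treated as zero.
 */
static void std_base_decode(const uint8_t *p, size_t cbBase, StdBase *out)
{
    assert(p != NULL && out != NULL);

    uint16_t w[5] = {0};
    for (size_t i = 0; i < 5 && 2 * i + 2 <= cbBase; i++)
        w[i] = (uint16_t)(p[2 * i] | (p[2 * i + 1] << 8));

    out->sti          = (uint16_t)(w[0] & 0x0FFF);
    out->fScratch     = (uint8_t)((w[0] >> 12) & 1);
    out->fInvalHeight = (uint8_t)((w[0] >> 13) & 1);
    out->fHasUpe      = (uint8_t)((w[0] >> 14) & 1);
    out->fMassCopy    = (uint8_t)((w[0] >> 15) & 1);
    out->sgc          = (uint16_t)(w[1] & 0x000F);
    out->istdBase     = (uint16_t)(w[1] >> 4);
    out->cupx         = (uint16_t)(w[2] & 0x000F);
    out->istdNext     = (uint16_t)(w[2] >> 4);
    out->bchUpe       = w[3];
    out->fAutoRedef   = (uint8_t)(w[4] & 1);
    out->fHidden      = (uint8_t)((w[4] >> 1) & 1);
}
```

The variable-length part then starts at offset cbBase within the STD, and even_offset gives the starting offset of each later subpart once the end of the preceding one is known.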
std.sti: The sti is an identifier that indicates which built-in style this is, or stiUser for a user-defined style. An sti is intended to be permanent through versions of Word, although new sti's may be added in new versions. The sti definitions are:
// standard sti codes - these are invariant identifiers for built-in styles
// and must remain the same (i.e. don't renumber them, or old files will be
// messed up.)
// NOTE: sti and istd are the same for Normal and level styles
// If you want to define a new built-in style:
// 1) Decide if you really need one--it will exist in all future versions!
// 2) Add a new sti below. You can take the first available slot.
// 3) Change stiMax, and stiPapMax or stiChpMax
// 4) Add entry to _dnsti, and the two ids's in strman.pp
// 5) Add case in GetDefaultUpdForSti
// 6) Change cstiMaxBuiltinDependents if necessary
// If you want to change the definition of a built-in style
// 1) In order to make WinWord 2 documents that use the style look like
// they did in WinWord 2, add a case in GetDefaultUpdForSti to handle
// fOldDef. This definition will be used when converting WinWord 2
// stylesheets.
// 2) If you change the name of a built-in style, increment nVerBuiltInNames
#define stiNormal 0 // 0x0000
#define stiLev1 1 // 0x0001
#define stiLev2 2 // 0x0002
#define stiLev3 3 // 0x0003
#define stiLev4 4 // 0x0004
#define stiLev5 5 // 0x0005
#define stiLev6 6 // 0x0006
#define stiLev7 7 // 0x0007
#define stiLev8 8 // 0x0008
#define stiLev9 9 // 0x0009
#define stiLevFirst stiLev1
#define stiLevLast stiLev9
#define stiIndex1 10 // 0x000A
#define stiIndex2 11 // 0x000B
#define stiIndex3 12 // 0x000C
#define stiIndex4 13 // 0x000D
#define stiIndex5 14 // 0x000E
#define stiIndex6 15 // 0x000F
#define stiIndex7 16 // 0x0010
#define stiIndex8 17 // 0x0011
#define stiIndex9 18 // 0x0012
#define stiIndexFirst stiIndex1
#define stiIndexLast stiIndex9
#define stiToc1 19 // 0x0013
#define stiToc2 20 // 0x0014
#define stiToc3 21 // 0x0015
#define stiToc4 22 // 0x0016
#define stiToc5 23 // 0x0017
#define stiToc6 24 // 0x0018
#define stiToc7 25 // 0x0019
#define stiToc8 26 // 0x001A
#define stiToc9 27 // 0x001B
#define stiTocFirst stiToc1
#define stiTocLast stiToc9
#define stiNormIndent 28 // 0x001C
#define stiFtnText 29 // 0x001D
#define stiAtnText 30 // 0x001E
#define stiHeader 31 // 0x001F
#define stiFooter 32 // 0x0020
#define stiIndexHeading 33 // 0x0021
#define stiCaption 34 // 0x0022
#define stiToCaption 35 // 0x0023
#define stiEnvAddr 36 // 0x0024
#define stiEnvRet 37 // 0x0025
#define stiFtnRef 38 // 0x0026 char style
#define stiAtnRef 39 // 0x0027 char style
#define stiLnn 40 // 0x0028 char style
#define stiPgn 41 // 0x0029 char style
#define stiEdnRef 42 // 0x002A char style
#define stiEdnText 43 // 0x002B
#define stiToa 44 // 0x002C
#define stiMacro 45 // 0x002D
#define stiToaHeading 46 // 0x002E
#define stiList 47 // 0x002F
#define stiListBullet 48 // 0x0030
#define stiListNumber 49 // 0x0031
#define stiList2 50 // 0x0032
#define stiList3 51 // 0x0033
#define stiList4 52 // 0x0034
#define stiList5 53 // 0x0035
#define stiListBullet2 54 // 0x0036
#define stiListBullet3 55 // 0x0037
#define stiListBullet4 56 // 0x0038
#define stiListBullet5 57 // 0x0039
#define stiListNumber2 58 // 0x003A
#define stiListNumber3 59 // 0x003B
#define stiListNumber4 60 // 0x003C
#define stiListNumber5 61 // 0x003D
#define stiTitle 62 // 0x003E
#define stiClosing 63 // 0x003F
#define stiSignature 64 // 0x0040
#define stiNormalChar 65 // 0x0041 char style
#define stiBodyText 66 // 0x0042
#define stiBodyTextInd1 67 // 0x0043
#define stiListCont 68 // 0x0044
#define stiListCont2 69 // 0x0045
#define stiListCont3 70 // 0x0046
#define stiListCont4 71 // 0x0047
#define stiListCont5 72 // 0x0048
#define stiMsgHeader 73 // 0x0049
#define stiSubtitle 74 // 0x004A
#define stiSalutation 75 // 0x004B
#define stiDate 76 // 0x004C
#define stiBodyText1I 77 // 0x004D
#define stiBodyText1I2 78 // 0x004E
#define stiNoteHeading 79 // 0x004F
#define stiBodyText2 80 // 0x0050
#define stiBodyText3 81 // 0x0051
#define stiBodyTextInd2 82 // 0x0052
#define stiBodyTextInd3 83 // 0x0053
#define stiBlockQuote 84 // 0x0054
#define stiHyperlink 85 // 0x0055 char style
#define stiHyperlinkFollowed 86 // 0x0056 char style
#define stiStrong 87 // 0x0057 char style
#define stiEmphasis 88 // 0x0058 char style
#define stiNavPane 89 // 0x0059 char style
#define stiPlainText 90 // 0x005A
#define stiMax 91 // number of defined sti's
#define stiUser 0x0ffe // user styles are distinguished by name
#define stiNil 0x0fff // max for 12 bits
See below for the names of these styles.
std.sgc: The type of each style is indicated by std.sgc. The two types currently in use are:
sgcPara | 1 | // A paragraph style |
sgcChp | 2 | // A character style |
More style types may exist in the future, so styles of an unknown type should be discarded.
std.istdBase: The style that this style is based on. A style is always based on another style or the null style (istdNil). Following a "chain" of based-on styles will always end at the null style, because a based-on chain cannot have a loop in it. A style can have up to 11 "ancestors" in its based-on chain, including the null style. A style's definition is built up from the style that it is based on. See std.cupx, std.grupx, std.grupe.
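A reader that does not trust its input may want to verify these chain properties before building a style. The sketch below assumes the istdBase values of all styles have been gathered into an array indexed by istd; the function name count_ancestors and the constant names are hypothetical, while the values 4095 (null style) and 11 (ancestor limit) come from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ISTD_NIL      0x0FFF  /* 4095, the null style */
#define MAX_ANCESTORS 11      /* limit from the text, null style included */

/*
 * Follow the based-on chain of style istd. Returns the number of
 * ancestors, counting the null style, or -1 if the chain refers to a
 * slot outside the array, contains a loop, or is longer than allowed.
 */
static int count_ancestors(const uint16_t *istdBase, size_t cstd, uint16_t istd)
{
    assert(istdBase != NULL);

    int n = 0;
    while (istd != ISTD_NIL) {
        if (istd >= cstd)
            return -1;            /* dangling reference */
        istd = istdBase[istd];
        if (++n > MAX_ANCESTORS)
            return -1;            /* loop, or chain too long */
    }
    return n;
}
```

Because the chain length is bounded, a loop does not need separate bookkeeping: any loop exceeds the limit after at most twelve steps.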
std.istdNext: The style that should be applied after this one. For a paragraph style, this is the style that is applied when Enter is pressed at the end of a paragraph. For a character style, the next style is essentially ignored, but should be the same as the current style.
std.xstzName: The name of the style, including aliases. The name is stored as an xstz (preceded by a length byte, followed by a null-terminator.) A style name can contain multiple "aliases", separated by commas. Aliases are alternate names for the same style (e.g. a style named "a,b,c" has three aliases, and can be referred to by "a", "b", or "c", or any combination.) WinWord 2.x did not have aliases, but MacWord 5.x did. If a style is a built-in style, the built-in stylename is always stored first.
All names (and aliases) must be unique within a stylesheet (e.g. styles "a,b" and "b,c" should not exist in the same stylesheet, as "b" matches multiple stylenames.)
A stylename (including all its aliases and comma separators) can be up to 253 characters long. So the xstz format of that name can be up to 255 characters. Stylenames are case sensitive.
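The alias rules can be illustrated with two small helpers: one that tests whether a stylename contains a given alias, and one that tests whether two stylenames share any alias and therefore cannot coexist in one stylesheet. This is a simplified sketch: it works on plain char strings rather than the stored xstz, handles a single alias per lookup (not a combination of aliases), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if the comma-separated stylename contains the n-byte alias. */
static bool has_alias_n(const char *stylename, const char *alias, size_t n)
{
    const char *p = stylename;
    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (len == n && memcmp(p, alias, n) == 0)
            return true;          /* exact, case-sensitive match */
        if (comma == NULL)
            return false;
        p = comma + 1;
    }
}

/* True if alias is one of the names of the style. */
static bool style_has_alias(const char *stylename, const char *alias)
{
    assert(stylename != NULL && alias != NULL);
    return has_alias_n(stylename, alias, strlen(alias));
}

/* True if stylenames a and b share an alias. */
static bool names_conflict(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    const char *p = a;
    for (;;) {
        const char *comma = strchr(p, ',');
        size_t len = comma ? (size_t)(comma - p) : strlen(p);
        if (has_alias_n(b, p, len))
            return true;
        if (comma == NULL)
            return false;
        p = comma + 1;
    }
}
```

With these helpers, the example from the text, "a,b" against "b,c", is reported as a conflict because both contain "b".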
The built-in stylenames (corresponding to each sti above) are defined for each language version of Word. For the USA, the names are:
// These are the names of the built-in styles as we want to present them
// to the user.
Normal
Heading 1
Heading 2
Heading 3
Heading 4
Heading 5
Heading 6
Heading 7
Heading 8
Heading 9
Index 1
Index 2
Index 3
Index 4
Index 5
Index 6
Index 7
Index 8
Index 9
TOC 1
TOC 2
TOC 3
TOC 4
TOC 5
TOC 6
TOC 7
TOC 8
TOC 9
Normal Indent
Footnote Text
Annotation Text
Header
Footer
Index Heading
Caption
Table of Figures
Envelope Address
Envelope Return
Footnote Reference
Annotation Reference
Line Number
Page Number
Endnote Reference
Endnote Text
Table of Authorities
Macro Text
TOA Heading
List
List 2
List 3
List 4
List 5
List Bullet
List Bullet 2
List Bullet 3
List Bullet 4
List Bullet 5
List Number
List Number 2
List Number 3
List Number 4
List Number 5
Title
Closing
Signature
Default Paragraph Font
Body Text
Body Text Indent
List Continue
List Continue 2
List Continue 3
List Continue 4
List Continue 5
Message Header
Subtitle
Salutation
Date
Body Text First Indent
Body Text First Indent 2
Note Heading
Body Text 2
Body Text 3
Body Text Indent 2
Body Text Indent 3
Block Text
Hyperlink
Followed Hyperlink
Strong
Emphasis
Document Map
Plain Text
std.cupx: This is the number of UPXs in the std.grupx array. See below.
std.grupx: This is an array of variable-length UPXs, with std.cupx UPXs in the array. This array begins after the variable-length xstzName field, at the next even-byte offset within the STD. A UPX (Universal Property eXception) describes the difference in formatting of this style as compared to its based-on style. The UPX structure looks like this:
typedef union _UPX
{
struct
{
uchar grpprl[cbMaxGrpprlStyleChpx];
} chpx;
struct
{
ushort istd;
uchar grpprl[cbMaxGrpprlStylePapx];
} papx;
uchar rgb[1];
} UPX;
Each UPX stored in a file is not a complete UPX; rather, it is a UPX with all trailing zero bytes lopped off, preceded by a ushort length field. So it is stored like this:
Field | Size | Comment |
cbUPX | 2 bytes | size of the following UPX structure |
UPX | (cbUPX) | Nonzero prefix of a UPX structure |
Each UPX begins on an even-byte offset within the STD, even if the length of the previous UPX (cbUPX) was odd.
The meaning of each UPX depends on the style type (std.sgc). For a paragraph style, std.cupx is 2. The first UPX is a paragraph UPX (UPX.papx) and the second UPX is a character UPX (UPX.chpx). For a character style, std.cupx is 1, and that UPX is a character UPX (UPX.chpx). Note that new UPXs may be added in the future, so std.cupx might be larger than expected. Any UPXs past those expected should be discarded.
The grpprl within each UPX contains the differences of this property type for this style from the UPE of that property type for the based on style. For example, if two paragraph styles, A and B, were identical except that B was bold where A was not, and B was based on A, B would have two UPXs, where the paragraph UPX would have an empty grpprl, and the character UPX would have a bold sprm in the grpprl. Thus B looks just like A (since B is based on A), with the exception that B is bold.
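Locating the UPXs inside a stored STD follows directly from the length prefix and the even-offset rule. The sketch below assumes little-endian ushorts and that the caller already knows the offset at which grupx begins (after xstzName); the names UpxSpan and upx_locate are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Position and length of one stored UPX within the STD. */
typedef struct {
    size_t   off;   /* offset of the UPX data, just after its cbUPX */
    uint16_t cb;    /* cbUPX */
} UpxSpan;

/*
 * Locate up to maxSpans UPXs in an STD of cbStd bytes, starting at
 * offGrupx. UPXs past maxSpans are skipped, as the text recommends for
 * unexpected ones. Returns the number of spans filled in, or -1 if the
 * STD is truncated.
 */
static int upx_locate(const uint8_t *std, size_t cbStd, size_t offGrupx,
                      unsigned cupx, UpxSpan *spans, size_t maxSpans)
{
    assert(std != NULL && spans != NULL);

    size_t pos = offGrupx;
    int n = 0;
    for (unsigned i = 0; i < cupx; i++) {
        pos = (pos + 1) & ~(size_t)1;          /* next even-byte offset */
        if (pos > cbStd || cbStd - pos < 2)
            return -1;
        uint16_t cbUpx = (uint16_t)(std[pos] | (std[pos + 1] << 8));
        pos += 2;
        if (cbStd - pos < cbUpx)
            return -1;
        if ((size_t)n < maxSpans) {
            spans[n].off = pos;
            spans[n].cb  = cbUpx;
            n++;
        }
        pos += cbUpx;
    }
    return n;
}
```

For a paragraph style the first span is the UPX.papx (a ushort istd followed by the grpprl) and the second is the UPX.chpx; for a character style the only span is the UPX.chpx.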
std.grupe: This is an array (group) of variable-length UPEs. These are not stored in the file! Rather, they are constructed using the std.istdBase and std.grupx fields. A UPE (Universal Property Expansion) describes the "end-result" of the property formatting, i.e. what the style looks like. The UPE structure is the non-zero prefix of a UPD structure. The UPD structure looks like this:
typedef union _UPD
{
PAP pap;
CHP chp;
struct
{
ushort istd;
uchar cbGrpprl;
uchar grpprl[cbMaxGrpprlStyleChpx];
} chpx;
} UPD;
The std.grupe and std.grupx arrays are similar: there is one UPE for each UPX, and internally they are stored similarly (a length ushort followed by a non-zero prefix), though remember that the UPEs are not stored in the file. The meaning of each UPE depends on the style type (std.sgc). For a paragraph style, the first UPE is a PAP (UPE.pap). The second UPE is a CHP (UPE.chp). For a character style, the first UPE is a CHPX (UPE.chpx).
The UPEs for a style are constructed by taking the UPEs from the based-on style, and applying the UPXs to them. Obviously, if the UPEs for the based-on style haven't yet been constructed, that style's UPE needs to be constructed first. Eventually by following the based-on chain, a style will be based on the null style (istdNil). The UPEs for the null style are predefined:
- The UPE.pap for the null style is all zeros, except fWidowControl which is 1, dyaLine which is 240, and fMultLinespace which is 1.
- The UPE.chp for the null style is all zeros, except istd which is 10 (istdNormalChar), hps which is 20, lid which is 0x0400, and ftc which is set from the STSHI's rgftcStandardChpStsh.
- The UPE.chpx for the null style has an istd of zero, a cbGrpprl of zero (and an empty grpprl).
So, for a paragraph style, the first UPE is a UPE.pap. It can be constructed by starting with the first UPE from the based-on style (std.istdBase), and then applying the first UPX (UPX.papx) in std.grupx to that UPE. To apply a UPX.papx to a UPE.pap, set UPE.pap.istd equal to UPX.papx.istd, and then apply the UPX.papx.grpprl to UPE.pap. Similarly, the second UPE is a UPE.chp. It can be constructed by starting with the second UPE from the based-on style, and then applying the second UPX (UPX.chpx) in std.grupx to that UPE. To apply a UPX.chpx to a UPE.chp, apply the UPX.chpx.grpprl to UPE.chp. Note that a UPE.chp for a paragraph style should always have UPE.chp.istd == istdNormalChar.
For a character style, the first (and only) UPE (a UPE.chpx) can be constructed by starting with the first UPE from the based-on style (std.istdBase), and then applying the first UPX (UPX.chpx) in std.grupx to that UPE. To apply a UPX.chpx to a UPE.chpx, take the grpprl in UPE.chpx.grpprl (which has a length of UPE.chpx.cbGrpprl) and merge the grpprl in UPX.chpx.grpprl into it. Merging grpprls is a tricky business, but for character styles it is easy because no prls in character style grpprls should interact with each other. Each prl from the source (the UPX.chpx.grpprl) should be inserted into the destination (the UPE.chpx.grpprl) so that the sprm of each prl is in increasing order, and any prls that have the same sprm are replaced by the prl in the source. UPE.chpx.cbGrpprl is then set to the length of resulting grpprl, and UPE.chpx.istd is set to the style's istd.
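The merge rule for character-style grpprls can be sketched as an ordered insert with replacement. The example is deliberately simplified: a real grpprl is a packed byte sequence whose prls have operands of varying size, whereas here each prl is a hypothetical fixed-size Prl record with a 16-bit operand, and the sprm values used with it are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified prl: a sprm code with a fixed-size operand (an assumption). */
typedef struct {
    uint16_t sprm;
    uint16_t operand;
} Prl;

/*
 * Merge the nsrc prls of src (the UPX.chpx.grpprl) into dst (the
 * UPE.chpx.grpprl), which holds *n prls in increasing sprm order and
 * has room for cap. A source prl replaces a destination prl with the
 * same sprm; otherwise it is inserted in order. Returns 0 on success,
 * or -1 if dst is full.
 */
static int grpprl_merge(Prl *dst, size_t *n, size_t cap,
                        const Prl *src, size_t nsrc)
{
    assert(dst != NULL && n != NULL && (src != NULL || nsrc == 0));

    for (size_t s = 0; s < nsrc; s++) {
        size_t i = 0;
        while (i < *n && dst[i].sprm < src[s].sprm)
            i++;
        if (i < *n && dst[i].sprm == src[s].sprm) {
            dst[i] = src[s];                   /* same sprm: replace */
            continue;
        }
        if (*n == cap)
            return -1;
        memmove(&dst[i + 1], &dst[i], (*n - i) * sizeof *dst);
        dst[i] = src[s];                       /* new sprm: insert in order */
        (*n)++;
    }
    return 0;
}
```

After the merge, the caller would set UPE.chpx.cbGrpprl from the resulting length and UPE.chpx.istd to the style's istd, as described above.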