IFilter Chunks

[This is preliminary documentation and subject to change.]

A filter object can be asked to produce the "chunks" of Unicode text that the object contains. The text within one chunk is intended to be a linear, sequential flow of text that shares the same attribute and locale. Two pieces of text that do not have this relationship to each other belong in different chunks. Separate text boxes in a graphics file, labels and titles on charts, and possibly even text in separate cells of a spreadsheet are all examples of text in separate chunks.
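The partitioning rule above can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the IFilter interface itself: the Run and Chunk records, the "flow" label, and the split_into_chunks function are hypothetical names introduced for illustration. The model assumes that adjacent runs are merged into one chunk only when they belong to the same linear flow and carry the same attribute and locale.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Run:
    flow: str       # hypothetical label for one linear flow (for example, one text box)
    attribute: str  # hypothetical attribute name
    locale: str     # locale tag such as "en-US"
    text: str


@dataclass(frozen=True)
class Chunk:
    attribute: str
    locale: str
    text: str


def split_into_chunks(runs: List[Run]) -> List[Chunk]:
    """Merge adjacent runs that share flow, attribute, and locale; otherwise start a new chunk."""
    chunks: List[Chunk] = []
    previous_key = None
    for run in runs:
        key = (run.flow, run.attribute, run.locale)
        if chunks and key == previous_key:
            last = chunks[-1]
            chunks[-1] = Chunk(last.attribute, last.locale, last.text + run.text)
        else:
            chunks.append(Chunk(run.attribute, run.locale, run.text))
        previous_key = key
    return chunks


# Hypothetical document: body text in two locales, plus two chart labels.
runs = [
    Run("body", "contents", "en-US", "Sales rose "),
    Run("body", "contents", "en-US", "in the third quarter."),
    Run("body", "contents", "fr-FR", "Les ventes ont augmenté."),
    Run("chart-title", "contents", "en-US", "Quarterly sales"),
    Run("axis-label", "contents", "en-US", "Revenue"),
]
chunks = split_into_chunks(runs)
```

In this model the two English body runs end up in one chunk, the locale change starts a second chunk, and the chart title and axis label each get a chunk of their own even though their attribute and locale match the body text.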

Each chunk is given a chunk identifier that uniquely identifies the chunk. These identifiers are guaranteed to remain constant until the IFilter interface is released. Repeated instantiations of the IFilter interface with the same initial parameters produce the same set of chunks. Instantiations with different initial parameters may produce a different set of chunks. Changing the set of attributes (see following section) may re-partition the chunks of an object. Chunk identifier 0 is invalid.
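The identifier guarantees above can be modeled in a few lines of Python. This is a sketch only: ToyFilter, DOCUMENT, and INVALID_CHUNK_ID are hypothetical names, and numbering chunks sequentially from 1 is an assumption made for the example. The text requires only that identifiers be unique, stable for the lifetime of the interface, and nonzero.

```python
from typing import Dict, FrozenSet, List, Tuple

INVALID_CHUNK_ID = 0  # identifier 0 is invalid, so the model never assigns it

# Hypothetical document: (attribute, text) pairs in source order.
DOCUMENT: List[Tuple[str, str]] = [
    ("title", "Annual report"),
    ("author", "A. Writer"),
    ("contents", "Sales rose in the third quarter."),
]


class ToyFilter:
    """Stand-in for one filter instantiation; not the real interface."""

    def __init__(self, attributes: FrozenSet[str]) -> None:
        # The requested attribute set plays the role of the initial parameters.
        self._attributes = attributes

    def chunks(self) -> Dict[int, str]:
        """Return {chunk_id: text}, numbering the selected chunks from 1 (an assumption)."""
        selected = [text for attr, text in DOCUMENT if attr in self._attributes]
        return {chunk_id: text for chunk_id, text in enumerate(selected, start=1)}


# Same initial parameters: the two instances yield the same chunks and identifiers.
first = ToyFilter(frozenset({"title", "contents"})).chunks()
second = ToyFilter(frozenset({"title", "contents"})).chunks()

# Different initial parameters: the set of chunks may differ.
other = ToyFilter(frozenset({"author"})).chunks()
```

Because the model derives identifiers only from the document and the initial parameters, a caller could hold an identifier from one instance and expect it to name the same chunk in a second instance created the same way.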

Chunks may overlap, but a specific attribute should be applied to a given character at most once.
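One way to read the overlap rule is as a check over character positions. The Python sketch below is illustrative only: it assumes each chunk can be described as a half-open range of character positions in the source text plus an attribute name, and the Span record and attribute_applied_once function are hypothetical names.

```python
from collections import Counter
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int      # index of the first character covered (inclusive)
    end: int        # index one past the last character covered
    attribute: str  # hypothetical attribute name


def attribute_applied_once(spans: List[Span]) -> bool:
    """Return True if no character receives the same attribute from more than one chunk."""
    seen: Counter = Counter()
    for span in spans:
        for position in range(span.start, span.end):
            seen[(position, span.attribute)] += 1
    return all(count <= 1 for count in seen.values())


# Overlapping chunks with different attributes satisfy the rule.
allowed = [Span(0, 20, "contents"), Span(5, 12, "heading")]

# Overlapping chunks with the same attribute apply it twice to characters 15 through 19.
not_allowed = [Span(0, 20, "contents"), Span(15, 25, "contents")]
```

Here attribute_applied_once(allowed) is True and attribute_applied_once(not_allowed) is False.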