STAT_CHUNK

The STAT_CHUNK structure describes the characteristics of a chunk returned by IFilter::GetChunk.

typedef struct tagSTAT_CHUNK
{
    ULONG              idChunk;
    CHUNK_BREAKTYPE    breakType;
    CHUNKSTATE         flags;
    LCID               locale;
    FULLPROPSPEC       attribute;
    ULONG              idChunkSource;
    ULONG              cwcStartSource;
    ULONG              cwcLenSource;
} STAT_CHUNK;
 

Members

idChunk
The chunk identifier. Chunk identifiers must be unique for the current instantiation of IFilter and must be in increasing order. The order in which chunks are numbered should correspond to the order in which they appear in the source document. Some search engines may take advantage of the inter-attribute proximity exposed between chunks of various attributes. If so, the order in which chunks with different attributes are emitted will be important to the search engine.
breakType
The type of break that separates the previous chunk from the current chunk. Values are from the CHUNK_BREAKTYPE enumeration.
flags
Flags that indicate whether this chunk contains text or a non-text attribute value. Values are taken from the CHUNKSTATE enumeration. If the CHUNK_TEXT flag is set, IFilter::GetText should be used to retrieve the contents of the chunk and parse it as a series of words. If the CHUNK_VALUE flag is set, IFilter::GetValue should be used to retrieve the value and treat it as a single property value. If the filter wishes the same text to be treated as both text and value, the text should be emitted twice in two different chunks, each with one flag set.
locale
The language and sub-language associated with a chunk of text. Document indexers use the chunk locale to perform proper word breaking of text. If the chunk is neither text nor a non-text attribute value of type VT_LPWSTR, VT_LPSTR, or VT_BSTR, this field is ignored.
attribute
The attribute to be applied to the chunk. If a filter wishes the same text to have more than one attribute, it needs to emit the text once for each attribute in separate chunks.

The following passage, which might come from a book, is an example of this:

The small detective exclaimed, "C'est finis!"

        Confessions

The room was silent for several minutes. After thinking very hard about it, the young woman asked, "But how did you know?"

This section might be broken into chunks as follows:
ID  Text                                      breakType       flags       locale          attribute
1   The small dete                            N/A             CHUNK_TEXT  ENGLISH_UK      CONTENT
2   ctive exclaimed,                          CHUNK_NO_BREAK  N/A         N/A             N/A
3   "C'est finis!"                            CHUNK_EOW       CHUNK_TEXT  FRENCH_BELGIAN  CONTENT
4   Confessions                               CHUNK_EOC       CHUNK_TEXT  ENGLISH_UK      CHAPTER_NAMES
5   Confessions                               CHUNK_EOP       CHUNK_TEXT  ENGLISH_UK      CONTENT
6   The room was silent for several minutes.  CHUNK_EOP       CHUNK_TEXT  ENGLISH_UK      CONTENT
7   After thinking very hard about it, the    CHUNK_EOS       CHUNK_TEXT  ENGLISH_UK      CONTENT
    young woman asked, "But how did you
    know?"
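
The rules for idChunk, flags, and attribute can be illustrated with a short C sketch. It is not taken from filter.h: EXAMPLE_CHUNK is a hypothetical, reduced stand-in for STAT_CHUNK (the attribute is a plain string rather than a FULLPROPSPEC, and breakType and locale are omitted), READ_METHOD and the three functions are illustrative names, and the CHUNKSTATE values are mirrored locally so that the code builds without the Windows SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of the two CHUNKSTATE members used in this sketch. */
typedef enum { CHUNK_TEXT = 0x1, CHUNK_VALUE = 0x2 } CHUNKSTATE;

/* Hypothetical, reduced stand-in for STAT_CHUNK. */
typedef struct {
    unsigned long idChunk;
    CHUNKSTATE    flags;
    const char   *attribute; /* stand-in for FULLPROPSPEC */
    const char   *text;      /* carried alongside for illustration only */
} EXAMPLE_CHUNK;

/* Hypothetical result type: which IFilter method a consumer would call. */
typedef enum { READ_NONE, READ_WITH_GETTEXT, READ_WITH_GETVALUE } READ_METHOD;

/* Emits the same text once per attribute, in separate text chunks that are
   numbered upward from firstId. Returns the number of chunks written. */
size_t emit_for_each_attribute(const char *text,
                               const char *const *attributes,
                               size_t attributeCount,
                               unsigned long firstId,
                               EXAMPLE_CHUNK *out)
{
    assert(text != NULL && attributes != NULL && out != NULL);
    assert(firstId != 0); /* 0 is an invalid chunk identifier */
    for (size_t i = 0; i < attributeCount; ++i) {
        out[i].idChunk   = firstId + (unsigned long)i;
        out[i].flags     = CHUNK_TEXT;
        out[i].attribute = attributes[i];
        out[i].text      = text;
    }
    return attributeCount;
}

/* Returns 1 if the identifiers are nonzero and strictly increasing (which
   also makes them unique), 0 otherwise. */
int chunk_ids_are_valid(const EXAMPLE_CHUNK *chunks, size_t count)
{
    assert(chunks != NULL || count == 0);
    unsigned long previous = 0;
    for (size_t i = 0; i < count; ++i) {
        if (chunks[i].idChunk <= previous) {
            return 0;
        }
        previous = chunks[i].idChunk;
    }
    return 1;
}

/* Chooses the retrieval method from the flags. A chunk with neither flag,
   or with both, is reported as READ_NONE because each chunk is expected to
   have exactly one of the two flags set. */
READ_METHOD choose_read_method(const EXAMPLE_CHUNK *chunk)
{
    assert(chunk != NULL);
    int isText  = (chunk->flags & CHUNK_TEXT) != 0;
    int isValue = (chunk->flags & CHUNK_VALUE) != 0;
    if (isText == isValue) {
        return READ_NONE;
    }
    return isText ? READ_WITH_GETTEXT : READ_WITH_GETVALUE;
}
```

Calling emit_for_each_attribute with the text "Confessions", the attributes CHAPTER_NAMES and CONTENT, and a first identifier of 4 would produce records that correspond to chunks 4 and 5 in the table above.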

The following three members describe the source of a derived chunk, that is, a chunk that can be mapped back to a section of contents. For example, the heading of a chapter is both contents and a special type of contents (a heading), so the heading would be a derived chunk. If the text of the current non-contents chunk (pseudoproperty or property) is derived from some contents chunk, then these members are set as follows:

idChunkSource
The identifier of the source of a derived chunk.
cwcStartSource
The offset from which the source text for a derived chunk starts in the source chunk.
cwcLenSource
The length in characters of the source text from which the current chunk was derived. A zero value signifies that there is character-by-character correspondence between the source text and the derived text. A non-zero value means that there is no such direct correspondence.
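
As a rough illustration of how a consumer might combine these three members, the following C sketch computes the span of source text that a derived chunk maps back to. It rests on an assumption that the text above does not state outright: when cwcLenSource is zero, the span length is taken to be the length of the derived text, because the correspondence is character by character. SOURCE_INFO, SOURCE_SPAN, and source_span are hypothetical names, and ULONG is redefined locally so that the code builds without the Windows SDK.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the Windows ULONG type. */
typedef unsigned long ULONG;

/* Hypothetical subset of STAT_CHUNK: only the three source members. */
typedef struct {
    ULONG idChunkSource;
    ULONG cwcStartSource;
    ULONG cwcLenSource;
} SOURCE_INFO;

/* Hypothetical result: which characters of which chunk to highlight. */
typedef struct {
    ULONG idChunk;
    ULONG start;
    ULONG length;
} SOURCE_SPAN;

/* Maps a derived chunk back to its source span. cwcDerived is the length,
   in characters, of the derived chunk's own text. */
SOURCE_SPAN source_span(const SOURCE_INFO *info, ULONG cwcDerived)
{
    assert(info != NULL);
    SOURCE_SPAN span;
    span.idChunk = info->idChunkSource;
    span.start   = info->cwcStartSource;
    /* Assumption: zero means character-by-character correspondence, so the
       source span is as long as the derived text itself. */
    span.length  = (info->cwcLenSource == 0) ? cwcDerived
                                             : info->cwcLenSource;
    return span;
}
```

With this interpretation, a chapter-name chunk derived verbatim from an 11-character contents chunk would report a zero cwcLenSource and yield a span of 11 characters.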

Remarks

Information provided by idChunkSource, cwcStartSource, and cwcLenSource is useful for a search engine that highlights hits. If the query is made against a pseudoproperty, the search engine will highlight the original text from which the text of the pseudoproperty has been derived. For instance, for a C++ code filter, when searching for MyFunction in the pseudoproperty "function definitions," the browser will highlight the function header in the file. If the chunk is not derived, idChunkSource must be the same as idChunk. If the filter attributes specify a pseudoproperty only, then there is no content chunk from which the current pseudoproperty chunk is derived. In this case, idChunkSource must be set to 0, which is an invalid chunk identifier.
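
The three cases for idChunkSource described above can be summarized in a small C sketch. SOURCE_KIND and classify_source are hypothetical names introduced for illustration, and ULONG is redefined locally so that the code builds without the Windows SDK.

```c
#include <assert.h>

/* Stand-in for the Windows ULONG type. */
typedef unsigned long ULONG;

/* Hypothetical classification of a chunk by its idChunkSource value. */
typedef enum {
    SOURCE_NONE,  /* pseudoproperty with no content chunk behind it */
    SOURCE_SELF,  /* chunk is not derived */
    SOURCE_OTHER  /* chunk is derived from another chunk */
} SOURCE_KIND;

SOURCE_KIND classify_source(ULONG idChunk, ULONG idChunkSource)
{
    assert(idChunk != 0); /* 0 is an invalid chunk identifier */
    if (idChunkSource == 0) {
        return SOURCE_NONE;
    }
    if (idChunkSource == idChunk) {
        return SOURCE_SELF;
    }
    return SOURCE_OTHER;
}
```

A highlighting search engine would need to look up source text only in the SOURCE_OTHER case.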

QuickInfo

  Windows NT: Use version 5.0 or later.
  Windows: Unsupported.
  Windows CE: Unsupported.
  Header: Declared in filter.h.

See Also

IFilter::GetChunk, IFilter::GetText, IFilter::GetValue, CHUNK_BREAKTYPE