The IFILTER_INIT enumeration values control text canonicalization, attribute output, embedding scope, and IFilter access patterns. These flags are passed to IFilter::Init.
typedef enum tagIFILTER_INIT
{
    IFILTER_INIT_CANON_PARAGRAPHS       = 1,
    IFILTER_INIT_HARD_LINE_BREAKS       = 2,
    IFILTER_INIT_CANON_HYPHENS          = 4,
    IFILTER_INIT_CANON_SPACES           = 8,
    IFILTER_INIT_APPLY_INDEX_ATTRIBUTES = 16,
    IFILTER_INIT_APPLY_OTHER_ATTRIBUTES = 32,
    IFILTER_INIT_INDEXING_ONLY          = 64,
    IFILTER_INIT_SEARCH_LINKS           = 128
} IFILTER_INIT;
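Because every member is a distinct power of two, a caller combines flags with bitwise OR and a filter tests each one with bitwise AND. The sketch below assumes the combined flags arrive as an unsigned long named grfFlags, mirroring the flags parameter of IFilter::Init; the helpers has_flag and example_indexing_flags, and the particular combination shown, are hypothetical illustrations rather than a recommended setting.

```c
#include <stdbool.h>

/* Local copy of the enumeration above so this sketch compiles on its own. */
typedef enum tagIFILTER_INIT
{
    IFILTER_INIT_CANON_PARAGRAPHS       = 1,
    IFILTER_INIT_HARD_LINE_BREAKS       = 2,
    IFILTER_INIT_CANON_HYPHENS          = 4,
    IFILTER_INIT_CANON_SPACES           = 8,
    IFILTER_INIT_APPLY_INDEX_ATTRIBUTES = 16,
    IFILTER_INIT_APPLY_OTHER_ATTRIBUTES = 32,
    IFILTER_INIT_INDEXING_ONLY          = 64,
    IFILTER_INIT_SEARCH_LINKS           = 128
} IFILTER_INIT;

/* Hypothetical helper: tests a single flag bit in the combined value. */
static bool has_flag(unsigned long grfFlags, IFILTER_INIT flag)
{
    return (grfFlags & (unsigned long)flag) != 0;
}

/* Hypothetical flag set for a client that only builds a full-text index:
   all four canonicalization flags, index attributes, and indexing-only. */
static unsigned long example_indexing_flags(void)
{
    return IFILTER_INIT_CANON_PARAGRAPHS | IFILTER_INIT_HARD_LINE_BREAKS
         | IFILTER_INIT_CANON_HYPHENS | IFILTER_INIT_CANON_SPACES
         | IFILTER_INIT_APPLY_INDEX_ATTRIBUTES | IFILTER_INIT_INDEXING_ONLY;
}
```

With this combination the value is 95 (1 + 2 + 4 + 8 + 16 + 64), and neither IFILTER_INIT_APPLY_OTHER_ATTRIBUTES nor IFILTER_INIT_SEARCH_LINKS is set.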
Generally, the text that GetText returns should match the document's actual text exactly, but some canonicalization of common features improves interoperability between filters and clients. These features include paragraph breaks, line breaks, hyphens, and spaces. IFilter servers can also embed two special characters in text, which clients ignore almost entirely: Unicode character 0x0000 is ignored completely, and 0x0001 is treated as a word break.
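To make the difference between 0x0000 and 0x0001 concrete, here is a minimal client-side word counter over UTF-16 text. It assumes the text is an array of 16-bit code units (uint16_t standing in for WCHAR) and, for simplicity, treats only 0x0001 and ASCII whitespace as word breaks; count_words is a hypothetical name, and a real client would use a proper word breaker.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical client-side word counter for text returned by GetText.
   0x0000 is skipped without ending the current word; 0x0001 ends it. */
static size_t count_words(const uint16_t *text, size_t len)
{
    size_t words = 0;
    bool in_word = false;

    for (size_t i = 0; i < len; i++) {
        uint16_t ch = text[i];
        if (ch == 0x0000)
            continue;                        /* ignored completely */
        bool is_break = ch == 0x0001         /* explicit word break */
                     || ch == 0x0020 || ch == 0x0009
                     || ch == 0x000A || ch == 0x000D;
        if (is_break) {
            in_word = false;
        } else if (!in_word) {
            in_word = true;                  /* first character of a new word */
            words++;
        }
    }
    return words;
}
```

So "ab", 0x0000, "c" counts as one word ("abc"), while "ab", 0x0001, "c" counts as two.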
Four flags control canonicalization: IFILTER_INIT_CANON_PARAGRAPHS, IFILTER_INIT_HARD_LINE_BREAKS, IFILTER_INIT_CANON_HYPHENS, and IFILTER_INIT_CANON_SPACES.
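A filter might apply the hyphen and space flags as a per-character pass before returning text from GetText. This is a sketch under stated assumptions: the specific mappings (U+00A0 NO-BREAK SPACE to U+0020, U+2011 NON-BREAKING HYPHEN to U+002D, U+00AD SOFT HYPHEN to 0x0000 so that clients ignore it) are our illustrative choice, the paragraph and line-break flags are left out, and canonicalize_char and canonicalize_text are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Only the two flag values this sketch needs. */
enum {
    IFILTER_INIT_CANON_HYPHENS = 4,
    IFILTER_INIT_CANON_SPACES  = 8
};

/* Hypothetical per-character canonicalization; the mappings are assumptions. */
static uint16_t canonicalize_char(uint16_t ch, unsigned long grfFlags)
{
    if (grfFlags & IFILTER_INIT_CANON_SPACES) {
        if (ch == 0x00A0)        /* NO-BREAK SPACE */
            return 0x0020;       /* -> SPACE */
    }
    if (grfFlags & IFILTER_INIT_CANON_HYPHENS) {
        if (ch == 0x2011)        /* NON-BREAKING HYPHEN */
            return 0x002D;       /* -> HYPHEN-MINUS */
        if (ch == 0x00AD)        /* SOFT (optional) HYPHEN */
            return 0x0000;       /* -> null, which clients ignore */
    }
    return ch;                   /* everything else passes through unchanged */
}

/* Applies canonicalize_char to a whole buffer in place. */
static void canonicalize_text(uint16_t *text, size_t len, unsigned long grfFlags)
{
    for (size_t i = 0; i < len; i++)
        text[i] = canonicalize_char(text[i], grfFlags);
}
```

Mapping the soft hyphen to 0x0000 rather than deleting it keeps the buffer length unchanged, which relies on clients ignoring that character.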
Different clients of IFilter need different views of an object. Two flags, IFILTER_INIT_APPLY_INDEX_ATTRIBUTES and IFILTER_INIT_APPLY_OTHER_ATTRIBUTES, control which set of attributes the filter applies to chunks. In addition, a caller can request specific attributes by passing them to IFilter::Init in the aAttributes array, whose length is given by cAttributes.
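The decision a filter makes for each property can be sketched as follows. The sketch simplifies heavily: it assumes each property is identified by an integer instead of a FULLPROPSPEC, that the filter already knows whether a property is an index attribute, and that an explicitly requested attribute is emitted regardless of the two flags. prop_info and should_emit are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Only the two flag values this sketch needs. */
enum {
    IFILTER_INIT_APPLY_INDEX_ATTRIBUTES = 16,
    IFILTER_INIT_APPLY_OTHER_ATTRIBUTES = 32
};

/* Hypothetical stand-in for a property the filter can emit. */
typedef struct {
    int id;                    /* simplified property identifier */
    bool is_index_attribute;   /* assumed classification */
} prop_info;

/* Returns true if chunks for this property should be emitted. */
static bool should_emit(const prop_info *prop, unsigned long grfFlags,
                        const int *aAttributes, size_t cAttributes)
{
    if (prop->is_index_attribute && (grfFlags & IFILTER_INIT_APPLY_INDEX_ATTRIBUTES))
        return true;
    if (!prop->is_index_attribute && (grfFlags & IFILTER_INIT_APPLY_OTHER_ATTRIBUTES))
        return true;
    /* In this sketch, explicitly requested attributes are always emitted. */
    for (size_t i = 0; i < cAttributes; i++)
        if (aAttributes[i] == prop->id)
            return true;
    return false;
}
```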
IFilter implementations need to store some chunk information to support operations other than content indexing. Setting IFILTER_INIT_INDEXING_ONLY indicates that the client only indexes content, so the filter can optimize for indexing.
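One way a filter might honor IFILTER_INIT_INDEXING_ONLY is to skip its non-indexing bookkeeping entirely. This minimal sketch assumes that bookkeeping is a fixed-size table of source offsets per chunk (for example, to map chunks back to regions of the object); filter_state, MAX_CHUNKS, and record_chunk are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Only the flag value this sketch needs. */
enum { IFILTER_INIT_INDEXING_ONLY = 64 };

#define MAX_CHUNKS 64

/* Hypothetical per-filter state; chunk_offsets is the assumed bookkeeping. */
typedef struct {
    unsigned long flags;              /* flags saved from Init */
    size_t chunk_offsets[MAX_CHUNKS]; /* source offset of each chunk */
    size_t stored;                    /* number of entries in use */
} filter_state;

/* Called once per chunk; returns true if the chunk's offset was recorded. */
static bool record_chunk(filter_state *f, size_t source_offset)
{
    if (f->flags & IFILTER_INIT_INDEXING_ONLY)
        return false;                 /* indexing only: nothing to keep */
    if (f->stored == MAX_CHUNKS)
        return false;                 /* table full */
    f->chunk_offsets[f->stored++] = source_offset;
    return true;
}
```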
For viewing purposes, a client may want to search across links as well as within the document and any objects it embeds. Setting IFILTER_INIT_SEARCH_LINKS directs the filter to search all links recursively.
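A recursive link search has to cope with documents that link to each other. The sketch below models documents as a small hypothetical graph (doc_node, visit, and count_filtered_docs are illustrative names, not IFilter APIs) and uses a visited array so that each document in a cycle is filtered only once; links are followed only when IFILTER_INIT_SEARCH_LINKS is set.

```c
#include <stdbool.h>
#include <stddef.h>

/* Only the flag value this sketch needs. */
enum { IFILTER_INIT_SEARCH_LINKS = 128 };

#define MAX_DOCS  16
#define MAX_LINKS 4

/* Hypothetical model: documents are numbered 0..n-1 and each one lists
   the documents it links to. */
typedef struct {
    size_t link_count;
    size_t links[MAX_LINKS];
} doc_node;

/* Depth-first walk; returns how many distinct documents get filtered. */
static size_t visit(const doc_node *docs, size_t n, size_t doc,
                    bool *visited, unsigned long grfFlags)
{
    if (doc >= n || visited[doc])
        return 0;                           /* invalid link or already filtered */
    visited[doc] = true;
    size_t count = 1;                       /* this document's own text */
    if (grfFlags & IFILTER_INIT_SEARCH_LINKS)
        for (size_t i = 0; i < docs[doc].link_count && i < MAX_LINKS; i++)
            count += visit(docs, n, docs[doc].links[i], visited, grfFlags);
    return count;
}

static size_t count_filtered_docs(const doc_node *docs, size_t n,
                                  size_t root, unsigned long grfFlags)
{
    bool visited[MAX_DOCS] = { false };
    if (n > MAX_DOCS)
        return 0;                           /* beyond this sketch's fixed capacity */
    return visit(docs, n, root, visited, grfFlags);
}
```

For three documents linked in a cycle (0 to 1, 1 to 2, 2 to 0), the walk filters all three when the flag is set and only the root document when it is not.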
See also: IFilter::BindRegion, IFilter::GetChunk, IFilter::Init, IFilter::GetText, IFILTER_INIT