How to Write an RTF Reader

There are three basic things that an RTF reader must do:

  1. Separate text from RTF controls.

  2. Parse an RTF control.

  3. Dispatch an RTF control.

Separating text from RTF controls is relatively simple, because all RTF controls begin with a backslash. Therefore, any incoming character that is not a backslash (or one of the group delimiters { and }) is text and is handled as text. (Of course, what one does with that text may be relatively complicated.)

Parsing an RTF control is also relatively simple. An RTF control is either (a) a control word: a sequence of alphabetic characters followed by an optional numeric parameter, which may be negative, or (b) a control symbol: a single non-alphabetic character.
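
The first two steps can be sketched in a few lines of Python. This is a minimal illustration, not a complete reader: the function name tokenize and the (kind, value, parameter) tuple layout are assumptions made for this example. It treats braces as group delimiters, skips bare carriage returns and line feeds, and consumes the single space that may follow a control word. It does not handle the two hexadecimal digits that follow the \' control symbol, binary data, or Unicode escapes.

```python
import re

# Control word: backslash, letters, an optional signed integer parameter,
# and an optional single space that only delimits the word (it is consumed).
CONTROL_WORD = re.compile(r"\\([A-Za-z]+)(-?\d+)? ?")


def tokenize(rtf):
    """Yield (kind, value, param) tuples; kind is 'text', 'control' or 'group'."""
    i, n = 0, len(rtf)
    text = []  # characters of the text run currently being collected

    def flush():
        # Emit the pending text run, if any, as a single token.
        if text:
            run = "".join(text)
            text.clear()
            yield ("text", run, None)

    while i < n:
        ch = rtf[i]
        if ch == "\\":
            yield from flush()
            m = CONTROL_WORD.match(rtf, i)
            if m:
                # Case (a): letters plus optional numeric parameter.
                param = int(m.group(2)) if m.group(2) is not None else None
                yield ("control", m.group(1), param)
                i = m.end()
            elif i + 1 < n:
                # Case (b): a single non-alphabetic character.
                yield ("control", rtf[i + 1], None)
                i += 2
            else:
                i += 1  # lone backslash at end of input: ignored in this sketch
        elif ch in "{}":
            yield from flush()
            yield ("group", ch, None)
            i += 1
        elif ch in "\r\n":
            i += 1  # bare CR/LF are not document text
        else:
            text.append(ch)
            i += 1
    yield from flush()
```

For example, list(tokenize(r"{\b1 Hi\par}")) yields a group open, the control b with parameter 1, the text Hi, the control par with no parameter, and a group close. Dispatching each control is left to the caller, which is where most of the real work lies.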

Dispatching an RTF control, on the other hand, is relatively complicated. A recursive-descent parser tends to be overly strict because RTF is intentionally vague about the order of various properties relative to one another. However, whatever method you use to dispatch an RTF control, your reader should do the following: