Analyze the Source File

Once you know where you want to end up, look at the source file to find out what format you are starting with. The worst, and fortunately least likely, case is a file created by some obscure WYSIWYG word processor that stores text and formatting separately. More likely, you will have a file created by a typesetting system or a desktop publishing program that produces plain ASCII files with embedded tags.

Ideally, you will have complete documentation on the system used, with a list of all tags and their precise meaning. If you don't have this information, extract a list of the tags from the source file, and then figure out the meaning of each by comparing its use in the source file to its effect in the printed document.
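Extracting the tag list can be automated. The following is a minimal sketch in Python, assuming the source file marks its tags with angle brackets such as <B> and </B>. That syntax is only an assumption for illustration; a system that uses backslash or dot commands needs a different pattern. The names TAG_PATTERN and list_tags are hypothetical.

```python
import re
from collections import Counter

# Assumption: tags look like <NAME ...> or </NAME>. Adjust this pattern to
# match the tag syntax of the system that produced your source file.
TAG_PATTERN = re.compile(r"</?([A-Za-z][A-Za-z0-9]*)[^>]*>")


def list_tags(text):
    """Return a Counter mapping each tag name to the number of times it occurs."""
    return Counter(match.group(1) for match in TAG_PATTERN.finditer(text))


# Made-up sample text standing in for the contents of a source file.
sample = "<H1>Intro</H1>\n<P>Some <B>bold</B> and <I>italic</I> text.</P>"
tag_counts = list_tags(sample)

# Print an alphabetical tag list with usage counts.
for name, count in sorted(tag_counts.items()):
    print(name, count)
```

The counts are a rough guide to where to spend your effort: a tag that occurs hundreds of times deserves more attention than one that occurs twice.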

Once you are familiar with all the tags used in your source file, mark on the tag list which ones to convert and which to delete. Your analysis should let you convert character attributes and hierarchical elements directly. Cross references, footnotes, and other kinds of links will be more difficult.
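One way to record the annotated tag list is as a lookup table that a small script applies to the file. The sketch below is in Python and rests on several assumptions: the source tags use angle brackets, the target tag names (strong, em, h1) are placeholders for whatever your target format requires, and KERN stands in for a pure typesetting instruction with no equivalent in the target. None of these names come from a particular system.

```python
import re

# Hypothetical annotated tag list: source tag name -> target tag name,
# or None for a tag that should simply be deleted.
TAG_ACTIONS = {
    "B": "strong",
    "I": "em",
    "H1": "h1",
    "KERN": None,  # typesetting-only instruction, so drop it
}

# Assumption: source tags look like <NAME ...> or </NAME>.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")


def convert_tags(text):
    """Convert or delete every tag listed in TAG_ACTIONS; leave others alone."""
    def replace(match):
        slash, name = match.group(1), match.group(2)
        if name not in TAG_ACTIONS:
            return match.group(0)  # unknown tag: keep it for manual review
        target = TAG_ACTIONS[name]
        return "" if target is None else f"<{slash}{target}>"

    return TAG.sub(replace, text)


converted = convert_tags("<H1>Intro</H1> <B>bold</B><KERN 2> <X>odd</X>")
print(converted)
```

Leaving unlisted tags untouched, rather than deleting them, makes it easy to search the output afterward for anything the tag list missed.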

Note:

The most difficult type of automatic tagging is that of cross references and other links, because these often aren't tagged in the source file. The more obvious ones, such as “see Chapter 6,” or “see Section 3.7.2,” or “as shown in Figure 5-2,” can probably be tagged automatically, because they refer to an element that should already be tagged. References to something “earlier in this chapter” or “shown previously” are more difficult, because the text does not name the element they point to.
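The explicit references can be found with a pattern match. The following Python sketch wraps each one in a link tag. The xref tag, its target attribute, and the identifier scheme (for example, section-3-7-2) are assumptions made for illustration; use whatever link markup and identifiers your target format defines.

```python
import re

# Matches explicit references such as "Chapter 6", "Section 3.7.2",
# or "Figure 5-2": a keyword followed by a dotted or hyphenated number.
XREF = re.compile(r"\b(Chapter|Section|Figure)\s+(\d+(?:[.-]\d+)*)")


def tag_xrefs(text):
    """Wrap each explicit reference in a hypothetical <xref> link tag."""
    def wrap(match):
        kind, number = match.group(1), match.group(2)
        # Hypothetical identifier scheme: "Section 3.7.2" -> "section-3-7-2".
        target = f"{kind.lower()}-{number.replace('.', '-')}"
        return f'<xref target="{target}">{match.group(0)}</xref>'

    return XREF.sub(wrap, text)


tagged = tag_xrefs("For details, see Section 3.7.2, as shown in Figure 5-2.")
print(tagged)
```

A vague reference such as “earlier in this chapter” contains nothing for the pattern to match, so it passes through unchanged and has to be linked by hand.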