Converting Text

Once you have the text files in electronic form, make sure they are in a format usable by Windows with Multimedia. Electronic files exist in many different formats, and these differences often complicate the conversion process.

For example, word processing software can provide great control over the appearance of printed output. This control is made possible through the special codes added to the text file by the word processor. Full-featured word processors such as Microsoft Word for Windows even show on the screen exactly what the printed output will look like, a feature often known as WYSIWYG (pronounced “wizzywig”), which stands for “What You See Is What You Get.”

Typesetting systems use even more extensive codes to control how text gets printed. You can add character formatting, such as bold or italic; paragraph formatting, such as hanging indents and justification; and other formatting, such as margins and page numbering, that applies to whole blocks of text.

If you are fortunate, there will be a relatively painless conversion method available for your particular set of formats. Many of the more powerful word processing programs can read and write files in a variety of formats. Even if none of the available formats is exactly what you want, one of them, such as Rich Text Format (RTF), may be easier to work with than the original format.

The Benefits of Using SGML Formatting

Individual differences among the internal tags used to structure documents can cause conversion problems. For example, one typist may define paragraphs one way (double spacing, 1.5 lines after); another typist may define paragraphs differently (single spacing, 2 lines after). Such differences can increase the time required to make the text consistent throughout an application.
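As a rough illustration of the cleanup this kind of inconsistency requires, the following Python sketch normalizes paragraph breaks in text that has already been exported as plain text. It assumes that paragraphs are separated by one or more blank lines and that line breaks inside a paragraph are only wrapping; the function name normalize_paragraphs is hypothetical, and real word processor files would need a format-specific converter first.

```python
import re


def normalize_paragraphs(text: str) -> str:
    """Return text with exactly one blank line between paragraphs.

    Assumption: the input is plain text in which any run of blank
    (or whitespace-only) lines marks a paragraph break.
    """
    # Treat Windows and old Macintosh line endings the same as "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split wherever one or more blank lines separate two blocks of text.
    paragraphs = re.split(r"\n\s*\n", text.strip())

    # Rejoin the wrapped lines inside each paragraph with single spaces.
    cleaned = [
        " ".join(line.strip() for line in paragraph.split("\n"))
        for paragraph in paragraphs
    ]
    return "\n\n".join(cleaned)


if __name__ == "__main__":
    sample = "First typist's paragraph,\nwrapped.\n\n\n\nSecond paragraph."
    print(normalize_paragraphs(sample))
```

A pass like this makes the paragraph breaks uniform across files from different typists, but it does not recover formatting such as bold or italic, which plain text cannot carry.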

One approach used to solve this problem is to format text according to the guidelines of the Standard Generalized Markup Language (SGML). The SGML standard helps to enforce consistency of tagging between texts. SGML tags define the structure of a document and the purpose of the various elements of the document, rather than the appearance of those elements.

For example, in a typical typesetting file, a chapter title might be coded as bold 24-point text centered on the page. The tagged title might look like this:

<FB><CP24>Taming the Wild Sloth<CP1><FS><QC><WR1><QL>

This tag set will produce the output desired for presentation on paper, but it may not be appropriate for other media. The same chapter title tagged in SGML might look like this:

<Chapter>Taming the Wild Sloth</Chapter>

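This tag simply says that this group of words is a chapter title. How the title looks is decided separately, when the file is processed. The Document Type Definition (DTD) that accompanies the file declares which elements, such as Chapter, the document may contain, and a style specification for each output medium maps those elements to an appearance. If the file is sent to a typesetting machine, the style specification can state that chapter titles are to be printed centered, in bold 24-point text. If the file is to be displayed on a computer screen by a program incapable of displaying 24-point type, a different style specification can state that chapter titles be displayed bold, centered, and underlined.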
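The idea that one tagged file can serve several output media can be sketched in Python. The element name Chapter and the two sets of formatting come from the example above; the style tables PRINT_STYLE and SCREEN_STYLE, the function name render, and the regular expression are hypothetical stand-ins. The expression matches only simple, non-nested elements and is not an SGML parser.

```python
import re

# Hypothetical style tables, one per output medium. They stand in for
# whatever formatting specification accompanies the tagged file.
PRINT_STYLE = {"Chapter": {"bold": True, "size": 24, "align": "center"}}
SCREEN_STYLE = {"Chapter": {"bold": True, "underline": True, "align": "center"}}

# Matches <Name>text</Name> for simple, non-nested elements only.
ELEMENT = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def render(source, style):
    """Pair each tagged element with the formatting the style table gives it.

    Elements that the style table does not mention get an empty
    formatting dictionary.
    """
    result = []
    for name, text in ELEMENT.findall(source):
        result.append((name, text, style.get(name, {})))
    return result


if __name__ == "__main__":
    title = "<Chapter>Taming the Wild Sloth</Chapter>"
    print(render(title, PRINT_STYLE))
    print(render(title, SCREEN_STYLE))
```

The tagged source is identical in both calls; only the style table changes. That is the practical benefit of tagging structure rather than appearance.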

SGML can help simplify the conversion process. Nevertheless, the final electronic stage of most books is a typesetting format, not a word processing format or SGML. Whenever you convert text, make sure to examine your requirements first.