Scanning Text

Feeding printed material into a machine that recognizes each letter and writes it to a text file sounds ideal. Fortunately, such technology exists: optical character recognition (OCR). Under the right circumstances, it can be a very efficient way to get text into a computer.

An OCR system consists of a scanner (quite possibly the same one used to scan images), a computer, and some software. The scanner converts a page of text into a bitmapped image, and the software analyzes the letter shapes and converts them into ASCII characters. The number of predefined typefaces is usually limited to fewer than a dozen, although many systems have a learning facility that lets you add new characters and typefaces.
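
The recognition step described above can be illustrated with a toy template matcher. This is only a sketch under stated assumptions, not how any particular OCR product works: the 3x5 glyph bitmaps, the pixel-difference measure, and the names `TEMPLATES`, `distance`, `recognize`, and `learn` are all invented for illustration. Real systems work on much larger bitmaps and use more robust features.

```python
# Toy OCR by template matching. Each glyph is a 3x5 bitmap written as
# strings, where '#' is an inked pixel and '.' is blank paper.
# All shapes and names here are hypothetical, chosen for illustration.

TEMPLATES = {
    "I": ("###", ".#.", ".#.", ".#.", "###"),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
}


def distance(a, b):
    """Count the pixels that differ between two equally sized bitmaps."""
    return sum(
        pixel_a != pixel_b
        for row_a, row_b in zip(a, b)
        for pixel_a, pixel_b in zip(row_a, row_b)
    )


def recognize(glyph, templates):
    """Return the character whose template differs in the fewest pixels."""
    return min(templates, key=lambda ch: distance(glyph, templates[ch]))


def learn(templates, char, glyph):
    """Register a new shape, mimicking an OCR 'learning facility'."""
    templates[char] = tuple(glyph)


# A scanned "T" with one stray pixel in the second row.
noisy_t = ("###", ".##", ".#.", ".#.", ".#.")
print(recognize(noisy_t, TEMPLATES))  # prints T

# Teach the matcher a character it did not know before.
letter_o = ("###", "#.#", "#.#", "#.#", "###")
learn(TEMPLATES, "O", letter_o)
print(recognize(letter_o, TEMPLATES))  # prints O
```

The stray pixel leaves the noisy glyph one pixel away from the "T" template and at least three pixels away from every other template, so the nearest match is still correct. A misread happens when noise pushes a glyph closer to the wrong template.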

After scanning every page, run utility programs (such as a spell checker) to detect misreads and other scanner errors. As a final step, print and proofread the text. The following table lists some of the benefits and drawbacks associated with using OCR technology.

Benefit: Scanning requires little upfront labor.
Drawback: Some scanners can read only a limited set of typefaces. Others read more typefaces, but you first have to train them by running samples through the scanner and then calibrating their interpretation of the text.

Benefit: An OCR scanner can quickly convert large amounts of printed information into electronic files (less than a minute per page).
Drawback: Even at an accuracy rate of 99%, scanned text contains about one error per 100 characters, or roughly one error for every two lines of text. This error rate can mean thousands of errors in a long text. Make sure to schedule time for editing and proofreading.

Benefit: Scanning usually costs less than re-keying.
Drawback: You can lose special symbols (such as Greek or other foreign characters) and complex formatting (such as tables, mathematical formulas, or special fonts).
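
To see what a 99% accuracy rate means for proofreading time, the arithmetic can be sketched as follows. The page dimensions (50 characters per line, 40 lines per page, 300 pages) are assumptions picked for illustration, and the estimate treats every character as equally likely to be misread, which real scans will not match exactly.

```python
def expected_errors(num_chars, accuracy):
    """Expected number of misread characters at a given per-character accuracy."""
    return num_chars * (1.0 - accuracy)


# Hypothetical document dimensions; adjust them to match your own text.
CHARS_PER_LINE = 50
LINES_PER_PAGE = 40
PAGES = 300
ACCURACY = 0.99

chars_per_page = CHARS_PER_LINE * LINES_PER_PAGE
chars_per_book = chars_per_page * PAGES

per_line = expected_errors(CHARS_PER_LINE, ACCURACY)
per_page = expected_errors(chars_per_page, ACCURACY)
per_book = expected_errors(chars_per_book, ACCURACY)

print(f"Errors per line: {per_line:.1f}")   # about 0.5, one error every two lines
print(f"Errors per page: {per_page:.0f}")   # about 20
print(f"Errors per book: {per_book:.0f}")   # about 6000
```

Under these assumptions, each page needs about 20 corrections, which is why the editing and proofreading pass has to be budgeted into the schedule rather than treated as an afterthought.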