Automating the production of bibliographic records for MEDLINE

12.3 Alternative method for text verification

The conventional approach to verifying the text output of any OCR system (or, in the case of MARS, the output of a succession of automated processes) is to present the text in the same sequence as it appears on the printed page, with the low-confidence characters highlighted in color. As in our reconcile workstation, the operator can then "tab" quickly from one suspect character to the next and make the necessary corrections.

This conventional approach has some drawbacks. The operator must spot each suspect character within a mass of correct text, and must correct the text as it is encountered, which breaks the rhythm of identifying incorrect characters.

We propose an alternative method that may improve operator productivity. Called Carpet Character Certification and Correction, or the "Carpet" method, it groups like characters, drawn from a number of pages or journal issues at the same time, and displays each group in a single window, as shown in Figure 12.3.1. Each character appears in its own "edit box." Only low-confidence characters in the ranges A-Z and 0-9 would be displayed, and each group would contain characters of a single type. The example shows a set of characters in their edit boxes, mostly e's, some of which are misreadings of an s or an E, as the corresponding bitmapped images directly above the edit boxes show.

Since context is important for detecting poorly captured character shapes, the system must also display the image fragment (a word or phrase) in which the presumably incorrect character appears. Such context will particularly help distinguish letters and numbers that look similar, e.g., 1 and I, or 0 and O. The GUI for the Carpet system must possess the following functions:
The Carpet system is to be implemented in Visual C++ with the Kodak Image libraries. Once the software is developed, we intend to conduct a performance study in which this system is used for reconciling, measuring the residual error rate and the time taken to correct and verify all the characters from one complete journal issue at a time. Should the accuracy and time savings prove to be an improvement over the current verification method, this module will be incorporated into the reconcile workstation software.
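The grouping step that distinguishes the Carpet method from page-order review can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the MARS implementation (which is planned in Visual C++ with the Kodak Image libraries): the record layout `OcrChar`, the helpers `build_carpets`, `context` and `apply_correction`, and the 0.8 confidence cutoff are all hypothetical, and the bracketed word is only a text stand-in for the bitmapped image fragment the real GUI would display.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical cutoff: the section does not state MARS's confidence scale or threshold.
LOW_CONFIDENCE = 0.8


@dataclass
class OcrChar:
    """One recognized character and its origin (hypothetical record layout)."""
    page: str          # page or journal-issue identifier
    word: str          # recognized word containing the character (context)
    offset: int        # index of the character within `word`
    value: str         # character produced by the automated processes
    confidence: float  # recognition confidence, assumed to lie in [0, 1]


def is_carpet_candidate(c: OcrChar, threshold: float = LOW_CONFIDENCE) -> bool:
    """Only low-confidence letters and digits (A-Z, a-z, 0-9) go into a carpet."""
    return c.value.isascii() and c.value.isalnum() and c.confidence < threshold


def build_carpets(chars, threshold=LOW_CONFIDENCE):
    """Group suspect characters by recognized value, pooling all pages at once."""
    carpets = defaultdict(list)
    for c in chars:
        if is_carpet_candidate(c, threshold):
            carpets[c.value].append(c)
    return dict(carpets)


def context(c: OcrChar) -> str:
    """Text stand-in for the image fragment: the word with the suspect bracketed."""
    return f"{c.word[:c.offset]}[{c.word[c.offset]}]{c.word[c.offset + 1:]}"


def apply_correction(c: OcrChar, new_value: str) -> None:
    """Write an operator's edit-box change back into the source record."""
    c.word = c.word[:c.offset] + new_value + c.word[c.offset + 1:]
    c.value = new_value
    c.confidence = 1.0  # treat operator-verified characters as certain


if __name__ == "__main__":
    chars = [
        OcrChar("p1", "the", 2, "e", 0.55),
        OcrChar("p1", "ease", 1, "a", 0.95),  # high confidence: not shown
        OcrChar("p2", "thie", 3, "e", 0.40),  # really an "s"
        OcrChar("p3", "eye", 0, "e", 0.60),   # really an "E"
        OcrChar("p3", "1998", 0, "1", 0.50),
        OcrChar("p3", "a,b", 1, ",", 0.30),   # punctuation: never shown
    ]
    carpets = build_carpets(chars)
    for value, group in sorted(carpets.items()):
        print(value, [f"{c.page}:{context(c)}" for c in group])
    # The operator fixes two boxes in the "e" carpet without leaving the window.
    apply_correction(carpets["e"][1], "s")
    apply_correction(carpets["e"][2], "E")
    print([c.word for c in chars])
```

Pooling characters from several pages before grouping is what lets one carpet hold many instances of the same recognized character, so an odd shape (the s or E read as e) stands out against its neighbors; a conventional reconcile pass would visit the same suspects scattered through the text in reading order.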
URL: http://archive.nlm.nih.gov/pubs/thoma/mars2001_18.php