System for Automated Interlibrary Loan (SAIL) Program
This R&D program investigates the technical feasibility and role of automated document delivery in meeting the requirements of the NLM's interlibrary loan (ILL) service. It is motivated by the increasing burden the Library faces in servicing ILL requests in the traditional manual way. The research staff designed and built a prototype system consisting of a networked complex of PC-based workstations. This system automatically retrieved ILL requests from the NLM mainframe computer, parsed them, and used the fielded data they contained to retrieve document images from optical disks and automatically fax them or print them for mailing. Operators used document capture systems developed in-house to scan and store biomedical journals selected according to criteria that predicted high use.
The system was operated in a pilot test mode to investigate performance and cost issues. The cost per article delivered turned out to be comparable to the cost of delivering documents the conventional (manual) way, even considering the disparity in volume (SAIL handled 5% of the total ILL load). The prime component of this cost is the labor needed to convert paper documents to bitmapped electronic images. In terms of performance, delivery takes minutes or hours rather than days or weeks, but delivery times vary because of ambiguities in the requests, because the disk containing the requested article is not currently mounted in a drive, and because of other factors. Solutions were found for these performance problems. The 64 titles preselected for this pilot project accounted for 5% of the total ILL requests to the NLM, a proportionately high figure considering the size of the Library's journal collection; however, only one-third of the stored articles were accessed to fill ILL requests.
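The retrieve, parse and deliver loop described above can be sketched in Python. This is a minimal illustration under stated assumptions, not SAIL's code: the request fields (`journal`, `year`, `volume`, `start_page`, `delivery`), the citation-to-platter index and the platter identifiers are all hypothetical, and faxing and printing are reduced to returned labels.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class IllRequest:
    """Hypothetical fielded ILL request (field names are illustrative only)."""
    request_id: str
    journal: str
    year: int
    volume: int
    start_page: int
    delivery: str  # assumed values: "fax" or "mail"


# Hypothetical index mapping a citation to the optical platter holding its images.
ARTICLE_INDEX = {
    ("Journal A", 1992, 12, 101): "platter-017",
    ("Journal B", 1991, 7, 55): "platter-042",
}


def route_request(req: IllRequest, mounted: set[str]) -> str:
    """Decide the next action for one parsed request."""
    key = (req.journal, req.year, req.volume, req.start_page)
    platter = ARTICLE_INDEX.get(key)
    if platter is None:
        # Not among the scanned titles: fall back to manual ILL processing.
        return "not-held"
    if platter not in mounted:
        # Images exist, but the platter must first be loaded into a drive.
        return f"queue-mount:{platter}"
    # Images are online: fax them, or print them for mailing.
    return "fax" if req.delivery == "fax" else "print-for-mail"


mounted = {"platter-017"}
print(route_request(IllRequest("R1", "Journal A", 1992, 12, 101, "fax"), mounted))  # fax
```

Queuing a request whose platter is not mounted, rather than rejecting it, reflects one of the sources of delay reported for the pilot test.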
This finding has motivated a second look at the way articles are entered into the system, prompting an investigation of a point-of-request, or delivery-on-demand, system. The design of such a system is based on a 486 PC platform running Microsoft Windows 3.1. The idea is that the only human operation involved is scanning the requested document. All other operations, e.g., faxing, printing, transmitting over the Internet, extracting information from the ILL requests, and updating DOCLINE on request status, are to occur automatically in the multitasking environment of Windows. The design of this integrated system is under way.
A number of engineering studies are being pursued in support of SAIL development and the Library's ILL activity, including the following.
Artificial Neural Network (ANN) to Reduce Ambiguities. Work continued on applying ANNs to classification problems encountered in SAIL operation. One such problem was the ambiguity in ILL requests caused by remarks users made in the unstructured comments field. These ambiguities caused SAIL to fill requests automatically that the requesters did not want filled. Such comments might take the form "NLM Do Not Fill", but their wording is inconsistent. An analysis of several thousand requests found nine keywords or word pairs that were strong indicators of whether a request should be filled. A subsystem consisting of a parser and a back error propagation ANN was developed. Capitalizing on the sparse, pidgin-like language used in typical comments, the subsystem first extracts "NLM" and the other keywords from the comments field. These keywords form a nine-element input vector to the ANN, whose output vector has a single element indicating fill, do not fill, or uncertain. Evaluation showed that the system correctly determined that 57% of the requests should be filled and that 30% should not be filled, left 13% in the uncertain category, and made errors on 0.3%.
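The keyword parser and back error propagation network described above can be sketched in Python. This is an illustrative sketch, not the published subsystem: the source names only "NLM" among the nine keywords, so the other eight below are hypothetical placeholders; the hidden-layer size (4), learning rate, epoch count, training comments and the 0.7/0.3 thresholds that turn the single output into fill, do not fill or uncertain are all assumptions.

```python
from __future__ import annotations

import math
import random
import re

# Hypothetical keyword list: the source names only "NLM"; the other eight are placeholders.
KEYWORDS = ["nlm", "do not", "not fill", "dnf", "fill", "ok", "please", "cancel", "only"]


def extract_features(comment: str) -> list[float]:
    """Parse a free-text comment into a nine-element 0/1 keyword-presence vector."""
    text = " " + re.sub(r"[^a-z0-9]+", " ", comment.lower()).strip() + " "
    return [1.0 if f" {kw} " in text else 0.0 for kw in KEYWORDS]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


class BackpropNet:
    """One hidden layer and one sigmoid output unit, trained by back error propagation."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # Each weight row ends with a bias term.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x: list[float]) -> tuple[list[float], float]:
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden)) + self.w_out[-1])
        return hidden, out

    def train(self, samples, epochs: int = 4000, lr: float = 0.5) -> None:
        for _ in range(epochs):
            for x, target in samples:
                hidden, out = self.forward(x)
                # Output error term for squared error and a sigmoid unit.
                d_out = (target - out) * out * (1.0 - out)
                # Propagate the error back through the (not yet updated) output weights.
                d_hidden = [d_out * self.w_out[j] * h * (1.0 - h) for j, h in enumerate(hidden)]
                for j, h in enumerate(hidden):
                    self.w_out[j] += lr * d_out * h
                self.w_out[-1] += lr * d_out
                for j, row in enumerate(self.w_hidden):
                    for i, xi in enumerate(x):
                        row[i] += lr * d_hidden[j] * xi
                    row[-1] += lr * d_hidden[j]


def route_comment(net: BackpropNet, comment: str, hi: float = 0.7, lo: float = 0.3) -> str:
    """Map the single output to fill / do not fill / uncertain (thresholds assumed)."""
    _, score = net.forward(extract_features(comment))
    if score >= hi:
        return "fill"
    if score <= lo:
        return "do not fill"
    return "uncertain"


# Hypothetical training comments: 1.0 = fill at NLM, 0.0 = do not fill.
TRAINING = [
    ("NLM do not fill", 0.0),
    ("NLM DNF", 0.0),
    ("please do not fill at NLM", 0.0),
    ("cancel if NLM", 0.0),
    ("NLM ok to fill", 1.0),
    ("NLM please fill", 1.0),
    ("fill at NLM only", 1.0),
    ("OK NLM", 1.0),
]

net = BackpropNet(len(KEYWORDS), n_hidden=4)
net.train([(extract_features(c), t) for c, t in TRAINING])
```

Because the network has a single output, the "uncertain" class is simply the band between the two thresholds; widening that band sends more requests to an operator in exchange for fewer automatic errors.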
The conclusion is that 87% of the ILL requests that have unstructured comments can be handled automatically, and that the remaining 13% may be referred to a human operator for a decision. By significantly reducing the operator intervention required, this research promises time and cost savings in future operational systems for automated document delivery. This work appears in the literature: SH Hauser, W Hsu, GR Thoma: "Request Routing with a Back Error Propagation Network", Proc. SPIE Conference on Intelligent Information Systems, 1993, Vol. 1965, pp. 689-95.
Artificial Neural Network for Journal Identification. Image analysis is the subject of another ANN project, intended to aid scanning operators by identifying a journal automatically and thereby reduce operator errors in selecting journal titles while scanning. This is accomplished by processing the image characteristics of a journal's cover page. In the first approach, the histograms of row and column black pixel counts are used as the input signature to an ANN. In the second approach, the black pixel distribution is first processed by a Fast Fourier Transform, whose first 35 coefficients serve as the input. The first approach correctly classified 70 of 75 different journals but was slow to train; the second correctly classified 66 of 75 journals but was faster to train. Most of the errors arose from journals whose cover pages change style over time. This research is reported in: SH Hauser, TJ Cookson, GR Thoma: "Using Back Error Propagation Networks for Automatic Document Image Classification", Proc. SPIE Conference on Intelligent Information Systems, 1993, Vol. 1965, pp. 142-50.
Simulation Studies. To predict a migration path for a scaled-up SAIL system, the discrete event simulation language GPSS/H is being used to model the image retrieval subsystem.
This model can represent varying numbers of fax servers, optical disk drives, magnetic disk drives and jukeboxes. The independent variables are the rate of ILL requests, the fraction of requests for fax service, and the distribution of requests over the optical platter set. The model will enable testing of strategies for distributing articles over magnetic and optical media; for example, older articles could be stored on optical disks and more recent ones on magnetic disks. The simulation results will establish theoretical bounds on the number of system components and on the overall system architecture for different levels of service.
Automated Portrait/Landscape Mode Detection. As part of research into automated document imaging, an algorithm was developed to detect the orientation (portrait vs. landscape) of a binary page image. Detecting page orientation is a necessary preprocessing stage for optical character recognition, skew detection and skew correction. Page orientation is also crucial for automated document entry, in which the contents of a printed document are segmented into regions such as headlines, text columns, graphics and footnotes. The algorithm is based on an analysis of projection profiles and of vertical and horizontal variances on a page, together with a technique to reduce the impact of nontextual data (blanks, graphics, forms, line art, large fonts and dithered images). On a sample of several thousand images of medical journal pages, the algorithm detected page orientation with an accuracy of 99.92%. This work is the subject of a patent filing and has been reported in the literature: DX Le, GR Thoma: "Automated Portrait/Landscape Mode Detection on a Binary Image", Proc. SPIE Visual Information Processing II, 1993, Vol. 1961, pp. 202-12.
Automated Document Skew Detection. Rescanning documents is a time-consuming and costly step, but it is often necessary in document conversion.
The errors that make rescanning necessary are detected at the quality control (QC) stage. A multistage technique was designed to detect page skew automatically. The principal components of this algorithm are component labelling, a procedure to reduce the amount of data to be processed, a technique to minimize the effect of nontextual data (graphics, forms, line art, large fonts and dithered images), and the Hough transform. The algorithm has the following characteristics: (1) it uses the bottom parts ("feet") of the objects (characters); (2) it reduces the data to be processed by a factor of 15 for a typical page of text, and by more than 80 for a compound page; (3) detection can run while a page is being scanned; and (4) it is independent of text dominance. Tested on several hundred images of medical journal pages, the algorithm detected skew with an accuracy of about 0.5 degrees. This work appears in the literature:
DX Le, GR Thoma: "Document Skew Angle Detection Algorithm", Proc. SPIE Visual Information Processing II, 1993, Vol. 1961, pp. 251-62.
DX Le, GR Thoma, H Wechsler: "Automated Page Orientation and Skew Angle Detection for Binary Document Images", Proc. 1994 IEEE International Conference on Neural Networks, Orlando, FL, June 28-July 2, 1994, Vol. 5, pp. 3009-14.
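The orientation and skew detection steps described above can be sketched in Python under simplifying assumptions. This is not the published algorithm: orientation is decided only by comparing the variances of the row and column projection profiles, with none of the nontextual-data handling the text describes; skew is estimated by labelling 8-connected components, taking one "foot" point per component, discarding components taller than an assumed 40 pixels as a crude stand-in for nontext filtering, and running a Hough transform restricted to ±5° in 0.1° steps. The page is a list of rows of 0/1 pixels, and all function names and parameters are illustrative.

```python
import math
from collections import deque


def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def detect_orientation(img):
    """Horizontal text lines make row sums alternate between text and gaps, so the
    row profile varies more than the column profile on a portrait page."""
    row_profile = [sum(row) for row in img]
    col_profile = [sum(col) for col in zip(*img)]
    return "portrait" if variance(row_profile) >= variance(col_profile) else "landscape"


def label_components(img):
    """8-connected component labelling by breadth-first search; returns pixel lists."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def feet(components, max_height=40):
    """One point per component: its bottom row at the bounding-box centre column."""
    points = []
    for pixels in components:
        rows = [y for y, _ in pixels]
        cols = [x for _, x in pixels]
        if max(rows) - min(rows) + 1 > max_height:
            continue  # assumed filter: very tall objects are treated as non-text
        points.append((max(rows), (min(cols) + max(cols)) / 2))
    return points


def hough_skew(points, max_angle=5.0, step=0.1):
    """Angle in degrees (positive = baselines descend to the right) whose Hough
    accumulator holds the most feet in a single (angle, rho) cell."""
    if not points:
        return 0.0
    best_angle, best_count = 0.0, -1
    n = round(max_angle / step)
    for i in range(-n, n + 1):
        angle = i * step
        theta = math.radians(angle)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        accumulator = {}
        for y, x in points:
            # Feet on one baseline y = x*tan(theta) + c share rho = c*cos(theta).
            rho = round(y * cos_t - x * sin_t)
            accumulator[rho] = accumulator.get(rho, 0) + 1
        count = max(accumulator.values())
        # Prefer the smaller correction when two angles tie.
        if count > best_count or (count == best_count and abs(angle) < abs(best_angle)):
            best_angle, best_count = angle, count
    return best_angle


def detect_skew(img):
    return hough_skew(feet(label_components(img)))
```

Reducing each character to a single foot point makes the Hough input far smaller than the number of black pixels, in the spirit of the data reduction described above, and limiting the angle range keeps the accumulator small.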
URL: http://archive.nlm.nih.gov/proj/sail.php