Optical Disk Jukebox Performance in Multi-User Applications
Hauser SE
Roy G
Thoma GR
Proceedings of the 1994 Optical Data Storage Topical Meeting
Vol. 10: pp. 53-5.
Introduction:
The Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine, is evaluating an optical disk jukebox as a digital image store to support prototype systems for image distribution over the Internet. This paper summarizes a study undertaken to determine the performance characteristics of the jukebox to support multiple image databases simultaneously accessed by multiple users. A motivation for this investigation is the need to provide users access to digitized images of medical documents and radiographs.
Performance Study:
The jukebox is an HP100 model equipped with 144 5.25" magneto-optical platters and four drives. It is connected via fast SCSI to a Sun 670MP running SunOS 4.1.3. Each platter side is formatted as a complete UNIX file system, leaving 283 megabytes (MB) for data storage, for a total usable jukebox capacity of 81.5 gigabytes (GB). The subsystem includes controlling software written by the systems integrator that supplied the jukebox.
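The capacity figure can be checked with a few lines of arithmetic. The sketch below assumes two formatted sides per platter and 1 GB = 1000 MB; neither is stated explicitly above, but both are consistent with the reported total. The constant and function names are illustrative only.

```python
# Figures taken from the text; names are illustrative.
PLATTERS = 144
SIDES_PER_PLATTER = 2          # assumption: both sides of each platter are formatted
USABLE_MB_PER_SIDE = 283


def usable_capacity_gb(platters=PLATTERS,
                       sides=SIDES_PER_PLATTER,
                       mb_per_side=USABLE_MB_PER_SIDE):
    """Total usable capacity in GB, assuming 1 GB = 1000 MB."""
    return platters * sides * mb_per_side / 1000.0


if __name__ == "__main__":
    print(f"{usable_capacity_gb():.1f} GB")
```

Under these assumptions the 288 platter sides give 81,504 MB, which rounds to the 81.5 GB quoted above.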
Initial measurements were taken of the average time to retrieve a file from both an unmounted platter and a mounted platter. Analysis of these data indicates that the time to read a file from a mounted platter into program memory is approximately 1.7 seconds per MB (a throughput of approximately 0.59 MB/sec). When the desired file is not on a currently mounted platter, the additional time to exchange a platter, using one drive, is approximately 13.5 seconds. This includes the time for spin down, unload, platter exchange, load and spin up. The maximum rate at which the jukebox robot and controlling software can exchange platters, using all four drives, was found to be approximately 300 exchanges per hour. During each exchange, the robot is idle for approximately 4 seconds following insertion of the platter into the drive, even when other drives are not busy and read requests are pending. The systems integrator's explanation for this delay is that the software waits for the platter to spin up and be read to ensure that the correct platter has been inserted.
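As a rough illustration of these baseline measurements, the following sketch estimates the retrieval time for a single file on an otherwise idle jukebox. It is a back-of-the-envelope model, not the measurement software used in the study: it assumes the read time scales linearly with file size and ignores any queuing delay. The function name is hypothetical.

```python
# Approximate values reported in the text.
READ_SEC_PER_MB = 1.7      # read time from a mounted platter
EXCHANGE_SEC = 13.5        # extra time when the platter must be exchanged


def estimated_retrieval_sec(size_mb, mounted):
    """Estimate unloaded retrieval time in seconds for one file.

    Assumes linear scaling with file size and no competing requests.
    """
    read_time = READ_SEC_PER_MB * size_mb
    if mounted:
        return read_time
    return read_time + EXCHANGE_SEC


if __name__ == "__main__":
    for size_mb in (0.08, 5, 10):
        print(size_mb,
              round(estimated_retrieval_sec(size_mb, mounted=True), 2),
              round(estimated_retrieval_sec(size_mb, mounted=False), 2))
```

For a 5 MB file, for example, this model gives about 8.5 seconds from a mounted platter and about 22 seconds when an exchange is required.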
It is anticipated that the jukebox will be the primary image store for two prototype projects, DXPNET[1] and SAIL[2], in which images are accessed by remote users via the Internet, as illustrated in Figure 1. The DXPNET database contains 5 and 10 MB files of digitized x-rays. A typical DXPNET request would be for one image. The SAIL database contains document image files ranging up to one half MB, with an average size of 80 kilobytes (KB). A typical SAIL request would be for 10 contiguous images, corresponding to the number of pages in an average journal article. Because the applications are likely to include real-time access of images for remote display, it is of interest to determine the performance that can be expected when the databases are accessed by multiple users. To eliminate the uncertain delays associated with Internet transmission, all jukebox performance measurements were taken from the host computer. For the performance tests, image files from each database were written to each platter side in the jukebox.
To evaluate the performance of the jukebox with multiple users and multiple databases, two types of programs were developed: "load" programs and "test" programs. The load programs simulate the load on a database server caused by multiple users, while the test programs measure the effect of that load on the retrieval time that would be experienced by one user.
The load programs generate requests for image files at an average rate determined by a run-time parameter. For each request, the programs use the C library's random number function to select the platter side from which to read the file and to determine the time interval, in milliseconds, until the request is generated. The average interval between requests corresponds to the run-time parameter, and the intervals themselves are exponentially distributed[3]. When the interval has elapsed, the program creates a child process to read the file into program memory. The parent program does not wait for the file to be read before generating the next request. A load program can therefore have several requests outstanding at any time, creating a queue of requests for jukebox service. One load program generates requests for files from the SAIL image database, and the other for files from the DXPNET image database.
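The request-generation logic described above can be sketched as follows. This is an illustrative Python reconstruction, not the original C programs: it uses threads in place of child processes, and the names `next_request`, `run_load` and `read_file` are hypothetical. The count of 288 platter sides assumes two sides per platter.

```python
import random
import threading
import time

NUM_PLATTER_SIDES = 288  # assumption: 144 platters x 2 sides


def next_request(rng, mean_interval_ms, num_sides=NUM_PLATTER_SIDES):
    """Return (wait_ms, side) for the next simulated request.

    Intervals are exponentially distributed with the given mean;
    the platter side is chosen uniformly at random.
    """
    wait_ms = rng.expovariate(1.0 / mean_interval_ms)
    side = rng.randrange(num_sides)
    return wait_ms, side


def run_load(read_file, mean_interval_ms, num_requests, seed=None):
    """Issue requests without waiting for earlier ones to complete.

    read_file is a caller-supplied function taking a platter-side index.
    """
    rng = random.Random(seed)
    workers = []
    for _ in range(num_requests):
        wait_ms, side = next_request(rng, mean_interval_ms)
        time.sleep(wait_ms / 1000.0)
        worker = threading.Thread(target=read_file, args=(side,))
        worker.start()  # not joined here, so requests can overlap and queue
        workers.append(worker)
    for worker in workers:
        worker.join()   # wait only at the end of the run
```

Because each read runs in its own thread and is joined only at the end of the run, several requests can be outstanding at once, which is the property that lets a load program build a queue at the jukebox.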
The test programs simulate user access by generating requests for files from randomly selected platters. The test programs wait until the current request is served, i.e., until the file is read into program memory, before generating the next request. There is no additional delay between requests. For each file requested, the test program measures and records retrieval time, which is the interval between the time the request for the file is initiated and the time the file is completely read into program memory. One test program generates requests for files from the DXPNET image database, and the other for files from the SAIL image database.
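A minimal sketch of such a measurement loop is shown below, again in Python rather than the original C. The callables `read_file` and `pick_path` are hypothetical stand-ins for the file read and the random platter selection, and the default of 400 requests follows the number of retrieval times reported for most runs.

```python
import time


def read_into_memory(path):
    """Read an entire file into program memory and return its bytes."""
    with open(path, "rb") as f:
        return f.read()


def measure_retrievals(read_file, pick_path, num_requests=400):
    """Request files one at a time and record each retrieval time.

    The next request is issued only after the current read completes,
    with no additional delay between requests.
    """
    retrieval_times = []
    for _ in range(num_requests):
        path = pick_path()
        start = time.perf_counter()
        read_file(path)
        retrieval_times.append(time.perf_counter() - start)
    return retrieval_times


def average(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)
```

A run would then look like `average(measure_retrievals(read_into_memory, pick_path))`, where `pick_path` returns the path of a file on a randomly selected platter side.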
Figure 2 illustrates a typical test run, with the two load programs generating a total background load against which one or the other of the test programs measures retrieval times. For most runs, the test program measured four hundred retrieval times.
Figures 3 and 4 show average retrieval times as a function of two independent loads, the SAIL load and the DXPNET load, both in MB/sec. The maximum DXPNET load of 0.3 MB/sec corresponds to an average of 144 requests per hour. The maximum SAIL load of 0.05 MB/sec corresponds to 270 requests per hour. Both graphs show significant increases in retrieval times to hundreds of seconds at the larger combined loads, suggesting that the jukebox subsystem is approaching 100% utilization at these loads[4]. When we calculate the total number of load requests per hour corresponding to each position in the graphs, we find that very long retrieval times correspond to a background load of 279 requests per hour or more. This number is close to the 300 exchanges per hour maximum that we had measured, and confirms that the rate at which the jukebox can exchange platters is a limiting factor in the ability of the jukebox to serve multiple requests.
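The relationship between load in MB/sec and requests per hour can be illustrated with a short calculation. The sketch assumes an average DXPNET request size of 7.5 MB (an equal mix of 5 and 10 MB files), which reproduces the 144 requests per hour quoted above, and it assumes that each load request requires one platter exchange. Both assumptions, and the function names, are illustrative and not taken from the study's software.

```python
MAX_EXCHANGES_PER_HOUR = 300   # measured maximum exchange rate


def requests_per_hour(load_mb_per_sec, mb_per_request):
    """Convert a load in MB/sec to requests per hour."""
    return load_mb_per_sec * 3600.0 / mb_per_request


def exchange_utilization(total_requests_per_hour,
                         max_rate=MAX_EXCHANGES_PER_HOUR):
    """Fraction of the maximum exchange rate consumed by the load.

    Assumes every request needs exactly one platter exchange.
    """
    return total_requests_per_hour / max_rate


if __name__ == "__main__":
    print(round(requests_per_hour(0.3, 7.5)))      # maximum DXPNET load
    print(round(exchange_utilization(279), 2))     # threshold noted in the text
```

Under these assumptions, a background load of 279 requests per hour uses about 93% of the measured exchange capacity, which is consistent with the sharp rise in retrieval times at that point.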
For lighter loads, the graphs show relatively small increases with increasing load, a possible factor being the availability of multiple drives. Because the platter in one drive can be exchanged while other drives are being read, multiple drives may buffer the impact of simultaneous access by multiple users. To quantify the effect, measurements were taken of retrieval times under light to moderate loads with two, three and four drives enabled. Average retrieval times for one SAIL file are shown in Figure 5 as a function of load in exchanges per hour, where each exchange corresponds to the retrieval of one 5 MB DXPNET file. Although three drives reduce the average retrieval times as compared to two drives, the fourth drive offers no additional performance advantage at these loads.
Conclusions:
Two important factors governing the performance of an optical disk jukebox for access by multiple users are the maximum platter exchange rate and the number of available drives. These results have implications for the design of both the interface software and the application software. Interface software design should include efficient use of the robotics and intelligent queue management to maximize the platter exchange rate and minimize the number of exchanges. For example, if the 4 seconds of robot idle time were eliminated from each platter exchange, the maximum platter exchange rate would increase by approximately 50%. Application software should be designed to employ strategies to organize files or operations to minimize the number of platter exchanges. Although the attractive features of security, large capacities and low storage costs offered by optical disk jukeboxes are offset by inherently slower retrieval times, jukeboxes can be appropriate for selected applications when supported by suitable interface and application software.
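The 50% figure follows from the measured exchange rate. The sketch below assumes that the robot handles exchanges one at a time, so that 300 exchanges per hour corresponds to 12 seconds of robot time per exchange, and that the full 4 seconds of idle time could be recovered. The function name is illustrative.

```python
def exchange_rate_without_idle(rate_per_hour=300.0, idle_sec=4.0):
    """Projected exchanges per hour if the robot idle time were removed.

    Assumes exchanges are serialized on the robot, so the time per
    exchange is simply 3600 / rate_per_hour.
    """
    sec_per_exchange = 3600.0 / rate_per_hour
    return 3600.0 / (sec_per_exchange - idle_sec)


if __name__ == "__main__":
    new_rate = exchange_rate_without_idle()
    print(new_rate, new_rate / 300.0 - 1.0)
```

With the measured values, the time per exchange drops from 12 to 8 seconds and the projected rate rises from 300 to 450 exchanges per hour, an increase of 50%.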
References:
1. R. Long, L. E. Berman, G. R. Thoma. "Design Considerations for Wide Area Distribution of Digital X-ray Images," PACS Design and Evaluation, Proc. SPIE v 1899, Medical Imaging, 1993, pp. 383-394.
2. "System for Automated Interlibrary Loan (SAIL): System and Operations Description," internal technical report, Communications Engineering Branch, Lister Hill Center, National Library of Medicine, Bethesda, Maryland, November 1992.
3. A. M. Law and W. D. Kelton. Simulation Modeling and Analysis, McGraw-Hill, New York, 1982.
4. D. Gross and C. M. Harris. Fundamentals of Queueing Theory, 2nd ed., John Wiley & Sons, New York, 1985.

Figure 1.
Figure 2.
Figure 3.
Figure 4.
Figure 5.