Performance of RAID as a Storage System for Internet Image Delivery
Susan E. Hauser
Lewis E. Berman
George R. Thoma
Lister Hill National Center for Biomedical Communications
National Library of Medicine
Bethesda, Maryland 20894
ABSTRACT
Redundant Array of Inexpensive Disks (RAID) vendors rely on multi-megabyte files and large numbers of physical disks to achieve the high transfer rates and Input/Output Operations Per Second (IOPS) quoted in the promotional literature. Practical image database applications do not always deliver such large files and cannot always afford the cost of the large numbers of disks required to match the vendors' performance claims. Because the user is often waiting on-line to view the images, applications deployed on the World Wide Web (WWW) are especially sensitive to keeping inline images relatively small. For such applications, the expected performance advantages of RAID storage may not be achieved.
The Lister Hill National Center for Biomedical Communications houses three image datasets on a SPARCstorage Array RAID system. Applications deliver these images to users via the Internet using the WWW and other client/server programs. Although approximately 3% of the images exceed 1 MB in size, the average file size is less than 200 KB and approximately 60% of the files are less than 100 KB in size. A study was undertaken to determine the configuration of the RAID system that will provide the fastest retrieval of these image files and to discover general principles of RAID performance. Average retrieval times with single processes and with concurrent processes are measured and compared for several configurations of RAID levels 5 and 0+1. A few trends have emerged showing a tradeoff between optimally configuring the RAID for a single process and optimally configuring it for concurrent processes.
Keywords: RAID performance, storage subsystems performance, Internet applications
1. BACKGROUND
The Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine, has procured a RAID subsystem to house data for prototype applications that deliver images to clients over the Internet. The RAID system is a Sun SPARCstorage Array (SSA) model 101, a compact external package with slots for up to 30 drives connected to six internal fast wide SCSI busses. Initially, the SSA was configured with eighteen Seagate ST-31200W 1.05 GB disks. Recently, twelve Seagate ST-32550W 2.1 GB disks were added. The SSA is connected to a Sun SPARCstation 20 via a Fiber Channel port. SPARCstorage Volume Manager software supports use of the SSA as independent volumes or as RAID 0, RAID 1, RAID 0+1 or RAID 5 volumes.
Image files for World Wide Web (WWW) based prototype applications and other client-server Internet-based prototype applications are stored on the SSA. A consideration in the design of the image datasets for these interactive applications, where the user is typically waiting on-line for the images to appear, was the size and number of image files to be delivered to the user upon request. Consequently, many files are compressed or are reduced-resolution versions of larger image files stored elsewhere, thereby reducing delivery time. Figure 1 shows the distribution of the image file sizes of the three image datasets currently stored on the SSA. The first two columns of the histogram represent the number of files between 0 and 100 KB in size, which account for 59.4% of the total number of files. The first four columns of the histogram represent the number of files between 0 and 200 KB in size, which account for 90% of the total number of files. There are occasional files between 200 KB and 3200 KB in size. The average file size of the entire image collection of 12,211 files is 176.22 KB.
Figure 1. Distribution of file sizes from three image datasets.
The SSA was advertised to offer very fast file access and delivery as well as selectable levels of fault tolerance. However, the performance quoted in the advertisements was obtained under conditions unlikely to be seen in practical applications: multi-megabyte files and eighteen or more disks in a striped array[1]. Furthermore, guidelines for selecting RAID level and stripe width to optimize file retrieval performance were general and tended to assume that all files were of the same size. Lacking a comprehensive prescription for optimally configuring the SSA for performance, we began a study to determine how various configuration parameters would affect file retrieval performance for our image datasets.
2. PREVIOUS WORK
Earlier phases of the study focused on one of the image databases and used just six of the disks in the SSA for performance measurements[2]. We discovered that for small files, RAID 0 or RAID 5 offered little or no performance advantage over balancing the load across the same number of disks configured as independent volumes. Only RAID 0+1, striping plus disk mirroring, met or exceeded the performance of independent volumes for both sequential file retrieval and concurrent file retrieval. Disk mirroring also offers a high degree of fault tolerance, at the expense of using twice as much media. We also found that for most configurations, a narrow stripe slightly improved sequential file retrieval while a wide stripe slightly improved concurrent file retrieval. We concluded that for file sizes of a few hundred KB or less, RAID systems should be selected not for speed, but for fault tolerance and ease of storage management.
In this phase, we continue the study to determine the extent to which other factors may improve overall retrieval performance for our image datasets using the fault tolerant RAID 5 and RAID 0+1 configurations. We explore the number of disks in a volume, the location of the data on the disks, the number of SCSI controllers used for a volume, and the characteristics of the disks in the volumes.
3. STUDY CONDITIONS
For this study, twelve of the eighteen 1.05 GB disks were available for testing, as well as an additional twelve 2.1 GB disks. By removing factors such as reading from cache or swap space and system loading, the study concentrated on measuring file retrieval time from the RAID subsystem alone. Special programs were written to measure the average time to read files from the SSA into memory.
Rather than reexamine the effect of stripe width on performance, a stripe width of 160 KB was used throughout this phase of the study. 160 KB is one of the wider stripes used in the earlier phases. As noted above, the wide stripe is advantageous for concurrent performance and slightly disadvantageous for sequential performance.
4. PERFORMANCE INDICATORS
For each configuration of the SSA, three aspects of retrieval performance were measured or calculated. Average Retrieval Time is the average time for sequential reads, or the average time to read files one at a time. Average Incremental Time is the average additional time to read a file per additional concurrent process also reading files from the RAID. 2 KB Read Time is the average time to read the first 2 KB of each file on the RAID.
To measure Average Retrieval Time, a program reads file names from a list, and then reads each file in the list into system memory. Special lists were prepared in which all of the files in the three image datasets were included in random order. The resulting Average Retrieval Time is the average for all of the files in the datasets. The program was run six to twelve times over several days to obtain the grand average reported in the following sections.
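As an illustration, a minimal C sketch of such a timing loop is shown below, assuming standard POSIX I/O and gettimeofday(); the file-list format, the 64 KB read buffer, and the error handling are simplifications of our own, and the actual measurement programs may differ in detail.

    /* Minimal sketch of a sequential-read timer; illustrative only. */
    #include <stdio.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/time.h>

    int main(int argc, char **argv)
    {
        FILE *list;
        char name[1024], buf[64 * 1024];
        struct timeval t0, t1;
        double total_ms = 0.0;
        long nfiles = 0;

        if (argc != 2 || (list = fopen(argv[1], "r")) == NULL) {
            fprintf(stderr, "usage: %s file-list\n", argv[0]);
            return 1;
        }
        while (fgets(name, sizeof(name), list) != NULL) {
            int fd;

            name[strcspn(name, "\n")] = '\0';          /* strip trailing newline */
            gettimeofday(&t0, NULL);
            if ((fd = open(name, O_RDONLY)) < 0)
                continue;
            while (read(fd, buf, sizeof(buf)) > 0)
                ;                                      /* pull the whole file into memory */
            close(fd);
            gettimeofday(&t1, NULL);
            total_ms += (t1.tv_sec - t0.tv_sec) * 1000.0 +
                        (t1.tv_usec - t0.tv_usec) / 1000.0;
            nfiles++;
        }
        fclose(list);
        printf("Average Retrieval Time: %.2f ms over %ld files\n",
               nfiles > 0 ? total_ms / nfiles : 0.0, nfiles);
        return 0;
    }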
To measure Average Incremental Time, the large randomized list of files was divided into 12 lists, each containing approximately the same number of files from each of the three image datasets. One process reads the files from the first list into memory, again measuring the average retrieval time. Concurrently, one to eleven other processes each read files from the other lists into memory. Linear regression is then performed on average retrieval time measured by the first process as a function of the number of concurrent processes retrieving files. The calculated slope of the linear relationship is the Average Incremental Time. The values reported in the following sections are the average of several runs.
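For completeness, the slope itself is an ordinary least-squares fit over (number of additional concurrent processes, measured average retrieval time) pairs. The C sketch below uses hypothetical numbers and an illustrative function name; it is not the analysis code from the study.

    /* Ordinary least-squares slope of average retrieval time (ms) versus the
       number of additional concurrent processes; the slope is the Average
       Incremental Time.  Hypothetical data, illustrative only. */
    #include <stdio.h>

    static double ols_slope(const double *x, const double *y, int n)
    {
        double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
        int i;

        for (i = 0; i < n; i++) {
            sx  += x[i];
            sy  += y[i];
            sxx += x[i] * x[i];
            sxy += x[i] * y[i];
        }
        return (n * sxy - sx * sy) / (n * sxx - sx * sx);
    }

    int main(void)
    {
        double procs[12]  = { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
        double avg_ms[12] = { 55, 68, 80, 93, 105, 118, 131, 143, 156, 168, 181, 194 };

        printf("Average Incremental Time: %.1f ms per additional process\n",
               ols_slope(procs, avg_ms, 12));
        return 0;
    }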
To measure 2 KB Read Time, a program reads file names from a list, and then reads the first 2 KB of each file into system memory. The program uses the same lists as the program that measures Average Retrieval Time. The average measured time is independent of file size and is used to determine the effect of configuration changes on the cumulative latencies of the disks, the SCSI and Fiber Channel interfaces, and the operating system overhead.
The three image datasets were copied to each configuration of the SSA that was tested. The order of the files was the same on each volume, and each test run used the same list of file names.
5. RESULTS
Figures 2 through 6 show the performance parameters for various situations. They are all in the same format. The wide columns show the Average Retrieval Time and are associated with the scale on the left vertical axis. Two narrow columns are located inside each wide column. The left narrow columns show the Average Incremental Time and the right narrow columns show the 2 KB Read Time. Both narrow columns are associated with the scale on the right vertical axis. The times shown on both vertical axes are in milliseconds. In all figures in which a volume size is stated, the size is the total number of drives used to create the volume. For example, the effective data capacity of a 6-disk RAID 5 volume is 5/6 of the total disk space, while the effective data capacity of a 6-disk RAID 0+1 volume is 1/2 of the total disk space. Unless otherwise indicated, the configurations all use the 1.05 GB disks.
Figure 2 shows the effect of volume size on the performance indicators. We would have expected the Average Retrieval Time to be smaller for larger volumes because data can be read from more disks simultaneously and because the total data resides closer to the front of each disk, where the internal transfer rates are higher. For the RAID 5 volume, there is little difference among the Average Retrieval Times, with the 8-disk volume being slightly faster than the others. For RAID 0+1, larger volumes do yield slightly faster retrieval times. One might conclude that the additional latencies from the greater number of drives tend to cancel the expected throughput increases, but the 2 KB Read Time, which is a rough measure of latency, decreases steadily with increasing numbers of drives. The only results that are intuitive to us are the Average Incremental Time data. As expected, the system serves concurrent users faster with larger numbers of disks.
Figure 2. Performance indicators as a function of number of disks in a volume.
The drives in the SSA use the zone bit recording method, where data is packed more densely on the outer cylinders of each platter[3]. Since the angular velocity is constant, the data transfer rate for the outer cylinders, which correspond to the lower-numbered addresses on the disk, is larger than for the inner cylinders. When our test dataset is copied to volumes with more disks, a greater portion of the data is on outer cylinders. Hence, the performance improvement shown in Figure 2 may be due as much to data location as to the potential for simultaneous data transfer from a larger number of drives. Figure 3 shows the effect of retrieving data from different zones of the volume. For this study, a "zone" is about 2 GB, the total size of the three image datasets being used. For the RAID 5 volume, zone 1 occupies the first quarter of each drive in the volume, zone 2 occupies the second quarter, and so forth. For the RAID 0+1 volume, the three zones overlap. The "front" zone occupies the first 44% of each drive, the "middle" zone occupies the center 44% of each drive and the "back" zone occupies the last 44% of each drive. Figure 3 shows that data location has a greater effect on Average Retrieval Time than the size of the volume. For both RAID 5 and RAID 0+1 volumes, sequential file retrieval from the end of a 12-disk volume is slower than sequential file retrieval from the front of a 6-disk volume. The results for the 2 KB read times show the same trend to a lesser extent, suggesting that positioning times are also affected by file location. The effect of data location on Average Incremental Time is even less pronounced, so applications that read files concurrently could use the whole volume with little effect on performance. For applications that read files sequentially, higher priority files should be placed on the front of the volume if possible.
Figure 3. Performance indicators as a function of data location on the disks in a volume.
Figure 4 shows the effect on the performance indicators of the number of SCSI busses used for the volume. For both RAID 5 and RAID 0+1, performance is the same for 1 and 2 drives per SCSI bus, and decreases only slightly for 3 drives per SCSI bus. The fast wide SCSI interface has a specified transfer rate of 20 MB/sec, and the 1.05 GB drives have a specified internal transfer rate of 3.4 to 5.9 MB/sec. It appears that the SCSI interfaces are efficient at handling cumulative data streams at a rate approaching their specified maximum.
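A rough check using the drives' specified maximum internal rate is consistent with this: three 1.05 GB drives streaming at their peak of 5.9 MB/sec place at most about 3 × 5.9 ≈ 17.7 MB/sec on one bus, still below the 20 MB/sec fast wide SCSI limit.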
Figure 4. Performance indicators as a function of number of drives per SCSI interface.
The recent addition of twelve 2.1 GB drives to the SSA allowed us to measure the effect of disk characteristics on retrieval performance. The specifications for the 2.1 GB drives suggest an improvement over the 1.05 GB drives in almost every respect[4]. Notably, internal transfer rates are 6.2 to 9 MB/sec as compared with 3.4 to 5.9 MB/sec for the 1.05 GB drives, and average latency is 4.17 ms as compared with 5.54 ms for the 1.05 GB drives. Figure 5 shows that the faster disks improve all performance indicators more than any other configuration factor studied.
Figure 5. The effect of drive characteristics on performance.
Figure 6. The effect on performance of one failed disk in a RAID 5 volume.
During the study, one of the disks in a 10-disk RAID 5 test volume failed, giving us the opportunity to measure the effect of a failed drive on performance. When one disk fails, the system must reconstruct the missing data in each stripe from the data and parity blocks on the surviving disks. Figure 6 shows the effect of this overhead on the three performance indicators. Average Retrieval Time and 2 KB Read Time increase to values comparable to those for the fourth zone of a 12-disk RAID 5 volume. Average Incremental Time is affected most and is longer than for any other configuration in the study.
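The reconstruction overhead stems from RAID 5's XOR parity: a block on the failed disk is rebuilt by XORing the corresponding data and parity blocks from the surviving disks in the same stripe. The minimal C sketch below illustrates the idea with fixed-size blocks; the block size, names, and layout are illustrative and not specific to the SSA.

    /* Rebuild a block of a failed RAID 5 member by XORing the corresponding
       blocks (data and parity) of the surviving members in the same stripe.
       Illustrative sketch; block size and layout are not specific to the SSA. */
    #include <stdio.h>
    #include <string.h>

    #define BLOCK_SIZE 8192              /* illustrative stripe-unit size in bytes */

    static void raid5_rebuild_block(unsigned char *out,
                                    unsigned char **surviving, int nsurviving)
    {
        int i, b;

        memset(out, 0, BLOCK_SIZE);
        for (i = 0; i < nsurviving; i++)
            for (b = 0; b < BLOCK_SIZE; b++)
                out[b] ^= surviving[i][b];   /* XOR accumulates the missing block */
    }

    int main(void)
    {
        static unsigned char d0[BLOCK_SIZE], d1[BLOCK_SIZE], p[BLOCK_SIZE];
        static unsigned char rebuilt[BLOCK_SIZE];
        unsigned char *survivors[2];
        int b;

        /* Two data blocks of one stripe and their parity block. */
        for (b = 0; b < BLOCK_SIZE; b++) {
            d0[b] = (unsigned char)b;
            d1[b] = (unsigned char)(b * 7);
            p[b]  = (unsigned char)(d0[b] ^ d1[b]);
        }
        /* Suppose the disk holding d1 fails: rebuild it from d0 and parity. */
        survivors[0] = d0;
        survivors[1] = p;
        raid5_rebuild_block(rebuilt, survivors, 2);
        printf("rebuilt block %s the original\n",
               memcmp(rebuilt, d1, BLOCK_SIZE) == 0 ? "matches" : "differs from");
        return 0;
    }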
6. TOTAL RETRIEVAL TIMES
For the small files used in this study, the three performance indicators differ among configurations by just a few hundredths of a second. Although the results are interesting, the practical effect of configuration choice on application-level performance could be minimal. Figures 7 and 8 illustrate the differences by showing the total time to retrieve up to ten average-size files for a few configurations of the SSA. In both figures, the order of the legend items is the same as the order of the plotted lines at the right side of the chart. Times are shown for reading the files sequentially and for reading the files concurrently. Figure 7 shows times for four configurations using 1.05 GB disks. Figure 8 shows times for two configurations using 2.1 GB disks. For reference, the 6-disk RAID 0+1 configuration using 1.05 GB disks is shown with dotted lines in both figures. The most important thing to notice is that concurrent retrieval is always faster than sequential retrieval. For any number of files greater than two, even the slowest configuration for concurrent reads (six 1.05 GB drives in a RAID 5 volume) is faster than the fastest configuration for sequential reads (six 2.1 GB drives in a RAID 0+1 volume).
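A rough model built from the definitions in Section 4 makes this crossover plausible. Reading N average-size files one after another takes approximately

    T_sequential(N) = N × (Average Retrieval Time),

while N processes each reading one file concurrently finish in roughly

    T_concurrent(N) = (Average Retrieval Time) + (N − 1) × (Average Incremental Time).

Because the measured Average Incremental Times are only a small fraction of the Average Retrieval Times, the concurrent totals grow much more slowly with N than the sequential totals.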
Figure 7. Total time to retrieve average size files, reading sequentially and concurrently.
Figure 8. Total time to retrieve average size files, reading sequentially and concurrently.
7. CONCLUSIONS
The primary advantage of implementing a RAID storage system is fault tolerance. Using RAID does not automatically guarantee fast file retrieval, especially for small to medium-sized files. At the level of the RAID subsystem, fast disk drives and disk mirroring do the most to improve performance. Disk mirroring trades cost for performance because it requires twice as much media as non-fault-tolerant storage. Although the larger, faster drives used in this study are more expensive per drive than the smaller drives, the cost per byte is actually less. At the application level, performance can be enhanced by retrieving multiple files concurrently whenever possible; in this case, programming effort is traded for performance. If neither disk mirroring, faster drives, nor additional programming effort is an option, good performance can still be achieved with RAID 5 volumes by selecting eight or more drives per volume, distributing the volumes across SCSI interfaces, and positioning heavily used data at the beginning of each volume.
8. REFERENCES
1. Dye, Mike, Sun Microsystems, Inc. Personal communication, August, 1995.
2. Hauser SE, et al., "Is the bang worth the buck?: a RAID performance study," Proceedings of the Fifth NASA Goddard Conference on Mass Storage Systems and Technologies, College Park, MD, to be published September 1996.
3. Sun Microsystems, Inc., Technical Product Marketing, "Configuration Planning for Sun Servers," Third edition, January, 1994.
4. Seagate Technology, Inc., http://www.seagate.com/disc/prodmatrix.shtml, Technical specifications for drive models ST-31200W and ST-32550W, 1996.
















