An Application-Level Technique For Faster Transmission Of Large Images On The Internet
R. Long
L. E. Berman
L. Neve
G. Roy
G. R. Thoma
Proceedings of the SPIE: Multimedia Computing and Networking 1995
Vol. 2417
February 6-8, 1995
San Jose, CA.
An application-level technique for improving the transmission rate of large files is described in this paper. Such techniques are important in areas such as telemedicine, where near-real-time delivery of large files such as digital images is a goal: end users may include specialists whose time is scarce and expensive, and timely access to the data may be necessary for effective clinical treatment. Faster delivery is also an enabling technology for accessing remote medical archives.
In conventional TCP/IP transmission, data to be transmitted is sent down one logical communication channel. Our technique divides the data into segments; each segment is sent down its own channel, and the segments are reassembled into a copy of the original data at the receiving end. This technique has been implemented and tested in a client-server program using Berkeley Unix sockets, multiple independent processes for channel control, and interprocess communication techniques to guarantee the receipt and correct reassembly of the transmitted data. Performance measurements have been made on several hundred Internet transmissions (including Arizona-to-Maryland transmissions) of 5-megabyte cervical x-ray images. Transmission time as a function of number of channels has been recorded, and a 3-fold improvement in transmission rate has been observed.
1. INTRODUCTION
1.1 Overview
The Lister Hill National Center for Biomedical Communications, a research and development division of the National Library of Medicine (NLM), is actively working on the development of prototype medical image archives using data of various imaging modalities. Data being archived includes xrays from the National Health and Nutrition Examination Surveys (NHANES), as well as CT, MRI, and digitized cryogenic section photography of cadavers from the Visible Human project. A goal is to make this data available on a variety of storage platforms that are mapped to the user's access time requirements[1]. Although images can be retrieved from the storage media with relative speed, the volume of data and competing network traffic remain as the main bottlenecks for image transmission over Wide Area Networks (WAN), in particular the Internet.
While other researchers have looked at various methods for bulk transfer rates such as the NETBLT[2] and VMTP[3] protocols, or re-transmission free schemes[4], we have investigated an application level technique for faster image transmission. At this level of integration, the software should be transportable to different client/server applications running on a variety of different platform/operating system combinations that support the Berkeley socket abstraction, shared memory, signals, and multi-processing.
1.2 Motivation
The needs of two specific in-house projects motivate our work. First, the DXPNET project[5] has taken as a goal the development of a client/server application for reading the NHANES II digitized xrays, archived on optical jukebox storage at NLM, with remotely-located workstations communicating over the Internet. Initial work on this project included several small-scale experiments to estimate how much time it would take to transmit large files across widely-separated points on the Internet. During the month of September 1992, 1-MByte buffers were transmitted between NLM and the University of Arizona. Two transmission methods were used: the standard FTP method of file transfer, and our own client/server application using one socket pair. The median FTP rate was found to be 18.3 KBytes/s, and the median socket rate was 16.1 KBytes/s[6] for transfers done every 15 minutes for each day during the month; this translates to a five- or ten-minute wait for cervical and lumbar spine images, respectively, at the remote site. This would be prohibitive, since we have found that radiologists require less than a minute to read a cervical film in the conventional manner. An in-house study determined a median time of 40 seconds (not including time to find the film, put the film on the light-box, and return the film to its envelope) to do the reading.
The second project requiring Internet data delivery is a prototype public access archive of the NHANES xrays and related text information. The text information is NHANES collateral data which provides a physical, medical, and socioeconomic description of the subject persons who were x-rayed. The Internet rates and methods used in our 1992 experiment are too slow to meet the demands of a remote user interested in interactive querying and image retrieval at full resolution.
Beyond the needs of these specific projects, though, there are other needs in the imaging community for faster Internet image transmission. Examples at NLM include the need to provide access to the recently acquired, data-intensive Visible Human images, and the desire to provide wide access to historic medical documents digitized at 300 dots per inch and 24 bits per pixel, resulting in large image files. External communities potentially benefitting from faster image transmission include all those providing large images that are in high demand, with NASA's Hubble Space Telescope images being a recent example.
1.3 Multisocket concept
The "multisocket" concept of file transmission is an experimental technique which attempts to use Internet bandwidth more efficiently by making more data available for transmission within a fixed time interval than is done with conventional "single-socket" transmission. In simple terms, the method is: break the data into segments; for each segment, create a separate logical communications stream connected to the destination; use multi-tasking to send the segments as rapidly as possible; and, at the destination, reconstruct the segments into their original form.
2. METHOD
2.1 Description of Algorithm
A "socket" is a communications end-point. A sending socket and a receiving socket together form a socket-pair which establishes a communications link. In this paper, the use of a single socket-pair is referred to as "single-socket" communications; the use of multiple socket-pairs to transmit data is referred to as "multisocket" communications. The link established by a socket-pair is a "logical connection" for transmitting data in a "stream." Single-socket communications sends one data stream down one logical connection. Multisocket communications sends multiple data streams down multiple logical connections.
The operation of the multisocket algorithm can be understood by contrasting it to conventional single-socket data transmission. Figures 1 and 2 depict the relationships between the host computer, processes used, and the logical and physical connections involved.
In single-socket transmission (Figure 1) there is one principal server process which waits in passive mode for requests from a client to establish a logical connection. Once the connection is established, data may be sent between the client and server bidirectionally. (In actual implementations, the server may create a secondary "child" process to handle the communication with any particular client, while the "parent" server process waits for additional client connections. For simplicity in the diagrams, we have omitted showing this.) Once the logical connection is established, the transmission may be understood as occurring down a single, logical stream which exits the server's local area network and enters the client's local area network. The physical connections joining the LANs to the Internet are, for universities and large Government installations, still typically T1 lines, even though the Internet now has a T3 backbone. Hence, these physical connections represent a rate-limiting factor in transmission across today's Internet.
In contrast to the single-socket transfer of data through one logical connection, the multisocket server divides the data into segments of approximately equal size. For each segment, a dedicated Unix server child process is created; each server child process establishes an independent logical connection to a client socket; once a connection is established for any particular data segment, the client creates an independent Unix child process to read the segment of data from the connection. The operating system switches between processes with a time-sharing algorithm, just as if they were unrelated programs being executed by different users. Figure 2 is a diagram of the interacting server child processes and client child processes. These child processes are created by main server and client processes not shown in the figure.
In a manner analogous to FTP[3], the multisocket algorithm incorporates the concept of control ports and data ports. Initially, the main server process S enters a passive listen state to wait for requests. S listens on a port referred to as the Server Control Port (SCP), which is selected by the multisocket application user in a configurable resource file. When the main client C is invoked, it first reads in its run-time configuration from the client resource file; next, C creates a child process C1 which immediately enters a passive listen state to wait for incoming image data from the server. C1 listens on a port designated the Client Data Port (CDP), which is configured in the resource file read in by C. All other ports used in the application, namely, the control port for the client and the data ports for the server, are selected automatically by the operating system as they are needed.
After the main client process C creates C1, it does a connect to the server on the SCP. This client/server control port connection is then used for a bidirectional exchange of information including a request by the client for the number of data streams (logical connections) to be used to send the image data, and a reply by the server specifying how many data streams will in fact be used. The server also sends the size of the image data in this initial exchange. Figure 3 shows the relationships between the child processes used to transmit the image, and the other processes used by the multisocket server and client; it also illustrates how the concept of control ports and data ports are used in the algorithm, and how unique logical connections are established. The child processes S(i) and C1(i) in Figure 3 are the same as the corresponding child processes in Figure 2.
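To make the control exchange concrete, the following is a minimal C sketch of the client side of the handshake. It is our own illustration, not the application's actual code: the function name control_handshake, the three-integer request/reply layout, and the error handling are assumptions.

/* Hypothetical sketch of the client's control-port handshake; not the
 * application's actual wire format.  The client connects to the Server
 * Control Port (SCP), asks for a number of data streams, and reads back
 * the number of streams granted and the image size.  Short reads/writes
 * and most error handling are ignored for brevity. */
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int control_handshake(const char *server_ip, int scp_port,
                      uint32_t streams_wanted,
                      uint32_t *streams_granted, uint32_t *image_size)
{
    struct sockaddr_in srv;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&srv, 0, sizeof(srv));
    srv.sin_family      = AF_INET;
    srv.sin_port        = htons((unsigned short)scp_port);
    srv.sin_addr.s_addr = inet_addr(server_ip);

    if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
        close(fd);
        return -1;
    }

    /* Request: the number of data streams the client would like to use. */
    uint32_t request = htonl(streams_wanted);
    write(fd, &request, sizeof(request));

    /* Reply: streams the server will actually use, then the image size. */
    uint32_t reply[2];
    read(fd, reply, sizeof(reply));
    *streams_granted = ntohl(reply[0]);
    *image_size      = ntohl(reply[1]);

    return fd;    /* control connection stays open for the send request */
}

The server side would mirror this exchange on the SCP before dividing the image into segments.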
After the size of the image to be received is known, C allocates enough shared memory to hold the image, communicates the handle of the shared memory to C1, and sends a request to the server to transmit the image. When the server main process S receives the request to transmit the file data, it logically divides the image data into N segments, corresponding to the N data streams to be used, and executes the following loop:
for i = 1 to N {
    create asynchronous child process S(i);
    S(i): create socket bound to OS-determined server data port;
    S(i): connect to client on client data port;
    S(i): send (i, data segment i) over the established connection;
    S(i): exit;
}
The algorithm does not impose any wait states on the client to finish receiving one data stream before the server sends another one. Note that for each S(i) a separate server data port is selected by the operating system. It is the uniqueness of these ports which guarantees the independence of the separate logical connections CON1, CON2, ..., between the client and the server. Note also that, with each segment of data, a tag i is sent which identifies the sequence order of the data segment within the entire transmitted image. It is this identifier which makes possible the reconstruction of the image data in the correct order by the client.
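A condensed C sketch of this server-side loop is given below. It is a simplified illustration under assumptions of our own (the image is held in one contiguous buffer, the client data address is already known, and partial writes and error handling are ignored); it is not the application's actual source.

/* Hypothetical sketch of the multisocket server loop: one child process
 * S(i) per data segment, each on its own OS-assigned data port.  A full
 * implementation would loop on write() for partial writes and check the
 * return values of fork(), socket(), and connect(). */
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

void send_segments(const char *image, long image_size, int nstreams,
                   const struct sockaddr_in *client_data_addr)
{
    long seg_size = (image_size + nstreams - 1) / nstreams;

    for (int i = 0; i < nstreams; i++) {
        if (fork() != 0)
            continue;                      /* parent: create the next S(i) */

        /* ---- child process S(i) ---- */
        long offset = (long)i * seg_size;
        long len    = image_size - offset < seg_size ? image_size - offset
                                                     : seg_size;

        /* No explicit bind(): the OS picks the server data port, so each
         * segment gets its own unique connection to the client.         */
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        connect(fd, (const struct sockaddr *)client_data_addr,
                sizeof(*client_data_addr));

        uint32_t tag = htonl((uint32_t)i); /* sequence tag for reassembly */
        write(fd, &tag, sizeof(tag));
        write(fd, image + offset, (size_t)len);

        close(fd);
        _exit(0);                          /* S(i) exits after its segment */
    }
    /* The parent would typically wait() for the S(i) children here. */
}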
The client process C1 receives on the client data port the requests to connect from the server S(i) child processes. When the connect request from process S(i) is received, C1 creates a child process C1(i) which reads the tag for segment i, then reads segment i itself into the proper location in shared memory; after reading segment i, C1(i) exits.
Note that the data segments may not be (and in fact typically are not) received in the correct order. However, the data segment tags make proper ordering of the segments essentially a simple task of setting a pointer in shared memory to the proper target location for the data. This is illustrated in Figure 4.
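For completeness, here is a corresponding C sketch of the client-side reception. Again this is our own simplified illustration: the shared memory region is treated as a single contiguous buffer, the segment size is assumed known from the control exchange, and the notification of the main client process C is omitted.

/* Hypothetical sketch of C1: accept one connection per incoming segment,
 * fork C1(i), read the tag, and copy the segment into its slot in the
 * shared-memory image buffer.  Error handling is omitted for brevity. */
#include <stdint.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/wait.h>

static long read_full(int fd, char *buf, long len)
{
    long got = 0;
    while (got < len) {
        long n = read(fd, buf + got, (size_t)(len - got));
        if (n <= 0) return -1;             /* connection closed or error */
        got += n;
    }
    return got;
}

void receive_segments(int listen_fd, char *shm_image, long image_size,
                      int nstreams)
{
    long seg_size = (image_size + nstreams - 1) / nstreams;

    for (int received = 0; received < nstreams; received++) {
        int conn = accept(listen_fd, NULL, NULL);

        if (fork() != 0) {                 /* C1: keep accepting connections */
            close(conn);
            continue;
        }

        /* ---- child process C1(i) ---- */
        uint32_t tag;
        read_full(conn, (char *)&tag, sizeof(tag));
        long i      = (long)ntohl(tag);    /* segment's position in the image */
        long offset = i * seg_size;
        long len    = image_size - offset < seg_size ? image_size - offset
                                                     : seg_size;

        read_full(conn, shm_image + offset, len);
        close(conn);
        _exit(0);
    }

    while (wait(NULL) > 0)                 /* C1 waits for all C1(i) to finish */
        ;
}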
2.2 Theory of Operation
Three theories were initially proposed to explain why transmission rate performance improvement is expected using the multisocket algorithm. They are each discussed below.
2.2.1 Better Use of Hardware Resources on Multiprocessor Machine
It was conjectured that dividing the communications task among individual processes would allow a multiprocessor machine to distribute the workload among its several CPUs under Solaris 2.3. If the limiting factor in the transmission rate is CPU power, the increased CPU power available in the multiprocessor environment might in itself effect a significant improvement in transmission rate. In the test results described later, however, significant transmission rate improvement is shown both when a multiprocessor machine is involved in the communications and also when both the client and server machines are single-processor. Hence, we discount this theory as a principal cause of the speed-up observed.
2.2.2 Better Use of Network Routing
It has also been conjectured that the multisocket algorithm takes better advantage of network resources, perhaps because some of the data streams use different routes across the Internet simultaneously. This theory has not been tested at this time. However, in 1995 NLM plans to establish a point-to-point satellite link to the University of California at San Francisco; running the multisocket algorithm across this link will force all of the data to traverse a known path, removing Internet routing from consideration. Transmission rate improvement on this link would demonstrate that the multisocket method can be successful independent of the effects of Internet routing, although it would not eliminate routing as a factor in its success on the Internet.
2.2.3 Exploitation of Latencies Resulting from TCP Data Acknowledgments
TCP uses a sliding window concept to provide a means for flow control and also to try to achieve efficient transmission of data. Flow control refers to the control exercised by the receiving end of the link over the rate at which data is transmitted: the data should not be sent more rapidly than the receiver can manage its buffers. The efficient data transmission provided by the sliding window means that data may be transmitted before previously-transmitted data has been acknowledged. (Comer[3] was used as a reference for this discussion of TCP.) All of the data within the window may be sent as rapidly as the CPU can operate. The data to the right of the window (see Figure 5) may not be sent until the window slides to include that data; this sliding only occurs when an acknowledgment is received for the first data currently in the window. Specifically, since TCP operates at the byte or "octet" level, the octet to the right of the window may not be sent until the first octet currently in the window is acknowledged.
In principle, if the window is optimally sized, it will deliver data to the network as fast as the network can accept the data and saturate the network with data packets to the full bandwidth capacity. If the window is sized too small, the octets within the window will be quickly sent to the network, then a delay will occur while TCP waits for an acknowledgement on the first octet sent out. Tuning the size of the TCP window may not be an option for the application programmer. However, the technique described in this paper is an alternative: create multiple TCP data streams and divide the data to be sent among the streams; at the receiving end, reconstruct the data from the individual streams. If the data streams are managed by independent processes in a multitasking OS environment, then, during some of the delay periods when one data stream is waiting for acknowledgments, another data stream may be actively transmitting, as illustrated in Figure 6.
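To make this rationale concrete, the standard bandwidth-delay bound gives a rough model (this framing is our own addition, not an analysis drawn from the measurements). A single TCP connection with window size $W$ over a path with round-trip time $RTT$ can sustain a throughput of at most

$$\text{throughput} \;\le\; \frac{W}{RTT},$$

so when $W/RTT$ falls well below the bottleneck capacity $B$ of the path, roughly $N \approx B \cdot RTT / W$ independent streams would be needed before the streams, taken together, could keep the path busy; beyond that point, additional streams mainly compete with one another.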
3. RESULTS
3.1 Test Set-up
The algorithm has been evaluated on several communications links and platforms. The data has been organized into eight test series, each corresponding to a data collection over a particular geographical communications link and hardware configuration (Table 1). Within each series, the data is grouped into runs, with each run corresponding to a particular transmission mode, such as FTP, 1-socket, or n-sockets, where n = 5, 10, 15, or 20. Test Series 1-3 and 7-8 were WAN experiments conducted over the Internet; Test Series 4-6 were conducted within a LAN or campus network environment. All tests were done under "normal" operating conditions; no attempts were made to inhibit concurrent work by other system users. Of the machines used, the optima and baskerville machines operate in a computer science department environment at the University of Arizona; the perdita machine is part of the NASA Lewis Research Center complex; the liberty and solitude machines are used for research and development at NLM; and the helix machine is used for research and development by the NIH Division of Computer Research and Technology (DCRT).
Test    Type of           Server                          Client                         Dates
series  test
------  ----------------  ------------------------------  -----------------------------  -------------------
1       Internet          optima, Sun Sparc 2,            liberty, Sun 670 MP,           5/12/94 - 6/9/94
                          SunOS 4.1.1, Tucson, AZ         SunOS 4.1.3, Bethesda, MD
2       Internet          baskerville, Sun Sparcserver    liberty, Sun 670 MP,           11/21/94 - 11/25/94
                          20, Solaris 2.3, Tucson, AZ     Solaris 2.3, Bethesda, MD
3       Internet          perdita, Sun Sparc,             liberty, Sun 670 MP,           11/25/94 - 11/27/94
                          SunOS 4.1.3, Cleveland, OH      Solaris 2.3, Bethesda, MD
4       host-host         liberty, Sun 670 MP,            liberty, Sun 670 MP,           12/1/94 - 12/3/94
                          Solaris 2.3, Bethesda, MD       Solaris 2.3, Bethesda, MD
5       local area        solitude, Sun Sparc 10,         liberty, Sun 670 MP,           12/3/94 - 12/5/94
        network           Solaris 2.3, Bethesda, MD       Solaris 2.3, Bethesda, MD
6       campus network    helix, SGI Challenge XL,        liberty, Sun 670 MP,           12/5/94 - 12/7/94
                          Irix 5.1, Bethesda, MD          Solaris 2.3, Bethesda, MD
7       Internet          baskerville, Sun Sparcserver    solitude, Sun Sparc 10,        12/7/94 - 12/8/94
                          20, Solaris 2.3, Tucson, AZ     Solaris 2.3, Bethesda, MD
8       Internet          optima, Sun Sparc 2,            solitude, Sun Sparc 10,        12/21/94 - 12/22/94
                          SunOS 4.1.1, Tucson, AZ         Solaris 2.3, Bethesda, MD

Table 1: Test series configurations: type of test, server, client, and test dates.
For each test series, data was collected at regular time intervals by using the Unix cron capability to invoke execution of the multisocket application or the FTP application. On each execution of the multisocket application, a separate client request, giving the desired number N of data streams, was sent to the server; the server responded by reading in the image file to be transmitted and sending the image data down an N-data-stream link; the client received the data in N streams and reconstructed the data to its original form. On each execution of FTP, a file was read and transmitted in binary mode, using Perl and Bourne shell scripts to control the interactive FTP commands. The data transmitted in all cases was a single cervical x-ray image of size 5,135,130 bytes.
For each transmission, the "transmission time" was logged. For the multisocket application this time was computed by the Unix time function under application control. The output from time was recorded by the client immediately before it sent its image request to the server, and again after the image had been received from the server and reconstructed. The difference between these two times was written to the log file. For FTP, the logged time is the time reported by the FTP program.
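A minimal C sketch of this style of application-level timing is shown below; the function and file names are our own, and the real application's log format is not reproduced here.

/* Hypothetical timing wrapper: record wall-clock time around the image
 * request, reception, and reconstruction, then append it to a log file. */
#include <stdio.h>
#include <time.h>

void log_transmission_time(const char *logfile,
                           void (*request_and_receive_image)(void))
{
    time_t start = time(NULL);      /* recorded before the image request */
    request_and_receive_image();    /* request, receive N streams, reassemble */
    time_t end   = time(NULL);      /* recorded after reconstruction */

    FILE *fp = fopen(logfile, "a");
    if (fp != NULL) {
        fprintf(fp, "transmission time: %ld s\n", (long)(end - start));
        fclose(fp);
    }
}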
A summary of the collected data results is shown in Table 2. Times are recorded transmission times in seconds.
Test     # data    # samp    Min     Max     Median   Mean    Standard
series   streams   points    time    time    time     time    deviation
------   -------   ------    ----    ----    ------   ----    ---------
  1         1        988      174     960     251      280        90
  1         5        481       63     215      99      107        29   *
  1        10         13       54     382      97      104        36
  1        15        191       55     233      88       94        26
  1        20        263       52     258      83       90        29
  2       FTP        164      160     300     180      187        23   *
  2         1        145      145     411     167      174        28
  2         5        138       41      85      56       57        10   *
  3       FTP         25       76     170     110      117        25   *
  3         1         21      113     234     155      155        32
  3         5         18       48      96      70       69        14   *
  3        10          7       54      88      68       69        12
  4       FTP        195        2       7       2        2         1
  4         1        162        2       6       2        2        <1
  4         5        160        4       8       5        4        <1
  4        10        157        5      10       7        6         1
  5       FTP        185        4       7       4        4        <1
  5         1        188        5       9       6        5        <1
  5         5        188        7      12       7        7        <1
  5        10        188        7      13       8        8         1
  6       FTP        208        5      81      11       17        15
  6         1         91        6      37       9        9         4
  6         5         91        8     180      15       11        23
  6        10         91       11      35      16       14         6
  7       FTP        121      140     390     180      185        34   *
  7         1        104      137     233     162      166        18
  7         5        102       40     286      57       53        24   *
  7        10         81       39     195      50       47        17
  8       FTP         40      210     460     240      247        41   *
  8         1         40      208     441     233      243        37
  8         5         40       61     149      83       92        23   *
  8        10         40       49     121      71       74        16

Table 2: Test result summary. Internet FTP and 5-socket cases are marked with *.
For each of the Internet test series 1-3 and 7-8, 95% confidence intervals for the mean transmission times were computed, using standard methods. For the cases where n was large (n >= 30), the Z-statistic formula

$$\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \;<\; \mu \;<\; \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \qquad (1)$$

was used, where

$$\bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j \qquad (2)$$

is the sample mean of the n recorded transmission times and $z_{\alpha/2} = 1.96$ for a 95% interval; also in these cases $\sigma$ was estimated by S, the sample standard deviation. Sample means and sample standard deviations are those given in Table 2. For the cases where n < 30, the corresponding formula for computing confidence intervals with the t-statistic was used. The analysis and methodology follow that developed in Walpole[7].
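As a worked check of formula (1) (an illustrative calculation we add here, using values from Tables 2 and 3), take the 5-socket run of Test Series 2, with $\bar{x} = 57$ s, $S = 10$ s, and $n = 138$:

$$57 - 1.96\,\frac{10}{\sqrt{138}} \;<\; \mu \;<\; 57 + 1.96\,\frac{10}{\sqrt{138}}, \qquad\text{i.e.}\qquad 55.3\ \mathrm{s} \;<\; \mu \;<\; 58.7\ \mathrm{s},$$

which agrees with the interval 55 < mu < 59 reported in Table 3.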
                Number of        95% confidence interval for
Test series     data streams     mean transmission time mu (sec)
-----------     ------------     -------------------------------
     1                1              274 < mu < 286
     1                5              104 < mu < 110   *
     1               10               99 < mu < 109
     1               15               90 < mu < 98
     1               20               85 < mu < 95
     2              FTP              185 < mu < 191   *
     2                1              169 < mu < 179
     2                5               55 < mu < 59    *
     3              FTP              106 < mu < 128   *
     3                1              140 < mu < 170
     3                5               59 < mu < 79    *
     3               10               57 < mu < 81
     7              FTP              178 < mu < 192   *
     7                1              162 < mu < 170
     7                5               48 < mu < 58    *
     7               10               43 < mu < 51
     8              FTP              240 < mu < 254   *
     8                1              237 < mu < 249
     8                5               88 < mu < 96    *
     8               10               71 < mu < 77

Table 3: 95% confidence intervals for mean transmission times for Internet tests. FTP and 5-socket results are marked with *.
4. ANALYSIS
4.1 Local Transmission Test Series
4.1.1 Test Series 4 -- Client: liberty, Bethesda, MD -- Server: liberty, Bethesda, MD
This host-host transmission was done to understand the performance of the algorithm with network delays reduced to the extreme minimum. Table 2 indicates that no improvement in transmission rate was observed in comparing multisocket transmission to FTP or 1-socket transmission. In fact, it appears that use of the multisocket method is somewhat slower in this environment, perhaps due to system overhead spent in starting up and controlling the multiple processes of the multisocket application.
4.1.2 Test Series 5 -- Client: liberty, Bethesda, MD -- Server: solitude, Bethesda, MD
This is a transmission between two machines on the same Ethernet. Again, according to Table 2, there is no observed performance improvement due to multisocket transmission.
4.1.3 Test Series 6 -- Client: liberty, Bethesda, MD -- Server: helix, Bethesda, MD
This is a transmission between two machines across the National Institutes of Health campus. Table 2 indicates no observed performance improvement due to multisocket transmission.
4.2 Internet test series
4.2.1 Test Series 1 -- Client: liberty, Bethesda, MD -- Server: optima, Tucson, AZ
This was the longest of the test series, extending over a period of about 30 days. The test plan called for 1-socket data to be collected at 15-minute intervals over each 24-hour period, while 5-, 10-, 15-, and 20-socket data was to be collected at 15-minute intervals according to a uniform probability distribution; that is, at each 15-minute interval, one of the 5-, 10-, 15-, or 20-socket transmissions was to occur, each with equal probability of occurrence. The actual data collection was affected by occasional machine outages, as well as some communications interruptions of indeterminate cause which created gaps in the data. Also, early in the data collection it became necessary to shift collection to an 8 a.m. to 8 p.m. schedule to avoid contention with other heavy users of the client system during the night. A histogram of the 1- and 5-socket sampling is shown in Figure 7 and illustrates the obvious bias toward collection in the 8 a.m. to 8 p.m. period.
Test Series 1 was the first test conducted. In later tests (i.e. in all other Internet tests) FTP data was collected to use as a baseline for comparing the performance of the multisocket algorithm. For Test Series 1, FTP data was not collected, and we use the performance of the 1-socket collection as a baseline for expected performance over this link using conventional file transfer techniques.
Referring to Table 3, the collected data reflects an improvement in mean transmission rate using 5 sockets as compared to 1 socket, by a ratio of about 2.6:1. (Sample mean ratios were used to compute the transmission rate improvement ratios.) Figure 8 illustrates this improvement with a histogram plot of the 1-socket and the 5-socket transmission times. Table 3 indicates no observed improvement in using 10 as opposed to 5 sockets. Similarly, there is some observed improvement in going from 5 or 10 sockets to 15, but no observed improvement in going from 15 to 20. It may be conjectured that improvements would be observed at each new level if a large enough data set were available to make the confidence intervals very small. The maximum improvement in transmission rate observed in this test series was 3.0:1, using the 15- and 1-socket sample means to compute the ratio.
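As a worked note on how these ratios follow from the Table 2 sample means (our arithmetic, shown for clarity):

$$280/107 \approx 2.6 \;\;\text{(1 socket vs. 5 sockets)}, \qquad 280/94 \approx 3.0 \;\;\text{(1 socket vs. 15 sockets)}.$$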
4.2.2 Test Series 2 -- Client: liberty, Bethesda, MD -- Server: baskerville, Tucson, AZ
This test differs from Test Series 1 in the following ways: the client and server had different operating systems and were different types of machines (the server's geographical location, however, remained unchanged), and a new data collection methodology was employed: at each 15-minute interval, one FTP transmission was made, followed by a 1-socket transmission, followed by a 5-socket transmission. A goal was to have the transmissions occur as close together in time as possible.
Table 3 indicates that the collected data shows an improvement in transmission rate using 5 sockets as compared to FTP by a ratio of about 3.3:1. The 1-socket data collection ran slightly faster than FTP.
4.2.3 Test Series 3 -- Client: liberty, Bethesda, MD -- Server: perdita, Cleveland, OH
The sampling was the same as that for Test Series 2, except that 10-socket data was collected in addition to the FTP, 1-, and 5-socket data. Note that the geography, machine type, and OS are different.
According to Table 3, 5-sockets ran faster than FTP by a ratio of 1.7:1. No improvement was observed in going to 10 sockets. It is of interest to note that FTP on this link ran significantly faster than the 1-socket transmissions during the test period.
4.2.4 Test Series 7 -- Client: solitude, Bethesda, MD -- Server: baskerville, Tucson, AZ
In this test the client was not a multiprocessor machine. The same data was collected as for Test Series 3.
According to Table 3, 5-sockets ran faster than FTP by a ratio of 3.5:1. No significant difference was observed using 10 as opposed to 5 sockets. Our 1-socket transmissions ran slightly faster than FTP in this test.
4.2.5 Test Series 8 -- Client: solitude, Bethesda, MD -- Server: optima, Tucson, AZ
In this test neither the client nor the server was a multiprocessor machine. The same data was collected as for Test Series 3. The data was collected at half-hour time points in this test. According to Table 3, 5-sockets ran faster than FTP by a ratio of 2.7:1. An improvement using 10 sockets as opposed to 5 was observed, with 10 sockets running faster than FTP by a ratio of 3.3:1.
4.3 Issues
4.3.1 Test Results
It should be noted that the results presented show a central tendency in the data over the period of collection, not a transmission rate that the user can count on at any particular time. Among the factors affecting the transmission rate are (1) the transmission method used, (2) load on the client, (3) load on the server, (4) local area network traffic load, and (5) Internet traffic load. Others could be listed, but these serve to make the point that some of the factors (2-5) are time-dependent. (Within these, some of the time dependencies are roughly predictable. For example, a machine in a university environment is typically more heavily loaded during the week than on a weekend; likewise, LAN and Internet traffic may tend to be heavier then. There are also daily cycles, with faster transmission rates observed during the night.)
It should also be noted that there are unpredictable peaks in the data from time to time, which can be expected in the Internet environment and can be conjectured to be the result of brief but unusually heavy loads on the client or server, or the result of network congestion.
4.3.2 Software considerations
4.3.2.1 Shared memory use
The current implementation of the algorithm makes use of shared memory by the client to hold an entire copy of the received file data. Ideally, the client system would provide the capability to allocate one segment of shared memory of the required size; under the actual operating systems used, however, the amount of shared memory which could be allocated in a single segment was limited to about 10^6 bytes, meaning that multiple shared memory segments had to be allocated to hold the file data. The data segments transmitted will not in general match the size of the shared memory segments, meaning that a scheme must be implemented to efficiently map transmitted data segments to shared memory segments of the size allowed by the operating system. We handled this problem by writing utility routines which encapsulate the complexity of treating the shared memory segments as one long logical shared memory area, into which the received data segments are properly placed.
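The following C sketch illustrates the kind of mapping utility involved; it is our own simplified version, with an assumed per-segment limit and no allocation or error handling (each segment is assumed to have already been created with shmget() and attached with shmat()).

/* Hypothetical helper: treat K fixed-size System V shared memory segments
 * as one long logical buffer and copy received data to a logical offset. */
#include <string.h>

#define SHM_SEG_SIZE (1000000L)   /* assumed ~10^6-byte per-segment limit */

struct shm_image {
    int   nsegs;
    char *seg[64];                /* addresses returned by shmat(); allocation not shown */
};

/* Copy len bytes of received data to logical offset 'offset' of the image,
 * splitting the copy across shared memory segment boundaries as needed. */
void shm_image_write(struct shm_image *img, long offset,
                     const char *data, long len)
{
    while (len > 0) {
        int  s      = (int)(offset / SHM_SEG_SIZE);   /* which segment      */
        long within = offset % SHM_SEG_SIZE;          /* offset inside it   */
        long chunk  = SHM_SEG_SIZE - within;          /* room left in segment */
        if (chunk > len)
            chunk = len;

        memcpy(img->seg[s] + within, data, (size_t)chunk);

        offset += chunk;
        data   += chunk;
        len    -= chunk;
    }
}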
4.3.2.2 C1 Process
The client C1 process is an artifice which contributes unnecessary complexity to the implementation. The multisocket client code was developed from code for a pre-existing single-socket application which retrieves and displays x-ray images from a remote archive on the Internet. One of the characteristics of this code is that it allocates space for the received image in conventional memory prior to requesting the image transmission. This code became the basis for the Main Client Process (C) code. When the multisocket application was first implemented, it was obvious that creating child processes directly from C was extremely inefficient, since each new process consumed the same memory resources as C; i.e., a 5-socket transmission of a 5 MByte image would have child processes consuming 5 × 5 = 25 MBytes of system memory. A solution was to create a separate, small program C1 to manage the client child processes. C creates C1 with a fork/exec command, which starts C1 as a separate process consuming only a small amount of memory. The client child processes which handle the file data transmission are then created by C1, and are each the size of C1, rather than the size of C. Further simplification is anticipated.
4.3.2.3 Interprocess Communication
Communication is necessary between C and C1 at two points: (1) when C receives the image size from the server, it passes the size to C1 so that C1 can allocate the number of shared memory segments necessary to hold the file data; and (2) when C1 detects that all the child processes have completed receiving their data segments, it notifies C that the file data has been completely received and is available in shared memory. This interprocess communication is achieved by having the process C allocate a small segment of shared memory, called a Control Segment; the handle to this Control Segment is passed to C1 as a command line parameter at the time of C1 execution; hence both C and C1 have access to the Control Segment. To use the Control Segment C and C1 set values in the segment as needed, then notify the other process that the Control Segment has been changed by sending a user-defined signal. In future work, this may be changed to using the more standard method of setting a semaphore.
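A C sketch of this style of interprocess communication is given below. It is our own illustration, not the application's code: the control-segment layout, the program name multisock_c1, and the signal choreography are assumptions, and the known races of this simple signal scheme are noted in the comments.

/* Hypothetical sketch of the C <-> C1 control-segment protocol: C creates
 * a small shared memory segment, passes its id to C1 on the command line,
 * and the two processes signal each other with SIGUSR1 after updating it. */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

struct control_seg {
    long  image_size;     /* set by C once the server reports the size  */
    int   done;           /* set by C1 when all segments are in memory  */
    pid_t peer_pid;       /* pid of the process to notify with SIGUSR1  */
};

static volatile sig_atomic_t got_signal = 0;
static void on_sigusr1(int sig) { (void)sig; got_signal = 1; }

int main(void)
{
    signal(SIGUSR1, on_sigusr1);

    /* C: allocate the control segment and start C1, passing the handle. */
    int shmid = shmget(IPC_PRIVATE, sizeof(struct control_seg), IPC_CREAT | 0600);
    struct control_seg *ctl = (struct control_seg *)shmat(shmid, NULL, 0);
    ctl->peer_pid = getpid();

    pid_t c1 = fork();
    if (c1 == 0) {
        /* C1 is a separate small program started with exec(); only the
         * passing of the shared memory handle is shown here.           */
        char arg[32];
        snprintf(arg, sizeof(arg), "%d", shmid);
        execlp("multisock_c1", "multisock_c1", arg, (char *)NULL);
        _exit(1);
    }

    /* ... C fills in ctl->image_size, then notifies C1 (timing races are
     * ignored in this sketch) ... */
    kill(c1, SIGUSR1);

    /* ... later, C1 sets ctl->done = 1 and signals C; C waits.  A
     * production version would use sigsuspend() or, as noted above, a
     * semaphore instead of this check-then-pause loop. */
    while (!got_signal)
        pause();

    shmdt(ctl);
    shmctl(shmid, IPC_RMID, NULL);
    return 0;
}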
4.3.2.4 Portability
This algorithm was originally developed under SunOS 4.1.3 using signals and signal handlers, shared memory segments, spawned processes, and the Berkeley sockets abstraction, in the C programming language, using the Sun compiler. It has been ported to Solaris 2.3 and compiled with the GNU C compiler with minor modifications to the client and server. The SunOS 4.1.3 version of the server was ported to a CONVEX computer running UNIX, with one minor change to a signal definition. The Solaris 2.3 version of the server was ported to an SGI Challenge XL running IRIX 5.1 with no code modifications. We would expect minor changes to the code for it to be ported to other UNIX platforms.
4.3.3 Stability of Algorithm
There were a number of failure points observed in the data collection. Some failures with indeterminable causes are perhaps to be expected in the complex testing environment. However, stability and error recovery remain the areas of greatest concern in further algorithm development. Failure of an FTP transmission (which occurred from time to time in the tests) usually means that the user simply tries again. Failure of a multisocket transmission, however, may leave hung or defunct processes in the system process table of either or both of the client and server, network ports tied up by the system and unavailable for use, and unreleased shared memory segments. In the tests run to date, these problems were dealt with by manually cleaning up the system on the command line; the next stage of development will attempt to detect transmission errors and automate the job of cleaning up errant processes and orphaned segments of shared memory.
The main cause of failure in the tests done to date appears to be a consequence of conducting multisocket tests which use the same client data port too closely together in time. If a multisocket job on the client machine binds to port P as its data port, we have seen using the netstat command that, at the end of the job, there is an observable, significant amount of time before the port P is released for use by the next job. In one instance, on the liberty machine, this time was greater than four minutes. Trying to run a second multisocket job which binds to P sooner than P is released by the system will not succeed. We have experimentally found that test set-ups which produced many failures could have the failure rate drastically reduced by simply inserting a wait interval of a few minutes between the tests. In the early tests conducted, in particular Test Series 1, this was not done. Also, the tests conducted did not attempt to force reusability of ports by setting options in the setsockopt routine. This will be done in future tests.
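The setsockopt option in question is SO_REUSEADDR, which permits binding to a port whose earlier connections are still in the TCP TIME_WAIT state. A generic C sketch of its use (not the application's code) follows.

/* Allow the client data port to be re-bound while connections from a
 * previous run are still in TIME_WAIT. */
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

int bind_reusable(int port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    int on = 1;

    /* Must be set before bind() to have any effect. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons((unsigned short)port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;
    listen(fd, 5);
    return fd;
}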
4.3.4 Practicality of Algorithm
The current implementation is experimental, written to assess performance rather than to serve as an operational Internet file-retrieval system. For example, the user may not request a directory of files at a remote site, pick one, and have it transmitted using the multisocket method. For performance data collection purposes, however, many practical features are provided: using resource files read in at run time, the user may configure the client and server IP addresses, ports, test file name and path, and other data. The entire multisocket setup may be transported to Solaris 2.3-binary-compatible machines with six files: four executables and two configuration files.
The algorithm implementation is complex; some of the complexity is due to implementation using legacy code and some due to the normal inefficiencies in producing an initial implementation. Other complex aspects, such as the use of multiple processes communicating through shared memory, are more deeply tied to the algorithm concept itself, and may not be easily simplified. If the same performance improvements can be achieved by efficient tuning of the TCP window, and if that capability becomes available to the application programmer, a simple alternative to multisocket transmission would become available.
4.3.5 Effect of the Algorithm on Other Users
The creation of multiple processes for data transmission may be expected to have an impact on other users of the client and server machines if the machine loads are heavy relative to the CPU power available; the effect on other users on the Internet is less clear. Presumably, transferring a fixed amount of data more efficiently can only have a positive impact on overall Internet usage; a side effect of more efficient transmission, however, may be that more large transmissions will be made, and the efficiency gains will eventually be lost. The broader question of how to use bandwidth efficiently, and what "efficiency" means for the Internet community at large, is an area for future research.
5. CONCLUSION
Based on analysis of the collected data we conclude that the multisocket approach has a performance improvement over FTP on the tested links, with an improvement of more than 3:1 observed in some cases. In the link tested most extensively, there was an observed improvement in using 15 as opposed to 5 sockets. Increasing the number of sockets from 5 to 10, or from 15 to 20, did not yield speed improvement which was observable at the 95% confidence level, with the exception of one test case, where 10 sockets gave clear improvement over 5 sockets. However, in all of the Internet test cases, improvement in transmission rate using 5 sockets as compared to FTP or 1 socket was observed at the 95% confidence level.
Tests were conducted transmitting host-to-host, transmitting within a local area network, and transmitting within a campus network. In none of these cases was multisocket improvement observed. We conclude from this that the performance improvement cannot be solely explained by local factors, but must be at least in part because of effects associated with use of the wide area network.
Three theories to explain the multisocket performance improvement were originally proposed: (1) the improvement is due to using a multiprocessor machine and OS in the link--since we have observed performance improvement in a link using only single processor machines, we discount this idea as the sole explanation for the performance improvement; (2) the improvement is due to better use of Internet routing resources; perhaps multiple Internet routes are being used simultaneously--a point-to-point satellite transmission is planned for 1995 which will not have an intervening Internet; this will not resolve the issue of whether routing is a factor in the multisocket performance gains on the Internet; however, if performance improvements are observed, it will demonstrate that the multisocket method results in improved transmission independent of the effects of Internet routing; (3) the improvement is due to using the time during which single-socket transmission is waiting for data acknowledgments to send more data with independent processes, each having their own independent data channel. If (3) is the principal reason for the performance improvement, the algorithm could have application to large-delay transmission in particular, such as transmission across geostationary satellite links.
The current implementation of the algorithm is experimental, but the core functions are in place. Implementation of error recovery logic remains to be done. Development of a practical Internet file retrieval tool is possible, but would also require implementation of a user interface and some simplification of the logic, on the client side in particular.
As use of the Internet and the need to transmit large files increase, it will become more important to use the available bandwidth efficiently. We expect that techniques such as the multisocket method discussed here, or alternative and simpler possibilities such as more efficient control of the TCP window, will assume increasing importance.
A summary of the results of multisocket performance versus FTP is given in Figures 9 and 10.
6. REFERENCES
1. L.E. Berman, R. Long, G.R. Thoma, "Challenges in Providing Access to Digitized Xrays over the Internet", Proceedings of the 23rd AIPR Workshop, Cosmos Club, Washington, DC, Oct. 12-14, 1994. Submitted for publication.
2. D.D. Clark, M.L. Lambert, L. Zhang, "NETBLT: A high throughput transport protocol", Frontiers in Computer Communications Technology, Sigcomm '87 Workshop, pp. 353-359.
3. D.E. Comer, Internetworking with TCP/IP, Vol. I, Prentice-Hall, Englewood Cliffs, NJ, 1991.
4. C.J. Turner, L.L. Peterson, "Image transfer: an end-to-end design", Sigcomm '92, Baltimore, MD, pp. 258-268, Aug. 1992.
5. G.R. Thoma, L.R. Long, L.E. Berman, "Design issues for a digital xray archive over Internet", In: W. Niblack, R.C. Jain, eds., Proceedings of SPIE: Storage and Retrieval for Image and Video Databases II, San Jose, CA, 1994, Vol. 2185, pp. 129-138.
6. R. Long, L.E. Berman, G.R. Thoma, "Design considerations for wide area distribution of digital x-ray images", In: R.G. Jost, ed., Proceedings of SPIE Medical Imaging '93: PACS Design and Evaluation, Newport Beach, CA, 1993, Vol. 1989, pp. 383-394.
7. R.E. Walpole, R.H. Myers, Probability and Statistics for Engineers and Scientists, Macmillan, New York, NY, 1972.









Figure 1. Single-socket transmission.
Figure 2. Multisocket transmission: interacting server and client child processes.
Figure 3. Control ports, data ports, and logical connections in the multisocket algorithm.
Figure 4. Reassembly of the transmitted data segments in client shared memory.
Figure 5. The TCP sliding window.
Figure 6. Overlap of multiple data streams during acknowledgment delays.
Figure 7. Histogram of 1- and 5-socket sample collection times, Test Series 1.
Figure 8. Histogram of 1-socket and 5-socket transmission times, Test Series 1.
Figure 9. Summary of multisocket performance versus FTP.
Figure 10. Summary of multisocket performance versus FTP.