Effects of Quantization Table Manipulation on JPEG Compression of Cervical Radiographs

L. E. Berman, R. Long, S. R. Pillemer
Society for Information Display International Symposium
May 18-20, 1993
Vol. XXIV, pp. 937-941


Abstract

Recent investigation into the Discrete Cosine Transform (DCT) for image compression has resulted in the international Joint Photographic Experts Group (JPEG) standard. We have investigated using the information derived from a study of the independent image bit-planes as a method for manipulating the JPEG quantization table. We present the results of this study and discuss the difficulty of finding an effective quantization table for a particular image class.

1.0 Introduction

Our research on image compression of a limited set of the National Health and Nutrition Examination Survey (NHANES) digitized cervical radiographs suggests that compression ratios in the range 30:1-40:1 can be used without significant signal degradation. This study uses the JPEG standard for compression.

The NHANES is conducted in part by the Centers for Disease Control/National Center for Health Statistics and collaborating agencies. This survey collects information on a representative sampling of the United States population. Relevant socio-economic data, blood chemistry, and various radiographs are maintained for each individual. The radiographs have been stored on film and are in the process of being digitized by a laser scanner for permanent storage. The cervical radiographs are scanned at 146 dpi with a 12-bit dynamic range and dimensions of 1463x1755 pixels (2 bytes/pixel).

The focus of our project, the Digital X-ray Prototype Network (DXPNET), is to build a system that will allow remote users, including radiologists and researchers, to download images across the Wide-Area Network Internet for interpretation. Both the images and the associated readings will be maintained in the National Library of Medicine's (NLM) optical jukebox archive. The total time for image delivery, from the NLM archive to a remote radiologist, must be under 15 seconds[THOMA].

The theoretical peak transfer rate on the Internet, 1.5 megabits/s at T1[CLAFFY], corresponds to about 26 seconds for an uncompressed cervical image transfer, which already exceeds the 15-second requirement. Coupling lower expected Internet transfer rates with the storage capacity of the archive provides the impetus for analyzing lossy data compression algorithms. This study is designed to answer the following questions:

  • Can the NHANES cervical spine images be compressed with JPEG such that there is no loss of detail in the cervical spine and surrounding regions?
  • How should the JPEG-specific parameters be altered to satisfy the previous requirement?
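The transfer-time figure quoted above can be checked with a few lines of arithmetic. The following Python sketch is illustrative only; it assumes the nominal T1 line rate of 1.544 megabits/s (the text rounds this to 1.5) and ignores protocol overhead.

```python
# Image geometry and link rate. The 1.544 Mbit/s T1 line rate is an assumption
# (the text quotes the rounded figure of 1.5 megabits/s).
WIDTH, HEIGHT = 1463, 1755      # pixels, from the scanner description
BYTES_PER_PIXEL = 2             # 12-bit samples stored in 16-bit words
T1_BITS_PER_SECOND = 1.544e6    # assumed nominal T1 line rate
DELIVERY_BUDGET_S = 15.0        # DXPNET delivery requirement

image_bits = WIDTH * HEIGHT * BYTES_PER_PIXEL * 8
transfer_time_s = image_bits / T1_BITS_PER_SECOND

# Smallest compression ratio that would fit the budget at the peak rate
min_ratio = transfer_time_s / DELIVERY_BUDGET_S

print(f"uncompressed size: {image_bits // 8} bytes")
print(f"transfer time at peak T1 rate: {transfer_time_s:.1f} s")
print(f"minimum compression ratio for the budget: {min_ratio:.2f}:1")
```

Even at the theoretical peak rate the uncompressed image misses the 15-second budget, and realistic Internet throughput is lower still, which is why much larger compression ratios are of interest.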

2.0 Method

Although the NHANES radiographs are not for clinical use, the data must be preserved since it is part of a national compendium. Therefore, the compressed images must be a valid interpretation of the originals. Lossless compression would be the best alternative if it could be used. However, the average cervical image entropy is not low enough to meet the demands of DXPNET (see Section 3.1, Entropy Measurements).

This caused us to shift our attention to lossy compression techniques and JPEG in particular. JPEG supports several different processing modes including lossless, lossy, and progressive transmission. In the lossy mode, JPEG can handle data that is either 8 or 12 bits wide. The first processing step breaks the image into a stream of 8x8 blocks of pixels and transforms these grayscale values into the frequency domain using a Forward DCT (FDCT)[WALLACE]:

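Equation 1:

\[ F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)u\pi}{16} \cos\frac{(2y+1)v\pi}{16}, \qquad C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & \text{otherwise} \end{cases} \]

where f(x,y) is the pixel value at position (x,y) of the block and F(u,v) is the coefficient at frequency indices (u,v).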

Transforming the pixel values into the frequency domain causes most of the energy to reside in the DC and low frequency terms. This occurs because pixel values do not vary much within such a small region, and in general it yields greater compression ratios. The output of the FDCT is a set of 64 basis-signal amplitudes. These amplitudes, or coefficients, are then uniformly quantized with a 64 element quantization table (QTABLE).
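The energy compaction described above can be illustrated with a direct, unoptimized Python sketch of the 8x8 FDCT. This is not the implementation used in the study; the test block is hypothetical, and the level shift that JPEG applies before the transform is omitted for brevity.

```python
import math

N = 8  # JPEG block size


def fdct_8x8(block):
    """Forward 2-D DCT of an 8x8 block given as a list of 8 rows of 8 numbers."""
    def c(k):
        # Normalization factor: 1/sqrt(2) for the zero-frequency term, else 1
        return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

    coeffs = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / 16)
                              * math.cos((2 * y + 1) * v * math.pi / 16))
            coeffs[u][v] = 0.25 * c(u) * c(v) * total
    return coeffs


# Hypothetical smooth block: a gentle ramp around gray level 2000
block = [[2000 + 2 * y for y in range(N)] for x in range(N)]
F = fdct_8x8(block)

total_energy = sum(F[u][v] ** 2 for u in range(N) for v in range(N))
dc_share = F[0][0] ** 2 / total_energy
print(f"DC coefficient: {F[0][0]:.1f}")
print(f"fraction of energy in the DC term: {dc_share:.6f}")
```

For a slowly varying block such as this one, nearly all of the energy lands in the DC term, which is the behavior the quantization step exploits.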

For 12-bit imagery, each element of the QTABLE can be in the range of 1 to 4095. This value specifies the scale factor, or step size, that is applied to the corresponding coefficient. Scaling the coefficients with the QTABLE is the greatest source of pixel reconstruction error in JPEG, but it also provides the greatest amount of compression. Each coefficient is divided by its step size and the result is rounded to the nearest integer (equation 2). Finally, the quantized coefficients are entropy encoded using either Huffman coding or arithmetic coding.

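Equation 2:

\[ F^{Q}(u,v) = \mathrm{round}\left( \frac{F(u,v)}{Q(u,v)} \right) \]

where Q(u,v) is the QTABLE step size for coefficient F(u,v).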
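A minimal Python sketch of the quantization step follows. The constant step size of 64 and the coefficient values are illustrative assumptions, not data from the study; the helper names `quantize` and `dequantize` are likewise hypothetical.

```python
import math


def quantize(coeffs, qtable):
    """Divide each coefficient by its step size and round to the nearest integer."""
    return [math.floor(f / q + 0.5) for f, q in zip(coeffs, qtable)]


def dequantize(levels, qtable):
    """Decoder side: multiply each quantized level by its step size."""
    return [level * q for level, q in zip(levels, qtable)]


STEP = 64
qtable = [STEP] * 64  # constant-valued QTABLE (assumed for illustration)

# Hypothetical coefficients for one block: a large DC term and small AC terms
coeffs = [16056.0, -18.2, 0.0, -1.9] + [0.3 * (-1) ** k for k in range(60)]

levels = quantize(coeffs, qtable)
restored = dequantize(levels, qtable)

max_error = max(abs(f - r) for f, r in zip(coeffs, restored))
nonzero = sum(1 for level in levels if level != 0)
print("first quantized levels:", levels[:4])
print("nonzero levels:", nonzero, "max coefficient error:", max_error)
```

Every AC coefficient smaller in magnitude than half the step size becomes zero, which is where the compression comes from, and no coefficient is reconstructed with an error larger than half the step size.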

The amount of compression is controlled by quantizing the coefficients resulting from the FDCT. Our first attempt at finding an appropriate QTABLE for the cervical image set looked at differences between the distribution of the noise and signal coefficients. This proved futile due to the high resolution of the images relative to the 8x8 DCT block size. That is, since the resolution of the image is so high, anatomical structure is spread among many 8x8 blocks. This results in blocks with low variance and shifts most of the energy into the lowest frequency component, known as the DC coefficient. The name DC is borrowed from the electrical term direct current. The remaining coefficients are all labelled AC.

In contrast to our study, others have tried modeling the global coefficient distribution to optimize the quantization step[LEE]. Earlier investigations by Pratt[PRATT] and Gibson[GIBSON] suggest that the DC coefficient follows either a Rayleigh or a Gaussian density, and that the AC coefficients are either Gaussian or Laplacian. Unfortunately, these techniques address global statistics (noise and signal together) and offer error minimization based on mathematical criteria. A limitation of coefficient modeling is that its effect on visual perception is difficult to predict.

Another alternative is to ascertain the coefficients and corresponding amplitudes that cause perceptible differences to a human observer. Lohscheller[LOHSCHELLER] has conducted empirical studies measuring the sensitivity of the human visual system. This study shows that the observer is more sensitive to changes in the low frequency DCT components than in the high frequency DCT components. This type of analysis has advantages over coefficient modeling since it exploits the strengths and weaknesses of the end-user's visual system. However, this technique does not use the physics of the imaging system. We did not explicitly use this technique, but it motivated us to use some subjective analysis for quantization.

Upon further reflection and collaboration we analyzed the randomness in each bit-plane of the digitized images. A subjective experiment was conducted to determine which bit-planes contained "relevant information". It was this final experiment that proved most useful in determining an effective method for quantizing the DCT coefficients.

3.0 Results

3.1 Entropy Measurements

Evidence to support our belief that lossless compression would not meet the demands of DXPNET was gathered by measuring the entropy of the cervical test set. We define p(a_i) as the probability that a pixel f(x,y) is equal to the gray level a_i.

Equation 3:

\[ H = -\sum_{i} p(a_i) \log_2 p(a_i) \]

Using equation 3, the measured entropy averaged over the image set is 10.82 bits/pixel. Since each pixel is stored in two bytes, on average we can expect a losslessly compressed image to have a compression ratio of 1.47:1. This compression level will not meet DXPNET's system objectives, and therefore rules out the use of lossless data compression.
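The entropy measurement and the compression ratio it implies can be sketched in Python as follows. The sample data are a toy example, not NHANES pixels, and the function names are hypothetical. The bound assumes a coder that treats pixels independently, which is the sense in which first-order entropy limits the ratio.

```python
import math
from collections import Counter


def first_order_entropy(pixels):
    """Entropy in bits/pixel of the gray-level histogram of `pixels`."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def lossless_ratio_bound(entropy_bits, stored_bits=16):
    """Compression ratio implied by an entropy figure for pixels stored in `stored_bits`."""
    return stored_bits / entropy_bits


# Toy data: four equally likely gray levels carry exactly 2 bits/pixel
sample = [0, 1, 2, 3] * 256
print(f"toy entropy: {first_order_entropy(sample):.2f} bits/pixel")

# Entropy reported for the cervical test set, with 2-byte pixel storage
print(f"implied ratio: {lossless_ratio_bound(10.82):.3f}:1")
```

With 16 stored bits per pixel and 10.82 bits of entropy, the implied ratio is a little under 1.5:1, in line with the figure quoted above.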

3.2 Bit-Plane Analysis

In seeking ways to effectively compress cervical radiographs, one method examined was lossless compression of individual bit-planes. It was expected that long runs of identical bits would be found in some of the planes, and application of a run-length encoder to the individual planes might give an acceptable compression ratio for the image as a whole.

In Table 1, the compression ratio for each bit-plane in one cervical image is given. The compression ratio, using Lempel-Ziv encoding, for each of the six low-order bit-planes (0-5) is 1:1. This indicates that half of the bit-planes do not compress and that with lossless bit-plane compression the compression ratio for this image can be at most 2:1. The failure of the low-order bit-planes to compress leads us to believe that they are mostly noise and make no significant contribution to the visual interpretation of the image.

Table 1:
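The per-plane measurement described above can be sketched in Python. This is an illustration under stated assumptions: `zlib` (DEFLATE, a Lempel-Ziv variant) stands in for the Lempel-Ziv coder used in the study, and the image is synthetic, with random noise in the six low-order bits and a smooth ramp in the six high-order bits.

```python
import random
import zlib


def bit_plane(pixels, k):
    """Return bit k (0 = least significant) of every pixel as a list of 0/1."""
    return [(p >> k) & 1 for p in pixels]


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, eight bits per byte."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def plane_compression_ratio(pixels, k):
    """Packed size of bit-plane k divided by its zlib-compressed size."""
    raw = pack_bits(bit_plane(pixels, k))
    return len(raw) / len(zlib.compress(raw, 9))


# Synthetic 12-bit image (an assumption, not NHANES data):
# smooth ramp in bits 6-11, uniform random noise in bits 0-5.
rng = random.Random(0)
width, height = 256, 64
pixels = [((x // 4) << 6) | rng.getrandbits(6)
          for _ in range(height) for x in range(width)]

ratios = {k: plane_compression_ratio(pixels, k) for k in range(12)}
for k, ratio in ratios.items():
    print(f"bit-plane {k:2d}: {ratio:.2f}:1")
```

On this synthetic input the noisy low-order planes do not compress at all, while the structured high-order planes compress heavily, which mirrors the pattern reported for the cervical image.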

An experiment was conducted to test this hypothesis. The test set of images consisted of 15 cervical radiographs, of which three were duplicates added to test reader consistency. Four readers were asked to view each bit-plane and answer the following question:

  • Does the bit-plane show structure which helps define or show characteristics of the vertebra?

The readers each had technical backgrounds and general familiarity with the visual characteristics of the NHANES cervical radiographs, although none had a medical background. A reference image of a cervical radiograph was provided to each reader for comparison purposes. Readers were asked to view each of the 6 low-order planes for all of the images in one sitting, and to view the 6 high-order planes in a second sitting on a different day. In each case, the planes were viewed in least to most significant order, and the test question was answered yes (1.0) or no (0.0).

Table 2 shows the numerical values for the answers averaged over the four readers. The scores range from 0.00, corresponding to the case of no reader finding vertebra structure in the bit-plane, to 1.00, corresponding to all four readers finding structure.

Table 2:

Although the test set does not represent a statistically significant portion of the NHANES cervical radiographs, the results suggest that the readers perceived the largest change in perceptible vertebra structure between bit-planes 5 and 6. This is consistent with the results in Table 1. The decreasing scores in planes 10 and 11 occur because these planes carry gross structural information, such as a head silhouette, and not the vertebra structure which was being tested for in this experiment.

Both the consistency of scoring of individual readers and the inter-reader consistency were examined. For individual readers, consistency was tested by presenting three duplicate images to the reader. For each reader, the 36 scores (3 images x 12 bit-planes) for the duplicates were compared to the scores for the three original images. For three of the readers, the scores on the duplicates and the originals were identical. For the fourth reader, 35 of the 36 scores on the duplicates matched the scores on the originals. Hence, the scoring consistency of the individual readers was very high.

For the inter-reader case, the overall proportion of agreement, alpha, was computed for each pair of readers on each bit-plane. This quantity is computed as the sum of the proportion of readings for which both readers answered yes, a, plus the proportion of readings for which both readers answered no, d[FLEISS]:

alpha = a + d

The results are summarized in Table 3. For example, readers 1 and 4, in bit-plane 10, agreed with a yes for 9 of the 15 images and agreed with a no 0 times, giving alpha = 9/15 + 0/15 = 0.60.
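The agreement measure can be written as a short Python function. The two score lists below are hypothetical; they are constructed only to reproduce the counts in the worked example (9 joint yes answers and 0 joint no answers out of 15), not taken from the study's data.

```python
def proportion_of_agreement(reader_a, reader_b):
    """Overall proportion of agreement, alpha = a + d, for two lists of 0/1 scores."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("score lists must be non-empty and of equal length")
    n = len(reader_a)
    both_yes = sum(1 for x, y in zip(reader_a, reader_b) if x == 1 and y == 1)
    both_no = sum(1 for x, y in zip(reader_a, reader_b) if x == 0 and y == 0)
    a = both_yes / n  # proportion of readings where both readers answered yes
    d = both_no / n   # proportion of readings where both readers answered no
    return a + d


# Hypothetical scores for 15 images on one bit-plane
reader_1 = [1] * 9 + [0] * 6
reader_4 = [1] * 15

alpha = proportion_of_agreement(reader_1, reader_4)
print(f"alpha = {alpha:.2f}")
```

Because alpha counts raw agreement only, it does not correct for agreement expected by chance; a reader who answers yes to everything will still agree with others wherever they also answer yes.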

Table 3:

Table 4:

Combining the results supports the hypothesis that bit-planes 0-4 contain no visual structure for the vertebra, and that bit-planes 5-7 represent a transition to the bit-planes with the most unambiguous vertebra structure (bit-planes 8-9). Bit-planes 10-11 carry only edge or "outline" information that may be difficult for readers to classify as vertebra information or not. These results are also consistent with analysis of laser scanners. Previous research shows that images that have been digitized with a laser scanner exhibit the characteristics of random noise in the lower 2-4 bits[LO].

Some tendencies of individual readers were observed in the results. For example, reader 4 had the strongest tendency to classify bit-planes as containing vertebra structure, and reader 3 had the strongest tendency to classify bit-planes as having no such structure. The disagreements in bit-planes 6 and 7 are due to reader 3's "no structure" responses, while the other readers tended to classify these planes as having structure. Conversely, the disagreement in bit-plane 11 is due to reader 4's tendency to find vertebra structure in this plane.

An additional check on these results was made by having one reading made by a rheumatologist who is accustomed to reading conventional film radiographs. For this additional reading, no vertebra structure was found until bit-plane 5.

4.0 Discussion

Initially we looked at images in a screening experiment to examine the quality of the images over a range of compression ratios. Intuitively, we expected that the images might show a threshold where the loss would first be detectable as compression ratios became higher. After that threshold we expected that the images would contain progressively less useful information to the human eye.

Using this approach, we found that the visually useful information did not follow a monotonic relationship with data compression. Instead, there were images with lower compression ratios that did not show as much useful information as images with higher compression ratios. This occurred because the images compared were derived from QTABLEs designed with different precepts. For example, we compared QTABLEs designed for DCT signal/noise studies and QTABLEs based on bit-plane information. This resulted in varying levels of compression ratios and several that were close in value.

Adding confusion to the experiment was the blocking effect characteristic of the DCT. In some cases, blocking would occur at some of the lower compression ratios, making it difficult to predict an effective threshold. At the highest levels of compression there was significant blocking, which prevents analysis of fine detail. This blocking effect appears as superimposed, homogeneous gray-scale squares across the image. That is, the gray level may vary from square to square, but each region appears much smoother than it did prior to compression. The edges of the squares can interfere with the detection of bone features.

Because of these problems, we abandoned attempts to establish a threshold data compression value below which data loss would not be visible. Instead we relied on an analysis of the noise level, as discussed in the section titled "Bit-Plane Analysis". By zeroing out bit-planes through scaling in the QTABLE, we found that some bit-planes apparently contained no visually useful information regarding the joints of the cervical spine. However, some of the images appeared to show loss of soft tissue and of bony structures (the spinous processes) away from the joints.

Part of this loss was related to the display algorithms. Another factor that appeared to influence the quality of the compressed images was the quality of the original image. If the soft tissue was prominent in the original image, producing a white hazy appearance, the compressed image tended to show greater loss of the soft tissue and the spinous processes.

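Table 5 shows the step sizes used in the QTABLE, the compression ratios, and the root mean square error (RMSE) for one test image. The RMSE is calculated as follows[GONZALEZ]:

\[ e_{rms} = \left[ \frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left( \hat{f}(x,y) - f(x,y) \right)^{2} \right]^{1/2} \]

where f(x,y) is the original image, \hat{f}(x,y) is the decompressed image, and M x N are the image dimensions.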

Table 5:
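For reference, the RMSE between an original and a decompressed image can be computed as in the following Python sketch. The pixel values are made up for illustration, and images are represented as flat lists of gray levels for simplicity.

```python
import math


def rmse(original, reconstructed):
    """Root mean square error between two equally sized flat lists of pixels."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("images must be non-empty and of equal size")
    squared = sum((r - o) ** 2 for o, r in zip(original, reconstructed))
    return math.sqrt(squared / len(original))


# Hypothetical 12-bit pixel values before and after lossy compression
original = [2000, 2004, 2010, 2020]
reconstructed = [2002, 2004, 2008, 2024]

error = rmse(original, reconstructed)
print(f"RMSE = {error:.3f} gray levels")
```

The RMSE summarizes the average magnitude of pixel differences but says nothing about where in the image they occur or whether they are visible.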

In these data, the monotonic increase in step size and compression ratio is at first paralleled by a monotonic increase in RMSE, but for the largest step size the RMSE tends to decrease. The images that had higher compression ratios and higher RMSE appeared to contain more visually useful information than those with the highest compression ratio and lower RMSE. This suggests that the RMSE may reflect variability rather than true error alone. The variability may include the variations in intensity corresponding to the gray scale that enable visual distinctions to be made. Hence the error and the inherent variability necessary for distinguishing gray scale may be mixed in with one another, resulting in confounding.

Two experiments were conducted with a rheumatologist to determine the effectiveness of bit-plane driven compression. In the first experiment four different images were viewed. For each image, the original was used as a reference and five variants were shown. The variants consisted of two images that had step sizes of 64 and three others with step sizes of 8, 16, and 32. The rheumatologist was not told the identity of the images and was allowed to view them in any order. In all cases, the rheumatologist felt that there was no loss of information, but there were some differences. For example, there was a definite change in the "pixel arrangement" at a very local level. This was very difficult to discern and did not change the overall image quality.

The second experiment was similar to the first in design. The original image was used as a reference and there were five variants. The five variants consisted of two images that had step sizes of 64 and three others with step sizes of 32, 128, and 255. The rheumatologist felt that the images with step sizes of 32 and 64 did not show any signs of information loss. However, images with step sizes of 128 and 255 showed blocking and loss of information.

One of the most important results is that subsequent increases in compression ratio, using bit-plane driven compression, showed predictable and consistent increases in the RMSE. Secondly, we have established a threshold: a constant step size of 64 in the QTABLE produces lossy images without any gross visible loss of information. This setup has the advantage that the results follow a parallel and monotonic relationship up to a certain point.
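The effect of a constant-valued QTABLE on a single block can be explored with the following Python sketch. It is a simplified stand-in for a JPEG codec, not the software used in the study: it transforms one hypothetical 8x8 block, quantizes every coefficient with the same step size, inverts the transform, and reports the RMSE. Entropy coding, level shifting, and rounding of the reconstructed pixels are omitted.

```python
import math
import random

N = 8

# Orthonormal DCT basis: T[u][x] = 0.5 * c(u) * cos((2x + 1) u pi / 16)
T = [[0.5 * (1 / math.sqrt(2) if u == 0 else 1.0)
      * math.cos((2 * x + 1) * u * math.pi / 16) for x in range(N)]
     for u in range(N)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(N)) for j in range(N)]
            for i in range(N)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def fdct(block):
    """Forward DCT as the matrix product T f T^t."""
    return matmul(matmul(T, block), transpose(T))


def idct(coeffs):
    """Inverse DCT as the matrix product T^t F T."""
    return matmul(matmul(transpose(T), coeffs), T)


def roundtrip_rmse(block, step):
    """RMSE after quantizing all 64 coefficients with one step size."""
    coeffs = fdct(block)
    dequantized = [[math.floor(c / step + 0.5) * step for c in row]
                   for row in coeffs]
    restored = idct(dequantized)
    squared = sum((restored[x][y] - block[x][y]) ** 2
                  for x in range(N) for y in range(N))
    return math.sqrt(squared / (N * N))


# Hypothetical block: smooth ramp plus small random noise
rng = random.Random(1)
block = [[2000 + 3 * y + rng.randrange(-20, 21) for y in range(N)]
         for x in range(N)]

STEPS = (8, 16, 32, 64, 128, 255)
results = {step: roundtrip_rmse(block, step) for step in STEPS}
for step, value in results.items():
    print(f"step size {step:3d}: RMSE {value:.2f}")
```

Because the transform is orthonormal, the RMSE for a block can never exceed half the step size. Within that bound it need not grow steadily with the step size, since it depends on how the coefficients happen to fall relative to the quantization levels.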

Finally, we believe that finding an effective quantization table to use with JPEG is not as simple as meeting some mathematical criteria. Since manipulating the QTABLE can alter the expected relationship between compression ratio and RMSE, more data must be used to adjust the QTABLE. Our conjecture is that the step sizes should be manipulated based on the following considerations:

  • Sufficient mathematical modeling of the noise associated with the imaging process.
  • Analysis by a group of recognized experts.
  • Mathematical criteria (such as RMS error).

5.0 Summary and Conclusions

Although our sample set is not a statistically representative set of the NHANES cervical spine image set, the results of this study show that lossy compression might be a viable alternative to uncompressed image transmission. In two independent experiments, a rheumatologist was unable to detect any loss of information between the original image and variants with compression ratios as high as 40:1. In all cases, the variants went through the quantization step based on a bit-plane analysis. This analysis subjectively measured the level of noise resulting from digitizing the radiograph film with a laser scanner.

Future studies must first examine a representative sample set of the NHANES data to determine the consistency of our results. Finally, pre-processing steps to filter out background noise should be employed and other compression techniques should be explored.

6.0 Acknowledgements

The authors of this paper would like to thank DXPNET team members Cuong Do, Leif Neve, Babak Nouri, Gautam Roy, and Andrew Wayne for their support and tireless contributions.

7.0 References

[CLAFFY] K. C. Claffy, G. C. Polyzos, and H. Braun, "Traffic Characteristics of the T1 NSFNET Backbone," UCSD Technical Report CS92-252, SDSC Technical Report GA-A21019, July 1992.

[FLEISS] J. L. Fleiss, Statistical Methods for Rates and Proportions, John Wiley, 1981.

[GIBSON] J. D. Gibson and R. C. Reininger, "Distributions of the Two-Dimensional DCT Coefficients for Images," IEEE Trans. Commun., vol. 31, pp. 835-839, 1983.

[GONZALEZ] R. C. Gonzalez and P. Wintz, Digital Image Processing, 2nd ed., Addison Wesley, 1987, Ch. 6.

[LEE] H. Lee, Y. Kim, A. H. Rowberg, and E. A. Riskin, "Statistical Distributions of DCT Coefficients and Their Application to an Interframe Compression Algorithm for 3-D Medical Images," IEEE Trans. on Medical Imaging, submitted October 1991.

[LIM] J. S. Lim, Two-Dimensional Signal and Image Processing, Prentice-Hall, 1990, Ch. 10.

[LO] S. B. Lo, B. Krasner, and S. K. Mun, "Noise Impact on Error-Free Image Compression," IEEE Trans. on Medical Imaging, vol. 9, no. 2, pp. 202-206, 1990.

[LOHSCHELLER] H. Lohscheller, "A Subjectively Adapted Image Communication System," IEEE Trans. Commun., vol. 32, pp. 1316-1322, 1984.

[PRATT] W. K. Pratt, Digital Image Processing, Wiley-Interscience, 1978, Ch. 10.

[THOMA] G. R. Thoma, L. R. Long, and L. E. Berman, "Access to a Digital Xray Archive over Internet," presented at the SPIE Conference on OE/Fibers, Sept. 8-11, 1992, Boston, MA.

[WALLACE] G. Wallace, "The JPEG Still Picture Compression Standard," IEEE Trans. on Consumer Electronics, submitted December 1991.