Web-accessible cervigram automatic segmentation tool

Zhiyun Xue, Sameer Antani, L. Rodney Long, George R. Thoma

National Library of Medicine, NIH, Bethesda, MD

ABSTRACT

Uterine cervix image analysis is of great importance to the study of uterine cervix cancer, which is among the leading cancers affecting women worldwide. In this paper, we describe our proof-of-concept, Web-accessible system for automated segmentation of significant tissue regions in uterine cervix images, which also demonstrates our research efforts toward promoting collaboration between engineers and physicians for medical image analysis projects. Our design and implementation unify the merits of two commonly used languages, MATLAB and Java. The system circumvents the heavy workload of recoding the sophisticated segmentation algorithms, originally developed in MATLAB, into Java, while allowing remote users who are not experienced programmers or algorithm developers to apply those processing methods to their own cervicographic images and evaluate the algorithms. Several other practical issues of the system are also discussed, such as the compression of images and the format of the segmentation results.

Keywords: image segmentation, system development, uterine cervix image

1. INTRODUCTION

Cervical cancer is a major disease threatening women’s health and life worldwide. It is caused by persistent infection with certain types of high-risk human papillomavirus (HPV) [1]. Screening tests offer great opportunity to prevent pre-cancerous lesions from progressing into cancers by detecting cervical cancer at an early stage when successful treatment is possible [2, 3]. One low-cost screening test, often used in resource-limited areas, is cervicography. This method involves using a specialized optical camera to take a 35-mm photograph of the cervix after application of acetic acid. The photograph is referred to as a cervigram. One example is given in Figure 1. The National Library of Medicine (NLM) hosts a large database of cervigrams collected from National Cancer Institute (NCI) studies for cervical cancer research [4, 5]. These cervigram slides have been converted to digital form by NLM. In addition to the cervigrams, this database contains clinical, cytologic, and molecular information for the 15,000 patients who were studied. The database has been used in clinical research, and in computer research in automatic segmentation and analysis of cervigrams.

Automated cervigram segmentation is a very challenging task due to (1) the complexity of the image content and (2) the large variation across the images in the database. The cervigrams in the repository have varying illumination conditions and viewing angles as well as different tissue contents and cervix shapes. In addition, acquisition artifacts or medical instruments may exist in the images; the boundaries between tissue regions are often blurred; and the color characteristics of different regions may be similar. We developed a multi-step segmentation approach to address these difficulties. Our method includes steps for (1) coarse detection of the cervix region-of-interest, (2) specular reflection elimination, (3) refined cervix boundary detection and extraction, (4) detection of the os, the opening into the uterus, (5) columnar epithelium segmentation, (6) squamous epithelium and acetowhite region (AW) segmentation, and, finally, (7) detection of patterns of mosaicism and punctation. As is common in the engineering community, these cervigram segmentation algorithms were developed using MATLAB to take advantage of its extensive mathematical libraries and rapid prototyping capability. There are practical challenges to deploying systems developed using MATLAB, however [6-9].

MATLAB programs can be distributed in two forms: (i) in source form, which requires installation of the MATLAB environment on the target computer; or (ii) as compiled executable binaries, which need operating system-specific software libraries (the MATLAB runtime environment along with the developed applications) to be available. The first method may be prohibitively expensive, and the second may be too cumbersome for inexperienced users. With the ubiquity of the Web, it is desirable to make MATLAB applications available via the Internet. Further, critical enhancements are often made in collaboration with domain experts, and Web-based access to the software supports this kind of rapid prototyping. Such an approach also permits evaluation and comparison of results from different algorithms, toward making tools sufficiently mature for routine clinical use as deployable applications.

In previous work we have described systems we developed for automatically segmenting cervigrams [10-18], comparing the performance of different segmentation algorithms [10-18], marking regions of interest on cervigrams (the Boundary Marking Tool) [4], and evaluating inter- and intra-observer variations (MOSES) [19], among others. In this article, we present a cervigram segmentation tool that is usable by multiple, geographically distributed user groups without our having to recode the MATLAB algorithms or install MATLAB at each user site. To the best of our knowledge, this is the first Web-accessible image segmentation tool for uterine cervix images.


Figure 1. Cervicographic image

2. CERVIGRAM SEGMENTATION TOOL

In contrast to the computational power offered by MATLAB for image processing, Java has a limited set of software libraries available. However, the widely recognized strengths of Java include its cross-platform computational capabilities and native support for Web-based applications. We have sought to combine the strengths of each technology, without the costs of software recoding, by adopting a client/server architecture that uses MATLAB code on the server (where we can also take advantage of the processing power of server hardware) and Java on the client.

2.1. System architecture

The overall client/server system architecture is shown in Figure 2. It has three components: a Java application on the user machine, a Java servlet on the Web server, and a MATLAB computation engine on a server-side Windows machine. The Java application provides the graphical user interface for uploading images, collecting the required processing information, initiating segmentation, and displaying and managing results. The user inputs are sent to the MATLAB routines on the server, which carry out the intensive part of the numerical computations. The output of the server-side MATLAB routines is then sent back to the client application through the Java servlet. This servlet communicates with the Java client using the HTTP protocol and with the MATLAB routines using Java sockets. This structure decouples the user interface from the MATLAB algorithms. The Java client is “pure Java” and is platform independent. This thin-client approach, in which the computational load is on the server, gives algorithm developers the flexibility to deploy enhanced computational routines with minimal changes to the client. Modifications to the user interface itself can be deployed using Java Web Start technology. This combination of Java for the user interface, MATLAB for computation, and the Java servlet for mediating between them is intended to take advantage of the strong points of both Java and MATLAB, providing a high degree of modularity, generalizability, portability, and ease of use. Importantly, it relieves end users of the need to install MATLAB, and of the need to install client-side updates whenever new computational capabilities are implemented.
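The socket hop between the servlet and the computation engine can be illustrated with a minimal Java sketch. The paper does not specify the message format, so the one-line `SEGMENT <task> <imageId>` request, the `OK ...` reply, and the names `EngineRelay`, `requestSegmentation`, and `startFakeEngine` are all assumptions made for illustration. A small in-process thread stands in for the MATLAB side so that the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical relay: forwards one segmentation request to a computation engine over a socket. */
final class EngineRelay {

    /** Sends "SEGMENT <task> <imageId>" (an assumed wire format) and returns the one-line reply. */
    static String requestSegmentation(String host, int port, String task, String imageId)
            throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println("SEGMENT " + task + " " + imageId);
            return in.readLine();
        }
    }

    /** Stand-in for the MATLAB side: accepts one connection and answers with a fixed boundary. */
    static void startFakeEngine(ServerSocket server) {
        Thread engine = new Thread(() -> {
            try (Socket socket = server.accept();
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
                String request = in.readLine();
                // Echo the request followed by a made-up contour (x,y pairs separated by ';').
                out.println("OK " + request + " 10,10;20,10;20,20");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        engine.setDaemon(true);
        engine.start();
    }

    /** Runs one request against the fake engine on an ephemeral loopback port. */
    static String demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            startFakeEngine(server);
            return requestSegmentation(
                    server.getInetAddress().getHostAddress(), server.getLocalPort(), "os", "img001");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In a deployment along the lines described above, a method like `requestSegmentation` would be called from the servlet's HTTP request handler, and the reply would be written to the HTTP response for the Java client.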


Figure 2. System Architecture

2.2. Image compression

The cervigrams to be uploaded to the server are relatively large, about 2400x1600x3 bytes (about 11.5 MB uncompressed). To reduce the required transmission bandwidth, efficient image compression is required. In our system, we apply the BCWT image codec, based on a novel line-based Backward Coding of Wavelet Trees (BCWT) algorithm, designed and implemented by our collaborators at Texas Tech University [20]. The codec implements a lossy compression algorithm developed with the goal of preserving relevant fine detail in medical images while permitting relatively high compression ratios. Test images used for quality assessment in the development of the BCWT algorithm include gigapixel-size cervix histology images. BCWT specifically addresses a number of problems that affect tile-based compression methods, such as boundary artifacts and high memory usage. It also provides the capabilities of ROI viewing and progressive-resolution decoding. As shown in Figure 3, the image is compressed using the TTC encoder on the client side and is then sent to the server. On the server, the compressed cervigram is decoded using the TTC decoder and is then passed to the MATLAB routines to be segmented. Because the codec is lossy, experiments were conducted to examine the effect of compression on segmentation results. Preliminary tests show that the compression does not noticeably degrade segmentation performance.
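One common way to quantify whether lossy compression changes a segmentation is to compare the region mask obtained from the original image with the mask obtained from the decoded image, using an overlap score such as the Dice coefficient. The paper does not state which measure was used in the preliminary tests, so the sketch below is only an assumed illustration. The class name `MaskOverlap` and the tiny sample masks are hypothetical.

```java
/** Illustrative overlap measure for comparing two binary segmentation masks. */
final class MaskOverlap {

    /** Dice coefficient 2|A∩B| / (|A| + |B|) for two equally sized binary masks. */
    static double dice(boolean[][] a, boolean[][] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("masks differ in height");
        }
        long intersection = 0, sizeA = 0, sizeB = 0;
        for (int r = 0; r < a.length; r++) {
            if (a[r].length != b[r].length) {
                throw new IllegalArgumentException("masks differ in width at row " + r);
            }
            for (int c = 0; c < a[r].length; c++) {
                if (a[r][c]) sizeA++;
                if (b[r][c]) sizeB++;
                if (a[r][c] && b[r][c]) intersection++;
            }
        }
        if (sizeA + sizeB == 0) {
            return 1.0; // both masks empty: treat as identical
        }
        return 2.0 * intersection / (sizeA + sizeB);
    }

    public static void main(String[] args) {
        // Made-up masks: segmentation of the original image vs. of the decoded image.
        boolean[][] fromOriginal = {
            {true, true, false},
            {true, true, false},
            {false, false, false}
        };
        boolean[][] fromDecoded = {
            {true, true, false},
            {true, false, false},
            {false, false, false}
        };
        System.out.println(dice(fromOriginal, fromDecoded));
    }
}
```

A score near 1 would indicate that compression left the segmented region essentially unchanged. What counts as an acceptable threshold depends on the tissue type and the application.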


Figure 3. Image encoding and decoding

2.3. Segmentation algorithms

The automatic segmentation and extraction of important tissues in cervigrams is a very complicated task. We approached this problem by implementing a multi-stage method, based on the expected visual contents and spatial arrangements of these contents in cervigrams. In our method, each stage targets a specific object type for segmentation [10]. We attempt to identify features and algorithms that are selected and tuned for each specific object type, with the goal of attaining both robustness and efficiency in the step-by-step segmentation.

An overview of the segmentation process follows. First, only the area within the actual visible cervix anatomy is of significance for our purposes, since all of the tissues of interest are located within this cervix area. Objects in the image but outside the cervix area are discarded in the first step, both to reduce the required computations and to increase the accuracy of subsequent tissue analysis. Next, we remove a common type of visual “clutter” from the cervix. On the cervix surface, there are frequently specular reflections (SR): small, bright regions generated by the camera flash reflecting from surface fluids. It is important to isolate the SR regions since they potentially obscure other important visual features.

After the SR regions have been identified, we proceed to segment four types of tissue or anatomy: the os, squamous epithelium (SE), columnar epithelium (CE), and acetowhite regions (AW), also referred to as lesions. The os is the opening which leads from the outer surface of the cervix (the ectocervix) to the inside of the uterus. It is an important anatomical landmark within the visible cervix area. The columnar epithelium (CE) usually surrounds the os and appears red with a rough textured surface. The squamous epithelium (SE) usually surrounds the columnar epithelium and is a smooth, pinkish tissue. The space between the clearly established areas of the CE and SE is the transformation zone. The vast majority of cervical cancers start in the transformation zone.

The acetowhite (AW) regions are translucent, white areas that become visible after application of acetic acid. These are high-interest biomarkers and potential precursors of malignancy. Particular visual patterns within AW regions, which occur because of vascular abnormalities, may be used to grade the severity of the lesion. There are three types of abnormal vascular patterns within acetowhite areas: punctation, mosaicism, and atypical vessels, all of which are the subjects of ongoing automated image processing analysis research [11-18]. Some of the research-level segmentation algorithms for these tissues have been integrated into our system, and we plan to add new algorithms as they are developed and evaluated, taking advantage of the system’s modular architecture, which allows computational capability to be improved in a manner that is largely transparent to the client.
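The stage-by-stage organization described above can be sketched as an ordered list of named stages, each of which may consult the outputs of earlier stages. This is a structural illustration only, written in Java for consistency with the client. The actual algorithms are implemented in MATLAB, and the class `SegmentationPipeline`, the stage names, and the placeholder outputs are all hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical multi-stage pipeline: stages run in insertion order and see earlier results. */
final class SegmentationPipeline {

    // Stage name -> function from the results so far to this stage's output.
    private final Map<String, Function<Map<String, String>, String>> stages = new LinkedHashMap<>();

    /** Registers a stage; new algorithms can be appended without touching existing ones. */
    SegmentationPipeline addStage(String name, Function<Map<String, String>, String> stage) {
        stages.put(name, stage);
        return this;
    }

    /** Runs every stage in order and returns the outputs keyed by stage name. */
    Map<String, String> run() {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, Function<Map<String, String>, String>> entry : stages.entrySet()) {
            String output = entry.getValue().apply(Collections.unmodifiableMap(results));
            results.put(entry.getKey(), output);
        }
        return results;
    }

    /** Placeholder stages in the order given in the text; real stages would return masks or contours. */
    static SegmentationPipeline demoPipeline() {
        return new SegmentationPipeline()
                .addStage("cervixRoi", prior -> "roi-mask")
                .addStage("specularReflection", prior -> "sr-mask within " + prior.get("cervixRoi"))
                .addStage("os", prior -> "os-contour")
                .addStage("columnarEpithelium", prior -> "ce-contour")
                .addStage("acetowhite", prior -> "aw-contour");
    }

    public static void main(String[] args) {
        demoPipeline().run().forEach((stage, output) -> System.out.println(stage + " -> " + output));
    }
}
```

Keeping each stage behind a uniform interface is one way to obtain the modularity mentioned above: a stage for the squamous epithelium, for example, could be registered later without changing the client or the other stages.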

2.4. User interface

Figure 4 illustrates the main features of the user interface. Besides the standard menu and toolbar, the interface contains four panels: one for a “thumbnail” of the image, one for image properties, an image view panel for the main image display, and a panel for status. In the following, we briefly describe each of the interface components.

Menu bar. The File menu provides the functionality of opening a cervigram image, reading and saving the file of region boundaries, and printing and saving the “result image” with its automatically-segmented region boundaries. The Segmentation menu provides the functionality of segmenting the cervix ROI, the SR, the os, the CE, and the AW regions. Segmentation of other regions (such as the squamous epithelium (SE)) will be added when the algorithms are available. A Help menu is also provided.

Thumbnail, Image View, Property, and Status panels. The Thumbnail Panel shows the overall view of the entire image. It also contains a rectangle (in blue) specifying the currently viewable part of the image as it appears in the Image View panel. Users can move this rectangle to control the part of the cervigram that appears in the Image View panel. The Image View panel shows the current (user-controllable) view of the original cervigram. (The user may have zoomed or panned on the cervigram.) The Property Panel contains two tabs. The segmentation tab provides the functionality of toggling the visibility of the boundaries on the image. The image tab lists information about the image and the region-of-interest (marked by the blue rectangle in the thumbnail). The Status Panel displays messages to the user, including a progress bar during segmentation, and an Abort button to interrupt the segmentation process if it is too lengthy.

Complete cervigram segmentation can be accomplished with a relatively small number of user interactions: (1) the user opens a cervigram image; then (2) the user clicks the segmentation button for each segmentation process that is desired. The cervigram is uploaded to the server and, at each segmentation step, the server executes the request and sends the segmented boundaries in a predefined format back to the client. The boundaries are then displayed on the image and the user can examine them.


(a) Screenshot one


(b) Screenshot two

Figure 4. Graphic user interface of Cervigram Segmentation Tool

2.5. Region data file

The region data (i.e., the data defining the boundaries of the segmented regions) is stored in an XML (Extensible Markup Language) document. The XML schema in the Cervigram Segmentation Tool (CST) is designed to be consistent with those of two additional software tools that we have developed for medical image analysis: the Boundary Marking Tool (BMT) and the Multi-Observer Segmentation Evaluation System (MOSES). The BMT allows users to draw boundaries on regions (i.e., to manually segment regions) in medical images and to record diagnostic or descriptive information about the tissue contained within these regions. The BMT has been used in National Cancer Institute (NCI) studies to analyze regions of interest within cervigrams as well as other images. MOSES is a tool for automatic performance evaluation of multiple image segmentations. Taking a set of individual segmentations (typically from multiple observers) as input, it computes a probabilistic estimate of the “true segmentation” and performance measures for the individual segmentations (relative to the group) by using a Bayesian decision framework to integrate two types of prior knowledge. It has been tested and used on cervigrams. Together, the BMT, MOSES, and the CST compose a suite of complementary tools that cover three important aspects of uterine cervix image segmentation: ground truth collection, segmentation evaluation, and automatic segmentation. Since all of them create or process region data that can be represented in the form of contours, we have implemented a common data format for exchanging data among them.
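To make the idea of a contour-based XML region file concrete, the following sketch parses a small document into labeled point lists using the standard Java DOM parser. The actual schema shared by the tools is not given in this paper, so the element and attribute names (`regions`, `region`, `label`, `point`, `x`, `y`), the class `RegionFileReader`, and the sample coordinates are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Reads a hypothetical contour-based region file into label -> list of (x, y) points. */
final class RegionFileReader {

    // Made-up sample document; the real schema may differ.
    static final String SAMPLE =
            "<regions image=\"img001\">"
          + "<region label=\"os\">"
          + "<point x=\"10\" y=\"12\"/><point x=\"14\" y=\"12\"/><point x=\"12\" y=\"16\"/>"
          + "</region>"
          + "<region label=\"AW\">"
          + "<point x=\"40\" y=\"40\"/><point x=\"60\" y=\"40\"/>"
          + "<point x=\"60\" y=\"55\"/><point x=\"40\" y=\"55\"/>"
          + "</region>"
          + "</regions>";

    /** Parses the XML text and returns the contours in document order. */
    static Map<String, List<int[]>> read(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<int[]>> regions = new LinkedHashMap<>();
            NodeList regionNodes = doc.getElementsByTagName("region");
            for (int i = 0; i < regionNodes.getLength(); i++) {
                Element region = (Element) regionNodes.item(i);
                List<int[]> points = new ArrayList<>();
                NodeList pointNodes = region.getElementsByTagName("point");
                for (int j = 0; j < pointNodes.getLength(); j++) {
                    Element point = (Element) pointNodes.item(j);
                    points.add(new int[] {
                        Integer.parseInt(point.getAttribute("x")),
                        Integer.parseInt(point.getAttribute("y"))
                    });
                }
                regions.put(region.getAttribute("label"), points);
            }
            return regions;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse region XML", e);
        }
    }

    public static void main(String[] args) {
        read(SAMPLE).forEach((label, points) ->
                System.out.println(label + ": " + points.size() + " boundary points"));
    }
}
```

Because every tool in the suite would read and write the same contour representation, a boundary drawn manually in one tool and a boundary produced automatically by another could be loaded by the same reader and compared directly.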

3. DISCUSSION

As described above, we have implemented our hybrid MATLAB/Java system by loosely coupling a Java client with a MATLAB computation engine through a Java servlet. In this section, we briefly discuss possible alternative approaches. In recent years, The MathWorks, Inc., has developed a Java compiler for MATLAB called MATLAB Builder JA. This is an extension to the MATLAB compiler and is capable of deploying MATLAB functions as Java classes, which may then be used in a Java application. By using MATLAB Builder JA, at least two additional methods of integrating MATLAB and Java are possible. These are shown in Figure 5(a) and (b), respectively. In Figure 5(a), the JAR file created by MATLAB Builder JA (matlab.jar) is compiled and downloaded by the user along with the user interface. This architecture is not platform independent, however (it requires the installation of the MATLAB Compiler Runtime, a platform-dependent file), and may pose a challenge to inexperienced users or to those with limited administrative access to their computers. Another possibility, shown in Figure 5(b), is to deploy the JAR file on the server. This maintains the proposed client/server architecture, but the matlab.jar file might need to be recreated and redeployed to the server frequently when testing the MATLAB-coded segmentation algorithms. We continue to investigate these and other approaches with respect to platform independence, the workload requirements of end users, the convenience of code debugging, deployment, and algorithm upgrading, and system stability.


(a) Structure one


(b) Structure two

Figure 5. Alternative system structures

4. CONCLUSION

To allow engineering and biomedical researchers and clinical staff to experiment with and evaluate cervigram segmentation algorithms, and to acquire their feedback for system enhancements, we have developed a segmentation tool that users may access through the Web. This tool implements a practical and cost-effective solution to a problem facing many university research groups who use MATLAB to develop algorithms. The architecture of the tool allows its use by geographically separated users while circumventing the need to recode algorithms developed in MATLAB and the need to install MATLAB at each user site. Further, this work demonstrates the possibility of integrating into a single system the advantages of two ubiquitous and important languages: MATLAB and Java. This work is part of our continuing effort to promote closer collaboration among engineering and biomedical researchers and clinical practitioners, and is a step toward creating a Web-accessible Content-Based Image Retrieval (CBIR) system for uterine cervix images [21-23].

ACKNOWLEDGEMENT

This research was supported by the Intramural Research Program of the National Institutes of Health (NIH), National Library of Medicine (NLM), and Lister Hill National Center for Biomedical Communications (LHNCBC).

REFERENCES

  1. R. Herrero, M. H. Schiffman, C. Bratti, et al., “Design and methods of a population-based natural history study of cervical neoplasia in a rural province of Costa Rica: the Guanacaste project”, Rev Panam Salud Publica, No. 1, pp. 362-375, 1997.
  2. M. Schiffman, M. E. Adrianza, “ASCUS-LSIL triage study: design, methods and characteristics of trial participants”, Acta Cytol, Vol. 44, No. 5, pp. 726-742, 2000.
  3. J.W. Sellors and R. Sankaranarayanan, “Colposcopy and Treatment of Cervical Intraepithelial Neoplasia -A Beginner’s Manual”, Edited by J.W. Sellors and R. Sankaranarayanan, Published by the International Agency for Research on Cancer, France, 2003.
  4. J. Jeronimo, R. Long, L. Neve, et al., “Digital tools for collecting data from cervigrams for research and training in colposcopy”, Journal of Lower Genital Tract Disease, Vol. 10, No. 1, pp. 16-25, January 2006.
  5. L. R. Long, S. Antani, G. R. Thoma, “Image Informatics at a National Research Center”, Computerized Medical Imaging and Graphics, Vol. 29, pp. 171-193, February 2005.
  6. H. Helanterä, M. Salmenperä, H. Koivisto, “Global Condition Monitoring System - Implementing MATLAB-Based Analysis Services”, 1st International Conference on Informatics in Control, Automation and Robotics (ICINCO), pp. 300-305, August 2004.
  7. A. Pester, R. Ismailov, “Interactive applications in teaching with the MATLAB Web Server”, Vestnik National’nogo Techniceskogo Universiteta, pp. 14-19, 2001. URL: http://www3.cti.ac.at/pester/publications/Using_Matlab_Webserver.pdf
  8. S. Samsi, A. Krishnamurthy, S. Ahalt, “A Java based web interface to MATLAB”, High Performance Embedded Computing (HPEC) workshop, September 2003.
  9. G. Chen, H. Yi, Z. Ni, “MIPP: a Web-based medical image processing system for stent design and manufacturing”, Proceedings of International Conference on Services Systems and Services Management (ICSSSM), Vol. 2, pp. 1484-1488, 2005.
  10. H. Greenspan, S. Gordon, G. Zimmerman, S. Lotenberg, J. Jeronimo, S. Antani, L. R. Long, “Automatic detection of anatomical landmarks in uterine cervix images”, IEEE Transactions on Medical Imaging, Vol. 28, No. 3, pp. 454-468, March 2009.
  11. S. Gordon, G. Zimmerman and H. Greenspan, “Image segmentation of uterine cervix images for indexing in PACS”, Proc. of the 17th IEEE Symposium on Computer-Based Medical Systems, pp. 298-303, CBMS 2004, Bethesda, MD, 2004.
  12. G. Zimmerman and H. Greenspan, “Automatic detection of specular reflections in uterine cervix images”, Proceedings of SPIE Medical Imaging, Vol. 6144, pp. 2037-2045, 2006.
  13. B. Tulpule, D. L. Hernes, Y. Srinivasan, S. Mitra, Y. Sriraja, B. S. Nutter, B. Phillips, R. L. Long, D. G. Ferris, “A probabilistic approach to segmentation and classification of neoplasia in uterine cervix images using color and geometric features”, Proceedings of SPIE Medical Imaging, Vol. 5748, pp. 995-1003, February 2005.
  14. Y. Srinivasan, B. S. Nutter, S. Mitra, B. Phillips, E. Sinzinger, “Classification of cervix lesions using filter bank-based texture mode”, Proceedings of the 19th IEEE Symposium on Computer-Based Medical Systems, pp. 832-840, 2006.
  15. S. Yang, J. Guo, P. King, Y. Sriraja, S. Mitra, B. Nutter, D. Ferris, M. Schiffman, J. Jeronimo, L. R. Long, “A multispectral digital cervigram analyzer in the wavelet domain for early detection of cervical cancer”, Proceedings of SPIE Medical Imaging, Vol. 5370, pp. 1833-1844, May 2004.
  16. X. Huang, W. Wang, Z. Xue, S. Antani, L. R. Long, J. Jeronimo, “Tissue classification using cluster features for lesion detection in digital cervigrams”, Proceedings of SPIE Medical Imaging, Vol. 6914, pp. 69141Z-1-8, February 2008.
  17. Z. Xue, L. R. Long, S. Antani, G.R. Thoma, J. Jeronimo, “Segmentation of mosaicism in cervicographic images using support vector machines”, Proceedings of SPIE Medical Imaging, Vol. 7258, pp. 72594X-72594X-10, 2009.
  18. S. Lotenberg, S. Gordon, H. Greenspan, “Shape priors for segmentation of the cervix region within uterine cervix images”, Journal of Digital Imaging, Vol. 22, No. 3, pp. 286-296, 2009.
  19. Y. Zhu, W. Wang, X. Huang, D. Lopresti, L.R. Long, S. Antani, Z. Xue, G. R. Thoma, “Balancing the Role of Priors in Multi-Observer Segmentation Evaluation”, Journal of Signal Processing Systems, Vol. 55, No. 1-3, pp. 158-207, May 2008.
  20. J. Guo, B. Hughes, S. Mitra, B. Nutter, “Ultra high resolution image coding and ROI viewing using line-based backward coding of wavelet trees (L-BCWT)”, Picture Coding Symposium, May 6-8, 2009, Chicago, Illinois, USA.
  21. Z. Xue, S. Antani, L. R. Long, J. Jeronimo, G. R. Thoma, “Investigating CBIR techniques for cervicographic images”, Proceedings of AMIA Annual Fall Symposium, pp. 826-830, 2007.
  22. Z. Xue, S. Antani, L. R. Long, J. Jeronimo, G. R. Thoma, “A Web-accessible content-based cervicographic image retrieval system”, Proceedings of SPIE Medical Imaging, Vol. 6919, pp. 691907-1-9, February 2008.
  23. Z. Xue, S. Antani, L. R. Long, J. Jeronimo, G. R. Thoma, “A system for searching uterine cervix images by visual attributes”, Proceedings of 22nd IEEE International Symposium on Computer Based Medical System, 2009.