Thomas M. Deserno1, 2 , Sameer Antani2 and Rodney Long2
| (1) | Department of Medical Informatics, Aachen University of Technology (RWTH), Pauwelsstr. 30, 52057, Aachen, Germany |
| (2) | US National Library of Medicine, US National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894, USA |
Key words Content-based image retrieval (CBIR) - pattern recognition - picture archiving and communication systems (PACS) - information system integration - data mining - information retrieval - semantic gap
| • | Using as one of our main organizing principles a concept (the gap) that highlights both potential deficiencies and how those deficiencies can be addressed |
| • | Presenting both the gaps that we have identified and the important system characteristics that are not exposed by the gap ontology as a hierarchical structure of related attributes rather than as a purely descriptive exposition |
Searching for visual similarity by simply comparing large sets of pixels (comparing a query image to images in the database, for example) is not only computationally expensive but also very sensitive to, and often adversely impacted by, noise in the image or changes in views of the imaged content. Therefore, to achieve rapid response and to ameliorate the sensitivity to image noise or to view changes in position, orientation, and scale of the imaged content, data reduction is frequently carried out as follows: First, discriminant numerical features that serve as identifying signatures are extracted from each image in the repository. Second, the images are indexed on these precomputed signatures. Third, at query time, the signature extracted from the query example is compared with these indices of the images in the database (in this paper, we use the term signature to denote the (usually ordered) set of all feature values, also called the feature vector, which is used to characterize a particular image). This abstraction, while serving the purposes of rapid computation and adding robustness to the above-mentioned variations in imaged content, can potentially introduce a disparity (or a gap) between the expected result and the computed result. This gap can be caused by a variety of factors, including the discriminant potential of the extracted signature, either in general or for the intended query, and the extent to which it was applied to the imaged data, among others. It is, therefore, valuable to consider characterizing CBIR systems through such an itemization of gaps and characteristics.
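To make this flow concrete, the following minimal Python sketch reduces each image to a normalized grayscale-histogram signature, precomputes signatures for a small repository, and ranks database entries by Euclidean distance to the query signature. The histogram feature, vector length, and distance measure are illustrative assumptions only; practical medical CBIR systems use richer, domain-specific features and dedicated indexing structures.

```python
# Minimal sketch (illustrative features only): signature extraction, offline
# indexing, and query-time comparison, as described above.
import numpy as np

def signature(image: np.ndarray, bins: int = 32) -> np.ndarray:
    """Reduce an image to a normalized grayscale-histogram feature vector."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def retrieve(query_image: np.ndarray, signatures: np.ndarray, k: int = 5) -> np.ndarray:
    """Return the indices of the k database images closest to the query signature."""
    distances = np.linalg.norm(signatures - signature(query_image), axis=1)
    return np.argsort(distances)[:k]

# Offline: precompute a signature for every image in the repository.
database = [np.random.randint(0, 256, (64, 64)) for _ in range(100)]  # placeholder images
signatures = np.stack([signature(img) for img in database])

# Query time: compare the query signature against the precomputed signatures.
query_image = np.random.randint(0, 256, (64, 64))
print(retrieve(query_image, signatures))
```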
In the published literature, two gaps have been identified in CBIR techniques: (1) the semantic gap1,5,10 between the low-level features that are automatically extracted by machine and the high-level concepts of human vision and image understanding, and (2) a sensory gap defined by Smeulders et al.1 between the object in the world and the information in a (computational) description derived from a recording of that scene. However, in our view, there are many other gaps that hinder the use of CBIR techniques in routine medical image management. For instance, there is a highly significant gap in the level of integration of CBIR into the general patient care information system. As another example, there is a gap in the automation of feature extraction. By means of the concept of gaps, we present a systematic analysis of required system features and properties. The paper classifies some prominent CBIR approaches in an effort to spur a more comprehensive view of the concept of gaps in medical CBIR research. We also attempt to show how our approach can be applied to characterize and distinguish prominent medical CBIR methods that have been published in the literature.
There are several gaps that one can define to explain the discrepancy between the proliferation of CBIR systems in the literature and the lack of their use in daily routine in the departments of diagnostic radiology at healthcare institutions, for example. It is insufficient, however, to merely define these gaps. To benefit from the concept of gaps, it is imperative to analyze systems presented in the literature with respect to their capability to close or minimize these gaps. In addition to the gaps, it is also important to be aware of other system characteristics that, although not resulting in a gap, might be critical for CBIR system analysis and classification. In this section, we address these points systematically.
We aim at defining a classification scheme, which we will call an ontology, by means of individual criteria, i.e., the so-called gaps. According to Lehmann,11 such an ontology must satisfy several requirements regarding the entities (gaps), the catalog (ontology), and the applications of the ontology.
| – | Abstract. They are formulated in a general manner that allows their instantiation to any approach of a medical CBIR system that has been published in the literature. |
| – | Applicable. They are formulated in such a way that they can be used in a variety of semantic contexts of medicine, where CBIR systems are applied. In particular, the instantiation of the entities of the ontology should not be affected by the person using the ontology. |
| – | Verifiable. They are formulated in such a way that there exists a method to evaluate each individual criterion. |
| – | Complete. The ontology covers all characteristics of medical CBIR systems and can be mapped to any situation and context of use. In particular, if two systems are characterized by the instances of the entities of the ontology, these instances must differ for different systems. |
| – | Unique. The ontology is well defined. In other words, if a system is characterized by means of the ontology, the same system always results in the same instantiation. |
| – | Sorted. The entities of the ontology are ordered semantically. For instance, they are grouped to support their unique assignment. |
| – | Efficient. The application of the ontology is possible within finite time and effort, and all criteria can be decided without additional devices or computer programs. |
| – | A priori. The ontology is used as a guideline for system design. |
| – | A posteriori. The ontology is used as a catalog of criteria for system analysis and weak-point detection. |
In this paper, we aim to build an ontology of gaps. The concept of gaps has often been used in the CBIR literature, and the semantic gap is one of the prominent examples. To elaborate on what we have previously mentioned, the semantic gap is the disparity or discontinuity between human understanding of images and the “understanding” that is obtainable from computer algorithms. This gap has a direct effect on the evaluation of images as “similar,” as judged by humans, versus the same images being judged as similar by algorithms. Image similarity is defined by a human observer in a particular context on a high semantic level. On the other hand, for algorithms, image similarity is defined by computational analyses of pixel values with respect to characteristics such as color, texture, or shape. The semantic gap is closely connected not only to the content (objects) of the image but also to (1) the features used for the signature and (2) the effectiveness of the algorithms that are used to infer the image content. The semantic gap is of high importance as a factor affecting the usefulness of CBIR systems and is frequently cited by CBIR researchers. Three examples are given in this paper: First, the work of Enser and Sandom,12 who have provided a detailed analysis of the semantic gap and created a classification of image types and user types to further understand categories of semantic gaps; second, Eakins and Graham,10 who have used the idea of semantic content as a way to categorize types of CBIR queries—specifically, Eakins defines three types of CBIR queries according to their respective levels of semantic content; and, third, the recent work of Bosch et al.,13 who have created a classification of published strategies that attempt to bridge the semantic gap by automated methods and have illustrated them for the domain of natural scene images.
| • | The level of automation of feature extraction, with full automation on one side and completely manual extraction on the other |
| • | The level of support for fast image database searching, with optimized algorithms and data structures, supported by parallelized hardware on one side and exhaustive, linear database searching with no specialized hardware support on the other |
| • | The level to which the system helps the user to refine and improve query results, with “intelligent” query refinement algorithms based on user identification of “good” and “bad” results on one side and no refinement capability at all on the other |
Each gap (1) corresponds to an aspect of a CBIR system that is explicitly or implicitly addressed during implementation, (2) separates what is potentially a fuller or more powerful implementation of that aspect from a less powerful implementation, and (3) has associated with it methods to bridge or reduce the gap. We note that a gap, as applied to a particular system, may or may not be significant for achieving the goals of that system and, when bridged, may or may not add value to the system for the particular system purpose. For example, a stand-alone CBIR system operating on a small database may respond to queries perfectly well with an exhaustive, linear search of its database and have no need for search optimization, let alone hardware parallelization. However, it appears highly likely that the use of CBIR systems within clinical routine in large treatment centers will require features such as the ability to handle multiple image modalities for multiple treatment purposes, efficient extraction and indexing of clinical-content-rich features, capability to exchange information with the patient information system, and optimized retrieval algorithms, data organization, and hardware support; in other words, many of the gaps that we identify will need to be bridged for practical application to clinical routine.
| – | Content. The user’s view of modeling and understanding images |
| – | Features. The computational point of view regarding numerical features and their limitations |
| – | Performance. The implementation and the quality of integration and evaluation |
| – | Usability. Ease of use of the system in routine applications |

| – | Intent and data. The goal or intent of the medical CBIR approach and the data that is used with it |
| – | Input and output (I/O). The level of input and output data that is required to communicate with the CBIR system |
| – | Feature and similarity. The kind of features and distance measures applied by the system |
| • | In italics, the CBIR system aspect to which the gap applies |
| • | A summary overview of the gap |
| • | Categories of methods to bridge or ameliorate the gap and, frequently, examples of the methods |
This group of gaps addresses the modeling, understanding, and use of images from the standpoint of a user. We have defined two relevant gaps.
| – | Not addressed. Meaningful terms are not assigned to images or ROIs; images are indexed by strictly mathematical measures, such as measures of color, texture, and shape. |
| – | Manual. Meaningful terms are manually assigned; for X-ray images of the cervical spine, a human operator may use interactive software to assign vertebrae labels “C1,” “C2,” … “C7” to image regions. |
| – | Computer assisted. A semi-automatic process is used to assign meaningful terms; in the above example, a computer algorithm may assign the labels to the regions on the image; a human operator then reviews and corrects them. |
| – | Automatic. Meaningful terms are automatically assigned; in this case, a computer algorithm would assign the region labels with no human intervention; some experimental work toward developing methods to automatically extract and associate low-level features to meaningful medical semantics has been reported in some limited domains, such as the mapping of shape, size, intensity, and texture features to radiologist semantics used for lung nodules (lobulation, malignancy, margin, sphericity, and others) in thoracic CT images.14 |
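As a deliberately simple illustration of the automatic level described above, the following sketch maps a low-level feature vector to a semantic term with a nearest-neighbor rule over labeled training cases. The feature names, term labels, and training values are hypothetical and are not taken from the cited lung-nodule work.

```python
# Hypothetical example of automatic term assignment: a nearest-neighbor rule
# maps a low-level feature vector to the semantic term of the closest labeled
# training case. Feature names, labels, and values are invented for illustration.
import numpy as np

# Rows: (sphericity, margin sharpness, mean intensity) -- hypothetical features.
train_features = np.array([[0.9, 0.8, 120.0],
                           [0.3, 0.2, 200.0]])
train_labels = ["benign-appearing nodule", "suspicious nodule"]

def assign_term(features: np.ndarray) -> str:
    """Assign the semantic term of the nearest labeled training case."""
    nearest = np.argmin(np.linalg.norm(train_features - features, axis=1))
    return train_labels[nearest]

print(assign_term(np.array([0.85, 0.7, 130.0])))  # -> "benign-appearing nodule"
```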
| – | Not addressed. The system is specific to a certain context, and the context gap is wide; for example, the system may be tailored to the retrieval needs of a database of gastrointestinal (GI) tract histology images. |
| – | Narrow. The system operates only for a small number of modalities or protocols or diagnostic procedures or on a small number of combinations of these; for example, a cancer-oriented system may be designed to operate on histology images of breast, lung, and uterine cervix, and may support labeling from controlled vocabularies for each of these domains only. |
| – | Broad. The system operates for a large number of modalities or protocols or diagnostic procedures, or on a large number of combinations of these; for example, a system may allow the user to store segmented shapes from any types of digital imaging and communications in medicine images into a database and to query the database by sketches of these shapes. |
| – | General. No restrictions apply at all, neither to the modality, the protocol, nor the diagnostics. |
When we consider the implementation steps that must occur to derive computable characterizations of images, we discover feature-related gaps. These gaps correspond (1) to the inadequacy of the chosen numerical features for characterizing the image content or (2) to the practical difficulties of extracting these features from the images.
| – | Not addressed. Feature extraction is completely interactive or manual, e.g., manually outlined shapes, such as cardiac anatomical features (atria, ventricles, ascending aorta, and pulmonary artery).15 |
| – | Computer-assisted. Feature extraction is partly interactive, e.g., shapes segmented with the “livewire” algorithm,16 which completes shape segmentation, such as for vertebrae on spine X-ray images, based on a few user-supplied “guiding points”; another example is interactive region segmentation on histology images by region-growing or K-means clustering algorithms.17 |
| – | Automatic. There is no human interaction in the feature extraction; examples would be extraction of color or grayscale histograms, Gabor wavelet coefficients, or object counts, computed from an image with no human intervention.18 |
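A minimal sketch of the automatic level follows: a global grayscale histogram and simple gradient-based texture statistics are computed with no human interaction. The specific features are placeholders chosen for brevity, not those used by the cited systems.

```python
# Fully automatic extraction of a global feature vector: a grayscale histogram
# plus gradient-based texture statistics, computed with no user input.
import numpy as np
from scipy import ndimage

def extract_features(image: np.ndarray) -> np.ndarray:
    hist, _ = np.histogram(image, bins=16, range=(0, 256))
    hist = hist / max(hist.sum(), 1)
    img = image.astype(float)
    # Gradient magnitude serves here as a crude, fully automatic texture descriptor.
    grad = np.hypot(ndimage.sobel(img, axis=0), ndimage.sobel(img, axis=1))
    return np.concatenate([hist, [grad.mean(), grad.std()]])

image = np.random.randint(0, 256, (128, 128))
print(extract_features(image).shape)  # an 18-element feature vector
```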
| – | Not addressed. Features are extracted for the entire image (global case); examples would include grayscale histograms computed from all of the pixels in the image.18 |
| – | Local. Features are extracted for individual ROIs; examples include color and texture measures computed from the interiors of tissue regions of known type, such as from the cervix region on images that contain the uterine cervix and surrounding anatomy.19 |
| – | Relational. Features are extracted for a certain composition of individual ROIs or objects; an example is the characterization of the relative spatial relationships of cardiac chambers (atria and ventricles) on tomography images.15 |
| – | Not addressed. Features are extracted for a fixed single scale; an example would be calculation of texture features from co-occurrence matrices that are applied to the image only at its original spatial resolution. |
| – | Multi. Features are extracted at multiple scales of the image; an example would be a system that applies Gaussian blurring and downsampling to create multiple spatial resolutions for each image, and then applies co-occurrence matrices to the image at each of these resolutions; a variation of this idea is to use the image at its original resolution but to apply mathematical operators that output information about the image contents at multiple levels of detail, as has been done20 for tumor shape, using mathematical morphology operators with multiple sizes of structuring elements; another example is any approach that includes features based on curvature scale space, which is inherently a multiscale approach, as has been done to characterize masses in mammography images.21 |
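The multiscale idea can be sketched as follows, assuming SciPy for the blurring and gradient filters: the image is blurred and downsampled into a Gaussian pyramid, the same texture statistics are computed at every level, and the results are concatenated into one feature vector. The simple statistics below stand in for the co-occurrence-based measures mentioned above.

```python
# Multiscale sketch: blur and downsample into a Gaussian pyramid, then compute
# the same (placeholder) texture statistics at every scale and concatenate them.
import numpy as np
from scipy import ndimage

def pyramid(image: np.ndarray, levels: int = 3):
    current = image.astype(float)
    for _ in range(levels):
        yield current
        # Blur, then halve the spatial resolution for the next scale.
        current = ndimage.gaussian_filter(current, sigma=1.0)[::2, ::2]

def multiscale_features(image: np.ndarray) -> np.ndarray:
    feats = []
    for level in pyramid(image):
        grad = np.hypot(ndimage.sobel(level, axis=0), ndimage.sobel(level, axis=1))
        feats.extend([level.std(), grad.mean()])  # per-scale texture statistics
    return np.array(feats)

print(multiscale_features(np.random.randint(0, 256, (256, 256))))  # two values per scale
```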
| – | Not addressed. The dimension of the data range space is less than the dimension of the data domain space. |
| – | Not applicable. The system handles 1D or 2D data only. |
| – | Complete range. The dimension of the data range space is equal to the dimension of the data domain space; an example is the indexing of functional imaging data consisting of 3D positron emission tomography images plus associated temporal information by including the volumetric characteristics of the data in the indexing.22 |
| – | Not addressed. The dimension of the channel data range space is less than the dimension of the channel data domain space. |
| – | Not applicable. The original system data is single channel. |
| – | Complete range. The dimension of the channel data range space is equal to the dimension of the channel data domain space; an example is characterizing skin lesions on dermatology images by RGB histograms;20 a variation of this technique is to first transform the image, with a dimensionality-preserving transformation, to a different color space, such as the MPEG7 HDS space, before calculating the histogram.23 |
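A minimal sketch of a complete-range color signature follows: one histogram per RGB channel, concatenated, so that all three channels of the data range contribute to the index. The bin count is an arbitrary choice, and a dimensionality-preserving color-space conversion, as noted above, could be applied before the histogram step.

```python
# Complete-range color signature: one histogram per RGB channel, concatenated,
# so every channel of the data range contributes to the index.
import numpy as np

def rgb_histogram(image: np.ndarray, bins: int = 16) -> np.ndarray:
    """image: H x W x 3 array; returns the concatenated per-channel histograms."""
    channels = [np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
                for c in range(3)]
    hist = np.concatenate(channels).astype(float)
    return hist / max(hist.sum(), 1)

image = np.random.randint(0, 256, (64, 64, 3))
print(rgb_histogram(image).shape)  # (48,)
```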
Not all systems found in the literature are completely implemented and executable for performance evaluation. For those that are implemented and testable, the performance criteria include quality of integration, level of support for fast database searching, and the extent to which evaluation of the system for acceptable retrieval has been done.
| – | Not addressed. An implementation is not mentioned at all. |
| – | Mentioned. An implementation is described, but no supporting evidence is provided. |
| – | Documented. Screen shots are shown in the publication as evidence of the implementation of the system. |
| – | Offline. An implementation is available for download and installation. |
| – | Online. An implementation is directly accessible and executable via the Internet. |
“The experience of all commercial vendors of CBIR software is that system acceptability is heavily influenced by the extent to which image retrieval capabilities can be embedded within user’s overall work tasks.”
| – | Not addressed. The application is not interconnected with clinical data; for example, a prototype system for retrieval of cervicography images by color and texture from a small database of uterine cervix images.24 |
| – | Passive. The patient/image data is passed to the CBIR application. |
| – | Active. The application can initiate its own access to clinical data. |
| – | Not addressed. The system is based on a brute force approach, where the distance between the query feature vector and every feature vector in the database is computed; this approach is usually feasible only for stand-alone CBIR systems operating on small databases. |
| – | Hardware supported. The system is based on the brute force approach, but the database search is supported by specialized hardware architecture, such as a parallel computing environment; an innovation in this area is the use of active disk architecture, where some of the database search intelligence is placed on processors on the disk devices, and an “early discard” strategy is used to discard database entries that do not satisfy query requirements rather than sending them over the system connection to the CPU.26 If the active disks are operated in parallel, this approach has both the advantages of distributed computing and early data discard. |
| – | Software supported. The database of feature vectors is organized into clusters or cluster trees; the system uses algorithms tailored to this tree organization for fast access to feature vectors relevant to a particular query; for example, data organization based on clustering in shape space and a search strategy coupled with that organization have been implemented for a database of spine X-rays;27 a second example is the spatial access methods and specialized feature extraction developed for a database of tumor shapes that are reported in Korn et al.28 A sketch contrasting brute-force and tree-based searching is given after this list. |
| – | Both. The system incorporates the indexed approach as described above and supports it with a distributed computing environment. |
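The difference between brute-force and software-supported searching can be sketched as follows, with SciPy's k-d tree standing in here for the cluster trees and spatial access methods cited above: both strategies return the same exact neighbors, but the tree-based index answers a query without scanning the entire signature database.

```python
# Brute force versus a software-supported (tree-based) index over the same
# precomputed signatures; both return the exact same nearest neighbors.
import numpy as np
from scipy.spatial import cKDTree

signatures = np.random.rand(10000, 32)  # precomputed feature vectors
query = np.random.rand(32)

# Brute force: compute the distance to every database entry.
brute = np.argsort(np.linalg.norm(signatures - query, axis=1))[:5]

# Software supported: build the index once, then answer queries without
# scanning most of the database.
tree = cKDTree(signatures)
_, indexed = tree.query(query, k=5)

print(set(brute) == set(indexed))  # True: same neighbors, different search cost
```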
| – | Not addressed—xxx. No experiments are described; the database contains xxx images. |
| – | Qualitative—xxx. Experiments are described but without expected output or ground truth based on xxx images. |
| – | Quantitative—xxx. Experiments are described with expected output or ground truth based on xxx images; for example, Xu et al.29 report results for retrieval of spine vertebrae by shape from a ground truth set of 207 images. |
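Quantitative evaluation of this kind can be sketched as computing precision and recall of the top-k retrieved results against a known ground-truth set of relevant images; the image identifiers below are purely illustrative.

```python
# Quantitative evaluation sketch: precision and recall of the top-k results
# against a ground-truth set of relevant images (identifiers are illustrative).
def precision_recall_at_k(retrieved, relevant, k=10):
    top_k = retrieved[:k]
    hits = len(set(top_k) & relevant)
    return hits / k, hits / max(len(relevant), 1)

# The system returned these image IDs for one query; ground truth says
# images {3, 7, 42} are the truly similar ones.
print(precision_recall_at_k([7, 3, 15, 42, 8, 1, 2, 9, 11, 20], {3, 7, 42}))
# -> (0.3, 1.0); averaging over many queries summarizes system performance
```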
This group of gaps addresses the usability of the system. Whereas the performance gaps focus on the area in which the system is used, the usability gaps describe the ease of use of the system from the perspective of the end user.
| – | Not addressed. The user inputs alphanumeric text, disregarding the QBE paradigm. |
| – | Feature. The user specifies certain intervals of feature vectors or vector components. |
| – | Pattern. The user specifies an example image or a part of an image (ROI); examples include systems like image retrieval in medical applications (IRMA),8 in which the user may submit an entire image and search for similar images. |
| – | Composition. The user interactively selects and places structures from a given set; for example, the uterine cervix CBIR system described in Antani et al.30 allows the selection of ROIs, pre-drawn by medical experts, to be used as part of the query. The user selects properties of these ROIs, such as color and/or texture, to complete the query definition. |
| – | Sketch. The user interactively creates example patterns, including the previous options, but without being restricted to choosing from predefined pattern sets (for example, the user may create a “freehand drawing” of a query shape); examples include the uterine cervix CBIR system24 referenced above, which also allows freehand drawing of the ROIs to be used in the query; another example is the retrieval of spine vertebrae by sketching the desired shape.29 |
| – | Hybrid. The user may input text, one of the above visual patterns, or a combination of both. |
| – | Not addressed. The results returned by the system are not accompanied by any explanation. |
| – | Basic. A similarity or dissimilarity number is given for each returned result; for example, the spine X-ray CBIR system of Antani et al.30 returns a dissimilarity (distance) measure for each result. |
| – | Advanced. More sophisticated explanations are provided by the system, such as cues indicating the relative significance of the various features in the returned results. |
| – | Not addressed. Just one request is answered. |
| – | Forward. A rudimentary option for query refinement is provided, such as allowing the user to give “relevance feedback” by ranking individual returned results on a scale from “low relevance” to “high relevance” and resubmitting the query.31 A minimal sketch of such a feedback-driven query update is given after this list. |
| – | Backward. In the refinement loop, the user can step back if results become worse. |
| – | Complete. A full history of the interactive session is available for restoration of any intermediate stage. |
| – | Combination. Based on the complete history, different queries can be performed, and their results can be combined; for example, the extended query refinement approach by the IRMA framework, which, additionally, supports set combination (such as AND, OR, and NOT) of intermediate query results.32 |
| – | Learning. During the usage, the system adapts to the user’s need. |
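A minimal sketch of the forward refinement level is given below: the query signature is moved toward results the user marked as relevant and away from results marked as irrelevant (a Rocchio-style update, used here as a generic illustration rather than as the mechanism of any cited system).

```python
# Rocchio-style relevance feedback (generic illustration): move the query
# signature toward results marked relevant and away from those marked irrelevant.
import numpy as np

def refine_query(query, relevant, irrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    new_query = alpha * query
    if len(relevant):
        new_query = new_query + beta * relevant.mean(axis=0)
    if len(irrelevant):
        new_query = new_query - gamma * irrelevant.mean(axis=0)
    return new_query

query = np.random.rand(32)
refined = refine_query(query, np.random.rand(5, 32), np.random.rand(3, 32))
print(refined.shape)  # the refined query is resubmitted in the next iteration
```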
Under this heading, we group the intent or goal of the CBIR application, as well as the data domain (the dimensionality of the data, e.g., signals, images, or volumes) and the data range (the per-sample value space, e.g., grayscale or color channels) in use.
| – | Not addressed. No information about the purpose is given. |
| – | Diagnostics. For example, the system is intended for case-based reasoning. |
| – | Research. For example, the system is intended to collect data to support evidence-based medicine. |
| – | Teaching. For example, the system is intended to find examples for sets of case collections. |
| – | Learning. For example, the system is intended for the self-exploration of medical cases. |
| – | Hybrid. The system is intended for at least two of the previously mentioned cases. |
| – | 1D. The system data consists of biomedical signals. |
| – | 2D. The system data consists of images. |
| – | 2D + t. The system data consists of image sequences. |
| – | 3D. The system data consists of volumetric datasets. |
| – | 3D + t. The system data consists of a sequence of volumes. |
| – | Hybrid. The system data consists of more than one of the categories above. |
| – | 1D grayscale. The system data consists of grayscale images or volumes. |
| – | 1D other. The system data has a 1D range other than grayscale. |
| – | 2D. The system data has a 2D range. |
| – | 3D color. The system data consists of color images or volumes. |
| – | 3D other. The system data has a 3D range other than color. |
| – | >3D. The system data consists of a multichannel range. |
| – | Hybrid. The system data consists of more than one of the categories above. |
Content-based image retrieval in medical applications may also be combined with a text-based search in the patient health record. According to Tang et al., different combinations of text and images for input and output might be used.4 In general, it is easier to make inferences from text to images than from images to text. The former can be done from text associated with the image (e.g., Google image search), whereas the latter requires semantic concepts. A minimal sketch of such a hybrid text-image query is given after the output list below.
| – | Free text. The system input consists of any alphanumerical wording that requires stemming, etc. for automatic processing. |
| – | Keyword. The system input consists of words addressing a concept of special semantics, e.g., as part of a controlled vocabulary. |
| – | Feature value. The system input consists of instances of an image-based feature, e.g., a numerical range. |
| – | Image. The system input consists of a query image, marked region of interest, drawing, or any other nonalphanumeric data. |
| – | Hybrid. The system input consists of more than one of the categories above. |
| – | Image only. The system returns similar images. |
| – | Image and keyword. The system returns similar images and controlled image category information. |
| – | Image and text. The system returns similar images and other text, such as in multimedia documents. |
| – | Keyword only. The system returns a restricted set of words based on a controlled vocabulary. |
| – | Free text. The system returns alphanumerical wording that describes the image. |
| – | Hybrid. The system output consists of more than one of the categories above, for example, images, keywords, and free text. |
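The text-image combinations discussed above can be sketched as a hybrid query in which a keyword filter over controlled-vocabulary metadata is combined with ranking by image-signature similarity. The record fields and the filtering rule are illustrative assumptions.

```python
# Hybrid query sketch: a keyword filter over controlled-vocabulary metadata is
# combined with ranking by image-signature similarity. Fields are illustrative.
import numpy as np

records = [
    {"id": 1, "keywords": {"chest", "x-ray"}, "signature": np.random.rand(32)},
    {"id": 2, "keywords": {"hand", "x-ray"}, "signature": np.random.rand(32)},
]

def hybrid_query(query_keywords, query_signature, k=5):
    # Keep only records whose keywords contain the textual part of the query,
    # then rank the survivors by visual similarity to the query signature.
    candidates = [r for r in records if query_keywords <= r["keywords"]]
    candidates.sort(key=lambda r: np.linalg.norm(r["signature"] - query_signature))
    return [r["id"] for r in candidates[:k]]

print(hybrid_query({"x-ray"}, np.random.rand(32)))  # IDs of matching, ranked records
```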
The process of computing the similarity between images depends on (1) the particular representation of the image signature (i.e., the numerical features that are used to characterize the image) and (2) the distance or similarity measure that is used to compare signatures.
| – | Grayscale. The image features are based on image intensity only. |
| – | Color. The image features are based on color and grayscale. |
| – | Shape. The image features are based on location or delineation of a region. |
| – | Texture. The image features are based on complex visual patterns related to an ROI. |
| – | Special—xxx. The image features are based on a context-based feature, where xxx denotes the name of the feature. |
| – | Hybrid. The image features are based on more than one of the categories above. |
| – | Not applicable. No distance measure is used, e.g., the system does retrieval by intervals of feature values. |
| – | Undeclared—xxx. The distance measure is named xxx, but it is not asserted to be a metric. |
| – | Nonmetric—xxx. A nonmetric distance measure is used, where xxx denotes the measure. |
| – | Metric—xxx. A metric distance measure named xxx is used. |
| – | Hybrid. Any combination of the above is used. |
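The metric/nonmetric distinction can be illustrated by comparing the Euclidean distance, which is a metric, with the Kullback-Leibler divergence, which is asymmetric and therefore nonmetric; both are computed here on normalized histogram signatures.

```python
# Metric versus nonmetric dissimilarity on normalized histogram signatures:
# Euclidean distance is symmetric (a metric), while the Kullback-Leibler
# divergence is asymmetric and therefore not a metric.
import numpy as np

def euclidean(p, q):
    return float(np.linalg.norm(p - q))

def kl_divergence(p, q, eps=1e-12):
    p, q = p + eps, q + eps  # avoid log(0) and division by zero
    return float(np.sum(p * np.log(p / q)))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.3, 0.4, 0.3])
print(euclidean(p, q), euclidean(q, p))          # identical values (symmetric)
print(kl_divergence(p, q), kl_divergence(q, p))  # different values (asymmetric)
```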
In this paper, we have proposed a nomenclature and classification scheme for analysis and assessment of medical CBIR systems. We have attempted to address the core features and required functionality of medical CBIR explicitly, systematically, and comprehensively, using the concept of gaps as a unifying idea to highlight potential shortcomings in various aspects of CBIR systems, as well as to illustrate methods for addressing those shortcomings. For important CBIR system characteristics that do not fit into the gaps ontology, we have provided a second, supplementary hierarchical grouping of related attributes.
It is our intent that this effort will contribute to the ongoing research and development in medical CBIR by providing a more formal and methodical approach to conceptualizing CBIR systems in terms of their characteristics, their potential shortcomings, and how these shortcomings may be addressed, than has hitherto been available.