Multimodal Query Expansion and Integration for Medical Image Retrieval

M.M. Rahman, S.K. Antani, G.R. Thoma

U.S. National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, United States

Abstract:

The digital imaging revolution in medicine has changed the way present-day physicians diagnose and treat diseases. Hospitals and medical research centers produce an increasingly large number of digital images of diverse modalities every day. Currently, the utilization of such images is largely limited to the patient's immediate diagnostic needs, due to the lack of automated, effective, and clinically meaningful image analysis, indexing, and search methods. Text-based searches on patient meta-data have dominated image access. It has been shown, however, that single-modality information retrieval, using either text as contextual information or images as visual features, has limitations. There is a conjecture in the field that integrating the complementary textual and image information into a unified information retrieval system could improve retrieval quality and the utilization of all available (and relevant) clinical information. This work aims to develop such an integrated approach by exploiting the advantages of both modalities and by involving users in the retrieval loop. As proof of concept, a Content-Based Image Retrieval (CBIR) system is under development in which various global (whole-image), region-based local, and concept-based image features, such as perceptually distinguishable color or texture patches, are extracted at different levels of abstraction. This is done using a judicious combination of supervised and unsupervised classification techniques. Image context is derived from text keywords extracted from the associated annotations and clinical meta-data and indexed by employing the vector space model of information retrieval. These features can be used to expand a multimodal query with related keywords and/or concepts computed from the top retrieved relevant images based on correlation analysis and user feedback. The proposed framework thus supports cross-modal query expansion and propagates user-perceived semantics between modalities. Finally, the most semantically relevant images are obtained using a weighted feature combination, which may be adjusted dynamically across and within modalities. An exhaustive experimental analysis is being performed on a diverse medical image collection with case-based annotations by domain experts. Initial results demonstrate the flexibility and effectiveness of the proposed multimodal framework over using only a single modality and no user feedback.
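
To make the retrieval step concrete, the sketch below is a minimal illustration, not the authors' implementation: it shows how a weighted linear combination of vector-space (text) similarity and visual-feature similarity might be computed, and uses a Rocchio-style centroid update as a stand-in for the correlation-based, feedback-driven query expansion described above. All function and field names (fused_score, expand_query, "tfidf", "visual") are illustrative assumptions.

    # Illustrative sketch of weighted cross-modal score fusion and
    # feedback-driven query expansion; simplified, not the authors' system.
    import numpy as np

    def cosine(a, b):
        """Cosine similarity between two feature vectors."""
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0

    def fused_score(query, image, w_text=0.5, w_visual=0.5):
        """Weighted combination of text (vector-space) and visual similarity.
        The weights could be re-tuned dynamically from user feedback."""
        s_text = cosine(query["tfidf"], image["tfidf"])      # keyword/context match
        s_visual = cosine(query["visual"], image["visual"])  # global/region features
        return w_text * s_text + w_visual * s_visual

    def expand_query(query, top_relevant_images, alpha=0.7):
        """Expand a multimodal query toward the centroid of top-ranked relevant
        images (Rocchio-style update, standing in for the correlation-based
        expansion described in the abstract)."""
        expanded = dict(query)
        for key in ("tfidf", "visual"):
            centroid = np.mean([img[key] for img in top_relevant_images], axis=0)
            expanded[key] = alpha * query[key] + (1 - alpha) * centroid
        return expanded

Under these assumptions, a search would score all indexed images with fused_score, show the top results to the user, and re-run the search with expand_query applied to the query vectors built from the images the user marks as relevant.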