Title:
Digital Preservation and Knowledge Discovery Based On Documents From an International Health Science Program.
Author(s):
Dharitri Misra, Robert H. Hall, Susan M. Payne, George R. Thoma.
Institution(s):
1) National Library of Medicine, Bethesda, MD 20894.
Source:
Proceedings of the 12th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL). Washington, DC. June 2012:23-26.
Abstract:
Important biomedical information is often recorded, published or
archived in unstructured and semi-structured textual form.
Artificial intelligence and knowledge discovery techniques may
be applied to large volumes of such data to identify and extract
useful metadata, not only for providing access to these documents,
but also for conducting analyses and uncovering patterns and
trends in a field. The System for Preservation of Electronic
Resources (SPER), an information management tool developed at
the U.S. National Library of Medicine, provides these capabilities
by integrating machine learning, data mining and digital
preservation techniques. In this paper, we present an overview of
SPER and its ability to retrieve information from one such dataset.
We show how SPER was applied to the semi-structured records of
an international health science program, the 46-year continuous
archive of conference publications and related documents from
the Joint Cholera Panel of the U.S.-Japan Cooperative Medical
Science Program (CMSP). We explain the techniques by which
metadata was extracted automatically from the semi-structured
document contents to preserve these publications, and show how
such data was used to quantitatively describe the activity of a
research community toward a preliminary study of a subset of its
specific health science program goals.
Publication Type: CONFERENCE