National Library of Medicine, HTTP://www.nlm.nih.gov
Communications Engineering Branch
Lister Hill National Center for Biomedical Communications, HTTP://www.lhncbc.nlm.nih.gov/
 


Song Mao, Ph.D.

Staff Scientist


National Library of Medicine
Communications Engineering Branch/MS 55
Bldg. 38A
8600 Rockville Pike
Bethesda, MD 20894 USA

(301) 496-3927

Song Mao received the M.S. and Ph.D. degrees in Electrical and Computer Engineering from the University of Maryland, College Park, in 1999 and 2002, respectively. He is a Staff Scientist in the Communications Engineering Branch, part of the Lister Hill National Center for Biomedical Communications, a research division of the U.S. National Library of Medicine, U.S. National Institutes of Health. He joined the Branch in 2002.

Dr. Mao conducts research on automated metadata extraction from digital images and documents of various types. In particular, he is interested in supervised and unsupervised machine learning, stochastic language modeling, statistical parsing, and string matching methods. He has developed key algorithms and modules for the System for Preservation of Electronic Resources (SPER), which extracts metadata from historical documents, and for the Medical Article Records System (MARS), which automates bibliographic data extraction for MEDLINE®. During the spring and summer of 2001, he worked as a co-op student at the IBM Almaden Research Center on information retrieval of multilingual Web documents.

Dr. Mao’s research interests include document image analysis, machine learning, information extraction, pattern recognition, performance evaluation, and computer vision. He is a reviewer for IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), the IAPR International Conference on Pattern Recognition (ICPR), and the International Journal of Document Analysis and Recognition (IJDAR). He is a member of the IEEE and the IEEE Computer Society.


Current Projects

  • System for Preservation of Electronic Resources (SPER): SPER is an R&D project for the long-term preservation of essential electronic resources at the National Library of Medicine. A prototype SPER system has been designed and developed; it provides the following basic digital preservation functions: ingest with automated metadata extraction (AME) based on machine-learning techniques (for scanned documents and NLM Web pages), archive, search and retrieval, and migration. A production system (SPER-AME) has also been developed and is being used in the Library Operations Division of NLM for automated metadata extraction from about 70,000 historical Food and Drug Administration Notice of Judgment documents. The AME system achieved superior performance in our empirical evaluation on large datasets. I am responsible for all AME-related software design, programming, and maintenance, and for part of the SPER system design, client programming, and their maintenance.






URL: http://archive.nlm.nih.gov/staff/mao.php
Last updated May 15, 2007
