Title:
Screening Nonrandomized Studies for Medical Systematic Reviews: A
Comparative Study of Classifiers.
Author(s):
Tanja Bekhuis, Dina Demner-Fushman.
Institution(s):
1) Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
2) Communications Engineering Branch, Lister Hill National Center for Biomedical Communications, US National Library of Medicine, Bethesda, MD, USA
Source:
Artificial Intelligence in Medicine. July 2012;55(3):197-207.
Abstract:
Objectives: To investigate whether (1) machine learning classifiers can help identify nonrandomized
studies eligible for full-text screening by systematic reviewers; (2) classifier performance varies with
optimization; and (3) the number of citations to screen can be reduced.
Methods: We used an open-source data-mining suite to process and classify biomedical citations that
point to mostly nonrandomized studies from 2 systematic reviews. We built training and test sets for
citation portions and compared classifier performance by considering the value of indexing, various
feature sets, and optimization. We conducted our experiments in 2 phases. The design of phase I with
no optimization was 4 classifiers × 3 feature sets × 3 citation portions. Classifiers included k-nearest
neighbor, naive Bayes, complement naive Bayes, and evolutionary support vector machine. Feature sets
included bag of words, and 2- and 3-term n-grams. Citation portions included titles, titles and abstracts,
and full citations with metadata. Phase II with optimization involved a subset of the classifiers, as well as
features extracted from full citations, and full citations with overweighted titles. We optimized features
and classifier parameters by manually setting information gain thresholds outside of an iterative
grid-optimization process with 10-fold cross-validation. We independently tested models on data reserved for
that purpose and statistically compared classifier performance on 2 types of feature sets. We estimated
the number of citations reviewers would need to screen during a second pass through a reduced set of
citations.
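To make the two-phase design concrete, here is a minimal sketch in scikit-learn. It is an assumption-laden illustration, not the authors' implementation: the abstract does not name the data-mining suite, so scikit-learn stands in; the toy citations and labels are invented; a plain SVC replaces the evolutionary support vector machine (scikit-learn has none); mutual information approximates an information gain threshold; and 3-fold cross-validation replaces the paper's 10-fold only because the toy set is tiny.

# Illustrative sketch only; suite, data, and settings are assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.naive_bayes import ComplementNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Toy citation portions; 1 = eligible for full-text screening (here,
# nonrandomized study designs), 0 = not eligible.
citations = [
    "Retrospective cohort study of surgical outcomes after procedure X.",
    "Case-control study of risk factors for condition Y in children.",
    "Editorial comment on recent imaging guidelines.",
    "Prospective cohort study of long-term outcomes of treatment Z.",
    "Letter to the editor regarding a prior meta-analysis.",
    "Randomized controlled trial of drug A versus placebo in adults.",
]
labels = [1, 1, 0, 1, 0, 0]

# Phase I analog: cross classifiers with feature sets (bag of words,
# 2-grams, 3-grams), no tuning. Recall is scored because an eligible
# study screened out here never reaches full-text review.
for ngrams in [(1, 1), (1, 2), (1, 3)]:
    for name, clf in [("kNN", KNeighborsClassifier(n_neighbors=3)),
                      ("complement NB", ComplementNB()),
                      ("SVM", SVC())]:  # stand-in for evolutionary SVM
        pipe = Pipeline([("vec", CountVectorizer(ngram_range=ngrams)),
                         ("clf", clf)])
        # The paper used 10-fold cross-validation; cv=3 here only
        # because the toy set is so small.
        recall = cross_val_score(pipe, citations, labels, cv=3,
                                 scoring="recall").mean()
        print(f"{ngrams} {name}: recall={recall:.2f}")

# Phase II analog: feature selection plus a grid search over classifier
# parameters; mutual information stands in for an information gain
# threshold on features.
pipe = Pipeline([("vec", CountVectorizer()),
                 ("select", SelectKBest(mutual_info_classif)),
                 ("clf", ComplementNB())])
grid = GridSearchCV(pipe,
                    {"select__k": [5, 10],
                     "clf__alpha": [0.1, 1.0, 10.0]},
                    cv=3, scoring="recall")
grid.fit(citations, labels)
print(grid.best_params_, grid.best_score_)

Optimizing for recall reflects the asymmetric cost in this task: false negatives (missed eligible studies) are far more damaging to a systematic review than false positives, which reviewers can still discard at full-text screening.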
Results: In phase I, the evolutionary support vector machine returned the best recall for bag of words
extracted from full citations; the best classifier with respect to overall performance was k-nearest neighbor.
No classifier attained adequate recall for this task without optimization. In phase II, optimization
boosted performance for the evolutionary support vector machine and complement naive Bayes
classifiers. Generalization performance was better for the latter in the independent tests. For the
evolutionary support vector machine and complement naive Bayes classifiers, the initial retrieval set
was reduced by 46% and 35%, respectively.
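To put those percentages in workload terms (the retrieval size here is hypothetical, not from the study): given an initial set of 5,000 citations, a 46% reduction leaves 5,000 × (1 − 0.46) = 2,700 citations to screen in the second pass, and a 35% reduction leaves 3,250.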
Conclusions: Machine learning classifiers can help identify nonrandomized studies eligible for full-text
screening by systematic reviewers. Optimization can markedly improve classifier performance; however,
generalizability varies with the classifier. The number of citations to screen during a second,
independent pass can be substantially reduced.
Publication Type: JOURNAL