Machine learning improves searches in world's largest biomedical literature database
Results sorted by relevance, instead of date, provide an improved experience for users of PubMed, the world's largest biomedical literature database, according to a study publishing August 28 in the open access journal PLOS Biology by Zhiyong Lu and colleagues at the National Library of Medicine (NLM)/National Center for Biotechnology Information (NCBI), which develops and maintains PubMed.
PubMed contains over 28 million article abstracts from the biomedical literature, with an average of two more added every minute. It is an indispensable resource, global in scope, accessed by millions of users every day. Since PubMed's inception, search results had been returned only in reverse chronological order (most recent first), a ranking system that emphasized recency rather than relevance to the search query. In 2013, a relevance ranking system was introduced, but it depended on artificial weighting factors and required continual manual adjustment.
In June 2017, NLM/NCBI staff introduced a machine-learning algorithm that draws on dozens of relevance signals, including user responses (specifically, the frequency of click-throughs to the articles returned for a given search), to improve relevance ranking. This ranking system, called Best Match, is offered as an alternative to chronological ordering. The team found that the click-through rate on results returned by Best Match was 20% higher than on the same results presented chronologically. The overall usage of relevance sorting increased from 7.5% of all searches before the introduction of Best Match to 12% as of April 2018. Since machine-learning systems depend on user input to improve, the increase in use should allow the system to "teach itself" to become more valuable to its users over time.
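To make the figures above concrete, here is a minimal Python sketch of how a click-through rate and a relative increase could be computed. The function names and the click and search counts are hypothetical, chosen only for illustration; they are not data from the study, and the sketch assumes the reported 20% is a relative (not percentage-point) increase.

```python
def click_through_rate(clicks: int, searches: int) -> float:
    """Fraction of searches that led to at least one recorded click."""
    if searches <= 0:
        raise ValueError("searches must be positive")
    return clicks / searches


def relative_increase(baseline: float, new: float) -> float:
    """Relative change from baseline to new, e.g. 0.2 means 20% higher."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Hypothetical counts (assumption, not study data): same queries, two orderings.
chronological_ctr = click_through_rate(clicks=400, searches=1000)
best_match_ctr = click_through_rate(clicks=480, searches=1000)
ctr_gain = relative_increase(chronological_ctr, best_match_ctr)

# Share of searches using relevance sorting, as reported in the article.
usage_gain = relative_increase(7.5, 12.0)

print(f"Click-through rate gain: {ctr_gain:.0%}")
print(f"Relevance-sort usage gain: {usage_gain:.0%}")
```

Under these assumptions, the rise in relevance-sort usage from 7.5% to 12% of searches corresponds to 4.5 percentage points, or a 60% relative increase.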
"Overall, the new Best-Match algorithm shows a significant improvement in finding relevant information over the default time order in PubMed," the authors stated. "We encourage PubMed users to try this new relevance search and provide input to help us continue to improve the ranking method."