A study published June 1 in the journal Nature Biotechnology describes the results of an open challenge to predict which breast cancer cell lines will respond to which drugs, based only on the cells' combined genomic data. The winning entry, from the Helsinki Institute for Information Technology, was 78 percent accurate in identifying sensitive versus resistant cell lines. It was one of 44 algorithms submitted by groups from around the world.
"The idea is simple – we have this question and anybody can participate in searching for the answer. The question is, do we have enough information from mutations, or gene expression, or methylation, or copy number alterations to predict how cancer cells will respond to drugs," says James Costello, PhD, investigator at the University of Colorado Cancer Center, assistant professor in the Department of Pharmacology at the CU School of Medicine, and Director of Computational and Systems Biology Challenges within the Sage Bionetworks/DREAM organization.
The results of this effort, which mined genomic data provided by the laboratory of Dr. Joe Gray at Oregon Health & Science University, are intended to characterize molecular markers in cancer. These markers can then be tested against large collections of human samples, such as those of The Cancer Genome Atlas (TCGA), a project by the National Cancer Institute to genomically characterize thousands of human cancers. With this massive NCI database now available, the ongoing challenge is to mine these data to guide drug development and cancer treatment. To incentivize teams of data scientists to work with these data, the NCI partnered with the Dialogue for Reverse Engineering Assessment Methodologies, or DREAM, project. Prizes include funding support and publication.
"It's an international effort to address fundamental biomedical research questions. The goal is to push open science and access to data, both for researchers and ultimately for doctors and patients," Costello says.
Specifically, the challenge asked groups to submit algorithms (computerized mathematical strategies) that could correctly rank 18 breast cancer cell lines from most "sensitive" to most "resistant" for each of 28 therapeutic compounds. In this case, challenge organizers including Costello knew the answers and so could check the accuracy of the submitted strategies. The expectation is that algorithms that accurately reproduce what we already know to be true of cancer cells and cancer drugs could then be used to predict sensitivities we don't already know, perhaps suggesting new treatments for known cancer cell lines, or matching a patient's individual cancer cells with the best treatment.
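To make the idea of checking a submitted ranking against known answers concrete, here is a minimal sketch in Python. It scores a predicted ordering with a plain pairwise concordance measure: the fraction of cell-line pairs that the prediction places in the same order as the measured response. This is an illustration only, not necessarily the scoring method the challenge organizers used, and the cell-line names and rank values are hypothetical.

```python
from itertools import combinations


def concordance(predicted_rank, known_rank):
    """Return the fraction of cell-line pairs ordered the same way
    in the predicted ranking as in the known ranking.

    Both arguments map a cell-line name to its rank for one drug
    (1 = most sensitive). Pairs tied in the known ranking are skipped.
    """
    concordant = 0
    total = 0
    for a, b in combinations(known_rank, 2):
        known_diff = known_rank[a] - known_rank[b]
        if known_diff == 0:
            continue  # no known ordering for this pair
        pred_diff = predicted_rank[a] - predicted_rank[b]
        total += 1
        # Same sign means the pair is ordered the same way in both rankings.
        if known_diff * pred_diff > 0:
            concordant += 1
    if total == 0:
        raise ValueError("no comparable pairs in the known ranking")
    return concordant / total


# Hypothetical example: four cell lines ranked for a single drug.
known = {"line_A": 1, "line_B": 2, "line_C": 3, "line_D": 4}
predicted = {"line_A": 1, "line_B": 3, "line_C": 2, "line_D": 4}

score = concordance(predicted, known)
print(f"{score:.3f}")
```

In this toy case the prediction swaps only the two middle cell lines, so five of the six pairs agree with the known order. A score of 1.0 would mean every pair is ordered correctly, and 0.5 is roughly what random guessing would achieve.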
As a corollary, the study was able to show what types of genomic data are most predictive of drug response.
"There are new, great, super flashy technologies we can use to look inside cancer cells, but at least for the time being the data from these new techniques don't tell us how cells will respond to therapy as well as the more established gene expression microarrays," Costello says. The reasons, he suggests, include the 20-year history of microarray analysis, which has shaped our ability to accurately generate and use this type of data, and the fact that gene expression data is a proxy for many cellular processes. That is, many genomic modifications, such as methylation or mutation, directly affect gene expression. If such a modification results from a cellular process, gene expression changes accordingly, and microarrays measure the result.
"Data scientists are hungry for new data and new questions and if you can get people from all over the world to work on these challenges, it can drive innovation and we can get to an answer much quicker. If you give smart people interesting questions they frequently come up with highly innovative solutions," Costello says.
A community effort to assess and improve drug sensitivity prediction algorithms, DOI: 10.1038/nbt.2877