Bioengineers advance computing technique for health care and more

August 12, 2015, Rice University
Rice University researchers used their progeny clustering technique to analyze a test data set of 41 characteristics drawn from 440 cells that fell roughly into four shapes. Using those characteristics, the program accurately split the samples into the proper clusters, matching their shapes. The researchers expect their data analysis tool to help clinicians obtain meaningful patient groupings prior to treatment. Credit: Qutub Systems Biology Lab/Rice University

Rice University scientists have developed a big data technique that could have a significant impact on health care.

The Rice lab of bioengineer Amina Qutub designed an algorithm called "progeny clustering" that is being used in a hospital study to identify which treatments should be given to children with leukemia.

Details of the work appear today in Nature's online journal Scientific Reports.

Clustering is important for its ability to reveal information in complex sets of data like medical records. The technique is used in bioinformatics—a topic of interest to Rice scientists who work closely with fellow Texas Medical Center institutions.

"Doctors who design clinical trials need to know how to group patients so they receive the most appropriate treatment," Qutub said. "First, they need to estimate the optimal number of clusters in their data." The more accurate the clusters, the more personalized the treatment can be, she said.

Separating groups by a single data point, like eye color, would be easy, she said. But when separating people by the types of proteins in their bloodstreams, it becomes more difficult.

"That's the kind of data that's become prevalent everywhere in biology, and it's good to have," Qutub said. "We want to know hundreds of features about a single person. The problem is identifying how to use all that data."

The Rice algorithm provides a way to ensure the number of clusters is as accurate as possible, she said. The algorithm extracts characteristics about patients from a data set, mixing and matching them randomly to create artificial populations: the "progeny," or descendants, of the parent data. The characteristics appear in roughly the same ratios in descendants as they do among the parents.
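The mixing-and-matching step can be illustrated with a minimal Python sketch. This is not the lab's code: the function name `make_progeny`, the toy `parents` array, and the choice to draw each characteristic independently and with replacement are all assumptions made for illustration. Drawing each column from the parents' own values is one simple way to keep characteristics in roughly the same ratios among the descendants.

```python
import numpy as np


def make_progeny(parents: np.ndarray, n_progeny: int,
                 rng: np.random.Generator) -> np.ndarray:
    """Build artificial samples by drawing every characteristic
    independently from the values observed among the parents.

    Assumption for illustration: sampling is uniform and with replacement.
    """
    n_parents, n_dims = parents.shape
    # For each progeny and each dimension, pick a random parent as donor.
    donors = rng.integers(0, n_parents, size=(n_progeny, n_dims))
    # Progeny i inherits dimension j from parent donors[i, j].
    return parents[donors, np.arange(n_dims)]


rng = np.random.default_rng(0)

# Hypothetical parent data: 6 individuals, 3 characteristics each.
parents = np.array([
    [0.0, 10.0, 100.0],
    [1.0, 11.0, 101.0],
    [2.0, 12.0, 102.0],
    [3.0, 13.0, 103.0],
    [4.0, 14.0, 104.0],
    [5.0, 15.0, 105.0],
])

progeny = make_progeny(parents, n_progeny=20, rng=rng)
print(progeny.shape)
```

Every value in a progeny column comes from the same column of some parent, so no new measurements are invented; only the combinations are new.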

These characteristics, called dimensions, can be anything: as simple as hair color or place of birth, or as detailed as the proteins expressed by tumor cells. For even a small population, each individual may have hundreds or even thousands of dimensions.

By creating progeny with the same dimensions as the original data, the Rice algorithm increases the size of the data set. With this additional data, the distinct patterns become more apparent, allowing the algorithm to optimize the number of clusters that warrant attention from doctors and scientists.
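How extra progeny might help pick the number of clusters can be sketched as follows. The article does not spell out the scoring rule, so everything here is an assumption for illustration: a plain k-means with farthest-point initialization stands in for the clustering step, and the hypothetical `stability` score simply asks how often progeny bred from the same cluster land together again when the progeny are re-clustered, minus how often progeny from different clusters do. The toy data and all function names are invented.

```python
import numpy as np


def assign(data, centers):
    """Label each row with the index of its nearest center."""
    dists = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(data, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [data[0]]
    for _ in range(k - 1):
        # Next center: the point farthest from all centers chosen so far.
        d = np.min([np.linalg.norm(data - c, axis=1) for c in centers], axis=0)
        centers.append(data[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        labels = assign(data, centers)
        for c in range(k):
            if np.any(labels == c):  # an empty cluster keeps its old center
                centers[c] = data[labels == c].mean(axis=0)
    return assign(data, centers)


def progeny_of(cluster, n, rng):
    """Draw n progeny, sampling each dimension independently from the cluster."""
    donors = rng.integers(0, len(cluster), size=(n, cluster.shape[1]))
    return cluster[donors, np.arange(cluster.shape[1])]


def stability(data, k, rng, n_per_cluster=10):
    """Illustrative score (an assumption, not the published formula)."""
    labels = kmeans(data, k)
    progeny, origin = [], []
    for c in range(k):
        members = data[labels == c]
        if len(members) == 0:
            continue  # nothing to breed from
        progeny.append(progeny_of(members, n_per_cluster, rng))
        origin.append(np.full(n_per_cluster, c))
    progeny, origin = np.vstack(progeny), np.concatenate(origin)
    new = kmeans(progeny, k)  # re-cluster the artificial population
    same_origin = origin[:, None] == origin[None, :]
    same_new = new[:, None] == new[None, :]
    off_diag = ~np.eye(len(origin), dtype=bool)
    true_rate = same_new[same_origin & off_diag].mean()
    cross = same_new[~same_origin]
    false_rate = cross.mean() if cross.size else 0.0
    return float(true_rate - false_rate)


rng = np.random.default_rng(0)
# Hypothetical data: three well-separated groups of 15 points in 2 dimensions.
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(15, 2)) for c in blob_centers])

scores = {k: stability(data, k, rng) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On toy data like this, the true number of groups should score at or near the top, but this simplified score can produce ties between candidate values of k, so it should be read as a demonstration of the idea rather than a substitute for the published method.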

Qutub and lead researcher Wendy Hu, a graduate student in her lab at Rice's BioScience Research Collaborative, said their technique is just as reliable as state-of-the-art clustering evaluation algorithms, but at a fraction of the computational cost. In lab tests, progeny clustering compared favorably to other popular methods, they wrote, and was the only method to successfully discover clinically meaningful groupings in a reverse-phase protein array data set.

Progeny clustering also allows researchers to determine the ideal number of clusters in small populations, Qutub said.

The algorithm was put to work in an ongoing trial involving patients with leukemia at Texas Children's Hospital. There, Qutub said, "progeny clustering allowed them to design a robust clinical trial, even though that trial did not involve a large number of children. It meant they didn't have to wait to enroll more."

Technologies that gather data about patients—from sophisticated hospital equipment to simple wrist-worn health monitors—are advancing rapidly, Qutub said. That puts a premium on tools that can decipher growing mountains of data. Ten patients, for example, may be few in number but there may be hundreds or thousands of dimensions for each, she said.

"Big data is just numbers, but the numbers don't have any value if you don't get information from them," Hu said. "My job is to look at these numbers and use computational tools and insights from biology to generate new information. This can help us know more about diseases and come up with therapeutic solutions and diagnostic schemes and identify new drug targets."

"If you don't know how to handle that data, you're at a loss," Qutub said. "You won't know the best way to group people or assign them to a certain therapy or exercise regime."

The algorithm could apply to any data set, Qutub said. "We could just as easily use it for a population of voters to see who should get campaign materials from a candidate," she said. "Progeny clustering has a lot of possible applications."

The lab plans to make the algorithm available free through its website, she said.

More information: "Progeny Clustering: A Method to Identify Biological Phenotypes," Scientific Reports 5, Article number: 12894, DOI: 10.1038/srep12894
