New statistical models key to yielding powerful insights from health care databases

August 3, 2017

Recognizing that administrative health care databases can be a valuable, yet challenging, tool in the nation's ongoing pursuit of personalized medicine, statisticians Liangyuan Hu and Madhu Mazumdar of the Icahn School of Medicine at Mount Sinai have developed advanced statistical modeling and analytic tools that can make health care and medical data more meaningful. Hu will present their findings August 3 at the 2017 Joint Statistical Meetings (JSM) in Baltimore, Md.

The availability of large electronic health record databases is promising for medical discovery and for efforts to develop individualized treatments. "Powerful statistical analyses and results from these records and databases can be the foundation on which informed medical questions are asked and decisions are made," notes Hu.

For example, doctors seeking to provide optimal treatment for patients with high-risk prostate cancer may consider several radical prostatectomy (RP) or radiotherapy (RT) modalities. But because it is difficult to conduct randomized controlled trials that would yield quality results comparing RP with RT for long-term survival in such a high-risk group, physicians have little data available to help them make precise, customized decisions. "Therefore, finding evidence using statistical tools from large, representative national databases is crucial to inform such critical medical decisions," says Hu.

Using a case study in chronic diseases, Hu will demonstrate the challenges typically associated with drawing inferences from electronic health records and administrative databases. Limitations such as uncontrolled data collection settings, practice variation among physicians, and missing data can lead to false conclusions if they are not properly addressed with rigorous statistical methods. Their methods leverage machine learning and flexible models to draw valid inferences from data that are sampled from a representative population and reflect outcomes from actual clinical practice.

"In clinical prediction studies, we show that combining the strengths of nonparametric algorithms and parametric models leads to the development of a data-driven and reproducible approach that will not only generate immediate public impact, but also advance developments in statistical methodology pertaining to drawing valid and useful information from vast data sources," concludes Hu.



