The big problem of small data: A new approach

October 18, 2018, Cold Spring Harbor Laboratory
To demonstrate that DEFT can be applied to a variety of small datasets, CSHL scientists used it to analyze data from the CMS detector at CERN, one of the detectors used to discover the Higgs boson. Of 60 particle impressions, DEFT estimated that up to six were from real events. (Pictured: A 3D perspective of a Higgs boson event recorded in 2012. Particle impressions appear as green towers and red lines.) Credit: McCauley, T; Taylor, L; CERN

Big Data is all the rage today, but Small Data matters too! Drawing reliable conclusions from small datasets, like those from clinical trials for rare diseases or in studies of endangered species, remains one of the trickiest obstacles in statistics. Now, Cold Spring Harbor Laboratory (CSHL) researchers have developed a new way to analyze small data, one inspired by advanced methods in theoretical physics, but available as easy-to-use software.

"Dealing with small datasets is a fundamental part of doing science," CSHL Assistant Professor Justin Kinney explained. The challenge is that, with very little data, it's not only hard to come to a conclusion; it's also hard to determine how certain your conclusions are.

"It's important to not only produce the best guess for what's going on, but also to say, 'This guess is probably correct,'" said Kinney.

A good example is clinical drug trials.

"When each data point is a patient, you will always be dealing with small datasets, and for very good reasons," he said. "You don't want to test a treatment on more people than you have to before determining if the drug is safe and effective. It's really important to be able to make these decisions with as little data as possible."

Quantifying that certainty has been difficult because of the assumptions that common statistical methods make. These assumptions were necessary back when standard methods were developed, before the computer age. But these approximations, Kinney notes, "can be catastrophic" on small datasets.
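The article does not say which assumptions Kinney has in mind. As a generic textbook illustration (not taken from the paper, and not DEFT), the sketch below compares the large-sample normal-approximation ("Wald") confidence interval for a proportion with the Wilson score interval. The scenario of 0 adverse events in 10 patients is hypothetical, and so are the function names.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation (Wald) interval; relies on a large-sample assumption."""
    p = successes / n
    half = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval; stays sensible even when n is very small."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical small trial: 0 adverse events observed in 10 patients.
print(wald_interval(0, 10))    # (0.0, 0.0): claims zero uncertainty
print(wilson_interval(0, 10))  # lower bound ~0, upper bound ~0.28
```

With zero observed events, the Wald interval collapses to a single point and reports no uncertainty at all, which is the kind of small-sample failure the quote warns about; the Wilson interval still allows an event rate of up to roughly 28%.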

Top: Number of Higgs boson particle events expected based on Standard Model simulations. Bottom: DEFT was used to produce a smooth estimate (black) of how many four-lepton decay events indicate a true Higgs boson event, with a margin of uncertainty (green). Credit: Kinney Lab/CSHL

Now, Kinney's lab has crafted a modern computational approach called Density Estimation using Field Theory, or DEFT, that fixes these shortcomings. DEFT is freely available via an open source package called SUFTware.
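The article does not spell out DEFT's mathematics. To our understanding of the published DEFT method, a probability density is written as Q(x) ∝ exp(-φ(x)), the field φ gets a prior that penalizes roughness, and the data determine how smooth φ should be. The sketch below shows only that core idea on a 1D grid: it finds the single most probable (MAP) density for a hand-picked length scale by plain gradient descent. The function `field_density_estimate` and its parameters are hypothetical; this is not SUFTware's API, it does not choose the length scale from the data, and it produces none of the uncertainty estimates the article emphasizes.

```python
import math


def field_density_estimate(data, xmin, xmax, grid_size=40, length_scale=1.0,
                           steps=10000):
    """Hypothetical simplified MAP density estimate with a smoothness prior.

    Writes Q = exp(-phi) / Z on a grid and minimizes the convex action
        S(phi) = (ell^2/2) * sum_g h*((phi[g+1]-phi[g])/h)^2
                 + sum_g counts[g]*phi[g]
                 + N*log(h*sum_g exp(-phi[g]))
    by gradient descent. Returns (grid centers, density values).
    """
    n = len(data)
    h = (xmax - xmin) / grid_size
    centers = [xmin + (g + 0.5) * h for g in range(grid_size)]

    # Histogram the data onto the grid (out-of-range points go to edge bins).
    counts = [0] * grid_size
    for x in data:
        g = min(grid_size - 1, max(0, int((x - xmin) / h)))
        counts[g] += 1

    stiff = length_scale ** 2 / h
    # Step size = 1 / (upper bound on the Hessian's largest eigenvalue).
    lr = 1.0 / (4 * stiff + n)
    phi = [0.0] * grid_size

    for _ in range(steps):
        m = min(phi)
        w = [math.exp(-(p - m)) for p in phi]  # shifted for numerical stability
        total = sum(w)
        grad = []
        for g in range(grid_size):
            # Derivative of the roughness penalty (discrete Laplacian term).
            smooth = 0.0
            if g > 0:
                smooth += phi[g] - phi[g - 1]
            if g < grid_size - 1:
                smooth -= phi[g + 1] - phi[g]
            # Data term minus normalization term (N * h * Q[g]).
            grad.append(stiff * smooth + counts[g] - n * w[g] / total)
        phi = [p - lr * d for p, d in zip(phi, grad)]

    # Convert the field into a normalized density on the grid.
    m = min(phi)
    w = [math.exp(-(p - m)) for p in phi]
    z = h * sum(w)
    return centers, [v / z for v in w]


sample = [-0.5, -0.2, -0.1, 0.0, 0.1, 0.3, 0.4, 0.6]  # hypothetical small dataset
xs, q = field_density_estimate(sample, -5.0, 5.0)
peak = xs[q.index(max(q))]
```

Setting `length_scale` near zero makes the estimate approach the raw histogram, while a large value forces a nearly flat density; DEFT's contribution, as the article describes it, is handling this trade-off and the resulting uncertainty rigorously rather than by hand.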

In their recent paper, published in Physical Review Letters, Kinney's lab demonstrates DEFT on two datasets: national health statistics compiled by the World Health Organization, and traces of subatomic particles used by physicists at the Large Hadron Collider to reveal the existence of the Higgs boson particle.

Kinney says that being able to apply DEFT to such drastically diverse "real-world" situations, despite its computations being inspired by theoretical physics, is what makes the new approach so powerful.

"Flexibility is a really good thing... We're now adapting DEFT to problems in survival analysis, the type of statistics used in [...]," Kinney said. "Those new capabilities are going to be added to SUFTware as we continue developing this new approach to statistics."


More information: SUFTware
