Using the statistics of disorder to unravel real-world chaos
What do election polls, hospital records, and the Syrian conflict have in common? How can a hospital use a patient's vital signs to calculate their risk of cardiac arrest in real time?
Statistician Rebecca Steorts is developing advanced data analysis methods to answer these and other pressing real-world questions. Her research has taken her from computer science and biostatistics to hospital care and human rights.
One major focus of Steorts' research has been estimating death counts in the Syrian civil war. She is working with her research group at Duke and the Human Rights Data Analysis Group (https://hrdag.org/) on combining databases of death records into a single master list of deaths in the conflict, a task known as record linkage.
"The key problem of record linkage is this: you have this duplicated information, how do you remove it?" explained Steorts. For example, journalists from different organizations might independently record the same death in their databases. Those duplicates have to be removed before an accurate death toll can be determined.
At first glance, this might seem like an easy task. But typographic errors, missing information, and inconsistent record-keeping make hunting for duplicates a complex and time-consuming problem; a simple algorithm would require days to sort through all the records. So Steorts and her collaborators designed software that uses powerful machine learning techniques to sift through the different databases.
In 2015, she was named one of MIT Technology Review's 35 Innovators Under 35 for her work on the Syrian conflict. She credits a number of colleagues and students for their contributions to the project, including Anshumali Shrivastava (Rice University), Megan Price (HRDAG), Brenda Betancourt and Abbas Zaid (Duke University), Jeff Miller (Harvard Biostatistics, formerly Duke University), Hanna Wallach (Microsoft Research), and Giacomo Zanella (Bocconi University, and a visitor at Duke University in 2016).
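To see why typos defeat exact matching, here is a toy Python sketch of naive duplicate detection. It is not the method used by Steorts' group. The records, the 0.85 similarity threshold, and the rule that two records must share an exact date and location before their names are compared are all hypothetical assumptions chosen for illustration. Name similarity comes from the standard library's difflib.SequenceMatcher. The sketch also counts how many pairs a brute-force approach must check: comparing every pair of n records takes n(n-1)/2 comparisons, which grows quickly as databases get large.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records: (name, date of death, location).
# Records 0 and 1 describe the same person, with a one-letter spelling difference.
records = [
    ("Ahmad Khalil", "2013-05-02", "Homs"),
    ("Ahmed Khalil", "2013-05-02", "Homs"),
    ("Sara Haddad", "2014-01-17", "Aleppo"),
    ("Omar Nasser", "2013-05-02", "Homs"),
]


def similarity(a, b):
    """Return a 0..1 string similarity score, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def num_comparisons(n):
    """Number of distinct pairs a brute-force comparison must check."""
    return n * (n - 1) // 2


def likely_duplicates(records, threshold=0.85):
    """Return index pairs whose names are similar and whose date and location match.

    Assumption for illustration only: date and location are recorded without
    errors, so they must match exactly. Real data rarely allows this.
    """
    pairs = []
    for (i, r1), (j, r2) in combinations(enumerate(records), 2):
        # Skip pairs that disagree on date or location.
        if r1[1] != r2[1] or r1[2] != r2[2]:
            continue
        # Flag the pair if the names are close despite spelling differences.
        if similarity(r1[0], r2[0]) >= threshold:
            pairs.append((i, j))
    return pairs


print(likely_duplicates(records))  # [(0, 1)]
print(num_comparisons(1_000_000))  # 499999500000
```

An exact string comparison would treat "Ahmad Khalil" and "Ahmed Khalil" as two different people. A fuzzy score catches the pair, but checking every pair of a million records means roughly half a trillion comparisons. This is one reason practical record-linkage systems avoid brute-force pairwise comparison.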
Steorts' work towards estimating death counts in the Syrian conflict is still ongoing, but human rights isn't the only field that she plans to study. "I think of my work as very interdisciplinary," she said. "For me, it's all about the applications."
Recently, Steorts, her colleague Ben Goldstein, and students Reuben McCreanor and Angie Shen have been applying statistical methods to medical data from the Duke healthcare system. Steorts' ultimate goal is to find techniques that can be used across many different applications and data sets.