Combining computer science, statistics creates machines that can learn

July 17, 2013

Learning a subject well means moving beyond the recitation of facts to a deeper knowledge that can be applied to new problems. Designing computers that can transcend rote calculations to more nuanced understanding has challenged scientists for years. Only in the past decade have researchers' flexible, evolving algorithms—known as machine learning—matured from theory to everyday practice, underlying search and language-translation websites and the automated trading strategies used by Wall Street firms.

These applications only hint at machine learning's potential to affect daily life, according to John Lafferty, the Louis Block Professor in Statistics and Computer Science. With his two appointments, Lafferty bridges these disciplines to develop theories and methods that expand the horizon of machine learning to make predictions and extract meaning from data.

"Computer science is becoming more focused on data rather than computation, and modern statistics requires more computational sophistication to work with large data sets," Lafferty says. "Machine learning draws on and pushes forward both of these disciplines."

Lafferty's work focuses on the theories and algorithms that power machine learning. The goal is to develop computer programs that, with little or no human input, can extract knowledge from large amounts of numbers, text, audio or video and make predictions and decisions about events that haven't been coded in their instructions.

"The classical areas of applied mathematics, including partial differential equations, developed from the study of physical processes such as ," Lafferty says. "What we're seeing now is that entirely new directions in applied mathematics are opening up from the study of modern large data sets."

As big data becomes more common in fields including astronomy, biology, and the humanities, researchers need new tools to reveal meaningful signals amid the noise. Machine learning powers advanced technologies from face and speech recognition to cars that drive themselves, and scientists hope to apply it to personalized medical treatments. All are problems where computers must make decisions based on a flood of data, much of it previously unseen.

Lafferty joined the University in 2011 as part of the Physical Sciences Division's computational and applied mathematics initiative, launched in 2008 to recruit faculty and students in these areas. Before that, he helped found the world's first machine-learning department at Carnegie Mellon University in 2002 and spent several years at the IBM Thomas J. Watson Research Center, working on early machine-learning projects in natural speech and text processing.

At Carnegie Mellon, Lafferty's projects included building a topic model, with Dave Blei of Princeton University, from the complete database of articles in the journal Science since its 1880 founding. By connecting semantically related groups of words that appear together in papers more often than chance alone would predict, the topic model carves a structure from this enormous text database. Researchers can then explore the journal by following terms relevant to their field, finding overlooked papers that simple search engines might not spot.
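The idea of words that "appear together more often than chance" can be illustrated with a much simpler statistic than a full topic model. The sketch below is not the model Lafferty and Blei built; it assumes a toy corpus (the `docs` list is invented for illustration) and scores word pairs by pointwise mutual information, which is positive when two words co-occur more often than they would if they were independent.

```python
import math
from collections import Counter
from itertools import combinations

# Toy "papers", each reduced to a set of words (hypothetical data).
docs = [
    {"gene", "dna", "protein"},
    {"gene", "dna", "cell"},
    {"star", "galaxy", "orbit"},
    {"star", "galaxy", "telescope"},
    {"gene", "star"},
]

def pmi_scores(docs):
    """Return pointwise mutual information for each co-occurring word pair.

    PMI = log( P(a and b) / (P(a) * P(b)) ), with probabilities estimated
    as the fraction of documents containing the word or the pair.
    """
    n = len(docs)
    word_counts = Counter(word for doc in docs for word in doc)
    pair_counts = Counter(
        pair for doc in docs for pair in combinations(sorted(doc), 2)
    )
    scores = {}
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = word_counts[a] / n
        p_b = word_counts[b] / n
        scores[(a, b)] = math.log(p_ab / (p_a * p_b))
    return scores

scores = pmi_scores(docs)
# Pairs are keyed in alphabetical order, e.g. ("dna", "gene").
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

In this toy corpus, "dna" and "gene" score above zero while "gene" and "star" score below it, which is the kind of signal a topic model aggregates across many words to form coherent groups.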

Today Lafferty studies the balancing act between the competing demands of computational efficiency and statistical accuracy, finding new ways to handle high-dimensional data sets. While traditional statistics is focused on "tall and thin" data, with many records and few variables, modern data sets, such as those in genomics, are often "short and wide," featuring few subjects and tens of thousands of variables. Researchers need ways to sort the relevant factors from the irrelevant to make statistical analysis possible.
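To make the "short and wide" setting concrete, here is a minimal sketch under stated assumptions: a synthetic data set with 20 subjects and 500 variables, where the outcome is constructed to depend on only one variable. It ranks variables by the absolute value of their correlation with the outcome, a basic screening heuristic chosen for illustration and not a method attributed to Lafferty. All names and numbers are hypothetical.

```python
import math
import random

def correlation(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_variables(X, y):
    """Return variable indices sorted from most to least correlated with y."""
    columns = list(zip(*X))  # transpose rows (subjects) into columns (variables)
    strength = [abs(correlation(col, y)) for col in columns]
    return sorted(range(len(columns)), key=lambda j: strength[j], reverse=True)

# Synthetic "short and wide" data: few subjects, many variables (assumed sizes).
rng = random.Random(0)
n_subjects, n_variables = 20, 500
X = [[rng.gauss(0, 1) for _ in range(n_variables)] for _ in range(n_subjects)]

# By construction, only variable 0 drives the outcome; the rest are noise.
y = [3.0 * row[0] + 0.1 * rng.gauss(0, 1) for row in X]

ranking = rank_variables(X, y)
print("Top-ranked variable:", ranking[0])
```

With so few subjects, some irrelevant variables will show sizable correlations purely by chance, which is why a planted signal this strong was used here and why weaker signals call for more careful methods.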

Lafferty also studies semi-supervised learning, a machine-learning technique where a human labels a small set of inputs, such as speech or images, and the computer then learns to categorize new data from those few labels together with a much larger pool of unlabeled examples. At Carnegie Mellon, Lafferty and colleagues developed a computer-vision program to recognize people on a webcam that graduate students had set up in the computer-science department lounge to watch for seminar leftovers. After an observer entered labels for the first day of data collection, the program was more than 80 percent accurate in identifying ten individuals who appeared regularly over four months.
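A rough sketch of how a few labels can spread to unlabeled data follows. It assumes one-dimensional toy measurements and a simple self-training loop that repeatedly assigns the nearest already-labeled neighbor's label. This is an illustrative stand-in, not the webcam system described above, and every name in it is hypothetical.

```python
def self_train(labeled, unlabeled):
    """Spread labels from a few labeled points to unlabeled ones.

    labeled:   list of (value, label) pairs supplied by a human.
    unlabeled: list of values with no label.
    At each step, the unlabeled value closest to any labeled value
    takes that neighbor's label and joins the labeled set.
    """
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # Find the (distance, value, label) triple with the smallest distance.
        _, value, label = min(
            ((abs(u - x), u, lab) for u in remaining for x, lab in labeled),
            key=lambda triple: triple[0],
        )
        labeled.append((value, label))
        remaining.remove(value)
    return dict(labeled)

# Two human-provided labels and a handful of unlabeled measurements.
seed_labels = [(0.0, "A"), (10.0, "B")]
unlabeled = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 9.0]

result = self_train(seed_labels, unlabeled)
print(result[6.0], result[9.0])
```

The value 6.0 is closer to the original "B" seed than to the "A" seed, yet it ends up labeled "A" because the unlabeled points between 0 and 6 form a chain. That is the sense in which unlabeled data can reshape what a few labels imply.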

This spring Lafferty is teaching machine learning to a new generation of scientists. In the course Machine Learning and Large-Scale Data Analysis, undergraduates develop algorithms to predict Chicago crime, search for exoplanets, find New Year's Day wishes on Twitter, and study the language in State of the Union addresses, running their analyses on virtual computer clusters built in Amazon Web Services. The students enrolled in the course include majors in physics, mathematics, computer science, linguistics, economics, neuroscience and political science, reflecting the wide relevance of machine learning to today's research world.

While many experts are drawn to companies such as Google to work on specific applications, Lafferty sees broader and more surprising work coming from research universities. "The potential impact is very large," Lafferty says, "and the ideas that we're developing will be applied in ways that we can't even anticipate."



1 / 5 (1) Jul 17, 2013
I have heard things about "big data", such as that certain algorithms run against the web and social networks can predict big events and the "global subconscious". We can tap into the inner knowledge of all humans and become empowered by it. I'm sure it is currently being used by the US government, probably to figure out how to conquer the world and control everyone!
1.7 / 5 (3) Jul 18, 2013
This is literally the coolest physorg article I've ever read. I have spent the last couple of months wondering if it was possible to do what Lafferty has done, and was already learning Python in order to find out.
1 / 5 (2) Jul 18, 2013
krundoloss, that is absolute horse shit, the government claims to be trying to "predict the future" by mining twitter, but anyone with more wit than a tube of toothpaste knows there is ZERO possibility of "predicting" anything intelligent by means of any amounts of text mining.
not rated yet Jul 21, 2013
Re: "krundoloss, that is absolute horse shit, the government claims to be trying to "predict the future" by mining twitter, but anyone with more wit than a tube of toothpaste knows there is ZERO possibility of "predicting" anything intelligent by means of any amounts of text mining."

I must have the wit of a tube of toothpaste then. It is conceivable that demonstrations and government overthrows can be predicted based on what people are saying. Perhaps it would be a good way for us to protect our diplomats and other citizens. Social media is an easy way to coordinate a massive get-together. Can they predict the end of the earth or the next big earthquake? No, THAT is ridiculous, but human-led events, absolutely possible.
1 / 5 (2) Jul 25, 2013
Re: "anyone with more wit than a tube of toothpaste knows there is ZERO possibility of "predicting" anything intelligent by means of any amounts of text mining."

Um, this very well could turn out to be like one of those quotes that people put into books which are designed to laugh at how quaint people of the past were. For those who are not quite sure, I recommend visiting David Blei's page on topic models at

The math involved is quite serious, but from what I can tell, this topic modeling technology will eventually redefine how we access the Internet, as well as the files on our own machines. If you've ever been researching a subject with Google searches, and ran into a topic which dramatically expanded your understanding of what is possible with the subject, then you already intuitively know the power of topic modeling. When all subjects can be effortlessly browsed by topics, we're talking total game changer.
