Combining computer science, statistics creates machines that can learn

Jul 17, 2013

Learning a subject well means moving beyond the recitation of facts to a deeper knowledge that can be applied to new problems. Designing computers that can transcend rote calculations to more nuanced understanding has challenged scientists for years. Only in the past decade have researchers' flexible, evolving algorithms—known as machine learning—matured from theory to everyday practice, underlying search and language-translation websites and the automated trading strategies used by Wall Street firms.

These applications only hint at machine learning's potential to affect daily life, according to John Lafferty, the Louis Block Professor in Statistics and Computer Science. With his two appointments, Lafferty bridges these disciplines to develop theories and methods that expand the horizon of machine learning to make predictions and extract meaning from data.

"Computer science is becoming more focused on data rather than computation, and modern statistics requires more computational sophistication to work with large data sets," Lafferty says. "Machine learning draws on and pushes forward both of these disciplines."

Lafferty's work focuses on the theories and algorithms that power machine learning. The goal is to develop computer programs that, with little or no human input, can extract knowledge from large amounts of numbers, text, audio or video and make predictions and decisions about events that haven't been coded in their instructions.

"The classical areas of applied mathematics, including partial differential equations, developed from the study of physical processes such as ," Lafferty says. "What we're seeing now is that entirely new directions in applied mathematics are opening up from the study of modern large data sets."

As big data becomes more common in fields including astronomy, biology, and the humanities, researchers need new tools to reveal meaningful signals amid the noise. Machine learning powers advanced technologies from face and speech recognition to cars that drive themselves, and scientists hope to apply it to personalized medical treatments, all problems where computers must make decisions based on a flood of data, much of it previously unseen.

Lafferty joined the University in 2011 as part of the Physical Sciences Division's computational and applied mathematics initiative, launched in 2008 to recruit faculty and students in these areas. Before that, he helped to found the world's first machine-learning department at Carnegie Mellon University in 2002 and spent several years at the IBM Thomas J. Watson Research Center, working on early machine-learning projects in natural speech and text processing.

At Carnegie Mellon, Lafferty's projects included building a topic model, with Dave Blei of Princeton University, from the complete database of articles in the journal Science since its 1880 founding. By connecting semantically related groups of words that appear together in papers more frequently than random chance, the topic model carves a structure from this enormous text database. Researchers can then explore the journal by following terms relevant to their field, finding overlooked papers that simple search engines might not spot.
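
The idea of words that "appear together more frequently than random chance" can be illustrated with a small sketch. This is not the topic model described above; it is a much simpler co-occurrence "lift" score, and the function name and toy documents are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_lift(docs):
    """Return {(w1, w2): lift} for word pairs that share at least one document.

    lift = P(w1 and w2 in the same document) / (P(w1) * P(w2)).
    A lift above 1 means the pair appears together more often than
    independent (chance) placement would predict.
    """
    n = len(docs)
    word_sets = [set(d.lower().split()) for d in docs]
    # Number of documents containing each word.
    word_count = Counter(w for s in word_sets for w in s)
    # Number of documents containing each (alphabetically ordered) pair.
    pair_count = Counter(
        pair for s in word_sets for pair in combinations(sorted(s), 2)
    )
    return {
        (a, b): (c / n) / ((word_count[a] / n) * (word_count[b] / n))
        for (a, b), c in pair_count.items()
    }

# Hypothetical toy "abstracts"; real input would be full article text.
docs = [
    "gene dna data",
    "gene dna protein data",
    "star galaxy data",
    "star galaxy orbit data",
]
lift = cooccurrence_lift(docs)
print(lift[("dna", "gene")])   # related pair: lift above 1
print(lift[("data", "gene")])  # "data" is everywhere: lift at chance level
```

A full topic model goes further, fitting a probabilistic model in which each document mixes several topics, but the same signal, co-occurrence beyond chance, is what it exploits.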

Today Lafferty studies the balancing act between the competing demands of computational efficiency and statistical accuracy, finding new ways to handle high-dimensional data sets. While traditional statistics is focused on "tall and thin" data, with many records and few variables, modern data sets, such as those in genomics, are often "short and wide," featuring few subjects and tens of thousands of variables. Researchers need ways to sort the relevant factors from the irrelevant to make statistical analysis possible.
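
One simple way to sort relevant from irrelevant variables in "short and wide" data is to rank each variable by its correlation with the outcome and keep the top few. The sketch below assumes that approach (often called marginal screening); it is an illustration under made-up data, not a description of Lafferty's own methods, and the names `pearson` and `screen` are hypothetical.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def screen(rows, y, k):
    """Return the indices of the k columns most correlated (in absolute value) with y."""
    p = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], y)) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]

# Hypothetical "short and wide" data: 12 subjects, 60 variables,
# with the response driven almost entirely by variable 3.
rng = random.Random(0)
rows = [[rng.gauss(0, 1) for _ in range(60)] for _ in range(12)]
y = [5 * r[3] + 0.01 * rng.gauss(0, 1) for r in rows]
top = screen(rows, y, k=1)
print(top)
```

With so few subjects, weakly relevant variables can be outranked by chance correlations, which is one reason the trade-off between computation and statistical accuracy matters in this setting.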

Lafferty also studies semi-supervised learning, a machine-learning technique in which a human labels a small set of inputs, such as speech or images, and the computer then learns to categorize new, unseen data with little further guidance. At Carnegie Mellon, Lafferty and colleagues developed a computer-vision program to recognize people on a webcam that graduate students had set up in the computer-science department lounge to watch for seminar leftovers. After an observer entered labels for the first day of data collection, the program was more than 80 percent accurate in identifying ten individuals who appeared regularly over four months.
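
A minimal sketch of the semi-supervised idea, assuming a simple "self-training" scheme with a nearest-centroid classifier: a few human-labeled points seed the classes, and the program's own guesses on unlabeled points then refine them. The webcam system described above may well have worked differently; the function names and the two-dimensional toy data are hypothetical.

```python
def centroid(points):
    """Mean of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def self_train(labeled, unlabeled, rounds=3):
    """Nearest-centroid self-training.

    labeled:   list of (point, label) pairs supplied by a human
    unlabeled: list of points with no labels
    Each round, class centroids are recomputed from the labeled points plus
    the current guesses for the unlabeled ones, then the guesses are refreshed.
    Returns the final list of predicted labels for `unlabeled`.
    """
    guesses = [None] * len(unlabeled)
    for _ in range(rounds):
        by_label = {}
        for point, label in labeled:
            by_label.setdefault(label, []).append(point)
        for point, label in zip(unlabeled, guesses):
            if label is not None:
                by_label[label].append(point)
        centroids = {lab: centroid(pts) for lab, pts in by_label.items()}
        guesses = [
            min(centroids, key=lambda lab: sq_dist(p, centroids[lab]))
            for p in unlabeled
        ]
    return guesses

# One human-labeled example per class, then six unlabeled observations.
labeled = [((0.0, 0.0), "A"), ((10.0, 10.0), "B")]
unlabeled = [(1.0, 0.5), (2.0, 2.0), (4.0, 4.0), (9.0, 8.5), (7.0, 7.0), (6.0, 6.0)]
labels = self_train(labeled, unlabeled)
print(labels)
```

The design choice here is that unlabeled data shape the class centroids after the first round, so the classifier adapts to data it was never explicitly told about.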

This spring Lafferty is teaching machine learning to a new generation of scientists. In the course Machine Learning and Large-Scale Data Analysis, undergraduates develop algorithms to predict Chicago crime, search for exoplanets, find New Year's Day wishes on Twitter, and study the language in State of the Union addresses, running their analyses on virtual computer clusters built in Amazon Web Services. The students enrolled in the course include majors in physics, mathematics, computer science, linguistics, economics, neuroscience and political science, reflecting the wide relevance of machine learning to today's research world.

While many experts are drawn to companies such as Google to work on specific applications, Lafferty sees broader and more surprising work coming from research universities. "The potential impact is very large," Lafferty says, "and the ideas that we're developing will be applied in ways that we can't even anticipate."

User comments : 5

krundoloss
1 / 5 (1) Jul 17, 2013
I have heard things about "big data", such as that certain algorithms, run against the web and social networks, can predict big events and the "global subconscious". We can tap into the inner knowledge of all humans and become empowered by it. I'm sure it is currently being used by the US government, probably to figure out how to conquer the world and control everyone!
HannesAlfven
1.7 / 5 (3) Jul 18, 2013
This is literally the coolest physorg article I've ever read. I have spent the last couple of months wondering if it was possible to do what Lafferty has done, and was already learning Python in order to find out.
Stephen_Crowley
1 / 5 (2) Jul 18, 2013
krundoloss, that is absolute horse shit, the government claims to be trying to "predict the future" by mining twitter, but anyone with more wit than a tube of toothpaste knows there is ZERO possibility of "predicting" anything intelligent by means of any amounts of text mining.
intomaths
not rated yet Jul 21, 2013
Re: "krundoloss, that is absolute horse shit, the government claims to be trying to "predict the future" by mining twitter, but anyone with more wit than a tube of toothpaste knows there is ZERO possibility of "predicting" anything intelligent by means of any amounts of text mining."

I must have the wit of a tube of toothpaste then. It is conceivable that demonstrations and government overthrows can be predicted based on what people are saying. Perhaps it would be a good way for us to protect our diplomats and other citizens. Social media is an easy way to coordinate a massive get-together. Can they predict the end of the earth or the next big earthquake? No, THAT is ridiculous, but human-led events, absolutely possible.
HannesAlfven
1 / 5 (2) Jul 25, 2013
Re: "anyone with more wit than a tube of toothpaste knows there is ZERO possibility of "predicting" anything intelligent by means of any amounts of text mining."

Um, this very well could turn out to be like one of those quotes that people put into books which are designed to laugh at how quaint people of the past were. For those who are not quite sure, I recommend visiting David Blei's page on topic models at http://www.cs.pri...ng.html.

The math involved is quite serious, but from what I can tell, this topic modeling technology will eventually redefine how we access the Internet, as well as the files on our own machines. If you've ever been researching a subject with Google searches, and ran into a topic which dramatically expanded your understanding of what is possible with the subject, then you already intuitively know the power of topic modeling. When all subjects can be effortlessly browsed by topics, we're talking total game changer.
