Active learning model for computer predictions

Dec 03, 2013 by Mark Riechers
Rob Nowak and Kevin Jamieson. Credit: Nick Berard.

Computers serve as powerful tools for categorizing, displaying and searching data. They're tools we continually upgrade to improve processing power, both locally and via the cloud. But computers are simply the medium for big data. "We really need people to interact with the machines to make them work well," says McFarland-Bascom Professor of Electrical and Computer Engineering Rob Nowak. "You can't turn over a ton of raw data and just let the machine figure it out."

Unlike computers, people cannot be upgraded. They work at a finite speed and at rising costs, so Nowak is improving interactive systems that can optimize the performance of both humans and machines tackling big data problems together.

Typically, human experts—people who categorize data—will receive a large, random dataset to label. The computer then looks at those labels to build a basis of comparison for labeling new data in the future. However, Nowak suggests the model should be flipped. "Rather than asking a person to label a random set of examples, the machine gets the set of examples, then asks a human for further classification of a specific set of data that it might find confusing," says Nowak.

With support from the National Science Foundation and Air Force Office of Scientific Research, Nowak has been exploring an active learning model, in which the machine receives all the data up front. Initially, with no labels, the machine makes very poor predictions, improving as a human expert supplies labels for some of the data. For example, if a new data point is similar to one that a human has labeled, the machine can predict that this point should probably have the same label. The machine can also use the similarities and labels to quantify its confidence in the predictions it makes. And when the confidence for a certain prediction is low, it asks the human expert for advice.
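To make that loop concrete, here is a minimal sketch in Python of how such a system might work, using uncertainty sampling with a nearest-neighbor model. The function names, the "oracle" stand-in for the human expert, and the confidence rule are illustrative assumptions, not the researchers' actual implementation.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def active_learning(X, oracle, n_seed=5, n_queries=50, k=5, seed=0):
    """X: unlabeled points, shape (n, d). oracle(i) returns the label
    of point i and stands in for the human expert."""
    rng = np.random.default_rng(seed)
    # Start with a handful of randomly chosen labeled examples.
    labeled = {int(i): oracle(int(i))
               for i in rng.choice(len(X), size=n_seed, replace=False)}
    model = None
    for _ in range(n_queries):
        idx = list(labeled)
        model = KNeighborsClassifier(n_neighbors=min(k, len(idx)))
        model.fit(X[idx], [labeled[i] for i in idx])
        conf = model.predict_proba(X).max(axis=1)  # confidence from nearby labels
        conf[idx] = np.inf                         # never re-query labeled points
        q = int(conf.argmin())                     # the most "confusing" example
        labeled[q] = oracle(q)                     # ask the human expert
    return model, labeled
```

In a real system the oracle would be an interface that actually asks a person; the point of the loop is that each question targets the example the current model is least sure about.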

To explore these sorts of human-machine interactions, Nowak and his student Kevin Jamieson have applied the idea to a technology that's a natural fit in Wisconsin—an iOS app that can predict which craft beers a user will prefer.

In this case, the data points are beers, and their similarities are based on flavor, color, taste and other characteristics defined by the spectrum of terms used to describe them in reviews on Ratebeer.com. Using that existing data, the researchers' algorithm can find the beers a user is likely to enjoy in much the same way a bartender might: it presents the user with two beer choices, then uses the stated preference between the two to home in on a specific point in the "beer space."

"Basically, if I already know that you prefer Spotted Cow to Guinness, then I'm probably not going to ask you to compare Spotted Cow to some other stout," says Nowak. "Because there are relationships between every beer, I don't have to ask you for every comparison."

These sorts of "this-or-that" determinations tend to be more stable than rankings on a numerical scale or other, more subjective measures, which are vulnerable to psychological priming effects and can drift over time. Such fine-grained comparisons give the machine more reliable data with which to improve its categorization and prediction.

Most importantly, the approach lets machines process data much, much faster, because they need less human help to categorize it. For example, Nowak says the app can pull from thousands of possible beers and make a personalized recommendation based on only 10 to 20 comparisons.
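The article does not spell out why so few questions suffice, but a back-of-the-envelope check makes it plausible: if each this-or-that answer rules out roughly half of the remaining candidates, locating one beer among N takes about log2(N) answers.

```python
import math

# If each answer halves the candidate set, ~log2(N) answers remain.
for n in (1000, 5000, 100000):
    print(f"{n:>6} beers -> about {math.ceil(math.log2(n))} comparisons")
# 1000 -> 10, 5000 -> 13, 100000 -> 17: consistent with "10 to 20"
```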

That sort of efficiency becomes important as data sets grow beyond what human labor can keep up with. In a collaboration with UW-Madison psychology colleagues, Nowak has applied his model to measuring the relative emotionality of words; without active learning, mapping the similarities among 400 words could require as many as 30 million comparisons in total. "Even if you could recruit a cohort of 1,000 undergraduates, that would still be 30,000 trials apiece," he says.
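For a sense of where a figure like 30 million could come from (a reconstruction on our part, assuming the comparisons are triplet questions of the form "is word a more similar to word b or to word c?"):

```python
from math import comb

n = 400                     # words being compared
total = n * comb(n - 1, 2)  # one triplet question per word and pair of others
print(f"{total:,}")         # 31,760,400, roughly 30 million
print(f"{total // 1000:,} trials each for 1,000 helpers")
```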

Understanding human judgments about the similarity of word meanings is a fundamental challenge in cognitive science, and it is crucial to building machines that can grasp the subtleties of human language. Optimizing how people and machines are applied to problems like that could be key to making big data analysis economical and effective in many more situations. "There's no research to be done on the infrastructure side," he says. "We have big data infrastructure. What we don't understand is how to optimally yoke humans and machines together in big analyses."
