Machine learning branches out

November 14, 2013 by Larry Hardesty

Much artificial-intelligence research is concerned with finding statistical correlations between variables: What combinations of visible features indicate the presence of a particular object in a digital image? What speech sounds correspond with instances of what words? What medical, genetic, and environmental factors are correlated with what diseases?

As the number of variables grows, calculating their aggregate statistics becomes dauntingly complex. But that calculation can be drastically simplified if you know something about the structure of the data—that, for instance, the sound corresponding to the letter "T" is frequently followed by the sound corresponding to the letter "R," but never by the sound corresponding to the letter "Q."

In a paper being presented in December at the annual conference of the Neural Information Processing Systems Foundation, MIT researchers describe a new technique that expands the class of data sets whose structure can be efficiently deduced. Not only that, but their technique naturally describes the data in a way that makes it much easier to work with.

In the paper, they apply their technique to several sample data sets, including information about commercial airline flights. Using only flights' scheduled and actual departure times, the algorithm can efficiently infer vital information about the propagation of flight delays through U.S. airports. It also identifies those airports where delays are most likely to have far-reaching repercussions, which makes it simpler to reason about the behavior of the network as a whole.

Thinking graphically

In technical terms, the researchers' work concerns probabilistic graphical models. In this context, a graph is a mathematical construct that consists of nodes and edges, usually depicted as, respectively, circles and the lines that connect them. A network diagram is a familiar example of a graph; a family tree is another.

In a graphical model, the edges have an associated number, which describes the statistical relationship between the nodes. In the linguistic example, the nodes representing the sounds corresponding to "T" and "R" would be connected by a highly weighted edge, while the nodes corresponding to "T" and "Q" wouldn't be connected at all.

Graphical models simplify reasoning about data correlations because they eliminate the need to consider certain dependencies. Suppose, for instance, that your artificial-intelligence algorithm is looking for diagnostically useful patterns in a mountain of medical data, where the variables include patients' symptoms, their genetic information, their treatment histories, and prior diagnoses. Without the graph structure, the algorithm would have no choice but to evaluate the relationships among all the variables at once. But if it knows, for instance, that gene "G" is a cause of disease "D," which is treated with medication "M," which has side effect "S," then it has a much simpler time determining whether, for instance, "S" is a previously unidentified indicator of "D." A graphical model is a way of encoding those types of relationships so that they can be understood by machines.

Historically, graphical models have sped up machine-learning algorithms only when they've had a few particular shapes, such as that of a tree. A tree is a graph with no closed loops: In a , for instance, a closed loop would indicate something biologically impossible—that, say, someone is both parent and sibling to the same person.

Out of the loop

According to Ying Liu, a graduate student in MIT's Department of Electrical Engineering and Computer Science who co-wrote the new paper with his advisor, Alan Willsky, the Edwin Sibley Webster Professor of Electrical Engineering, loops pose problems because they can make statistical inference algorithms "overconfident." The algorithm typically used to infer statistical relationships within graphical models, Liu explains, is a "message-passing algorithm, where each node sends messages to only its neighbors, using only local information and incoming messages from other neighbors. It's a very good way to distribute the computation."

If the graph has loops, however, "a node may get some message back, but this message is partly from itself. So it gets overconfident about the beliefs."

In prior work, Liu and Willsky showed that efficient machine learning can still happen in a "loopy" probabilistic graph, provided it has a relatively small "feedback vertex set" (FVS)—a group of nodes whose removal turns a loopy graph into a tree. In the airline-flight example, many of the nodes in the FVS were airline hubs, which have flights to a large number of sparsely connected airports. The same structure is seen in other contexts in which machine learning is currently applied, Liu says, such as social networking..

In the new paper, they show that the structure of a graphical model can be deduced by similar means. A structureless data set is equivalent to a graph in which every node is connected to every other node. Liu and Willsky's algorithm goes through the graph, sequentially removing nodes that break loops and, using the efficient algorithm they demonstrated previously, calculating how close the statistical dependencies of the resulting graph are to those of the fully connected graph.

In this manner, the builds up an FVS for the graph. What remains is a tree—or something very close to a tree—that allows for efficient calculation. In practice, Liu and Willsky found that in order to make machine learning efficient, they required an FVS whose size was only about the logarithm of the total number of nodes in the graph.


Sujay Sanghavi, an assistant professor of electrical and computer engineering at the University of Texas at Austin, has also studied the problem of learning the structure of graphical models. Of the MIT researchers' work, Sanghavi says, "They kind of decompose the problem into two problems, each of which is much simpler to solve individually, and then they alternate between the two. That is a much better way to solve the problem than the original one, which has both issues mixed together."

In cases where the FVS has been identified, Sanghavi says, "You can easily find the graph that does not include those nodes. That's a cute observation, that once you have these nodes, the rest of the problem is easy. The other problem is also easy, which is, I give you only those nodes, which are a small number, and you need to find only those edges which have one of the endpoints as those nodes. It's just when you try to solve both problems together that it becomes hard. That's the main insight in this paper, and I think it's quite nice."

Sanghavi has recently been using graphical models to examine the structure of gene regulatory networks, and he's curious to see whether the MIT researchers' technique will apply to that problem. "Some genes fire and they fire other genes and so on," Sanghavi explains. "Really what you want to do is find the dependence network between genes, and that can be posed as a Gaussian graphical-model learning problem. It would be interesting to see if their methods perform well."

Explore further: Short algorithm, long-range consequences

More information:

Related Stories

Short algorithm, long-range consequences

March 1, 2013

In the last decade, theoretical computer science has seen remarkable progress on the problem of solving graph Laplacians—the esoteric name for a calculation with hordes of familiar applications in scheduling, image processing, ...

Explained: Graphs

December 17, 2012

When most people hear the word "graph," an image springs to mind: a pair of perpendicular lines overlaid with a line, a curve, or bars of different heights.

Frog calls inspire a new algorithm for wireless networks

July 17, 2012

Males of the Japanese tree frog have learnt not to use their calls at the same time so that the females can distinguish between them. Scientists at the Polytechnic University of Catalonia have used this form of calling behaviour ...

Reliable communication, unreliable networks

August 5, 2013

Now that the Internet's basic protocols are more than 30 years old, network scientists are increasingly turning their attention to ad hoc networks—communications networks set up, on the fly, by wireless devices—where ...

Recommended for you

Making AI systems that see the world as humans do

January 19, 2017

A Northwestern University team developed a new computational model that performs at human levels on a standard intelligence test. This work is an important step toward making artificial intelligence systems that see and understand ...

Firms push hydrogen as top green energy source

January 18, 2017

Over a dozen leading European and Asian firms have teamed up to promote the use of hydrogen as a clean fuel and cut the production of harmful gasses that lead to global warming.

1 comment

Adjust slider to filter visible comments by rank

Display comments: newest first

1 / 5 (2) Nov 14, 2013
Good news. Now we will be able to misattribute causation to correlations (or get the cause and effect wrong way around) much faster. Seriously though this will be a powerful tool. I hope we can keep it from being abused too much without limiting its value to society. Unseen correlations could be used to make plans for ideal conditions like scheduling an election to maximize one party's turnout or chose a time for a terrorist event which will harm more of a specific group than others.

Please sign in to add a comment. Registration is free, and takes less than a minute. Read more

Click here to reset your password.
Sign in to get notified via email when new comments are made.