Improving machine learning with an old approach

December 22, 2015 by Grayson York
Rong Ge is an assistant professor of computer science.

Computer scientist Rong Ge has an interesting approach to machine learning. While most machine learning specialists build an algorithm that molds itself to a specific dataset, Ge builds algorithms that he can guarantee will perform well across many datasets.

A paper he wrote as a postdoc at Microsoft Research, Escaping From Saddle Points—Online Stochastic Gradient for Tensor Decomposition, describes how a programmer can use the imprecision of a common optimization algorithm, known as stochastic gradient descent, to his advantage.

Normally the algorithm is used to speed up a slow learning process by settling for an approximation of the correct solution rather than working harder for precision. However, Ge and his colleagues found that the small amount of "noise" the algorithm introduces can be the saving grace of a learning process that would otherwise be trapped by its own perfectionism.

"This algorithm is not normally used for this purpose," Ge says. "It is normally used as a heuristic to approximate the solution to a problem."

Noise allows the algorithm to escape from something called a "saddle point" on the function that stochastic gradient descent is trying to optimize; around such a point the surface curves up in one direction and down in another, like a riding saddle. Ge describes gradient descent as being like a ball rolling down a hill. On the slope of the hill it always seeks the lowest point, but at a saddle point, a flat spot on a "ridge" between two different slopes, it will not start rolling.
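The ball-on-a-ridge picture can be sketched with plain gradient descent on the textbook saddle function f(x, y) = x² − y² (an illustration assumed here, not an example taken from the paper). Started exactly on the ridge (y = 0), the iterates slide toward the saddle point at the origin and stall there, even though the function keeps decreasing in the y direction:

```python
def grad_descent(steps=200, lr=0.1, x=1.0, y=0.0):
    # f(x, y) = x^2 - y^2 has a saddle point at the origin:
    # it curves up along x and down along y.
    for _ in range(steps):
        gx, gy = 2 * x, -2 * y  # gradient of f
        x -= lr * gx
        y -= lr * gy
    return x, y
```

Because the gradient in y is exactly zero on the ridge, the ball never "starts rolling" sideways: deterministic gradient descent converges to the saddle instead of escaping it.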

Stochastic gradient descent remedies this problem by jostling the ball enough to start it rolling down one of the slopes. But one cannot be sure which way it will roll.
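A minimal sketch of the jostling (again an assumed illustration, not the paper's algorithm): perturbing each gradient step with a little Gaussian noise knocks the ball off the ridge of the same saddle function f(x, y) = x² − y², after which the downhill direction takes over and y moves far away from the saddle:

```python
import random

def noisy_descent(steps=200, lr=0.1, sigma=0.01, seed=0):
    # Same saddle function f(x, y) = x^2 - y^2, but each gradient is
    # perturbed by Gaussian noise, mimicking the randomness of SGD.
    random.seed(seed)
    x, y = 1.0, 0.0
    for _ in range(steps):
        gx, gy = 2 * x, -2 * y
        x -= lr * (gx + random.gauss(0, sigma))
        y -= lr * (gy + random.gauss(0, sigma))
    return x, y
```

The first noisy step gives y a tiny nonzero value; from then on the downhill curvature amplifies it every step, so the iterate escapes the saddle. Which side of the ridge it ends up on depends on the random draws.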

The results he has obtained relate to a certain branch of machine learning known as unsupervised learning. One common problem from unsupervised learning is "clustering," in which the algorithm attempts to find clusters of points which are similar to each other while different from the other points in the set. The algorithm then labels each of the clusters which it has found, and returns its solution to the programmer.
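Clustering can be illustrated with a generic alternating scheme in the style of Lloyd's k-means algorithm (a hypothetical sketch for this article, not the method from Ge's paper): repeatedly assign each point to its nearest center, then move each center to the mean of its cluster, and finally return a label per point:

```python
def kmeans_1d(points, centers, iters=20):
    # A generic Lloyd's-style clustering sketch: alternate between
    # assigning points to their nearest center and recentering.
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: abs(p - centers[c]))
            clusters[nearest].append(p)
        centers = [sum(members) / len(members) if members else centers[i]
                   for i, members in enumerate(clusters)]
    # Label each point with the index of its final nearest center.
    labels = [min(range(len(centers)), key=lambda c: abs(p - centers[c]))
              for p in points]
    return labels, centers
```

On well-separated data this converges in one or two passes; the interesting (and hard) cases are exactly the ones where the optimization landscape has saddle points between competing assignments.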

A key requirement for the final result of the algorithm to be correct is that the two slopes end at low points of equal depth, which represent optimal solutions. Often two solutions will appear different at first glance, but will actually represent the same set of clusters, different only because the labels on clusters were switched. Ge said one of the hardest parts of his process was designing functions that have this property.
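The label-switching symmetry described above can be checked mechanically: two labelings represent the same clustering exactly when they group the same point indices together, regardless of which name each group gets. A small helper (hypothetical, for illustration):

```python
def same_partition(labels_a, labels_b):
    # Two labelings are equivalent if they induce the same grouping
    # of point indices, ignoring the names of the groups.
    def groups(labels):
        parts = {}
        for i, lab in enumerate(labels):
            parts.setdefault(lab, set()).add(i)
        return {frozenset(g) for g in parts.values()}
    return groups(labels_a) == groups(labels_b)
```

For example, the labelings [0, 0, 1, 1] and [1, 1, 0, 0] describe the same two clusters with the names swapped, while [0, 1, 0, 1] groups the points differently.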

These results are guaranteed to hold so long as the dataset has not been specifically engineered to break the algorithm. If someone has designed such a dataset, the problem becomes "NP-hard," meaning there is little hope that even the best algorithms can solve it efficiently.

Hopefully we will see more growth in this field, especially results such as this one, which show that the weaknesses of an algorithm can become strengths under different circumstances.


More information: Escaping From Saddle Points—Online Stochastic Gradient for Tensor Decomposition.
