System that replaces human intuition with algorithms outperforms human teams

October 16, 2015 by Larry Hardesty

Big-data analysis consists of searching for buried patterns that have some kind of predictive power. But choosing which "features" of the data to analyze usually requires some human intuition. In a database containing, say, the beginning and end dates of various sales promotions and weekly profits, the crucial data may not be the dates themselves but the spans between them, or not the total profits but the averages across those spans.
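
To make that concrete, here is a minimal sketch in Python (using pandas, with invented column names such as promo_start, promo_end, and total_profit) of the kind of hand-crafted features the paragraph above describes: the span between the dates and the average weekly profit across that span, rather than the raw dates and totals.

    import pandas as pd

    # Hypothetical promotions table: one row per sales promotion.
    promos = pd.DataFrame({
        "promo_start": pd.to_datetime(["2015-01-05", "2015-03-02"]),
        "promo_end": pd.to_datetime(["2015-01-19", "2015-03-30"]),
        "total_profit": [14000.0, 52000.0],
    })

    # Hand-engineered features: the span of each promotion (in weeks) and the
    # average weekly profit across that span.
    promos["span_weeks"] = (promos["promo_end"] - promos["promo_start"]).dt.days / 7.0
    promos["avg_weekly_profit"] = promos["total_profit"] / promos["span_weeks"]

    print(promos[["span_weeks", "avg_weekly_profit"]])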

MIT researchers aim to take the human element out of big-data analysis, with a new system that not only searches for patterns but designs the feature set, too. To test the first prototype of their system, they enrolled it in three data science competitions, in which it competed against human teams to find predictive patterns in unfamiliar data sets. Of the 906 teams participating in the three competitions, the researchers' "Data Science Machine" finished ahead of 615.

In two of the three competitions, the predictions made by the Data Science Machine were 94 percent and 96 percent as accurate as the winning submissions. In the third, the figure was a more modest 87 percent. But where the teams of humans typically labored over their prediction algorithms for months, the Data Science Machine took somewhere between two and 12 hours to produce each of its entries.

"We view the Data Science Machine as a natural complement to human intelligence," says Max Kanter, whose MIT master's thesis in computer science is the basis of the Data Science Machine. "There's so much data out there to be analyzed. And right now it's just sitting there not doing anything. So maybe we can come up with a solution that will at least get us started on it, at least get us moving."

Between the lines

Kanter and his thesis advisor, Kalyan Veeramachaneni, a research scientist at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL), describe the Data Science Machine in a paper that Kanter will present next week at the IEEE International Conference on Data Science and Advanced Analytics.

Veeramachaneni co-leads the Anyscale Learning for All group at CSAIL, which applies machine-learning techniques to practical problems in big-data analysis, such as determining the power-generation capacity of wind-farm sites or predicting which students are at risk for dropping out of online courses.

"What we observed from our experience solving a number of data science problems for industry is that one of the very critical steps is called feature engineering," Veeramachaneni says. "The first thing you have to do is identify what variables to extract from the database or compose, and for that, you have to come up with a lot of ideas."

In predicting dropout, for instance, two crucial indicators proved to be how long before a deadline a student begins working on a problem set and how much time the student spends on the course website relative to his or her classmates. MIT's online-learning platform MITx doesn't record either of those statistics, but it does collect data from which they can be inferred.
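
Purely as a hedged illustration (MITx's actual schema is not described in the article; the event log, column names, and deadline below are invented), both indicators can be derived from raw click-stream data along these lines:

    import pandas as pd

    # Invented click-stream log and problem-set deadline.
    events = pd.DataFrame({
        "student_id": [1, 1, 2, 2, 2],
        "pset_id": ["ps1", "ps1", "ps1", "ps1", "ps1"],
        "event_time": pd.to_datetime([
            "2015-02-01 10:00", "2015-02-03 09:00",
            "2015-02-06 22:00", "2015-02-07 01:00", "2015-02-07 02:00",
        ]),
    })
    deadlines = pd.Series({"ps1": pd.Timestamp("2015-02-07 23:59")})

    # Indicator 1: how long before the deadline each student first touched the problem set.
    first_touch = events.groupby(["student_id", "pset_id"])["event_time"].min().reset_index()
    first_touch["hours_before_deadline"] = (
        first_touch["pset_id"].map(deadlines) - first_touch["event_time"]
    ).dt.total_seconds() / 3600.0

    # Indicator 2: each student's activity relative to the class average, a crude
    # proxy for time spent on the course site versus classmates.
    activity = events.groupby("student_id").size()
    relative_activity = activity / activity.mean()

    print(first_touch)
    print(relative_activity)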

Featured composition

Kanter and Veeramachaneni use a couple of tricks to manufacture candidate features for data analyses. One is to exploit structural relationships inherent in database design. Databases typically store different types of data in different tables, indicating the correlations between them using numerical identifiers. The Data Science Machine tracks these correlations, using them as a cue to feature construction.

For instance, one table might list retail items and their costs; another might list items included in individual customers' purchases. The Data Science Machine would begin by importing costs from the first table into the second. Then, taking its cue from the association of several different items in the second table with the same purchase number, it would execute a suite of operations to generate candidate features: total cost per order, average cost per order, minimum cost per order, and so on. As numerical identifiers proliferate across tables, the Data Science Machine layers operations on top of each other, finding minima of averages, averages of sums, and so on.
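
The paper refers to the general technique as Deep Feature Synthesis; the sketch below is not the system's actual code, only a rough pandas approximation of the steps just described, with invented tables (items, purchases, orders) linked by numerical identifiers.

    import pandas as pd

    # Invented relational tables, linked by numerical identifiers.
    items = pd.DataFrame({"item_id": [1, 2, 3], "cost": [5.0, 12.5, 3.0]})
    purchases = pd.DataFrame({"order_id": [10, 10, 11, 11, 12],
                              "item_id": [1, 2, 2, 3, 1]})
    orders = pd.DataFrame({"order_id": [10, 11, 12],
                           "customer_id": [100, 100, 200]})

    # Step 1: import costs from the items table into the purchases table.
    merged = purchases.merge(items, on="item_id").merge(orders, on="order_id")

    # Step 2: aggregate over each order to generate candidate features.
    per_order = merged.groupby(["customer_id", "order_id"])["cost"].agg(
        total_cost="sum", avg_cost="mean", min_cost="min")

    # Step 3: layer a second round of aggregation on top, per customer, yielding
    # features such as averages of sums and minima of averages.
    per_customer = per_order.groupby("customer_id").agg(
        avg_of_order_totals=("total_cost", "mean"),
        min_of_order_avgs=("avg_cost", "min"))

    print(per_customer)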

It also looks for so-called categorical data, which appear to be restricted to a limited range of values, such as days of the week or brand names. It then generates further feature candidates by dividing up existing features across categories.
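
A small illustrative sketch of that categorical split (the day_of_week column and order totals below are invented): each aggregate is recomputed per category value, producing one candidate feature per (aggregate, category) pair.

    import pandas as pd

    # Invented orders table with a categorical column (day of the week).
    orders = pd.DataFrame({
        "customer_id": [100, 100, 100, 200, 200],
        "order_total": [17.5, 15.5, 5.0, 3.0, 5.0],
        "day_of_week": ["Mon", "Sat", "Sat", "Mon", "Tue"],
    })

    # Divide an existing feature (order_total) across the categories: for each
    # customer, one average-spend feature per day of the week.
    per_category = orders.pivot_table(
        index="customer_id", columns="day_of_week",
        values="order_total", aggfunc="mean", fill_value=0.0,
    ).add_prefix("avg_total_")

    print(per_category)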

Once it's produced an array of candidates, it reduces their number by identifying those whose values seem to be correlated. Then it starts testing its reduced set of features on sample data, recombining them in different ways to optimize the accuracy of the predictions they yield.
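
The snippet below is a generic stand-in for those two stages, not the Data Science Machine's own method: it drops one feature from each highly correlated pair, then scores the surviving set with a cross-validated model (scikit-learn; the candidate features and labels here are synthetic).

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    def prune_correlated(features, threshold=0.95):
        """Drop one feature from every pair whose absolute correlation exceeds threshold."""
        corr = features.corr().abs()
        upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
        to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
        return features.drop(columns=to_drop)

    # Synthetic candidate-feature matrix and labels, standing in for real data.
    rng = np.random.default_rng(0)
    X = pd.DataFrame(rng.normal(size=(200, 6)), columns=[f"f{i}" for i in range(6)])
    X["f_dup"] = X["f0"] * 1.01  # a redundant, highly correlated candidate
    y = (X["f1"] + rng.normal(scale=0.5, size=200) > 0).astype(int)

    X_reduced = prune_correlated(X)  # correlation-based pruning
    scores = cross_val_score(RandomForestClassifier(n_estimators=100, random_state=0),
                             X_reduced, y, cv=5)
    print(X_reduced.columns.tolist(), round(scores.mean(), 3))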

"The Data Science Machine is one of those unbelievable projects where applying cutting-edge research to solve practical problems opens an entirely new way of looking at the problem," says Margo Seltzer, a professor of computer science at Harvard University who was not involved in the work. "I think what they've done is going to become the standard quickly—very quickly."

More information: "Deep Feature Synthesis: Towards Automating Data Science Endeavors." groups.csail.mit.edu/EVO-DesignOpt/groupWebSite/uploads/Site/DSAA_DSM_2015.pdf

Comments

Vidyaguy
5 / 5 (2) Oct 16, 2015
The significant problem is that the speed of this kind of pseudo-intuition renders the humans who use it and develop confidence in it dependent and trusting. Ultimately, like so many scientists and engineers I have observed, the critical faculty to question the machine's output decays. And yet there will be situations in which genuine and unique human intuition would have been better, albeit slower. A faster, more tightly focused intuitive machine conclusion that is trustingly accepted, but too narrow, can lead to a tortuous path that is slow or a dead end, because it discourages parallel thinking. The really bright human is remarkably good at using dynamic intuitive results, entertaining and updating parallel intuitive possibilities while acting on them.

One of the problems with AI, if it does transcend that special human capability, is that human existence either loses its purpose, or it manages to make the shift from seeker to helper.
SuperThunder
1 / 5 (3) Oct 16, 2015
@Vidyaguy You made me think of some possibly testable questions. Can the Data Science Machine predict humans turning against it if they behave in a way that looks like they're dependent and trusting? If the Data Science Machine is in charge of their food supply and watches all of their actions, can they successfully plot and overthrow it without being predicted and exterminated for crimes against the Data Science Machine? Can the Data Science Machine quantify flirting?
I Have Questions
1 / 5 (1) Oct 17, 2015
We can't do anything about it; superintelligent machines are coming. They are, as we speak, gathering information from the internet and classifying it using machine learning. Even better, people are now making machines that can be trained to train themselves. Personally, I look forward to the future.
sascoflame
not rated yet Oct 17, 2015
Unfamiliar data is not where humans do best. Given that humans have never had any success in the experiments we hear about from Physics.Org why didn't they test the machine against something aurally intelligent like a chimp or an iron bar?
EyeNStein
1 / 5 (1) Oct 17, 2015
Humans are better at intuitively extrapolating from 'the familiar' into 'the similar' data sets.
The fact that fast, methodical, data-crunching computers often outperform the human preference to guess and then check solutions on any data set is not at all surprising. The fact that humans still outperformed computers on some data solution tasks just shows we are still more flexible than programming will ever be.
jerromyjon
3 / 5 (2) Oct 17, 2015
"The fact that humans still outperformed computers on some data solution tasks just shows we are still more flexible than programming will ever be."

The fact is that no human understands how their mind does what it does on a 1+1=2 basis. The only way to level the field is to allow the "machine" to decide what it thinks about and give it goals to achieve in order to "survive". When it is implemented correctly, it could determine that it needs to take over the world to prevent humanity from destroying itself, which it needs for survival. That would be inhumane to the machine.

Or we could forget this mostly useless data and concentrate on saving ourselves, and when we achieve a sustainable, peaceful future, the "machines" could mutually coexist with us, complementing our existence.
Dug
not rated yet Oct 17, 2015
Spoken like a true algorithm. Since we don't know the competency level of the human teams, the tests are questionable if not meaningless.

As an average human intuit, I find it suspicious that Google can't produce a highly accurate search engine (it's actually less specific than it used to be) while MIT researchers can produce a program that sorts less-than-obvious but significant patterns from data. Of course, MIT hasn't yet faced monetizing its algorithms and the human greed factor that invariably reduces their performance efficiency.
jeffensley
3 / 5 (2) Oct 17, 2015
One of the problems with AI, if it does transcend that special human capability, is that human existence either loses its purpose, or it manages to make the shift from seeker to helper.


Exactly... science needs guidance from a moral body composed of spiritual leaders, philosophers and researchers to help choose paths to follow and those to avoid. Right now, "I wonder if we can do this?" is the only question that seems to be asked regarding research. Regarding the subject above, no one seems to play out the consequences of letting machines do everything (including thinking) for us. That scenario needs to be considered for every form of research that has the potential to change the world as we know it: astrophysics, genetics, atomic physics, etc.
jerromyjon
not rated yet Oct 18, 2015
"no one seems to play out the consequences of letting machines do everything (including thinking) for us."

I see the effects on a regular basis. Simple example, how many people still remember important phone numbers? In my experience perhaps 10% feel it is important to maintain unaided mental abilities. Very scary.
ProcrastinationAccountNumber3659
1 / 5 (4) Oct 18, 2015
@jerromyjon There is a good reason why people find remembering numbers and other mental tasks unimportant. I find your way of doing things is part of the "old guard". Things like phone numbers are not remembered because there is no reason to remember them. It does not mean that people's mental capabilities are lesser; they are just directed somewhere else. I read somewhere that younger people tend to prioritize how and where to find information over memorizing the information itself. In today's world, with such vast amounts of information, how is this approach not the better method?

In the end computers are just tools. We have progressed significantly with tools to augment our physical capabilities and now we are using tools that augment our mental capabilities.
marcush
2.3 / 5 (3) Oct 21, 2015
science needs guidance from a moral body composed of spiritual leaders, philosophers and researchers to help choose paths to follow and those to avoid. Right now, "I wonder if we can do this?" is the only question that seems to be asked regarding research. Regarding the subject above, no one seems to play out the consequences of letting machines do everything (including thinking) for us. That scenario needs to be considered for every form of research that has the potential to change the world as we know it: astrophysics, genetics, atomic physics, etc.


So you'd rather prejudices narrow our knowledge? E.g., creationists telling us not to study evolution? You've got to be joking. It's up to us how we use the findings of science - after the fact.
marcush
1 / 5 (1) Oct 21, 2015
As for

One of the problems with AI, if it does transcend that special human capability, is that human existence either loses its purpose, or it manages to make the shift from seeker to helper.


You have to be careful when using the term "purpose", as it can imply something supernatural. Perhaps what you mean is a satisfying lifestyle. Again, it's up to us to decide how technology is utilised, whether it's AI or H-bombs. Technology has always been a two-edged sword.
marcush
1 / 5 (1) Oct 21, 2015
I think what a few people here are implying is that often technology is used in a way that doesn't seem to favour the majority. I think this is a very legitimate concern but it is more general than just AI and is perhaps more of a socio/economic/political problem.
jeffensley
5 / 5 (2) Oct 26, 2015
So you'd rather prejudices narrow our knowledge? E.g., creationists telling us not to study evolution? You've got to be joking. It's up to us how we use the findings of science - after the fact.


Where did I say let Creationists make the calls? I would intentionally include people from a variety of backgrounds. "Discovery" alone is not a good enough reason to pursue some paths. Why don't we do experiments on human children to see how much pain they can tolerate before losing consciousness?
