Move over, Turing Test: the Winograd Schema Challenge is in town

August 14, 2014 by Nancy Owano
Eugene Goostman. Credit: Vladimir Veselov and Eugene Demchenko

Isn't there something better than the Turing test to measure computer intelligence? Is the Turing test the best we have to judge a machine's capability to produce behavior that requires human thought? Doubt was expressed by many when it was announced that the program Eugene Goostman had fooled 33 percent of judges into thinking the chatbot was a human after five minutes of questioning. Anders Sandberg, a University of Oxford research fellow, said in The Conversation that "Eugene's success in the Turing test may tell us more about how weak we humans are when it comes to detecting intelligence and agency in conversation than about how smart our machines are." Gary Marcus, a professor of cognitive science at New York University, said that "It turns out that you can do a lot of misdirection, answer sarcastically, and evade the fact that you are a computer. So all it really shows is you can fool humans for a short period of time, about five minutes - not all of the humans, but maybe more than you might've expected - by having these sort of personality twitches."

Now there is a clear replacement effort, called the Winograd Schema Challenge. This is to be a yearly event designed to judge whether a program truly models human intelligence. The deadline is October 1, 2015, and $25,000 will be awarded to a program that passes the test.

Who or what is the "Winograd" in the contest title? According to I Programmer, the test elaborates ideas from Terry Winograd, known for developing an AI-based framework for understanding natural language. This Winograd test was developed by Hector Levesque, a professor of computer science at the University of Toronto, who won the 2013 IJCAI (International Joint Conference on Artificial Intelligence) Award for Research Excellence.

Sponsors are Nuance Communications, a voice and language solutions company, in cooperation with Commonsense Reasoning, a research group focused on commonsense reasoning that will administer and evaluate the Winograd Schema Challenge. Contest details are on their site. "Rather than base the test on the sort of short free-form conversation suggested by the Turing Test," said the site posting, "the Winograd Schema Challenge (WSC) poses a set of multiple-choice questions that have a particular form." Sample questions are provided.
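To make the "particular form" concrete, a Winograd schema is a sentence containing an ambiguous pronoun whose correct referent flips when a single "special" word is swapped, so answering requires commonsense reasoning rather than surface statistics. The sketch below uses Terry Winograd's classic councilmen/demonstrators sentence; the class and field names are illustrative, not taken from the contest's official materials.

```python
# Minimal sketch of a Winograd schema's structure (illustrative names,
# not the contest's official format).
from dataclasses import dataclass

@dataclass
class WinogradSchema:
    sentence: str            # contains an ambiguous pronoun and a special word
    pronoun: str             # the pronoun the system must resolve
    candidates: tuple        # the two possible referents
    answer: str              # correct referent for the original sentence
    alternate_sentence: str  # same sentence with the special word swapped
    alternate_answer: str    # the swap flips the correct referent

schema = WinogradSchema(
    sentence=("The city councilmen refused the demonstrators a permit "
              "because they feared violence."),
    pronoun="they",
    candidates=("the city councilmen", "the demonstrators"),
    answer="the city councilmen",
    alternate_sentence=("The city councilmen refused the demonstrators a "
                        "permit because they advocated violence."),
    alternate_answer="the demonstrators",
)

# The schema is well-formed only if changing the special word
# ("feared" -> "advocated") changes which candidate the pronoun refers to.
assert schema.answer != schema.alternate_answer
assert schema.answer in schema.candidates
assert schema.alternate_answer in schema.candidates
```

Because both readings are grammatical and the two sentences differ by one word, a program cannot resolve the pronoun from word statistics alone; it has to know something about why councilmen or demonstrators might fear or advocate violence.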

Charles Ortiz, research scientist at Nuance, said the benefits of such a challenge "can help guide more systematic research efforts that will, in the process, allow us to realize new systems that push the boundaries of current AI capabilities and lead to smarter personal assistants and intelligent systems."


not rated yet Aug 14, 2014
"Isn't there something better than the Turing test to measure computer intelligence?"
-The Turing test does not measure intelligence. It only passes or fails an agent based on whether a judge decides it is machine or human.

not rated yet Aug 15, 2014
Both of these tests are severely lacking. You can't test to see if the computer's intelligence matches a human unless you give the computer a chance to be exposed to all the stimuli of our world and interactions for a few years, and then see if the computer can do some intelligent, creative reasoning to solve some novel problems in this world (or even better). Otherwise the tests are just measuring the computer's ability to stay within a scripted test in a highly constrained environment, which takes less computing power than an ant's brain.
5 / 5 (2) Aug 15, 2014
Both of these tests are severely lacking. You can't test to see if the computer's intelligence matches a human . . .

The article makes no such claim regarding the test.

Regarding "intelligence", it's hard to define it in a way that can be measured. Computers are already much better at accomplishing many of the things we must otherwise use our minds to do.
not rated yet Aug 15, 2014
"The article makes no such claim regarding the test."

-Seeing if a computer's intelligence matches a human is kinda (not exactly) a goal of the Turing test. So, the article doesn't need to make that claim; it is assumed.

"Computers are already much better at accomplishing many of the things we must otherwise use our minds to do."

-Well, a car is better able to travel than I am, at least as far as endurance goes. Are we going to define intelligence based on comparison to human abilities? There are some people who are much better at many things than I am. Does that mean I'm not intelligent? Because now we're overlapping measurement with incidence; how intelligent versus if intelligent.

"it's hard to define it in a way that can be measured"

-Psychologists and educators do it all the time.

I've said this before elsewhere. The only useful approach to machine intelligence is in evaluating the work it performs and problems it solves.
not rated yet Aug 15, 2014
Also, I should mention that I have a degree in neuroscience and work in information technology, and I laugh every time I hear about some variation of a Turing test that will check if computers are approaching human intelligence. The brain is a highly parallel pattern-matching machine that uses probabilistic statistics to deal with an extremely complex world, and the mystery is how creative, adaptable, intelligent thoughts are formulated to mesh with all of that info. Comparing these carefully pre-programmed, scripted tests to the brain is apples and oranges, and I can't believe that even a few scientists think it's anything more than 1% progress toward "testing" machine "intelligence" in the human context.
1 / 5 (1) Aug 16, 2014
I don't understand the attitude here. The point is, the Turing Test stood unchallenged for decades; having a computer pass it is a big deal, but it's just a signpost on the road, it's not the creation of artificial intelligence. You all sound like people complaining about the speed limit without remembering that it wasn't all that long ago that people had to travel on horseback.
not rated yet Aug 16, 2014
The Turing test is unchallenged? I don't think so. Turing came up with many genius breakthroughs, but the Turing test is not a meaningful test of what it was intended to measure.
