Can artificial intelligence tell a polar bear from a can opener?

Teapot with golf ball pattern. Credit: Nicholas Baker/PLOS Computational Biology

How smart is the form of artificial intelligence known as deep learning, and how closely do these computer networks mimic the human brain? They have improved greatly in recent years, but still have a long way to go, a team of UCLA cognitive psychologists reports in the journal PLOS Computational Biology.

Supporters have expressed enthusiasm for the use of these networks to do many individual tasks, and even jobs, traditionally performed by people. However, the results of the study's five experiments showed that it's easy to fool the networks, and that their method of identifying objects differs substantially from human vision.

"The machines have severe limitations that we need to understand," said Philip Kellman, a UCLA distinguished professor of psychology and a senior author of the study. "We're saying, 'Wait, not so fast.'"

Machine vision, he said, has drawbacks. In the first experiment, the psychologists showed one of the best deep learning networks, called VGG-19, color images of animals and objects. The images had been altered. For example, the surface of a golf ball was displayed on a teapot; zebra stripes were placed on a camel; and the pattern of a blue and red argyle sock was shown on an elephant. VGG-19 ranked its top choices and chose the correct item as its first choice for only five of 40 objects.
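For readers who want to try this themselves, below is a minimal sketch (not the study's code) of how one might ask a pretrained VGG-19 for its ranked guesses about an image, using PyTorch and torchvision (version 0.13 or later assumed); the image file name is a hypothetical placeholder.

```python
# Minimal sketch, not the study's code: query a pretrained VGG-19 for its
# top-5 ImageNet labels. The image file name is a hypothetical placeholder.
import torch
from PIL import Image
from torchvision import transforms
from torchvision.models import vgg19, VGG19_Weights

weights = VGG19_Weights.IMAGENET1K_V1
model = vgg19(weights=weights).eval()

# Standard ImageNet preprocessing for VGG-style networks.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("golfball_textured_teapot.jpg").convert("RGB")
batch = preprocess(img).unsqueeze(0)  # shape: (1, 3, 224, 224)

with torch.no_grad():
    probs = torch.softmax(model(batch), dim=1)[0]  # 1,000 class probabilities

top5 = torch.topk(probs, 5)
for p, idx in zip(top5.values, top5.indices):
    print(f"{weights.meta['categories'][idx]}: {p:.2%}")
```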

"We can fool these artificial systems pretty easily," said co-author Hongjing Lu, a UCLA professor of psychology. "Their learning mechanisms are much less sophisticated than the human mind."

VGG-19 thought there was a 0 percent chance that the elephant was an elephant and only a 0.41 percent chance the teapot was a teapot. Its first choice for the teapot was a golf ball, which shows that the artificial intelligence looks at the texture of an object more so than its shape, said lead author Nicholas Baker, a UCLA psychology graduate student.

"It's absolutely reasonable for the golf ball to come up, but alarming that the teapot doesn't come up anywhere among the choices," Kellman said. "It's not picking up shape."

Humans identify objects primarily from their shape, Kellman said. The researchers suspected the computer networks were using a different method.

Black-outlined white hammer. Credit: PLOS Computational Biology

In the second experiment, the psychologists showed images of glass figurines to VGG-19 and to a second deep learning network, called AlexNet. VGG-19 performed better on all the experiments in which both networks were tested. Both networks were trained to recognize objects using an image database called ImageNet.

However, both networks did poorly: neither VGG-19 nor AlexNet correctly identified the figurines as its first choice, and both ranked an elephant figurine with almost a 0 percent chance of being an elephant. Most of the top responses puzzled the researchers, such as VGG-19's choice of "website" for "goose" and "can opener" for "polar bear." On average, AlexNet ranked the correct answer 328th out of 1,000 choices.
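To make "ranked the correct answer 328th" concrete, here is an illustrative helper (not the paper's code) that computes the rank of the true label from the 1,000 class probabilities produced by a sketch like the one above; "teapot" is used as the example because it is a real ImageNet category.

```python
import torch

def rank_of_correct(probs: torch.Tensor, labels: list[str], correct: str) -> int:
    """Return the 1-based rank of `correct` among all classes,
    with the network's best guess ranked first."""
    order = torch.argsort(probs, descending=True).tolist()
    return order.index(labels.index(correct)) + 1

# e.g., with `probs` and `weights` from the earlier VGG-19 sketch:
# rank_of_correct(probs, weights.meta["categories"], "teapot")
```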

"The machines make very different errors from humans," Lu said.

In the third experiment, the researchers showed both VGG-19 and AlexNet 40 drawings of objects outlined in black, with white interiors. These first three experiments were meant to discover whether the networks identified objects by their shape.

The networks again did a poor job of identifying such items as a butterfly, an airplane and a banana.

The goal of the experiments was not to trick the networks, but to learn whether they identify objects in a similar way to humans, or in a different manner, said co-author Gennady Erlikhman, a UCLA postdoctoral scholar in psychology.

In the fourth experiment, the researchers showed both networks 40 images, this time in solid black.

With the black images, the networks did better, producing the correct object label among their top five choices for about 50 percent of the objects. VGG-19, for example, ranked an abacus with a 99.99 percent chance of being an abacus and a cannon with a 61 percent chance of being a cannon. In contrast, VGG-19 and AlexNet each thought there was less than a 1 percent chance that a white hammer (outlined in black) was a hammer.

Black abacus. Credit: PLOS Computational Biology/Sweet Clip

The researchers think the networks did much better with the black objects because the items lack what Kellman calls "internal contours"—edges that confuse the machines.

In the fifth experiment, the researchers scrambled the images to make them more difficult to recognize, while preserving pieces of the objects. They selected six images that VGG-19 had originally gotten right and scrambled them. Humans found the scrambled versions hard to recognize; VGG-19 got five of the six right and was close on the sixth.
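The paper describes its own scrambling procedure; purely as an illustration of the idea, here is one way to scramble an image while preserving local pieces: slice it into a grid of tiles and shuffle them, which destroys the global shape but keeps local texture and fragments intact.

```python
# Illustrative only; not the paper's scrambling method.
import numpy as np
from PIL import Image

def scramble(img: Image.Image, grid: int = 4, seed: int = 0) -> Image.Image:
    """Cut the image into grid x grid tiles and shuffle them."""
    arr = np.asarray(img)
    h = arr.shape[0] // grid * grid  # crop so tiles divide evenly
    w = arr.shape[1] // grid * grid
    arr = arr[:h, :w]
    th, tw = h // grid, w // grid
    tiles = [arr[r * th:(r + 1) * th, c * tw:(c + 1) * tw]
             for r in range(grid) for c in range(grid)]
    np.random.default_rng(seed).shuffle(tiles)
    rows = [np.concatenate(tiles[r * grid:(r + 1) * grid], axis=1)
            for r in range(grid)]
    return Image.fromarray(np.concatenate(rows, axis=0))
```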

As part of the fifth experiment, the researchers also tested ten UCLA undergraduate students. The students were shown black silhouettes of objects, some scrambled to be difficult to recognize and some unscrambled, for either just one second or for as long as they wanted. With one second to look, the students correctly identified 92 percent of the unscrambled objects and 23 percent of the scrambled ones. With unlimited viewing time, they correctly identified 97 percent of the unscrambled objects and 37 percent of the scrambled objects.

What conclusions do the psychologists draw?

Humans see the entire object, while the artificial intelligence networks identify fragments of the object.

"This study shows these systems get the right answer in the images they were trained on without considering shape," Kellman said. "For humans, overall shape is primary for recognition, and identifying images by overall shape doesn't seem to be in these deep learning systems at all."

There are dozens of deep learning machines, and the researchers think their findings apply broadly to these devices.


More information: Nicholas Baker et al., "Deep convolutional networks do not classify based on global object shape," PLOS Computational Biology (2018). DOI: 10.1371/journal.pcbi.1006613



User comments

Jan 07, 2019
"It's not picking up shape."

How can it? There's nothing in a teapot that makes it a teapot - it's the intent or potential use of the object, not the shape or appearance, that makes it what it is. There are thousands of different-looking teapots, and other objects that serve the same purpose.

The problem is that they're trying to apply a kind of "platonic idealism" to objects: the idea that there is a "teapot-ness" that can be identified and reduced to some set of abstract features, rather than a million different teapots that may or may not be teapots depending on who you ask.

It's the same problem as asking when a stool is a seat and when it's a small coffee table. Insisting on either alternative and holding on to it is the wrong answer. You need context, which the machine algorithm lacks entirely. It can't use the shape of the object to identify it, because the shape alone is meaningless.

Jan 08, 2019
The result is not surprising, as CNNs (convolutional neural networks), which are commonly used in image recognition tasks, work with a number of small convolution kernels (hence the name). These are small patches, a couple of pixels across, that each react to a specific orientation or texture.

In other words: CNNs do not look for "overall shape" but for characteristic shape patches, and then, on a coarser level, for the spatial relations among recognized patches.
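For instance, here's a toy illustration (mine, not from the paper): a 3x3 Sobel-style kernel convolved over an image responds wherever a vertical edge appears locally, with no notion of what overall shape those edges belong to.

```python
import numpy as np
from scipy.signal import convolve2d

# 3x3 Sobel-style kernel: reacts to vertical luminance edges.
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

# Toy image: dark left half, bright right half -> one vertical edge.
img = np.zeros((6, 6))
img[:, 3:] = 1.0

response = convolve2d(img, kernel, mode="valid")
print(np.abs(response))  # large magnitudes along the edge, zeros in flat regions
```

A full CNN stacks thousands of such learned kernels, so its evidence for "teapot" is an aggregate of local patch responses, not a global outline.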

That said, I find it a bit weird that the researchers claim the "correct" classification of the above image is a "teapot with golf ball texture." One could just as easily argue that it is a "teapot-shaped golf ball."

Just because the human vision system/brain says an image is X doesn't mean it is X. Our vision system can be fooled quite easily too (e.g., M.C. Escher images or any number of optical illusions).

Jan 08, 2019
The amazing thing about a dancing bear is not how well the bear dances.

What's really being tested here are the methods by which we teach expert systems.

Jan 08, 2019
NNs learn by example. If we don't feed them outlines as an example (but just images of hammers) then there's no reason why they should learn that an outline signifies a particular type of object.

In the end the author is faulting the neural network for CORRECTLY reacting to: "ceci n'est pas une pipe" ("this is not a pipe").

Jan 08, 2019
Why I love physorg: nowhere else could I see someone refer to Rene Magritte and expect folks to get it.

Jan 08, 2019
A favorite: The Fair Captive: https://www.renem...tive.jsp

I particularly like the flaming tuba.

Jan 08, 2019
When AI is perfected and that question in the headline is posed to It, I fully expect It to respond: "Why the fuck should I care?"

Jan 08, 2019
In the end the author is faulting the neural network for CORRECTLY reacting to: "ceci n'est pas une pipe"

Not so fast.

The Short Stick

Shuzan held out a short stick and said, "If you call this a short stick, you oppose its reality. If you do not call it a short stick, you ignore the fact. Now what do you wish to call this?"
