Context is everything ... well, something, anyway

March 5, 2010 by Larry Hardesty, Massachusetts Institute of Technology
Standard object recognition software mistakenly detects a sofa, a cabinet, and a mirror in a street scene (right), but a new MIT system corrects those errors (left) using statistical information about how often certain types of objects occur together. Image: Myung 'Jin' Choi

Today, computers can't reliably identify the objects in digital images. But if they could, they could comb through hours of video for the two or three minutes that a viewer might be interested in, or perform web searches where the search term was an image, not a sequence of words. And of course, object recognition is a prerequisite for the kind of home assistance robot that could execute an order like "Bring me the stapler." Now, MIT researchers have found a way to improve object recognition systems by using information about context. If the MIT system thinks it’s identified a chair, for instance, it becomes more confident that the rectangular thing nearby is a table.

A typical object recognition system will scan an image for groups of pixels that differ from those around them; those pixels could define an edge, a corner, or some other feature of an object. Usually, the system has been trained on a set of sample images, which teaches it how to correlate feature patterns with particular objects.

Some researchers have tried to use context information to refine those identifications. But according to Myung “Jin” Choi, a grad student in MIT’s Laboratory for Information and Decision Systems and one of the leaders of the new project, those researchers were generally working with a standard training set that included examples of only about 20 different types of object. In that case, it was fairly straightforward to specify how frequently each object co-occurred with every other object in the set.

Upping the ante

A system that could recognize only 20 different objects, however, wouldn’t be very useful. And with a large number of objects, it becomes computationally impractical to consider the frequency of all possible two-object combinations. In work to be presented at the IEEE Conference on Computer Vision and Pattern Recognition this summer, Choi and her colleagues — including graduate student Joseph Lim and Professors Antonio Torralba and Alan Willsky — describe a different approach. Working with a training set that included more than 4,000 images and 107 different types of objects, they created algorithms that pored through the images and automatically constructed a hierarchical map of the object categories — kind of like the organizational chart for a large company, which shows who reports to whom. In the map, each object is connected to at most one object above it in the hierarchy (everyone in the organization reports to only one person), drastically reducing the number of connections that the system has to consider. The connection between any two objects is given a weight that indicates how often the objects appear together in the training images. The map also encodes information about the typical relative locations of two connected objects: buildings generally appear above roads, for instance, not below them.
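The article doesn’t spell out how the hierarchical map is learned, but the basic idea — count co-occurrences, then link each category to at most one parent so the map stays tree-shaped — can be sketched roughly as follows. This is a minimal illustration, not the researchers’ actual algorithm: the label names, the greedy maximum-weight tree growth, and the omission of relative-location information are all simplifying assumptions.

```python
from collections import defaultdict

def cooccurrence_weights(images):
    """Count how often each pair of labels appears together in one image.
    `images` is a list of label lists, one per training image."""
    counts = defaultdict(int)
    for labels in images:
        labels = sorted(set(labels))
        for i in range(len(labels)):
            for j in range(i + 1, len(labels)):
                counts[(labels[i], labels[j])] += 1
    return counts

def build_tree(labels, weights):
    """Greedily grow a maximum-weight spanning tree over the labels,
    so each label has at most one parent -- like the article's
    organizational chart. The first label acts as the root."""
    in_tree = {labels[0]}
    parent = {labels[0]: None}
    while len(in_tree) < len(labels):
        best = None
        for u in in_tree:
            for v in labels:
                if v in in_tree:
                    continue
                w = weights.get((min(u, v), max(u, v)), 0)
                if best is None or w > best[0]:
                    best = (w, u, v)
        _, u, v = best
        parent[v] = u          # attach v below its strongest companion
        in_tree.add(v)
    return parent
```

Because every label gets exactly one parent edge, a map over N categories has only N − 1 connections to store and consult, instead of the roughly N²/2 pairwise frequencies a full co-occurrence table would require.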

When the system analyzes a new image, it uses standard object recognition algorithms to generate a list of candidate objects, together with each object’s “confidence score” — a statistical measure of how likely the object is to have been correctly identified. Then it revises those scores on the basis of the information encoded in the contextual map.
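That revision step can be pictured as blending each detector score with evidence from the object’s neighbor in the tree. The scheme below is a hypothetical simplification — the mixing rule, the `compat` weights, and the even 50/50 blend are illustrative assumptions, not the MIT system’s actual statistical model.

```python
def rescore(scores, parent, compat):
    """Revise detector confidence scores using tree context.
    `scores` maps label -> raw confidence in [0, 1];
    `parent` maps each label to its tree parent (or None for the root);
    `compat[(parent_label, child_label)]` is a co-occurrence weight in [0, 1].
    A confidently detected parent boosts its children."""
    revised = dict(scores)
    for child, par in parent.items():
        if par is None or par not in scores or child not in scores:
            continue
        boost = compat.get((par, child), 0.0) * scores[par]
        # blend the raw detector score with the contextual evidence
        revised[child] = 0.5 * scores[child] + 0.5 * boost
    return revised
```

Under this toy rule, a tentatively detected table (score 0.4) next to a confidently detected chair (score 0.9, strong co-occurrence) would be revised upward — mirroring the article’s chair-and-table example.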

In experiments, Choi compared the performance of the bare object recognition algorithms with their performance when augmented by the contextual map. In both cases, she considered the three objects per image with the highest confidence scores. The bare algorithms correctly identified all three objects roughly 14 percent of the time; with the addition of the contextual map, the success rate jumped to about 25 percent.

Long row to hoe

Of course, that means that the system still failed to correctly identify three objects per image about 75 percent of the time, which shows just how difficult the problem of object recognition remains. “Context really is essential,” says Serge Belongie, an associate professor of computer science at the University of California, San Diego who has worked on both object recognition in general and context-based object recognition in particular. “It deserves a proper treatment, and Jin is doing that.” But Belongie cautions that context awareness will never be more than an augmentation of an underlying system that recognizes objects from visual features. “We absolutely cannot afford to take our eye off the ball of the component recognition systems that need to feed these context engines,” he says. And, he adds, to be useful, object recognition systems will need to be much more precise than today’s prototypes are. “Imagine that you take a picture of a wild mushroom while you’re hiking,” Belongie says, “and then you send it to the system to find out what it is. And it says, ‘Mushroom!’ You’re like, Thanks. That’s really useful. I knew that part.”

Nonetheless, Choi is continuing to improve her contextual-map system, against the day when the underlying algorithms are more reliable. The next version of the system, she says, will add entries to the map that, in effect, represent higher-level scene descriptions. Street scenes, for instance, may frequently feature sky, buildings and roads, while building interiors may frequently feature floors, walls and windows. The system won’t need to explicitly label these additional map entries, however; it will simply register them as foci around which certain types of objects regularly cluster. She’s confident that this modification will make the benefits of context awareness even more pronounced.
