Algorithm for identifying object boundaries in digital images 50,000 times more efficient than predecessor

May 31, 2011 by Larry Hardesty
Courtesy of Jason Chang

Determining the boundaries of objects is one of the central problems in computer vision. It's something humans do with ease: We glance out the window and immediately see cars as distinct from the sidewalk and street and the people walking by, or lampposts as distinct from the facades of the buildings behind them. But duplicating that facility in silicon has proven remarkably difficult.

One of the best ways for a computer to determine is to make lots of guesses and compare them; the boundaries that most of the guesses agree on are likeliest to be accurately drawn. Until now, that process has been monstrously time consuming. But Jason Chang, a graduate student in the Department of and Computer Science, and John Fisher, a principal research scientist at MIT's and Artificial Intelligence Lab (CSAIL), have figured out how to make it at least 50,000 times more efficient. Their findings could help improve systems for , for tracking and for 3-D , among others.

One reason that boundary determination — or as it's more commonly known, image segmentation — is such a hard problem is that there's no one right answer. Ask 10 people to trace the boundaries of objects in a digital image, and you'll likely get 10 different responses. "We want an algorithm that's able to segment images like humans do," Chang says. "But because humans segment images differently, we shouldn't come up with one segmentation. We should come up with a lot of different segmentations that kind of represent what humans would also segment."

Populating the field

To generate its set of candidate segmentations, Chang and Fisher's algorithm strikes different balances between two measures of segmentation quality. One measure is the difference between the parts of the image on opposite sides of each boundary. The most obvious way to gauge difference is by color value: A segmentation with blue pixels on one side of the boundary and red pixels on the other would be better than one that featured slightly different proportions of the same 30 shades of blue on both sides. But researchers have devised other, more subtle measures of difference, and Chang and Fisher's algorithm can use any of them.

Courtesy of Jason Chang

The other measure of a segmentation's quality is its simplicity. If a computer is trying to segment a street scene that features a car, for instance, you probably don't want it to draw boundaries around every separate gleam of different-colored light on the car's hood. Simplicity and difference in appearance tend to be competing measures: It's easy to maximize color difference, for instance, if you draw a boundary around every pixel that's different from its neighbors, but such a segmentation would be ridiculously complex.

Chang and Fisher's algorithm assigns each segmentation a total score based on both simplicity and difference in appearance. But different segmentations could have roughly equivalent total scores: A segmentation that's a little too complex but has superb color difference, for instance, could have the same score as a segmentation that's pretty good on both measures. Chang and Fisher's algorithm is designed to find candidates with very high total scores. That ensures that none of the candidates will be outrageously bad, but it also makes the computation that much more complicated.

Other researchers have adopted the same general approach, but to generate their candidates, they adapted algorithms originally designed to find the one segmentation with the highest total score. Chang and Fisher realized that, because they were considering so many suboptimal possibilities anyway, they could use a less precise algorithm that runs much more efficiently. Although it's not essential to their approach that they find the highest-scoring segmentation, it's still likely that the many candidates they produce will include a few that are very close to it.

"There are a lot of competing methodologies out there, so it's hard for me to say that this is going to revolutionize segmentation," says Anthony Yezzi, a professor of electrical and computer engineering at the Georgia Institute of Technology. But, Yezzi says, the way in which Chang and Fisher's algorithm represents images is "an interesting new vehicle that I think could get a lot of mileage, even beyond ." The same technique, Yezzi says, could be applied to problems of object tracking — whether it's the motion of an object in successive frames of video or changes in a tumor's size over time — and pattern matching, where the idea is to recognize the similarity of objects depicted from slightly different angles or under different lighting conditions.

This story is republished courtesy of MIT News (, a popular site that covers news about MIT research, innovation and teaching.

Explore further: Avatars make the Internet sign to deaf people

Related Stories

Researchers Study Gene Regulation In Insects

Apr 27, 2006

Susan Brown, an associate professor of biology at Kansas State University, is interested in how evolution generates so much diversity in insects shapes and forms.

Recommended for you

Avatars make the Internet sign to deaf people

Aug 29, 2014

It is challenging for deaf people to learn a sound-based language, since they are physically not able to hear those sounds. Hence, most of them struggle with written language as well as with text reading ...

Chameleon: Cloud computing for computer science

Aug 26, 2014

Cloud computing has changed the way we work, the way we communicate online, even the way we relax at night with a movie. But even as "the cloud" starts to cross over into popular parlance, the full potential ...

User comments : 10

Adjust slider to filter visible comments by rank

Display comments: newest first

4.7 / 5 (3) May 31, 2011
The fusion of different algorithms is always an interesting concept (I did my own dissertation (mainly) on a novel segmentation algorithm for hard to detect boundaries. In itself a fusion of one algortithm from image segmentation and one from biomechanic modeling)

One of the quality criteria for a new segmentation approach is always how hard it is to parametrize the algorithm for best results. The above looks like it could do away with some parameters from the individual algorithms at the cost of introducing one new parameter for each (the weight the algorithm gets for the final score).

However I doubt that they get an efficiency of 50000 times (by what standard?). Maybe in extrem cases that make maximum use of the novel approach - but certainly not as a general measure.
not rated yet May 31, 2011
O nostalgia. I remember working with Canny exclusively back when I was creating software for use in automated vehicles.
4.3 / 5 (3) May 31, 2011
Yeah, humans do it better, but we only look at what we understand is an object. Why not make the software impress upon the image what an object is most like, that way if there is an abnormality like a bird sitting on a branch, the software will separate the two. Humans do it like that, so why compare with out using the technique that people do? It should be a breeze.
4.3 / 5 (3) May 31, 2011
The 'unbreeze' part is to tell the computer what a bird is and what a branch is. That would mean giving it a sense of meaning to what is on the picture. I.e. you are trying to make your computer conscious of what it sees. That's not a breeze. That would be true artificial intelligence.
not rated yet May 31, 2011
I have to agree with antialias. Big O notation rates efficiency of algorithms. 50,000 seems a little extreme to me too. Just like simple sorting algorithms some of the most efficient ones are only efficient most of the time.
2 / 5 (1) May 31, 2011
I'm far to uninformed to make an intelligent statement on how to go about computer vision. However I can't shake the feeling that it shouldn't be this difficult. We just need a language for describing objects and a way to expect what objects to find depending on the context... Well, at any rate I think this problem deserves way more funding, way more.
1 / 5 (1) May 31, 2011
(A factor of 50,000 is only possible if the previous algorithm sucks.)

Unless the computer is familiar with the objects (or kinds of object) that may be present in the picture, object boundary recognition will be shaky at best. Actually, developing a reliable algorithm for finding the boundaries in arbitrary pictures, without resorting to databases of objects or a form gallery, should be provable to be impossible.

Hence, an object database is needed. And the more specific the assortment of pictures to analyze, the better and faster the software, mainly because of the smaller database.

Face recognition, spotting road signs, recognizing land usage type areas, the orientation of 3D objects on the product line, these all are examples that would be vastly simpler with a small database.

Optimization then becomes mostly charting the area between specific database objects and general notions about object form properties.
1 / 5 (2) May 31, 2011
I know humans can do this very efficiently compared to machines, but we practice it every day and are aided by 3D data to help us improve! Give a machine 3D data and this becomes simpler in theory (but maybe harder computationally?). I don't see pure segmentation, with no additional resources like an object database or training, as a worthwhile goal.

not rated yet May 31, 2011
Trying to read the paper now to see if it is useful for what I am trying to accomplish at work. Why persons attempt to complicate an already complicated paper, with the generous use of a Thesaurus, baffles me.
not rated yet Jun 25, 2011
Humans also have the advantage of doing feature extraction with depth information by both stereoscopic vision and slight eye and head movements. We also note differential motion of objects in the visual scene, which of course directly highlights the boundaries of those objects relative to background. There's a lot more information in that data stream than what's found in a static 2D image.