Dhruv Batra seeks to remove ambiguity in computer visual recognition systems

August 11th, 2014
When Dhruv Batra of the Virginia Tech College of Engineering travels to Zurich in September for the 2014 European Conference on Computer Vision, he will do so as a rising star in the growing field of vision and pattern recognition in computers.

The assistant professor with Virginia Tech's Bradley Department of Electrical and Computer Engineering co-led a tutorial in the research field at another industry conference in Ohio this past June. On his way to Zurich, Batra will give talks on the same subject (creating software programs that help computers "see" and understand photographs just as humans can) at software giant Microsoft's research lab at Cambridge University and then at a separate event at Oxford University, both in the United Kingdom.

The travel comes on the heels of Batra's spring acceptance of three major federal research grants worth a combined total of more than $1 million: a National Science Foundation CAREER Award, a U.S. Army Research Office Young Investigators Award, and a U.S. Office of Naval Research grant.

The awards, valued at $500,000 over five years for the CAREER Award, $150,000 over three years from the Army, and $360,000 over three years from the Navy, all focus on machine learning and computer vision: creating algorithms and techniques that will teach computers to understand photographic images better and more quickly.

There are software programs that can perform facial recognition, and they are used by scores of law enforcement and security offices, but finding a face in the first place can be tricky for computers. A patch of a photographic image may look like a face to a computer vision module, but it may simply be an incidental arrangement of tree branches and shadows, said Batra.

In other words, computers may be "hallucinating" faces floating in thin air. Batra wants to halt such errors with a new visual system that jointly reasons about multiple plausible hypotheses from different vision modules such as 3-D scene layout, object layout, and pose.
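
As a rough illustration of what reasoning jointly over hypotheses from several modules could look like, here is a minimal Python sketch. It is not Batra's method: the two modules, their labels, the confidence scores, and the compatibility table are all hypothetical values invented for this example. Each module proposes a few scored guesses, and the sketch picks the combination with the best total score instead of trusting each module's top guess on its own.

```python
from itertools import product

# Hypothetical hypotheses from two vision modules: (label, confidence).
face_hypotheses = [("face", 0.6), ("no_face", 0.4)]
scene_hypotheses = [("tree_canopy", 0.7), ("street_level", 0.3)]

# Hypothetical compatibility scores: how well two labels fit together.
# A face floating in a tree canopy is penalized as implausible.
compatibility = {
    ("face", "tree_canopy"): -0.5,
    ("face", "street_level"): 0.2,
    ("no_face", "tree_canopy"): 0.1,
    ("no_face", "street_level"): 0.0,
}


def best_independent(hypotheses):
    """Return the label a single module would choose on its own."""
    label, _ = max(hypotheses, key=lambda h: h[1])
    return label


def best_joint_hypothesis(faces, scenes, compat):
    """Return the (face, scene) label pair with the highest combined score."""
    def joint_score(pair):
        (face_label, face_score), (scene_label, scene_score) = pair
        return face_score + scene_score + compat[(face_label, scene_label)]

    (face_label, _), (scene_label, _) = max(product(faces, scenes), key=joint_score)
    return face_label, scene_label


independent = (best_independent(face_hypotheses), best_independent(scene_hypotheses))
joint = best_joint_hypothesis(face_hypotheses, scene_hypotheses, compatibility)
print("independent:", independent)
print("joint:", joint)
```

With these made-up numbers, the face module alone would report a face, while the joint choice drops it because a face does not fit the tree-canopy scene. That is the kind of "hallucinated" face the approach described above is meant to catch.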

"When we see an image, we see things that a computer won't see. We see people, action, and the environment, the layout of space, and what is in front and behind. We interpret right away emotion, action, and place, the city or the rural country," said Batra. "Computers cannot do that, it's just a 2-D image, flat. Computers are not intuitive. We relate. Computers do not do so well with ambiguity."

Batra said machine perception systems today are often accurate only in a narrow regime, for instance recognizing humans in images only when they are standing upright with their arms at their sides. A person can be mistaken for a tree, or a person hunched over a bicycle traveling down a street may be overlooked in favor of a nearby street lamp.

Batra said improved recognition systems for computers can be used in a variety of settings outside of law enforcement, including self-driving cars and helping emergency rescue personnel look for people who may be lost in a rural area or trapped in the rubble of a disaster site inside a city.

Once computers better understand the photographs or renderings they are looking at, they will be able to form multiple hypotheses about the images they are seeing and interpret them through visual cues. In other words, computers will be able to tell users whether a person in the image is riding a bicycle along a street or walking along that same street, or whether the person is riding or walking along a beach or in a rural forest.

Emotional interpretation is also possible, with computers able to recognize whether or not a person is crying.

Once the computers form their hypotheses about the image, they can alert the end user for feedback. Batra said the same technology developed for understanding images can be used in self-driving cars to distinguish pedestrians entering the street from nearby objects.

It also could be used in fields outside of Batra's research area, such as voice applications that differentiate accents in mobile software such as Apple's Siri, which can understand different languages but has trouble with variations in pronunciation.

As part of his work for the awards, Batra is building a high-end compute cluster with 500 cores and 4 terabytes of RAM, including servers equipped with graphics processing units (GPUs), the latter alone worth $40,000. Silicon Valley-based technology firm NVIDIA is supporting Batra's work through an equipment donation of eight Tesla K40 GPUs.

"This is the period of building the computing infrastructure to support development and execution of the ideas we proposed," said Batra. "The cluster I am building will have more computer power than all machines in the department put together. Aiming to mimic the human brain's capabilities is no easy feat, but the future is exciting."

Provided by Virginia Tech

This Phys.org Science News Wire page contains a press release issued by an organization mentioned above and is provided to you “as is” with little or no review from Phys.Org staff.
