(Phys.org)—How do we know if we're looking at the three-dimensional world or at a kind of trompe l'oeil image painted on the inside of a huge glass sphere? More to the point, how would a robot know?
Blessed with brains and the power of biological computation, humans can compute the most likely explanation for what we see. Our neural networks turn the fizz of photons hitting a curved screen into perception.
That's awfully difficult to translate into code, says David Cox, who holds a joint appointment as Assistant Professor of Molecular and Cellular Biology and of Computer Science at Harvard.
"Vision is the process of figuring out what's out there in a 3D world, from a set of 2D images cast onto our retinas," Cox explains. "It's actually really hard, and the only reason it seems easy is that we're seeing the world through the solution to the problem."
After all, evolution over hundreds of millions of years has given us a system that works rather well. When we look out at the world, Cox marvels, "we sort of just transparently see."
"That's one of the challenges for computer vision," he says: "Our intuitions about what's easy and what's difficult are usually wrong, because all of our intuitions are coming by way of this biological system. When you sit down and try to write a computer program that does the same thing, you discover just how hard it is."
Working at the Harvard School of Engineering and Applied Sciences, the Department of Molecular and Cellular Biology, and the Center for Brain Science, Cox aims to create artificial systems that can both see and understand what they're looking at. It's a task that requires an in-depth knowledge of neuroscience, but also a fair amount of blue-sky thinking about what might be possible in the realm of artificial object recognition.
Cox thinks of his research as reverse engineering—or, more whimsically, committing corporate espionage on nature.
"There's only one set of systems in the known universe that can do what we're looking for, and they happen to be biological systems," he says, "so the motivation for the reverse engineering side of the work is to get the competing product, as it were, open up the box, and figure out how it works so that we can turn around and build artificial systems that work the same way."
Of course, reproducing a brain is easier said than done. Cox's research group employs massively parallel, high-performance computers to try to reproduce the level of computation that happens within the brain—for example, to study facial recognition techniques, which he's been pursuing with members of Todd Zickler's group at SEAS.
There's a huge difference between recognizing a face in a mugshot and recognizing a face that's embedded in a complicated and cluttered real-world scene.
"If you move an object around your visual field, it's appearing on different parts of your retina, it may be lit differently, and you're seeing it from different angles," Cox explains. "There's effectively an infinite number of ways the same object can appear. At the same time, there are infinitely many valid interpretations for any one image that falls onto your retina."
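The geometry behind this "infinite number of ways" is the pinhole projection: the same 3D point, viewed from different angles, lands at different 2D image coordinates. The toy sketch below is purely illustrative (the function names, angles, and coordinates are assumptions for the example, not anything from Cox's systems); it shows one object corner producing three different image locations as the viewpoint rotates.

```python
# Illustrative sketch of the many-to-one problem described above: one 3D
# point, three viewing angles, three different 2D image positions.
import math

def project(point, yaw, focal=1.0):
    """Rotate a 3D point about the vertical axis by `yaw` radians,
    then apply a pinhole projection onto the image plane."""
    x, y, z = point
    # Rotate about the y-axis (a change of viewpoint).
    xr = x * math.cos(yaw) + z * math.sin(yaw)
    zr = -x * math.sin(yaw) + z * math.cos(yaw)
    # Perspective divide: (x, y, z) -> (f*x/z, f*y/z).
    return (focal * xr / zr, focal * y / zr)

corner = (0.5, 0.5, 4.0)  # one corner of some object, in camera coordinates
views = [project(corner, math.radians(a)) for a in (0, 15, 30)]
for (u, v) in views:
    print(round(u, 3), round(v, 3))  # same point, three distinct image positions
```

Running the projection in reverse — recovering the 3D point and the viewpoint from the 2D coordinates alone — is underdetermined, which is exactly why any single retinal image admits infinitely many valid interpretations.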
Many of the potential payoffs of such an intelligent system sound as if they belong in science fiction. Your laptop could notice whether you look tired, happy, or sad, and interact with you appropriately. Your self-driving car could spot you on the sidewalk and offer you a ride.
"If we had computer vision systems that worked as well as our own visual systems do, there's a much richer set of interactions we could have with machines," Cox says.
It's not just about the applications for Cox, though; the basic science of the retina and neurons is wondrously complex and mysterious, and it's on the bridge between biology and computer science that he finds himself at home.
"I'm a 'have your cake and eat it too' kind of person," Cox says. "I think there's great potential to advance our knowledge of how the brain works, but one of the things that's most exciting for me is this idea that if we really understand that, we should be able to build machines that work the same way.
"With that, there's a huge range of really world-changing applications that we could bring to bear."