Multiview 3-D photography made simple

Jun 19, 2013 by Larry Hardesty

Computational photography is the use of clever light-gathering tricks and sophisticated algorithms to extract more information from the visual environment than traditional cameras can.

The first of computational photography is the so-called light-field camera, which can measure not only the intensity of incoming light but also its angle. That information can be used to produce multiperspective 3-D images, or to refocus a shot even after it's been captured.

Existing light-field cameras, however, trade a good deal of resolution for that extra angle information: A camera with a 20-megapixel sensor, for instance, will yield a refocused image of only one megapixel.

Researchers in the Camera Culture Group at MIT's Media Lab aim to change that with a system they're calling Focii. At this summer's Siggraph—the major computer graphics conference—they'll present a paper demonstrating that Focii can produce a full, 20-megapixel multiperspective 3-D image from a single exposure of a 20-megapixel sensor.

Moreover, while a commercial light-field camera is a $400 piece of hardware, Focii relies only on a small rectangle of plastic film, printed with a unique checkerboard pattern, that can be inserted beneath the lens of an ordinary digital single-lens-reflex camera. Software does the rest.

Gordon Wetzstein, a postdoc at the Media Lab and one of the paper's co-authors, says that the new work complements the Camera Culture Group's ongoing research on glasses-free 3-D displays. "Generating live-action content for these types of displays is very difficult," Wetzstein says. "The future vision would be to have a completely integrated pipeline from live-action shooting to editing to display. We're developing core technologies for that pipeline."

Because a light-field camera captures information about not only the intensity of light rays but also their angle of arrival, the images it produces can be refocused later. Credit: KSHITIJ MARWAH

In 2007, Ramesh Raskar, the NEC Career Development Associate Professor of Media Arts and Sciences and head of the Camera Culture Group, and colleagues at Research showed that a plastic film with a pattern printed on it—a "mask"—and some algorithmic wizardry could produce a light-field camera whose resolution matched that of cameras that used arrays of tiny lenses, the approach adopted in today's commercial devices. "It has taken almost six years now to show that we can actually do significantly better in resolution, not just equal," Raskar says.

Split atoms

Focii represents a light field as a grid of square patches; each patch, in turn, consists of a five-by-five grid of blocks. Each block represents a different perspective on a 121-pixel patch of the light field, so Focii captures 25 perspectives in all. (A conventional 3-D system, such as those used to produce 3-D movies, captures only two perspectives; with multiperspective systems, a change in viewing angle reveals new features of an object, as it does in real life.)

The key to the system is a novel way to represent the grid of patches corresponding to any given light field. In particular, Focii describes each patch as the weighted sum of a number of reference patches—or "atoms"—stored in a dictionary of roughly 5,000 patches. So instead of describing the upper left corner of a light field by specifying the individual values of all 121 pixels in each of 25 blocks, Focii simply describes it as some weighted combination of, say, atoms 796, 23 and 4,231.

According to Kshitij Marwah, a graduate student in the Camera Culture Group and lead author on the new paper, the best way to understand the dictionary of atoms is through the analogy of the Fourier transform, a widely used technique for decomposing a signal into its constituent frequencies.

In fact, visual images can be—and frequently are—interpreted as signals and represented as sums of frequencies. In such cases, the different frequencies can also be represented as atoms in a dictionary. Each atom simply consists of alternating bars of light and dark, with the distance between the bars representing frequency.

The atoms in the Camera Culture Group researchers' dictionary are similar but much more complex. Each atom is itself a five-by-five grid of 121-pixel blocks. Each block consists of arbitrary-seeming combinations of color: The blocks in one atom might all be green in the upper left corner and red in the lower right, with lines at slightly different angles separating the regions of color; the blocks of another atom might all feature slightly different-size blobs of yellow invading a region of blue.

Behind the mask

In building their dictionary of atoms, the Camera Culture Group researchers—Marwah, Wetzstein, Raskar, and Yosuke Bando, a visiting scientist at the Media Lab—had two tools at their disposal that Joseph Fourier, working in the late 18th century, lacked: computers, and lots of real-world examples of light fields.

To build their dictionary, they turned a computer loose to try out lots of different combinations of colored blobs and determine which, empirically, enabled the most efficient representation of actual light fields.

Once the dictionary was built, however, they still had to calculate the optimal design of the mask they use to record information—the patterned that they slip beneath the camera lens. Bando explains the principle behind mask design using, again, the analogy of Fourier transform.

"If a mask has a particular frequency in the vertical direction"—say, a regular pattern of light and dark bars—"you only capture that frequency component of the image," Bando says. "So you have no way of recovering the other frequencies. If you use frequency domain reconstruction, the mask should contain every frequency in a systematic manner."

"Think of atoms as the new frequency," Marwah says. "In our case, we need a mask pattern that can effectively cover as many atoms as possible."

"It's cool work," says Kari Pulli, senior director of research at graphics-chip company Nvidia. "Especially the idea that you can take video at fairly high resolution—that's kind of exciting."

Pulli points out, however, that assembling an image from the information captured by the mask is currently computationally intensive. Moreover, he says, the examples of light fields used to assemble the dictionary may have omitted some types of features common in the real world. "There's still work to be done for this to be actually something that consumers would embrace," Pulli says.

Explore further: Revealing faded frescos

Related Stories

Bell Labs researchers build camera with no lens

Jun 04, 2013

( —A small team of researchers at Bell Labs in New Jersey has built a camera that has no lens. Instead, as they explain in their paper they've uploaded to the preprint server arXiv, the camera ...

3-D, after-the-fact focus image sensors invented

Apr 03, 2012

( -- At the heart of digital photography is a chip called an image sensor that captures a map of the intensity of the light as it comes through the lens and converts it to an electronic signal.

Glasses-free 3-D TV looks nearer (w/ Video)

Jul 12, 2012

As striking as it is, the illusion of depth now routinely offered by 3-D movies is a paltry facsimile of a true three-dimensional visual experience. In the real world, as you move around an object, your perspective ...

An ultrasensitive molybdenum-based image sensor

Jun 12, 2013

A new material has the potential to improve the sensitivity of photographic image sensors by a factor of five. In 2011, an EPFL team led by Andras Kis discovered the amazing semi-conducting properties of ...

Recommended for you

Revealing faded frescos

14 hours ago

Many details of the wall and ceiling frescos in the cloister of Brandenburg Cathedral have faded: Plaster on which horses once "galloped" appears more or less bare. A hyperspectral camera sees images that remain hidden to ...

Device could detect driver drowsiness, make roads safer

16 hours ago

Drowsy driving injures and kills thousands of people in the United States each year. A device being developed by Vigo Technologies Inc., in collaboration with Wichita State University professor Jibo He and ...

New capability takes sensor fabrication to a new level

Jun 30, 2015

Operators must continually monitor conditions in power plants to assure they are operating safely and efficiently. Researchers on the Sensors and Controls Team at DOE's National Energy Technology Laboratory ...

Smart phones spot tired drivers

Jun 30, 2015

An electronic accelerometer of the kind found in most smart phones that let the device determine its orientation and respond to movement, could also be used to save lives on our roads, according to research ...

User comments : 0

Please sign in to add a comment. Registration is free, and takes less than a minute. Read more

Click here to reset your password.
Sign in to get notified via email when new comments are made.