September 13, 2013

Researchers use machine learning to boil down the stories that wearable cameras are telling

Computers will someday soon automatically provide short video digests of a day in your life, your family vacation or an eight-hour police patrol, say computer scientists at The University of Texas at Austin.

The researchers are working to develop tools to help make sense of the vast quantities of video that are going to be produced by wearable camera technology such as Google Glass and Looxcie.

"The amount of what we call 'egocentric' video, which is video that is shot from the perspective of a person who is moving around, is about to explode," said Kristen Grauman, associate professor of computer science in the College of Natural Sciences. "We're going to need better methods for summarizing and sifting through this data."

Grauman and her colleagues developed a technique that uses machine learning to automatically analyze recorded video and assemble a short "story" of the footage that is better than what existing methods produce.

Better video summarization should prove important in helping military commanders manage data coming in from soldiers' cameras, investigators sift through cellphone video in the wake of disasters like the Boston Marathon bombing, and senior citizens use video summaries of their days to compensate for memory loss, said Grauman.

"There's research showing that if people suffering from memory loss wear a camera that takes a snapshot once a minute, and then they review those images at the end of the day, it can help their recall," said Grauman. "That's pretty inspiring. What if instead of images that were selected just because they were a minute apart, they had a video or photographic summary that was selected because it told a good story? Maybe that would help even more. That's the kind of thing we're hoping to achieve."

This 12-frame summary is distilled from an uninterrupted three-hour video taken by a person going about her day while wearing a $200 Looxcie camera. Credit: Courtesy of Kristen Grauman

Grauman, her postdoc Zheng Lu and doctoral student Yong Jae Lee presented their method, which they call "story-driven" video summarization, at the IEEE Conference on Computer Vision and Pattern Recognition this summer.

Their findings are based on video amassed by volunteers wearing commercially available Looxcie cameras, which cost about $200, record five hours of video at a stretch, connect to smartphones and fit in the ear much as a large Bluetooth headset does.

"The task is to take a very long video and automatically condense it into very short video clips, or a series of stills, that convey the essence of the story," said Grauman. "To do that, though, we first have to ask: What makes a good visual story? Our answer is that beyond displaying important persons, objects and scenes, it must also convey how one thing leads to the next."

To tackle the challenge, Grauman and her colleagues took a two-step approach. The first step involved using machine learning techniques to teach their system to "score" the significance of objects in view based on egocentric factors such as how often the objects appeared in the center of the frame, which is a good proxy for where the camera wearer's gaze is, or whether they are touched by the wearer's hands.

"If you give us a region in the video, then we will give back an importance level, based on all those properties that we have extracted and learned how to combine," said Grauman. "So at that point you can select frames that will maximize the importance."
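As a rough illustration of that first step, the sketch below learns how to combine a few egocentric cues into one importance score and then keeps the highest-scoring frames. It is not the researchers' model: the three features, the `Region` type, the toy training labels and the use of a small logistic regression are all assumptions made for the example.

```python
import math
from dataclasses import dataclass

@dataclass
class Region:
    frame: int               # index of the frame the region came from
    center_proximity: float  # 1.0 = center of the frame, 0.0 = at the edge
    hand_contact: float      # 1.0 if the wearer's hand touches the region
    frequency: float         # fraction of nearby frames containing the object

def features(r):
    # Leading 1.0 is the bias term (hypothetical feature layout).
    return [1.0, r.center_proximity, r.hand_contact, r.frequency]

def importance(weights, r):
    # Logistic score in (0, 1): higher means "more important".
    z = sum(w * x for w, x in zip(weights, features(r)))
    return 1.0 / (1.0 + math.exp(-z))

def train(regions, labels, lr=0.5, epochs=2000):
    # Plain stochastic gradient descent on the logistic loss.
    weights = [0.0] * 4
    for _ in range(epochs):
        for r, y in zip(regions, labels):
            err = importance(weights, r) - y
            weights = [w - lr * err * x for w, x in zip(weights, features(r))]
    return weights

def select_frames(weights, regions, k):
    # Score each frame by its best region, keep the top k, return in time order.
    best = {}
    for r in regions:
        best[r.frame] = max(best.get(r.frame, 0.0), importance(weights, r))
    top = sorted(best, key=best.get, reverse=True)[:k]
    return sorted(top)

# Toy training set: 1 = annotators called the region important, 0 = not.
train_regions = [
    Region(0, 0.9, 1.0, 0.8),
    Region(1, 0.8, 0.0, 0.7),
    Region(2, 0.2, 0.0, 0.1),
    Region(3, 0.1, 0.0, 0.3),
]
train_labels = [1, 1, 0, 0]
weights = train(train_regions, train_labels)

# Unseen regions from three later frames.
new_regions = [
    Region(10, 0.95, 1.0, 0.90),
    Region(11, 0.10, 0.0, 0.05),
    Region(12, 0.85, 1.0, 0.75),
    Region(12, 0.20, 0.0, 0.10),
]
summary = select_frames(weights, new_regions, k=2)
```

Scoring a frame by its single best region is one simple choice among several; summing or averaging region scores would be equally reasonable for a sketch like this.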

The next step was to take those important frames from throughout the video and look for early ones that influence later ones. To do that, they adapted a method developed by researchers at Carnegie Mellon University that could predict how one news article leads to another, assembling a series of articles to transition from a starting point to a known end point.

For the text work, researchers used word frequencies and correlations across articles to quantify influence. For the video work, Grauman and Lu used their significant objects and frames to do the same. Then they were able to identify a chain of video clips that efficiently filled in the story from beginning to end.
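The chaining idea can be sketched in a few lines, under heavy simplification. Here each clip is a hypothetical dictionary mapping object names to importance scores, "influence" between two clips is just the importance of the objects they share, and a chain of `k` clips from the first clip to the last is chosen so that its weakest link is as strong as possible. The influence measure and the weakest-link objective are stand-ins chosen for illustration, not the formulation used in the research.

```python
def influence(a, b):
    # Toy influence: total importance of the objects two clips have in common.
    return sum(min(a[o], b[o]) for o in a.keys() & b.keys())

def best_chain(clips, k):
    """Pick k clips in temporal order, starting at the first clip and ending
    at the last, so that the weakest link between neighbors is maximized."""
    n = len(clips)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of clips")
    neg = float("-inf")
    # best[j][i]: strongest weakest-link of a chain of j + 1 clips ending at i.
    best = [[neg] * n for _ in range(k)]
    prev = [[None] * n for _ in range(k)]
    best[0][0] = float("inf")  # a chain holding only the first clip has no links
    for j in range(1, k):
        for i in range(j, n):
            for p in range(j - 1, i):
                if best[j - 1][p] == neg:
                    continue
                strength = min(best[j - 1][p], influence(clips[p], clips[i]))
                if strength > best[j][i]:
                    best[j][i] = strength
                    prev[j][i] = p
    # Walk the back-pointers from the last clip to recover the chain.
    chain = [n - 1]
    for j in range(k - 1, 0, -1):
        chain.append(prev[j][chain[-1]])
    return chain[::-1], best[k - 1][n - 1]

# Five clips from a hypothetical day, in temporal order.
clips = [
    {"mug": 0.9, "kitchen": 0.5},
    {"tv": 0.8},
    {"mug": 0.7, "laptop": 0.6},
    {"dog": 0.9},
    {"laptop": 0.8, "desk": 0.4},
]
chain, strength = best_chain(clips, k=3)
```

In this toy data the middle clip shares the mug with the opening clip and the laptop with the closing one, so it is the only clip that links the two ends with nonzero influence on both sides.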

"We ran human 'taste tests' comparing our method to previous methods," said Grauman, "and between 75 and 90 percent of people evaluating the summaries, depending on the datasets and method being compared, found that our system is superior."

Grauman said that as video summarization techniques continue to improve, they will become invaluable aids not just to people with very specialized needs, like police investigators and those suffering from memory loss, but to everyday Web surfers as well.

"My hope is that we'll be able to get video browsing much closer to what we experience with image browsing," she said. "Consider browsing 50 images on a webpage. It's manageable, since you can scroll down and see them all in one pass. Now imagine trying to browse 50 videos online. It's simply not efficient. We need summarization algorithms in order to improve video search considerably."

Provided by University of Texas at Austin

Citation: Researchers use machine learning to boil down the stories that wearable cameras are telling (2013, September 13) retrieved 20 June 2024 from https://phys.org/news/2013-09-machine-stories-wearable-cameras.html

