June 23, 2016

New method detects human activity in videos earlier and more accurately

by Disney Research

Researchers at Disney Research and Boston University have found that a machine learning program can be trained to detect human activity in a video sooner and more accurately than other methods by rewarding the program for gaining confidence in its prediction the longer it observes the activity.

It seems intuitive that the program would grow more confident that it is detecting, say, a person changing a tire the longer it observes the person loosening lug nuts, jacking up the car and then removing the wheel. But that's not how most computer models have been trained to detect activity, said Leonid Sigal, senior research scientist at Disney Research.

"Most training techniques are happy if the computer model gets 60 percent of the video frames correct, even if the errors occur late in the process, when the activity should actually be more apparent," Sigal said. "That doesn't make much sense. If the model predicts a person is making coffee even after it sees the person put pasta into boiling water, it should be penalized more than if it made the same incorrect prediction when the person was still just boiling water."

Shugao Ma, a Ph.D. student in computer science at Boston University and a former intern at Disney Research, found that this change in training methods resulted in more accurate predictions of activities. The computer also was often able to accurately predict the activity early in the process, even after seeing only 20 to 30 percent of the video. Likewise, the program can detect that an activity is finished if its confidence that it is observing that activity begins to drop.
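
As a rough illustration of the last point, the sketch below flags the first frame at which a per-frame confidence score has fallen noticeably below its running peak. The function name `detect_end` and the `drop` threshold are hypothetical choices made for this example; the article does not say how the researchers decide that confidence has begun to drop.

```python
def detect_end(scores, drop=0.2):
    """Return the index of the first frame whose confidence is at least
    `drop` below the highest confidence seen so far, or None if the
    confidence never falls that far.

    `scores` is a list of per-frame confidences for one activity.
    `drop` is an assumed threshold, not a value from the research.
    """
    peak = scores[0]
    for t, score in enumerate(scores):
        peak = max(peak, score)      # running maximum of the confidence
        if peak - score >= drop:     # confidence has fallen well below its peak
            return t
    return None


# Confidence rises while the activity unfolds, then falls once it is over.
confidences = [0.2, 0.5, 0.8, 0.9, 0.85, 0.6, 0.3]
end_frame = detect_end(confidences)
```

With these made-up numbers the small dip from 0.9 to 0.85 is ignored, and the end is flagged only when the score falls to 0.6.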

The research team, which included Stan Sclaroff, Boston University professor of computer science, will present their findings June 26 at the Computer Vision and Pattern Recognition conference, CVPR 2016, in Las Vegas.

"Automatic detection of human activities in videos has many potential applications, such as video retrieval and human-computer interaction," said Jessica Hodgins, vice president at Disney Research. "Human-robot interaction applications in particular could benefit from early detection of activities; a caregiving robot, for instance, would want to recognize that an elderly patient was in danger of falling so it could act to steady the patient."

Activity detection remains a challenging technical task because actors, their appearance, their surroundings and the camera viewpoints vary so widely. Understanding the progression of the activity - "making pasta," for instance, might include setting a pot on a stove, boiling water, boiling noodles, draining, etc. - is thus critical for a computer program to recognize the activity, Ma said.

For this task, the researchers used Long Short-Term Memory (LSTM), a type of recurrent neural network that is well suited to classifying, processing and predicting time series. They introduced ranking losses into the learning objective; these are computed at each time point in the prediction and encourage the detection score to rise as the action progresses. They found that LSTM models trained with the ranking losses performed better than LSTM models trained only on whether the activity was classified correctly.
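
The idea of a ranking loss on the detection score can be sketched in a few lines. The code below is a simplified illustration, not the formulation from the paper: it assumes the per-frame class probabilities have already been produced by some model, and the function names and the `weight` parameter are hypothetical. The ranking term penalizes any frame whose score for the true activity is lower than the best score seen at an earlier frame.

```python
import math


def classification_loss(probs_per_frame, label):
    """Sum of per-frame cross-entropy terms for the true activity."""
    return sum(-math.log(probs[label]) for probs in probs_per_frame)


def ranking_loss(probs_per_frame, label):
    """Penalize every frame whose score for the true activity falls
    below the highest score observed at any earlier frame."""
    loss = 0.0
    best_so_far = probs_per_frame[0][label]
    for probs in probs_per_frame[1:]:
        score = probs[label]
        loss += max(0.0, best_so_far - score)  # zero when the score keeps rising
        best_so_far = max(best_so_far, score)
    return loss


def total_loss(probs_per_frame, label, weight=1.0):
    """Classification loss plus a weighted ranking term (weight is assumed)."""
    return (classification_loss(probs_per_frame, label)
            + weight * ranking_loss(probs_per_frame, label))


# Two predictions with the same per-frame scores in a different order.
rising = [[0.6, 0.4], [0.7, 0.3], [0.9, 0.1]]
dipping = [[0.6, 0.4], [0.9, 0.1], [0.7, 0.3]]
```

In this toy example the classification loss cannot tell the two sequences apart, because it only sums per-frame errors. The ranking term is zero for `rising` and positive for `dipping`, so only the combined loss penalizes the prediction that loses confidence late in the activity.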

More information: "Learning Activity Progression in LSTMs for Activity Detection and Early Detection"
[PDF, 3.42 MB] s3-us-west-1.amazonaws.com/dis … -Detection-Paper.pdf

Provided by Disney Research

Citation: New method detects human activity in videos earlier and more accurately (2016, June 23) retrieved 19 April 2024 from https://phys.org/news/2016-06-method-human-videos-earlier-accurately.html
