September 12, 2018

Snapshots of the future: Tool learns to predict user's gaze in headcam footage

The miniaturization of video cameras has led to an explosion in their use, including their incorporation into portable devices such as headcams, which are used in settings from sporting events to armed combat. To analyze tasks performed in view of such devices and give their users real-time guidance, it would help to know where in the footage the user is actually looking at each moment, but the tools available to predict this are still limited.

In a new study reported at the 15th European Conference on Computer Vision (ECCV 2018), researchers at The University of Tokyo have developed a computational tool that can learn from footage taken using a headcam, in this case of various tasks performed in the kitchen, and then accurately predict where the user's focus will next be targeted. This new tool could help video-linked technologies infer what action the user is currently performing and provide appropriate guidance on the next step.

Existing programs for predicting where the human gaze is likely to fall within a frame of video footage have generally been based on the concept of "visual saliency," which uses distinctions of features such as color, intensity, and contrast within the image to predict where a person is likely to be looking. However, in footage of human subjects performing complex tasks, this visual-saliency approach is inadequate, as the individual is likely to shift their attention from one object to another in a sequential, and often predictable, manner.
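As a rough illustration of the saliency idea, the sketch below scores each pixel of a grayscale image by how much its intensity differs from that of its immediate neighbors. This is a deliberately simplified stand-in, not the saliency model used in the study or in any particular earlier program; the function name `intensity_contrast_saliency` and the toy image are assumptions made for this example.

```python
def intensity_contrast_saliency(image):
    """Score each pixel by |intensity - mean intensity of its neighbors|.

    `image` is a list of rows of grayscale values. This is a toy
    contrast measure (an assumption for illustration), not a full
    saliency model with color or multi-scale features.
    """
    height, width = len(image), len(image[0])
    saliency = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the up-to-8 surrounding pixels, clipped at the borders.
            neighbors = [
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            ]
            if neighbors:
                mean = sum(neighbors) / len(neighbors)
                saliency[y][x] = abs(image[y][x] - mean)
    return saliency


# Toy frame: a dark 5x5 image with one bright pixel in the middle.
frame = [[0.0] * 5 for _ in range(5)]
frame[2][2] = 1.0
saliency_map = intensity_contrast_saliency(frame)
```

In this toy frame the bright pixel receives the highest score because it stands out from its surroundings. A map like this knows nothing about the task being performed, which is the limitation described above.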

To take advantage of this predictability, the team used a novel approach that combines visual saliency with "gaze prediction," in which an artificial intelligence learns such sequences of actions from existing footage and then applies that knowledge to predict the direction of the user's gaze in new footage.

"Our new approach involves the construction of first a 'saliency map' for each frame of footage, then an 'attention map' based on where the user was previously looking and on motion of the user's head, and finally the combination of both of these into a 'gaze map,'" Yoichi Sato says. "Our results showed that this new tool outperformed earlier alternatives in terms of predicting where the gaze of the headcam user was actually directed."
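The three maps described in the quote can be pictured with a small sketch. The article does not say how the maps are computed or combined, so everything here is an assumption made for illustration: the attention map is modeled as a Gaussian bump centered on the previous gaze point shifted by the head motion, the fusion is a plain weighted average, and the names `gaussian_attention_map`, `fuse_maps`, and `predicted_gaze` are hypothetical. The tool in the study learns these steps from data rather than using fixed formulas.

```python
import math


def gaussian_attention_map(height, width, center, sigma):
    """Attention map: a 2D Gaussian bump around an expected gaze point (row, col)."""
    cy, cx = center
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2.0 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def normalize(values):
    """Scale a map so its entries sum to 1 (uniform if the map is all zeros)."""
    total = sum(sum(row) for row in values)
    if total <= 0:
        count = len(values) * len(values[0])
        return [[1.0 / count for _ in row] for row in values]
    return [[v / total for v in row] for row in values]


def fuse_maps(saliency, attention, weight=0.5):
    """Assumed fusion rule: weighted average of the two normalized maps."""
    s, a = normalize(saliency), normalize(attention)
    return [
        [(1.0 - weight) * sv + weight * av for sv, av in zip(s_row, a_row)]
        for s_row, a_row in zip(s, a)
    ]


def predicted_gaze(gaze_map):
    """Return the (row, col) of the highest value in the gaze map."""
    cells = ((v, (y, x)) for y, row in enumerate(gaze_map) for x, v in enumerate(row))
    return max(cells, key=lambda cell: cell[0])[1]


# Toy frame with two equally salient spots, in opposite corners.
saliency = [[0.0] * 5 for _ in range(5)]
saliency[0][0] = 1.0
saliency[4][4] = 1.0

# Assumption: the user last looked at (3, 3) and the head moved by (+1, +1).
previous_gaze, head_motion = (3, 3), (1, 1)
expected = (previous_gaze[0] + head_motion[0], previous_gaze[1] + head_motion[1])
attention = gaussian_attention_map(5, 5, expected, sigma=1.5)

gaze_map = fuse_maps(saliency, attention)
gaze_point = predicted_gaze(gaze_map)
```

Saliency alone cannot choose between the two bright corners in this example. Adding the attention map breaks the tie in favor of the corner the user's gaze was already heading toward, which is the kind of benefit the combined approach is meant to provide.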

Although the team's results were obtained for footage of chores in a kitchen, such as boiling water on a stove, they could be extended to situations such as tasks performed in offices or factories. In fact, according to lead author Yifei Huang, "Tools for evaluating so-called egocentric videos of this kind could even be applied in a medical context, such as assessing where a surgeon is focusing and offering guidance on the most appropriate steps to be taken next in an operation."

The article "Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition" is published in the proceedings of the European Conference on Computer Vision (ECCV 2018) and as an arXiv paper at arxiv.org/abs/1803.09125.

More information: Predicting Gaze in Egocentric Video by Learning Task-dependent Attention Transition, arxiv.org/abs/1803.09125

Provided by University of Tokyo

Citation: Snapshots of the future: Tool learns to predict user's gaze in headcam footage (2018, September 12) retrieved 17 July 2024 from https://phys.org/news/2018-09-snapshots-future-tool-user-headcam.html

