A program that captions your photos

November 16, 2015

Two researchers at Idiap, a research institute in Martigny affiliated with EPFL, have developed an algorithm that, unlike systems recently unveiled by Google and Microsoft, can describe an image without having to pull up captions it has already learned. To do this, the researchers built a program capable of making vector representations of both images and captions, based on an analysis of caption syntax.
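As a rough illustration of the idea, images and candidate phrases can be embedded as vectors in a shared space and compared by similarity. The vectors below are made up for illustration; in the actual system they are learned from data, and the details of the model differ.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy image vector and candidate phrase vectors in a shared 4-D space.
# These numbers are invented for illustration; the real model learns them.
image_vec = np.array([0.9, 0.1, 0.8, 0.2])
phrase_vecs = {
    "a skateboard": np.array([0.9, 0.1, 0.8, 0.2]),
    "a man":        np.array([0.5, 0.5, 0.5, 0.5]),
    "a dog":        np.array([0.1, 0.9, 0.2, 0.8]),
}

# Rank the candidate phrases by how similar their vectors are to the image's.
ranked = sorted(phrase_vecs,
                key=lambda p: cosine(image_vec, phrase_vecs[p]),
                reverse=True)
print(ranked)  # phrases most similar to the image come first
```

Because everything lives in one vector space, a phrase can score highly for an image even if that exact image–caption pair was never seen during training.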

"When we give it a photo, the program compares the image vector to the vector of possible words and selects the most likely noun, verb and prepositional phrases," said Rémi Lebret, a PhD student specializing in Deep Learning at Idiap. This is how the system finds the most likely description for a photo of a man skateboarding, for example, even if it has never seen a similar photo previously. The computer breaks down the picture into elements ("a skateboard, a man, a ramp") and verbs that could describe the action (" riding") before captioning the picture.

Getting it right

This approach is unlike existing ones. "Those other systems propose the first word based on the photo and then use that word to predict subsequent ones," said Pedro Oliveira Pinheiro, the other Idiap researcher on this project. Such algorithms, based on sequence labeling with recurrent neural networks, can run into problems: if the start of the phrase is predicted poorly, the entire caption will necessarily be wrong. Those systems also take longer to train, and they tend to recycle previously used captions.
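The failure mode Pinheiro describes can be seen in a toy greedy decoder, where each word is predicted only from the previous one (the transition table below is invented for illustration): one wrong first word commits every word that follows.

```python
# Toy word-by-word decoder: each word is chosen from the previous one,
# so an early mistake propagates through the rest of the caption.
# The transition table is made up for illustration.
transitions = {
    "<start>": ["man", "cat"],
    "man": ["riding"],
    "cat": ["sleeping"],
    "riding": ["a skateboard"],
    "sleeping": ["on a sofa"],
}

def greedy_decode(first_choice_index):
    """Pick a first word, then follow the most likely continuation each step."""
    word = transitions["<start>"][first_choice_index]
    caption = [word]
    while word in transitions:
        word = transitions[word][0]
        caption.append(word)
    return " ".join(caption)

print(greedy_decode(0))  # correct first word: a sensible caption
print(greedy_decode(1))  # wrong first word: every later word is wrong too
```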


The technology developed by Pinheiro and Lebret is simpler and works better. It has also piqued the interest of social-media companies: the two researchers did a six-month research internship at Facebook, which is drawing on their work to develop its own automatic-captioning model, intended in part for visually impaired users. The researchers believe their algorithm could be improved in the future through more complex language models and by linking it to larger databases.


More information: Phrase-based Image Captioning. arxiv.org/abs/1502.03671


