New surveillance camera system provides text feed

Jun 03, 2010 by Lin Edwards
Two major tasks of the I2T framework: (a) image parsing and (b) text description. Image credit: Benjamin Yao.

Scientists at the University of California, Los Angeles (UCLA) have developed a prototype surveillance camera and computer system that analyzes the camera images and delivers a text feed describing what the camera is seeing. The new system aims to make searching vast amounts of video much more efficient.

The system was developed by Professor Song-Chun Zhu and colleagues Haifeng Gong and Benjamin Yao, in collaboration with the company ObjectVideo, of Reston, Virginia, in the US. Dubbed I2T, for Image to Text, the system runs video frames through a series of vision algorithms to produce a textual summary of their contents. The text can then be indexed and stored in a database that can be queried with a simple text search. The system has been successfully demonstrated on surveillance footage.
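As a rough illustration of why a text feed makes video searchable, the sketch below builds a small inverted index over per-timestamp descriptions and answers keyword queries against it. The sample feed and the `tokenize`, `build_index`, and `search` helpers are hypothetical and are not taken from I2T, whose actual storage and indexing scheme is not described here.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase a description and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(descriptions):
    """Map each word to the set of timestamps whose description contains it."""
    index = defaultdict(set)
    for timestamp, text in descriptions.items():
        for word in tokenize(text):
            index[word].add(timestamp)
    return index

def search(index, query):
    """Return sorted timestamps whose descriptions contain every query word."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)

# Hypothetical text feed (assumption): timestamp -> generated description
feed = {
    "39:10": "boat 3 enters scene from the left",
    "40:01": "boat 3 approaches maritime marker",
    "41:30": "car 7 stops at stop sign",
}
index = build_index(feed)
print(search(index, "boat 3"))     # timestamps mentioning both "boat" and "3"
print(search(index, "stop sign"))  # timestamps mentioning both "stop" and "sign"
```

A query only has to match words in the generated descriptions, so finding the relevant moment in hours of footage reduces to an ordinary text lookup.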

The I2T system draws on a database of over two million images containing identified objects in over 500 classifications. The database was collected by Zhu starting in 2005 in Ezhou, China, with support from the Chinese government, but is still not large enough to allow the system to assess a dynamic situation correctly.

The first stage of I2T is an image parser that analyzes an image, removes the background, and identifies the shapes in the picture. The second stage determines the meanings of those shapes by referring to the image database. Zhu said that once the image is parsed, transcribing the results into natural language “is not too hard.”
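One simple way to see why the final step can be the easy one is template filling: once objects are labeled, a sentence can be assembled from their attributes. The sketch below assumes a made-up parser output (a list of records with `label`, `color`, and `location` fields); the representation I2T actually uses is not described in this article, and `describe` is a hypothetical helper.

```python
def describe(objects):
    """Turn a list of parsed-object records into one English sentence."""
    if not objects:
        return "The scene is empty."
    phrases = []
    for obj in objects:
        # Optional attributes are skipped when the parser did not supply them.
        color = obj.get("color")
        noun = f"{color} {obj['label']}" if color else obj["label"]
        article = "an" if noun[0] in "aeiou" else "a"
        phrase = f"{article} {noun}"
        if obj.get("location"):
            phrase += f" {obj['location']}"
        phrases.append(phrase)
    if len(phrases) == 1:
        body = phrases[0]
    else:
        body = ", ".join(phrases[:-1]) + " and " + phrases[-1]
    return f"The scene contains {body}."

# Hypothetical parser output (assumption, not I2T's real data structure)
parsed = [
    {"label": "person", "location": "on the shore"},
    {"label": "boat", "color": "white", "location": "in the water"},
]
print(describe(parsed))
```

The hard part, as the article notes, is producing reliable labels in the first place; the templating itself is a few lines of string handling.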

Diagram of the I2T framework. Image credit: Benjamin Yao.

The system also uses algorithms describing the movement of objects from one frame to another and can generate text describing motions, such as “boat 3 approaches maritime marker at 40:01.” It can also sometimes match objects that have left and then re-entered a scene, and can describe events such as a car running a stop sign.
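To make the idea of motion-to-text concrete, the sketch below flags an "approach" event when a tracked object moves closer to a fixed target and comes within a distance threshold, then formats a sentence in the style of the example above. The track format, the `near` threshold, the coordinates, and all function names are assumptions for illustration; the article does not say how I2T detects such events.

```python
import math

def format_time(seconds):
    """Format an integer second count as MM:SS."""
    return f"{seconds // 60:02d}:{seconds % 60:02d}"

def detect_approach(track, target, near=10.0):
    """Return the first sample time at which the object is within `near`
    units of `target` and closer than at the previous sample, else None.
    `track` is a list of (seconds, x, y) samples."""
    prev_dist = None
    for t, x, y in track:
        dist = math.hypot(x - target[0], y - target[1])
        if prev_dist is not None and dist < prev_dist and dist <= near:
            return t
        prev_dist = dist
    return None

def describe_approach(name, target_name, track, target, near=10.0):
    """Return an event sentence, or None if no approach was detected."""
    t = detect_approach(track, target, near)
    if t is None:
        return None
    return f"{name} approaches {target_name} at {format_time(t)}"

# Hypothetical positions (assumption): the marker is fixed, the boat moves right.
marker = (100.0, 50.0)
boat_track = [(2399, 60.0, 50.0), (2400, 80.0, 50.0), (2401, 92.0, 50.0)]
print(describe_approach("boat 3", "maritime marker", boat_track, marker))
```

Requiring both a decreasing distance and a proximity threshold keeps an object that merely sits near the marker from being reported as approaching it.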

Professor Zhu said that at the moment almost all searches for images within video are done using surrounding text, whereas the new system generates text directly from the images. He added that the existence of YouTube and other video collections, and the expanding use of surveillance cameras everywhere, show that being unable to search video efficiently is a major problem.

The I2T system is not yet advanced enough to recognize a large number of images instantly and is not yet ready for commercialization, but the researchers say it is close and needs only “minor tweaks.” The scientists also say they may be able to feed the text into a voice synthesizer to increase its usefulness.


More information:
-- Technical description of I2T: Image Parsing to Text Generation
-- Research paper: Benjamin Yao, et al. I2T: Image Parsing to Text Description, Proceedings of IEEE.




User comments : 5


not rated yet Jun 03, 2010
yield and stop sign violation cameras?
not rated yet Jun 03, 2010
Would be nice to have murder and rape prevention cameras.

More people are murdered in America each year by other Americans than the total number of all American "war on terror" casualties combined.

Our troops are in the wrong place. They should be here hunting down murderers, gang members, and drug dealers.
not rated yet Jun 03, 2010
Finally, a subject on this site that I know a lot about. This system will rely entirely on the database; other than the database, there is nothing new about this system. Consistency will be key in the database, and really, all the objects need to be defined by the same person in the same fashion. For example, you can't label one angle of the objects above "backpack" and another angle "knapsack". This will be very difficult to maintain and is the primary reason it hasn't been implemented effectively yet. You run into a similar problem with voice recognition and dictation software. The image processing techniques are nothing new, cookie cutter. Good luck guys, you'll need it.
not rated yet Jun 03, 2010
I can't help but wonder, is that first picture a human-generated or a computer-generated example of the algorithm? I slightly suspect that it's computer-generated, as the water is described as being green. I mean sure it's mostly green, but that's just because it's reflecting the trees. Wouldn't most humans ignore that and describe the water as blue?

Or, perhaps, that's what the computer would say if it worked yet but it's really just a human imitating the expected computer behavior?
not rated yet Jul 17, 2010
Infrared cameras, devices that form an image using infrared radiation, would be great.
