Automated image analysis arises from handcraft and machine learning

May 24, 2012

The amount of visual information is growing at tremendous speed. The archives of television networks, image banks and social media on the web are bursting with billions of pictures – and more are produced every second. To organise these heaps of data and to find the desired information in them, the analysis of the images must be automated.

In his recent doctoral dissertation for the Aalto University Department of Information and Computer Science, Ville Viitaniemi has studied methods for image analysis that are based on detection of visual categories.

"The content of can be discerned and classified in countless ways. For a computer to know how to recognise and interpret images, it is useful to dissect them into prescribed categories," explains Viitaniemi.

The general task of automatic visual recognition and analysis has persisted throughout the existence of computers. Instead of being presented with the open question of what is in a picture, the computer is better off solving a set of small sub-tasks in which the images are dissected into categories. By choosing the right categories and combining them, the contents of images can be described ever more accurately.

"In my dissertation I look by experimentation for an efficient system for recognising visual categories."

Splice, recognise, fuse

A general mathematical model for recognising images has yet to be presented, and Viitaniemi says any such model would currently be computationally too heavy. The human brain, on the other hand, is not understood well enough at the systemic level for its mechanisms of visual recognition to be imitated.

"For now, the only method that works is ‘an engineer’s approach’: to try to figure out which parts of the system, organised in which way, produce adequate results."

The three basic steps of a top-performing visual category detection system are feature extraction, detection based on the features, and fusion of the detection results. In his research, Viitaniemi strove to find the most efficient ways to carry out these phases.

"First, the images under inspection are extracted of certain features such as colours, textures and shapes. Then the detection system is taught by methods of machine learning to detect the features from images. When a group of features have been detected, a fusion of the results follows," sums up Viitaniemi the process of visual analysis.

A bag of visual words into a support vector machine

For the extraction of features, Viitaniemi ended up preferring a method called Bag of Visual Words. A single image is broken down into 100–300 meaningful locations, after which the neighbourhood of each location is given a specific visual description.

"For each neighbourhood, a histogram is collected of the directions of its surrounding gradients. This way a useful feature is put together. A feature characterising an entire image can then be created by looking into the statistics of the distribution of the local features."

The refined bags of visual words go into a support vector machine, which has been taught to recognise whether a feature belongs to a certain category or not. Fed enough features, the machine will know whether it is a bird or an aeroplane in the sky of a picture.
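
Continuing the sketch above, the per-image word histograms could then be fed to a support vector machine roughly as follows; the kernel choice and the binary labelling are again illustrative rather than taken from the dissertation.

```python
import numpy as np
from sklearn.svm import SVC

def train_category_detector(word_histograms, contains_category):
    """word_histograms: (n_images, n_words) bag-of-visual-words features;
    contains_category: binary labels, e.g. 1 = 'aeroplane in the sky'."""
    detector = SVC(kernel="rbf", gamma="scale", probability=True)
    detector.fit(word_histograms, contains_category)
    return detector

def category_score(detector, word_histogram):
    """Estimated probability that a new image belongs to the category."""
    return detector.predict_proba(word_histogram.reshape(1, -1))[0, 1]
```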

"Different methods have to be experimented with, because a few successes in recognition tasks do not guarantee reliable performance. As long as we are not able to imitate the methods of image recognition of the brain, the best way is to experiment and experiment, through trial and error."
