Automated image analysis arises from handcraft and machine learning

May 24, 2012, Aalto University

The amount of visual information is growing at tremendous speed. The archives of television networks, image bank databases and social media on the web are all bursting with billions of pictures – and more are produced every second. To organise these heaps of data and find the wanted information in them, the analysis of images must be automated.

In his recent doctoral dissertation for the Aalto University Department of Information and Computer Science, Ville Viitaniemi studied methods for image analysis that are based on the detection of visual categories.

"The content of images can be discerned and classified in countless ways. For a computer to know how to recognise and interpret images, it is useful to dissect them into prescribed categories," explains Viitaniemi.

The general task of automatic visual recognition and analysis has persisted for as long as computers have existed. Instead of being presented with the open question of what is in a picture, the computer is better off solving a set of small sub-tasks in which the images are sorted into categories. By choosing the right categories and combining them, the contents of images can be described with increasing accuracy.

"In my dissertation I search, through experimentation, for an efficient system for recognising visual categories."

Splice, recognise, fuse

A general mathematical model for recognising images has yet to be presented, and Viitaniemi says any such model would at present be computationally too heavy. The human brain, on the other hand, is not yet understood well enough at the systemic level for its mechanisms of visual recognition to be imitated.

"For now, the only method that works is ‘an engineer’s approach’: to try to figure out which parts of the system, organised in which way, produce adequate results."

The three basic steps of the top-performing system for visual category detection are feature extraction, detection of the features, and fusion of the detection results. In his research Viitaniemi strove to find the most efficient ways to execute these phases.

"First, certain features such as colours, textures and shapes are extracted from the images under inspection. Then the detection system is taught, by methods of machine learning, to detect the features from images. When a group of features has been detected, a fusion of the results follows," says Viitaniemi, summing up the process of visual analysis.

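The three phases described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system from the dissertation: the two toy features (`blue_ratio`, `flatness`), the logistic `detector` with hand-picked thresholds, and the averaging fusion rule are all hypothetical stand-ins chosen to keep the example short.

```python
import math
from statistics import mean

# An image is a list of rows; each pixel is an (r, g, b) tuple with values 0..255.

def blue_ratio(image):
    """Toy colour feature: share of blue in the total intensity (0..1)."""
    pixels = [pixel for row in image for pixel in row]
    total = sum(r + g + b for r, g, b in pixels)
    blue = sum(b for _, _, b in pixels)
    return blue / total if total else 0.0

def flatness(image):
    """Toy texture feature: 1.0 for a uniform image, 0.0 for maximal contrast."""
    levels = [(r + g + b) / 3 for row in image for r, g, b in row]
    return 1.0 - (max(levels) - min(levels)) / 255

def detector(value, threshold, sharpness=10.0):
    """Hypothetical stand-in for a trained detector: maps a feature value
    to a confidence between 0 and 1 with a logistic curve."""
    return 1.0 / (1.0 + math.exp(-sharpness * (value - threshold)))

def sky_confidence(image):
    # Step 1, feature extraction.
    colour, texture = blue_ratio(image), flatness(image)
    # Step 2, detection: one confidence per feature (thresholds are made up).
    scores = [detector(colour, threshold=0.45), detector(texture, threshold=0.5)]
    # Step 3, fusion: here simply the average of the per-feature confidences.
    return mean(scores)

clear_sky = [[(90, 140, 230)] * 4 for _ in range(4)]
checkerboard = [[(255, 255, 255) if (x + y) % 2 == 0 else (0, 0, 0)
                 for x in range(4)] for y in range(4)]

print(sky_confidence(clear_sky), sky_confidence(checkerboard))
```

The uniform blue image receives a clearly higher fused confidence than the black-and-white checkerboard. In a real system each detector would be learned from labelled examples rather than set by hand, and the fusion rule itself could be tuned or learned.
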
A bag of visual words into a support vector machine

For the extraction of features, Viitaniemi ended up preferring a method called Bag of Visual Words. A single image is broken down into 100–300 meaningful locations, after which the neighbourhood of each location is given a specific visual description.

"For each neighbourhood, a histogram is collected of the directions of its surrounding gradients. This way a useful feature is put together. A feature characterising an entire image can then be created by looking into the statistics of the distribution of the local features."

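The idea in the quote can be sketched as follows. The code is a simplified illustration, not the descriptor used in the dissertation: the 8-bin histogram, the central-difference gradients and the tiny hand-made `codebook` are assumptions. In practice such codebooks are commonly learned by clustering many local descriptors; here it is fixed by hand to keep the example self-contained.

```python
import math

def orientation_histogram(patch, bins=8):
    """Histogram of gradient directions in a grayscale patch (list of rows),
    weighted by gradient magnitude and normalised to sum to 1."""
    hist = [0.0] * bins
    height, width = len(patch), len(patch[0])
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            dx = patch[y][x + 1] - patch[y][x - 1]
            dy = patch[y + 1][x] - patch[y - 1][x]
            magnitude = math.hypot(dx, dy)
            if magnitude == 0:
                continue  # flat spot, no direction to record
            angle = math.atan2(dy, dx) % (2 * math.pi)
            index = int(angle / (2 * math.pi) * bins) % bins
            hist[index] += magnitude
    total = sum(hist)
    return [value / total for value in hist] if total else hist

def nearest_word(descriptor, codebook):
    """Index of the codebook entry closest to the descriptor (squared Euclidean)."""
    distances = [sum((d - c) ** 2 for d, c in zip(descriptor, word))
                 for word in codebook]
    return distances.index(min(distances))

def bag_of_visual_words(descriptors, codebook):
    """Image-level feature: the share of local descriptors assigned to each visual word."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, codebook)] += 1
    n = len(descriptors)
    return [count / n for count in counts] if n else [0.0] * len(codebook)

# Two 5x5 neighbourhoods: brightness rising left-to-right and top-to-bottom.
horizontal_ramp = [[10 * x for x in range(5)] for _ in range(5)]
vertical_ramp = [[10 * y for _ in range(5)] for y in range(5)]
h = orientation_histogram(horizontal_ramp)
v = orientation_histogram(vertical_ramp)

codebook = [h, v]  # hypothetical two-word vocabulary
image_feature = bag_of_visual_words([h, h, h, v], codebook)
print(image_feature)
```

With three neighbourhoods of the first kind and one of the second, the image-level feature comes out as `[0.75, 0.25]`: the image is summarised by how often each visual word occurs, not by where it occurs.
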
The refined bags of visual words go into a support vector machine, which has been taught to recognise whether a feature belongs to a certain category or not. Fed enough features, the machine will know whether it is a bird or an aeroplane in the sky of a picture.
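
As a rough illustration of this last step, the sketch below trains a linear support vector machine on a handful of made-up bag-of-visual-words histograms. The article does not say which kind of SVM or training procedure was used, so everything here is an assumption: the linear decision function, the plain subgradient descent on the hinge loss, the hyperparameters and the six training vectors are hypothetical.

```python
def train_linear_svm(samples, labels, epochs=200, learning_rate=0.1,
                     regularisation=0.01):
    """Fit weights w and bias b by subgradient descent on the regularised
    hinge loss. Labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            # Regularisation: shrink the weights slightly on every step.
            w = [wi * (1 - learning_rate * regularisation) for wi in w]
            if margin < 1:
                # Sample is misclassified or inside the margin:
                # move the boundary away from it.
                w = [wi + learning_rate * y * xi for wi, xi in zip(w, x)]
                b += learning_rate * y
    return w, b

def decision_value(w, b, x):
    """Signed score; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1

# Made-up histograms over three visual words; +1 = bird, -1 = aeroplane.
samples = [
    [0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.8, 0.1, 0.1],
    [0.1, 0.2, 0.7], [0.2, 0.2, 0.6], [0.1, 0.3, 0.6],
]
labels = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(samples, labels)
print(predict(w, b, [0.65, 0.25, 0.1]))   # a bird-like histogram
print(predict(w, b, [0.15, 0.25, 0.6]))   # an aeroplane-like histogram
```

One such binary classifier answers a single yes-or-no question about one category; a system covering many categories would train one per category and combine their outputs in the fusion step.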

"Different methods have to be experimented with, because a few successes in recognition tasks do not guarantee reliable performance. As long as we are not able to imitate the methods of image recognition of the brain, the best way is to experiment and experiment, through trial and error."
