Thermodynamics of visual images may help us see the world

February 13, 2013 by Lisa Zyga
(a) A grayscale image of a forest. Photo by Dan Ruderman. (b) The same image after it is quantized into two equally populated levels of black and white. The researchers found that small patches within this quantized image retain substantial local structure. This finding led them to discover that the photo is scale-invariant—its structure stays the same as its scale changes. Credit: Greg J. Stephens, et al. ©2013 American Physical Society

Although researchers know that a large portion of the brain is devoted to visual processing, exactly how we interpret the complex patterns within natural scenes is far from understood. One question scientists ask is whether something about the structure of the visual world itself enables our brains to process and understand our visual surroundings, and whether this structure can be described quantitatively.

A team of researchers at Princeton University has taken a closer look at images of nature and proposed that the scale invariance of images closely resembles the thermodynamics of physical systems at a critical point, with the distribution of pixels in the images analogous to the distribution of particle states in a physical system. The parts of an image that correspond to the low-energy states, or local minima, have surprisingly interpretable structure, and these thermodynamic characteristics may help the brain see.

The researchers, Greg J. Stephens, Thierry Mora, Gašper Tkačik, and William Bialek, at Princeton University, have published their study on the thermodynamics of images in a recent issue of Physical Review Letters.

In their study, the scientists analyzed an ensemble of photographs taken in a forest at Hacklebarney State Park in New Jersey. The researchers converted the grayscale camera images to binary (black and white) images. Although intensity information was lost in the quantization, many details such as the structure of the trees and a body of water could still be identified.
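
The quantization step can be sketched in a few lines. This is only an illustration, assuming the "two equally populated levels" mentioned in the figure caption come from thresholding at the median intensity; the function name `quantize_to_binary` and the toy image are hypothetical, and the article does not say how ties at the threshold were handled.

```python
from statistics import median

def quantize_to_binary(image):
    """Map each grayscale pixel to 0 (black) or 1 (white).

    Assumption: the threshold is the median intensity, so the two levels
    are equally populated when no pixel sits exactly on the median.
    """
    flat = [pixel for row in image for pixel in row]
    threshold = median(flat)
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]

# Toy 4 x 4 "image" with alternating dark and bright columns
image = [
    [10, 200, 30, 220],
    [15, 180, 25, 210],
    [12, 190, 35, 230],
    [11, 170, 40, 240],
]
binary = quantize_to_binary(image)
white_count = sum(sum(row) for row in binary)  # half of the 16 pixels
```

In this toy case half of the pixels end up white and half black, which is the property the caption describes.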

(a) 4 x 4 patches from the quantized forest image with the lowest energy states, starting with the lowest energy states of all: solid black and white blocks. The other patches are local minima, and many of them can be interpreted as lines and edges. The scientists speculate that the visual system might build neurons that identify these local minima in order to build a representation of the world. In part (b), the researchers computed the average light-intensity images that correspond to those in part (a). These average images resemble those that trigger neuron responses in the primary visual cortex. Credit: Greg J. Stephens, et al. ©2013 American Physical Society

The researchers then divided each binary image into much smaller patches composed of 3 x 3 and 4 x 4 pixels and examined the distribution of black and white pixels in these patches.
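
As a rough sketch of this step, the snippet below cuts a binary image into small square patches. It assumes non-overlapping patches, which the article does not specify; `extract_patches` is a hypothetical helper, and a 2 x 2 patch size is used only to keep the example short.

```python
def extract_patches(binary, size):
    """Cut a binary image into size x size patches.

    Assumption: patches do not overlap. Each patch is returned as a flat
    tuple in row-major order so that it can be counted or used as a key.
    """
    rows, cols = len(binary), len(binary[0])
    patches = []
    for r in range(0, rows - size + 1, size):
        for c in range(0, cols - size + 1, size):
            patch = tuple(
                binary[r + dr][c + dc]
                for dr in range(size)
                for dc in range(size)
            )
            patches.append(patch)
    return patches

binary = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
]
patches = extract_patches(binary, 2)
```

Storing each patch as a tuple makes it easy to tally how often every black-and-white pattern occurs.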

To quantify how much structure is present in these tiny segments of natural images, the researchers measured the entropy of the distribution of pixels. If every pixel were independently black or white with equal probability, the entropy would take its maximum value of 9 and 16 bits, respectively, for the 3 x 3 and 4 x 4 pixel regions (one bit per pixel). But the researchers found that the entropy levels of the same-sized regions from the photo were only 6.5 and 11.2 bits, suggesting that substantial local structure remains in the tiny patches.
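
The entropy comparison can be illustrated with a short calculation. The sketch below uses the simple "plug-in" estimate (counting how often each pattern occurs); this is an assumption, since the article does not describe the estimator the researchers used, and with limited data this estimate is known to be biased. The function name `patch_entropy_bits` and both toy data sets are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2

def patch_entropy_bits(patches):
    """Plug-in estimate of the entropy, in bits, of the observed patch patterns."""
    counts = Counter(patches)
    total = sum(counts.values())
    return -sum((n / total) * log2(n / total) for n in counts.values())

# Each of the 2**9 = 512 possible 3 x 3 patterns seen exactly once:
# the maximum-entropy case of 9 bits.
uniform_patches = list(product((0, 1), repeat=9))
uniform_entropy = patch_entropy_bits(uniform_patches)

# Only two patterns, all black and all white, equally often:
# highly structured, just 1 bit.
structured_patches = [(0,) * 9] * 50 + [(1,) * 9] * 50
structured_entropy = patch_entropy_bits(structured_patches)
```

Real image patches fall between these two extremes, which is what the reported values of 6.5 and 11.2 bits indicate.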

To explore how local image structure changes with scale, the researchers averaged neighboring pixels within each image and repeated their patch analysis. After such "coarse-graining," the image had lower resolution, but remarkably both the entropy and pixel distribution were unchanged from the original image. Even after repeating this coarse-graining process four times, the pixel distributions in the small square regions remained the same, indicating that the photo is scale-invariant: its structure stays the same as its scale changes.
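
A single coarse-graining step might look like the sketch below. It assumes that "averaging neighboring pixels" means averaging non-overlapping 2 x 2 blocks of the grayscale image, and that the averaged image would then be quantized to black and white again before the patch analysis is repeated; neither detail is spelled out in the article. `coarse_grain` is a hypothetical helper.

```python
def coarse_grain(image, block=2):
    """Average non-overlapping block x block groups of pixels.

    Assumption: each output pixel is the mean of one block, so the image
    shrinks by a factor of `block` along each axis.
    """
    rows, cols = len(image), len(image[0])
    result = []
    for r in range(0, rows - block + 1, block):
        out_row = []
        for c in range(0, cols - block + 1, block):
            total = sum(
                image[r + dr][c + dc]
                for dr in range(block)
                for dc in range(block)
            )
            out_row.append(total / (block * block))
        result.append(out_row)
    return result

image = [
    [0, 4, 8, 8],
    [4, 0, 8, 8],
    [2, 2, 6, 10],
    [2, 2, 10, 6],
]
coarse = coarse_grain(image)  # a 2 x 2 image of block averages
```

Applying the same function to its own output repeats the coarse-graining at the next scale.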

The scientists saw this scale invariance as a hint that natural images may have something in common with a physical system at a critical point. In physical systems, scale invariance emerges only when the temperature reaches a critical value, at which point a phase transition occurs between two phases characterized by different forms of order.

To examine whether the ensemble of natural images has its own critical point, the researchers treated the distribution of pixels as the Boltzmann distribution for a physical system, where the patterns of pixels in the small patches are associated with different energy levels according to their probability: the more probable a pattern, the lower its energy. Remarkably, as the patch size increased so too did a peak in the specific heat, a thermodynamic variable that characterizes fluctuations in the energy of the ensemble. These results suggest a sharp transition in the thermodynamic limit of large patch sizes, similar to how a physical system reaches this limit at a critical temperature.
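
The mapping from probabilities to energies can be sketched as follows. This is a minimal illustration under stated assumptions: each pattern s gets the energy E(s) = -ln P(s), the distribution at temperature T is proportional to exp(-E(s)/T) so that T = 1 recovers the observed distribution, Boltzmann's constant is set to 1, and the specific heat is taken as the energy variance divided by T squared without any per-pixel normalization. The function names and toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import exp, log

def energies_from_patches(patches):
    """Assign each observed pattern the energy E = -ln(P)."""
    counts = Counter(patches)
    total = sum(counts.values())
    return {pattern: -log(n / total) for pattern, n in counts.items()}

def specific_heat(energies, temperature):
    """Energy variance under Boltzmann weights, divided by T**2 (k_B = 1)."""
    e_min = min(energies.values())  # shift energies for numerical stability
    weights = {p: exp(-(e - e_min) / temperature) for p, e in energies.items()}
    z = sum(weights.values())
    mean_e = sum(w * energies[p] for p, w in weights.items()) / z
    mean_e2 = sum(w * energies[p] ** 2 for p, w in weights.items()) / z
    return (mean_e2 - mean_e ** 2) / temperature ** 2

# Toy two-pixel "patches" with probabilities 1/2, 1/4, 1/8 and 1/8
patches = [(0, 0)] * 4 + [(1, 1)] * 2 + [(0, 1)] + [(1, 0)]
energies = energies_from_patches(patches)
heat_at_t1 = specific_heat(energies, 1.0)
```

Scanning `temperature` over a range of values and repeating this for larger patches is one way to look for a growing peak of the kind described above.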

The researchers found that this approach to the thermodynamics of images also shares similarities with Zipf-like distributions. According to Zipf's law, elements in a group (for example, words in a book) that are sorted from most common to least common will follow a pattern where the second most common element is 1/2 as common as the first, the third most common element is 1/3 as common as the first, etc. Zipf-like distributions have been found to hold for many different situations, and here the scientists found that they also closely describe how often the different black-and-white patterns occur in the small pixel patches when the patterns are ranked from most to least common.
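
Zipf's law as described here is easy to check on a toy example. The sketch below ranks items by how often they occur and compares the counts with the ideal 1/rank pattern; the helper names and the made-up word list are hypothetical and only illustrate the rule, not the researchers' analysis.

```python
from collections import Counter

def rank_frequencies(items):
    """Return the occurrence counts sorted from most to least common."""
    counts = Counter(items)
    return sorted(counts.values(), reverse=True)

def zipf_prediction(top_count, n_ranks):
    """Ideal Zipf counts: the item at rank r occurs top_count / r times."""
    return [top_count / rank for rank in range(1, n_ranks + 1)]

# Made-up "book" whose word counts follow the 1, 1/2, 1/3, 1/4 pattern
words = ["a"] * 12 + ["b"] * 6 + ["c"] * 4 + ["d"] * 3
observed = rank_frequencies(words)
predicted = zipf_prediction(observed[0], len(observed))
```

The same ranking could be applied to patch patterns instead of words by passing in a list of patches stored as tuples.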

Perhaps the most interesting implication of viewing natural images from a thermodynamics perspective is what it reveals about the nature of image patches that correspond to the low energy states. The patches with the absolute lowest energy states are those that are either all black or all white. However, a small number of patches have pixels in both states yet are considered local minima, since flipping any single pixel would increase the energy. Looking closer at these patches, the researchers found that many of them have distinct patterns, such as edges between dark and light regions.
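
The definition of a local minimum given above translates directly into a single-pixel-flip test. The sketch below assumes patches are stored as flat tuples of 0s and 1s and that any pattern missing from the energy table was never observed and therefore has infinite energy; both the helper `is_local_minimum` and the toy energy values are hypothetical.

```python
def is_local_minimum(patch, energies, default_energy=float("inf")):
    """Return True if flipping any single pixel strictly raises the energy.

    Assumption: patterns absent from `energies` get `default_energy`.
    """
    base = energies.get(patch, default_energy)
    for i in range(len(patch)):
        flipped = patch[:i] + (1 - patch[i],) + patch[i + 1:]
        if energies.get(flipped, default_energy) <= base:
            return False
    return True

# Toy energies for 2 x 2 patches flattened row by row
energies = {
    (0, 0, 0, 0): 0.5,  # all black
    (1, 1, 1, 1): 0.6,  # all white
    (0, 0, 1, 1): 2.0,  # horizontal edge: black row above a white row
    (1, 0, 1, 1): 3.0,
    (0, 1, 1, 1): 3.0,
    (0, 0, 0, 1): 3.0,
    (0, 0, 1, 0): 3.0,
}
edge_is_minimum = is_local_minimum((0, 0, 1, 1), energies)
noisy_is_minimum = is_local_minimum((0, 0, 0, 1), energies)
```

In this toy table the edge pattern counts as a local minimum, while the pattern with one stray white pixel does not, because flipping one of its pixels leads to the lower-energy edge.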

The researchers speculate that the importance of these local minima in natural images may help us and other creatures "see" our surroundings, even when our eyes don't absorb every pixel. The visual system may build neurons that are tuned to these "basins of attraction." In other words, these low-energy patches may assist the brain in filling in the details using some kind of error-correcting code based on the thermodynamics of the visual world.


More information: Greg J. Stephens, et al. "Statistical Thermodynamics of Natural Images." PRL 110, 018701 (2013). DOI: 10.1103/PhysRevLett.110.018701

Journal reference: Physical Review Letters



3.5 / 5 (4) Feb 13, 2013
Something doesn't seem right about the math here.
The first picture they create is binary. Then they average neighboring pixels creating a half tone picture. Yes, this decreases resolution, but it increases the alphabet (black, white AND 3 additional possible half tones). The two processes counteract each other when calculating the entropy measure (and this repeats with each further averaging run)

So that should hold for all possible images - not just 'natural' ones.

Scale invariance is also not shown via that kind of operation they describe (averaging and using the same kernels). It is shown by taking larger kernels and calculating the measures on those kernels.
not rated yet Feb 13, 2013
This is an exercise in perception subjective to the observer. I see structural information in the binary image that is missing in the gray scale image.
not rated yet Feb 14, 2013
Yes, that is correct. The local maxima and minima are actually referred to by name as focal points, and the theory of how to identify the local maxima and minima using information theory is analogous, and possibly isomorphic in application, to the visual hierarchy and focal patterns postulated by http://en.wikiped..._Arnheim in his essays on the subject and numerous books. I recommend "The Power of the Center", "Art and Visual Perception: A Psychology of the Creative Eye", and "Entropy and Art".

Also, I recommend using the Photoshop "Unsharp Mask" and "Gaussian Blur" commands to develop a histogram of the detail density. In effect, these two commands used correctly can create a detail density filter that acts like a tuner in a radio to select and quantify only those areas of a specific detail density. Running the image processing recursively can give a map of second-order detail density differences. In other words, it will highlight the local maxima and minima the eye will be pulled to.
not rated yet Feb 14, 2013
Good work. Can you check for second-order structures in the arrangement and shape of local maxima and minima? A cube is much easier to identify and distinguish from a pyramid when looking only at the edges (edges are easily identified as information maxima on a line), and the edges of cubes or pyramids, plus the connecting points, are all local information maxima. The eye in general is attracted to information maxima and minima unconsciously during saccades, according to the gestalt theory of human perception.
not rated yet Feb 20, 2013
Edge detection is what the visual system does anyway, IIRC earlier science on the retina and inwards processing.

But the generic sense structure (should turn up in related senses such as hearing) is interesting, assuming they haven't messed up the math, because it could map naturally to the generic neuron structure.

To have signals go far in neural structures, the brain seems to have evolved a microstructure close to criticality. Then signals won't amplify too much (the worst case may be epileptic oscillations) or too little (signals die out) but can connect over long distances. This has been tested in silico and in vitro (but I haven't the ref handy).

Whether this is coincidence or not, it should probably help mapping one to the other.
2.5 / 5 (2) Feb 20, 2013
Edge detection is what the visual system does anyway, IIRC earlier science on the retina and inwards processing.

It does. But the result of that edge detection is not fed directly to the brain. (OK, OK, I know the retina is technically part of the brain; to be more specific: the visual cortex.)

There are various places right behind the retina where the edge detection results are bundled and act on common neurons in excitatory and inhibitory fashion.
This in turn gives rise to the ability to see in a 'scale invariant' manner (the more bundling steps in the hierarchy, the larger the scale). The apostrophes are there because it's not completely scale invariant.

This helps keep the information volume that needs to be processed in the brain small(ish). It's a bit like wavelet compression in a way.
