How to Measure What We Don't Know

Sep 10, 2009, by Lisa Zyga (feature)

How do we discover new things? For scientists, observation and measurement are the main ways to extract information from Nature. Based on observations, scientists build models that, in turn, are used to make predictions about the future or the past. To the extent that the predictions are successful, scientists conclude that their models capture Nature's organization. However, Nature does not reveal her secrets easily: there is no way for observers to learn everything about a process, so some information always remains hidden from view, while other kinds of information are present but difficult to extract. In a recent study, researchers have investigated how to measure the degree of hidden information in a process (its "crypticity") and, along the way, solved several puzzles involved in extracting, storing, and communicating information.

In their study, James Crutchfield, Physics Professor at the University of California at Davis, and graduate students Christopher Ellison and John Mahoney, have developed the analogy of scientists as cryptologists who are trying to glean hidden information from Nature. As they explain, “Nature speaks for herself only through the data she willingly gives up.” To build good models, scientists must use the correct “codebook” in order to decrypt the information hidden in observations and so decode the structure embedded in Nature’s processes.

In their recent work, the researchers adopt a thorough-going informational view: All of Nature is a communication channel that transmits the past to the future by storing information in the present. The information that the past and future share can be quantified using the “excess entropy” - the mutual information between the past and the future.
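Mutual information has a standard information-theoretic definition, and the idea behind excess entropy can be sketched in a few lines of Python. This is only an illustration using a crude one-step plug-in estimator, not the authors' full past/future construction:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired samples:
    sum over p(x,y) * log2( p(x,y) / (p(x) * p(y)) )."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c * n) / (px[a] * py[b]))
               for (a, b), c in pxy.items())

# One-step proxy for the excess entropy: how much the immediate past
# tells us about the immediate future.
seq = [0, 1] * 5000                 # a perfectly periodic process
mi = mutual_information(seq[:-1], seq[1:])
print(f"{mi:.4f} bits")             # close to 1 bit: the past determines the future
```

For the periodic process the estimate comes out very close to 1 bit, since seeing one symbol fully determines the next; for an independent, identically distributed process it would be near zero.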

Since the present mediates between the past and future, it is natural to think that the excess entropy must somehow be stored in the present, the researchers explain. And while this is true, the researchers showed that, somewhat surprisingly, the present typically contains much more information than just the excess entropy. The information stored in the present is known as the “statistical complexity.” The more information Nature must store to turn her noble gears, the more structured her behavior.

The information that manages to go unaccounted for - the difference between the stored information (statistical complexity) and the observed information (excess entropy) - is the “crypticity”. It captures a new and under-appreciated complexity of a process, something that goes above and beyond what is directly measured in observations. At a more general level, the researchers provide an explicit way to understand the difference between simply making predictions from data versus modeling the process’s underlying structure.
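In symbols, the relationship described above is simply a difference of the two quantities, which a trivial sketch makes concrete (the numeric values below are made up for illustration):

```python
def crypticity(statistical_complexity, excess_entropy):
    """Crypticity = stored information (statistical complexity, in bits)
    minus observed information (excess entropy, in bits)."""
    return statistical_complexity - excess_entropy

# A process that stores 1 bit internally while transmitting almost
# none of it to observations is almost entirely cryptic.
chi = crypticity(1.0, 0.02)
print(f"crypticity = {chi:.2f} bits")
```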

“The results are at the crossroads of several research threads, from causal inference to new forms of computing,” Crutchfield said. “But here are a couple of things we highlight: One can look at all of nature as a communication channel: Nature communicates the past to the future, by storing information in the present. In addition, information about how a system is structured can be available in observations, but very hard to extract. Crypticity measures the degree of that difficulty. Even in equilibrium there are temporal asymmetries.”

Although excess entropy, statistical complexity, and crypticity are straightforward to define, calculating them directly has been a long-standing puzzle. Crutchfield, Ellison, and Mahoney developed a novel approach to the problem. The process, interpreted as a communication channel, is scanned in both the forward and reverse time directions to create models for prediction and retrodiction. By analyzing the relationship between predicting and retrodicting, they were able to uncover not only the external, time-symmetric information (excess entropy), but also the internal, asymmetric information (statistical complexity and crypticity). By looking inside Nature's communication channel, they discovered a rather non-intuitive asymmetry: even processes in equilibrium commonly harbor temporally asymmetric structures.

“The basic idea is that a process can appear to not transmit much information from its past to its future, but still require a large amount of hardware to keep the internal machine going,” Crutchfield said. “For example, imagine that you have two coins: Coin A is a fair coin and Coin B is slightly biased. Now the output of this process is a series of heads and tails. That's all the observer gets to see. The observer doesn't know when A is used or B is used. To an observer this process is very close to a fair coin - the heads and tails from B just don't differ much in their statistics from the heads and tails from A. So, the observed process has little mutual information (the heads and tails are pretty much independent of the past). That is, the process has very low excess entropy. Nonetheless, there is one bit of internal stored information: Which coin, A or B, is flipped at each step? You can take this example to an extreme where you have hundreds of internal coins, all slightly biased, all slightly different in their bias, and therefore distinct coins. The large number of coins gives you an arbitrarily large statistical complexity. But the small biases mean the excess is as close to zero as you like.”
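The coin example can be simulated. The sketch below assumes one particular reading of it, in which the hidden machine alternates deterministically between the two coins; the paper's exact construction may differ, but the qualitative point survives: the observed stream carries almost no information from past to future, while the machine still stores a full bit of internal state:

```python
import math
import random
from collections import Counter

def alternating_coins(n, bias=0.05, seed=1):
    """Hypothetical version of the example: the hidden machine alternates
    between Coin A (fair) and Coin B (biased by `bias`); the observer sees
    only the resulting stream of heads (1) and tails (0)."""
    rng = random.Random(seed)
    return [1 if rng.random() < (0.5 if t % 2 == 0 else 0.5 + bias) else 0
            for t in range(n)]

def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c * n) / (px[a] * py[b]))
               for (a, b), c in pxy.items())

seq = alternating_coins(200_000)
one_step_mi = mutual_information(seq[:-1], seq[1:])       # proxy for excess entropy
state_entropy = -sum(p * math.log2(p) for p in (0.5, 0.5))  # which coin? 1 bit

print(f"observed information (one-step MI): {one_step_mi:.4f} bits")  # near 0
print(f"stored information (hidden coin):   {state_entropy:.1f} bit")
```

With hundreds of slightly different biased coins instead of two, the internal state entropy grows arbitrarily large while the observed mutual information stays near zero, which is the extreme version Crutchfield describes.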

These fundamental results should impact research across a wide range of disciplines, from statistical modeling to novel forms of computing. As the researchers explain, when a process contains hidden information, it cannot be represented directly using only raw measurement data. Rather, a model must be built to account for the degree of hidden information encrypted within the process’s observed behavior. Otherwise, analyzing a process only in terms of observed information overlooks the process’s structure, making it appear more random than it actually is.

“In statistical modeling, if you ignore a process's crypticity, you will conclude that nature is more random and less structured than she really is,” Crutchfield said. “We suspect that this general principle will be seen (or is even operating) in many scientific domains, from biosequence analysis to dark energy modeling.”

Intel, which partially funded this research, also has an interest in using the results to improve network performance. For many years, Intel has funded research on complex systems through the Network Dynamics Program, which Crutchfield runs.

“Intel's original interest, 10-plus years ago, was to stimulate research on the structure and dynamics of networks,” Crutchfield said. “This was an extremely successful program, in fact stimulating much progress in the early years that has now blossomed into the field of network science. The present work adds to that growing body of understanding of how complex systems are organized and how we should model and characterize them. In particular, crypticity and causal irreversibility suggest new metrics for network performance.”

More information: James P. Crutchfield, Christopher J. Ellison, and John R. Mahoney. “Time's Barbed Arrow: Irreversibility, Crypticity, and Stored Information.” Physical Review Letters 103, 094101 (2009).



User comments: 11


5 / 5 (2) Sep 11, 2009
The author says "to build *good models*...", and there's the rub. A "model" is something that represents some phenomenon accurately in salient aspects. It is never as complex as the phenomenon, for if it were, it would not be amenable to the sort of mathematical and computational analysis that science lends. And conversely, though it represents some salient aspects, it does not represent *all* aspects. So we are really being told that an approximation is not exact, which is an obvious tautology >shrugs
not rated yet Sep 11, 2009
"Nature as a communications channel" is an interesting idea... but is it a good model for reality? It is more a model for our exploration of nature. As such, it suggests a method for measuring what we don't know about our MODELS of reality.... not reality itself.
Remember... science does not discover reality, only helps us model our interaction with it. It is easy to fall into the same trap many religions fall into... we begin to believe we comprehend the incomprehensible because we can "make an icon" that represents it for us.
The model/icon is useful in how it helps us navigate this thing called life... it is not life.
1 / 5 (1) Sep 11, 2009
It *is* an interesting paper, even though it is silent on QM and more realistic models of experimentation. (Such as varying parameters or isolating unobservable nodes).

Though the article oversells the paper. There is nothing new about effective theories, or using noise distributions for unknown effects, or having unobservable nodes in a system analysis of IC. But experimenting as per above is a means of uncovering this missing information to any desirable degree, and when we have a complete QM description it promises us "the truth, the whole truth and nothing but the truth" (no local hidden variables).

I think it may be true in this classical setting as well. IIRC there is a theorem in category theory which shows that one can map mathematical objects fully. As I understand it, by throwing all possible data at it one can extract its function. Here it would be to do experiments, data and parameters, not merely passively observe what the system does.
2.5 / 5 (2) Sep 11, 2009

We don't have to be satisfied with not "learn[ing] everything".

Btw, perhaps evolution figured this out early. IIRC there was this description of neuroscience on rats, where they have figured out that rats (and presumably other animals) simulate the situation forwards but also backwards when they do things. I.e. they figure out "how will I get out of this mess" but also "how did I get into this mess" while it happens, precisely as the paper describes how to extract maximum information from passive observation. (Of course, animals tend to act as well, make choices and all that jazz that the paper doesn't cover.)
not rated yet Sep 11, 2009

It is never as complex as the phenomenon, for if it were, it would not be amenable to the sort of mathematical and computational analysis that science lends.

I don't think that follows. Nature seems to be algorithmic, as our theories are.

Reasonably, a phenomenon is emergent when our theories of it show emergence, and so on. That takes care of spatial computing resources, in principle.

Similarly, if a phenomenon were NP, say, it wouldn't converge on a result in reasonable time. But we can observe that it does. That takes care of temporal computing resources, in principle.

Then anything nature can do, in principle we can do as well.

[Conversely, if we can't say that an algorithm is undecidable, nature can't either. When you walk into anthropic principle land, where you can claim that _all_ of observed physics is stable because you need it to be.]
4.9 / 5 (45) Sep 12, 2009
A "model" is something that represents some phenomenon accurately in salient aspects. It is never as complex as the phenomenon, for if it were, it would not be amenable to the sort of mathematical and computational analysis that science lends. And conversely, though it represents some salient aspects, it does not represent *all* aspects

Actually Tjorn, I think smoke&mirrors is quite correct here. Our models of reality are not Reality itself because, for starters, the form is different. We encode reality within pre-existing conceptual forms, and therefore delimit phenomenal reality, which includes subject-dependent paradigms,.. and Reality proper, that is, reality as it is apart from being conceptualized. In other words, there can never be a complete one-to-one correlation between models and Reality. It appears that the authors presume that there can be in principle, at the start, but I could be missing the point.
1 / 5 (1) Sep 12, 2009
For me the process of understanding Nature is pretty close to solving a detective story: we are required to collect many subtle indicia to get an emergent picture of deeper reality. In a certain sense it's a process of spontaneous phase transition, i.e. a condensation of ideas.
5 / 5 (1) Sep 12, 2009
it's obvious. with a ruler of unknown and unknowable length.
not rated yet Sep 13, 2009
Noumenon, I don't think they're implying we can have complete correlation between model and reality, rather that we can measure how uncorrelated they are, the crypticity of a complex system. We can know how much we don't know, which indeed is useful in designing large networks and such. Otherwise, it's old philosophy, as you no doubt can tell.

Computer science seems to be the future of science in general, what with guys like the above and Wolfram's ideas. We are no longer limited by our natural senses, but by how much data we can compute and how fast. Pure reason is becoming our primary means of interacting with reality.

But pure reason has been thoroughly proven insufficient for some time. It's just a matter of time till a computer scientist comes up with a theorem, if they haven't already, that sets an asymptotic limit to computer modelling. To compute all the data in the universe, wouldn't you need another universe as memory?
not rated yet Sep 13, 2009
Then again, what if instead of our models approaching reality asymptotically, it's the other way around? What if there is no actual hardware, but just pure information, and not just in the philosophical sense? An infinite sea of continuous memory, which can be read/written as an act of will. Maybe Nietzsche was on to something...
not rated yet Sep 15, 2009
Then again, what if instead of our models approaching reality asymptotically, it's the other way around? What if there is no actual hardware, but just pure information, and not just in the philosophical sense? An infinite sea of continuous memory, which can be read/written as an act of will. Maybe Nietzsche was on to something...

This is a central tenet of the Information model of the Universe. It's an interesting read but very subjective in some cases.