A frank discussion of the power law and linking correlation to causation

February 10, 2012 by Bob Yirka
An example power-law graph, used to illustrate a ranking by popularity. To the right is the long tail, and to the left are the few that dominate (the pattern behind the so-called 80-20 rule). Image: Wikipedia.

(PhysOrg.com) -- Michael Stumpf, a mathematics professor at Imperial College London, and Mason Porter, a lecturer at Oxford, have teamed up to write a perspective piece in Science on the inexact practice of applying power laws in areas of science where it is not easy to establish a direct link between correlation and causation, a key problem, they say, in much of the science conducted today.

A power law is a relationship between two quantities in which one varies as a fixed power of the other, so that a change in one produces a predictable change in the other. In their perspective the authors describe the two quantities as related via an exponent. One example is the surface area of a sphere, which is fixed by its radius: the area is proportional to the square of the radius (A = 4πr²), so doubling the radius always quadruples the area. Such a relationship is scale-invariant: no matter how large or small the sphere is, its surface area can be calculated with the same formula. It is also meaningful, in that the formula can actually be used to calculate the surface area of any sphere.
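A minimal Python sketch of the sphere example, assuming only the standard formula A = 4πr². It checks that the ratio A(2r)/A(r) is the same at every scale, which is what "scale-invariant" means here. The function name `sphere_area` is hypothetical, chosen for illustration.

```python
import math

def sphere_area(radius: float) -> float:
    """Surface area of a sphere: A = 4 * pi * r**2, a power law with exponent 2."""
    return 4 * math.pi * radius ** 2

# Doubling the radius multiplies the area by 2**2 = 4, whatever the starting size.
for r in (0.001, 1.0, 1000.0):
    ratio = sphere_area(2 * r) / sphere_area(r)
    print(f"r = {r:>8}: A(2r)/A(r) = {ratio:.6f}")
```

The same factor of 4 appears for a sphere a millimetre across and one a kilometre across, which is why one formula serves at every size.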

As their main example, the authors use the relationship between an organism's metabolic rate and its body size, which biologists describe using allometric scaling, a power law. Research has shown that such a formula lets scientists estimate one quantity from a measurement of the other, regardless of how large the organism is, which is clearly useful when studying virtually any organism.
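Allometric scaling is commonly written as B = b0 · M^b. The sketch below assumes the exponent b = 3/4 often quoted for metabolic rate (Kleiber's law), mass in grams, and an arbitrary, hypothetical normalization constant `b0`; none of these values comes from the article. It shows that the predicted ratio between two organisms depends only on their mass ratio, not on their absolute size.

```python
def metabolic_rate(mass: float, b0: float = 1.0, exponent: float = 0.75) -> float:
    """Allometric power law B = b0 * M**exponent (b0 and exponent are assumed values)."""
    return b0 * mass ** exponent

# A 16-fold difference in mass predicts a 16**0.75 = 8-fold difference in rate,
# whether we compare a 1 g and 16 g animal or a 100 kg and 1600 kg one (masses in grams).
small_pair = metabolic_rate(16.0) / metabolic_rate(1.0)
large_pair = metabolic_rate(1_600_000.0) / metabolic_rate(100_000.0)
print(small_pair, large_pair)
```

Both ratios come out at approximately 8; the choice of `b0` cancels in every ratio, which is why it can be left arbitrary here.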

However, the authors point out that not all areas of science are so well behaved, which leads to all manner of assumptions about outcomes that may or may not be true. One prime example is when researchers collect data points and find they can draw a line through them, which suggests a correlation. Unfortunately, other lines or curves could often have been drawn just as easily, some indicating no correlation at all. The point is that where a power law cannot be shown to apply, researchers are often left making educated guesses, relying on little more than intuitive leaps based on past experience. This is especially problematic when it is practically impossible to obtain a reasonably large sample.

They also point out that power laws are often applied in research in ways that don't actually make sense. For example, a line drawn through a set of data points may show a correlation, yet it matters little if that line offers no real insight into what is being demonstrated.

Because of such scenarios, the authors conclude that the scientific community, researchers and readers of scientific papers alike, would do well to apply some skepticism to such claims when they are made.


More information: Critical Truths About Power Laws, Science 10 February 2012: Vol. 335 no. 6069 pp. 665-666. DOI: 10.1126/science.1216142

The ability to summarize observations using explanatory and predictive theories is the greatest strength of modern science. A theoretical framework is perceived as particularly successful if it can explain very disparate facts. The observation that some apparently complex phenomena can exhibit startling similarities to dynamics generated with simple mathematical models (1) has led to empirical searches for fundamental laws by inspecting data for qualitative agreement with the behavior of such models. A striking feature that has attracted considerable attention is the apparent ubiquity of power-law relationships in empirical data. However, although power laws have been reported in areas ranging from finance and molecular biology to geophysics and the Internet, the data are typically insufficient and the mechanistic insights are almost always too limited for the identification of power-law behavior to be scientifically useful (see the figure). Indeed, even most statistically “successful” calculations of power laws offer little more than anecdotal value.
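To make the point about insufficient data concrete, the sketch below uses a standard estimator rather than anything from the authors' paper: for a continuous power law with density proportional to x^(-α) for x ≥ x_min, the maximum-likelihood estimate of the exponent is α̂ = 1 + n / Σ ln(x_i / x_min). It draws synthetic samples with an assumed α = 2.5 and x_min = 1; the helper names `sample_power_law` and `mle_alpha` are hypothetical. Repeated estimates from 10 points scatter widely, while estimates from 20,000 points cluster tightly around the true value.

```python
import math
import random

def sample_power_law(n, alpha, x_min, rng):
    """Draw n values from a continuous power law by inverse-transform sampling."""
    # CDF: F(x) = 1 - (x / x_min)**(1 - alpha), so x = x_min * (1 - u)**(-1 / (alpha - 1)).
    return [x_min * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

def mle_alpha(data, x_min):
    """Maximum-likelihood exponent for a continuous power law above x_min."""
    tail = [x for x in data if x >= x_min]
    return 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)

rng = random.Random(42)  # fixed seed so the run is reproducible
true_alpha, x_min = 2.5, 1.0

for n in (10, 20_000):
    estimates = [mle_alpha(sample_power_law(n, true_alpha, x_min, rng), x_min)
                 for _ in range(5)]
    print(n, [round(a, 2) for a in estimates])
```

Note that this only estimates an exponent assuming the data follow a power law; it says nothing about whether a power law is the right model, which is the deeper problem the authors raise.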




1 / 5 (1) Feb 10, 2012
I don't agree. I once developed software for use by municipalities to record, and base decisions on, data collected at bathing beaches regarding coliform presence in the water. The geometric mean of the combined sample counts, which is a 'power formula', more than adequately did the job of casting out singularly unique deviations to come up with an accurate depiction of the general state of the water at the beach. It turned out to be invaluable.
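For readers unfamiliar with the term: the geometric mean of n counts is the n-th root of their product, equivalently the exponential of the mean of their logarithms, which damps the pull of one very large reading. A minimal sketch with made-up counts (not data from the commenter's software), using Python's standard `statistics` module:

```python
import math
from statistics import fmean, geometric_mean

# Hypothetical coliform counts from four samples at one beach; one is an outlier.
counts = [10, 10, 10, 1000]

arith = fmean(counts)           # pulled strongly toward the outlier
geo = geometric_mean(counts)    # (10 * 10 * 10 * 1000) ** (1/4)
geo_by_logs = math.exp(fmean(math.log(c) for c in counts))  # same value via logs

print(f"arithmetic mean: {arith:.1f}")
print(f"geometric mean:  {geo:.1f}")
print(f"via logs:        {geo_by_logs:.1f}")
```

Here the arithmetic mean is 257.5 while the geometric mean is about 31.6, so the single high reading moves the summary far less.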
1 / 5 (1) Feb 10, 2012
Ahh, good, mainstream math interest following Taleb's 2001 Fooled by Randomness: The Hidden Role of Chance in Life and in the Markets
3 / 5 (4) Feb 10, 2012
It can be frightening, the level of "understanding" of individuals doing important work for municipalities, if what baudrunner says is true. baudrunner claims to have "developed software" for assessing coliform contamination of shore water, and describes the software as using "geometric mean calculations" of samples, which, baudrunner says, is a "power formula". A "formula" is not a "law". A "law" describes a relationship between two variables such that, as one varies, so does the other. If baudrunner had developed a formula relating coliform concentration to temperature, or one predicting it from concentrations of other bacteria, that would be a "law" or "rule". What baudrunner did was simply produce software to take in values from various measurements, then calculate a picture of the shore at only one time. And baudrunner doesn't seem to realize this.
2 / 5 (4) Feb 10, 2012
The fact is, "concluding" a power-law relationship depends on a trick that those who don't understand the world won't realize is being played. Which means that, if "peer reviewers" are craven, it can be difficult to get the word out.
To "establish" a power law, y = x^k, you simply take the log of both sides to get log y = k log x, graph the logged data and show that they form a straight line. But logarithms "squash" data: log 10 is 1, log 100 is 2, log 1000 is 3. Almost any data can be made to look "linear" on a log-log graph! Then you perform a linear regression to get the formula of the straight line that best fits the data. But many think that, if a linear regression gives you a straight line, that means the data are linear! No, it doesn't. It merely gives you the straight line that comes closest to all the data. If the correlation coefficient is close to 1, the data are nearly linear! But how many "peer reviewers" call attention to that?
1 / 5 (2) Feb 10, 2012
If the mathematical trick of logarithms "squashing" data is the fraud used to "prove" power laws, it only mirrors the wholesale deceit in the "scientific" community.
Gary Wells claims to improve assembling eyewitness testimony by requiring people to stare for minutes at a single picture of a suspect, then move to another. The long process can cause people to see details missing from a quick viewing and so conclude, even if the picture is of the perpetrator, that they were wrong! Too, the tediousness of the technique can cause many to get tired and so just give up, not even trying after a few pictures. The result is that they don't identify anyone, including the actual perpetrator. Wells bills his scam as "reducing the number of false identifications". But not because it increases correct identifications! It reduces the rate of any identifications and, if you have fewer identifications, you have fewer incorrect ones!
1 / 5 (5) Feb 10, 2012
Or consider the swindle when some New Jersey politicos wanted to ban cell phone use in cars, so that insurance carrier crooks would have an "excuse" to deny coverage for accidents. They contracted some "scientist" liars to design an "experiment" that would "conclude" that cell phone use "causes accidents".
The "scientists" arbitrarily "defined" a cell phone as "causing an accident" if it was used within ten minutes of the accident occurring. As a result, you could take a message at a restaurant, get into your car, drive without using the cell phone and be plowed into by the drugged-out kid of a politician, and the cell phone would be fingered as the "cause" of the accident!
1 / 5 (1) Feb 11, 2012
Science uses the language of math.
'Boxed warnings' for math?
Hardly a deterrent for misuse.
5 / 5 (2) Feb 11, 2012
Julian your first post was ok but the others!!! Hoo boy!
rocky j squirrel
not rated yet Feb 11, 2012
Insufficient data points will lead to incorrect conclusions! Too often, when three data points fall in a line (on any plot format), it gets assumed that they portray a mathematical relationship. They may; however, there is an infinity of formulas (curves) that can be drawn through the three points. Of course, they won't be straight but wiggly.
In testing the burn rate versus pressure of solid rocket propellant, I found formulations that had three in-line points when tested at three standard pressures, but when tested at 15 to 20 points a ski-jump plot developed right over the standard three points (this is called plateau propellant). Of course, such data are of major importance to rocket builders.
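The point that infinitely many curves pass through the same three points can be made concrete with a hypothetical family of cubics (illustrative only, not the commenter's propellant data). Every member passes through (1, 2), (2, 4) and (3, 6); c = 0 is the straight line, and the members disagree everywhere else.

```python
def curve(x, c):
    """y = 2x + c(x-1)(x-2)(x-3): passes through (1,2), (2,4), (3,6) for EVERY c."""
    return 2 * x + c * (x - 1) * (x - 2) * (x - 3)

for c in (0, 1, -3, 10):
    at_data = [curve(x, c) for x in (1, 2, 3)]
    print(f"c = {c:>3}: y at x=1,2,3 -> {at_data}; prediction at x=4 -> {curve(4, c)}")
```

Three points alone cannot tell these curves apart; only measurements at additional points, like the 15 to 20 test pressures described above, can.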
1 / 5 (1) Feb 11, 2012
The problem of induction has been with us since Aristotle, was given a name by Karl Popper and the hazards of ignoring it demonstrated by N.N. Taleb.
1 / 5 (1) Feb 12, 2012
julianpenrod, your first post was not ok. Sampling regimens were implemented on a regular basis, and yes, all other things were considered in order to have a bathing beach flagged as unusable for swimming until further notice. The program did the job more than adequately. You see, I factored in reality. You probably don't know what I'm talking about.
