Linux Evolution Reveals Origins of Curious Mathematical Phenomenon

Dec 01, 2008, by Lisa Zyga
When the Zipf curve is plotted on a log-log scale, it appears as a straight line with a slope of -1. This graph shows that four Debian Linux releases each follow Zipf’s law: Woody (orange), Sarge (green), Etch (blue) and Lenny (black). Credit: T. Maillart, et al.

(PhysOrg.com) -- Zipf’s law is a testament to the order in our world, showing that the same patterns emerge in a wide variety of situations. The linguist George Kingsley Zipf first proposed the law in 1949, when he noticed that the distribution of words in a newspaper, book, or other literary article always followed the same pattern.

Zipf counted how many times each word appeared, and found that the probability of the occurrence of words starts high and tapers off. Specifically, the most frequent word occurs about twice as often as the second most frequent word, which occurs about twice as often as the fourth most frequent word, and so on. Mathematically, this means that the frequency of any word is inversely proportional to its rank. When the Zipf curve is plotted on a log-log scale, it appears as a straight line with a slope of -1.
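To make the rank-frequency relationship concrete, here is a minimal Python sketch. It is an illustration only, not the method used in the study: the function names and the idealized frequencies (a top count of 60,000 over 1,000 ranks) are assumptions chosen for the example. It builds a perfect 1/rank distribution, checks the "twice as often" ratios, and fits the slope of the log-log line by ordinary least squares.

```python
import math

def zipf_frequencies(n_ranks, top_frequency=1.0):
    """Ideal Zipf frequencies: the item of rank r has frequency top_frequency / r."""
    return [top_frequency / r for r in range(1, n_ranks + 1)]

def loglog_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(r) for r in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

# Hypothetical counts: the most frequent word appears 60,000 times.
freqs = zipf_frequencies(1000, top_frequency=60000)

ratio_1_2 = freqs[0] / freqs[1]   # rank 1 versus rank 2
ratio_2_4 = freqs[1] / freqs[3]   # rank 2 versus rank 4
slope = loglog_slope(freqs)       # slope of the straight line on log-log axes

print(ratio_1_2, ratio_2_4, slope)
```

Real word counts are noisy, so a fit on actual text would only approximate the slope that this idealized case reproduces.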

Since Zipf’s discovery, researchers have found that the power law describes many other natural and human phenomena, including the distribution of cities ranked by their population, the distribution of corporate wealth, and Internet traffic characteristics.

When analyzing systems that follow Zipf’s law, researchers usually assume certain mechanisms to be responsible for this patterned behavior. However, no one has ever empirically demonstrated that these assumed mechanisms are indeed the origin of Zipf’s law.

Now, a team of researchers from ETH Zürich (the Swiss Federal Institute of Technology Zürich) in Switzerland has confirmed that these assumed mechanisms – such as scale-free, proportional growth rates – are at the origin of Zipf’s law. The researchers used four orders of magnitude of data detailing the evolution of open source software applications created for a Linux operating system to confirm the assumption.

The team studied Debian Linux, a free operating system continuously being developed by more than 1,000 volunteers from around the world. Developers create software packages, such as text editors or music players, that are added to the system. Beginning with 474 packages in 1996, Debian Linux has expanded to include more than 18,000 packages today. The packages form an intricate network, with some packages having greater connectivity than others, as defined by how many other packages depend on a given package.

“Open source offers a unique opportunity provided by the high completeness of data concerning open source (thanks to the disclosure policy of the open source terms of license),” lead author Thomas Maillart of ETH Zürich told PhysOrg.com. “Debian Linux allowed us to retrieve exhaustive information from several years ago. Many other complex systems are not so well ‘documented.’”

As the researchers explain, the Linux network is constantly changing: new packages enter, some disappear, and others gain or lose connectivity. Yet throughout the 12 years, the distribution of packages, as ranked by their number of incoming links from other packages, has followed Zipf’s law, with a few very popular packages having much greater connectivity than most.
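The ranking described here can be sketched in a few lines of Python. The package names and the dependency map below are invented for illustration and are not taken from the Debian data; the sketch only shows what "ranked by number of incoming links" means.

```python
from collections import Counter

# Hypothetical toy dependency map: package -> packages it depends on.
depends_on = {
    "editor": ["libc", "libgui"],
    "player": ["libc", "libgui", "libaudio"],
    "libgui": ["libc"],
    "libaudio": ["libc"],
    "libc": [],
}

def rank_by_incoming_links(depends_on):
    """Return (package, incoming_links) pairs, most depended-upon first."""
    # Start every known package at zero so unreferenced packages still appear.
    incoming = Counter({pkg: 0 for pkg in depends_on})
    for deps in depends_on.values():
        for dep in deps:
            incoming[dep] += 1  # one incoming link per package that depends on dep
    # Sort by descending link count, breaking ties alphabetically.
    return sorted(incoming.items(), key=lambda kv: (-kv[1], kv[0]))

ranking = rank_by_incoming_links(depends_on)
for rank, (pkg, links) in enumerate(ranking, start=1):
    print(rank, pkg, links)
```

In this toy network a single library collects most of the incoming links, which is the kind of skew the article describes on a much larger scale.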

While many previous models of Zipf’s law start with the assumption that the set of entities (e.g. packages) appeared at the same time, the Swiss researchers track the time evolution of package connectivity in the Linux network since 1996. This perspective enabled them to test for specific characteristics of the network’s growth that lead to the emergence of Zipf’s law.

Using the data, they showed that the growth rates of connectivities between packages are proportional to the degree of connectivity between packages. In addition, they showed empirically that the average growth of the total number of links to a given package over a time interval is proportional to that time interval. Further, the variability of the total number of links to a given package increases proportionally to the square-root of time, providing a crucial test of the mechanism of stochastic proportional growth of connectivity between packages. Altogether, these characteristics are responsible for the universal distribution pattern of Zipf’s law.
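A small simulation can illustrate the two scaling tests. This is a sketch under stated assumptions, not the researchers' analysis: it assumes a simple discrete-time proportional growth rule in which each step changes a package's link count by a random fraction of its current value, and the parameter values (`k0`, `r`, `sigma`, the checkpoints) are hypothetical. It compares the increments accumulated after 10 and after 40 steps.

```python
import random
import statistics

def simulate_increments(k0, r, sigma, checkpoints, n_paths, seed=0):
    """Simulate proportional growth and record k - k0 at each checkpoint step.

    Assumed rule (illustrative): k <- k + k * (r + sigma * noise) per step,
    where noise is a standard normal draw.
    """
    rng = random.Random(seed)
    last = max(checkpoints)
    increments = {n: [] for n in checkpoints}
    for _ in range(n_paths):
        k = k0
        for step in range(1, last + 1):
            k += k * (r + sigma * rng.gauss(0.0, 1.0))  # change proportional to current size
            if step in increments:
                increments[step].append(k - k0)
    return increments

inc = simulate_increments(k0=100.0, r=0.002, sigma=0.005,
                          checkpoints=(10, 40), n_paths=5000)

# Quadrupling the time interval should roughly quadruple the mean increment
# and roughly double its standard deviation.
mean_ratio = statistics.mean(inc[40]) / statistics.mean(inc[10])
std_ratio = statistics.stdev(inc[40]) / statistics.stdev(inc[10])
print(mean_ratio, std_ratio)
```

Because growth compounds, the ratios come out near 4 and 2 rather than exactly those values; the approximation is closest when the per-step drift and noise are small.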

“We show that the distribution of connectivity of new entrants is also a power law with an exponent much bigger than 1, confirming that the proportional growth mechanism is solely responsible for the Zipf's law,” Maillart said.

He explained that, while Linux data allowed the researchers to confirm the origins of Zipf’s law, their results bring up more questions.

“Linux Debian gave us the opportunity to verify the ‘proportional mechanism,’ thanks to an important dataset and a huge investigation potential,” Maillart said. “All changes (evolution) in open source software are freely available and therefore can be tracked in detail. However, model verification has brought one answer and many resulting questions we intend to give an answer to. We think particularly of mechanisms of success/failure of projects in relation with their management.

“Remember that we still do not clearly understand the reasons of the success of the open source, since it's free and based on altruist contributions by programmers,” he said. “Additionally, one can bet that further research in this direction (open source and proportional growth) may raise useful questions for other systems (cities, economy, etc.) that would bring new insights to explain their evolution.”

More information: T. Maillart, D. Sornette, S. Spaeth, and G. von Krogh. “Empirical Tests of Zipf’s Law Mechanism in Open Source Linux Distribution.” Physical Review Letters 101, 218701 (2008).

Copyright 2008 PhysOrg.com.
All rights reserved. This material may not be published, broadcast, rewritten or redistributed in whole or part without the express written permission of PhysOrg.com.

User comments : 16

brane
2.6 / 5 (5) Dec 01, 2008
agreed
mattytheory
3 / 5 (3) Dec 01, 2008
Had to look up Zipf's law. The first part of the article was interesting. But, I agree.. the last part was yawn.
FredG
3 / 5 (3) Dec 01, 2008
Would not this only be true if the links were random? However, since 2003, corporations have taken over Linux development and have full time paid engineers doing the development.

So the packages are not random.
fleem
2.6 / 5 (5) Dec 02, 2008
Fleem's law: 10% of all so-called science articles (and grant recipients) will attempt to make something blatantly obvious and mundane seem mysterious and complex.
SmartK8
3 / 5 (4) Dec 02, 2008
Fleem: Agreed. Those are hot candidates for the Ig Nobel Prize 2009. Good luck guys.
Going
3 / 5 (2) Dec 02, 2008
I wonder if Zipf’s law also applies to the number of species evolved over time in a given environment.
theophys
4 / 5 (4) Dec 02, 2008
Seems more like an ad for Linux than anything else. These guys just want funding to play around with programming. “further research in this direction...may raise useful questions for other systems (cities, economy, etc.) that would bring new insights to explain their evolution.”

What a bunch of ballux. If you want to know how economic theories or cities, pick up a history book.
Yoknapatawpha
4.3 / 5 (3) Dec 02, 2008
I quote:

the growth rates of connectivities between packages are proportional to the degree of connectivity between packages

AND

the variability of the total number of links to a given package increases proportionally to the square-root of time, providing a crucial test of the mechanism of stochastic proportional growth of connectivity between packages

AND

they showed empirically that the average growth rate of the total number of links to a given package over a time interval is proportional to that time interval.

NOW MY QUESTIONS IS:
How could this article NOT point out that it should be the root inverse proportion of the mean??

And those of you that were bashing this article... if you read this far, can you still not get it?

sheesh...
Yok

A_Paradox
3 / 5 (2) Dec 03, 2008
a/
Would not this only be true if the links were random? However, since 2003, corporations have taken over Linux development and have full time paid engineers doing the development.

So the packages are not random.


Maybe it depends what you mean by random. In the evolution of linux code packages there might be some "true" randomness to the origin of a particular idea or strategy but once it is instantiated its evolution may be largely determined by 'dialectical' interaction with the rest of its world, just like biological species, etc, but will be mostly unpredictable due to the non-linear progression of these recursive interactions.

b/ I am not sure what Yok is saying about "root inverse proportion of the mean", but I am neither mathematician nor scientist. I agree with Yok though that the bashers must be sleeping through their lives, not to be entranced by yet another demonstration of the amazing depths of emergent order manifest by evolutionary processes.

Mark
trimleyman
3 / 5 (4) Dec 03, 2008
linux is open source so unlike apple and microsoft stands on its merits alone. apple and microsoft spend billions advertising each other's failings, perceived or real. we who use linux understand that it is not perfect but is at least headed in the right direction and is still developed primarily by the community. a huge industry has built around the failings of microsoft's os in particular. this employs large numbers in the Silicon Valley and elsewhere.
kwilco
3 / 5 (1) Dec 06, 2008
Fleem's law: 10% of all so-called science articles (and grant recipients) will attempt to make something blatantly obvious and mundane seem mysterious and complex.


Although the article is about linux, the underlying natural law, "Zipfs law, a testament to the order in our world, showing that the same patterns emerge in a wide variety of situations," is potentially profound. Evidently, humans are driven by forces beyond our comprehension. Beyond psychology and into the physical world.
kwilco
5 / 5 (1) Dec 06, 2008
P.S. Check out Stephen Wolfram's cellular automata:

http://www.maa.or...ram.html
corymp
5 / 5 (1) Dec 07, 2008
this reminds me of the double slit experiment... if this document was read by one of those volunteers and they realized that they were contributing to this, would you think the whole pattern would change completely?
tigger
5 / 5 (1) Dec 07, 2008
Linux bites the big one... hey, let's recompile the kernel because we want to install a web cam. Uggghhhh, Linux fan boys LIKE the fact that they have to compile the kernel because they think it makes them software engineers... when the reality is they are generally social outcasts trying to elevate themselves through delusion.

Sigh... anyway, Zipf’s law, great, yeah.
Ashibayai
5 / 5 (1) Dec 07, 2008
Sounds like a case of e^x to me.
deatopmg
5 / 5 (1) Dec 07, 2008
Fleem's law: 10% of all so-called science articles (and grant recipients) will attempt to make something blatantly obvious and mundane seem mysterious and complex.


ONLY 10%!
