August 10, 2017

Unzipping Zipf's Law: Solution to a century-old linguistic problem

Did you know that in every language, the most frequent word occurs roughly twice as often as the second most frequent word? This phenomenon, called 'Zipf's law', is more than a century old, but until now scientists have not been able to explain it exactly. Sander Lestrade, a linguist at Radboud University in the Netherlands, proposes a new solution to this notorious problem in PLOS ONE.

Zipf's law describes how the frequency of a word in natural language depends on its rank in the frequency table. The most frequent word occurs twice as often as the second most frequent word, three times as often as the third most frequent word, and so on down to the least frequent word (see Figure 1). The law is named after the American linguist George Kingsley Zipf, who was the first to try to explain it, around 1935.
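The rank-frequency relation described above can be illustrated with a small Python sketch. It is not taken from Lestrade's paper: the function names `rank_frequency` and `zipf_expected` and the toy word list are assumptions made for illustration, and the word counts are constructed by hand so that they follow the idealised law exactly.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) tuples, most frequent word first."""
    ranked = Counter(tokens).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_expected(top_count, rank):
    """Idealised Zipf prediction: the count at a rank is top_count / rank."""
    return top_count / rank


# Hypothetical toy corpus, built by hand to match the ideal distribution.
tokens = ["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["to"] * 3

table = rank_frequency(tokens)
top_count = table[0][2]

for rank, word, count in table:
    # Print the observed count next to the idealised Zipf prediction.
    print(rank, word, count, zipf_expected(top_count, rank))
```

In real texts the observed counts only approximate the prediction, which is why the article speaks of a distribution that differs a little from the Zipfian ideal.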

Biggest mystery in computational linguistics

"I think it's safe to say that Zipf's law is the biggest mystery in computational linguistics," says Sander Lestrade, linguist at Radboud University in Nijmegen, the Netherlands. "In spite of decades of theorizing, its origins remain elusive." Lestrade now shows that Zipf's law can be explained by the interaction between the structure of sentences (syntax) and the meaning of words (semantics) in a text. Using computer simulations, he was able to show that neither syntax nor semantics suffices to induce a Zipfian distribution on its own, but that the two 'need' each other to produce it.

"In the English language, but also in Dutch, there are only three articles, and tens of thousands of nouns," Lestrade explains. "Since you use an article before almost every noun, articles occur way more often than nouns." But that is not enough to explain Zipf's law. "Within the nouns, you also find big differences. The word 'thing', for example, is much more common than 'submarine', and thus can be used more frequently. But in order to actually occur frequently, a word should not be too general either. If you multiply the differences in meaning within word classes with the need for every word class, you find a magnificent Zipfian distribution. And this distribution only differs a little from the Zipfian ideal, just like natural language does, as you can see in Figure 1."

Not only are predictions based on Lestrade's new model completely consistent with phenomena found in natural language, his theory also holds for almost every language in the world, not only for English or Dutch. Lestrade: "I am overjoyed with this finding, and I am convinced of my theory. Still, its confirmation must come from other linguists."

More information: Sander Lestrade et al. Unzipping Zipf's law, PLOS ONE (2017). DOI: 10.1371/journal.pone.0181987

Journal information: PLoS ONE

Provided by Radboud University

Citation: Unzipping Zipf's Law: Solution to a century-old linguistic problem (2017, August 10) retrieved 15 May 2024 from https://phys.org/news/2017-08-unzipping-zipf-law-solution-century-old.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
