Linguists to re-think reason for short words

January 25, 2011 by Lin Edwards report

(PhysOrg.com) -- Linguists have thought for many years that the length of words is related to their frequency of use, with short words used more often than long ones. Now researchers in the US have shown that word length is more closely related to the amount of information the words carry than to their frequency of use.

A link between the length of words and how frequently they are used was first proposed in 1935 by George Kingsley Zipf, a Harvard University linguist and philologist. Zipf's idea was that people would tend to shorten words they used often, to save time in writing and speaking. The relationship seems intuitive, and it appears to apply to many languages, with short words such as “the”, “a”, “to”, “and”, “so” (and their equivalents in other languages) being frequently used.

Researchers at the Massachusetts Institute of Technology (MIT), led by Steven Piantadosi, tested the Zipf relationship by analysing word use in 11 European languages. They analyzed digitized texts for correlations between words by counting how often all pairs of words occurred in sequence. This information was then used to estimate the probability of words occurring after given previous words or sequences of words. They made the assumption that the more predictable a word is, the less information it conveys, and estimated the information content from information theory, which says the information content is proportional to the negative logarithm of the probability of a word occurring.
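The estimate described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: it conditions on a single preceding word, uses raw counts with no smoothing, and the function name `mean_information` is made up for the example. It computes the average of -log2 P(word | previous word) for each word, measured in bits.

```python
import math
from collections import Counter, defaultdict


def mean_information(tokens):
    """Average information content (bits) of each word given the previous word.

    Assumption for illustration: plain bigram counts, no smoothing.
    """
    # How often each word appears as a context (i.e. has a following word).
    context_counts = Counter(tokens[:-1])
    # How often each ordered pair of adjacent words occurs.
    bigram_counts = Counter(zip(tokens, tokens[1:]))

    total_bits = defaultdict(float)
    occurrences = Counter()
    for (prev, word), count in bigram_counts.items():
        p = count / context_counts[prev]           # estimated P(word | prev)
        total_bits[word] += count * -math.log2(p)  # surprisal, weighted by pair count
        occurrences[word] += count

    # Average surprisal over every occurrence of the word that has a context.
    return {w: total_bits[w] / occurrences[w] for w in total_bits}


info = mean_information("the cat sat on the mat".split())
for word in sorted(info, key=info.get):
    print(f"{word:>4}  length={len(word)}  bits={info[word]:.2f}")
```

On a real corpus, the per-word values could then be compared with word length, and separately with raw frequency, to see which relationship is stronger.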

Piantadosi said that if word length is directly related to information content, this would make the transmission of information through language more efficient and also make speech and written texts easier to understand. This is because shorter words, carrying less information, would be scattered through the speech, essentially “smoothing out” the information density and delivering the important information at a steady rate.

The studies suggest that short words are in fact the least informative and most predictable words rather than the most often used, and that word length is more closely related to the information they contain.

The paper is soon to be published in the Proceedings of the National Academy of Sciences (PNAS). Steven Piantadosi is in the PhD program in MIT’s Department of Brain and Cognitive Sciences.

More information: Piantadosi, S. T., et al. Proceedings of the National Academy of Sciences (2011). PNAS paper will appear online at http://dx.doi.org/ … s.1012551108

© 2010 PhysOrg.com

A_Paradox
Jan 25, 2011

Rank: not rated yet
I wonder how this relates to the need for a certain degree of redundancy in natural languages. Redundancy is required because speech often occurs in a noisy context. Noisy sound is unpredictable and can take various forms in that it may be pulses of sound which can have different durations and different dynamic variations, or it can be persistent sound which masks certain frequencies.

One example of simple redundancy is the prepositions [in, at, on, near, into, out, etc.], which can sometimes add essential clarity to utterances but often do not add much meaning. And in European languages, for example, there are 'necessary' agreements - embodied as inflections - between adjectives and the nouns they are qualifying. The success of English, with its relatively few such inflections [compared to Russian, for example], shows that much of this is just redundant. So why is it there?
that_guy
Jan 25, 2011

Rank: not rated yet
This article illustrates throwing out the baby with the bathwater. Scientists should be the most open to seeing grays instead of black and whites.

Shouldn't this scientist consider all possible explanations? Certainly compound words and words with suffixes and prefixes are going to be longer and convey more meaning through building blocks (Grandmother, defensive, flammable, chemical names).

The usage argument still applies, because less "meaningful" words are necessary to use more often in speech to give it context.

Sure better examples
or
I'm sure that there are better examples than this one

Longer words with more information have to be used less, because their specific meanings don't apply as often. I would use 'inflammable' less than 'to' because it doesn't apply in as many situations.

and... let's face it, we think of simple words as simple things. Car and automobile mean the same thing even though this study would consider one to have more meaning.
antialias
Jan 26, 2011

Rank: not rated yet
While the placements of the words might not convey information, many of the 'small' words denote relationships between the intrinsically information-carrying (i.e. long) words. Such modifiers can have a huge impact on the _type_ of information conveyed without changing the _amount_ of information conveyed
(e.g. "the tire is in the trunk" vs. "the tire is on the trunk")

The information-carrying capacity of words (especially the short ones) can't be judged only by probability of occurrence, since they are always embedded (quite literally) in a context.
A_Paradox
Jan 27, 2011

Rank: 5 / 5 (1)
@that_guy
and... let's face it, we think of simple words as simple things. Car and automobile mean the same thing even though this study would consider one to have more meaning.


Long ago, when I was a kid, the wonder of television revealed to my little British soul that Americans always said "automobile" when they meant "car". Nowadays I never hear the word automobile in an American movie or TV program; it is always "car".

Perhaps this change has something to do with the [apparent] fact that Americans also commonly use[d?] the word "car" when referring to railway carriages and trams [as in "Streetcar named Desire"]. The decline of railways as a medium for transporting people maybe opened the way for car to displace automobile as it has.
po6ert
Jan 30, 2011

Rank: not rated yet
Latin, a highly precise language, uses many longer words to make very subtle distinctions.
"De gustibus non disputandum est" covers a lot of territory in few words.
A_Paradox
Feb 01, 2011

Rank: not rated yet
po6ert,

The year I studied Latin [care of some aged soul we called "Yob"] I achieved an overall mark of 36%. Luckily nobody of importance to me thought that this was of any particular importance.

On-line translators this evening seem to imply that "est" in the quote should come before "disputandum". 36% notwithstanding, I think that makes better sense too ... :-)
frajo
Feb 01, 2011

Rank: not rated yet
De gustibus non disputandum est

On-line translators this evening seem to imply that "est" in the quote should come before "disputandum".
Online translators cannot be trusted for more than single words.
In fact, you have to realize that ancient times were essentially times of remembering the spoken word. No citation databases, no "Latin for dummies" books. Thus, the sound of the spoken word was of utmost importance. (The oldest European literature, Odyssey and Iliad, is completely written in hexameters.)
And now listen to yourself reciting first "de gustibus non disputandum est" and then "De gustibus non est disputandum". The first is a hexameter, sticking to the ear, while the second is of the kind you have forgotten before the next adage.
eryksun
Feb 06, 2011

Rank: not rated yet
It's RISC for human language instead of processor instruction sets.