Using digitized books as 'cultural genome,' researchers unveil quantitative approach to humanities
December 16, 2010 By Steve Bradt
Researchers have been tracking the frequency with which words appear in books, allowing scholars to quantify a wide variety of cultural and historical trends more precisely. Leading the four-year effort are Harvard's Jean-Baptiste Michel (foreground), a postdoctoral researcher in the Department of Psychology and Program for Evolutionary Dynamics, and Erez Lieberman Aiden, a junior fellow in Harvard’s Society of Fellows. Photo: Kris Snibbe
(PhysOrg.com) -- Researchers have created a powerful new approach to scholarship, using approximately 4 percent of all books ever published as a digital "fossil record" of human culture. By tracking the frequency with which words appear in books over time, scholars can now precisely quantify a wide variety of cultural and historical trends.
The four-year effort, led by Harvard University's Jean-Baptiste Michel and Erez Lieberman Aiden, is described this week in the journal Science.
The team, comprising researchers from Harvard, Google, Encyclopaedia Britannica, and the American Heritage Dictionary, has already used its approach -- dubbed "culturomics," by analogy with genomics -- to gain insight into topics as diverse as humanity's collective memory, the adoption of technology, the dynamics of fame, and the effects of censorship and propaganda.
"Interest in computational approaches to the humanities and social sciences dates to the 1950s," says Michel, a postdoctoral researcher based in Harvard's Department of Psychology and Program for Evolutionary Dynamics. "But attempts to introduce quantitative methods into the study of culture have been hampered by the lack of suitable data. We now have a massive dataset, available through an interface that is user-friendly and freely available to anyone."
Google will release a new online tool to accompany the paper: a simple interface that enables users to type in a word or phrase and immediately see how its usage frequency has changed over the past few centuries.
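The core measurement, how often a word appears relative to everything printed in a given year, can be illustrated with a minimal sketch. This is not the team's or Google's code: the function name `yearly_frequency`, the whitespace tokenization, and the three-sentence toy corpus are all assumptions made for illustration.

```python
from collections import Counter


def yearly_frequency(books, word):
    """Return {year: occurrences of `word` / total words printed that year}.

    `books` is an iterable of (year, text) pairs. Splitting on whitespace
    is a simplifying assumption, not the tokenization used for the real data.
    """
    hits = Counter()
    totals = Counter()
    target = word.lower()
    for year, text in books:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Skip years with no text to avoid dividing by zero.
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}


# Toy corpus, invented for illustration only.
toy_books = [
    (1900, "the telegraph and the telephone"),
    (1900, "a telephone rang"),
    (1950, "the television and the telephone and the radio"),
]

freq = yearly_frequency(toy_books, "telephone")
for year, value in freq.items():
    print(year, value)
```

Dividing by the total number of words printed each year matters because far more books are published in some years than in others; raw counts alone would mostly track the size of the corpus.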
"Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena in the social sciences and humanities," says Aiden, a junior fellow in Harvard's Society of Fellows and principal investigator of the Laboratory-at-Large, part of Harvard's School of Engineering and Applied Sciences. "While browsing this cultural record is fascinating for anyone interested in what's mattered to people over time, we hope that scholars of the humanities and social sciences will find this to be a useful and powerful tool."
This Google Books data set, which is available for download along with the Google Books Ngram Viewer, is a free quantitative tool made available to supplement humanities research worldwide. It is based on the full text of about 5.2 million books, with more than 500 billion words in total. About 72 percent of its text is in English, with smaller amounts in French, Spanish, German, Chinese, Russian, and Hebrew.
It is the largest data release in the history of the humanities, the authors note, a sequence of letters 1,000 times longer than the human genome. If written in a straight line, it would reach to the moon and back 10 times over.
"Now that a significant fraction of the world's books have been digitized, it's possible for computer-aided analysis to reveal undiscovered trends in history, culture, language, and thought," says Jon Orwant, engineering manager for Google Books.
The paper describes the development of this new approach and surveys a vast range of applications, focusing on the past two centuries. The team's findings include:
- Some 8,500 new words enter the English language annually, fueling a 70 percent growth of the lexicon between 1950 and 2000. But many of these million-plus words can't be found in dictionaries.
"We estimated that 52 percent of the English lexicon -- the majority of words used in English books -- consists of lexical 'dark matter' undocumented in standard references," the researchers write in Science.
- Humanity is forgetting its past faster with each passing year. The Harvard-Google team tracked the frequency with which each year from 1875 to 1975 appeared, finding that references to the past decrease much more rapidly now than in the 19th century. References to "1880" didn't fall by half until 1912 -- a lag of 32 years -- but references to "1973" reached half their peak just a decade later, in 1983.
- Innovations spread faster than ever. For instance, inventions from the end of the 19th century spread more than twice as fast as those from the early 1800s.
- Modern celebrities are younger and more famous than their 19th-century predecessors, but their fame is shorter-lived. Celebrities born in 1950 initially achieved fame at an average age of 29, compared to 43 for celebrities born in 1800. But their fame also disappears faster, with a "half-life" that is increasingly short.
"People are getting more famous than ever before," the researchers write, "but are being forgotten more rapidly than ever."
- The most famous actors tend to become famous earlier (around age 30) than the most famous writers (around age 40) and politicians (after age 50). But patience pays off: Top politicians end up much more famous than the best-known actors.
- Culturomics is a powerful tool for automatically identifying censorship and propaganda. For example, Jewish artist Marc Chagall was mentioned just once in the entire German corpus from 1936 to 1944, even as his prominence in English-language books grew roughly fivefold. Evidence of similar suppression is seen in Russian with regard to Leon Trotsky; in Chinese with regard to Tiananmen Square; and in the US with regard to the "Hollywood Ten," a group of entertainers blacklisted in 1947.
- "Freud" is more deeply engrained in our collective subconscious than "Galileo," "Darwin," or "Einstein."
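The "forgetting" finding above rests on a simple measurement: how long it takes mentions of a year to fall to half their peak. A minimal sketch of that calculation follows. The function `years_to_half_peak` is hypothetical, and the frequency values are invented, chosen only so that the result matches the 1973-to-1983 example quoted above; they are not drawn from the actual data set.

```python
def years_to_half_peak(series):
    """Given {year: frequency}, return (peak_year, half_year, lag).

    `half_year` is the first year after the peak in which the frequency
    is at or below half of the peak value. Returns None if the series
    never falls that far.
    """
    years = sorted(series)
    peak_year = max(years, key=lambda y: series[y])
    half = series[peak_year] / 2
    for year in years:
        if year > peak_year and series[year] <= half:
            return peak_year, year, year - peak_year
    return None


# Invented relative frequencies for mentions of "1973" (illustration only).
mentions_of_1973 = {
    1973: 1.00,
    1975: 0.85,
    1979: 0.65,
    1983: 0.50,
    1990: 0.35,
}

result = years_to_half_peak(mentions_of_1973)
print(result)
```

Run on a real frequency series for each year from 1875 to 1975, a calculation of this kind would give one lag per year, which is the quantity the researchers report as shrinking over time.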
Provided by
Harvard University