December 16, 2010


Using digitized books as 'cultural genome,' researchers unveil quantitative approach to humanities

Researchers have been tracking the frequency with which words appear in books, allowing scholars to quantify a wide variety of cultural and historical trends more precisely. Leading the four-year effort are Harvard's Jean-Baptiste Michel (foreground), a postdoctoral researcher in the Department of Psychology and Program for Evolutionary Dynamics, and Erez Lieberman Aiden, a junior fellow in Harvard’s Society of Fellows. Photo: Kris Snibbe

(PhysOrg.com) -- Researchers have created a powerful new approach to scholarship, using approximately 4 percent of all books ever published as a digital "fossil record" of human culture. By tracking the frequency with which words appear in books over time, scholars can now precisely quantify a wide variety of cultural and historical trends.

The four-year effort, led by Harvard University's Jean-Baptiste Michel and Erez Lieberman Aiden, is described this week in the journal Science.

The team, comprising researchers from Harvard, Google, Encyclopaedia Britannica, and the American Heritage Dictionary, has already used its approach -- dubbed "culturomics," by analogy with genomics -- to gain insight into topics as diverse as humanity's collective memory, the adoption of technology, the dynamics of fame, and the effects of censorship and propaganda.

"Interest in computational approaches to the humanities and social sciences dates to the 1950s," says Michel, a postdoctoral researcher based in Harvard's Department of Psychology and Program for Evolutionary Dynamics. "But attempts to introduce quantitative methods into the study of culture have been hampered by the lack of suitable data. We now have a massive dataset, available through an interface that is user-friendly and freely available to anyone."

Google will release a new online tool to accompany the paper: a simple interface that enables users to type in a word or phrase and immediately see how its usage frequency has changed over the past few centuries.

"Culturomics extends the boundaries of rigorous quantitative inquiry to a wide array of new phenomena in the social sciences and humanities," says Aiden, a junior fellow in Harvard's Society of Fellows and principal investigator of the Laboratory-at-Large, part of Harvard's School of Engineering and Applied Sciences. "While browsing this cultural record is fascinating for anyone interested in what's mattered to people over time, we hope that scholars of the humanities and social sciences will find this to be a useful and powerful tool."

The Google Books data set, which can be downloaded and explored through the Google Books Ngram Viewer, is a free quantitative resource intended to supplement humanities research worldwide. It is based on the full text of about 5.2 million books, with more than 500 billion words in total. About 72 percent of its text is in English, with smaller amounts in French, Spanish, German, Chinese, Russian, and Hebrew.
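
As a rough illustration of the kind of measurement involved, the sketch below computes how often a word appears per year, relative to all words published that year, in a tiny made-up corpus. The function name `yearly_frequency`, the toy data, and the whitespace tokenization are assumptions made for illustration only; they do not reflect the researchers' actual pipeline or the format of the Google Books data set.

```python
from collections import Counter

def yearly_frequency(books, word):
    """Return {year: occurrences of `word` / total words published that year}.

    `books` is an iterable of (year, text) pairs. Tokenization is a naive
    lowercase whitespace split, an assumption made to keep the sketch short.
    """
    hits = Counter()    # occurrences of the target word, per year
    totals = Counter()  # total number of tokens, per year
    target = word.lower()
    for year, text in books:
        tokens = text.lower().split()
        totals[year] += len(tokens)
        hits[year] += tokens.count(target)
    # Skip years with no text to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# Hypothetical toy corpus: (publication year, full text).
toy_books = [
    (1900, "the telephone rang and the telephone rang again"),
    (1900, "a quiet year"),
    (1950, "the radio played while the telephone sat silent"),
]

freq = yearly_frequency(toy_books, "telephone")
for year, value in freq.items():
    print(year, round(value, 4))
```

Dividing by the total number of words published in each year, rather than reporting raw counts, keeps years with many books comparable to years with few.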

It is the largest data release in the history of the humanities, the authors note, a sequence of letters 1,000 times longer than the human genome. If written in a straight line, it would reach to the moon and back 10 times over.

"Now that a significant fraction of the world's books have been digitized, it's possible for computer-aided analysis to reveal undiscovered trends in history, culture, language, and thought," says Jon Orwant, engineering manager for Google Books.

The paper describes the development of this new approach and surveys a vast range of applications, focusing on the past two centuries. The team's findings include:

Provided by Harvard University
