America archives its billions of tweets

Jan 22, 2013 by Anne Renaut
Photo taken on October 8, 2012 shows the main reading room at the Library of Congress in Washington, DC. The Library of Congress, repository of the world's largest collection of books, has set for itself the enormous task of archiving Americans' billions of tweets.

The Library of Congress, repository of the world's largest collection of books, has set for itself the enormous task of archiving something less weighty and far more ephemeral—Americans' billions of tweets.

The venerable US institution is assembling all of the 400 million tweets sent by Americans each day, in the belief that each of the mini-messages reflects a small but important part of the national narrative.

"An element of our mission at the Library of Congress is to collect the story of America, and to acquire collections that will have research value," according to Gayle Osterberg, director of communications at the library.

The Library of Congress, located off the National Mall in Washington, houses millions of hard copy books and historic documents, and its online archives hold millions of additional works produced by Americans over more than two centuries.

Now it wants to be keeper of the nation's brief Internet messages as well: in April 2010 Twitter inked a deal with the Library, giving it access to tweets dating back to the company's inception in 2006.

Collecting the 140-character micro-missives, said Osterberg, is in keeping with the library's main goal "to collect the story of America and to acquire collections that will have research value."

One major challenge to the Library, however, is storing the messages from the popular social messaging site, which now number 170 billion. Twitter last month said the number of active users on the messaging platform has topped 200 million, most of whom are in the United States.

Tweets that have been deleted or that are locked will not be among those gathered by the Library of Congress.

Among the messages to be preserved for posterity are the first-ever tweets sent by one of the company's founders, Jack Dorsey.

Also saved for all time is a famous tweet sent by President Barack Obama after his historic November 2008 victory to claim the White House in his first term.

"We just made history. All of this happened because you gave your time, talent and passion. All of this happened because of you. Thanks," read the micro-message from the famously tech-savvy US president.

Unlike traditional bound books or even digital web pages, the real challenge of preserving tweets is keeping up with their number, which has continued to grow almost exponentially.

There were 140 million tweets sent each day in February 2011, but more than three times as many—about a half billion—by October 2012.

The Library of Congress's tweets are being stored by Gnip, Inc., a social media aggregation company headquartered in Boulder, Colorado, which has made more than 133,000 gigabytes of storage space available.

Gnip says it is a particular challenge to gather tweets during "peak" times, such as news events watched the world over, like the Japanese tsunami in March 2011, which generated many thousands of tweets per second.

It has proven to be a Herculean challenge for Gnip to make tweets accessible to all those who wish to view them.

So far it has been unable to meet the demands of researchers worldwide who hope to access the archive. Even a search among the first four years of tweets, from 2006 to 2010, could take about 24 hours.

"It is clear that technology to allow for scholarship access to large data sets is lagging behind technology for creating and distributing such data," said a recent white paper published by the Library of Congress.

"This is an inadequate situation," the Library concluded, calling the massive archiving project "prohibitively costly."

And yet Lee Humphreys, a professor of communication at Cornell University in New York, said that the brief online messages can reveal volumes "about the culture where they were produced."

User comments : 2

jonnyboy
3.4 / 5 (5) Jan 22, 2013
another huge waste of money.
dan42day
1 / 5 (1) Jan 23, 2013
"more than 133,000 gigabytes of storage space available."

Wow 133 terabytes! That's almost 45 Toshiba 3TB disk drives worth about $6000 altogether!
