Stanford helps to digitally preserve mountains of documents

Jun 15, 2010 By Cynthia Haven

Each year, the U.S. Government Printing Office publishes mountains of paper documents, everything from the Congressional Record to Government Accountability Office reports. But that's only a fraction of its output nowadays. More and more of its content is online-only, and that's a problem.

Cyberspace records are vulnerable to computer crashes; they're also vulnerable to tampering. This week Stanford has joined the effort to protect government documents electronically through LOCKSS (Lots of Copies Keep Stuff Safe), a Stanford-based international consortium of more than 200 university and college libraries that collects and preserves electronic content.

In the latest development, on Monday, June 14, the U.S. Government Printing Office (GPO) announced that it has entered the LOCKSS alliance.

The GPO is the federal government's resource for gathering, cataloging, producing, providing, authenticating and preserving published U.S. government information.

For the last 15 years, the GPO has been trying to centralize all its information online - culminating in last year's launch of the Federal Digital System (FDsys) to give the public a one-stop site for authentic, published government information.

But what happens if the government servers or databases crash? What if some Manchurian candidate or renegade government agency changes U.S. documents for its own nefarious (or even ostensibly benign) purposes? How does one save democracy from havoc-making hackers?

In the past, we relied on paper: 1,250 libraries nationwide participating in the Federal Depository Library Program. Congress created the program in 1813 to ensure that the American public had access to its government's information. As the decades passed, however, the amount of printed information kept growing. Libraries welcomed the chance to minimize bulky hard copies as the world went digital.

Potential limitations on transparency

When documents go digital, however, "it could severely limit transparency," said James Jacobs, government documents librarian for Stanford University Libraries, who is leading the LOCKSS-USDOCS Project.

"The more you centralize digital content, the easier it is to change things without anybody knowing. LOCKSS is a safety net. The simplicity and beauty of LOCKSS is that there are lots of libraries which preserve that content," he said.

"It's a transparency issue - libraries provide an added level of trust in access to government information."

It's not necessary to be imaginative to envision an administration tampering with data. According to Jacobs, "It has happened before. From the mid-1980s through the late 1990s, the American Library Association published an annual review of instances where the government didn't want citizens to know about something and consciously obscured the record. It happens more than we would like."

Less Access to Less Information By and About the U.S. Government is available online. On the site, Jacobs praises the work as an "amazing series … a chronology of efforts to restrict and privatize government information."

LOCKSS will prevent such "editing."

"Here's what LOCKSS does: If something happens with the GPO, if a server or database crashes, people can get the information from the library," said Jacobs. Does that mean that anybody can go into a LOCKSS participating library and access the LOCKSS government information? "Yes and no," said Jacobs. "In all practicality no, but in theory yes.

"It's a complete preservation archive - the content only gets made accessible if the live content goes away."

How LOCKSS works

In other words, if a person goes into the library and looks for a copy of a particular congressional hearing, the reader can look for a hard copy or digital copy first. If none is available, the library would release the LOCKSS version from its preservation archive and put it on a public site.

"We've never really had to test it at LOCKSS - but yes, you could definitely get it, in an hour or so," said Jacobs.

Eighteen participating LOCKSS libraries in the United States have signed up for the GPO program, with one Canadian library also interested.

"We've just started harvesting content from," said Jacobs. Eventually, as LOCKSS catches up, it will move to a routine where "every time a new document is published on GPO, we get alerted and automatically harvest it."

"It's a strange world we live in," Jacobs admitted. The world of cyberspace, websites and FDsys "is like a cloud. You can see it there, but it's amorphous, and you can't touch it. We're hoping to preserve the cloud - the dot-gov cloud."

It may be less enduring than commonly assumed: "The proselytizers of the Internet have done a very good job assuring everyone that the Internet means it's around forever," he said. Nevertheless, "There's no real consensus about how to preserve digital bits in the long term. The Internet is 25 years old, Google is 10 years old." The truth is, said Jacobs, that most websites disappear within a few months. A lot more become cyberspace corpses.

"LOCKSS explores the possibilities of long-term preservation," said Jacobs, including even a reconsideration of that humble medium, paper.

"Digital content is much more difficult to preserve. If I put a book on my shelf, it can stay there for 200 years," he said. "Paper takes a lot longer to disappear than digital bits."

For the time being, anyway, we live in a "hybrid era," said Jacobs. While digital publications provide quick online access, sometimes paper documents - well-indexed, easy to flip through, pointing readers to citations in other places - are simpler to use than digital records.

"Who wants to download and print a 500-page hearing?" he asked. Yet congressional hearings can run at least that long.

Clearly, long-term data preservation is even more complicated and nuanced than it looks. Jacobs cheerfully admitted, "I'm completely in the weeds, and it's kind of fun."

"This kind of thing is not all that sexy - but it's really important," he said. "Librarians have been doing it for a long time and we want to continue doing it."

Thanks to the new agreement, Stanford University will be doing it for some time to come.
