'Digital dark age' may doom some data

Jerome P. McDonough says an unintended consequence of our rapidly digitizing world is the potential of a "digital dark age." Photo by L. Brian Stauffer, U. of I. News Bureau

What stands a better chance of surviving 50 years from now, a framed photograph or a 10-megabyte digital photo file on your computer's hard drive?

The framed photograph will inevitably fade and yellow over time, but the digital photo file may be unreadable to future computers – an unintended consequence of our rapidly digitizing world that may ultimately lead to a "digital dark age," says Jerome P. McDonough, assistant professor in the Graduate School of Library and Information Science at the University of Illinois at Urbana-Champaign.

According to McDonough, the issue of a looming digital dark age originates from the mass of data spawned by our ever-growing information economy – at last count, 369 exabytes' worth of data, including electronic records, tax files, e-mail, music and photos, for starters. (An exabyte is 1 quintillion bytes; a quintillion is the number 1 followed by 18 zeroes.)

The concern for archivists and information scientists like McDonough is that, with ever-shifting platforms and file formats, much of the data we produce today could eventually fall into a black hole of inaccessibility.

"If we can't keep today's information alive for future generations," McDonough said, "we will lose a lot of our culture."

Contrary to popular belief, electronic data has proven to be much more ephemeral than books, journals or pieces of plastic art. After all, when was the last time you opened a WordPerfect file or tried to read an 8-inch floppy disk?

"Even over the course of 10 years, you can have a rapid enough evolution in the ways people store digital information and the programs they use to access it that file formats can fall out of date," McDonough said.

Magnetic tape, which stores most of the world's computer backups, can degrade within a decade. According to the National Archives Web site, by the mid-1970s only two machines could read the data from the 1960 U.S. Census: one was in Japan, the other in the Smithsonian Institution. Some of the data collected from NASA's 1976 Viking landing on Mars is unreadable and lost forever.

From a cultural perspective, McDonough said there's a "huge amount" of content that is being developed, or is available, only in digital format.

"E-mail is a classic example of that," he said. "It runs both the modern business world and government. If that information is lost, you've lost the archive of what has actually happened in the modern world. We've seen a couple of examples of this so far."

McDonough cited the missing White House e-mail archive from the run-up to the Iraq War, a violation of the Presidential Records Act.

"With the current state of the technology, data is vulnerable to both accidental and deliberate erasure," he said. "What we would like to see is an environment where we can make sure that data does not die due to accidents, malicious intent or even benign neglect."

McDonough also cited Barack Obama's political advertising inside the latest editions of the popular videogames "Burnout Paradise" and "NBA Live" as an example of something that ought to be preserved for future generations but could possibly be lost because of the proprietary nature of videogames and videogame platforms.

"It's not a matter of just preserving the game itself. There are whole parts of popular and political culture that we won't be able to preserve if we can't preserve what's going on inside the gaming world."

McDonough believes the loss of data in a digital dark age would also have an economic effect.

"We would essentially be burning money because we would lose the huge economic investment libraries and archives have made digitizing materials to make them accessible," he said. "Governments are likewise investing huge sums to make documents available to the public in electronic form."

To avoid a digital dark age, McDonough says that we need to figure out the best way to keep valuable data alive and accessible by using a multipronged approach: migrating data to new formats, devising methods of getting old software to work on existing platforms, using open-source file formats and software, and creating data that's "media-independent."

"Reliance on open standards is certainly a huge part, but it's not the only part," he said. "If we want information to survive, we really need to avoid formats that depend on a particular media type. Commercial DVDs that employ protection schemes make it impossible for libraries to legally transfer the content to new media. When the old media dies, the information dies with it."

Enthusiasm for switching from proprietary software such as Microsoft's Office suite to open-source software such as OpenOffice has only recently begun to gather momentum outside of information technology circles.

"Software companies have seen the benefits of locking people into a platform and have been very resistant to change," McDonough said. "Now we are actually starting to see some market mandates in the open direction."

McDonough cites Brazil, the Netherlands and Norway as examples of countries that have mandated the use of non-proprietary file formats for government business.

"There has been quite a movement, particularly among governments, to say: 'We're not going to buy software that uses proprietary file formats exclusively. You're going to have to provide an open format so we can escape from the platform,' " he said. "With that market demand, you really did see some more pressure on vendors to move to something open."

Source: University of Illinois at Urbana-Champaign

Citation: 'Digital dark age' may doom some data (2008, October 27) retrieved 18 October 2019 from https://phys.org/news/2008-10-digital-dark-age-doom.html

User comments

Oct 27, 2008
Almost all books & papers are printed on acid-based paper which will fall apart in 50 years. Films: same problem. CDs: ditto. Nothing lasts forever.

Oct 27, 2008
Come on DNA-based storage! Somebody seriously needs to get a read/write method in practice with that stuff. Wicked long half-life too!

Oct 28, 2008
how about solid state hard drives? vinyl records?

Oct 28, 2008
And thus continues the saga of man realizing his mortality, and his struggle to preserve whatever little bits of knowledge he attains.

The only way you can store information throughout generations is to pass it along verbally from one generation to the next. The human brain is the one medium which will NOT change.

Finally, and here's a sobering fact...the only written history which has survived for thousands and thousands of years was transcribed by cavemen!

Oct 28, 2008
The only way to move data correctly is to digitize it and move it from medium to medium as the technology advances.

369 exabytes... what nonsense. How much of that is the 2 billion copies of Windows that every computer comes installed with? I'll bet if you look at original data and delete all the copies, you'd likely bring that down quite considerably.

All you'd need is a million people with a Terabyte drive and you'd store all that info copied twice over again!

I worked for a place that had data on 3 1/2" magnetic tapes. We hired a high school student to read all the tapes and copy the data onto a hard drive. After 8 months of work, he finished and all the data barely filled a CD-ROM. In a few years it will be copied to a Blu-ray, with a lot of other disks, and will probably never have been looked at.

Oct 28, 2008
There's still a substantial amount of today's data that has yet to be digitized. We've made lots of progress in digitizing our intellectual data, but there are vast amounts of cultural data that are still only available through more traditional means.

Even if we managed to keep all human data electronically, we still have to maintain it in a way that allows us to take it with us through time. Not only that, we have to store it in a way that is invulnerable to disasters, both man-made and natural. Much of the data from the past was lost because the storage media was either frail, or susceptible to disasters or war. And as advanced as we perceive ourselves to be, our information is still vulnerable to these factors. If, heaven forbid, a catastrophic new disease were to begin spreading tomorrow morning that quickly takes out most of mankind in a global pandemic, most of our data will eventually disappear. In the ensuing panic and chaos, the survivors will neglect to secure our history and knowledge for the sake of survival. Most of our digitized electronic data would eventually corrode due to neglect and many of our books of knowledge and technology would become useless because there would be few left who could understand and use any of it or even to secure it. Society's most knowledgeable people tend to be older ones, and these are among the less resilient to disease. Doctors and medical experts would be at the front line of such a disaster and most would become casualties early on. In the end, what we would likely have is a smaller, younger, less knowledgeable population more concerned about individual survival than preserving the data needed to continue to advance our civilization. And there would be few left who are capable of teaching any of it to anyone.

I know it sounds bleak, but I think it's important that we realize just how vulnerable our way of life is to catastrophe as long as we remain incapable of securing our knowledge data against absolutely anything that can cause the human race to stumble and fall. As long as the burden of advancement of humankind remains with a small minority of highly educated people across the globe, the human race will be absolutely vulnerable to massive data losses in the face of a major global disaster unless we develop some real protection against this possibility.

Oct 28, 2008
Does this mean that the 30-second video clip of our puppy chasing her tail that I put on YouTube might not be available in the 31st century? Don't take any chances! Go to YouTube and look up "Maisy chasing her tail" before it is too late.

Oct 28, 2008
Then there's the data that's DESIGNED to be unreadable. Future historians are, from the point of view of DRM advocates, just more people trying to illegally copy proprietary data.

Oct 29, 2008
"Finally, and here's a sobering fact...the only written history which has survived for thousands and thousands of years was transcribed by cavemen!"

Not exactly. There is the Bible, and there are Egyptian writings in their various temples, cities, and pyramids. There are various clay tablets and scrolls from the ancient world.

People always use the word "Caveman" in a derogatory way.

Well, let's take your typical "caveman" from 5 thousand years ago, and set him down in the Amazon rain forest...

Then let's take any one of you people reading this thread, and do the same to you.

See who survives...

I'll put one dollar on the caveman.

Oct 29, 2008
This is exactly why I'm printing the Internet.

On stone tablets.

Oct 30, 2008
Five thousand years is still a relatively short time ago; I really meant tens or even hundreds of thousands of years.
The drawings on cave-walls are still viewable to this day, and depict daily life for those who created them...unlike the Bible, whose content has been skewed by the different religions "du jour". And that is really the point, that we need a way to store information in a way that cannot be modified, but which can be read by anybody with eyes.
Stone tablets seem to fit that bill, for the time being anyways :)
