DNA used to encode a book and other digital information

Aug 17, 2012 by Lin Edwards report

(Phys.org) -- A team of researchers in the US has successfully encoded a 5.27 megabit book using DNA microchips, and they then read the book using DNA sequencing. Their experiments show that DNA could be used for long-term storage of digital information.

George Church and Sriram Kosuri of Harvard’s Wyss Institute for Biologically Inspired Engineering, and colleagues, encoded Church’s book “Regenesis”, of around 53,400 words, into DNA, along with 11 images in JPG format and a JavaScript program. This is 1,000 times more data than had previously been encoded in DNA.

DNA is made up of nucleotides, and in theory at least each nucleotide can be used to encode two bits of data. This means that the density is a massive 1 million gigabits per cubic millimeter, and only four grams of DNA could theoretically store all the digital data created annually. This is much denser than digital storage media such as flash drives, and more stable, since the DNA sequences could be read thousands of years after they were encoded.
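
As a rough check on the scale of that figure, the unit conversion can be written out. This is only a sketch of the arithmetic. It takes the quoted 1 million gigabits per cubic millimeter at face value and assumes decimal (SI) prefixes and 8 bits per byte; the variable names are illustrative.

```python
# Unit conversion for the density figure quoted in the article.
# Assumptions: decimal (SI) prefixes, 8 bits per byte.
GIGABIT = 10**9                        # bits in one gigabit

bits_per_mm3 = 1_000_000 * GIGABIT     # 1 million gigabits
bytes_per_mm3 = bits_per_mm3 // 8      # 8 bits per byte
terabytes_per_mm3 = bytes_per_mm3 / 10**12

print(f"{bits_per_mm3:.0e} bits = {terabytes_per_mm3:g} TB per cubic millimeter")
```

Under those assumptions, 1 million gigabits is 10^15 bits, or about 125 terabytes per cubic millimeter.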

The experiment’s success lay in the strategy of encoding the data in short sequences of DNA rather than long ones, and this reduced the difficulty and cost of writing and reading the data. Dr Kosuri said the process was analogous to storing data on a hard drive, where data is written in small blocks called sectors.

They first converted the book, program and images to HTML and then translated this into a sequence of 5.27 million 0s and 1s. These 5.27 megabits were then divided into data blocks 96 bits long, with each bit represented by one DNA nucleotide: the bases A and C encoded 0, while G and T encoded 1. Each block also carried a 19-bit address recording its place in the overall sequence. Multiple copies of each block were synthesized to help with error correction.
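
The block scheme described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the researchers' actual pipeline: the article does not say how the choice between the two bases for each bit was made (a seeded random choice is used here), where the address sits within a block (it is placed first here), or how a short final block is padded (zeros are used here). All function names are hypothetical.

```python
import random

ZERO_BASES = "AC"    # A or C stands for bit 0 (from the article)
ONE_BASES = "GT"     # G or T stands for bit 1 (from the article)
DATA_BITS = 96       # data bits per block (from the article)
ADDRESS_BITS = 19    # address bits per block (from the article)


def bits_to_bases(bits, rng):
    """Map each bit to one nucleotide; the choice between the two
    candidate bases is random here, which is an assumption."""
    return "".join(
        rng.choice(ONE_BASES if bit == "1" else ZERO_BASES) for bit in bits
    )


def bases_to_bits(bases):
    """Inverse mapping: A/C -> 0, G/T -> 1."""
    return "".join("0" if base in ZERO_BASES else "1" for base in bases)


def encode(bitstring, rng=None):
    """Split a bit string into addressed blocks of nucleotides."""
    if rng is None:
        rng = random.Random(0)
    blocks = []
    for index, start in enumerate(range(0, len(bitstring), DATA_BITS)):
        # Assumption: the last block is padded with zeros to full length.
        data = bitstring[start:start + DATA_BITS].ljust(DATA_BITS, "0")
        # Assumption: the address is written before the data.
        address = format(index, f"0{ADDRESS_BITS}b")
        blocks.append(bits_to_bases(address + data, rng))
    return blocks


def decode(blocks, length):
    """Reassemble the bit string from blocks given in any order."""
    by_address = {}
    for block in blocks:
        bits = bases_to_bits(block)
        by_address[int(bits[:ADDRESS_BITS], 2)] = bits[ADDRESS_BITS:]
    joined = "".join(by_address[i] for i in sorted(by_address))
    return joined[:length]  # drop the zero padding


# Usage: turn a short text into bits, encode it, and read it back.
message = "Regenesis".encode("utf-8")
bits = "".join(format(byte, "08b") for byte in message)
blocks = encode(bits)
recovered = decode(blocks, len(bits))
print(len(blocks), len(blocks[0]), recovered == bits)
```

Because every block carries its own address, the blocks can be stored and read in any order, which is what allows the data to live in many short DNA sequences rather than one long one.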

After the book and other information were encoded into the DNA, drops of DNA were attached to microarray chips for storage. The chips were kept at 4°C for three months, and the DNA was then dissolved and sequenced. Each copy of each block of nucleotides was sequenced up to 3,000 times so that a consensus could be reached. In this way they reduced the bit errors in the 5.27 megabits to just 10.
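
The idea of reaching a consensus from many reads can be illustrated with a simple per-position majority vote. This is a minimal sketch, not the method used in the paper: it assumes the reads of a block have already been grouped together and are all the same length, and it ignores insertions and deletions. The function name and sample reads are made up for illustration.

```python
from collections import Counter


def consensus(reads):
    """Return the most common base at each position across the reads.

    Assumes all reads cover the same block and have equal length.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("reads must all have the same length")
    # zip(*reads) yields one column of bases per position.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


# Usage: five reads of one block, two of them with a single error.
reads = ["GATTACA", "GATTACA", "GCTTACA", "GATTAGA", "GATTACA"]
print(consensus(reads))
```

With enough reads, an error that appears in only a minority of them at a given position is outvoted, which is why sequencing each block many times lowers the final error count.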

The procedure, described in a paper in the journal Science, cannot be used for rewritable data but could be used for very long-term storage. One advantage of DNA is the much greater density of information it can store. Another major advantage is that, as a biological molecule, DNA should always remain readable, without special equipment such as CD or DVD players that can quickly become obsolete.

The main disadvantage of this system is that at the moment the technologies used to synthesize and sequence DNA are far too expensive for it to be a practical system for everyday use. Another problem is that while DNA has been sequenced from sources such as mummies thousands of years old, the DNA tends to be fragmented, and work needs to be done on improving the stability of DNA over centuries and longer.

More information: Next-Generation Digital Information Storage in DNA, Science, DOI: 10.1126/science.1226355

ABSTRACT
Digital information is accumulating at an astounding rate, straining our ability to store and archive it. DNA is among the most dense and stable information media known. The development of new technologies in both DNA synthesis and sequencing make DNA an increasingly feasible digital storage medium. Here, we develop a strategy to encode arbitrary digital information in DNA, write a 5.27-megabit book using DNA microchips, and read the book using next-generation DNA sequencing.

User comments : 14

Lurker2358
1.1 / 5 (7) Aug 17, 2012
Their experiments show that DNA could be used for long-term storage of digital information.


BS.

Where are the evolutionists pointing out "mutation" issues?

One cosmic ray particle, EVER, and your data is destroyed...

10 bit errors in 5.27 megabits is actually pretty bad for just 3 months.

A bit error in the right place in a markup language or a programming language could be impossible to recover. A bit error in a photograph or 3-d model is impossible to recover without another copy.

Now you could do double or triple redundancy to make it easier to recover some bit errors, but at this rate you will have 8 bit errors per megabyte per year, so after an alleged century you're looking at 800 bit errors per megabyte.

I figure you need at least 3 copies to ensure it's fully recoverable after one century, and probably 2 or 3 extra copies per extra century thereafter.

Now text could be recoverable even with bit errors, but language may be different in the future...
Lurker2358
1 / 5 (6) Aug 17, 2012
One obvious thing you would want to do is create a "World Rosetta Stone," which does the following:

1, Contains a "Language A to Language B" dictionary for every language pair, or at least for every major language pair. At least catch the top 2 or 3 dialects for each major language.

2, Contains a dictionary and thesaurus for each language.

3, Contains example paragraphs for conversion of common usage and "figures of speech," which can be very hard to understand centuries or millennia after the fact.

4, Preserves what we know of each old or ancient version of each language as well as what we know of their translations.

All of these things exist today, but I don't think they exist in a single archive or translator tool, though improvements have been made in recent years with tools like Google Translate.
DarkHorse66
4.2 / 5 (6) Aug 17, 2012
@Lurker:
"Where are the evolutionists pointing out "mutation" issues?" This wasn't designed to reside in a living being. The original template (once it has been finalised) would be kept, and can be stored in multiple ways. Besides, I'm sure we will come up with a way to 'fix' the DNA set in place.
"One cosmic ray particle, EVER, and your data is destroyed..."
One power surge & your hard drive can be destroyed. Sectors die all the time even without that. Try storing data for 100yrs on them. How friendly would plasma be to electronics?
"Each copy of each block of nucleotides was sequenced up to 3,000 times so that a consensus could be reached." Effectively, these are copies. Apart from that: this is only an initial experiment in a lab, to see if it can even be done!
"Now text could be recoverable even with bit errors, but language may be different in the future..." Today's language is already different to that of 100yrs ago. But we have ongoing records of its evolution. Best Regards, DH66
Deathclock
2.3 / 5 (3) Aug 17, 2012
I could encode a book with rocks on the beach or salt granules on the table at Denny's... Granted DNA is more difficult to work with but the principle is nothing new.
Guy_Underbridge
4.2 / 5 (5) Aug 17, 2012
This means that the density is a massive 1 million gigabits per cubic millimeter.


That's a LOT of porn...
Deathclock
1 / 5 (1) Aug 17, 2012
1 million gigabits = 12,500 terabytes.... per millimeter^3
antialias_physorg
5 / 5 (1) Aug 17, 2012
They first converted the book, program and images to HTML

How the hell do you convert a program (or an image) to HTML?
Or did they simply use a tagged format? (Not all tagged formats are HTML)

One cosmic ray particle, EVER, and your data is destroyed...

There are ways to guard against that. If the letters in the 'alphabet' you use have a Hamming distance of 3, you can recreate a letter if a single bit in it has been flipped (for a Hamming distance of 5 you can recreate a letter if 2 bits have been flipped, etc.).

There are different encoding schemes that can guard against 'block' noise (i.e. if you expect bits in sequence will be destroyed, like if you transfer digital data during a thunderstorm). But with cosmic rays we're more likely talking stochastic noise, for which the first scheme is appropriate.
antialias_physorg
5 / 5 (2) Aug 17, 2012
A bit error in the right place in a markup language or a programming language could be impossible to recover.

Errors in markup languages are actually pretty easy to fix (as opposed to full binary data). Even if the tag is destroyed you can still save all the data in the other tags (unless the relational positioning of the tags is important - which isn't the case here). If it's only partially destroyed you can often infer what tag it was.

Storing on DNA (in live cells) could also have the benefit that you could use common cellular mechanisms to continually repair DNA.
Deathclock
1 / 5 (1) Aug 17, 2012
At 12 petabytes per cubic millimeter you could store 10 (or 100 or 1000) copies of this data and read each copy simultaneously and assume the correct information is the one with the highest degree of concurrence and then simply correct the corrupted copies.
c0y0te
not rated yet Aug 18, 2012
Reminded me of an SF story by Chris Lawson called "Written in Blood".
86daily
1 / 5 (2) Aug 18, 2012
Here's a provoking thought. If all DNA is so specific to each creature and they can vibrate at very specific frequencies to the point of termination. Why don't we find out at what frequency the AIDS virus vibrates at and put that in a form of a frequency explosion and take the AIDS virus off this earth. Oh, could that work for let's say all white males. Or, all cold viruses, cockroaches, GMO Corn Plants, or whatever else you can think of. Why don't you help me on this. Can you think of anything around our life that needs elimination. CABAL's?
86daily
1 / 5 (2) Aug 18, 2012
Oh! I forgot to tell you. That's already invented. I can't take the credit for that idea. Shucks I could have won the Nobel prize for Peace. Hey, can you keep it a secret that this is already invented. I'll patent it and then collect my prize after using on the white male banksters. The Peace Prize of Course.
Osiris1
2.3 / 5 (3) Aug 20, 2012
Hey 86daily, here is a thought. Once worked for an engineering firm with a large reliability lab that included a huge shake table. Our resident vibration tech, a favorite of ALL the women in the plant for some reason, had come up with some odd pearls of wisdom for us college students at the time. Seems that most objects have resonant modes not only for the entire object, but also for some of its parts. Humans too! Among other things, he found that the resonant frequency of the human rectum is about nineteen cycles per second. You may call that hertz or hurts if you may. He hooked up the table, pointed it toward the reinforced concrete wall, and demonstrated this with a wood 4x4 connecting machine and the wall. And turned it on. In minutes, lines formed outside every restroom within 1000 feet. We wondered then seriously about his popularity with the ladies so asked him if he found any ..other...resonant body parts. He clammed up, and no woman would tell us, grinnin
antialias_physorg
5 / 5 (2) Aug 20, 2012
Here's a provoking thought. If all DNA is so specific to each creature and they can vibrate at very specific frequencies to the point of termination. Why don't we find out at what frequency the AIDS virus ...

1) What are you smoking? Since when does DNA vibrate at a certain frequency? DNA is all curled up in a bunch. It's not stretched out in a string that will vibrate.

2) Even if it DID vibrate. The whole reason the HI-virus is so hard to kill is because when it replicates it makes a lot of errors (i.e. it mutates very quickly). The DNA of no two HI-viruses is identical.
