Researchers store computer operating system and short movie on DNA

March 2, 2017
In a study in Science, researchers Yaniv Erlich and Dina Zielinski describe a new coding technique for maximizing the data-storage capacity of DNA molecules. Credit: New York Genome Center

Humanity may soon generate more data than hard drives or magnetic tape can handle, a problem that has scientists turning to nature's age-old solution for information storage: DNA.

In a new study in Science, a pair of researchers at Columbia University and the New York Genome Center (NYGC) show that an algorithm designed for streaming video on a cellphone can unlock nearly all of DNA's storage potential by squeezing more information into its four base nucleotides. They demonstrate that this technology is also extremely reliable.

DNA is an ideal storage medium because it's ultra-compact and can last hundreds of thousands of years if kept in a cool, dry place, as demonstrated by the recent recovery of DNA from the bones of a 430,000-year-old human ancestor found in a cave in Spain.

"DNA won't degrade over time like cassette tapes and CDs, and it won't become obsolete—if it does, we have bigger problems," said study coauthor Yaniv Erlich, a computer science professor at Columbia Engineering, a member of Columbia's Data Science Institute, and a core member of the NYGC.

Erlich and his colleague Dina Zielinski, an associate scientist at NYGC, chose six files to encode, or write, into DNA: a full computer operating system; the 1895 French film "Arrival of a Train at La Ciotat"; a $50 Amazon gift card; a computer virus; a Pioneer plaque; and a 1948 study by information theorist Claude Shannon.

They compressed the files into a master file and then split the data into short strings of binary code made up of ones and zeros. Using erasure-correcting codes known as fountain codes, they randomly packaged the strings into so-called droplets and mapped the ones and zeros in each droplet to the four nucleotide bases in DNA: A, G, C and T. The algorithm discarded letter combinations known to create errors and added a barcode to each droplet to help reassemble the files later.
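The steps described above can be illustrated with a toy Python sketch. It is not the authors' implementation: the bit-to-base mapping (00 to A, 01 to C, 10 to G, 11 to T), the one-byte seed used as the barcode, the uniform choice of how many segments to combine, and the screening thresholds (no run of more than three identical bases, GC content between 45 and 55 percent) are all assumptions chosen for illustration. A practical fountain code would draw the number of combined segments from a carefully designed distribution rather than uniformly.

```python
import random

BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def bytes_to_dna(data: bytes) -> str:
    """Map every two bits to one base, most significant bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def dna_to_bytes(strand: str) -> bytes:
    """Inverse of bytes_to_dna: four bases become one byte."""
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)


def passes_screen(strand: str, max_run: int = 3,
                  gc_low: float = 0.45, gc_high: float = 0.55) -> bool:
    """Reject strands with long single-base runs or skewed GC content.

    The thresholds are illustrative assumptions, not values from the article.
    """
    if not strand:
        return False
    run = 1
    for prev, cur in zip(strand, strand[1:]):
        run = run + 1 if prev == cur else 1
        if run > max_run:
            return False
    gc = (strand.count("G") + strand.count("C")) / len(strand)
    return gc_low <= gc <= gc_high


def make_droplet(segments: list, seed: int) -> bytes:
    """XOR a seed-determined random subset of segments; prefix the seed.

    The seed doubles as the barcode: a decoder can rerun the same random
    choices to learn which segments were combined.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, len(segments))          # simplification: uniform
    chosen = rng.sample(range(len(segments)), degree)
    payload = bytes(len(segments[0]))               # all-zero starting value
    for idx in chosen:
        payload = bytes(a ^ b for a, b in zip(payload, segments[idx]))
    return seed.to_bytes(1, "big") + payload


def encode(data: bytes, segment_size: int = 4) -> list:
    """Try every one-byte seed and keep only droplets that pass the screen."""
    padded = data + bytes(-len(data) % segment_size)
    segments = [padded[i:i + segment_size]
                for i in range(0, len(padded), segment_size)]
    strands = []
    for seed in range(256):
        strand = bytes_to_dna(make_droplet(segments, seed))
        if passes_screen(strand):
            strands.append(strand)
    return strands


if __name__ == "__main__":
    for strand in encode(b"DNA storage demo")[:5]:
        print(strand)
```

Because each droplet is generated independently from its seed, a droplet whose DNA sequence fails the screen can simply be thrown away and replaced by another, which is what makes this family of codes a natural fit for avoiding error-prone sequences.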

In all, they generated a digital list of 72,000 DNA strands, each 200 bases long, and sent it in a text file to a San Francisco DNA-synthesis startup, Twist Bioscience, that specializes in turning digital data into biological data. Two weeks later, they received a vial holding a speck of DNA molecules.


To retrieve their files, they used modern sequencing technology to read the DNA strands, followed by software to translate the genetic code back into binary. They recovered their files with zero errors, the study reports. (In this short demo, Erlich opens his archived operating system on a virtual machine and plays a game of Minesweeper to celebrate.)

They also demonstrated that a virtually unlimited number of copies of the files could be created with their coding technique by multiplying their DNA sample through polymerase chain reaction (PCR), and that those copies, and even copies of their copies, and so on, could be recovered error-free.

Finally, the researchers show that their coding strategy packs 215 petabytes of data on a single gram of DNA—100 times more than methods published by pioneering researchers George Church at Harvard, and Nick Goldman and Ewan Birney at the European Bioinformatics Institute. "We believe this is the highest-density data-storage device ever created," said Erlich.

The capacity of DNA data storage is theoretically limited to two binary digits for each nucleotide, but the biological constraints of DNA itself and the need to include redundant information to reassemble and read the fragments later reduce its capacity to 1.8 binary digits per nucleotide base.

The team's insight was to apply fountain codes, a technique Erlich remembered from graduate school, to make the reading and writing process more efficient. With their DNA Fountain technique, Erlich and Zielinski pack an average of 1.6 bits into each base nucleotide. That's at least 60 percent more data than previously published methods, and close to the 1.8-bit limit.
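The density figures quoted above can be put side by side with a few lines of Python. This is only a worked-arithmetic sketch using the article's rounded numbers; the constant and function names are made up for illustration, and the "implied earlier density" simply inverts the article's "at least 60 percent more" statement rather than reporting a measured value.

```python
import math

# Four possible bases per position means log2(4) = 2 bits per nucleotide.
THEORETICAL_BITS_PER_BASE = math.log2(4)

PRACTICAL_LIMIT = 1.8   # bits per base after constraints, per the article
DNA_FOUNTAIN = 1.6      # bits per base achieved, per the article


def fraction_of(limit: float, achieved: float = DNA_FOUNTAIN) -> float:
    """Share of a given limit that the achieved density represents."""
    return achieved / limit


# "At least 60 percent more" implies earlier methods stored at most
# 1.6 / 1.6 = 1.0 bit per base (an inference from the article's wording).
IMPLIED_EARLIER_MAX = DNA_FOUNTAIN / 1.6

if __name__ == "__main__":
    print(f"Share of theoretical maximum: {fraction_of(THEORETICAL_BITS_PER_BASE):.0%}")
    print(f"Share of practical limit:     {fraction_of(PRACTICAL_LIMIT):.0%}")
    print(f"Implied earlier density:      {IMPLIED_EARLIER_MAX:.1f} bits per base or less")
```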

Cost remains a barrier. The researchers spent $7,000 to synthesize the DNA they used to archive their 2 megabytes of data, and another $2,000 to read it. Though the price of DNA sequencing has fallen exponentially, there may not be the same demand for DNA synthesis, says Sri Kosuri, a biochemistry professor at UCLA who was not involved in the study. "Investors may not be willing to risk tons of money to bring costs down," he said.

But the price of DNA synthesis can be vastly reduced if lower-quality molecules are produced, and coding strategies like DNA Fountain are used to fix molecular errors, says Erlich. "We can do more of the heavy lifting on the computer to take the burden off time-intensive molecular coding," he said.
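To put the cost figures quoted above in per-megabyte terms, here is a short Python sketch. It uses only the article's rounded numbers ($7,000 to synthesize, $2,000 to read, 2 megabytes archived); the variable and function names are illustrative, and the results are rough estimates rather than figures reported by the researchers.

```python
SYNTHESIS_COST_USD = 7_000   # writing the DNA, per the article
READ_COST_USD = 2_000        # sequencing the DNA, per the article
DATA_MEGABYTES = 2           # article's rounded archive size


def cost_per_megabyte(total_cost_usd: float,
                      megabytes: float = DATA_MEGABYTES) -> float:
    """Divide a total cost evenly across the archived megabytes."""
    return total_cost_usd / megabytes


if __name__ == "__main__":
    write = cost_per_megabyte(SYNTHESIS_COST_USD)
    read = cost_per_megabyte(READ_COST_USD)
    print(f"Write: ${write:,.0f} per MB")
    print(f"Read:  ${read:,.0f} per MB")
    print(f"Total: ${write + read:,.0f} per MB")
```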

More information: "DNA Fountain enables a robust and efficient storage architecture," Science, science.sciencemag.org/cgi/doi/10.1126/science.aaj2038


gkam
1.8 / 5 (5) Mar 05, 2017
Sony is working on their Gamma Max DNA reader.
Uncle Ira
3.4 / 5 (5) Mar 05, 2017
Sony is working on their Gamma Max DNA reader.


You got a source for that or is it something you learned while gaining "experience" as a "REAL" Bio-Medical Engineer?
gkam
1.8 / 5 (5) Mar 05, 2017
It was a joke, . . like you anonymous snipers who hide, too scared to take responsibility for your own words.
Captain Stumpy
3.7 / 5 (3) Mar 05, 2017
You got a source for that or is it something you learned while gaining "experience" as a "REAL" Bio-Medical Engineer?
@Ira
ROTFLMFAO

another engineering position for super jeenyious?
wow!

rather fascinating article though... http://science.sc.../tab-pdf

Erlich and Zielinski present a method, DNA Fountain, which approaches the theoretical maximum for information stored per nucleotide. They demonstrated efficient encoding of information—including a full computer operating system—into DNA that could be retrieved at scale after multiple rounds of polymerase chain reaction.
biological integration with an OS and machines in the future?

i wonder what the long term stability would be like
Uncle Ira
3.4 / 5 (5) Mar 05, 2017
It was a joke, . . like you anonymous snipers who hide, too scared to take responsibility for your own words.


How is anybody supposed to know that? It was just as goofy as all your other engineer stuffs.
Uncle Ira
3.4 / 5 (5) Mar 05, 2017
like you anonymous snipers who hide, too scared to take responsibility for your own words.


What have I ever said that I should be afraid of? You keep saying that but it doesn't mean anything. What does "take responsibility for you own words" mean? Cher, believe it or don't believe it, I don't say anything here that I don't say to everybody I see every day.

Skippy, I really am just the way I am. There is not a long line of couyons waiting to give me some "responsibility" for my words. There is not even one couyon trying to give "responsibility" for my words. The only person who ever says anything about "responsibility" for my words is you.

After 100 times asking I suppose I am wasting my time. What you can do to make me "responsible" for my words? Even if you knew my name, address, telephone number, birthday, with a four color map to my front gallery, what you could do to make me "responsible" for my words?

Nothing is what I am guessing.
gkam
1 / 5 (4) Mar 05, 2017
I do not expect you to know anything.

Look up the BetaMax.
Uncle Ira
3.4 / 5 (5) Mar 05, 2017
Look up the BetaMax.
What that have to do with me taking "responsibility for my words". You make jokes just as bad as you pretend to be the engineer.
Captain Stumpy
3.7 / 5 (3) Mar 05, 2017
@Ira
What does "take responsibility for you own words" mean?
funny thing about that...
check this out: https://phys.org/...lly.html

so, being "responsible" to him, by using his example, means:

1- it's ok to lie

2- you don't need to prove anything, just say "i'm real" or "i have experience" when asked for evidence

3- when presented with evidence proving you wrong, simply ignore it and keep repeating the lie (even if it is libel !!)

4- if you simply state "i did [insert claim]" then everyone should believe you

5- there is no need to link reputable evidence [or even relevant topical evidence] when you can link some random article that supports your beliefs or that talks about your own irrelevant past

.

let me know if i missed anything, @Ira
Captain Stumpy
3.7 / 5 (3) Mar 05, 2017
hey @Ira, did you notice this?
"DNA won't degrade over time like cassette tapes and CDs, and it won't become obsolete—if it does, we have bigger problems," said study coauthor Yaniv Erlich, a computer science professor at Columbia Engineering, a member of Columbia's Data Science Institute, and a core member of the NYGC
now, i know he likely means over a lifetime, or some other short time span... but i found that to be a mite funny, considering

i mean, we know DNA degrades even in the short term (ask mikey or liar-girl about that one, eh?) and that we can't actually get DNA from certain long dead stuff (like dinosaurs) ... so this is a mite misleading

this would be a great way to smuggle information for spies though
be they industrial or governmental (pun intended) !
