A new coding technique could make it possible to condense your entire digital library onto a microscopic hard drive.
Except the hard drive won’t consist of metals and plastic. It will be made of DNA.
Scientists in New York have developed a way to compress digital files and squeeze the data into the four base nucleotides of DNA: A, G, C and T. They did so by adapting an algorithm designed for streaming videos on cell phones.
“We take storage almost [for] granted, and we accumulate a lot of information in our daily life,” said Yaniv Erlich, a computer science professor at Columbia University who co-authored a new study describing the technique.
“DNA is very compact,” he said in a video interview. “It’s one million times more compact than what you can get when you use a regular digital medium.”
Genetic code has other major advantages over hard drives and CDs: DNA can potentially last for millennia, if stored in the right conditions. (Just ask scientists trying to revive the long-extinct woolly mammoth.) And DNA won’t ever become outdated, assuming we don’t turn into cyborgs.
“If I try to hear Nirvana’s Nevermind, my favorite CD twenty years ago, it’s probably scratched and I cannot really listen to that,” Erlich said. “But we can still retrieve DNA from skeletons that are thousands of years old. DNA will never be obsolete.”
The research, published last week in the journal Science, is the latest in a growing number of studies to turn digital data into biological data.
Scientists are eyeing DNA because they already know it can store immense amounts of data in a minuscule space. Our genetic information, everything that makes us who we are, is stored within the microscopic chemical structure of DNA molecules. The idea with digital data is to take advantage of this dense, compact DNA structure and fill it with other types of information.
Researchers like George Church of Harvard University and Nick Goldman and Ewan Birney at the European Bioinformatics Institute have developed pioneering methods for converting ones and zeros into A’s, G’s, C’s and T’s.
Erlich and his co-author Dina Zielinski, an associate scientist at the New York Genome Center, say they’ve found the best data-storing method yet. Their coding strategy could potentially pack 215 petabytes of data onto a single gram of DNA, about 100 times more than methods published by Church, Goldman and Birney.
So how does it work?
To start, Erlich and Zielinski selected 2 megabytes’ worth of digital artifacts to write into the DNA, including a full computer operating system, an 1895 French film, a $50 Amazon gift card, a computer virus, and more.
They compressed those data files into one master file, then split the data into short strings of binary code made of ones and zeros. Next, they turned to an algorithm that, in extreme layman’s terms, breaks up files and sends them as smaller chunks, called “droplets,” to a storage device. (More formally, it belongs to a class of erasure-correcting codes known as fountain codes.)
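For readers who want to see the droplet idea in miniature, here is a rough Python sketch of a fountain-style code. It is not the study's implementation. The function names, the 4-byte segment size, and the uniform choice of how many segments go into each droplet are all assumptions made for illustration (production fountain codes choose that number from a carefully tuned distribution). Each droplet is the XOR of a pseudo-random subset of segments, and the seed that picked the subset travels with it as an identifier.

```python
import random

def xor_bytes(a, b):
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def droplet_indices(seed, n):
    """Re-derive which of the n segments a droplet covers from its seed."""
    rng = random.Random(seed)
    degree = rng.randint(1, n)  # assumption: uniform degree, chosen for simplicity
    return set(rng.sample(range(n), degree))

def make_droplet(segments, seed):
    """Build one droplet: the XOR of a seed-selected subset of segments."""
    payload = bytes(len(segments[0]))
    for i in droplet_indices(seed, len(segments)):
        payload = xor_bytes(payload, segments[i])
    return seed, payload

def decode(droplets, n):
    """Peeling decoder: resolve droplets that cover exactly one unknown segment."""
    known = {}
    pending = [(droplet_indices(seed, n), payload) for seed, payload in droplets]
    progress = True
    while progress and len(known) < n:
        progress = False
        for indices, payload in pending:
            unknown = [i for i in indices if i not in known]
            if len(unknown) != 1:
                continue
            for i in indices:
                if i in known:  # strip out segments that are already recovered
                    payload = xor_bytes(payload, known[i])
            known[unknown[0]] = payload
            progress = True
    if len(known) < n:
        return None
    return [known[i] for i in range(n)]

data = b"DNA storage demo"  # 16 bytes, split into four 4-byte segments
segments = [data[i:i + 4] for i in range(0, len(data), 4)]

received = []
recovered = None
for seed in range(1000):
    droplet = make_droplet(segments, seed)
    if seed % 3 == 0:
        continue  # simulate a droplet that was lost along the way
    received.append(droplet)
    recovered = decode(received, len(segments))
    if recovered is not None:
        break

result = b"".join(recovered)
print(result)
```

The point of the sketch is the erasure tolerance: a third of the droplets are thrown away, yet the file still comes back, because the encoder can keep producing fresh droplets until the decoder has enough.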
The Columbia team randomly packaged their strings of binary code into these droplets. They then mapped the ones and zeros into the four nucleotide bases in DNA. The algorithm deleted any letter combinations known to create errors. It also added a barcode to each droplet, to help with reassembling the files later.
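A minimal sketch of that mapping and screening step might look like the Python below. The particular assignment of bit pairs to letters, the function names, and the screening thresholds (no run of more than three identical letters, and a G/C share between 45 and 55 percent) are illustrative assumptions, not figures taken from the article.

```python
import re

# Assumed assignment: each pair of bits becomes one nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

def bytes_to_dna(payload):
    """Turn bytes into a string of A, C, G and T, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in payload)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bytes(strand):
    """Reverse the mapping: four letters back into one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

def passes_screen(strand, max_run=3, gc_low=0.45, gc_high=0.55):
    """Reject strands with long single-letter runs or lopsided G/C content.

    The thresholds are hypothetical defaults chosen for this example.
    """
    if re.search(r"(.)\1{%d,}" % max_run, strand):
        return False
    gc = (strand.count("G") + strand.count("C")) / len(strand)
    return gc_low <= gc <= gc_high

strand = bytes_to_dna(b"Hi")
print(strand)                     # CAGACGGC
print(dna_to_bytes(strand))       # b'Hi'
print(passes_screen(strand))      # False: 6 of 8 letters are G or C
print(passes_screen("ACGTACGT"))  # True
```

In a scheme like this, a droplet whose DNA form fails the screen could simply be thrown out and replaced with a freshly generated one, which is one plausible reading of how problem sequences get "deleted."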
By the end, the scientists had generated a digital list of 72,000 DNA strands, each with 200 nucleotide bases. The duo sent these via text file to Twist Bioscience, a DNA-synthesis startup in San Francisco that specializes in transforming digital files into biological data.
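Those figures allow a quick back-of-the-envelope check, sketched here in Python. It assumes "2 megabytes" means 2,000,000 bytes and counts every base in every strand, including any that carry barcodes or other overhead, so the result is only a rough indication of how tightly the data was packed.

```python
strands = 72_000
bases_per_strand = 200
payload_bytes = 2 * 10**6  # assumption: 2 megabytes = 2,000,000 bytes

total_bases = strands * bases_per_strand
raw_capacity_bytes = total_bases * 2 // 8         # ceiling at 2 bits per base
bits_per_base = payload_bytes * 8 / total_bases   # what the payload actually uses

print(total_bases)              # 14400000
print(raw_capacity_bytes)       # 3600000
print(round(bits_per_base, 2))  # 1.11
```

By this estimate the files take up a bit more than half of the theoretical two bits per base, with the remainder going to things like barcodes and the redundancy that makes error-free recovery possible.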
Two weeks later, Erlich and Zielinski received a vial holding a speck of DNA molecules.
Now their task was to transform those droplets into the original digital files. To do this, they used a modern sequencing technique to read the DNA strands. A software program translated the genetic code (A, G, C, T) back into ones and zeros.
They recovered their files with zero errors, according to their Science study.
But don’t ditch your hard drive just yet.
This laborious, highly technical process is also extremely expensive. Erlich and Zielinski said they spent $7,000 to synthesize the DNA they used to archive their 2 megabytes of data, plus another $2,000 to read it.
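To put those numbers in per-megabyte terms, here is a short Python calculation. The extrapolation to a gigabyte assumes costs scale linearly with data size and that a gigabyte is 1,000 megabytes, both simplifications made for illustration.

```python
synthesis_cost = 7_000  # dollars to write 2 MB into DNA
reading_cost = 2_000    # dollars to sequence it back
megabytes = 2

write_per_mb = synthesis_cost / megabytes
read_per_mb = reading_cost / megabytes
total_per_mb = write_per_mb + read_per_mb
total_per_gb = total_per_mb * 1_000  # assumption: linear scaling

print(write_per_mb)  # 3500.0
print(read_per_mb)   # 1000.0
print(total_per_mb)  # 4500.0
print(total_per_gb)  # 4500000.0
```

At roughly $4,500 per megabyte round trip, archiving even a single gigabyte this way would run into the millions of dollars under those assumptions.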
Sri Kosuri, a biochemistry professor at the University of California, Los Angeles, who was not involved in the study, said in a statement that the costs of this technique aren’t likely to fall precipitously anytime soon, given the limited demand for data synthesizing.