DNA digital data storage refers to any process that stores digital data
in the base sequence of DNA. The technology uses artificial DNA made
with commercially available oligonucleotide synthesis machines for
storage and DNA sequencing machines for retrieval. This type of storage
system is more compact than current magnetic tape or hard drive storage
systems because of the data density of DNA: it has been reported that
1 gram of DNA could store 215 petabytes (215 million gigabytes). It
also offers longevity, as long as the DNA is held in cold, dry and dark
conditions, as shown by the study of woolly mammoth DNA from up to
60,000 years ago, and resistance to obsolescence, as DNA is a universal
and fundamental data storage mechanism in biology. These features have
led researchers involved in its development to call this method of
data storage "apocalypse-proof" because "after a hypothetical global
disaster, future generations might eventually find the stores and be
able to read them." It is, however, a slow process, as the DNA needs to
be sequenced in order to retrieve the data, so the method is intended
for uses with a low access rate, such as long-term archival of large
amounts of scientific data.
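As a rough illustration of the density figure quoted above, the following sketch converts 215 petabytes per gram into gigabytes and estimates the mass of DNA needed for an archive of a given size. It assumes decimal units (1 PB = 10^15 bytes); the function name `grams_of_dna` and the one-exabyte example are illustrative choices, not part of the reported work.

```python
PB = 10**15  # bytes per petabyte (decimal units, an assumption)
GB = 10**9   # bytes per gigabyte (decimal units, an assumption)

# Reported storage density: 215 petabytes per gram of DNA.
DENSITY_BYTES_PER_GRAM = 215 * PB


def grams_of_dna(num_bytes: float) -> float:
    """Mass of DNA needed to hold num_bytes at the reported density."""
    return num_bytes / DENSITY_BYTES_PER_GRAM


# The same density expressed in gigabytes per gram.
gigabytes_per_gram = DENSITY_BYTES_PER_GRAM // GB

# Hypothetical example: one exabyte (10^18 bytes) of archival data.
print(gigabytes_per_gram, "GB per gram")
print(round(grams_of_dna(10**18), 2), "grams for one exabyte")
```

At this density an exabyte-scale archive would weigh only a few grams, which is the sense in which the medium is called compact.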
The idea of recording, storing and retrieving information on DNA
molecules, together with general considerations about its feasibility,
was originally put forward by Mikhail Neiman and published in 1964–65
in the journal Radiotekhnika (USSR). The technology may therefore be
referred to as MNeimONics, while the storage device may be known as
MNeimON (Mikhail Neiman OligoNucleotides).
Among early examples of DNA data storage, in 2007 a device was created
at the University of Arizona that used addressing molecules to encode
mismatch sites within a DNA strand. These mismatches could then be
read out by performing a restriction digest, thereby recovering the
data. This system has a number of advantages over other methods.
Firstly, unlike methods in which bespoke molecules are synthesised for
each new DNA encoding, a common set of molecules can be used to encode
any arbitrary data. Because DNA synthesis is currently expensive and
laborious, this means that a single investment in synthesis can be
reused to encode many different sets of data with the same set of DNA
molecules. The encoded DNA is also "bio-compatible", meaning that, in
principle, it can be readily inserted into, and propagated within, an
organism.
On August 16, 2012, the journal Science published research by George
Church and colleagues at Harvard University, in which DNA was encoded
with digital information that included an HTML draft of a 53,400-word
book written by the lead researcher, eleven JPG images and one
petabits can be stored in each cubic millimeter of DNA. The
researchers used a simple code where bits were mapped one-to-one with
bases, which had the shortcoming that it led to long runs of the same
base, the sequencing of which is error-prone. The result showed that,
besides its other functions, DNA can also serve as a storage medium
alongside hard drives and magnetic tapes.
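To make the problem of long single-base runs concrete, here is a minimal sketch of a one-bit-per-base code. The specific mapping (0 to A, 1 to G) and the helper names are assumptions chosen for illustration; the published scheme is not reproduced here. The sketch shows how a byte of all zeros or all ones turns directly into a run of eight identical bases.

```python
from itertools import groupby

# Hypothetical one-to-one mapping between bits and bases.
BIT_TO_BASE = {"0": "A", "1": "G"}
BASE_TO_BIT = {base: bit for bit, base in BIT_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Write each bit of data as a single base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BIT_TO_BASE[bit] for bit in bits)


def decode(strand: str) -> bytes:
    """Invert encode(): read one bit per base, eight bases per byte."""
    bits = "".join(BASE_TO_BIT[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def longest_run(strand: str) -> int:
    """Length of the longest stretch of identical consecutive bases."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)


strand = encode(b"\x00\xff")
print(strand)               # eight A bases followed by eight G bases
print(longest_run(strand))  # the run length a sequencer would have to resolve
```

Because the mapping has no freedom to break up repeats, the run length in the strand is dictated entirely by the data, which is the weakness the paragraph above describes.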
An improved system was reported in the journal Nature in January 2013,
in an article led by researchers from the European Bioinformatics
Institute (EBI) and submitted at around the same time as the paper of
Church and colleagues. Over five million bits of data, consisting of
text files and audio files, were successfully stored in DNA that
appeared to the researchers as a speck of dust, and then perfectly
retrieved and reproduced. The encoded information consisted of all 154
of Shakespeare's sonnets, a twenty-six-second audio clip of the "I Have
a Dream" speech by Martin Luther King, the well-known paper on the
structure of DNA by James Watson and Francis Crick, a photograph of
EBI headquarters in Hinxton, United Kingdom, and a file describing the
methods behind converting the data. All the DNA files reproduced the
information with between 99.99% and 100% accuracy. The main
innovations in this research were the use of an error-correcting
encoding scheme to ensure the extremely low data-loss rate, and the
idea of encoding the data in a series of overlapping short
oligonucleotides identifiable through a sequence-based indexing
scheme. The sequences of the individual strands of DNA overlapped in
such a way that each region of data was repeated four times to avoid
errors. Two of these four strands were constructed backwards, also
with the goal of eliminating errors. The costs per megabyte were
estimated at $12,400 to encode data and $220 for retrieval. However,
it was noted that the exponential decrease in DNA synthesis and
sequencing costs, if it continues, should make the technology
cost-effective for long-term data storage within about ten years.
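The fourfold overlap described above can be sketched as follows. This is an illustration under stated assumptions, not the published pipeline: the step size, the window length of four steps, the use of a plain integer as the index, and the reading of "constructed backwards" as reverse-complementing every other fragment are all choices made here for clarity. In this simplified version, positions near the two ends of the sequence are covered fewer than four times.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def fragment(sequence: str, step: int = 25):
    """Cut sequence into windows of length 4*step, each shifted by step.

    Returns (index, flipped, bases) tuples. Every other fragment is stored
    reverse-complemented (an assumption standing in for "backwards").
    """
    window = 4 * step
    fragments = []
    for index, start in enumerate(range(0, len(sequence) - window + 1, step)):
        bases = sequence[start:start + window]
        flipped = index % 2 == 1
        fragments.append((index, flipped,
                          reverse_complement(bases) if flipped else bases))
    return fragments


def reassemble(fragments, length: int, step: int = 25) -> str:
    """Rebuild the sequence by majority vote over all overlapping fragments."""
    votes = [Counter() for _ in range(length)]
    for index, flipped, bases in fragments:
        if flipped:
            bases = reverse_complement(bases)  # undo the flip before voting
        for offset, base in enumerate(bases):
            votes[index * step + offset][base] += 1
    return "".join(v.most_common(1)[0][0] for v in votes)


# Hypothetical 200-base input, chosen so the windows tile it exactly.
original = "ACGTTGCA" * 25
pieces = fragment(original, step=25)
print(len(pieces), "fragments;", reassemble(pieces, len(original)) == original)
```

With four independent copies of each interior position, a single misread base in one fragment is outvoted by the other three, which is the error-avoidance idea the paragraph describes.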
The long-term stability of data encoded in DNA was reported in
February 2015, in an article by researchers from ETH Zurich. By adding
Reed–Solomon error correction coding and by encapsulating the DNA
within silica glass spheres via sol-gel chemistry, the researchers
predict error-free information recovery after up to 1 million years at
-18 °C and 2000 years if stored at 10 °C. Because the coding scheme
can handle errors, the research team could reduce the cost of DNA
synthesis to about $500/MB by choosing a more error-prone DNA
synthesis method. In a news article in New Scientist, the team stated
that if they are able to further
decrease the cost they would store an archive version of in
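The principle behind Reed–Solomon coding can be shown with a toy example. The sketch below is not the code used by the ETH Zurich team, whose parameters are not given here: it works over the prime field GF(257) rather than a binary field, it only recovers from lost symbols (erasures) rather than correcting misread ones, and all function names are hypothetical. It treats the message bytes as points on a polynomial and stores extra points, so that any `k` surviving symbols are enough to rebuild a `k`-byte message.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique lowest-degree polynomial through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+).
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def rs_encode(message: bytes, n: int) -> list[int]:
    """Systematic encoding: symbols 0..k-1 are the message, the rest are parity."""
    k = len(message)
    if not 0 < k <= n <= P:
        raise ValueError("need 0 < len(message) <= n <= 257")
    points = list(enumerate(message))
    return [_interpolate(points, x) for x in range(n)]


def rs_recover(survivors: dict[int, int], k: int) -> bytes:
    """Rebuild the k message bytes from any k surviving {position: symbol} pairs."""
    if len(survivors) < k:
        raise ValueError("too few symbols survive to recover the message")
    points = list(survivors.items())[:k]
    return bytes(_interpolate(points, x) for x in range(k))


codeword = rs_encode(b"DNA", 7)               # 3 data symbols + 4 parity symbols
kept = {pos: codeword[pos] for pos in (3, 4, 6)}  # pretend the other four were lost
print(rs_recover(kept, 3))
```

The trade-off the paragraph describes follows from this redundancy: if the code can absorb a certain number of damaged symbols, a cheaper and less accurate synthesis method becomes acceptable.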
Also, a group of researchers led by Boise State University is working
toward a better way to store digital information using nucleic acid
memory (NAM). They note that the global flash memory market is
predicted to reach $30.2 billion this year, potentially growing to
$80.3 billion by 2025. They estimate that by 2040 the demand for
global memory will exceed the projected supply of silicon (the raw
material used to make flash memory), and that nucleic acid memory has
a retention time far exceeding that of electronic memory. They discuss
the longevity of DNA materials on the basis of first-principles
theoretical calculations, published as a commentary article.
According to their claims, "With information retention times that range
from thousands to millions of years, volumetric density 10^3 times
greater than flash memory and energy of operation 10^8 times less, we
[...] DNA used as a memory-storage material in nucleic acid memory
(NAM) products promises a viable and compelling alternative to
electronic memory." and "Given exponentially increasing demands for
safeguarded information worldwide, and the long retention times for
DNA (ranging from thousands to millions of years), NAM can store the
world's information for future generations using far less space and
energy. NAM could thus be used as a time capsule for massive,
infrequently accessed records in scientific, financial, governmental,
historical, genealogical, personal and genetic domains."
The above methods of DNA storage had the disadvantage that the whole
strand of synthetic DNA has to be sequenced in order to retrieve only
one of several data sets that were previously encoded. In April 2016,
researchers at the University of Washington published an encoding,
storage, retrieval and decoding method that enables random access to
any one of the data sets.
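One way to picture random access is as a key-value store in which every strand carries a short address sequence identifying its data set, so that only strands with the wanted address need to be read. The sketch below models that idea purely in software. The address sequences, the two-base chunk index, the pool layout and the function names are all hypothetical, and the selection step stands in for whatever laboratory procedure the published method uses.

```python
import random

BASES = "ACGT"
ADDRESS_LENGTH = 8  # bases identifying the data set (assumed length)
INDEX_LENGTH = 2    # bases giving the chunk order, up to 16 chunks (assumed)


def index_to_bases(i: int) -> str:
    """Write a chunk number as a fixed-width base-4 string of bases."""
    if not 0 <= i < 4 ** INDEX_LENGTH:
        raise ValueError("chunk index out of range")
    return "".join(BASES[(i // 4 ** p) % 4]
                   for p in reversed(range(INDEX_LENGTH)))


def bases_to_index(bases: str) -> int:
    """Inverse of index_to_bases()."""
    value = 0
    for base in bases:
        value = value * 4 + BASES.index(base)
    return value


def store(pool: list[str], address: str, chunks: list[str]) -> None:
    """Add one data set to the shared, unordered pool of strands."""
    if len(address) != ADDRESS_LENGTH:
        raise ValueError("address has the wrong length")
    for i, chunk in enumerate(chunks):
        pool.append(address + index_to_bases(i) + chunk)
    random.shuffle(pool)  # a physical pool has no inherent order


def retrieve(pool: list[str], address: str) -> list[str]:
    """Select only strands carrying address and return their chunks in order."""
    selected = [s[ADDRESS_LENGTH:] for s in pool if s.startswith(address)]
    selected.sort(key=lambda s: bases_to_index(s[:INDEX_LENGTH]))
    return [s[INDEX_LENGTH:] for s in selected]


pool: list[str] = []
store(pool, "ACGTACGT", ["AAAA", "CCCC", "GGGG"])
store(pool, "TTGGCCAA", ["TTTT", "ACAC"])
print(retrieve(pool, "TTGGCCAA"))  # only the second data set is read
```

The point of the design is that the cost of a read scales with the size of the requested data set rather than with the size of the whole pool.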
In March 2017, Dr. Yaniv Erlich and Dina Zielinski of Columbia
University and the New York Genome Center published a method known as
DNA Fountain, which allows perfect retrieval of information at a
density of 215 petabytes per gram of DNA. The technique approaches the
Shannon capacity of DNA storage, achieving 85% of the theoretical
limit. Using this method, they were able to perfectly retrieve an
operating system called KolibriOS, the French film Arrival of a Train
at La Ciotat, a $50 Amazon gift card, a computer virus, a Pioneer
plaque and a study by Claude Shannon, totalling 2.14 megabytes. A
process that allows 2.18 × 10^15 retrievals using the DNA sample was
also tested, and the data could still be perfectly decoded. The method
is, however, not ready for large-scale use, as it costs $7000 to
synthesize 2 megabytes of data and another $2000 to read it.
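To put the cost figures in this article on a common footing, the short calculation below converts the DNA Fountain costs to dollars per megabyte and sets them beside the per-megabyte estimates quoted earlier for the 2013 and 2015 work. All numbers come from the text; the dictionary keys and variable names are illustrative only, and the 2015 entry has no retrieval figure because none is given.

```python
# DNA Fountain (2017): $7000 to synthesize and $2000 to read 2 megabytes.
fountain_write_per_mb = 7000 / 2
fountain_read_per_mb = 2000 / 2

# Dollars per megabyte, as quoted in the article.
costs_per_mb = {
    "EBI 2013": {"write": 12_400, "read": 220},
    "ETH Zurich 2015": {"write": 500, "read": None},  # no read cost stated
    "DNA Fountain 2017": {"write": fountain_write_per_mb,
                          "read": fountain_read_per_mb},
}

for name, cost in costs_per_mb.items():
    read = "n/a" if cost["read"] is None else f"${cost['read']:,.0f}"
    print(f"{name}: write ${cost['write']:,.0f}/MB, read {read}/MB")
```

The comparison is only indicative, since the three figures come from different synthesis methods and years.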
See also: Plant-based digital data storage