There are two conceptualisations of data archaeology: the technical definition and the social science definition.

Data archaeology (also data archeology) in the technical sense refers to the art and science of recovering computer data encoded and/or encrypted in now obsolete media or formats. Data archaeology can also refer to recovering information from damaged electronic formats after natural disasters or human error. It entails the rescue and recovery of old data trapped in outdated, archaic or obsolete storage formats such as floppy disks, magnetic tape and punch cards, and transforming or transferring that data to more usable formats.

Data archaeology in the social sciences usually involves an investigation into the source and history of datasets and the construction of these datasets. It involves mapping out the entire lineage of data, its nature and characteristics, its quality and veracity, and how these affect the analysis and interpretation of the dataset. The findings of data archaeology affect the degree to which the conclusions drawn from data analysis can be trusted.

The term data archaeology originally appeared in 1993 as part of the Global Oceanographic Data Archaeology and Rescue Project (GODAR). The original impetus for data archaeology came from the need to recover computerised records of climatic conditions stored on old computer tape, which can provide valuable evidence for testing theories of climate change. These approaches allowed the reconstruction of an image of the Arctic that had been captured by the Nimbus 2 satellite on September 23, 1966, in higher resolution than ever seen before from this type of data.

NASA also utilises the services of data archaeologists to recover information stored on 1960s-era computer tape, as exemplified by the Lunar Orbiter Image Recovery Project (LOIRP).


Recovery

There is a distinction between data recovery and data intelligibility. One may be able to recover data but not understand it. For data archaeology to be effective, the data must be intelligible.
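The gap between recovery and intelligibility can be illustrated with a minimal sketch. It assumes a hypothetical five-byte sample read from an old tape and tries a few candidate text encodings from Python's standard codec set; the sample bytes and the helper name `decode_attempts` are illustrative assumptions, not part of any particular recovery project.

```python
# Hypothetical sample: five bytes recovered intact from an old tape.
# The bytes are "recovered", but their encoding is not yet known.
recovered = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])


def decode_attempts(raw: bytes, codecs=("ascii", "latin-1", "cp037")) -> dict:
    """Try several candidate encodings and report what each one yields."""
    results = {}
    for name in codecs:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None  # the bytes are not valid in this encoding
    return results


attempts = decode_attempts(recovered)
for codec, text in attempts.items():
    print(f"{codec:8} -> {text!r}")
```

Only the EBCDIC codec (`cp037`) yields readable text here; the ASCII attempt fails outright and the Latin-1 attempt produces accented characters with no meaning. The same bytes are recovered in all three cases, but they are intelligible in only one.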
A term closely related to data archaeology is data lineage. The first step in performing data archaeology is an investigation into the data's lineage. Data lineage entails the history of the data, its source and any alterations or transformations it has undergone. Data lineage can be found in the metadata of a dataset, the paradata of a dataset or any accompanying identifiers (methodological guides etc.). With data archaeology comes methodological transparency, which is the level to which the data user can access the data history. The level of methodological transparency available determines not only how much can be recovered, but also how well the data can be understood. Data lineage investigation covers what instruments were used, what the selection criteria were, the measurement parameters and the sampling frameworks.

In the socio-political sense, data archaeology involves the analysis of data assemblages to reveal their discursive and material socio-technical elements and apparatuses. This kind of analysis can reveal the politics of the data being analysed and thus that of the institution producing them. Archaeology in this sense refers to the provenance of data. It involves mapping the sites, formats and infrastructures through which data flow and are altered or transformed over time. It has an interest in the life of data, and the politics that shape the circulation of data. This serves to expose the key actors, practices and praxes at play and their roles.

It can be accomplished in two steps. The first is accessing and assessing the technical stack of the data (the infrastructure and material technologies used to build or gather the data) to understand the physical representation of the data. The second is analysing the contextual stack of the data, which shapes how the data is constructed, used and analysed. This can be done via a variety of processes: interviews, analysing technical and policy documents, and investigating the effect of the data on a community or the institutional, financial, legal and material framing. This can be attained by creating a data assemblage. Data archaeology charts the way data moves across different sites and can sometimes encounter data friction.
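A data lineage record of the kind described in this section could be sketched as a small data structure. The example below is only an illustration under stated assumptions: the class names, fields and the sample dataset are all hypothetical, and real lineage metadata is usually far richer.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineageStep:
    """One alteration or transformation applied to the dataset."""
    description: str
    performed_by: str
    year: int


@dataclass
class DatasetLineage:
    """Source, instrument and ordered history of a dataset (hypothetical schema)."""
    source: str
    instrument: str
    steps: List[LineageStep] = field(default_factory=list)

    def record(self, description: str, performed_by: str, year: int) -> None:
        self.steps.append(LineageStep(description, performed_by, year))

    def history(self) -> List[str]:
        # Present the steps chronologically, whatever order they were recorded in.
        ordered = sorted(self.steps, key=lambda step: step.year)
        return [f"{s.year}: {s.description} ({s.performed_by})" for s in ordered]


# Hypothetical dataset used purely for illustration.
lineage = DatasetLineage(source="hypothetical survey archive",
                         instrument="paper questionnaire")
lineage.record("migrated to CSV", "archive staff", 2009)
lineage.record("keyed onto punch cards", "data entry team", 1971)

for line in lineage.history():
    print(line)
```

Keeping the instrument and each transformation alongside the data is one way of providing the methodological transparency discussed above: a later user can see where the values came from and what was done to them.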


Disaster recovery

Data archaeologists can also use data recovery after natural disasters such as fires, floods, earthquakes, or even hurricanes. For example, in 1995 during Hurricane Marilyn the National Media Lab assisted the National Archives and Records Administration in recovering data at risk due to damaged equipment. The hardware was damaged by rain, salt water, and sand, yet it was possible to clean some of the disks and refit them with new cases, thus saving the data within.


Recovery techniques

When deciding whether or not to try to recover data, the cost must be taken into account. Given enough time and money, most data can be recovered. In the case of magnetic media, which are the most common type used for data storage, there are various techniques that can be used to recover the data depending on the type of damage.

Humidity can cause tapes to become unusable as they begin to deteriorate and become sticky. In this case, a heat treatment can be applied, causing the oils and residues either to be reabsorbed into the tape or to evaporate off its surface. However, this should only be done to provide access to the data so that it can be extracted and copied to a more stable medium.

Lubrication loss is another source of damage to tapes. This is most commonly caused by heavy use, but can also be a result of improper storage or natural evaporation. As a result of heavy use, some of the lubricant can remain on the read-write heads, which then collect dust and particles. This can cause damage to the tape. Loss of lubrication can be addressed by re-lubricating the tapes. This should be done cautiously, as excessive re-lubrication can cause tape slippage, which in turn can lead to media being misread and the loss of data.

Water exposure will damage tapes over time. This often occurs in a disaster situation. If the media is in salty or dirty water, it should be rinsed in fresh water. The process of cleaning, rinsing, and drying wet tapes should be done at room temperature in order to prevent heat damage. Older tapes should be recovered before newer tapes, as they are more susceptible to water damage.

The next step (after investigating the data lineage) is to establish what counts as good data and bad data, to ensure that only the 'good' data gets migrated to the new data warehouse or repository. A good example of bad data, in the technical sense, is test data.
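Separating good data from bad before migration can be sketched as a simple filter. The rule used here (a record counts as test data if it carries an `is_test` flag or its name starts with "test") is a hypothetical assumption chosen for illustration; real criteria would come from the lineage investigation.

```python
def is_test_record(record: dict) -> bool:
    """Hypothetical rule: flagged as test, or name starts with 'test'."""
    name = str(record.get("name", "")).strip().lower()
    return bool(record.get("is_test")) or name.startswith("test")


def split_good_and_bad(records):
    """Return (good, bad) lists so that only 'good' records are migrated."""
    good, bad = [], []
    for record in records:
        (bad if is_test_record(record) else good).append(record)
    return good, bad


# Illustrative records only; field names are assumptions.
records = [
    {"id": 1, "name": "Station A", "is_test": False},
    {"id": 2, "name": "TEST station", "is_test": False},
    {"id": 3, "name": "Station B", "is_test": True},
    {"id": 4, "name": "Station C"},
]

good, bad = split_good_and_bad(records)
print("migrate:", [r["id"] for r in good])
print("hold back:", [r["id"] for r in bad])
```

Holding the rejected records in a separate list rather than discarding them keeps the decision reviewable, which matters when the criteria for bad data are themselves uncertain.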


Prevention

To prevent the need for data archaeology, creators and holders of digital documents should take care to employ digital preservation. Another effective preventive measure is the use of offshore backup facilities that would not be affected should a disaster occur. From these backup servers, copies of the lost data could easily be retrieved. A multi-site and multi-technique data distribution plan is advised for optimal data recovery, especially when dealing with big data. The TCP/IP method, snapshot recovery, mirror sites and tapes safeguarding data in a private cloud are also all good preventive methods, as is transferring data daily from mirror sites to the emergency servers.
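A multi-site plan only helps if the copies actually match. One common way to check this, shown here as a minimal sketch rather than a prescribed method, is to compare cryptographic checksums of the primary file and each mirror. The function names and the temporary files standing in for backup sites are assumptions made for the example.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_copies(primary: Path, mirrors: List[Path]) -> List[Path]:
    """Return mirrors that are missing or differ from the primary copy."""
    expected = sha256_of(primary)
    return [m for m in mirrors if not m.exists() or sha256_of(m) != expected]


# Demonstration with temporary files standing in for separate sites.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    primary = root / "primary.dat"
    mirror_a = root / "mirror_a.dat"
    mirror_b = root / "mirror_b.dat"

    primary.write_bytes(b"climate records 1966")
    mirror_a.write_bytes(b"climate records 1966")   # faithful copy
    mirror_b.write_bytes(b"climate rec\x00rds 1966")  # simulated corruption

    damaged = mismatched_copies(primary, [mirror_a, mirror_b])
    damaged_names = [p.name for p in damaged]
    print("copies needing repair:", damaged_names)
```

Running such a check on a schedule, for instance alongside the daily transfer to emergency servers, would surface silent corruption while an intact copy still exists elsewhere.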


See also

* Bit rot
* Data curation
* Data preservation
* Digital dark age
* Digital preservation
* Knowledge discovery


Further reading

* O'Donnell, James Joseph. ''Avatars of the Word: From Papyrus to Cyberspace''. Harvard University Press, 1998.
* Dumit, J. and Nafus, D. (2018) ‘The other ninety per cent: Thinking with data science, creating data studies,’ in Knox, H. and Nafus, D. (eds), Ethnography for a Data-Saturated World. Manchester University Press, Manchester, pp. 252–274.


External links


World Wide Words: Data Archaeology