In
information technology
Information technology (IT) is the use of computers to create, process, store, retrieve, and exchange all kinds of data . and information. IT forms part of information and communications technology (ICT). An information technology system (I ...
, a backup, or data backup is a copy of
computer data
In computer science, data (treated as singular, plural, or as a mass noun) is any sequence of one or more symbols; datum is a single symbol of data. Data requires interpretation to become information. Digital data is data that is represented u ...
taken and stored elsewhere so that it may be used to restore the original after a
data loss Data loss is an error condition in information systems in which information is destroyed by failures (like failed spindle motors or head crashes on hard drives) or neglect (like mishandling, careless handling or storage under unsuitable conditions) ...
event. The verb form, referring to the process of doing so, is "
back up
In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is " back up", ...
", whereas the noun and adjective form is "
backup
In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is "back up", w ...
".
Backups can be used to recover data after its loss from
data deletion or
corruption
Corruption is a form of dishonesty or a criminal offense which is undertaken by a person or an organization which is entrusted in a position of authority, in order to acquire illicit benefits or abuse power for one's personal gain. Corruption m ...
, or to recover data from an earlier time.
Backups provide a simple form of
disaster recovery
Disaster recovery is the process of maintaining or reestablishing vital infrastructure and systems following a natural or human-induced disaster, such as a storm or battle.It employs policies, tools, and procedures. Disaster recovery focuses on t ...
; however not all backup systems are able to reconstitute a computer system or other complex configuration such as a
computer cluster
A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.
The comp ...
,
active directory
Active Directory (AD) is a directory service developed by Microsoft for Windows domain networks. It is included in most Windows Server operating systems as a set of processes and services. Initially, Active Directory was used only for centralize ...
server, or
database server
A database server is a server which uses a database application that provides database services to other computer programs or to computers, as defined by the client–server model. Database management systems (DBMSs) frequently provide database- ...
.
A backup system contains at least one copy of all data considered worth saving. The
data storage
Data storage is the recording (storing) of information (data) in a storage medium. Handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Biological molecules such as RNA and DNA are conside ...
requirements can be large. An
information repository
In information technology, an information repository or simply a repository is "a central place in which an aggregation of data is kept and maintained in an organized way, usually in computer storage." It "may be just the aggregation of data itse ...
model may be used to provide structure to this storage. There are different types of
data storage device
Data storage is the recording (storing) of information (data) in a storage medium. Handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Biological molecules such as RNA and DNA are conside ...
s used for copying backups of data that is already in secondary storage onto
archive file
In computing, an archive file is a computer file that is composed of one or more files along with metadata. Archive files are used to collect multiple data files together into a single file for easier portability and storage, or simply to compress ...
s.
[In contrast to everyday use of the term "archive", the data stored in an "archive file" is not necessarily old or of historical interest.] There are also different ways these devices can be arranged to provide geographic dispersion,
data security
Data security means protecting digital data, such as those in a database, from destructive forces and from the unwanted actions of unauthorized users, such as a cyberattack or a data breach.
Technologies
Disk encryption
Disk encryption refe ...
, and
portability.
Data is selected, extracted, and manipulated for storage. The process can include methods for
dealing with live data, including open files, as well as compression, encryption, and
de-duplication. Additional techniques apply to
enterprise client-server backup
In information technology, a backup, or data backup is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data loss event. The verb form, referring to the process of doing so, is " back up", ...
. Backup schemes may include
dry runs that validate the reliability of the data being backed up. There are limitations
and human factors involved in any backup scheme.
Storage
A backup strategy requires an information repository, "a secondary storage space for data"
that aggregates backups of data "sources". The repository could be as simple as a list of all backup media (DVDs, etc.) and the dates produced, or could include a computerized index, catalog, or
relational database
A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...
.
The backup data needs to be stored, requiring a
backup rotation scheme
A backup rotation scheme is a system of backing up data to computer media (such as tapes) that minimizes, by re-use, the number of media used. The scheme determines how and when each piece of removable storage is used for a backup job and how long ...
,
which is a system of backing up data to computer media that limits the number of backups of different dates retained separately, by appropriate re-use of the data storage media by overwriting of backups no longer needed. The scheme determines how and when each piece of removable storage is used for a backup operation and how long it is retained once it has backup data stored on it. The 3-2-1 rule can aid in the backup process. It states that there should be at least 3 copies of the data, stored on 2 different types of storage media, and one copy should be kept offsite, in a remote location (this can include
cloud storage
Cloud storage is a model of computer data storage in which the digital data is stored in logical pools, said to be on "the cloud". The physical storage spans multiple servers (sometimes in multiple locations), and the physical environment is t ...
). 2 or more different media should be used to eliminate data loss due to similar reasons (for example, optical discs may tolerate being underwater while LTO tapes may not, and SSDs cannot fail due to
head crash
A head crash is a hard-disk failure that occurs when a read–write head of a hard disk drive makes contact with its rotating platter, slashing its surface and permanently damaging its magnetic media. It is most often caused by a sudden seve ...
es or damaged spindle motors since they don't have any moving parts, unlike hard drives). An offsite copy protects against fire, theft of physical media (such as tapes or discs) and natural disasters like floods and earthquakes. Disaster protected hard drives like those made by
ioSafe
ioSafe is a manufacturer of disaster protected hard drives and network attached storage (NAS) appliances. The company was founded in 2004 and is based in Roseville, California. ioSafe's storage systems are optimized for heat from fire and comple ...
are an alternative to an offsite copy, but they have limitations like only being able to resist fire for a limited period of time, so an offsite copy still remains as the ideal choice.
Backup methods
Unstructured
An unstructured repository may simply be a stack of tapes, DVD-Rs or external HDDs with minimal information about what was backed up and when. This method is the easiest to implement, but unlikely to achieve a high level of recoverability as it lacks automation.
Full only/System imaging
A repository using this backup method contains complete source data copies taken at one or more specific points in time. Copying
system image
In computing, a system image is a serialized copy of the entire state of a computer system stored in some non-volatile form such as a file. A system is said to be capable of using system images if it can be shut down and later restored to exactly ...
s, this method is frequently used by computer technicians to record known good configurations. However, imaging is generally more useful as a way of deploying a standard configuration to many systems rather than as a tool for making ongoing backups of diverse systems.
Incremental
An
incremental backup
An incremental backup is one in which successive copies of the data contain only the portion that has changed since the preceding backup copy was made. When a full recovery is needed, the restoration process would need the last full backup plus al ...
stores data changed since a reference point in time. Duplicate copies of unchanged data aren't copied. Typically a full backup of all files is once or at infrequent intervals, serving as the reference point for an incremental repository. Subsequently, a number of incremental backups are made after successive time periods. Restores begin with the last full backup and then apply the incrementals.
Some backup systems
can create a from a series of incrementals, thus providing the equivalent of frequently doing a full backup. When done to modify a single archive file, this speeds restores of recent versions of files.
Near-CDP
Continuous Data Protection Continuous data protection (CDP), also called continuous backup or real-time backup, refers to backup of computer data by automatically saving a copy of every change made to that data, essentially capturing every version of the data that the user s ...
(CDP) refers to a backup that instantly saves a copy of every change made to the data. This allows restoration of data to any point in time and is the most comprehensive and advanced data protection.
Near-CDP backup applications—often
marketed
Marketing is the process of exploring, creating, and delivering value to meet the needs of a target market in terms of goods and services; potentially including selection of a target audience; selection of certain attributes or themes to emph ...
as "CDP"—automatically take incremental backups at a specific interval, for example every 15 minutes, one hour, or 24 hours. They can therefore only allow restores to an interval boundary.
Near-CDP backup applications use
journaling and are typically based on periodic "snapshots",
read-only copies of the data frozen at a particular
point in time.
Near-CDP (except for
Apple Time Machine)
intent-logs every change on the host system,
often by saving byte or block-level differences rather than file-level differences. This backup method differs from simple
disk mirroring
In data storage, disk mirroring is the replication of logical disk volumes onto separate physical hard disks in real time to ensure continuous availability. It is most commonly used in RAID 1. A mirrored volume is a complete logical representat ...
in that it enables a roll-back of the log and thus a restoration of old images of data. Intent-logging allows precautions for the consistency of live data, protecting ''self-consistent'' files but requiring ''applications'' "be quiesced and made ready for backup."
Near-CDP is more practicable for ordinary personal backup applications, as opposed to ''true'' CDP, which must be run in conjunction with a virtual machine
or equivalent
and is therefore generally used in enterprise client-server backups.
Software may create copies of individual files such as written documents, multimedia projects, or user preferences, to prevent failed write events caused by power outages, operating system crashes, or exhausted disk space, from causing data loss. A common implementation is an appended
".bak" extension to the
file name
A filename or file name is a name used to uniquely identify a computer file in a directory structure. Different file systems impose different restrictions on filename lengths.
A filename may (depending on the file system) include:
* name &ndas ...
.
Reverse incremental
A
Reverse incremental backup method stores a recent archive file "mirror" of the source data and a series of differences between the "mirror" in its current state and its previous states. A reverse incremental backup method starts with a non-image full backup. After the full backup is performed, the system periodically synchronizes the full backup with the live copy, while storing the data necessary to reconstruct older versions. This can either be done using
hard links
In computing, a hard link is a directory entry (in a directory-based file system) that associates a name with a file. Thus, each file must have at least one hard link. Creating additional hard links for a file makes the contents of that file acc ...
—as Apple Time Machine does, or using binary
diffs.
Differential
A
differential backup
A differential backup is a type of data backup that preserves data, saving only the difference in the data since the last full backup. The rationale in this is that, since changes to data are generally few compared to the entire amount of data in t ...
saves only the data that has changed since the last full backup. This means a maximum of two backups from the repository are used to restore the data. However, as time from the last full backup (and thus the accumulated changes in data) increases, so does the time to perform the differential backup. Restoring an entire system requires starting from the most recent full backup and then applying just the last differential backup.
A differential backup copies files that have been created or changed since the last full backup, regardless of whether any other differential backups have been made since, whereas an incremental backup copies files that have been created or changed since the most recent backup of any type (full or incremental). Changes in files may be detected through a more recent date/time of last modification
file attribute File attributes are a type of meta-data that describe and may modify how files and/or directories in a filesystem behave. Typical file attributes may, for example, indicate or specify whether a file is visible, modifiable, compressed, or encrypted. ...
, and/or changes in file size. Other variations of incremental backup include multi-level incrementals and block-level incrementals that compare parts of files instead of just entire files.
Storage media
Regardless of the repository model that is used, the data has to be copied onto an archive file data storage medium. The medium used is also referred to as the type of backup destination.
Magnetic tape
Magnetic tape
Magnetic tape is a medium for magnetic storage made of a thin, magnetizable coating on a long, narrow strip of plastic film. It was developed in Germany in 1928, based on the earlier magnetic wire recording from Denmark. Devices that use magne ...
was for a long time the most commonly used medium for bulk data storage, backup, archiving, and interchange. It was previously a less expensive option, but this is no longer the case for smaller amounts of data.
Tape is a
sequential access
Sequential access is a term describing a group of elements (such as data in a memory array or a disk file or on magnetic tape data storage) being accessed in a predetermined, ordered sequence. It is the opposite of random access, the ability to ...
medium, so the rate of continuously writing or reading data can be very fast. While tape media itself has a low cost per space,
tape drive
A tape drive is a data storage device that reads and writes data on a magnetic tape. Magnetic tape data storage is typically used for offline, archival data storage. Tape media generally has a favorable unit cost and a long archival stability.
...
s are typically dozens of times as expensive as
hard disk drive
A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnet ...
s and
optical drive
In computing, an optical disc drive is a disk drive, disc drive that uses laser light or electromagnetic waves within or near the visible light spectrum as part of the process of reading or writing data to or from optical discs. Some drives ...
s.
Many tape formats have been proprietary or specific to certain markets like mainframes or a particular brand of personal computer. By 2014
LTO had become the primary tape technology.
The other remaining viable "super" format is the
IBM 3592
The IBM 3592 is a series of tape drives and corresponding magnetic tape data storage media formats developed by IBM. The first drive, having the IBM product number 3592, was introduced under the nickname ''Jaguar''. The next drive was the TS112 ...
(also referred to as the TS11xx series). The
Oracle StorageTek T10000 was discontinued in 2016.
Hard disk
The use of
hard disk
A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnet ...
storage has increased over time as it has become progressively cheaper. Hard disks are usually easy to use, widely available, and can be accessed quickly.
However, hard disk backups are
close-tolerance mechanical devices and may be more easily damaged than tapes, especially while being transported.
In the mid-2000s, several drive manufacturers began to produce portable drives employing
ramp loading and accelerometer technology (sometimes termed a "shock sensor"),
and by 2010 the industry average in drop tests for drives with that technology showed drives remaining intact and working after a 36-inch non-operating drop onto industrial carpeting.
Some manufacturers also offer 'ruggedized' portable hard drives, which include a shock-absorbing case around the hard disk, and
claim a range of higher drop specifications.
[
] Over a period of years the stability of hard disk backups is shorter than that of tape backups.
External hard disks can be connected via local interfaces like
SCSI
Small Computer System Interface (SCSI, ) is a set of standards for physically connecting and transferring data between computers and peripheral devices. The SCSI standards define commands, protocols, electrical, optical and logical interface ...
,
USB
Universal Serial Bus (USB) is an industry standard that establishes specifications for cables, connectors and protocols for connection, communication and power supply (interfacing) between computers, peripherals and other computers. A broad ...
,
FireWire
IEEE 1394 is an interface standard for a serial bus for high-speed communications and isochronous real-time data transfer. It was developed in the late 1980s and early 1990s by Apple in cooperation with a number of companies, primarily Sony an ...
, or
eSATA
SATA (Serial AT Attachment) is a computer bus interface that connects host bus adapters to mass storage devices such as hard disk drives, optical drives, and solid-state drives. Serial ATA succeeded the earlier Parallel ATA (PATA) standard t ...
, or via longer-distance technologies like
Ethernet
Ethernet () is a family of wired computer networking technologies commonly used in local area networks (LAN), metropolitan area networks (MAN) and wide area networks (WAN). It was commercially introduced in 1980 and first standardized in 198 ...
,
iSCSI
Internet Small Computer Systems Interface or iSCSI ( ) is an Internet Protocol-based storage networking standard for linking data storage facilities. iSCSI provides block-level access to storage devices by carrying SCSI commands over a TCP/IP ...
, or
Fibre Channel
Fibre Channel (FC) is a high-speed data transfer protocol providing in-order, lossless delivery of raw block data. Fibre Channel is primarily used to connect computer data storage to servers in storage area networks (SAN) in commercial data cen ...
. Some disk-based backup systems, via
Virtual Tape Libraries or otherwise, support data deduplication, which can reduce the amount of disk storage capacity consumed by daily and weekly backup data.
Optical storage
Optical storage
IBM defines optical storage as "any storage method that uses a laser to store and retrieve data from optical media." '' Britannica'' notes that it "uses low-power laser beams to record and retrieve digital (binary) data." Compact disc (CD) an ...
uses lasers to store and retrieve data. Recordable
CDs, DVDs, and
Blu-ray Disc
The Blu-ray Disc (BD), often known simply as Blu-ray, is a Digital media, digital optical disc data storage format. It was invented and developed in 2005 and released on June 20, 2006 worldwide. It is designed to supersede the DVD format, and c ...
s are commonly used with personal computers and are generally cheap. In the past, the capacities and speeds of these discs have been lower than hard disks or tapes, although advances in optical media are slowly shrinking that gap.
Potential future data losses caused by gradual
media degradation can be
predicted by
measuring the rate of correctable minor data errors, of which consecutively too many increase the risk of uncorrectable sectors. Support for error scanning varies among
optical drive
In computing, an optical disc drive is a disk drive, disc drive that uses laser light or electromagnetic waves within or near the visible light spectrum as part of the process of reading or writing data to or from optical discs. Some drives ...
vendors.
Many optical disc formats are
WORM
Worms are many different distantly related bilateral animals that typically have a long cylindrical tube-like body, no limbs, and no eyes (though not always).
Worms vary in size from microscopic to over in length for marine polychaete wor ...
type, which makes them useful for archival purposes since the data cannot be changed. Moreover, optical discs are
not vulnerable to
head crash
A head crash is a hard-disk failure that occurs when a read–write head of a hard disk drive makes contact with its rotating platter, slashing its surface and permanently damaging its magnetic media. It is most often caused by a sudden seve ...
es, magnetism, imminent water ingress or
power surges, and a fault of the drive typically just halts the spinning.
Optical media is
modular
Broadly speaking, modularity is the degree to which a system's components may be separated and recombined, often with the benefit of flexibility and variety in use. The concept of modularity is used primarily to reduce complexity by breaking a sy ...
; the storage controller is not tied to media itself like with hard drives or flash storage (→
flash memory controller
A flash memory controller (or flash controller) manages data stored on flash memory (usually NAND flash) and communicates with a computer or electronic device. Flash memory controllers can be designed for operating in low duty-cycle environments l ...
), allowing it to be removed and accessed through a different drive. However, recordable media may degrade earlier under long-term exposure to light.
Some optical storage systems allow for cataloged data backups without human contact with the discs, allowing for longer data integrity. A French study in 2008 indicated that the lifespan of typically-sold
CD-Rs was 2–10 years,
but one manufacturer later estimated the longevity of its CD-Rs with a gold-sputtered layer to be as high as 100 years. Sony's