Data scrubbing is an error correction technique that uses a background task to periodically inspect main memory or storage for errors, then corrects detected errors using redundant data in the form of different checksums or copies of data. Data scrubbing reduces the likelihood that single correctable errors will accumulate, leading to reduced risks of uncorrectable errors.
Data integrity is a high-priority concern in writing, reading, storage, transmission, or processing of data in computer operating systems and in computer storage and data transmission systems. However, only a few of the file systems currently in use provide sufficient protection against data corruption.
To address this issue, data scrubbing routinely checks data for inconsistencies and, more generally, helps guard against hardware or software failure. This "scrubbing" feature commonly occurs in memory, disk arrays, file systems, and FPGAs as a mechanism of error detection and correction.
RAID
With data scrubbing, a RAID controller may periodically read all hard disk drives in a RAID array and check for defective blocks before applications might actually access them. This reduces the probability of silent data corruption and data loss due to bit-level errors.
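As a rough illustration of what such a check can involve on a single-parity array, the following sketch verifies one stripe and rebuilds one unreadable block from the XOR parity. It assumes a RAID 5-style layout; the function names and the use of None to mark an unreadable block are hypothetical and do not come from any particular controller.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub_stripe(data: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Check one stripe; rebuild a single unreadable block (None) from parity.

    Hypothetical helper: a real controller works on raw sectors and also
    has to handle a damaged parity block, which this sketch ignores.
    """
    missing = [i for i, block in enumerate(data) if block is None]
    if len(missing) > 1:
        raise ValueError("single parity cannot rebuild more than one block")
    if missing:
        # XOR of the surviving data blocks and the parity yields the lost block.
        survivors = [block for block in data if block is not None]
        repaired = list(data)
        repaired[missing[0]] = xor_blocks(survivors + [parity])
        return repaired
    if xor_blocks(list(data)) != parity:
        # Parity alone shows that the stripe is inconsistent, not which block is bad.
        raise ValueError("parity mismatch: stripe is inconsistent")
    return list(data)
```

Running this over every stripe in the background, rather than waiting for an application read, is what lets a latent defect be found while the redundancy needed to repair it is still intact.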
In Dell PowerEdge RAID environments, a feature called "patrol read" can perform data scrubbing and preventive maintenance.
In OpenBSD, the bioctl(8) utility allows the system administrator to control these patrol reads through the BIOCPATROL ioctl on the /dev/bio pseudo-device; as of 2019, this functionality is supported in some device drivers for LSI Logic and Dell controllers, including mfi(4) since OpenBSD 5.8 (2015) and mfii(4) since OpenBSD 6.4 (2018).
In FreeBSD and DragonFly BSD, patrol reads can be controlled through a RAID controller-specific utility, mfiutil(8), since FreeBSD 8.0 (2009) and 7.3 (2010). The implementation from FreeBSD was used by the OpenBSD developers for adding patrol support to their generic bio(4) framework and the bioctl utility, without a need for a separate controller-specific utility.
In NetBSD, the bio(4) framework from OpenBSD was extended in 2008 to support consistency checks. The feature was implemented for the /dev/bio pseudo-device under the BIOCSETSTATE ioctl command, with options to start and stop a check (BIOC_SSCHECKSTART_VOL and BIOC_SSCHECKSTOP_VOL, respectively); as of 2019, it is supported by only a single driver, arcmsr(4).
Linux MD RAID, as a software RAID implementation, makes data consistency checks available and provides automated repair of detected data inconsistencies. Such procedures are usually performed by setting up a weekly cron job. Maintenance is performed by issuing the operations "check", "repair", or "idle" to each of the examined MD devices. The status of every operation, as well as the general RAID status, is always available.
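On Linux, these operations are issued by writing the operation name to the sync_action file that the md driver exposes under sysfs for each array. The sketch below wraps that write in Python; the helper names and the md0 device name are illustrative assumptions, and the sysfs_root parameter exists only so the code can be exercised against a scratch directory.

```python
from pathlib import Path

# Operations named in the text above.
VALID_ACTIONS = {"check", "repair", "idle"}


def sync_action_path(md_name: str, sysfs_root: str = "/sys") -> Path:
    """Location of the sync_action control file for one MD device."""
    return Path(sysfs_root) / "block" / md_name / "md" / "sync_action"


def request_md_action(md_name: str, action: str, sysfs_root: str = "/sys") -> None:
    """Ask the md driver to start a check or repair, or to go idle."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    sync_action_path(md_name, sysfs_root).write_text(action + "\n")


def current_md_action(md_name: str, sysfs_root: str = "/sys") -> str:
    """Read back what the array is currently doing."""
    return sync_action_path(md_name, sysfs_root).read_text().strip()
```

A weekly cron job would typically call something like request_md_action("md0", "check") as root for each array, and progress can then be followed in /proc/mdstat.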
File systems
Btrfs
As a copy-on-write (CoW) file system for Linux, Btrfs provides fault isolation, corruption detection and correction, and file-system scrubbing. If the file system detects a checksum mismatch while reading a block, it first tries to obtain (or create) a good copy of this block from another device if its internal mirroring or RAID techniques are in use.
Btrfs can initiate an online check of the entire file system by triggering a file system scrub job that is performed in the background. The scrub job scans the entire file system for integrity, reports any bad blocks it finds along the way, and automatically attempts to repair them.
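The verify-then-repair logic described above can be sketched in a few lines. This is a simplified model rather than Btrfs code: each device is represented as a dictionary of blocks, zlib.crc32 stands in for the file system's checksum (Btrfs defaults to CRC-32C), and the function name is hypothetical.

```python
import zlib
from typing import Dict, List, Tuple


def scrub_mirrored_blocks(
    copies: List[Dict[int, bytes]],
    checksums: Dict[int, int],
) -> Tuple[List[int], List[int]]:
    """Verify every block on every mirror copy against its stored checksum.

    Bad copies are overwritten from a copy that still matches. Returns the
    block numbers that were repaired and those with no good copy left.
    """
    repaired: List[int] = []
    unrecoverable: List[int] = []
    for block_no, expected in checksums.items():
        # Look for any copy whose contents still match the checksum.
        good = next(
            (copy[block_no] for copy in copies
             if zlib.crc32(copy[block_no]) == expected),
            None,
        )
        if good is None:
            unrecoverable.append(block_no)
            continue
        fixed = False
        for copy in copies:
            if zlib.crc32(copy[block_no]) != expected:
                copy[block_no] = good  # rewrite the damaged copy in place
                fixed = True
        if fixed:
            repaired.append(block_no)
    return repaired, unrecoverable
```

On a real system, the equivalent scan is started with the btrfs scrub start command and monitored with btrfs scrub status.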
ReFS
ReFS features automatic data scrubbing. Files that should not be scrubbed can be marked with the FILE_ATTRIBUTE_NO_SCRUB_DATA flag.
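As a small illustration, the flag can be tested as a bit in a file's Windows attribute mask. The sketch below uses the constant of the same name from Python's standard stat module; the helper name is hypothetical.

```python
import stat


def is_excluded_from_scrub(file_attributes: int) -> bool:
    """True if the attribute mask carries FILE_ATTRIBUTE_NO_SCRUB_DATA."""
    return bool(file_attributes & stat.FILE_ATTRIBUTE_NO_SCRUB_DATA)
```

On Windows, the attribute mask for a path is available as os.stat(path).st_file_attributes; on other platforms that field does not exist, so the helper is only meaningful for masks obtained from a Windows system.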
ZFS
The features of ZFS, which is a combined file system and logical volume manager, include verification against data corruption, continuous integrity checking, and automatic repair. Sun Microsystems designed ZFS from the ground up with a focus on data integrity and to protect the data on disks against issues such as disk firmware bugs and ghost writes.
ZFS provides a repair utility called scrub that examines and repairs silent data corruption caused by data rot and other problems.
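In practice, a scrub is requested per storage pool with the zpool scrub command, and its progress appears in the output of zpool status. The following is a minimal sketch of driving those two commands from a script; the wrapper function names and the pool name tank are illustrative assumptions.

```python
import subprocess
from typing import List


def zpool_scrub_command(pool: str) -> List[str]:
    """Argument vector that asks ZFS to start scrubbing a pool."""
    return ["zpool", "scrub", pool]


def zpool_status_command(pool: str) -> List[str]:
    """Argument vector that reports pool health and scrub progress."""
    return ["zpool", "status", pool]


def start_scrub(pool: str) -> None:
    """Start a scrub; raises CalledProcessError if zpool reports failure."""
    subprocess.run(zpool_scrub_command(pool), check=True)


def scrub_report(pool: str) -> str:
    """Return the textual status output for a pool."""
    result = subprocess.run(
        zpool_status_command(pool), check=True, capture_output=True, text=True
    )
    return result.stdout
```

A scheduled job might call start_scrub("tank") and later log the result of scrub_report("tank"); both require ZFS to be installed and sufficient privileges.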
Memory
Due to the high integration density of contemporary computer memory chips, the individual memory cell structures have become small enough to be vulnerable to cosmic rays and/or alpha particle emission. The errors caused by these phenomena are called soft errors. This can be a problem for DRAM- and SRAM-based memories.
Memory scrubbing detects and corrects bit errors in computer RAM by using ECC memory, other copies of the data, or other error-correction codes.
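To show how an error-correcting code lets a scrubber repair a flipped bit, the sketch below uses a Hamming(7,4) code, which protects four data bits with three parity bits. This is a teaching-sized stand-in chosen for brevity: ECC memory generally applies wider codes that also detect double-bit errors, and all function names here are hypothetical.

```python
from typing import List, Tuple


def hamming74_encode(nibble: int) -> List[int]:
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    code = [0] * 8  # index 0 unused so indices match bit positions
    code[3], code[5], code[6], code[7] = [(nibble >> i) & 1 for i in range(4)]
    code[1] = code[3] ^ code[5] ^ code[7]  # parity over positions 1,3,5,7
    code[2] = code[3] ^ code[6] ^ code[7]  # parity over positions 2,3,6,7
    code[4] = code[5] ^ code[6] ^ code[7]  # parity over positions 4,5,6,7
    return code[1:]


def hamming74_scrub(word: List[int]) -> Tuple[List[int], int]:
    """Correct at most one flipped bit; return (word, position fixed or 0)."""
    code = [0] + list(word)
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position  # XOR of set-bit positions is 0 when valid
    if syndrome:
        code[syndrome] ^= 1  # the syndrome is the position of the bad bit
    return code[1:], syndrome


def hamming74_decode(word: List[int]) -> int:
    """Extract the 4 data bits from a (corrected) codeword."""
    code = [0] + list(word)
    return code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
```

A scrubber would read each stored word, run the correction step, and write the word back when the syndrome is non-zero, so that a second upset in the same word does not turn a correctable error into an uncorrectable one. With this particular code, two flipped bits in one word are miscorrected rather than detected.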
FPGA
Scrubbing is a technique used to reprogram an FPGA. It can be used periodically to avoid the accumulation of errors without the need to find one in the configuration bitstream, thus simplifying the design.
Scrubbing approaches range from simply reprogramming the whole FPGA to partial reconfiguration. The simplest method is to completely reprogram the FPGA at a fixed interval (typically about 1/10 of the calculated mean time between upsets). However, the FPGA is not operational during reprogramming, which takes on the order of microseconds to milliseconds. For situations that cannot tolerate that interruption, partial reconfiguration allows the FPGA to be reprogrammed while it remains operational.
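One way to read the 1/10 guideline is that the device should be scrubbed about ten times as often as upsets are expected. The arithmetic under that interpretation is sketched below; the function names, the default margin of 10, and the example figures in the note are assumptions for illustration only.

```python
def scrub_interval_seconds(upsets_per_second: float, margin: float = 10.0) -> float:
    """Interval between full reprograms, given an expected upset rate.

    With margin=10 the FPGA is scrubbed ten times per expected upset.
    """
    if upsets_per_second <= 0 or margin <= 0:
        raise ValueError("rate and margin must be positive")
    mean_time_between_upsets = 1.0 / upsets_per_second
    return mean_time_between_upsets / margin


def downtime_fraction(reprogram_seconds: float, interval_seconds: float) -> float:
    """Share of time the FPGA is unavailable because it is being reprogrammed."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return reprogram_seconds / interval_seconds
```

For example, an expected rate of one upset per hour gives a scrub interval of about six minutes, and a 10 ms full reprogram at that interval costs a downtime fraction of roughly 0.003 percent. Whether even that brief outage is acceptable is what decides between full reprogramming and partial reconfiguration.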
See also
* Data corruption
* Data degradation
* Error detection and correction
* fsck - a tool for checking the consistency of a file system
* CHKDSK - similar to fsck, used in Windows operating systems
External links
Soft Errors in Electronic Memory