Data Scrubbing
   HOME

TheInfoList



OR:

Data scrubbing is an
error correction In information theory and coding theory with applications in computer science and telecommunication, error detection and correction (EDAC) or error control are techniques that enable reliable delivery of digital data over unreliable communica ...
technique that uses a background task to periodically inspect
main memory Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a computer ...
or
storage Storage may refer to: Goods Containers * Dry cask storage, for storing high-level radioactive waste * Food storage * Intermodal container, cargo shipping * Storage tank Facilities * Garage (residential), a storage space normally used to store car ...
for errors, then corrects detected errors using redundant data in the form of different
checksum A checksum is a small-sized block of data derived from another block of digital data for the purpose of detecting errors that may have been introduced during its transmission or storage. By themselves, checksums are often used to verify data ...
s or copies of data. Data scrubbing reduces the likelihood that single correctable errors will accumulate, leading to reduced risks of uncorrectable errors.
Data integrity Data integrity is the maintenance of, and the assurance of, data accuracy and consistency over its entire Information Lifecycle Management, life-cycle and is a critical aspect to the design, implementation, and usage of any system that stores, proc ...
is a high-priority concern in writing, reading, storage, transmission, or processing of the
computer data In computer science, data (treated as singular, plural, or as a mass noun) is any sequence of one or more symbols; datum is a single symbol of data. Data requires interpretation to become information. Digital data is data that is represented u ...
in computer
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...
s and in computer storage and
data transmission Data transmission and data reception or, more broadly, data communication or digital communications is the transfer and reception of data in the form of a digital bitstream or a digitized analog signal transmitted over a point-to-point o ...
systems. However, only a few of the currently existing and used
file systems In computing, file system or filesystem (often abbreviated to fs) is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one larg ...
provide sufficient protection against
data corruption In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted. ...
. To address this issue, data scrubbing provides routine checks of all inconsistencies in data and, in general, prevention of hardware or software failure. This "scrubbing" feature occurs commonly in memory, disk arrays,
file system In computing, file system or filesystem (often abbreviated to fs) is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one larg ...
s, or
FPGAs A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturinghence the term '' field-programmable''. The FPGA configuration is generally specified using a hardware de ...
as a mechanism of error detection and correction.


RAID

With data scrubbing, a
RAID controller A disk array controller is a device that manages the physical disk drives and presents them to the computer as logical units. It almost always implements hardware RAID, thus it is sometimes referred to as RAID controller. It also often provides a ...
may periodically read all
hard disk drive A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnet ...
s in a RAID array and check for defective blocks before applications might actually access them. This reduces the probability of silent data corruption and data loss due to bit-level errors. In
Dell PowerEdge The PowerEdge (PE) line is Dell's server computer product line. Most PowerEdge servers use the x86 architecture. The early exceptions to this, the PowerEdge 3250, PowerEdge 7150, and PowerEdge 7250, used Intel's Itanium processor, but Dell ab ...
RAID environments, a feature called "patrol read" can perform data scrubbing and
preventive maintenance The technical meaning of maintenance involves functional checks, servicing, repairing or replacing of necessary devices, equipment, machinery, building infrastructure, and supporting utilities in industrial, business, and residential installa ...
. In
OpenBSD OpenBSD is a security-focused, free and open-source, Unix-like operating system based on the Berkeley Software Distribution (BSD). Theo de Raadt created OpenBSD in 1995 by forking NetBSD 1.0. According to the website, the OpenBSD project em ...
, the
bioctl The bio(4) pseudo-device driver and the bioctl(8) utility implement a generic RAID volume management interface in OpenBSD and NetBSD. The idea behind this software is similar to ifconfig, where a single utility from the operating system can be u ...
(8)
utility allows the
system administrator A system administrator, or sysadmin, or admin is a person who is responsible for the upkeep, configuration, and reliable operation of computer systems, especially multi-user computers, such as servers. The system administrator seeks to ensu ...
to control these patrol reads through the BIOCPATROL
ioctl In computing, ioctl (an abbreviation of input/output control) is a system call for device-specific input/output operations and other operations which cannot be expressed by regular system calls. It takes a parameter specifying a request code; th ...
on the /dev/bio
pseudo-device In Unix-like operating systems, a device file or special file is an interface to a device driver that appears in a file system as if it were an ordinary file. There are also special files in DOS, OS/2, and Windows. These special files allow a ...
; as of 2019, this functionality is supported in some device drivers for
LSI Logic LSI Logic Corporation, an American company founded in Milpitas, California, was a pioneer in the ASIC and EDA industries. It evolved over time to design and sell semiconductors and software that accelerated storage and networking in data cente ...
and Dell controllers — this includes mfi(4) since OpenBSD 5.8 (2015) and mfii(4) since OpenBSD 6.4 (2018). In
FreeBSD FreeBSD is a free and open-source Unix-like operating system descended from the Berkeley Software Distribution (BSD), which was based on Research Unix. The first version of FreeBSD was released in 1993. In 2005, FreeBSD was the most popular ...
and
DragonFly BSD DragonFly BSD is a free and open-source Unix-like operating system forked from FreeBSD 4.8. Matthew Dillon, an Amiga developer in the late 1980s and early 1990s and FreeBSD developer between 1994 and 2003, began working on DragonFly BSD in Ju ...
, patrol can be controlled through a
RAID controller A disk array controller is a device that manages the physical disk drives and presents them to the computer as logical units. It almost always implements hardware RAID, thus it is sometimes referred to as RAID controller. It also often provides a ...
-specific utility mfiutil(8) since FreeBSD 8.0 (2009) and 7.3 (2010). The implementation from FreeBSD was used by the OpenBSD developers for adding patrol support to their generic
bio(4) Bio or BIO may refer to: Computing * bio(4), a pseudo-device driver in RAID controller management interface in OpenBSD and NetBSD * Block I/O, a concept in computer data storage Politics * Julius Maada Bio (born 1964), Sierra Leonean politician ...
framework and the
bioctl The bio(4) pseudo-device driver and the bioctl(8) utility implement a generic RAID volume management interface in OpenBSD and NetBSD. The idea behind this software is similar to ifconfig, where a single utility from the operating system can be u ...
utility, without a need for a separate controller-specific utility. In
NetBSD NetBSD is a free and open-source Unix operating system based on the Berkeley Software Distribution (BSD). It was the first open-source BSD descendant officially released after 386BSD was forked. It continues to be actively developed and is a ...
in 2008, the bio(4) framework from OpenBSD was extended to feature support for consistency checks, which was implemented for /dev/bio
pseudo-device In Unix-like operating systems, a device file or special file is an interface to a device driver that appears in a file system as if it were an ordinary file. There are also special files in DOS, OS/2, and Windows. These special files allow a ...
under BIOCSETSTATE
ioctl In computing, ioctl (an abbreviation of input/output control) is a system call for device-specific input/output operations and other operations which cannot be expressed by regular system calls. It takes a parameter specifying a request code; th ...
command, with the options being start and stop (BIOC_SSCHECKSTART_VOL and BIOC_SSCHECKSTOP_VOL, respectively); this is supported only by a single driver as of 2019 — arcmsr(4).
Linux MD RAID mdadm is a Linux utility used to manage and monitor software RAID devices. It is used in modern Linux distributions in place of older software RAID utilities such as raidtools2 or raidtools. mdadm is free software originally maintained by, an ...
, as a
software RAID Raid, RAID or Raids may refer to: Attack * Raid (military), a sudden attack behind the enemy's lines without the intention of holding ground * Corporate raid, a type of hostile takeover in business * Panty raid, a prankish raid by male college ...
implementation, makes data consistency checks available and provides automated repairing of detected data inconsistencies. Such procedures are usually performed by setting up a weekly
cron The cron command-line utility is a job scheduler on Unix-like operating systems. Users who set up and maintain software environments use cron to schedule jobs (commands or shell scripts), also known as cron jobs, to run periodically at fixed ti ...
job. Maintenance is performed by issuing operations ''check'', ''repair'', or ''idle'' to each of the examined MD devices. Statuses of all performed operations, as well as general RAID statuses, are always available.


File systems


Btrfs

As a
copy-on-write Copy-on-write (COW), sometimes referred to as implicit sharing or shadowing, is a resource-management technique used in computer programming to efficiently implement a "duplicate" or "copy" operation on modifiable resources. If a resource is dupl ...
(CoW)
file system In computing, file system or filesystem (often abbreviated to fs) is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one larg ...
for
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
,
Btrfs Btrfs (pronounced as "better F S", "butter F S", "b-tree F S", or simply by spelling it out) is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager (not to be confused ...
provides fault isolation, corruption detection and correction, and file-system scrubbing. If the file system detects a checksum mismatch while reading a block, it first tries to obtain (or create) a good copy of this block from another device if its internal mirroring or RAID techniques are in use. Btrfs can initiate an online check of the entire file system by triggering a file system scrub job that is performed in the background. The scrub job scans the entire file system for integrity and automatically attempts to report and repair any bad blocks it finds along the way.


ZFS

The features of ZFS, which is a combined
file system In computing, file system or filesystem (often abbreviated to fs) is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one larg ...
and
logical volume manager In computer storage, logical volume management or LVM provides a method of allocating space on mass-storage devices that is more flexible than conventional partitioning schemes to store volumes. In particular, a volume manager can concatenate ...
, include the verification against
data corruption In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted. ...
modes, continuous integrity checking, and automatic repair.
Sun Microsystems Sun Microsystems, Inc. (Sun for short) was an American technology company that sold computers, computer components, software, and information technology services and created the Java programming language, the Solaris operating system, ZFS, the ...
designed ZFS from the ground up with a focus on data integrity and to protect the data on disks against issues such as disk firmware bugs and ghost writes. ZFS provides a repair utility called scrub that examines and repairs silent
data corruption In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted. ...
caused by
data rot Bit rot may refer to: * " Bit Rot", a short story by Charles Stross * Data rot, the decay of electromagnetic charge in a computer's storage ** Disc rot Disc rot is the tendency of CD, DVD, or other optical discs to become unreadable because of ...
and other problems.


Memory

Due to the high integration density of contemporary computer memory
chips ''CHiPs'' is an American crime drama television series created by Rick Rosner and originally aired on NBC from September 15, 1977, to May 1, 1983. It follows the lives of two motorcycle officers of the California Highway Patrol (CHP). The serie ...
, the individual memory cell structures became small enough to be vulnerable to
cosmic ray Cosmic rays are high-energy particles or clusters of particles (primarily represented by protons or atomic nuclei) that move through space at nearly the speed of light. They originate from the Sun, from outside of the Solar System in our own ...
s and/or
alpha particle Alpha particles, also called alpha rays or alpha radiation, consist of two protons and two neutrons bound together into a particle identical to a helium-4 nucleus. They are generally produced in the process of alpha decay, but may also be produce ...
emission. The errors caused by these phenomena are called
soft error In electronics and computing, a soft error is a type of error where a signal or datum is wrong. Errors may be caused by a defect, usually understood either to be a mistake in design or construction, or a broken component. A soft error is also a s ...
s. This can be a problem for
DRAM Dynamic random-access memory (dynamic RAM or DRAM) is a type of random-access semiconductor memory that stores each bit of data in a memory cell, usually consisting of a tiny capacitor and a transistor, both typically based on metal-oxid ...
- and SRAM-based memories. ''Memory scrubbing'' does error-detection and correction of bit errors in computer
RAM Ram, ram, or RAM may refer to: Animals * A male sheep * Ram cichlid, a freshwater tropical fish People * Ram (given name) * Ram (surname) * Ram (director) (Ramsubramaniam), an Indian Tamil film director * RAM (musician) (born 1974), Dutch * Ra ...
by using
ECC memory Error correction code memory (ECC memory) is a type of computer data storage that uses an error correction code (ECC) to detect and correct n-bit data corruption which occurs in memory. ECC memory is used in most computers where data corruption c ...
, other copies of the data, or other
error-correction code In computing, telecommunication, information theory, and coding theory, an error correction code, sometimes error correcting code, (ECC) is used for error control, controlling errors in data over unreliable or noisy communication channels. The ce ...
s.


FPGA

''Scrubbing'' is a technique used to reprogram an
FPGA A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by a customer or a designer after manufacturinghence the term '' field-programmable''. The FPGA configuration is generally specified using a hardware de ...
. It can be used periodically to avoid the accumulation of errors without the need to find one in the configuration bitstream, thus simplifying the design. Numerous approaches can be taken with respect to scrubbing, from simply reprogramming the FPGA to partial reconfiguration. The simplest method of scrubbing is to completely reprogram the FPGA at some periodic rate (typically 1/10 the calculated upset rate). However, the FPGA is not operational during that reprogram time, on the order of micro to milliseconds. For situations that cannot tolerate that type of interruption, partial reconfiguration is available. This technique allows the FPGA to be reprogrammed while still operational.


See also

*
Data corruption In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted. ...
*
Data degradation Data degradation is the gradual corruption of computer data due to an accumulation of non-critical failures in a data storage device. The phenomenon is also known as data decay, data rot or bit rot. Example Below are several digital images ill ...
*
Error detection and correction In information theory and coding theory with applications in computer science and telecommunication, error detection and correction (EDAC) or error control are techniques that enable reliable delivery of digital data over unreliable communi ...
*
fsck The system utility fsck (''file system consistency check'') is a tool for checking the consistency of a file system in Unix and Unix-like operating systems, such as Linux, macOS, and FreeBSD. A similar command, CHKDSK, exists in Microsoft Windows ...
- a tool for checking the consistency of a
file system In computing, file system or filesystem (often abbreviated to fs) is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one larg ...
*
CHKDSK In computing, CHKDSK (short for "check disk") is a system tool and command in DOS, Digital Research FlexOS, IBM/Toshiba 4690 OS, IBM OS/2, Microsoft Windows and related operating systems. It verifies the file system integrity of a volume and at ...
- similar to fsck, used in
Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...
operating systems


References


External links


''Soft Errors in Electronic Memory''
{{data Error detection and correction