
Predictive Failure Analysis (PFA) refers to methods intended to predict the imminent failure of systems or components (software or hardware), and potentially to enable mechanisms that avoid or counteract failures, or to recommend maintenance before a failure occurs. For example, a computer can analyze trends in corrected errors to predict future failures of hardware or memory components and proactively enable mechanisms to avoid them. Predictive Failure Analysis was originally used as the term for a proprietary IBM technology for monitoring the likelihood of hard disk drives failing, although the term is now used generically for a variety of technologies for judging the imminent failure of CPUs, memory and I/O devices. See also first failure data capture.


Disks

IBM introduced the term ''PFA'' and its technology in 1992 with reference to its 0662-S1x drive (a 1052 MB Fast-Wide SCSI-2 disk that operated at 5400 rpm). The technology relies on measuring several key (mainly mechanical) parameters of the drive unit, for example the flying height of the heads. The drive firmware compares the measured parameters against predefined thresholds and evaluates the health status of the drive. If the drive appears likely to fail soon, the system sends a notification to the disk controller. The major drawbacks of the technology included:
* the binary result: the only status visible to the host was the presence or absence of a notification
* the unidirectional communication: notifications flowed only from the drive firmware to the host
The technology merged with IntelliSafe to form the Self-Monitoring, Analysis, and Reporting Technology (SMART).
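The threshold comparison described above can be illustrated with a minimal sketch. It is not IBM's firmware logic: the parameter names, the limits and the `failure_predicted` function are all hypothetical, chosen only to show how a set of measurements collapses into a single yes/no notification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Acceptable range for one measured drive parameter (hypothetical values)."""
    low: float
    high: float


# Hypothetical parameters and limits, chosen only for illustration.
THRESHOLDS = {
    "flying_height_nm": Threshold(low=40.0, high=80.0),
    "spin_up_time_ms": Threshold(low=0.0, high=9000.0),
}


def failure_predicted(measurements: dict[str, float]) -> bool:
    """Return True if any known parameter falls outside its threshold range."""
    for name, value in measurements.items():
        limit = THRESHOLDS.get(name)
        if limit is None:
            continue  # parameters without a threshold are ignored in this sketch
        if not (limit.low <= value <= limit.high):
            return True
    return False


healthy = {"flying_height_nm": 55.0, "spin_up_time_ms": 4200.0}
degraded = {"flying_height_nm": 31.0, "spin_up_time_ms": 4200.0}
print(failure_predicted(healthy), failure_predicted(degraded))
```

The single boolean return value mirrors the binary result noted as a drawback: the host learns that a threshold was crossed, but not which parameter crossed it or by how much.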


Processor and Memory

High counts of intermittent RAM errors corrected by ECC can be predictive of future DIMM failures, so automatic offlining of memory and CPU caches can be used to avoid future errors. For example, under the Linux operating system the mcelog daemon will automatically remove from use memory pages showing excessive corrections, and will remove from use processor cores showing excessive correctable cache errors.
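The idea of offlining a page after excessive corrections can be sketched as a per-page counter over a sliding time window. This is a simplified illustration and not mcelog's actual implementation: the class `PageErrorTracker`, its default threshold and its window length are assumptions, and the sketch only records which pages would be offlined rather than asking the kernel to do so.

```python
from collections import defaultdict, deque


class PageErrorTracker:
    """Track corrected errors per memory page (hypothetical sketch)."""

    def __init__(self, threshold: int = 10, window_seconds: float = 86400.0):
        self.threshold = threshold            # errors tolerated inside one window
        self.window_seconds = window_seconds  # length of the sliding window
        self._events = defaultdict(deque)     # page address -> recent timestamps
        self.offlined = set()                 # pages this sketch would retire

    def record_corrected_error(self, page_addr: int, timestamp: float) -> bool:
        """Record one corrected error; return True if the page is newly offlined."""
        if page_addr in self.offlined:
            return False
        events = self._events[page_addr]
        events.append(timestamp)
        # Drop events that have aged out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        if len(events) >= self.threshold:
            self.offlined.add(page_addr)
            del self._events[page_addr]
            return True
        return False


tracker = PageErrorTracker(threshold=3, window_seconds=10.0)
results = [tracker.record_corrected_error(0x1000, t) for t in (0.0, 1.0, 20.0, 21.0, 22.0)]
print(results, sorted(hex(p) for p in tracker.offlined))
```

In the usage example the first two errors age out before the later burst arrives, so only three errors close together in time trigger offlining. Isolated corrected errors spread over a long period do not.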


Optical media

On optical media (CD, DVD and Blu-ray), failures caused by degradation of the media can be predicted, and media of low manufacturing quality can be detected before data loss occurs, by measuring the rate of correctable data errors using software such as QPxTool or Nero DiscSpeed. However, not all vendors and models of optical drives allow error scanning (see the list of devices supported by the disc quality scanning software ''QPxTool'').
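As a rough illustration of judging a disc by its rate of correctable errors, the sketch below summarizes a list of per-interval error counts. The input format, the function `assess_scan` and both limits are hypothetical: they are not taken from QPxTool, Nero DiscSpeed or any disc standard, and real tools report format-specific error types.

```python
from statistics import mean


def assess_scan(error_counts, max_average=50.0, max_peak=280):
    """Summarize correctable-error samples from one scan (hypothetical limits).

    error_counts holds the number of correctable errors seen in each
    sampled interval of the disc.
    """
    if not error_counts:
        raise ValueError("scan contains no samples")
    average = mean(error_counts)
    peak = max(error_counts)
    return {
        "average": average,
        "peak": peak,
        # Flag the disc if either the overall rate or a single spike is too high.
        "at_risk": average > max_average or peak > max_peak,
    }


good_disc = assess_scan([10, 20, 30])
bad_disc = assess_scan([10, 400, 30])
print(good_disc["at_risk"], bad_disc["at_risk"])
```

Checking the peak as well as the average is a design choice in this sketch: a localized defect can produce a short spike of errors that an average over the whole disc would hide.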


References


See also


mcelog: Linux daemon for processing x86 machine checks for predictive failure analysis