In computing, error recovery control (ERC) (Western Digital: time-limited error recovery (TLER), Samsung/Hitachi: command completion time limit (CCTL)) is a feature of hard disks which allows a system administrator to configure the amount of time a drive's firmware is allowed to spend recovering from a read or write error. Limiting the recovery time allows for improved error handling in hardware or software RAID environments. In some cases, there is a conflict as to whether error handling should be undertaken by the hard drive or by the RAID implementation, which leads to drives being marked as unusable and to significant performance degradation, when this could otherwise have been avoided.


Overview

Modern hard drives feature an ability to recover from some read/write errors by internally remapping sectors and performing other forms of self-test and recovery. This process can sometimes take several seconds or (under heavy usage) minutes, during which time the drive is unresponsive. Hardware RAID controllers and software RAID implementations are designed to recognise a drive which does not respond within a few seconds, and mark it as unreliable, indicating that it should be withdrawn from use and the array rebuilt from parity data. This is a long process, degrades performance, and if more drives fail under the resulting additional workload, it may be catastrophic. If the drive itself is inherently reliable but has some bad sectors, then TLER and similar features prevent a disk from being unnecessarily marked as 'failed' by limiting the time spent on correcting detected errors before advising the array controller of a failed operation. The array controller can then handle the data recovery for the limited amount involved, rather than marking the entire drive as faulty.
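
The timing interaction described above can be sketched as a small decision function. This is an illustrative model only: the function name, its parameters and the outcome labels are hypothetical, and real firmware and controllers behave in more nuanced ways.

```python
from typing import Optional


def read_outcome(recovery_needed_s: float,
                 erc_limit_s: Optional[float],
                 controller_timeout_s: float) -> str:
    """Classify what happens when a drive hits a hard-to-read sector.

    recovery_needed_s: time the firmware would need to finish recovery.
    erc_limit_s: ERC/TLER limit in seconds, or None if the feature is disabled.
    controller_timeout_s: how long the RAID layer waits before giving up.
    """
    # With ERC enabled, the drive answers no later than the configured limit.
    if erc_limit_s is None:
        response_time = recovery_needed_s
    else:
        response_time = min(recovery_needed_s, erc_limit_s)

    if response_time > controller_timeout_s:
        return "drive dropped from array"
    if erc_limit_s is not None and recovery_needed_s > erc_limit_s:
        return "sector error reported; array repairs it from redundancy"
    return "drive recovered the sector itself"


print(read_outcome(45.0, None, 8.0))  # drive dropped from array
print(read_outcome(45.0, 7.0, 8.0))   # sector error reported; array repairs it from redundancy
print(read_outcome(2.0, 7.0, 8.0))    # drive recovered the sector itself
```

The 8-second controller timeout and 45-second recovery time are made-up values chosen to show each branch.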


Desktop computers and TLER

In effect, TLER and similar features cap the time available to on-drive error handling, so that the hardware RAID controller or software RAID implementation can handle a problematic error instead. Generally, Western Digital enterprise drives such as the Raptor, Caviar RE2 and RE2-GP (RAID Edition) come with TLER read and write enabled (7 seconds each), while desktop drives such as the Caviar SE, SE16, and GP come with TLER read and write disabled (configured as 0 seconds).


Standalone vs. RAID considerations

It is best for TLER to be enabled when a drive is in a RAID array, to prevent the recovery time from a disk read or write error from exceeding the RAID implementation's timeout threshold. If a drive times out, the hard disk will need to be manually re-added to the array, requiring a rebuild and resynchronization of the hard disk. Enabling TLER seeks to prevent this by interrupting error correction before the timeout, so that failures are reported only for the affected data segments. The result is increased reliability in a RAID array.

In a stand-alone configuration TLER should be disabled. As the drive is not redundant, reporting segments as failed will only increase manual intervention. Without a hardware RAID controller or a software RAID implementation to drop the disk, the normal (no TLER) recovery behaviour is the most stable.

In a software RAID configuration, whether or not TLER is helpful depends on the operating system. For example, in FreeBSD the ATA/CAM stack controls the timeouts, and is set to progressively increase them as timeouts occur. Thus, if a desktop disk without TLER starts delaying its response to a sector read, FreeBSD will retry the read with successively longer timeouts to avoid prematurely dropping the disk out of the array.


Interaction of TLER with the advanced ZFS filesystem

The ZFS filesystem was written to immediately write data to a sector that reports as bad or takes an excessively long time to read (as can happen on non-TLER drives); on most drives this will usually force an immediate remap of the weak sector.


Western Digital Time Limit Error Recovery utility

The utility allows the TLER parameter in the hard disk's firmware settings to be enabled or disabled, letting users choose the best setting for their particular usage as either a stand-alone or RAID drive. The utility is written for DOS, so a DOS bootable disk containing it is required. It operates on, and changes, all compatible Western Digital hard disk drives connected to the computer, so any change affects all of those drives. To change only specific drives, disconnect the other hard drives before running the utility and reconnect them afterwards.

The utility comes with three batch files: one to get the current state of the TLER setting on all the hard drives, one to enable TLER, and one to disable TLER. The included batch file for enabling TLER sets the read and write TLER time to seven seconds. To use a custom timeout value, run the utility directly with the -r# -w# parameters, which specify the time limit in seconds.

Western Digital claims that using the utility on newer drives can damage the firmware and make the disk unusable. The utility is no longer available from Western Digital, and new drives will not be able to have the TLER setting changed. RE disks are only suitable for RAID arrays, and Caviar disks only for non-RAID use. The utility still works for older drives.


smartctl utility

The smartctl utility (part of the smartmontools package) can be used on hard disk drives that fully implement the ATA-8 standard to control the TLER behavior by setting the SCT Error Recovery Control (scterc) parameter. This may not work on all hard disk drives, because some manufacturers have removed support for the ERC parameter from their desktop drives, purportedly to force sales of their more expensive RAID/enterprise models.
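
As a rough illustration, the sketch below builds (but does not run) smartctl command lines for querying and setting the scterc parameter. It assumes smartctl's `-l scterc[,READTIME,WRITETIME]` option, which takes the limits in tenths of a second; the helper names and the `/dev/sda` device path are hypothetical.

```python
def scterc_query(device: str) -> list:
    """Return the argument list that asks a drive for its current ERC limits."""
    return ["smartctl", "-l", "scterc", device]


def scterc_set(device: str, read_s: float, write_s: float) -> list:
    """Return the argument list that sets read and write ERC limits.

    Limits are given here in seconds and converted to the
    100 ms units that smartctl expects; 0 disables the limit.
    """
    if read_s < 0 or write_s < 0:
        raise ValueError("time limits must be non-negative")
    read_ds = round(read_s * 10)
    write_ds = round(write_s * 10)
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


# /dev/sda is only an example device path.
print(" ".join(scterc_query("/dev/sda")))      # smartctl -l scterc /dev/sda
print(" ".join(scterc_set("/dev/sda", 7, 7)))  # smartctl -l scterc,70,70 /dev/sda
```

On many drives the setting is reported not to survive a power cycle, so it may need to be reapplied at boot; check the behaviour of the specific model.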


RAID controllers

Disconnect timeout values vary between hardware RAID controllers from different vendors, so TLER should be set to trigger before the controller times out the drive. For example, the 3ware 9650SE uses 20 seconds as the timeout, while for the LSI Logic controller used in IBM x-series it is 10 seconds. The widely available Intel Matrix RAID / Intel Rapid Storage Technology, embedded in Intel server motherboards and modern desktop motherboards, is a pseudo-hardware controller, not a true hardware RAID controller.
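
Using only the two controller timeouts quoted above, a minimal sketch can show how much headroom a given TLER limit leaves. The function name and dictionary are hypothetical; a negative result means the controller would give up before the drive reports the error.

```python
# Timeouts quoted in the text, in seconds.
CONTROLLER_TIMEOUT_S = {
    "3ware 9650SE": 20,
    "LSI Logic (IBM x-series)": 10,
}


def erc_headroom(erc_limit_s: float) -> dict:
    """Return controller timeout minus the ERC limit for each controller."""
    return {name: timeout - erc_limit_s
            for name, timeout in CONTROLLER_TIMEOUT_S.items()}


# The 7-second default fits under both timeouts.
print(erc_headroom(7))   # {'3ware 9650SE': 13, 'LSI Logic (IBM x-series)': 3}
# A 15-second limit would be too long for the 10-second controller.
print(erc_headroom(15))  # {'3ware 9650SE': 5, 'LSI Logic (IBM x-series)': -5}
```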


Software RAID

Linux mdadm simply holds and lets the drive complete its recovery. However, the default command timeout for the SCSI disk layer (/sys/block/sd?/device/timeout) is 30 seconds, after which the kernel will attempt to reset the drive and, if that fails, put the drive offline.
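
A minimal sketch of checking for this mismatch follows. It reads the timeout from the sysfs path named above and compares it with a drive's ERC limit; the function names and the `sysfs_root` parameter are hypothetical, and the ERC limit is assumed to have been obtained separately.

```python
from pathlib import Path
from typing import Optional


def scsi_command_timeout(disk: str, sysfs_root: str = "/sys/block") -> int:
    """Read the SCSI disk layer command timeout (seconds) for a disk such as 'sda'."""
    path = Path(sysfs_root) / disk / "device" / "timeout"
    return int(path.read_text().strip())


def timeout_mismatch(erc_limit_s: Optional[float], kernel_timeout_s: float) -> bool:
    """Return True if the drive may still be recovering when the kernel gives up.

    erc_limit_s is None when the drive has no ERC limit configured.
    """
    return erc_limit_s is None or erc_limit_s >= kernel_timeout_s


print(timeout_mismatch(None, 30))  # True: no limit, recovery may outlast 30 s
print(timeout_mismatch(7, 30))     # False: drive reports the error first
```

On a Linux system, `scsi_command_timeout("sda")` would supply the second argument, with `sda` standing in for the actual disk name.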



External links


Linux Raid wiki: Timeout Mismatch

Western Digital FAQ answer ID 1397: Difference between Desktop edition and RAID (Enterprise) edition drives

Time-Limited Error Recovery (TLER) Information Sheet, Western Digital, January 2013
