Chipkill
   HOME
*





Chipkill
__NOTOC__ Chipkill is IBM's trademark for a form of advanced error checking and correcting (ECC) computer memory technology that protects computer memory systems from any single memory chip failure as well as multi-bit errors from any portion of a single memory chip. One simple scheme to perform this function scatters the bits of a Hamming code ECC word across multiple memory chips, such that the failure of any single memory chip will affect only one ECC bit per word. This allows memory contents to be reconstructed despite the complete failure of one chip. Typical implementations use more advanced codes, such as a BCH code, that can correct multiple bits with less overhead. Chipkill is frequently combined with dynamic bit-steering, so that if a chip fails (or has exceeded a threshold of bit errors), another, spare, memory chip is used to replace the failed chip. The concept is similar to that of RAID, which protects against disk failure, except that now the concept is applied to ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Lockstep (computing)
Lockstep systems are fault-tolerant computer systems that run the same set of operations at the same time in parallel. The redundancy (duplication) allows error detection and error correction: the output from lockstep operations can be compared to determine if there has been a fault if there are at least two systems (dual modular redundancy), and the error can be automatically corrected if there are at least three systems (triple modular redundancy), via majority vote. The term "lockstep" originates from army usage, where it refers to synchronized walking, in which marchers walk as closely together as physically practical. To run in lockstep, each system is set up to progress from one well-defined state to the next well-defined state. When a new set of inputs reaches the system, it processes them, generates new outputs and updates its state. This set of changes (new inputs, new outputs, new state) is considered to define that step, and must be treated as an atomic transaction; in ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Double-device Data Correction
Lockstep systems are fault-tolerant computer systems that run the same set of operations at the same time in parallel. The redundancy (duplication) allows error detection and error correction: the output from lockstep operations can be compared to determine if there has been a fault if there are at least two systems (dual modular redundancy), and the error can be automatically corrected if there are at least three systems (triple modular redundancy), via majority vote. The term "lockstep" originates from army usage, where it refers to synchronized walking, in which marchers walk as closely together as physically practical. To run in lockstep, each system is set up to progress from one well-defined state to the next well-defined state. When a new set of inputs reaches the system, it processes them, generates new outputs and updates its state. This set of changes (new inputs, new outputs, new state) is considered to define that step, and must be treated as an atomic transaction; in ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

ECC Memory
Error correction code memory (ECC memory) is a type of computer data storage that uses an error correction code (ECC) to detect and correct n-bit data corruption which occurs in memory. ECC memory is used in most computers where data corruption cannot be tolerated, like industrial control applications, critical databases, and infrastructural memory caches. Typically, ECC memory maintains a memory system immune to single-bit errors: the data that is read from each word is always the same as the data that had been written to it, even if one of the bits actually stored has been flipped to the wrong state. Most non-ECC memory cannot detect errors, although some non-ECC memory with parity support allows detection but not correction. Description Error correction codes protect against undetected data corruption and are used in computers where such corruption is unacceptable, examples being scientific and financial computing applications, or in database and file servers. ECC can als ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Lockstep Memory
Lockstep systems are fault-tolerant computer systems that run the same set of operations at the same time in parallel. The redundancy (duplication) allows error detection and error correction: the output from lockstep operations can be compared to determine if there has been a fault if there are at least two systems (dual modular redundancy), and the error can be automatically corrected if there are at least three systems (triple modular redundancy), via majority vote. The term "lockstep" originates from army usage, where it refers to synchronized walking, in which marchers walk as closely together as physically practical. To run in lockstep, each system is set up to progress from one well-defined state to the next well-defined state. When a new set of inputs reaches the system, it processes them, generates new outputs and updates its state. This set of changes (new inputs, new outputs, new state) is considered to define that step, and must be treated as an atomic transaction; in ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Memory ProteXion
For computer memory, Memory ProteXion, found in IBM xSeries servers, is a form of " redundant bit steering". This technology uses redundant bits in a data packet to recover from a DIMM failure. Memory ProteXion is different from normal ECC error correction in that it uses only 6 bits for ECC, leaving 2 bits behind. These 2 bits can be used to re-route data from failed memory, much like hot spare on a RAID. The ECC is used to reconstruct the data, and the extra bits to store it. Memory ProteXion, also known as “redundant bit steering”, is the technology behind using redundant bits in a data packet to provide backup in the event of a DIMM failure. One failure does not cause a predictive failure analysis Predictive Failure Analysis (PFA) refers to methods intended to predict imminent failure of systems or components (software or hardware), and potentially enable mechanisms to avoid or counteract failure issues, or recommend maintenance of systems pr ... to be issued on the DIMM, ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Reliability, Availability And Serviceability (computer Hardware)
Reliability, availability and serviceability (RAS), also known as reliability, availability, and maintainability (RAM), is a computer hardware engineering term involving reliability engineering, high availability, and serviceability design. The phrase was originally used by International Business Machines ( IBM) as a term to describe the robustness of their mainframe computers. Computers designed with higher levels of RAS have many features that protect data integrity and help them stay available for long periods of time without failure This data integrity and uptime is a particular selling point for mainframes and fault-tolerant systems. Definitions While RAS originated as a hardware-oriented term, systems thinking has extended the concept of reliability-availability-serviceability to systems in general, including software. * ''Reliability'' can be defined as the probability that a system will produce correct outputs up to some given time ''t''. Reliability is enhanced by fea ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Micron Technology
Micron Technology, Inc. is an American producer of computer memory and computer data storage including dynamic random-access memory, flash memory, and USB flash drives. It is headquartered in Boise, Idaho. Its consumer products, including the Ballistix line of memory modules, are marketed under the Crucial brand. Micron and Intel together created IM Flash Technologies, which produced NAND flash memory. It owned Lexar between 2006 and 2017. History 1978–1999 Micron was founded in Boise, Idaho, in 1978 by Ward Parkinson, Joe Parkinson, Dennis Wilson, and Doug Pitman as a semiconductor design consulting company. Startup funding was provided by local Idaho businessmen Tom Nicholson, Allen Noble, Rudolph Nelson, and Ron Yanke. Later it received funding from Idaho billionaire J. R. Simplot, whose fortune was made in the potato business. In 1981, the company moved from consulting to manufacturing with the completion of its first wafer fabrication unit ("Fab 1"), producing 64K ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Computer Memory
In computing, memory is a device or system that is used to store information for immediate use in a computer or related computer hardware and digital electronic devices. The term ''memory'' is often synonymous with the term ''primary storage'' or '' main memory''. An archaic synonym for memory is store. Computer memory operates at a high speed compared to storage that is slower but less expensive and higher in capacity. Besides storing opened programs, computer memory serves as disk cache and write buffer to improve both reading and writing performance. Operating systems borrow RAM capacity for caching so long as not needed by running software. If needed, contents of the computer memory can be transferred to storage; a common way of doing this is through a memory management technique called ''virtual memory''. Modern memory is implemented as semiconductor memory, where data is stored within memory cells built from MOS transistors and other components on an integrated c ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Ars Technica
''Ars Technica'' is a website covering news and opinions in technology, science, politics, and society, created by Ken Fisher and Jon Stokes in 1998. It publishes news, reviews, and guides on issues such as computer hardware and software, science, technology policy, and video games. ''Ars Technica'' was privately owned until May 2008, when it was sold to Condé Nast Digital, the online division of Condé Nast Publications. Condé Nast purchased the site, along with two others, for $25 million and added it to the company's ''Wired'' Digital group, which also includes ''Wired'' and, formerly, Reddit. The staff mostly works from home and has offices in Boston, Chicago, London, New York City, and San Francisco. The operations of ''Ars Technica'' are funded primarily by advertising, and it has offered a paid subscription service since 2001. History Ken Fisher, who serves as the website's current editor-in-chief, and Jon Stokes created ''Ars Technica'' in 1998. Its purpose was ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Single-error Correction And Double-error Detection
In computer science and telecommunication, Hamming codes are a family of linear error-correcting codes. Hamming codes can detect one-bit and two-bit errors, or correct one-bit errors without detection of uncorrected errors. By contrast, the simple parity code cannot correct errors, and can detect only an odd number of bits in error. Hamming codes are perfect codes, that is, they achieve the highest possible rate for codes with their block length and minimum distance of three. Richard W. Hamming invented Hamming codes in 1950 as a way of automatically correcting errors introduced by punched card readers. In his original paper, Hamming elaborated his general idea, but specifically focused on the Hamming(7,4) code which adds three parity bits to four bits of data. In mathematical terms, Hamming codes are a class of binary linear code. For each integer there is a code-word with block length and message length . Hence the rate of Hamming codes is , which is the highest possib ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Redundant Array Of Independent Memory
A redundant array of independent memory (RAIM) is a design feature found in certain computers' main random access memory. RAIM utilizes additional memory modules and striping algorithms to protect against the failure of any particular module and keep the memory system operating continuously. RAIM is similar in concept to a redundant array of independent disks (RAID), which protects against the failure of a disk drive, but in the case of memory it supports several DRAM device chipkills and entire memory channel failures. RAIM is much more robust than parity checking and ECC memory technologies which cannot protect against many varieties of memory failures. On July 22, 2010, IBM introduced the first high end computer server featuring RAIM, the zEnterprise 196. Each z196 machine contains up to 3 TB (usable) of RAIM-protected main memory. In 2011 the business class model z114 was introduced also supporting RAIM. The formal announcement letter offered some additional information r ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

University Of Rochester
The University of Rochester (U of R, UR, or U of Rochester) is a private research university in Rochester, New York. The university grants undergraduate and graduate degrees, including doctoral and professional degrees. The University of Rochester enrolls approximately 6,800 undergraduates and 5,000 graduate students. Its 158 buildings house over 200 academic majors. According to the National Science Foundation, Rochester spent more than $397 million on research and development in 2020, ranking it 66th in the nation. With approximately 28,000 full-time employees, the university is the largest private employer in Upstate New York and the 7th largest in all of New York State. The College of Arts, Sciences, and Engineering is home to departments and divisions of note. The Institute of Optics was founded in 1929 through a grant from Eastman Kodak and Bausch and Lomb as the first educational program in the US devoted exclusively to optics, awards approximately half ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]