HOME





Reliability, Availability And Serviceability
Reliability, availability and serviceability (RAS), also known as reliability, availability, and maintainability (RAM), is a computer hardware engineering term involving reliability engineering, high availability, and serviceability design. The phrase was originally used by IBM as a term to describe the robustness of their mainframe computers. Computers designed with higher levels of RAS have many features that protect data integrity and help them stay available for long periods of time without failure. This data integrity and uptime is a particular selling point for mainframes and fault-tolerant systems. Definitions While RAS originated as a hardware-oriented term, systems thinking has extended the concept of reliability-availability-serviceability to systems in general, including software: * ''Reliability'' can be defined as the probability that a system will produce correct outputs up to some given time ''t''. Reliability is enhanced by features that help to avoid, detec ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Computer Hardware
Computer hardware includes the physical parts of a computer, such as the central processing unit (CPU), random-access memory (RAM), motherboard, computer data storage, graphics card, sound card, and computer case. It includes external devices such as a Computer monitor, monitor, Computer mouse, mouse, Computer keyboard, keyboard, and Computer speakers, speakers. By contrast, software is a set of written instructions that can be stored and run by hardware. Hardware derived its name from the fact it is ''Hardness, hard'' or rigid with respect to changes, whereas software is ''soft'' because it is easy to change. Hardware is typically directed by the software to execute any command or Instruction (computing), instruction. A combination of hardware and software forms a usable computing system, although Digital electronics, other systems exist with only hardware. History Early computing devices were more complicated than the ancient abacus date to the seventeenth century. French ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Electromigration
Electromigration is the transport of material caused by the gradual movement of the ions in a Conductor (material), conductor due to the momentum transfer between conducting electrons and diffusing metal atoms. The effect is important in applications where high direct current densities are used, such as in microelectronics and related structures. As the structure size in electronics such as integrated circuits (ICs) decreases, the practical significance of this effect increases. History The phenomenon of electromigration has been known for over 100 years, having been discovered by the French scientist Gerardin. The topic first became of practical interest during the late 1960s when packaged ICs first appeared. The earliest commercially available ICs failed in a mere three weeks of use from runaway electromigration, which led to a major industry effort to correct this problem. The first observation of electromigration in thin films was made by I. Blech.I. Blech: ''Electromigratio ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Chipkill
__NOTOC__ Chipkill is IBM's trademark for a form of advanced error checking and correcting (ECC) computer memory technology that protects memory systems from single memory chip failures and multi-bit errors from any portion of a single memory chip. One simple scheme to perform this function scatters the bits of a Hamming code ECC word across multiple memory chips, such that the failure of any single memory chip will affect only one ECC bit per word. This allows memory contents to be reconstructed despite the complete failure of one chip. Typical implementations use more advanced codes, such as a BCH code, that can correct multiple bits with less overhead. Chipkill is frequently combined with dynamic bit-steering, so that if a chip fails (or has exceeded a threshold of bit errors), another, spare, memory chip is used to replace the failed chip. The concept is similar to that of RAID, which protects against disk failure, except that now the concept is applied to individual memory chi ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Error Correcting Code
In computing, telecommunication, information theory, and coding theory, forward error correction (FEC) or channel coding is a technique used for controlling errors in data transmission over unreliable or noisy communication channels. The central idea is that the sender encodes the message in a redundant way, most often by using an error correction code, or error correcting code (ECC). The redundancy allows the receiver not only to detect errors that may occur anywhere in the message, but often to correct a limited number of errors. Therefore a reverse channel to request re-transmission may not be needed. The cost is a fixed, higher forward channel bandwidth. The American mathematician Richard Hamming pioneered this field in the 1940s and invented the first error-correcting code in 1950: the Hamming (7,4) code. FEC can be applied in situations where re-transmissions are costly or impossible, such as one-way communication links or when transmitting to multiple receivers in m ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Parity Bit
A parity bit, or check bit, is a bit added to a string of binary code. Parity bits are a simple form of error detecting code. Parity bits are generally applied to the smallest units of a communication protocol, typically 8-bit octets (bytes), although they can also be applied separately to an entire message string of bits. The parity bit ensures that the total number of 1-bits in the string is even or odd. Accordingly, there are two variants of parity bits: even parity bit and odd parity bit. In the case of even parity, for a given set of bits, the bits whose value is 1 are counted. If that count is odd, the parity bit value is set to 1, making the total count of occurrences of 1s in the whole set (including the parity bit) an even number. If the count of 1s in a given set of bits is already even, the parity bit's value is 0. In the case of odd parity, the coding is reversed. For a given set of bits, if the count of bits with a value of 1 is even, the parity bit value is se ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Computer Memory
Computer memory stores information, such as data and programs, for immediate use in the computer. The term ''memory'' is often synonymous with the terms ''RAM,'' ''main memory,'' or ''primary storage.'' Archaic synonyms for main memory include ''core'' (for magnetic core memory) and ''store''. Main memory operates at a high speed compared to mass storage which is slower but less expensive per bit and higher in capacity. Besides storing opened programs and data being actively processed, computer memory serves as a Page cache, mass storage cache and write buffer to improve both reading and writing performance. Operating systems borrow RAM capacity for caching so long as it is not needed by running software. If needed, contents of the computer memory can be transferred to storage; a common way of doing this is through a memory management technique called ''virtual memory''. Modern computer memory is implemented as semiconductor memory, where data is stored within memory cell (com ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

ACPI Platform Error Interface
Advanced Configuration and Power Interface (ACPI) is an open standard that operating systems can use to discover and configure computer hardware components, to perform power management (e.g. putting unused hardware components to sleep), auto configuration (e.g. Plug and Play and hot swapping), and status monitoring. It was first released in December 1996. ACPI aims to replace Advanced Power Management (APM), the MultiProcessor Specification, and the Plug and Play BIOS (PnP) Specification. ACPI brings power management under the control of the operating system, as opposed to the previous BIOS-centric system that relied on platform-specific firmware to determine power management and configuration policies. The specification is central to the Operating System-directed configuration and Power Management (OSPM) system. ACPI defines hardware abstraction interfaces between the device's firmware (e.g. BIOS, UEFI), the computer hardware components, and the operating systems. Internally, AC ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Machine Check Architecture
In computing, Machine Check Architecture (MCA) is an Intel and AMD mechanism in which the CPU reports hardware errors to the operating system. Intel's P6 and Pentium 4 family processors, AMD's K7 and K8 family processors, as well as the Itanium architecture implement a machine check architecture that provides a mechanism for detecting and reporting hardware (machine) errors, such as: system bus errors, ECC errors, parity errors, cache errors, and translation lookaside buffer errors. It consists of a set of model-specific registers ( MSRs) that are used to set up machine checking and additional banks of MSRs used for recording errors that are detected. See also * Machine-check exception (MCE) * High availability (HA) * Reliability, availability and serviceability Reliability, availability and serviceability (RAS), also known as reliability, availability, and maintainability (RAM), is a computer hardware engineering term involving reliability engineering, high availability, an ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Master-checker
Master-checker or master/checker is a hardware-supported fault tolerance architecture for multiprocessor Multiprocessing (MP) is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them. The ... systems, in which two processors, referred to as the ''master'' and ''checker'', calculate the same functions in parallel in order to increase the probability that the result is exact. The checker CPU is synchronised at clock level with the master CPU and processes the same programs as the master. Whenever the master CPU generates an output, the checker CPU compares this output to its own calculation and in the event of a difference raises a warning. The master-checker system generally gives more accurate answers by ensuring that the answer is correct before passing it on to the application requesting the algorithm being complet ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Lockstep (computing)
Lockstep systems are fault-tolerant computer systems that run the same set of operations at the same time in parallel. The redundancy (duplication) allows error detection and error correction: the output from lockstep operations can be compared to determine if there has been a fault if there are at least two systems ( dual modular redundancy DMR), and the error can be automatically corrected if there are at least three systems ( triple modular redundancy TMR), via majority vote. The term " lockstep" originates from army usage, where it refers to synchronized walking, in which marchers walk as closely together as physically practical. To run in lockstep, each system is set up to progress from one well-defined state to the next well-defined state. When a new set of inputs reaches the system, it processes them, generates new outputs and updates its state. This set of changes (new inputs, new outputs, new state) is considered to define that step, and must be treated as an atomic t ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Itanium
Itanium (; ) is a discontinued family of 64-bit computing, 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). The Itanium architecture originated at Hewlett-Packard (HP), and was later jointly developed by HP and Intel. Launching in June 2001, Intel initially marketed the processors for enterprise servers and high-performance computing systems. In the concept phase, engineers said "we could run circles around PowerPC...we could kill the x86". Early predictions were that IA-64 would expand to the lower-end servers, supplanting Xeon, and eventually penetrate into the personal computers, eventually to supplant Reduced instruction set computer, reduced instruction set computing (RISC) and complex instruction set computing (CISC) architectures for all general-purpose applications. When first released in 2001 after a decade of development, Itanium's performance was disappointing compared to better-established RISC and CISC processors. Em ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

IBM System Z
IBM Z is a family name used by IBM for all of its z/Architecture mainframe computers. In July 2017, with another generation of products, the official family was changed to IBM Z from IBM z Systems; the IBM Z family will soon include the newest model, the IBM z17, as well as the z16, z15, z14, and z13 (released under the IBM z Systems/IBM System z names), the IBM zEnterprise models (in common use the zEC12 and z196), the IBM System z10 models (in common use the z10 EC), the IBM System z9 models (in common use the z9EC) and ''IBM eServer zSeries'' models (in common use refers only to the z900 and z990 generations of mainframe). Architecture The ''zSeries,'' ''zEnterprise,'' ''System z'' and ''IBM Z'' families were named for their availability – ''z'' stands for High availability, zero downtime. The systems are built with spare components capable of hot failovers to ensure continuous operations. The IBM Z family maintains full backward compatibility. In effect, current system ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]