Dual Modular Redundancy
In reliability engineering, dual modular redundancy (DMR) is an arrangement in which components of a system are duplicated, providing redundancy in case one should fail. It is applied particularly to systems where the duplicated components work in parallel, especially fault-tolerant computer systems. A typical example is a complex computer system with duplicated nodes, so that if one node fails, another is ready to carry on its work. DMR provides robustness to the failure of one component, and error detection when instruments or computers that should give the same result give different results. It does not provide error correction, because which component is correct and which is malfunctioning cannot be determined automatically. Ther ...
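The detect-but-not-correct behaviour of DMR can be illustrated with a minimal sketch, assuming each module is a plain function of one integer input. The names `dmr_compare`, `healthy` and `faulty` are hypothetical, chosen only for illustration. On disagreement the comparator can flag a fault but has no majority to decide which result to trust.

```python
from typing import Callable, Optional, Tuple


def dmr_compare(module_a: Callable[[int], int],
                module_b: Callable[[int], int],
                x: int) -> Tuple[Optional[int], bool]:
    """Run two redundant modules on the same input and compare their results.

    Returns (value, fault_detected). When the modules disagree the value is
    None: with only two results there is no majority, so DMR can flag the
    fault but cannot say which module is wrong.
    """
    a = module_a(x)
    b = module_b(x)
    if a == b:
        return a, False
    return None, True


def healthy(x: int) -> int:
    return x * x


def faulty(x: int) -> int:
    # Hypothetical failing module: same computation with the lowest bit flipped.
    return (x * x) ^ 1


print(dmr_compare(healthy, healthy, 7))  # (49, False)
print(dmr_compare(healthy, faulty, 7))   # (None, True)
```

Swapping the argument order gives the same `(None, True)` result, which reflects the point above: the comparison is symmetric, so nothing in it identifies the faulty side.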

Reliability Engineering
Reliability engineering is a sub-discipline of systems engineering that emphasizes the ability of equipment to function without failure. Reliability is defined as the probability that a product, system, or service will perform its intended function adequately for a specified period of time, or will operate in a defined environment without failure. Reliability is closely related to availability, which is typically described as the ability of a component or system to function at a specified moment or interval of time. The reliability function is theoretically defined as the probability of success. In practice, it is calculated using different techniques, and its value ranges between 0 and 1, where 0 indicates no probability of success and 1 indicates definite success. This probability is estimated from detailed (physics of failure) analysis, previous data sets, or reliability testing and reliability modeling. Availability, testability, maintainability, and maintenance ...
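One simple way to estimate the reliability function from reliability-testing data is the fraction of tested units still working at time t. The sketch below assumes every unit was run until it failed (no censored observations). The function name and the failure times are hypothetical illustration data, not measurements.

```python
from typing import Sequence


def empirical_reliability(failure_times: Sequence[float], t: float) -> float:
    """Estimate R(t) as the fraction of tested units still working at time t.

    Assumes every unit was tested until failure, so each entry in
    failure_times is an observed failure time (no censored data).
    The result always lies between 0 and 1.
    """
    if not failure_times:
        raise ValueError("need at least one observed failure time")
    survivors = sum(1 for ft in failure_times if ft > t)
    return survivors / len(failure_times)


# Hypothetical test data: hours until failure for 8 units.
hours = [120.0, 340.0, 410.0, 500.0, 650.0, 800.0, 950.0, 1200.0]
print(empirical_reliability(hours, 0.0))     # 1.0
print(empirical_reliability(hours, 500.0))   # 0.5
print(empirical_reliability(hours, 1500.0))  # 0.0
```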

Redundancy (engineering)
In engineering and systems theory, redundancy is the intentional duplication of critical components or functions of a system with the goal of increasing the reliability of the system, usually in the form of a backup or fail-safe, or of improving actual system performance, as in GNSS receivers or multi-threaded computer processing. In many safety-critical systems, such as fly-by-wire and hydraulic systems in aircraft, some parts of the control system may be triplicated, which is formally termed triple modular redundancy (TMR). An error in one component may then be outvoted by the other two. In a triply redundant system, the system has three subcomponents, all three of which must fail before the system fails. Since each one rarely fails, and the subcomponents are designed to preclude common failure modes (so that their failures can be modelled as independent), the probability of all three failing is calculated to be extraordinarily small; it is often outweighed ...
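The "extraordinarily small" probability follows from the independence assumption: if each of n components fails with probability p over some period, all of them fail with probability p to the power n. A minimal sketch, assuming identical components and a hypothetical per-component failure probability of 1 %:

```python
def all_fail_probability(p_component: float, n: int = 3) -> float:
    """Probability that all n redundant components fail.

    Assumes each component fails with the same probability p_component over
    the period of interest and that failures are independent (no common
    failure modes). Under that assumption the joint probability is p ** n.
    """
    if not 0.0 <= p_component <= 1.0:
        raise ValueError("p_component must be a probability in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return p_component ** n


# Hypothetical per-component failure probability of 1 %.
for n in (1, 2, 3):
    print(n, all_fail_probability(0.01, n))
```

With three components the result is about one in a million. A shared failure mode, such as a common power supply, breaks the independence assumption, and the real probability can then be far higher than this product.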

Fault-tolerant Computer System
Fault tolerance is the ability of a system to maintain proper operation despite failures or faults in one or more of its components. This capability is essential for high-availability, mission-critical, or even life-critical systems. Fault tolerance specifically refers to a system's capability to handle faults without any degradation or downtime. In the event of an error, end-users remain unaware of any issues. Conversely, a system that experiences errors with some interruption in service or graceful degradation of performance is termed "resilient". In resilience, the system adapts to the error, maintaining service but acknowledging a certain impact on performance. Typically, fault tolerance describes computer systems, ensuring the overall system remains functional despite hardware or software issues. Non-computing examples include structures that retain their integrity despite damage from fatigue, corrosion or impact. H ...

Error Detection
In information theory and coding theory, with applications in computer science and telecommunications, error detection and correction (EDAC), or error control, are techniques that enable reliable delivery of digital data over unreliable communication channels. Many communication channels are subject to channel noise, and thus errors may be introduced during transmission from the source to a receiver. Error detection techniques allow such errors to be detected, while error correction enables reconstruction of the original data in many cases.

Error detection is the detection of errors caused by noise or other impairments during transmission from the transmitter to the receiver. Error correction is the detection of errors and reconstruction of the original, error-free data.

In classical antiquity, copyists of the Hebrew Bible were paid for their work according to the number of stichs (lines of verse). As the prose books of the Bible were hardly ever w ...
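A single even-parity bit is one of the simplest error detection schemes. The sketch below uses it as an example technique; the entry above does not name a specific code, and the function names are hypothetical. The sender appends one bit so that the total number of 1 bits is even, and the receiver recomputes it. A single flipped bit is caught, but two flipped bits cancel out. This shows that detection schemes have limits and that parity alone cannot correct anything.

```python
from typing import Tuple


def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 if data contains an odd number of 1 bits, else 0.

    Appending this bit makes the total number of 1 bits even.
    """
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def transmit(data: bytes) -> Tuple[bytes, int]:
    # The sender sends the payload together with its parity bit.
    return data, parity_bit(data)


def looks_intact(data: bytes, parity: int) -> bool:
    # The receiver recomputes the parity and compares it with the bit received.
    return parity_bit(data) == parity


payload, parity = transmit(b"DMR")
one_flip = bytes([payload[0] ^ 0b0000_0100]) + payload[1:]
two_flips = bytes([payload[0] ^ 0b0000_0110]) + payload[1:]

print(looks_intact(payload, parity))    # True
print(looks_intact(one_flip, parity))   # False: single-bit error detected
print(looks_intact(two_flips, parity))  # True: two-bit error goes unnoticed
```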

The Mythical Man-Month
The Mythical Man-Month: Essays on Software Engineering is a book on software engineering and project management by Fred Brooks, first published in 1975, with subsequent editions in 1982 and 1995. Its central theme is that adding manpower to a software project that is behind schedule delays it even longer. This idea is known as Brooks's law, and is presented along with the second-system effect and advocacy of prototyping. Brooks's observations are based on his experiences at IBM while managing the development of OS/360. He had added more programmers to a project falling behind schedule, a decision that he would later conclude had, counter-intuitively, delayed the project even further. He also made the mistake of asserting that one project, writing an ALGOL compiler, would require six months regardless of the number of workers involved (it required longer). The tendency for managers to repeat such errors in project development led Brooks to qui ...

Marine Chronometer
A marine chronometer is a precision timepiece that is carried on a ship and employed in the determination of the ship's position by celestial navigation. It is used to determine longitude by comparing Greenwich Mean Time (GMT) with the time at the current location, found from observations of celestial bodies. When first developed in the 18th century, it was a major technical achievement, as accurate knowledge of the time over a long sea voyage was vital for effective navigation in the absence of electronic or communications aids. The first true chronometer was the life work of one man, John Harrison, spanning 31 years of persistent experimentation and testing that revolutionized naval (and later aerial) navigation. The term chronometer was coined from the Greek words χρόνος (chronos, meaning time) and μέτρον (metron, meaning measure). The 1713 book Physico-Theology by the English cleric and scientist William Derham includes one of the earliest theoretical descriptions of a marine chronome ...
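The comparison described above reduces to simple arithmetic. The Earth turns 360 degrees in 24 hours, so each hour of difference between local time and GMT corresponds to 15 degrees of longitude. A minimal sketch, assuming the local time is already expressed as local mean time (the equation-of-time correction a navigator would apply is ignored). The function name is hypothetical.

```python
def longitude_from_times(gmt_hours: float, local_hours: float) -> float:
    """Longitude in degrees (positive east, negative west) from two clock readings.

    gmt_hours: time shown by the chronometer, kept on Greenwich Mean Time.
    local_hours: local time at the ship, e.g. 12.0 at local noon.
    The Earth turns 360 degrees in 24 hours, so each hour of difference
    corresponds to 15 degrees of longitude.
    """
    diff = local_hours - gmt_hours
    # Wrap the difference into [-12, 12) hours so that, for example, a
    # reading just after midnight GMT is handled correctly.
    diff = (diff + 12.0) % 24.0 - 12.0
    return diff * 15.0


# Local noon observed while the chronometer reads 15:00 GMT:
print(longitude_from_times(15.0, 12.0))  # -45.0, i.e. 45 degrees west
```

Local time lagging GMT means the ship is west of Greenwich. This is why a chronometer that kept GMT accurately over a long voyage was the key to finding longitude.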

Lockstep (computing)
Lockstep systems are fault-tolerant computer systems that run the same set of operations at the same time in parallel. The redundancy (duplication) allows error detection and error correction: the outputs of lockstep operations can be compared to detect a fault if there are at least two systems (dual modular redundancy, DMR), and the error can be corrected automatically by majority vote if there are at least three systems (triple modular redundancy, TMR). The term "lockstep" originates from army usage, where it refers to synchronized walking, in which marchers walk as closely together as physically practical. To run in lockstep, each system is set up to progress from one well-defined state to the next well-defined state. When a new set of inputs reaches the system, it processes them, generates new outputs and updates its state. This set of changes (new inputs, new outputs, new state) is considered to define that step, and must be treated as an atomic t ...
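The step-by-step model above can be sketched by treating each replica as a deterministic function from (state, input) to (new state, output). All replicas are advanced together on the same input, and their outputs are compared after every step. This is a simplified software model under those assumptions, not a description of any particular lockstep processor. The replica functions are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A replica is modeled as a deterministic step function:
# (current_state, input) -> (new_state, output).
Step = Callable[[int, int], Tuple[int, int]]


def run_lockstep(replicas: Sequence[Step],
                 inputs: Sequence[int],
                 initial_state: int = 0) -> Optional[int]:
    """Advance every replica one step at a time on identical inputs.

    After each step the outputs are compared. Returns the index of the first
    step at which the replicas disagree, or None if they never diverge.
    """
    states: List[int] = [initial_state] * len(replicas)
    for step_index, x in enumerate(inputs):
        outputs = []
        for r, step in enumerate(replicas):
            states[r], out = step(states[r], x)
            outputs.append(out)
        if len(set(outputs)) > 1:
            return step_index
    return None


def accumulate(state: int, x: int) -> Tuple[int, int]:
    # Hypothetical replica: keeps a running sum and outputs it.
    new_state = state + x
    return new_state, new_state


def glitchy_accumulate(state: int, x: int) -> Tuple[int, int]:
    # Hypothetical faulty replica: miscounts whenever the input is 3.
    new_state = state + x + (1 if x == 3 else 0)
    return new_state, new_state


print(run_lockstep([accumulate, accumulate], [1, 2, 3, 4]))          # None
print(run_lockstep([accumulate, glitchy_accumulate], [1, 2, 3, 4]))  # 2
```

With two replicas this only locates the step where the fault appeared. With three or more, the comparison could be replaced by a majority vote, as the text describes.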

1ESS Switch
The Number One Electronic Switching System (1ESS) was the first large-scale stored program control (SPC) telephone exchange or electronic switching system in the Bell System. It was manufactured by Western Electric and first placed into service in Succasunna, New Jersey, in May 1965. The switching fabric was composed of a reed relay matrix controlled by wire spring relays, which in turn were controlled by a central processing unit (CPU). The 1AESS central office switch was a plug-compatible, higher-capacity upgrade from 1ESS with a faster 1A processor that incorporated the existing instruction set for programming compatibility, used smaller remreed switches and fewer relays, and featured disk storage. It was in service from 1976 to 2017.

The voice switching fabric plan was similar to that of the earlier 5XB switch in being bidirectional and in using the call-back principle. The largest full-access matrix switches (the 12A line grids had partial acces ...

Triple Modular Redundancy
In computing, triple modular redundancy (TMR), sometimes called triple-mode redundancy, is a fault-tolerant form of N-modular redundancy in which three systems perform a process and their results are processed by a majority-voting system to produce a single output. If any one of the three systems fails, the other two systems can correct and mask the fault. The TMR concept can be applied to many forms of redundancy, such as software redundancy in the form of N-version programming, and is commonly found in fault-tolerant computer systems. Space satellite systems often use TMR, although satellite RAM usually uses Hamming(7,4) error correction. Some ECC memory uses triple modular redundancy hardware (rather than the more common Hamming code), because triple modular redundancy hardware is faster than Hamming error correction hardware. Some communication systems use N-modular redundancy, known as a repetition code, as a simple form of forward error correct ...
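Two pieces of the TMR description lend themselves to a short sketch. The first is a bitwise 2-of-3 voter of the kind a hardware majority-voting stage computes. The second is a repetition code decoded by majority within each block. The function names are hypothetical, and the sketch assumes an odd repetition factor so that a majority always exists.

```python
def majority_vote_bits(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 voter: each output bit is 1 iff at least two inputs have it set."""
    return (a & b) | (a & c) | (b & c)


def repetition_encode(bits: str, n: int = 3) -> str:
    """Repetition code: send every bit n times (n odd, so a majority always exists)."""
    return "".join(bit * n for bit in bits)


def repetition_decode(received: str, n: int = 3) -> str:
    """Decode by majority vote within each block of n received bits."""
    if n % 2 == 0 or len(received) % n != 0:
        raise ValueError("n must be odd and divide the received length")
    decoded = []
    for i in range(0, len(received), n):
        block = received[i:i + n]
        decoded.append("1" if block.count("1") > n // 2 else "0")
    return "".join(decoded)


word = 0b1011_0010
corrupted = word ^ 0b0100_0000             # one module returns a wrong bit
print(majority_vote_bits(word, corrupted, word) == word)  # True: fault masked

sent = repetition_encode("101")            # "111000111"
received = sent[:1] + "0" + sent[2:]       # noise flips the second bit
print(repetition_decode(received))         # 101
```

Both functions mask any single faulty copy. If two of the three copies agree on a wrong value, the vote returns the wrong value, which is why TMR relies on faults being independent.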

Hot Spare
A hot spare, warm spare, or hot standby is a component used as a failover mechanism to provide reliability in system configurations. The hot spare is active and connected as part of a working system. When a key component fails, the hot spare is switched into operation. More generally, hot standby can refer to any device or system that is held in readiness to overcome an otherwise significant start-up delay.

Examples of hot spares are components such as A/V switches, computers, network printers, and hard disks. The equipment is powered on, or considered "hot", but not actively functioning in (i.e. used by) the system. Electrical generators may be held on hot standby, or a steam train may be held at the shed fired up (literally hot), ready to replace a possible failure of an engine in service.

In designing a reliable system, it is recognized that there will be failures. At the extreme, a complete system can be duplicated and kept up to date ...
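The failover behaviour described above can be sketched as a pair of units: requests go to the active unit, and when it fails, the already powered-on spare is switched into operation. This is a hypothetical software model (`Unit` and `HotSparePair` are illustrative names, and failure is simulated with a flag), not a description of any specific product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Unit:
    """A hypothetical component that is powered on ("hot") and can serve requests."""
    name: str
    healthy: bool = True

    def handle(self, request: str) -> str:
        if not self.healthy:
            raise RuntimeError(f"{self.name} has failed")
        return f"{self.name} handled {request}"


class HotSparePair:
    """Route requests to a primary unit; switch to the hot spare if it fails."""

    def __init__(self, primary: Unit, spare: Unit) -> None:
        self.active: Unit = primary
        self.spare: Optional[Unit] = spare

    def handle(self, request: str) -> str:
        try:
            return self.active.handle(request)
        except RuntimeError:
            if self.spare is None:
                raise  # no standby left: the failure becomes visible
            # In this model the spare is already running, so the switchover
            # involves no start-up step before it serves the request.
            self.active, self.spare = self.spare, None
            return self.active.handle(request)


primary, spare = Unit("A"), Unit("B")
pair = HotSparePair(primary, spare)
print(pair.handle("r1"))  # A handled r1
primary.healthy = False
print(pair.handle("r2"))  # B handled r2
```

A cold spare would add a start-up step inside the `except` branch. Avoiding that delay is the point of keeping the spare hot.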

Engineering Concepts
Engineering is the practice of using natural science, mathematics, and the engineering design process to solve problems within technology, increase efficiency and productivity, and improve systems. Modern engineering comprises many subfields, which include designing and improving infrastructure, machinery, vehicles, electronics, materials, and energy systems. The discipline of engineering encompasses a broad range of more specialized fields of engineering, each with a more specific emphasis on applications of mathematics and science. See glossary of engineering. The word engineering is derived from the Latin ingenium.

The American Engineers' Council for Professional Development (the predecessor of the Accreditation Board for Engineering and Technology, aka ABET) has defined "engineering" as:

Engineering has existed since ancient times, when humans devised inventions such as the wedge, lever, wheel and pulley. The term engineering is derived ...