In reliability engineering, dual modular redundancy (DMR) is the duplication of a system's components so that redundancy is available if one should fail. It is applied particularly to systems in which the duplicated components work in parallel, notably fault-tolerant computer systems. A typical example is a complex computer system with duplicated nodes, so that should one node fail, another is ready to carry on its work.
DMR provides robustness to the failure of one component, and error detection when instruments or computers that should give the same result give different results. It does not provide error correction, because ''which'' component is correct and which is malfunctioning cannot be determined automatically. An old adage makes the same point: "Never go to sea with two chronometers; take one or three." That is, if two chronometers disagree, a sailor cannot know which one is reading correctly.
A lockstep fault-tolerant machine uses replicated elements operating in parallel. At any time, all the replications of each element should be in the same state. The same inputs are provided to each replication, and the same outputs are expected. The outputs of the replications are compared using a voting circuit. A machine with two replications of each element is termed dual modular redundant (DMR). The voting circuit can then only detect a mismatch, and recovery relies on other methods. Examples include the 1ESS switch.
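The comparison step described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the replicas are modeled as plain functions, and the names `dmr_run` and `DMRMismatch` are hypothetical, not taken from any real system. It shows that a two-way comparator can report a disagreement but cannot say which replica is at fault.

```python
class DMRMismatch(Exception):
    """Raised when the two replicas disagree.

    The comparator cannot tell which replica is wrong, so recovery
    must be handled by some other mechanism.
    """


def dmr_run(replica_a, replica_b, value):
    """Feed the same input to both replicas and compare their outputs."""
    out_a = replica_a(value)
    out_b = replica_b(value)
    if out_a != out_b:
        # Detection only: there is no basis for choosing between the two.
        raise DMRMismatch(f"replicas disagree: {out_a!r} != {out_b!r}")
    return out_a
```

A caller would typically catch `DMRMismatch` and hand control to a separate recovery procedure, such as diagnostics or a restart, since the comparator itself has no way to pick a winner.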
A machine with three replications of each element is termed triple modular redundant (TMR). The voting circuit can determine which replication is in error when a two-to-one vote is observed. In this case, the voting circuit can output the correct result and discard the erroneous version. After this, the internal state of the erroneous replication is assumed to be different from that of the other two, and the voting circuit can switch to a DMR mode. This model can be applied to any larger number of replications.
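The voting behavior can be illustrated with a minimal Python sketch. It assumes replicas are plain functions with hashable outputs, and the names `Voter` and `VoteFailure` are hypothetical. On a strict majority among three or more replicas it returns the majority result and drops the dissenters, which leaves a TMR machine running in DMR mode; with only two replicas left, a disagreement can be detected but not resolved.

```python
from collections import Counter


class VoteFailure(Exception):
    """Raised when the outputs disagree and no usable majority exists."""


class Voter:
    def __init__(self, replicas):
        # Replicas currently trusted to be in the same state.
        self.active = list(replicas)

    def step(self, value):
        """Give every active replica the same input and vote on the outputs."""
        outputs = [replica(value) for replica in self.active]
        winner, votes = Counter(outputs).most_common(1)[0]
        if votes == len(outputs):
            return winner  # unanimous
        if len(self.active) >= 3 and votes > len(outputs) // 2:
            # Majority found: discard dissenters, whose state is now suspect.
            self.active = [
                r for r, out in zip(self.active, outputs) if out == winner
            ]
            return winner
        # Two replicas disagreeing, or no majority: detection only.
        raise VoteFailure(f"no majority among outputs: {outputs!r}")
```

Starting with three replicas, a single two-to-one vote shrinks `active` to two entries; any later disagreement then raises `VoteFailure`, mirroring the detect-only behavior of DMR.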
See also
* Hot spare