
In computing, triple modular redundancy (TMR), sometimes called triple-mode redundancy, is a fault-tolerant form of N-modular redundancy in which three systems perform the same process and their results are passed to a majority-voting system that produces a single output. If any one of the three systems fails, the other two can correct and mask the fault.
The TMR concept can be applied to many forms of redundancy, such as software redundancy in the form of N-version programming, and is commonly found in fault-tolerant computer systems.
Space satellite systems often use TMR, although satellite RAM usually uses Hamming error correction.
Some ECC memory uses triple modular redundancy hardware (rather than the more common Hamming code), because triple modular redundancy hardware is faster than Hamming error correction hardware. Some communication systems use N-modular redundancy, in that context called a repetition code, as a simple form of forward error correction. For example, 5-modular redundancy communication systems (such as FlexRay) use the majority of 5 samples – if any 2 of the 5 results are erroneous, the other 3 results can correct and mask the fault.
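The 5-sample majority decision described above can be sketched as a small repetition code. This is only an illustration: the function names `encode_repetition` and `decode_repetition` are hypothetical, and the sketch does not claim to reproduce how FlexRay or any other real bus samples its bits.

```python
def encode_repetition(bits, n=5):
    """Repeat every bit n times (n should be odd so a vote cannot tie)."""
    return [b for b in bits for _ in range(n)]


def decode_repetition(samples, n=5):
    """Majority-vote each group of n samples back into a single bit."""
    if len(samples) % n != 0:
        raise ValueError("sample count must be a multiple of n")
    decoded = []
    for start in range(0, len(samples), n):
        group = samples[start:start + n]
        # More than half of the samples must be 1 for the bit to decode as 1.
        decoded.append(1 if sum(group) > n // 2 else 0)
    return decoded


sent = encode_repetition([1, 0, 1])
received = list(sent)
received[0] ^= 1  # two corrupted samples in the first group of five
received[3] ^= 1
received[7] ^= 1  # one corrupted sample in the second group
assert decode_repetition(received) == [1, 0, 1]
```

With n = 5, up to two corrupted samples per group are masked; a third error in the same group would flip the decoded bit.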
Modular redundancy is a basic concept, dating to antiquity. The first use of TMR in a computer was in the Czechoslovak computer SAPO, in the 1950s.
General case
The general case of TMR is called N-modular redundancy, in which any positive number of replications of the same action is used. The number is typically taken to be at least three, so that error correction by majority vote can take place; it is also usually taken to be odd, so that no ties may happen.
[Course notes]
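A minimal sketch of the general N-modular vote might look like the following. The helper name `nmr_vote` is hypothetical, replica outputs are assumed to be hashable values, and a strict majority is required, which is why an odd replica count is the usual choice.

```python
from collections import Counter


def nmr_vote(outputs):
    """Return the value produced by a strict majority of the replicas.

    Raises ValueError when no value holds more than half of the votes,
    for example on a tie between an even number of replicas.
    """
    if not outputs:
        raise ValueError("at least one replica output is required")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no majority among replica outputs")
    return value


assert nmr_vote([7, 7, 9]) == 7                  # TMR: one faulty replica is masked
assert nmr_vote(["a", "b", "a", "a", "c"]) == "a"  # 5-modular: two faults are masked
```

With two replicas that disagree, the function raises instead of guessing, which reflects why at least three replicas are needed for correction.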
Majority logic gate
[Figure: 3-input majority gate]
The output of a 3-input majority gate is 1 if two or more of its inputs are 1, and 0 if two or more of its inputs are 0. Thus, the majority gate is the carry output of a full adder, i.e., the majority gate is a voting machine.
The 3-input majority gate with inputs $A$, $B$ and $C$ can be represented by the following Boolean equation and truth table:
$$\mathrm{Majority}(A, B, C) = AB + BC + AC$$
A  B  C  Majority
0  0  0  0
0  0  1  0
0  1  0  0
0  1  1  1
1  0  0  0
1  0  1  1
1  1  0  1
1  1  1  1
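The majority function can also be written with bitwise operators, which votes on every bit of a word at once. The sketch below uses the usual sum-of-products form and checks the full-adder claim over all eight input combinations; the names `majority3` and `full_adder` are illustrative only.

```python
from itertools import product


def majority3(a, b, c):
    """Bitwise 3-input majority: an output bit is 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def full_adder(a, b, carry_in):
    """One-bit full adder returning (sum, carry_out)."""
    total = a + b + carry_in
    return total & 1, total >> 1


# The majority of three bits equals the carry output of a full adder.
for a, b, c in product((0, 1), repeat=3):
    assert majority3(a, b, c) == full_adder(a, b, c)[1]

# Applied to whole words, each bit position is voted on independently.
assert majority3(0b1100, 0b1010, 0b1001) == 0b1000
```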
In TMR, three identical logic circuits (logic gates) are used to compute the same specified Boolean function. If there are no circuit failures, the outputs of the three circuits are identical. If a circuit fails, however, the outputs of the three circuits may differ.
TMR operation
Assuming the Boolean function computed by the three identical logic gates has value 1, then: (a) if no circuit has failed, all three circuits produce an output of 1, and the majority gate output is 1; (b) if one circuit fails and produces an output of 0, while the other two work correctly and produce an output of 1, the majority gate output is 1, i.e., it still has the correct value. The same reasoning applies when the Boolean function computed by the three identical circuits has value 0. Thus, the majority gate output is guaranteed to be correct as long as no more than one of the three identical logic circuits has failed. [Dilip V. Sarwate, Lecture Notes for ECE 413 – Probability with Engineering Applications, Department of Electrical and Computer Engineering (ECE), UIUC College of Engineering, University of Illinois at Urbana-Champaign]
For a TMR system with a single voter of reliability (probability of working) $R_v$ and three components of reliability $R_m$, the probability of it being correct can be shown to be $R_v(3R_m^2 - 2R_m^3)$.
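The reliability of a single-voter TMR system can be worked out numerically. The sketch below assumes that the three components fail independently with equal reliability, and that the output is correct exactly when the voter works and at least two components work; under those assumptions the result equals the voter reliability times (3R² − 2R³) for component reliability R. The function name `tmr_reliability` is hypothetical.

```python
def tmr_reliability(r_module, r_voter=1.0):
    """Probability that a single-voter TMR system produces the correct output.

    Assumes independent, identically reliable modules (an assumption of
    this sketch) and a voter that must itself be working.
    """
    for name, value in (("r_module", r_module), ("r_voter", r_voter)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    all_three_work = r_module ** 3
    exactly_two_work = 3 * r_module ** 2 * (1 - r_module)
    return r_voter * (all_three_work + exactly_two_work)


# With a perfect voter, modules of reliability 0.9 give 0.729 + 0.243 = 0.972.
assert abs(tmr_reliability(0.9) - 0.972) < 1e-12
# At reliability 0.5, TMR is no better than a single module.
assert abs(tmr_reliability(0.5) - 0.5) < 1e-12
```

The second assertion illustrates that triplication only helps when each module is more likely to work than not.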
TMR systems should use data scrubbing – periodically rewriting flip-flops – in order to avoid accumulation of errors.
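As a rough software analogy for scrubbing, the sketch below keeps three copies of each stored word and periodically rewrites all three with the voted value. The data layout (three Python lists of integers) and the name `scrub` are assumptions made for illustration; real hardware would rewrite flip-flops or memory cells rather than list entries.

```python
def majority3(a, b, c):
    """Bitwise 3-input majority vote."""
    return (a & b) | (b & c) | (a & c)


def scrub(copies):
    """Rewrite every word in all three copies with its voted value.

    Returns the number of words in which at least one copy was repaired.
    """
    if len(copies) != 3 or len({len(copy) for copy in copies}) != 1:
        raise ValueError("expected three copies of equal length")
    repaired = 0
    for i in range(len(copies[0])):
        voted = majority3(copies[0][i], copies[1][i], copies[2][i])
        if any(copy[i] != voted for copy in copies):
            repaired += 1
        for copy in copies:
            copy[i] = voted
    return repaired


state = [[5, 9], [5, 9], [5, 9]]
state[0][0] ^= 0b100      # first upset hits copy 0
assert scrub(state) == 1  # scrubbing repairs it before a second upset arrives
state[1][0] ^= 0b100      # later upset hits the same bit in copy 1
assert majority3(state[0][0], state[1][0], state[2][0]) == 5  # still correct
```

Without the intermediate scrub, the same bit would be wrong in two copies at once and the vote would return the corrupted value.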
Voter
The majority gate itself could fail. This can be protected against by applying triple redundancy to the voters themselves.
In a few TMR systems, such as the Saturn Launch Vehicle Digital Computer and functional triple modular redundancy (FTMR) systems, the voters are also triplicated. Three voters are used – one for each copy of the next stage of TMR logic. In such systems there is no single point of failure.
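The triplicated-voter arrangement can be sketched as a pipeline stage that takes three inputs and returns three voted outputs, one per copy of the next stage. This is a simplified model with hypothetical names (`tmr_stage`, `stuck_at_zero`), not a description of the Saturn computer or of any specific FTMR design.

```python
def majority3(a, b, c):
    """Bitwise 3-input majority vote."""
    return (a & b) | (b & c) | (a & c)


def tmr_stage(modules, inputs, voters=(majority3, majority3, majority3)):
    """Run one TMR stage with a separate voter feeding each downstream copy.

    `modules` holds three callables and `inputs` the three values arriving
    from the previous stage's voters. Returns the three voted outputs.
    """
    outputs = [module(x) for module, x in zip(modules, inputs)]
    return [voter(*outputs) for voter in voters]


def increment(x):
    return x + 1


def stuck_at_zero(a, b, c):
    """A failed voter that always outputs 0."""
    return 0


# Stage 1 has one failed voter, so one downstream copy receives a bad input.
stage1 = tmr_stage((increment,) * 3, (3, 3, 3),
                   voters=(stuck_at_zero, majority3, majority3))
assert stage1 == [0, 4, 4]
# Stage 2's healthy voters mask the damage caused by the failed voter.
assert tmr_stage((increment,) * 3, stage1) == [5, 5, 5]
```

A failed voter here behaves like a single failed module in the following stage, which the next set of voters can outvote.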
Even though using only a single voter introduces a single point of failure – a failed voter will bring down the entire system – most TMR systems do not use triplicated voters. This is because the majority gates are much less complex than the systems that they guard against, so they are much more reliable. By using the reliability calculations, it is possible to find the minimum reliability of the voter for TMR to be a win.
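One way to make the break-even point concrete: assuming independent failures and the single-voter model in which the system is correct when the voter works and at least two of three modules work, TMR beats a single bare module of reliability R only if the voter reliability exceeds 1 / (3R − 2R²). The sketch below evaluates that threshold; the function name `min_voter_reliability` is hypothetical.

```python
def min_voter_reliability(r_module):
    """Voter reliability above which single-voter TMR beats one bare module.

    Obtained by solving r_voter * (3*r**2 - 2*r**3) > r for r_voter,
    under the independence assumption stated above.
    """
    if not 0.0 < r_module <= 1.0:
        raise ValueError("r_module must be in (0, 1]")
    return 1.0 / (3 * r_module - 2 * r_module ** 2)


# The threshold is lowest at r_module = 0.75, where it equals 8/9.
assert abs(min_voter_reliability(0.75) - 8 / 9) < 1e-12
# Below r_module = 0.5 the threshold exceeds 1, so no voter can make TMR a win.
assert min_voter_reliability(0.4) > 1.0
```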
Chronometers
To use triple modular redundancy, a ship must have at least three chronometers. Two chronometers provided dual modular redundancy, allowing a backup if one should cease to work, but not allowing any error correction if the two displayed different times: in case of contradiction between the two chronometers, it would be impossible to know which one was wrong (the error detection obtained would be the same as having only one chronometer and checking it periodically). Three chronometers provided triple modular redundancy, allowing error correction if one of the three was wrong, so the pilot would take the average of the two with the closest readings (vote for average precision).
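The "average of the two closest readings" rule can be sketched as follows. Readings are assumed to be plain numbers such as seconds since a common reference; that unit and the name `chronometer_vote` are choices made for this illustration only.

```python
from itertools import combinations


def chronometer_vote(readings):
    """Average the two readings that agree most closely, ignoring the outlier."""
    if len(readings) != 3:
        raise ValueError("expected exactly three readings")
    a, b = min(combinations(readings, 2),
               key=lambda pair: abs(pair[0] - pair[1]))
    return (a + b) / 2


# The third chronometer has drifted badly and is outvoted.
assert chronometer_vote([100.0, 101.0, 140.0]) == 100.5
```

Unlike a bitwise majority gate, this vote works on analogue values that never agree exactly, so it selects the closest pair instead of requiring identical outputs.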
There is an old adage to this effect, stating: "Never go to sea with two chronometers; take one or three."
The point is that when two chronometers contradict each other, there is no way to know which one is correct. At one time this rule was an expensive one to follow, as the cost of three sufficiently accurate chronometers was more than the cost of many types of smaller merchant vessels.
Some vessels carried more than three chronometers – for example, HMS ''Beagle'' carried 22 chronometers. However, such a large number was usually only carried on ships undertaking survey work, as was the case with the ''Beagle''.
In the modern era, ships at sea use GNSS navigation receivers (with support for GPS, GLONASS, WAAS, etc.), mostly running with WAAS or EGNOS support, so as to provide accurate time (and location).
In popular culture
* In Arthur C. Clarke's science fiction novel ''Rendezvous with Rama'', the Ramans make heavy use of triple redundancy.
* In the popular anime ''Neon Genesis Evangelion'', the Magi are a set of three biological supercomputers that must agree with a 2/3 majority vote before delivering a decision.
* In the film ''Minority Report'', three "precogs" are used to predict impending homicides, using triple modular redundancy. In the plot, this system fails, causing a false positive: an innocent man is wrongly accused of murder.
See also
* Fault tolerant system
* Lockstep (computing)
* Segal's law
External links
* Article about TMR with reference to TMR usage in avionics and industry
* Johnson, J. M., & Wirthlin, M. J. (2010, February). Voter insertion algorithms for FPGA designs using triple modular redundancy. In Proceedings of the 18th annual ACM/SIGDA international symposium on Field programmable gate arrays (pp. 249–258). ACM.