Fault tree analysis (FTA) is a type of failure analysis in which an undesired state of a system is examined. This analysis method is mainly used in safety engineering and reliability engineering to understand how systems can fail, to identify the best ways to reduce risk, and to determine (or estimate) event rates of a safety accident or a particular system-level (functional) failure. FTA is used in the aerospace, nuclear power, chemical and process, pharmaceutical, petrochemical and other high-hazard industries, but is also used in fields as diverse as risk factor identification relating to social service system failure. FTA is also used in software engineering for debugging purposes and is closely related to the cause-elimination technique used to detect bugs.
In aerospace, the more general term "system failure condition" is used for the "undesired state" / top event of the fault tree. These conditions are classified by the severity of their effects. The most severe conditions require the most extensive fault tree analysis. These system failure conditions and their classification are often previously determined in the functional hazard analysis.
Usage
Fault tree analysis can be used to:
* understand the logic leading to the top event / undesired state.
* show compliance with the (input) system safety / reliability requirements.
* prioritize the contributors leading to the top event, creating the critical equipment/parts/events lists for different importance measures.
* monitor and control the safety performance of the complex system (e.g., is a particular aircraft safe to fly when fuel valve ''x'' malfunctions? For how long is it allowed to fly with the valve malfunction?).
* minimize and optimize resources.
* assist in designing a system. The FTA can be used as a design tool that helps to create (output / lower level) requirements.
* function as a diagnostic tool to identify and correct causes of the top event. It can help with the creation of diagnostic manuals / processes.
History
Fault tree analysis (FTA) was originally developed in 1962 at Bell Laboratories by H.A. Watson, under a U.S. Air Force Ballistics Systems Division contract to evaluate the Minuteman I Intercontinental Ballistic Missile (ICBM) Launch Control System. The use of fault trees has since gained widespread support and is often used as a failure analysis tool by reliability experts. Following the first published use of FTA in the 1962 Minuteman I Launch Control Safety Study, Boeing and AVCO expanded use of FTA to the entire Minuteman II system in 1963–1964. FTA received extensive coverage at a 1965 System Safety Symposium in Seattle sponsored by Boeing and the University of Washington. Boeing began using FTA for civil aircraft design around 1966.
Subsequently, within the U.S. military, application of FTA for use with fuses was explored by Picatinny Arsenal in the 1960s and 1970s. In 1976 the U.S. Army Materiel Command incorporated FTA into an Engineering Design Handbook on Design for Reliability. The Reliability Analysis Center at Rome Laboratory and its successor organizations, now with the Defense Technical Information Center (Reliability Information Analysis Center, and now Defense Systems Information Analysis Center), have published documents on FTA and reliability block diagrams since the 1960s. MIL-HDBK-338B provides a more recent reference.
In 1970, the U.S. Federal Aviation Administration (FAA) published a change to 14 CFR 25.1309 airworthiness regulations for transport category aircraft in the Federal Register at 35 FR 5665 (1970-04-08). This change adopted failure probability criteria for aircraft systems and equipment and led to widespread use of FTA in civil aviation. In 1998, the FAA published Order 8040.4, establishing risk management policy including hazard analysis in a range of critical activities beyond aircraft certification, including air traffic control and modernization of the U.S. National Airspace System. This led to the publication of the FAA System Safety Handbook, which describes the use of FTA in various types of formal hazard analysis.
Early in the Apollo program the question was asked about the probability of successfully sending astronauts to the moon and returning them safely to Earth. A risk, or reliability, calculation of some sort was performed and the result was a mission success probability that was unacceptably low. This result discouraged NASA from further quantitative risk or reliability analysis until after the ''Challenger'' accident in 1986. Instead, NASA decided to rely on the use of failure modes and effects analysis (FMEA) and other qualitative methods for system safety assessments. After the ''Challenger'' accident, the importance of probabilistic risk assessment (PRA) and FTA in systems risk and reliability analysis was realized, and their use at NASA began to grow. FTA is now considered one of the most important system reliability and safety analysis techniques.
Within the nuclear power industry, the U.S. Nuclear Regulatory Commission began using PRA methods including FTA in 1975, and significantly expanded PRA research following the 1979 incident at Three Mile Island. This eventually led to the 1981 publication of the NRC Fault Tree Handbook NUREG–0492, and mandatory use of PRA under the NRC's regulatory authority.
Following process industry disasters such as the 1984 Bhopal disaster and 1988 Piper Alpha explosion, in 1992 the United States Department of Labor Occupational Safety and Health Administration (OSHA) published in the Federal Register at 57 FR 6356 (1992-02-24) its Process Safety Management (PSM) standard in 29 CFR 1910.119. OSHA PSM recognizes FTA as an acceptable method for process hazard analysis (PHA).
Today FTA is widely used in system safety and reliability engineering, and in all major fields of engineering.
Methodology
FTA methodology is described in several industry and government standards, including NRC NUREG–0492 for the nuclear power industry, an aerospace-oriented revision to NUREG–0492 for use by NASA, SAE ARP4761 for civil aerospace, and MIL–HDBK–338 for military systems. IEC standard IEC 61025 is intended for cross-industry use and has been adopted as European Norm EN 61025.
Any sufficiently complex system is subject to failure as a result of one or more subsystems failing. The likelihood of failure, however, can often be reduced through improved system design. Fault tree analysis maps the relationship between faults, subsystems, and redundant safety design elements by creating a logic diagram of the overall system.
The undesired outcome is taken as the root ('top event') of a tree of logic. For instance, the undesired outcome of a metal stamping press operation being considered might be a human appendage being stamped. Working backward from this top event it might be determined that there are two ways this could happen: during normal operation or during maintenance operation. This condition is a logical OR. Considering the branch of the hazard occurring during normal operation, perhaps it is determined that there are two ways this could happen: the press cycles and harms the operator, or the press cycles and harms another person. This is another logical OR. A design improvement can be made by requiring the operator to press two separate buttons to cycle the machine—this is a safety feature in the form of a logical AND. The button may have an intrinsic failure rate—this becomes a fault stimulus that can be analyzed.
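The press example can be sketched as a small Boolean tree. The sketch below is illustrative only: the function names, event names, and exact tree shape are assumptions layered on the description above, not a validated model of any real press.

```python
# Hypothetical Boolean model of the stamping-press example.
# All names and the tree shape are illustrative assumptions.

def press_cycles(left_button: bool, right_button: bool) -> bool:
    """Two-button safety feature: the press cycles only if both are pressed (AND)."""
    return left_button and right_button

def injury_normal_operation(cycles: bool, operator_in_die: bool,
                            bystander_in_die: bool) -> bool:
    """The press cycles and harms the operator OR another person."""
    return cycles and (operator_in_die or bystander_in_die)

def top_event(normal_injury: bool, maintenance_injury: bool) -> bool:
    """Top event: injury during normal operation OR during maintenance."""
    return normal_injury or maintenance_injury

# One button held, operator's hand in the die, no maintenance under way.
cycles = press_cycles(left_button=True, right_button=False)
injured = top_event(
    injury_normal_operation(cycles, operator_in_die=True, bystander_in_die=False),
    maintenance_injury=False,
)
print(injured)  # False: one button alone cannot cycle the press
```

A failed (stuck) button would be modeled by forcing one input of `press_cycles` to `True`, which is the kind of basic event whose failure rate the analysis would then quantify.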
When fault trees are labeled with actual numbers for failure probabilities, computer programs can calculate failure probabilities from fault trees. When a specific event is found to have more than one effect event, i.e. it has impact on several subsystems, it is called a common cause or common mode. Graphically speaking, it means this event will appear at several locations in the tree. Common causes introduce dependency relations between events. The probability computations of a tree which contains some common causes are much more complicated than regular trees where all events are considered as independent. Not all software tools available on the market provide such capability.
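To see why shared events complicate the arithmetic, the sketch below compares a naive gate-by-gate calculation with an exact enumeration for a hypothetical top event (A OR C) AND (B OR C), where C is a common cause. The tree and the probabilities are made-up illustrative values, and brute-force enumeration is used only because the example is tiny.

```python
from itertools import product

# Hypothetical basic-event probabilities; C feeds both branches (common cause).
p = {"A": 0.01, "B": 0.02, "C": 0.001}

def top(state):
    """Top event: (A OR C) AND (B OR C)."""
    return (state["A"] or state["C"]) and (state["B"] or state["C"])

def exact_probability(p, top):
    """Sum the probability of every basic-event state in which the top event occurs."""
    names = sorted(p)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        state = dict(zip(names, values))
        weight = 1.0
        for name in names:
            weight *= p[name] if state[name] else 1.0 - p[name]
        if top(state):
            total += weight
    return total

def p_or(a, b):
    """OR of two independent events."""
    return a + b - a * b

# Naive: treat the two OR gates as independent and multiply their outputs.
naive = p_or(p["A"], p["C"]) * p_or(p["B"], p["C"])
exact = exact_probability(p, top)
print(f"naive={naive:.6e} exact={exact:.6e}")
```

With these numbers the naive product understates the top-event probability several-fold, because it ignores that C alone triggers both branches.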
The tree is usually written out using conventional logic gate symbols. A cut set is a combination of events, typically component failures, causing the top event. If no event can be removed from a cut set without failing to cause the top event, then it is called a minimal cut set.
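As a rough illustration of cut sets, the following sketch expands a small hypothetical tree top-down and then discards non-minimal sets. The tree, the dictionary layout, and the function names are assumptions made for this example; production tools use more efficient algorithms.

```python
from itertools import product

# Hypothetical tree: TOP = G1 AND G2, G1 = A OR B, G2 = A OR C.
tree = {
    "TOP": ("AND", ["G1", "G2"]),
    "G1": ("OR", ["A", "B"]),
    "G2": ("OR", ["A", "C"]),
}

def cut_sets(node, tree):
    """Return the cut sets of `node` as a set of frozensets of basic events."""
    if node not in tree:                      # basic event
        return {frozenset([node])}
    gate, children = tree[node]
    child_sets = [cut_sets(child, tree) for child in children]
    if gate == "OR":                          # any child's cut set suffices
        return set().union(*child_sets)
    if gate == "AND":                         # one cut set from every child
        return {frozenset().union(*combo) for combo in product(*child_sets)}
    raise ValueError(f"unsupported gate: {gate}")

def minimize(sets):
    """Drop every cut set that strictly contains another cut set."""
    return {s for s in sets if not any(other < s for other in sets)}

minimal = minimize(cut_sets("TOP", tree))
print(sorted(sorted(s) for s in minimal))  # [['A'], ['B', 'C']]
```

Here {A, B} and {A, C} are cut sets but not minimal, since A alone already causes the top event.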
Some industries use both fault trees and event trees (see Probabilistic Risk Assessment). An event tree starts from an undesired initiator (loss of critical supply, component failure etc.) and follows possible further system events through to a series of final consequences. As each new event is considered, a new node on the tree is added with a split of probabilities of taking either branch. The probabilities of a range of 'top events' arising from the initial event can then be seen.
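The branching described above can be sketched as follows. The initiator frequency, the two safety functions, and their failure probabilities are invented for illustration, and the branches are assumed to be independent.

```python
# Hypothetical event tree: an initiator followed by two safety functions.
# Each branch point carries the assumed probability that the function fails.
initiator_frequency = 1e-2                      # assumed initiating events per year
p_fail = {"alarm": 0.05, "shutdown": 0.01}      # assumed failure probabilities

def sequences(frequency, p_fail):
    """Enumerate every success/failure path together with its frequency."""
    paths = [((), frequency)]
    for name, p in p_fail.items():
        paths = [
            (path + ((name, failed),), freq * (p if failed else 1.0 - p))
            for path, freq in paths
            for failed in (False, True)
        ]
    return paths

outcomes = sequences(initiator_frequency, p_fail)
for path, freq in outcomes:
    label = ", ".join(f"{n} {'fails' if f else 'works'}" for n, f in path)
    print(f"{label}: {freq:.3e} per year")
```

Because every branch point splits its incoming frequency between success and failure, the path frequencies sum back to the initiator frequency.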
Classic programs include the Electric Power Research Institute's (EPRI) CAFTA software, which is used by many of the US nuclear power plants and by a majority of US and international aerospace manufacturers, and the Idaho National Laboratory's SAPHIRE, which is used by the U.S. Government to evaluate the safety and reliability of nuclear reactors, the Space Shuttle, and the International Space Station. Outside the US, the software RiskSpectrum is a popular tool for fault tree and event tree analysis, and is licensed for use at almost half of the world's nuclear power plants for probabilistic safety assessment. Professional-grade free software is also widely available; SCRAM is an open-source tool that implements the Open-PSA Model Exchange Format open standard for probabilistic safety assessment applications.
Graphic symbols
The basic symbols used in FTA are grouped as events, gates, and transfer symbols. Minor variations may be used in FTA software.
Event symbols
Event symbols are used for ''primary events'' and ''intermediate events''. Primary events are not further developed on the fault tree. Intermediate events are found at the output of a gate. The event symbols are shown below:
File:FTA_basic_event.jpg, Basic event
File:FTA_initiating_event.jpg, External event
File:FTA_undeveloped_event.jpg, Undeveloped event
File:FTA_conditioning_event.jpg, Conditioning event
File:FTA_intermediate_event.jpg, Intermediate event
The primary event symbols are typically used as follows:
* Basic event - failure or error in a system component or element (example: switch stuck in open position)
* External event - normally expected to occur (not of itself a fault)
* Undeveloped event - an event about which insufficient information is available, or which is of no consequence
* Conditioning event - conditions that restrict or affect logic gates (example: mode of operation in effect)
An intermediate event gate can be used immediately above a primary event to provide more room to type the event description.
FTA is a top-to-bottom approach.
Gate symbols
Gate symbols describe the relationship between input and output events. The symbols are derived from Boolean logic symbols:
File:FTA_OR_gate.jpg, OR gate
File:FTA_AND_gate.jpg, AND gate
File:FTA_XOR_gate.jpg, Exclusive OR gate
File:FTA_priority_AND_gate.jpg, Priority AND gate
File:FTA_inhibit_gate.jpg, Inhibit gate
The gates work as follows:
* OR gate - the output occurs if any input occurs.
* AND gate - the output occurs only if all inputs occur (inputs are independent from the source).
* Exclusive OR gate - the output occurs if exactly one input occurs.
* Priority AND gate - the output occurs if the inputs occur in a specific sequence specified by a conditioning event.
* Inhibit gate - the output occurs if the input occurs under an enabling condition specified by a conditioning event.
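The gate rules listed above can be written as small Boolean functions. This is only a sketch: representing the priority AND inputs as occurrence times (with `None` for "did not occur") and treating ties as being in order are assumptions of this example, not part of any standard.

```python
from typing import Optional, Sequence

def or_gate(inputs: Sequence[bool]) -> bool:
    """Output occurs if any input occurs."""
    return any(inputs)

def and_gate(inputs: Sequence[bool]) -> bool:
    """Output occurs only if all inputs occur."""
    return all(inputs)

def xor_gate(inputs: Sequence[bool]) -> bool:
    """Output occurs if exactly one input occurs."""
    return sum(inputs) == 1

def priority_and_gate(times: Sequence[Optional[float]]) -> bool:
    """All inputs occur, in the listed order (None means 'did not occur')."""
    if any(t is None for t in times):
        return False
    return all(earlier <= later for earlier, later in zip(times, times[1:]))

def inhibit_gate(input_event: bool, enabling_condition: bool) -> bool:
    """Output occurs if the input occurs while the enabling condition holds."""
    return input_event and enabling_condition

print(priority_and_gate([1.0, 2.0]), priority_and_gate([2.0, 1.0]))  # True False
```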
Transfer symbols
Transfer symbols are used to connect the inputs and outputs of related fault trees, such as the fault tree of a subsystem to its system. NASA has prepared a comprehensive document on FTA that illustrates the method through practical incidents.
File:FTA_transfer_in.jpg, Transfer in
File:FTA_transfer_out.jpg, Transfer out
Basic mathematical foundation
Events in a fault tree are associated with statistical probabilities or Poisson-Exponentially distributed constant rates. For example, component failures may typically occur at some constant failure rate λ (a constant hazard function). In this simplest case, failure probability depends on the rate λ and the exposure time t:
:P = 1 - e^(-λt)
where:
:P ≈ λt if λt < 0.001
A fault tree is often normalized to a given time interval, such as a flight hour or an average mission time. Event probabilities depend on the relationship of the event hazard function to this interval.
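A minimal sketch of the constant-rate case, assuming the exponential model P = 1 - e^(-λt) (the same form as the unavailability Q used later in this section). The function names and the numeric values are illustrative assumptions.

```python
import math

def failure_probability(rate: float, exposure_time: float) -> float:
    """P = 1 - exp(-rate * t) for a constant failure rate."""
    # expm1 keeps precision when rate * t is very small.
    return -math.expm1(-rate * exposure_time)

def rare_event_approximation(rate: float, exposure_time: float) -> float:
    """P ~ rate * t, reasonable when rate * t is well below 0.001."""
    return rate * exposure_time

rate = 1e-6     # assumed failures per hour
hours = 10.0    # assumed exposure time (e.g. one mission)
exact = failure_probability(rate, hours)
approx = rare_event_approximation(rate, hours)
print(exact, approx)
```

The approximation is always slightly above the exact value, so using it errs on the conservative side.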
Unlike conventional logic gate diagrams in which inputs and outputs hold the binary values of TRUE (1) or FALSE (0), the gates in a fault tree output probabilities related to the set operations of Boolean logic. The probability of a gate's output event depends on the input event probabilities.
An AND gate represents a combination of independent events. That is, the probability of any input event to an AND gate is unaffected by any other input event to the same gate. In set theoretic terms, this is equivalent to the intersection of the input event sets, and the probability of the AND gate output is given by:
:P (A and B) = P (A ∩ B) = P(A) P(B)
An OR gate, on the other hand, corresponds to set union:
:P (A or B) = P (A ∪ B) = P(A) + P(B) - P (A ∩ B)
Since failure probabilities on fault trees tend to be small (less than .01), P (A ∩ B) usually becomes a very small error term, and the output of an OR gate may be conservatively approximated by using an assumption that the inputs are mutually exclusive events:
:P (A or B) ≈ P(A) + P(B), P (A ∩ B) ≈ 0
An exclusive OR gate with two inputs represents the probability that one or the other input, but not both, occurs:
:P (A xor B) = P(A) + P(B) - 2P (A ∩ B)
Again, since P (A ∩ B) usually becomes a very small error term, the exclusive OR gate has limited value in a fault tree.
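The gate formulas above can be checked numerically. The sketch below assumes independent inputs, so that P(A ∩ B) = P(A) P(B), and uses made-up probabilities.

```python
def p_and(pa: float, pb: float) -> float:
    """P(A and B) for independent inputs."""
    return pa * pb

def p_or(pa: float, pb: float) -> float:
    """P(A or B) = P(A) + P(B) - P(A and B)."""
    return pa + pb - p_and(pa, pb)

def p_or_approx(pa: float, pb: float) -> float:
    """Conservative OR approximation that drops the intersection term."""
    return pa + pb

def p_xor(pa: float, pb: float) -> float:
    """P(A xor B) = P(A) + P(B) - 2 P(A and B)."""
    return pa + pb - 2 * p_and(pa, pb)

pa, pb = 0.005, 0.002   # assumed input probabilities
print(p_and(pa, pb), p_or(pa, pb), p_or_approx(pa, pb), p_xor(pa, pb))
```

For inputs this small, the exact OR, the approximate OR, and the XOR results differ only by multiples of P(A) P(B), which is why the distinction rarely matters in practice.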
Quite often, Poisson-Exponentially distributed rates are used to quantify a fault tree instead of probabilities. Rates are often modeled as constant in time while probability is a function of time. Poisson-Exponential events are modeled as infinitely short so no two events can overlap. An OR gate is the superposition (addition of rates) of the two input failure frequencies or failure rates which are modeled as Poisson point processes. The output of an AND gate is calculated using the unavailability (Q₁) of one event thinning the Poisson point process of the other event (λ₂). The unavailability (Q₂) of the other event then thins the Poisson point process of the first event (λ₁). The two resulting Poisson point processes are superimposed according to the following equations.
The output of an AND gate is the combination of independent input events 1 and 2 to the AND gate:
:Failure Frequency = λ₁Q₂ + λ₂Q₁ where Q = 1 - e^(-λt) ≈ λt if λt < 0.001
:Failure Frequency ≈ λ₁λ₂t₂ + λ₂λ₁t₁ if λ₁t₁ < 0.001 and λ₂t₂ < 0.001
In a fault tree, unavailability (Q) may be defined as the unavailability of safe operation and may not refer to the unavailability of the system operation depending on how the fault tree was structured. The input terms to the fault tree must be carefully defined.
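The rate-based gate calculations can be sketched as below, assuming Q = 1 - e^(-λt) for each input and independent inputs. The function names, rates, and exposure times are illustrative assumptions.

```python
import math

def unavailability(rate: float, exposure_time: float) -> float:
    """Q = 1 - exp(-rate * t)."""
    return -math.expm1(-rate * exposure_time)

def or_gate_frequency(rate1: float, rate2: float) -> float:
    """Superposition of two Poisson processes: the rates add."""
    return rate1 + rate2

def and_gate_frequency(rate1: float, t1: float, rate2: float, t2: float) -> float:
    """rate1*Q2 + rate2*Q1: each rate thinned by the other input's unavailability."""
    return rate1 * unavailability(rate2, t2) + rate2 * unavailability(rate1, t1)

def and_gate_frequency_approx(rate1: float, t1: float, rate2: float, t2: float) -> float:
    """Small-Q form: rate1*rate2*t2 + rate2*rate1*t1."""
    return rate1 * rate2 * t2 + rate2 * rate1 * t1

rate1, t1 = 1e-5, 10.0   # assumed failures per hour, exposure hours
rate2, t2 = 2e-5, 5.0
print(and_gate_frequency(rate1, t1, rate2, t2),
      and_gate_frequency_approx(rate1, t1, rate2, t2))
```

With these values both rate-time products are 1e-4, inside the stated range for the approximation, and the two AND results agree closely.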
Analysis
Many different approaches can be used to model an FTA, but the most common can be summarized in a few steps. A single fault tree is used to analyze one and only one undesired event, which may subsequently be fed into another fault tree as a basic event. Though the nature of the undesired event may vary dramatically, an FTA follows the same procedure for any undesired event, be it a delay of 0.25 ms for the generation of electrical power, an undetected cargo bay fire, or the random, unintended launch of an ICBM.
FTA analysis involves five steps:
# Define the undesired event to study.
#* Definition of the undesired event can be very hard to uncover, although some of the events are very easy and obvious to observe. An engineer with a wide knowledge of the design of the system is the best person to help define and number the undesired events. Undesired events are used then to make FTAs. Each FTA is limited to one undesired event.
# Obtain an understanding of the system.
#* Once the undesired event is selected, all causes that could affect it, whatever their probability, are studied and analyzed. Exact probabilities for the causes leading to the event are usually unobtainable because determining them can be very costly and time-consuming. Computer software is used to study probabilities, which can lower the cost of system analysis.
#* System analysts can help with understanding the overall system. System designers have full knowledge of the system, and this knowledge is essential for not missing any cause affecting the undesired event. For the selected event, all causes are numbered and sequenced in order of occurrence, then carried into the next step, which is constructing the fault tree.
# Construct the fault tree.
#* Once the undesired event has been selected and the system analyzed so that all of its causes (and, if possible, their probabilities) are known, the fault tree can be constructed. The fault tree is built from AND and OR gates, which define its major characteristics.
# Evaluate the fault tree.
#* After the fault tree has been assembled for a specific undesired event, it is evaluated and analyzed for possible improvements; in other words, the risks are assessed and ways to improve the system are sought. A wide range of qualitative and quantitative analysis methods can be applied. This step serves as an introduction to the final step, which is controlling the hazards identified. In short, this step identifies all possible hazards affecting the system, directly or indirectly.
# Control the hazards identified.
#* This step is very specific and differs largely from one system to another, but the main point is always that, once the hazards have been identified, all possible methods are pursued to decrease their probability of occurrence.
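The construction and evaluation steps can be sketched in Python as follows. This is a minimal illustration, not a full FTA tool: it assumes all basic events are statistically independent and that no basic event appears under more than one gate, and it uses the standard AND (product) and OR (complement of the product of complements) probability rules. The class names, event names, and probabilities are hypothetical.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Union


@dataclass
class BasicEvent:
    """Leaf of the tree: an initiating fault with an assumed probability."""
    name: str
    probability: float


@dataclass
class Gate:
    """Intermediate or top event combining its inputs with AND or OR logic."""
    name: str
    kind: str  # "AND" or "OR"
    inputs: List[Union["Gate", BasicEvent]] = field(default_factory=list)


def probability(node: Union[Gate, BasicEvent]) -> float:
    """Evaluate the tree bottom-up, assuming independent inputs."""
    if isinstance(node, BasicEvent):
        return node.probability
    probs = [probability(child) for child in node.inputs]
    if node.kind == "AND":
        # All inputs must occur.
        return prod(probs)
    if node.kind == "OR":
        # At least one input must occur.
        return 1.0 - prod(1.0 - p for p in probs)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical system: cooling is lost if both pumps fail or power is lost.
both_pumps_fail = Gate("Both pumps fail", "AND", [
    BasicEvent("Pump A fails", 0.01),
    BasicEvent("Pump B fails", 0.02),
])
top_event = Gate("Loss of cooling", "OR", [
    both_pumps_fail,
    BasicEvent("Loss of power", 0.001),
])

print(f"P(top event) = {probability(top_event):.7f}")
```

Repeated basic events and common-cause failures violate the independence assumption; handling them requires other techniques, such as minimal cut set analysis, which this sketch does not attempt.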
Comparison with other analytical methods
FTA is a deductive, top-down method aimed at analyzing the effects of initiating faults and events on a complex system. This contrasts with failure mode and effects analysis (FMEA), which is an inductive, bottom-up analysis method aimed at analyzing the effects of single component or function failures on equipment or subsystems. FTA is very good at showing how resistant a system is to single or multiple initiating faults. It is not good at finding all possible initiating faults. FMEA is good at exhaustively cataloging initiating faults and identifying their local effects. It is not good at examining multiple failures or their effects at a system level. FTA considers external events; FMEA does not. In civil aerospace the usual practice is to perform both FTA and FMEA, with a failure mode effects summary (FMES) as the interface between FMEA and FTA.
Alternatives to FTA include the dependence diagram (DD), also known as the reliability block diagram (RBD), and Markov analysis. A dependence diagram is equivalent to a success tree analysis (STA), the logical inverse of an FTA, and depicts the system using paths instead of gates. DD and STA produce the probability of success (i.e., avoiding a top event) rather than the probability of a top event.
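The "logical inverse" relationship can be checked numerically with a small Python sketch. It assumes independent components and uses hypothetical failure probabilities. A series path in a dependence diagram (every block must work) is the complement of an OR gate, and a parallel, redundant path is the complement of an AND gate.

```python
from math import prod

# Hypothetical, independent component failure probabilities.
failure_probs = [0.01, 0.02]
success_probs = [1.0 - p for p in failure_probs]


def or_gate_failure(ps):
    """Fault tree OR gate: the top event occurs if any input fails."""
    return 1.0 - prod(1.0 - p for p in ps)


def and_gate_failure(ps):
    """Fault tree AND gate: the top event occurs only if all inputs fail."""
    return prod(ps)


def series_success(rs):
    """Dependence diagram series path: succeeds only if every block works."""
    return prod(rs)


def parallel_success(rs):
    """Dependence diagram parallel path: succeeds if any block works."""
    return 1.0 - prod(1.0 - r for r in rs)


# Each pair describes the same system from opposite sides and sums to 1.
print(or_gate_failure(failure_probs) + series_success(success_probs))
print(and_gate_failure(failure_probs) + parallel_success(success_probs))
```

Either representation carries the same information under these assumptions; the choice mainly affects whether the analyst reasons in terms of failure or success.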
See also
* Event tree analysis
* Failure mode and effects analysis
* Ishikawa diagram
* Reliability engineering
* Root cause analysis
* Safety engineering
* System safety
* Why-because analysis