Failure rate is the frequency with which an engineered system or component fails, expressed in failures per unit of time. It is usually denoted by the Greek letter λ (lambda) and is often used in reliability engineering.
The failure rate of a system usually depends on time, with the rate varying over the life cycle of the system. For example, an automobile's failure rate in its fifth year of service may be many times greater than its failure rate during its first year of service. One does not expect to replace an exhaust pipe, overhaul the brakes, or have major transmission problems in a new vehicle.
In practice, the mean time between failures (MTBF, 1/λ) is often reported instead of the failure rate. This is valid and useful if the failure rate may be assumed constant, as is often done for complex units and systems and for electronics, and it is a general convention in some reliability standards (military and aerospace). In this case it relates ''only'' to the flat region of the bathtub curve, which is also called the "useful life period". Because of this, it is incorrect to extrapolate MTBF to estimate the service lifetime of a component, which will typically be much less than the MTBF suggests because of the much higher failure rates in the "end-of-life wearout" part of the bathtub curve.
The reason for the preferred use for MTBF numbers is that the use of large positive numbers (such as 2000 hours) is more intuitive and easier to remember than very small numbers (such as 0.0005 per hour).
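The reciprocal relationship between the two numbers can be checked with a short Python sketch. It assumes a constant failure rate (the flat region of the bathtub curve); the function names are illustrative and do not come from any standard.

```python
import math

def mtbf_from_rate(failure_rate_per_hour: float) -> float:
    """MTBF in hours for a constant failure rate given in failures per hour."""
    return 1.0 / failure_rate_per_hour

def survival_probability(failure_rate_per_hour: float, hours: float) -> float:
    """R(t) = exp(-lambda * t); valid only under a constant failure rate."""
    return math.exp(-failure_rate_per_hour * hours)

rate = 0.0005                                  # failures per hour, as in the text
mtbf = mtbf_from_rate(rate)                    # about 2000 hours
r_at_mtbf = survival_probability(rate, mtbf)   # exp(-1), roughly 0.37
```

Even under the constant-rate assumption, only about 37% of units are expected to survive to the MTBF, which is a further reason not to read the MTBF as a service lifetime.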
The MTBF is an important system parameter in systems where failure rate needs to be managed, in particular for safety systems. The MTBF appears frequently in engineering design requirements, and governs the frequency of required system maintenance and inspections. In special processes called renewal processes, where the time to recover from failure can be neglected and the likelihood of failure remains constant with respect to time, the failure rate is simply the multiplicative inverse of the MTBF (λ = 1/MTBF).
A similar ratio used in the transport industries, especially in railways and trucking, is "mean distance between failures", a variation which attempts to correlate actual loaded distances to similar reliability needs and practices.
Failure rates are important factors in the insurance, finance, commerce and regulatory industries and fundamental to the design of safe systems in a wide variety of applications.
Failure rate data
Failure rate data can be obtained in several ways. The most common means are:
;Estimation:From field failure rate reports, statistical analysis techniques can be used to estimate failure rates. For accurate failure rates the analyst must have a good understanding of equipment operation, procedures for data collection, the key environmental variables impacting failure rates, how the equipment is used at the system level, and how the failure data will be used by system designers.
;Historical data about the device or system under consideration: Many organizations maintain internal databases of failure information on the devices or systems that they produce, which can be used to calculate failure rates for those devices or systems. For new devices or systems, the historical data for similar devices or systems can serve as a useful estimate.
;Government and commercial failure rate data: Handbooks of failure rate data for various components are available from government and commercial sources. MIL-HDBK-217F, ''Reliability Prediction of Electronic Equipment'', is a military standard that provides failure rate data for many military electronic components. Several failure rate data sources are available commercially that focus on commercial components, including some non-electronic components.
;Prediction: Time lag is one of the serious drawbacks of all failure rate estimations. Often by the time the failure rate data are available, the devices under study have become obsolete. Because of this drawback, failure-rate prediction methods have been developed. These methods may be used on newly designed devices to predict the device's failure rates and failure modes. Two approaches have become well known: cycle testing and FMEDA.
; Life Testing: The most accurate source of data is to test samples of the actual devices or systems in order to generate failure data. This is often prohibitively expensive or impractical, so that the previous data sources are often used instead.
;Cycle Testing: Mechanical movement is the predominant failure mechanism causing mechanical and electromechanical devices to wear out. For many devices, the wear-out failure point is measured by the number of cycles performed before the device fails, and can be discovered by cycle testing. In cycle testing, a device is cycled as rapidly as practical until it fails. When a collection of these devices is tested, the test runs until 10% of the units fail dangerously.
;FMEDA: Failure modes, effects, and diagnostic analysis (FMEDA) is a systematic analysis technique to obtain subsystem / product level failure rates, failure modes and design strength. The FMEDA technique considers:
* All components of a design,
* The functionality of each component,
* The failure modes of each component,
* The effect of each component failure mode on the product functionality,
* The ability of any automatic diagnostics to detect the failure,
* The design strength (de-rating, safety factors) and
* The operational profile (environmental stress factors).
Given a component database calibrated with reasonably accurate field failure data, the method can predict product level failure rate and failure mode data for a given application. The predictions have been shown to be more accurate than field warranty return analysis or even typical field failure analysis, given that these methods depend on reports that typically do not contain sufficiently detailed information in their failure records.
Failure rate in the discrete sense
The failure rate can be defined as the following:
:The total number of failures within an item population, divided by the total time expended by that population, during a particular measurement interval under stated conditions. (MacDiarmid, ''et al.'')
Although the failure rate, λ(''t''), is often thought of as the probability that a failure occurs in a specified interval given no failure before time ''t'', it is not actually a probability because it can exceed 1. Expressing the failure rate in % can therefore give a misleading impression of the measure, especially when it is measured from repairable systems, or from multiple systems with non-constant failure rates or different operating times. It can be defined with the aid of the reliability function, also called the survival function, ''R''(''t'') = 1 − ''F''(''t''), the probability of no failure before time ''t''.
::<math>\lambda(t) = \frac{f(t)}{R(t)}</math>, where ''f''(''t'') is the time to (first) failure distribution (i.e. the failure density function).
::<math>\lambda(t) = \frac{R(t_1) - R(t_2)}{(t_2 - t_1) \cdot R(t_1)} = \frac{R(t) - R(t + \Delta t)}{\Delta t \cdot R(t)}</math>
over a time interval Δ''t'' = (''t''<sub>2</sub> − ''t''<sub>1</sub>) from ''t''<sub>1</sub> (or ''t'') to ''t''<sub>2</sub>. Note that this is built on a conditional probability, where the condition is that no failure has occurred before time ''t''. Hence the ''R''(''t'') in the denominator.
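The interval form of the definition, the drop in reliability over the interval divided by the interval length and by the reliability at its start, can be sketched in Python. The reliability functions used here are hypothetical constant-rate examples chosen only for illustration.

```python
import math
from typing import Callable

def interval_failure_rate(R: Callable[[float], float], t1: float, t2: float) -> float:
    """Average failure rate over [t1, t2], conditional on survival to t1."""
    if t2 <= t1:
        raise ValueError("t2 must be greater than t1")
    return (R(t1) - R(t2)) / ((t2 - t1) * R(t1))

# Hypothetical reliability function: constant failure rate of 0.001 per hour.
def R_slow(t: float) -> float:
    return math.exp(-0.001 * t)

# Hypothetical reliability function: constant failure rate of 5 per hour.
def R_fast(t: float) -> float:
    return math.exp(-5.0 * t)

rate_short = interval_failure_rate(R_slow, 1000.0, 1001.0)  # close to 0.001
rate_long = interval_failure_rate(R_slow, 1000.0, 3000.0)   # lower: a wide interval averages
rate_high = interval_failure_rate(R_fast, 0.0, 0.001)       # close to 5, i.e. greater than 1
```

The last value illustrates why the failure rate is not a probability: it is a rate per unit time and can exceed 1 when failures are frequent relative to the chosen time unit.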
Hazard rate and ROCOF (rate of occurrence of failures) are often incorrectly seen as the same and equal to the failure rate. To clarify: the more promptly items are repaired, the sooner they will break again, so the higher the ROCOF. The hazard rate, however, is independent of the time to repair and of the logistic delay time.
Failure rate in the continuous sense
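Calculating the failure rate for ever smaller intervals of time results in the hazard function (also called hazard rate), ''h''(''t''). This becomes the ''instantaneous'' failure rate, or instantaneous hazard rate, as Δ''t'' approaches zero:
:<math>h(t) = \lim_{\Delta t \to 0} \frac{R(t) - R(t + \Delta t)}{\Delta t \cdot R(t)}.</math>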
A continuous failure rate depends on the existence of a failure distribution, ''F''(''t''), which is a cumulative distribution function that describes the probability of failure (at least) up to and including time ''t'',
:<math>\operatorname{P}(T \le t) = F(t) = 1 - R(t), \quad t \ge 0,</math>
where ''T'' is the failure time. The failure distribution function is the integral of the failure ''density'' function, ''f''(''t''),
:<math>F(t) = \int_0^t f(\tau)\, d\tau.</math>
The hazard function can be defined now as
:<math>h(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{R(t)}.</math>
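Many probability distributions can be used to model the failure distribution (''see List of important probability distributions''). A common model is the exponential failure distribution,
:<math>F(t) = \int_0^t \lambda e^{-\lambda \tau}\, d\tau = 1 - e^{-\lambda t},</math>
which is based on the exponential density function. The hazard rate function for this is:
:<math>h(t) = \frac{f(t)}{R(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda.</math>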
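That the exponential hazard does not depend on time can be confirmed numerically. The following is a minimal Python sketch; the rate value is a hypothetical figure chosen for illustration.

```python
import math

def exponential_hazard(t: float, lam: float) -> float:
    """h(t) = f(t) / R(t) for the exponential failure distribution."""
    f = lam * math.exp(-lam * t)   # failure density function
    R = math.exp(-lam * t)         # reliability (survival) function
    return f / R

lam = 0.002  # hypothetical failure rate, failures per hour
hazards = [exponential_hazard(t, lam) for t in (0.0, 10.0, 500.0, 5000.0)]
# Every entry equals lam up to floating-point rounding.
```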
Thus, for an exponential failure distribution, the hazard rate is a constant with respect to time (that is, the distribution is "memory-less"). For other distributions, such as a Weibull distribution or a log-normal distribution, the hazard function may not be constant with respect to time. For some, such as the deterministic distribution, it is monotonic increasing (analogous to "wearing out"); for others, such as the Pareto distribution, it is monotonic decreasing (analogous to "burning in"); while for many it is not monotonic.
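Solving the differential equation
:<math>h(t) = \frac{f(t)}{1 - F(t)} = \frac{F'(t)}{1 - F(t)}</math>
for ''F''(''t''), it can be shown that
:<math>F(t) = 1 - \exp\left(-\int_0^t h(\tau)\, d\tau\right).</math>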
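Both points, a hazard that varies with time and the recovery of the failure distribution from the hazard as one minus the exponential of the negative integrated hazard, can be illustrated with the Weibull distribution. The Python sketch below uses the standard two-parameter Weibull hazard; the scale value, the evaluation times, and the midpoint-rule integration are assumptions made for the example.

```python
import math

def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def cdf_from_hazard(hazard, t: float, steps: int = 20_000) -> float:
    """F(t) = 1 - exp(-integral of h from 0 to t), using the midpoint rule."""
    dt = t / steps
    cumulative_hazard = sum(hazard((i + 0.5) * dt) for i in range(steps)) * dt
    return 1.0 - math.exp(-cumulative_hazard)

scale = 1000.0                          # hypothetical characteristic life, hours
times = [100.0, 500.0, 1000.0, 2000.0]
for shape in (0.5, 1.0, 3.0):           # decreasing, constant, increasing hazard
    print(shape, [weibull_hazard(t, shape, scale) for t in times])

# Recover F(t) from the hazard and compare with the closed-form Weibull CDF.
shape = 3.0
F_numeric = cdf_from_hazard(lambda s: weibull_hazard(s, shape, scale), 1500.0)
F_closed = 1.0 - math.exp(-((1500.0 / scale) ** shape))
```

A shape parameter below 1 gives a decreasing hazard, exactly 1 reduces to the exponential case, and above 1 gives an increasing hazard.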
Decreasing failure rate
A decreasing failure rate (DFR) describes a phenomenon where the probability of an event in a fixed time interval in the future decreases over time. A decreasing failure rate can describe a period of "infant mortality" where earlier failures are eliminated or corrected, and corresponds to the situation where λ(''t'') is a decreasing function.
Mixtures of DFR variables are DFR.
Mixtures of exponentially distributed random variables are hyperexponentially distributed.
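A mixture of exponential lifetimes has a decreasing failure rate even though each component of the mixture has a constant one, because the units with the higher rate tend to fail first and leave a more reliable surviving population. The Python sketch below illustrates this; the mixing weights and rates are hypothetical.

```python
import math

def hyperexponential_hazard(t, weights, rates):
    """Hazard of a mixture of exponentials: h(t) = f(t) / R(t)."""
    f = sum(p * lam * math.exp(-lam * t) for p, lam in zip(weights, rates))
    R = sum(p * math.exp(-lam * t) for p, lam in zip(weights, rates))
    return f / R

weights = [0.3, 0.7]     # hypothetical mixing proportions (sum to 1)
rates = [0.01, 0.001]    # hypothetical constant failure rates, per hour
hazards = [hyperexponential_hazard(t, weights, rates) for t in (0, 100, 500, 2000)]
# The hazard starts at the weighted mean rate and decays toward the smallest rate.
```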
Renewal processes
For a renewal process with DFR renewal function, inter-renewal times are concave. Brown conjectured the converse, that DFR is also necessary for the inter-renewal times to be concave; however, it has been shown that this conjecture holds neither in the discrete case nor in the continuous case.
Applications
Increasing failure rate is an intuitive concept caused by components wearing out. Decreasing failure rate describes a system which improves with age.
Decreasing failure rates have been found in the lifetimes of spacecraft, Baker and Baker commenting that "those spacecraft that last, last on and on." The lifetimes of individual aircraft air conditioning systems were found to have an exponential distribution, and thus the pooled population has a DFR.
Coefficient of variation
When the failure rate is decreasing the coefficient of variation is ⩾ 1, and when the failure rate is increasing the coefficient of variation is ⩽ 1. Note that this result only holds when the failure rate is defined for all ''t'' ⩾ 0 and that the converse result (coefficient of variation determining nature of failure rate) does not hold.
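The relationship can be checked for the Weibull distribution, whose failure rate is decreasing, constant, or increasing according to whether its shape parameter is below, equal to, or above 1. The Python sketch below uses the standard Weibull moment formulas; the choice of the Weibull family and of the shape values is an assumption made for illustration.

```python
import math

def weibull_cv(shape: float) -> float:
    """Coefficient of variation (std / mean) of a Weibull lifetime.

    The scale parameter cancels out, so only the shape is needed.
    """
    g1 = math.gamma(1.0 + 1.0 / shape)   # mean / scale
    g2 = math.gamma(1.0 + 2.0 / shape)   # second moment / scale**2
    return math.sqrt(g2 / g1 ** 2 - 1.0)

cv_dfr = weibull_cv(0.5)     # decreasing failure rate
cv_const = weibull_cv(1.0)   # constant failure rate (exponential)
cv_ifr = weibull_cv(3.0)     # increasing failure rate
```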
Units
Failure rates can be expressed using any measure of time, but hours is the most common unit in practice. Other units, such as miles, revolutions, etc., can also be used in place of "time" units.
Failure rates are often expressed in engineering notation as failures per million, or 10<sup>−6</sup>, especially for individual components, since their failure rates are often very low.
The Failures In Time (FIT) rate of a device is the number of failures that can be expected in one billion (10<sup>9</sup>) device-hours of operation. (E.g. 1000 devices for 1 million hours, or 1 million devices for 1000 hours each, or some other combination.) This term is used particularly by the semiconductor industry.
The relationship of FIT to MTBF may be expressed as: MTBF = 1,000,000,000 × 1/FIT, with the MTBF in hours.
Additivity
Under certain engineering assumptions (e.g. besides the above assumptions for a constant failure rate, the assumption that the considered system has no relevant redundancies), the failure rate for a complex system is simply the sum of the individual failure rates of its components, as long as the units are consistent, e.g. failures per million hours. This permits testing of individual components or subsystems, whose failure rates are then added to obtain the total system failure rate.
Adding "redundant" components to eliminate a single point of failure improves the mission failure rate, but makes the series failure rate (also called the logistics failure rate) worse: the extra components improve the mean time between critical failures (MTBCF), even though the mean time before something fails is worse.
["Mission Reliability and Logistics Reliability: A Design Paradox"]
Example
Suppose it is desired to estimate the failure rate of a certain component. A test can be performed to estimate its failure rate. Ten identical components are each tested until they either fail or reach 1000 hours, at which time the test is terminated for that component. (The level of statistical confidence is not considered in this example.) The results are as follows:
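Estimated failure rate is
:<math>\hat{\lambda} = \frac{\text{number of failures}}{\text{total component-hours on test}} \approx 0.0007998\ \text{failures per hour},</math>
or 799.8 failures for every million hours of operation.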
See also
* Annualized failure rate
* Burn-in
* Failure
* Failure mode
* Failure modes, effects, and diagnostic analysis
* Force of mortality
* Frequency of exceedance
* Reliability engineering
* Reliability theory
* Reliability theory of aging and longevity
* Survival analysis
* Weibull distribution
Further reading
* Federal Standard 1037C
* U.S. Department of Defense (1991). ''Military Handbook, "Reliability Prediction of Electronic Equipment", MIL-HDBK-217F, 2''
External links
* Bathtub curve issues, ASQC
* ''Fault Tolerant Computing in Industrial Automation'' by Hubert Kirrmann, ABB Research Center, Switzerland