computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...

, fault injection is a testing technique for understanding how computing systems behave when stressed in unusual ways. This can be achieved using physical- or software-based means, or using a hybrid approach. Widely studied physical fault injections include the application of high voltages, extreme temperatures and

electromagnetic pulses An electromagnetic pulse (EMP), also a transient electromagnetic disturbance (TED), is a brief burst of electromagnetic energy. Depending upon the source, the origin of an EMP can be natural or artificial, and can occur as an electromagnetic fi ...

on electronic components, such as

computer memory In computing, memory is a device or system that is used to store information for immediate use in a computer or related computer hardware and digital electronic devices. The term ''memory'' is often synonymous with the term ''primary storage ...

and central processing units. By exposing components to conditions beyond their intended operating limits, computing systems can be coerced into mis-executing instructions and corrupting critical data. In

software testing Software testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to apprecia ...

, fault injection is a technique for improving the

coverage Coverage may refer to: Filmmaking * Coverage (lens), the size of the image a lens can produce * Camera coverage, the amount of footage shot and different camera setups used in filming a scene * Script coverage, a short summary of a script, wri ...

of a test by introducing faults to test code paths; in particular

error handling In computing and computer programming, exception handling is the process of responding to the occurrence of ''exceptions'' – anomalous or exceptional conditions requiring special processing – during the execution of a program. In general, an ...

code paths, that might otherwise rarely be followed. It is often used with

stress testing Stress testing (sometimes called torture testing) is a form of deliberately intense or thorough testing used to determine the stability of a given system, critical infrastructure or entity. It involves testing beyond normal operational capacity, ...

and is widely considered to be an important part of developing

robust Robustness is the property of being strong and healthy in constitution. When it is transposed into a system, it refers to the ability of tolerating perturbations that might affect the system’s functional body. In the same line ''robustness'' ca ...

software.

Robustness testing Robustness testing is any quality assurance methodology focused on testing the robustness of software. Robustness testing has also been used to describe the process of verifying the robustness (i.e. correctness) of test cases in a test process. A ...

Kaksonen, Rauli. A Functional Method for Assessing Protocol Implementation Security. 2001.
/ref> (also known as syntax testing,

fuzzing In programming and software development, fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program. The program is then monitored for exceptions ...

fuzz testing Fuzz may refer to: * Fuzz (film), ''Fuzz'' (film), a 1972 American comedy * ''Fuzz: When Nature Breaks the Law'', a nonfiction book by Mary Roach * The fuzz, a List of slang terms for police officers, slang term for police officers Music * Fuzz ...

) is a type of fault injection commonly used to test for vulnerabilities in communication interfaces such as protocols, command line parameters, or APIs. The propagation of a fault through to an observable failure follows a well-defined cycle. When executed, a fault may cause an error, which is an invalid state within a system boundary. An error may cause further errors within the system boundary, therefore each new error acts as a fault, or it may propagate to the system boundary and be observable. When error states are observed at the system boundary they are termed failures. This mechanism is termed the fault-error-failure cycle and is a key mechanism in

dependability In systems engineering, dependability is a measure of a system's availability, reliability, maintainability, and in some cases, other characteristics such as durability, safety and security. In real-time computing, dependability is the ability to ...

History

The technique of fault injection dates back to the 1970sJ. V. Carreira, D. Costa, and S. J. G, "Fault Injection Spot-Checks Computer System Dependability," IEEE Spectrum, pp. 50–55, 1999. when it was first used to induce faults at a hardware level. This type of fault injection is called Hardware Implemented Fault Injection (HWIFI) and attempts to simulate hardware failures within a system. The first experiments in hardware fault involved nothing more than shorting connections on circuit boards and observing the effect on the system (bridging faults). It was used primarily as a test of the dependability of the hardware system. Later specialised hardware was developed to extend this technique, such as devices to bombard specific areas of a circuit board with heavy radiation. It was soon found that faults could be induced by software techniques and that aspects of this technique could be useful for assessing software systems. Collectively these techniques are known as Software Implemented Fault Injection (SWIFI).

Model implemented fault injection

By increasing complexity of Cyber-Physical Systems, applying traditional fault injection methods are not efficient anymore, so tester trying to use fault injection in the model level.

Software implemented fault injection

SWIFI techniques for software fault injection can be categorized into two types: compile-time injection and runtime injection. Compile-time injection is an injection technique where source code is modified to inject simulated faults into a system. One method is called

mutation testing Mutation testing (or ''mutation analysis'' or ''program mutation'') is used to design new software tests and evaluate the quality of existing software tests. Mutation testing involves modifying a program in small ways. Each mutated version is call ...

which changes existing lines of code so that they contain faults. A simple example of this technique could be changing a = a + 1 to a = a – 1 Code mutation produces faults which are very similar to those unintentionally added by programmers. A refinement of code mutation is ''Code Insertion Fault Injection'' which adds code, rather than modifying existing code. This is usually done through the use of perturbation functions which are simple functions which take an existing value and perturb it via some logic into another value, for example int pFunc(int value) int main(int argc, char * argv[]) In this case, pFunc is the perturbation function and it is applied to the return value of the function that has been called introducing a fault into the system. Runtime Injection techniques use a software trigger to inject a fault into a running software system. Faults can be injected via a number of physical methods and triggers can be implemented in a number of ways, such as: Time Based triggers (When the timer reaches a specified time an interrupt is generated and the interrupt handler associated with the timer can inject the fault. ); Interrupt Based Triggers (Hardware exceptions and software trap mechanisms are used to generate an interrupt at a specific place in the system code or on a particular event within the system, for instance, access to a specific memory location). Runtime injection techniques can use a number of different techniques to insert faults into a system via a trigger. * Corruption of memory space: This technique consists of corrupting RAM, processor registers, and I/O map. * Syscall interposition techniques: This is concerned with the fault propagation from operating system kernel interfaces to executing systems software. This is done by intercepting operating system calls made by user-level software and injecting faults into them. * Network Level fault injection: This technique is concerned with the corruption, loss or reordering of network packets at the network interface. These techniques are often based around the debugging facilities provided by computer processor architectures.

Protocol software fault injection

Complex software systems, especially multi-vendor distributed systems based on open standards, perform input/output operations to exchange data via stateful, structured exchanges known as "

protocols Protocol may refer to: Sociology and politics * Protocol (politics), a formal agreement between nation states * Protocol (diplomacy), the etiquette of diplomacy and affairs of state * Etiquette, a code of personal behavior Science and technology ...

." One kind of fault injection that is particularly useful to test protocol implementations (a type of software code that has the unusual characteristic in that it cannot predict or control its input) is

. Fuzzing is an especially useful form of

Black-box testing Black-box testing is a method of software testing that examines the functionality of an application without peering into its internal structures or workings. This method of test can be applied virtually to every level of software testing: unit, ...

since the various invalid inputs that are submitted to the software system do not depend on, and are not created based on knowledge of, the details of the code running inside the system.

Hardware implemented fault injection

This technique was applied on a hardware prototype. Testers inject fault by changing voltage of some parts in a circuit, increasing or decreasing temperature, bombarding the board by high energy radiation, etc.

Characteristics of fault injection

Faults have three main parameters. * Type: What type of fault should be injected? For example stuck-to-value, delay, ignoring some functions, ignoring some parameters/variable, random faults, the bias fault, the noise, etc. The amplitude of each fault is also important. * Time: When should be activated? For example the activation time of fault or the activation condition of fault. * Location: Where should be in the system? For example fault in the link/connection between systems, faults within systems/subsystems/function, etc. These parameters create the fault space realm. The fault space realm will increase exponentially by increasing system complexity. Therefore, the traditional fault injection method will not be applicable to use in the modern cyber-physical systems, because they will be so slow, and they will find a small number of faults (less fault coverage). Hence, the testers need an efficient algorithm to choose critical faults that have a higher impact on system behavior. Thus, the main research question is how to find critical faults in the fault space realm which have catastrophic effects on system behavior. Here are some methods that can aid fault injection to efficiently explore the fault space to reach higher fault coverage in less simulation time. * Sensitivity analysis: In this method, sensitivity analysis has been used to identify the most important signals that have a higher impact on the system's specification. By identifying those important signals or parameters, the fault injection tool will focus on those effective signals instead of focusing on all signals in the system. * Reinforcement learning: In this method, the reinforcement learning algorithm has been used to efficiently explore the fault space and find critical faults. * Realism analysis: In this method, the faults to be injected are defined based on the ones that naturally happen in production.

Fault injection tools

Although these types of faults can be injected by hand the possibility of introducing an unintended fault is high, so tools exist to parse a program automatically and insert faults.

Research tools

A number of SWIFI Tools have been developed and a selection of these tools is given here. Six commonly used fault injection tools are Ferrari, FTAPE, Doctor, Orchestra, Xception and Grid-FIT. * MODIFI (MODel-Implemented Fault Injection) is a fault injection tool for robustness evaluation of Simulink behavior models. It supports fault modelling in XML for implementation of domain-specific fault models. * Ferrari (Fault and ERRor Automatic Real-time Injection) is based around software traps that inject errors into a system. The traps are activated by either a call to a specific memory location or a timeout. When a trap is called the handler injects a fault into the system. The faults can either be transient or permanent. Research conducted with Ferrari shows that error detection is dependent on the fault type and where the fault is inserted. * FTAPE (Fault Tolerance and Performance Evaluator) can inject faults, not only into memory and registers, but into disk accesses as well. This is achieved by inserting a special disk driver into the system that can inject faults into data sent and received from the disk unit. FTAPE also has a synthetic load unit that can simulate specific amounts of load for robustness testing purposes. * DOCTOR (IntegrateD SOftware Fault InjeCTiOn EnviRonment) allows injection of memory and register faults, as well as network communication faults. It uses a combination of time-out, trap and code modification. Time-out triggers inject transient memory faults and traps inject transient emulated hardware failures, such as register corruption. Code modification is used to inject permanent faults. * Orchestra is a script-driven fault injector that is based around Network Level Fault Injection. Its primary use is the evaluation and validation of the fault-tolerance and timing characteristics of distributed protocols. Orchestra was initially developed for the Mach Operating System and uses certain features of this platform to compensate for latencies introduced by the fault injector. It has also been successfully ported to other operating systems. * Xception is designed to take advantage of the advanced debugging features available on many modern processors. It is written to require no modification of system source and no insertion of software traps, since the processor's exception handling capabilities trigger fault injection. These triggers are based around accesses to specific memory locations. Such accesses could be either for data or fetching instructions. It is therefore possible to accurately reproduce test runs because triggers can be tied to specific events, instead of timeouts. * Grid-FIT (Grid – Fault Injection Technology) is a dependability assessment method and tool for assessing Grid services by fault injection. Grid-FIT is derived from an earlier fault injector WS-FIT which was targeted towards Java Web Services implemented using Apache Axis transport. Grid-FIT utilises a novel fault injection mechanism that allows network-level fault injection to be used to give a level of control similar to Code Insertion fault injection whilst being less invasive. * LFI (Library-level Fault Injector) is an automatic testing tool suite, used to simulate in a controlled testing environment, exceptional situations that programs need to handle at runtime but that are not easy to check via input testing alone. LFI automatically identifies the errors exposed by shared libraries, finds potentially buggy error recovery code in program binaries and injects the desired faults at the boundary between shared libraries and applications. * ChaosMachine, a tool that does chaos engineering at the application level in the JVM. It concentrates on analyzing the error-handling capability of each try-catch block involved in the application by injecting exceptions. * TripleAgent, a resilience evaluation and improvement system for Java applications. The unique feature of TripleAgent is to combine automated monitoring, automated perturbation injection, and automated resilience improvement. *FIBlock (Fault Injection Block), a model-based fault injection method implemented as a highly-customizable Simulink block. It supports the injection in MATLAB Simulink models typical faults of essential heterogeneous components of Cyber-Physical Systems such as sensors, computing hardware, and network. Additional trigger inputs and outputs of the block enable the modeling of conditional faults. Furthermore, two or more FIBlocks connected with the trigger signals can model so-called chained errors.

Commercial tools

* Beyond Security beSTORM is a commercial

black box In science, computing, and engineering, a black box is a system which can be viewed in terms of its inputs and outputs (or transfer characteristics), without any knowledge of its internal workings. Its implementation is "opaque" (black). The te ...

software security analysis tool. It is often used during development by original equipment manufacturers but is also used for testing products prior to implementation, notably in aerospace, banking and defense. beSTORM's test process starts with the most likely attack scenarios, then resorts to exhaustive generation based

. beSTORM provides modules for common protocols and 'auto learns' new or proprietary protocols, including mutation-based attacks. Highlights: binary and textual analysis, custom protocol testing, debugging and stack tracing, development language independent, CVE compliant. * ExhaustiF is a commercial software tool used for

grey box testing Gray-box testing (International English spelling: grey-box testing) is a combination of white-box testing and black-box testing. The aim of this testing is to search for the defects, if any, due to improper structure or improper usage of applicati ...

based on software fault injection (SWIFI) to improve reliability of software-intensive systems. The tool can be used during system integration and system testing phases of any software development lifecycle, complementing other testing tools as well. ExhaustiF is able to inject faults into both software and hardware. When injecting simulated faults in software, ExhaustiF offers the following fault types: Variable Corruption and Procedure Corruption. The catalogue for hardware fault injections includes faults in Memory (I/O, RAM) and CPU (Integer Unit, Floating Unit). There are different versions available for RTEMS/ERC32, RTEMS/Pentium, Linux/Pentium and MS-Windows/Pentium. * Holodeck is a test tool developed by Security Innovation that uses fault injection to simulate real-world application and system errors for Windows applications and services. Holodeck customers include many major commercial software development companies, including Microsoft, Symantec, EMC and Adobe. It provides a controlled, repeatable environment in which to analyze and debug error-handling code and application attack surfaces for fragility and security testing. It simulates file and network fuzzing faults as well as a wide range of other resource, system and custom-defined faults. It analyzes code and recommends test plans and also performs function call logging, API interception, stress testing, code coverage analysis and many other application security assurance functions. * Proofdock's Chaos Engineering Platform has a focus on the

Microsoft Azure Microsoft Azure, often referred to as Azure ( , ), is a cloud computing platform operated by Microsoft for application management via around the world-distributed data centers. Microsoft Azure has multiple capabilities such as software as a ...

cloud platform. It injects failures at the infrastructure level, platform level and application level. * Gremlin is a "Failure-as-a-Service" platform that helps companies build more

resilient Resilience, resilient, resiliency, or ''variation'', may refer to: Science Ecology * Ecological resilience, the capacity of an ecosystem to recover from perturbations ** Climate resilience, the ability of systems to recover from climate change * ...

systems through the practice of chaos engineering. Gremlin recreates the most common failures across three categories --

Resource Resource refers to all the materials available in our environment which are technologically accessible, economically feasible and culturally sustainable and help us to satisfy our needs and wants. Resources can broadly be classified upon their ...

Network Network, networking and networked may refer to: Science and technology * Network theory, the study of graphs as a representation of relations between discrete objects * Network science, an academic field that studies complex networks Mathematics ...

, and

State State may refer to: Arts, entertainment, and media Literature * ''State Magazine'', a monthly magazine published by the U.S. Department of State * ''The State'' (newspaper), a daily newspaper in Columbia, South Carolina, United States * ''Our S ...

—by safely injecting failure into systems in order to proactively identify and fix unknown faults. *

Codenomicon Synopsys is an American electronic design automation (EDA) company that focuses on silicon design and verification, silicon intellectual property and software security and quality. Products include tools for logic synthesis and physical desig ...

Defensics is a black-box test automation framework that does fault injection to more than 150 different interfaces including network protocols, API interfaces, files, and XML structures. The commercial product was launched in 2001, after five years of research at University of Oulu in the area of software fault injection. A thesis work explaining the used fuzzing principles was published by VTT, one of the PROTOS consortium members. * The Mu Service Analyzer is a commercial service testing tool developed by

Mu Dynamics Spirent Communications plc is a British multinational telecommunications testing company headquartered in Crawley, West Sussex, in the United Kingdom. It is listed on the London Stock Exchange and is a constituent of the FTSE 250 Index. Histor ...

. The Mu Service Analyzer performs

and white box testing of services based on their exposed software interfaces, using denial-of-service simulations, service-level traffic variations (to generate invalid inputs) and the replay of known vulnerability triggers. All these techniques exercise input validation and error handling and are used in conjunction with valid protocol monitors and SNMP to characterize the effects of the test traffic on the software system. The Mu Service Analyzer allows users to establish and track system-level reliability, availability and security metrics for any exposed protocol implementation. The tool has been available in the market since 2005 by customers in North America, Asia and Europe, especially in the critical markets of network operators (and their vendors) and

Industrial control systems An industrial control system (ICS) is an electronic control system and associated instrumentation used for industrial process control. Control systems can range in size from a few modular panel-mounted controllers to large interconnected and i ...

(including

Critical infrastructure Critical infrastructure (or critical national infrastructure (CNI) in the UK) is a term used by governments to describe assets that are essential for the functioning of a society and economy – the infrastructure. Most commonly associated wi ...

). * Xception is a commercial software tool developed by Critical Software SA used for

and white box testing based on software fault injection (SWIFI) and Scan Chain fault injection (SCIFI). Xception allows users to test the robustness of their systems or just part of them, allowing both Software fault injection and Hardware fault injection for a specific set of architectures. The tool has been used in the market since 1999 and has customers in the American, Asian and European markets, especially in the critical market of aerospace and the telecom market. The full Xception product family includes: a) The main Xception tool, a state-of-the-art leader in Software Implemented Fault Injection (SWIFI) technology; b) The Easy Fault Definition (EFD) and Xtract (Xception Analysis Tool) add-on tools; c) The extended Xception tool (eXception), with the fault injection extensions for Scan Chain and pin-level forcing.

Libraries

libfiu
(Fault injection in userspace), C library to simulate faults in POSIX routines without modifying the source code. An API is included to simulate arbitrary faults at run-time at any point of the program.
TestApi
is a shared-source API library, which provides facilities for fault injection testing as well as other testing types, data-structures and algorithms for .NET applications.
Fuzzino
is an open source library, which provides a rich set of fuzzing heuristics that are generated from a type specification and/or valid values.
krf
is an open source Linux kernel module which provides a configurable facility to probabilistically return failure values for system calls. Explained further in th
blog post

nlfaultinjection
is designed to provide a simple, portable fault injection framework capable of running on just about any system, no matter how constrained and depends only on the C Standard Library.

Fault injection in functional properties or test cases

In contrast to traditional mutation testing where mutant faults are generated and injected into the code description of the model, application of a series of newly defined mutation operators directly to the model properties rather than to the model code has also been investigated. Mutant properties that are generated from the initial properties (or test cases) and validated by the model checker should be considered as new properties that have been missed during the initial verification procedure. Therefore, adding these newly identified properties to the existing list of properties improves the coverage metric of the formal verification and consequently lead to a more reliable design.

Application of fault injection

Fault injection can take many forms. In the testing of

operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...

s for example, fault injection is often performed by a ''driver'' (

kernel Kernel may refer to: Computing * Kernel (operating system), the central component of most operating systems * Kernel (image processing), a matrix used for image convolution * Compute kernel, in GPGPU programming * Kernel method, in machine learnin ...

-mode software) that intercepts ''system calls'' (calls into the kernel) and randomly returning a failure for some of the calls. This type of fault injection is useful for testing low-level user-mode software. For higher level software, various methods inject faults. In

managed code Managed code is computer program code that requires and will execute only under the management of a Common Language Infrastructure (CLI); Virtual Execution System (VES); virtual machine, e.g. .NET, CoreFX, or .NET Framework; Common Language Runt ...

, it is common to use

instrumentation Instrumentation a collective term for measuring instruments that are used for indicating, measuring and recording physical quantities. The term has its origins in the art and science of scientific instrument-making. Instrumentation can refer to ...

. Although fault injection can be undertaken by hand, a number of fault injection tools exist to automate the process of fault injection.N. Looker, M. Munro, and J. Xu, "Simulating Errors in Web Services," International Journal of Simulation Systems, Science & Technology, vol. 5, 2004. Depending on the complexity of the

API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...

for the level where faults are injected, fault injection tests often must be carefully designed to minimize the number of false positives. Even a well designed fault injection test can sometimes produce situations that are impossible in the normal operation of the software. For example, imagine there are two API functions, Commit and PrepareForCommit, such that alone, each of these functions can possibly fail, but if PrepareForCommit is called and succeeds, a subsequent call to Commit is guaranteed to succeed. Now consider the following code: error = PrepareForCommit(); if (error

SUCCESS) Often, it will be infeasible for the fault injection implementation to keep track of enough state to make the guarantee that the API functions make. In this example, a fault injection test of the above code might hit the

assert Assertion or assert may refer to: Computing * Assertion (software development), a computer programming technique * assert.h, a header file in the standard library of the C programming language * Assertion definition language, a specification lang ...

, whereas this would never happen in normal operation. Fault-injection can be used at testing time, during the execution of test cases. For example, the short-circuit testing algorithm injects exceptions during test suite execution so as to simulate unanticipated errors. This algorithm collects data for verifying two resilience properties.

References

{{Reflist

External links

Certitude Software from Certess Inc.

How DoorDash utilizes fault injection testing to improve reliability
Software testing