Software Fault Tolerance
Software fault tolerance is the ability of computer software to continue its normal operation despite the presence of system or hardware faults. Fault-tolerant software is able to satisfy its requirements despite failures.

Introduction
The only thing constant is change, and this is certainly true of software systems more than of almost any other phenomenon. Because not all software changes in the same way, software fault tolerance methods are designed to overcome execution errors by modifying variable values to create an acceptable program state (Ray Giguette and Johnette Hassell, "Toward a Resourceful Method of Software Fault Tolerance", ACM Southeast Regional Conference, April 1999). The need to control software faults is one of the fastest-growing challenges facing the software industry today, and fault tolerance must be a key consideration from the early stages of software development. Several mechanisms for software fault tolerance exist, among them:
* Recovery blocks (sketched below)
* N-version software
* ...
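To make the recovery-block idea concrete, the following sketch (Python used purely for illustration; all names are hypothetical) runs a primary routine first, checks its output with an acceptance test, and falls back to an alternate routine from a saved checkpoint when the test fails.

    # Minimal recovery-block sketch (illustrative only; names are hypothetical).
    # A primary algorithm is tried first; if its result fails the acceptance
    # test, state is restored from a checkpoint and an alternate is tried.
    import copy

    def recovery_block(state, alternates, acceptance_test):
        checkpoint = copy.deepcopy(state)       # save state before trying anything
        for attempt in alternates:              # primary first, then alternates
            try:
                result = attempt(copy.deepcopy(checkpoint))  # each attempt starts from the checkpoint
            except Exception:
                continue                        # a crash counts as a failed attempt
            if acceptance_test(result):
                return result                   # first acceptable result wins
        raise RuntimeError("all alternates failed the acceptance test")

    # Example: the primary sort is (deliberately) buggy on short lists;
    # the alternate is assumed correct.
    def primary_sort(xs):
        return xs if len(xs) <= 3 else sorted(xs)

    def alternate_sort(xs):
        return sorted(xs)

    def is_sorted(xs):
        return all(a <= b for a, b in zip(xs, xs[1:]))

    print(recovery_block([3, 1, 2], [primary_sort, alternate_sort], is_sorted))  # [1, 2, 3]

The acceptance test is what distinguishes a recovery block from a simple retry: a result is trusted only if it passes an independent check.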



Computer Software
Software is a set of computer programs and associated documentation and data. This is in contrast to hardware, from which the system is built and which actually performs the work. At the lowest programming level, executable code consists of machine-language instructions supported by an individual processor, typically a central processing unit (CPU) or a graphics processing unit (GPU). Machine language consists of groups of binary values signifying processor instructions that change the state of the computer from its preceding state. For example, an instruction may change the value stored in a particular storage location in the computer, an effect that is not directly observable to the user. An instruction may also invoke one of many input or output operations, for example displaying some text on a computer screen, causing state changes that should be visible to the user. The processor executes the instructions in the order they are provided, unless it is instructed ...



Redundancy (engineering)
In engineering, redundancy is the intentional duplication of critical components or functions of a system with the goal of increasing the reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance, such as in the case of GNSS receivers or multi-threaded computer processing. In many safety-critical systems, such as fly-by-wire and hydraulic systems in aircraft, some parts of the control system may be triplicated, which is formally termed triple modular redundancy (TMR). An error in one component may then be out-voted by the other two. In a triply redundant system, the system has three subcomponents, all three of which must fail before the system fails. Since each one rarely fails, and the subcomponents are expected to fail independently, the probability of all three failing is calculated to be extraordinarily small; it is often outweighed by other risk factors, such as human error. Redundancy may also be known by the terms "m ...
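As a rough illustration of the voting form of triple modular redundancy, the sketch below (hypothetical values, Python for illustration only) out-votes a single faulty replica and shows why the probability of the voted system failing is far smaller than that of a single replica when failures are independent.

    # Illustrative TMR sketch: three replicas compute the same value and a
    # majority voter masks a single faulty result.
    from collections import Counter

    def tmr_vote(results):
        value, count = Counter(results).most_common(1)[0]
        if count < 2:
            raise RuntimeError("no majority: replicas disagree pairwise")
        return value

    print(tmr_vote([42, 42, 17]))   # 42 -- the faulty replica is out-voted

    # With an assumed independent failure probability p per replica, the voted
    # system fails only when at least two replicas fail together:
    p = 1e-3
    p_tmr = 3 * p**2 * (1 - p) + p**3
    print(p_tmr)                    # ~3e-06, far below the single-replica 1e-03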


Software Quality
In the context of software engineering, software quality refers to two related but distinct notions:
* Software functional quality reflects how well the software complies with or conforms to a given design, based on functional requirements or specifications. That attribute can also be described as the fitness for purpose of a piece of software, or how it compares to competitors in the marketplace as a worthwhile product. It is the degree to which the correct software was produced.
* Software structural quality refers to how the software meets non-functional requirements that support the delivery of the functional requirements, such as robustness or maintainability. It has much more to do with the degree to which the software works as needed.
Many aspects of structural quality can be evaluated only statically, through the analysis of the software's inner structure, its source code (see Software metrics), at the unit level and system level (sometimes referred to as end-to-end testing), which is in effect ...



OpenSAF
OpenSAF (commonly styled SAF, the Service Availability Framework) is an open-source service-orchestration system for automating computer application deployment, scaling, and management. OpenSAF is consistent with, and expands upon, Service Availability Forum (SAF) and SCOPE Alliance standards. It was originally designed by Motorola ECC, and is maintained by the OpenSAF Project. OpenSAF is the most complete implementation of the SAF AIS specifications, providing a platform for automating deployment, scaling, and operations of application services across clusters of hosts. It works across a range of virtualization tools and runs services in a cluster, often integrating with JVM, Vagrant, and/or Docker runtimes. OpenSAF originally interfaced with standard C Application Programming Interfaces (APIs), but has added Java and Python bindings. OpenSAF is focused on Service Availability beyond High Availability (HA) requirements. While little formal research is published to improve ...



Safety Engineering
Safety engineering is an engineering discipline which assures that engineered systems provide acceptable levels of safety. It is strongly related to industrial engineering and systems engineering, and to their subset, system safety engineering. Safety engineering assures that a life-critical system behaves as needed, even when components fail.

Analysis techniques
Analysis techniques can be split into two categories: qualitative and quantitative methods. Both approaches share the goal of finding causal dependencies between a hazard at the system level and failures of individual components. Qualitative approaches focus on the question "What must go wrong, such that a system hazard may occur?", while quantitative methods aim at providing estimations of probabilities, rates, and/or severity of consequences. Countermeasures such as improvements in design and materials, planned inspections, fool-proof design, and backup redundancy decrease risk but increase cost. ...
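As a toy illustration of the quantitative side, the sketch below propagates assumed, purely hypothetical failure probabilities of independent components through the AND/OR gates of a small fault tree to estimate the probability of a top-level hazard.

    # Toy fault-tree calculation with hypothetical, independent failure probabilities.
    def p_and(*ps):                      # event occurs only if every input event occurs
        out = 1.0
        for p in ps:
            out *= p
        return out

    def p_or(*ps):                       # event occurs if any input event occurs
        out = 1.0
        for p in ps:
            out *= (1.0 - p)
        return 1.0 - out

    pump_fails, valve_sticks, sensor_fails, operator_misses = 1e-3, 5e-4, 1e-2, 1e-1

    # Top event: loss of cooling AND failure to detect it (all numbers are made up).
    loss_of_cooling = p_or(pump_fails, valve_sticks)
    detection_fails = p_and(sensor_fails, operator_misses)
    top_event = p_and(loss_of_cooling, detection_fails)
    print(f"estimated hazard probability: {top_event:.2e}")   # ~1.5e-06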


N-version Programming
N-version programming (NVP), also known as multiversion programming or multiple-version dissimilar software, is a method or process in software engineering in which multiple functionally equivalent programs are independently generated from the same initial specifications (Liming Chen and A. Avizienis, "N-Version Programming: A Fault-Tolerance Approach to Reliability of Software Operation", in Fault-Tolerant Computing: Highlights from Twenty-Five Years, Twenty-Fifth International Symposium, 27-30 June 1995, pp. 113-). The concept of N-version programming was introduced in 1977 by Liming Chen and Algirdas Avizienis with the central conjecture that the "independence of programming efforts will greatly reduce the probability of ...
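To picture the process, the sketch below (hypothetical functions, Python for illustration) runs three independently written "versions" of one simple specification, computing the median of a list, and accepts the output that a majority of the versions agree on; one version is deliberately faulty and is out-voted.

    # Illustrative N-version sketch: three independently written "versions" of one
    # specification (median of a list) are executed and their outputs are voted on.
    import statistics
    from collections import Counter

    def version_a(xs):                       # sort-based implementation
        s = sorted(xs)
        n = len(s)
        return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

    def version_b(xs):                       # library-based implementation
        return statistics.median(xs)

    def version_c(xs):                       # (deliberately) faulty implementation
        return sum(xs) / len(xs)             # computes the mean, not the median

    def n_version(xs, versions):
        outputs = [v(xs) for v in versions]
        value, votes = Counter(outputs).most_common(1)[0]
        if votes <= len(versions) // 2:
            raise RuntimeError("no majority agreement among versions")
        return value

    print(n_version([7, 1, 4, 9, 3], [version_a, version_b, version_c]))  # 4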


Logic Built-in Self-test
Logic built-in self-test (or LBIST) is a form of built-in self-test (BIST) in which hardware and/or software is built into integrated circuits allowing them to test their own operation, as opposed to reliance on external automated test equipment.

Advantages
The main advantage of LBIST is the ability to test internal circuits having no direct connections to external pins, and thus unreachable by external automated test equipment. Another advantage is the ability to trigger the LBIST of an integrated circuit while running a built-in self-test or power-on self-test of the finished product.

Disadvantages
LBIST that requires additional circuitry (or read-only memory) increases the cost of the integrated circuit. LBIST that only requires temporary changes to programmable logic or rewritable memory avoids this extra cost, but requires more time to first program in the BIST and then to remove it and program in the final configuration. Another disadvantage of LBIST is the possibility that ...
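As a rough software analogy of how LBIST typically works (this is an illustrative model, not hardware or any specific vendor's design), the sketch below uses a linear-feedback shift register to generate pseudo-random test patterns, applies them to a stand-in for the logic under test, and compacts the responses into a signature that is compared against a reference captured from a known-good unit.

    # Software analogy of LBIST: an LFSR generates pseudo-random test patterns and
    # the responses of the unit under test are compacted into a signature that is
    # compared against a reference from a known-good unit.
    def lfsr_patterns(seed, taps=(16, 14, 13, 11), width=16, count=32):
        state = seed
        for _ in range(count):
            yield state
            feedback = 0
            for t in taps:                   # XOR of the tap bits (Fibonacci-style LFSR)
                feedback ^= (state >> (t - 1)) & 1
            state = ((state << 1) | feedback) & ((1 << width) - 1)

    def unit_under_test(pattern):            # stand-in for the logic being tested
        return (pattern ^ 0xA5A5) & 0xFFFF

    def signature(responses, width=16):
        sig = 0
        for r in responses:                  # simple rotating-XOR response compactor
            sig = (((sig << 1) | (sig >> (width - 1))) & ((1 << width) - 1)) ^ r
        return sig

    GOLDEN = signature(unit_under_test(p) for p in lfsr_patterns(seed=0xACE1))

    def self_test():
        return signature(unit_under_test(p) for p in lfsr_patterns(seed=0xACE1)) == GOLDEN

    print("self-test passed" if self_test() else "self-test FAILED")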



Fault-tolerant System
Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of one or more of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, whereas in a naively designed system even a small failure can cause total breakdown. Fault tolerance is particularly sought after in high-availability, mission-critical, or even life-critical systems. The ability to maintain functionality when portions of a system break down is referred to as graceful degradation. A fault-tolerant design enables a system to continue its intended operation, possibly at a reduced level, rather than failing completely, when some part of the system fails. The term is most commonly used to describe computer systems designed to remain more or less fully operational with, perhaps, a reduction in throughput or an increase in response time in the event of some partial f ...
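A small illustration of graceful degradation (hypothetical service and cache names, for illustration only): a request is answered at full quality from a primary backend when possible, and from a possibly stale cache, at reduced quality, when the backend fails.

    # Graceful-degradation sketch: prefer the fresh answer, degrade to a cached one.
    import random, time

    _cache = {}

    def primary_backend(key):
        if random.random() < 0.3:                # simulate an intermittent partial failure
            raise TimeoutError("backend unavailable")
        return f"fresh value for {key}"

    def lookup(key):
        try:
            value = primary_backend(key)
            _cache[key] = (value, time.time())   # refresh the cache on every success
            return value, "full quality"
        except TimeoutError:
            if key in _cache:
                value, _ = _cache[key]
                return value, "degraded (served from cache)"
            raise                                # no fallback available: surface the fault

    for _ in range(5):
        try:
            print(lookup("user:42"))
        except TimeoutError as e:
            print("request failed:", e)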






Built-in Test Equipment
Built-in test equipment (BITE) for avionics primarily refers to passive fault management and diagnosis equipment built into airborne systems to support maintenance processes. Built-in test equipment includes multimeters, oscilloscopes, discharge probes, and frequency generators that are provided as part of the system to enable testing and perform diagnostics. The acronym BIT is often used for this same function or, more specifically, in reference to the individual tests. BIT often includes three steps (sketched in software below):
* detection of the fault;
* accommodation of the fault (how the system actively responds to the fault);
* annunciation or logging of the fault, to warn of possible effects and/or aid in troubleshooting the faulty equipment.

Functionality
* Analysis of failure monitoring results
* Reporting and memorization of failures
* Management of tests

See also
* Built-in self-test
* Logic built-in self-test
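Returning to the detect / accommodate / annunciate steps listed above, a minimal software rendering might look like the following (hypothetical sensor functions, for illustration only).

    # Illustrative BIT-style flow: detect a fault, accommodate it, and annunciate it.
    import logging

    logging.basicConfig(level=logging.WARNING)
    log = logging.getLogger("bit")

    def read_primary_sensor():
        return float("nan")                     # simulate a failed sensor reading

    def read_backup_sensor():
        return 21.5

    def read_temperature():
        value = read_primary_sensor()
        if value != value:                      # NaN check: fault detected
            log.warning("primary temperature sensor failed; using backup")  # annunciation
            value = read_backup_sensor()        # accommodation: switch to the backup source
        return value

    print(read_temperature())                   # 21.5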


Built-in Self-test
A built-in self-test (BIST) or built-in test (BIT) is a mechanism that permits a machine to test itself. Engineers design BISTs to meet requirements such as:
* high reliability
* lower repair cycle times
or constraints such as:
* limited technician accessibility
* cost of testing during manufacture
The main purpose of BIST is to reduce complexity and thereby decrease the cost of, and reliance upon, external (pattern-programmed) test equipment. BIST reduces cost in two ways:
1. it reduces test-cycle duration;
2. it reduces the complexity of the test/probe setup, by reducing the number of I/O signals that must be driven/examined under tester control.
Both lead to a reduction in hourly charges for automated test equipment (ATE) service.

Applications
BIST is commonly placed in weapons, avionics, medical devices, automotive electronics, complex machinery of all types, unattended machinery of all types, and integrated circuits. In automotive applications, a vehicle's electronics test themselves to enhance safety ...
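A minimal power-on self-test in software might look like the sketch below (hypothetical checks and constants, for illustration): it verifies a firmware checksum and exercises a region of scratch memory before normal operation is allowed to begin.

    # Minimal power-on self-test sketch: verify firmware integrity and scratch memory
    # before allowing normal operation to start.
    import hashlib

    FIRMWARE_IMAGE = b"\x90" * 1024                     # stand-in for the real image
    EXPECTED_DIGEST = hashlib.sha256(FIRMWARE_IMAGE).hexdigest()

    def firmware_check():
        return hashlib.sha256(FIRMWARE_IMAGE).hexdigest() == EXPECTED_DIGEST

    def memory_check(size=256):
        scratch = bytearray(size)
        for pattern in (0x55, 0xAA):                    # alternating-bit test patterns
            for i in range(size):
                scratch[i] = pattern
            if any(b != pattern for b in scratch):
                return False
        return True

    def power_on_self_test():
        results = {"firmware": firmware_check(), "memory": memory_check()}
        return all(results.values()), results

    ok, results = power_on_self_test()
    print("BIST passed" if ok else f"BIST FAILED: {results}")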



Backup
In information technology, a backup, or data backup, is a copy of computer data taken and stored elsewhere so that it may be used to restore the original after a data-loss event. The verb form, referring to the process of doing so, is "back up", whereas the noun and adjective form is "backup". Backups can be used to recover data after its loss from deletion or corruption, or to recover data from an earlier time. Backups provide a simple form of disaster recovery; however, not all backup systems are able to reconstitute a computer system or other complex configuration such as a computer cluster, Active Directory server, or database server. A backup system contains at least one copy of all data considered worth saving. The data storage requirements can be large, and an information repository model may be used to provide structure to this storage. There are different types of data storage devices used for copying backups of data that is already in secondary storage onto archive fi ...
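A very small illustration of the back-up and restore cycle (hypothetical file paths, standard-library Python only): copy a file to a backup location, verify the copy with a checksum, and restore it after a simulated data-loss event.

    # Tiny backup-and-restore sketch with checksum verification (paths are hypothetical).
    import hashlib, shutil, tempfile
    from pathlib import Path

    def checksum(path):
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()

    workdir = Path(tempfile.mkdtemp())
    original = workdir / "data.txt"
    backup = workdir / "data.txt.bak"

    original.write_text("important records\n")
    shutil.copy2(original, backup)                      # take the backup
    assert checksum(original) == checksum(backup)       # verify it is a faithful copy

    original.unlink()                                   # simulate a data-loss event
    shutil.copy2(backup, original)                      # restore from the backup
    print(original.read_text(), end="")                 # important records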