In a
distributed system
A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. Distributed computing is a field of computer sci ...
, failure transparency refers to the extent to which errors and subsequent recoveries of
hosts
A host is a person responsible for guests at an event or for providing hospitality during it.
Host may also refer to:
Places
*Host, Pennsylvania, a village in Berks County
People
*Jim Host (born 1937), American businessman
*Michel Host ( ...
and
services
Service may refer to:
Activities
* Administrative service, a required part of the workload of university faculty
* Civil service, the body of employees of a government
* Community service, volunteer service for the benefit of a community or a p ...
within the system are invisible to users and
applications
Application may refer to:
Mathematics and computing
* Application software, computer software designed to help the user to perform specific tasks
** Application layer, an abstraction layer that specifies protocols and interface methods used in a c ...
.
For example, if a server fails, but users are automatically redirected to another server and never notice the failure, the system is said to exhibit ''high failure transparency''.
Failure transparency is one of the most difficult types of transparency to achieve since it is often difficult to determine whether a server has actually failed, or whether it is simply responding very slowly.
[Tanenbaum, Andrew S. and Maarten van Steen, Distributed Systems: Principles and Paradigms, Prentice Hall, Second Edition, 2007. ] Additionally, it is generally impossible to achieve full failure transparency in a distributed system since networks are unreliable.
There is also usually a trade-off between achieving a high level of failure transparency and maintaining an adequate level of system performance. For example, if a distributed system attempts to mask a transient server failure by having the client try to contact the failed server multiple times, performance of the system may be negatively affected. In this case, it would have been preferable to have given up earlier and tried another server.
References
{{Reflist
See also
*
Byzantine fault tolerance
A Byzantine fault (also Byzantine generals problem, interactive consistency, source congruency, error avalanche, Byzantine agreement problem, and Byzantine failure) is a condition of a computer system, particularly distributed computing systems, ...
*
Intrusion Tolerance {{Primary sources, date=December 2013
Intrusion tolerance is a fault-tolerant design approach to defending information systems against malicious attacks. In that sense, it is also a computer security approach. Abandoning the conventional aim of prev ...
*
Capillary routing
Multipath routing is a routing technique simultaneously using multiple alternative paths through a network. This can yield a variety of benefits such as fault tolerance, increased bandwidth, and improved security.
Mobile networks
To improve p ...
*
Cluster (computing)
A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.
The compo ...
*
Data redundancy In computer main memory, auxiliary storage and computer buses, data redundancy is the existence of data that is additional to the actual data and permits correction of errors in stored or transmitted data. The additional data can simply be a compl ...
*
Elegant degradation Elegant degradation is a term used in engineering to describe what occurs to machines which are subject to constant, repetitive stress.
Externally, such a machine maintains the same appearance to the user, appearing to function properly. Internally ...
*
Error detection and correction
In information theory and coding theory with applications in computer science and telecommunication, error detection and correction (EDAC) or error control are techniques that enable reliable delivery of digital data over unreliable communi ...
*
Fail-safe
In engineering, a fail-safe is a design feature or practice that in the event of a specific type of failure, inherently responds in a way that will cause minimal or no harm to other equipment, to the environment or to people. Unlike inherent safe ...
*
Fault-tolerant design
Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of one or more faults within some of its components. If its operating quality decreases at all, the decrease is proportional to the ...
*
Fault-tolerant system
Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of one or more faults within some of its components. If its operating quality decreases at all, the decrease is proportional to the ...
*
Progressive Enhancement
Progressive enhancement is a strategy in web design that puts emphasis on web content first, allowing everyone to access the basic content and functionality of a web page, whilst users with additional browser features or faster Internet access ...
*
Separation of protection and security
In computer sciences, the separation of protection and security is a design choice. Wulf et al. identified protection as a mechanism and security as a policy,Wulf 74 pp.337-345 therefore making the protection-security distinction a particular cas ...
*
Transparency (computing)
Distributed computing