Failure Transparency
   HOME
*





Failure Transparency
In a distributed system, failure transparency refers to the extent to which errors and subsequent recoveries of hosts and services within the system are invisible to users and applications. For example, if a server fails, but users are automatically redirected to another server and never notice the failure, the system is said to exhibit ''high failure transparency''. Failure transparency is one of the most difficult types of transparency to achieve since it is often difficult to determine whether a server has actually failed, or whether it is simply responding very slowly.Tanenbaum, Andrew S. and Maarten van Steen, Distributed Systems: Principles and Paradigms, Prentice Hall, Second Edition, 2007. Additionally, it is generally impossible to achieve full failure transparency in a distributed system since networks are unreliable. There is also usually a trade-off between achieving a high level of failure transparency and maintaining an adequate level of system performance. For ex ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Distributed System
A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. Distributed computing is a field of computer science that studies distributed systems. The components of a distributed system interact with one another in order to achieve a common goal. Three significant challenges of distributed systems are: maintaining concurrency of components, overcoming the lack of a global clock, and managing the independent failure of components. When a component of one system fails, the entire system does not fail. Examples of distributed systems vary from SOA-based systems to massively multiplayer online games to peer-to-peer applications. A computer program that runs within a distributed system is called a distributed program, and ''distributed programming'' is the process of writing such programs. There are many different types of implementations for t ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Data Redundancy
In computer main memory, auxiliary storage and computer buses, data redundancy is the existence of data that is additional to the actual data and permits correction of errors in stored or transmitted data. The additional data can simply be a complete copy of the actual data (a type of repetition code), or only select pieces of data that allow detection of errors and reconstruction of lost or damaged data up to a certain level. For example, by including additional data checksums, ECC memory is capable of detecting and correcting single-bit errors within each memory word, while RAID 1 combines two hard disk drives (HDDs) into a logical storage unit that allows stored data to survive a complete failure of one drive. Data redundancy can also be used as a measure against silent data corruption; for example, file systems such as Btrfs and ZFS use data and metadata checksumming in combination with copies of stored data to detect silent data corruption and repair its effects. In d ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Separation Of Protection And Security
In computer sciences, the separation of protection and security is a design choice. Wulf et al. identified protection as a mechanism and security as a policy,Wulf 74 pp.337-345 therefore making the protection-security distinction a particular case of the separation of mechanism and policy principle. Many frameworks consider both as security controls of varying types. For example, protection mechanisms would be considered technical controls, while a policy would be considered an administrative control. Overview The adoption of this distinction in a computer architecture usually means that protection is provided as a fault tolerance mechanism by hardware/firmware and kernel, whereas the operating system and applications implement their security policies. In this design, security policies rely therefore on the protection mechanisms and on additional cryptography techniques. The major hardware approachSwift 2005 p.26 for security or protection is the use of hierarchical protection ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Progressive Enhancement
Progressive enhancement is a strategy in web design that puts emphasis on web content first, allowing everyone to access the basic content and functionality of a web page, whilst users with additional browser features or faster Internet access receive the enhanced version instead. Additionally, it speeds up loading and facilitates crawling by web search engines, as pages' text is loaded immediately through the HTML source code rather than having to wait for JavaScript to initiate and load the content subsequently, meaning content ready for consumption "out of the box" is served imminently, not behind additional layers. This strategy involves separating the presentation semantics from the content, with presentation being implemented in one or more optional layers, activated based on aspects of the browser or Internet connection of the client. In practice, this means serving content through HTML, the "lowest common denominator" of web standards, and applying styling and animation ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Fault-tolerant Design
Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of one or more faults within some of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system, in which even a small failure can cause total breakdown. Fault tolerance is particularly sought after in high-availability, mission-critical, or even life-critical systems. The ability of maintaining functionality when portions of a system break down is referred to as graceful degradation. A fault-tolerant design enables a system to continue its intended operation, possibly at a reduced level, rather than failing completely, when some part of the system fails. The term is most commonly used to describe computer systems designed to continue more or less fully operational with, perhaps, a reduction in throughput or an increase in response time in the event of some partial f ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Fail-safe
In engineering, a fail-safe is a design feature or practice that in the event of a specific type of failure, inherently responds in a way that will cause minimal or no harm to other equipment, to the environment or to people. Unlike inherent safety to a particular hazard, a system being "fail-safe" does not mean that failure is impossible or improbable, but rather that the system's design prevents or mitigates unsafe consequences of the system's failure. That is, if and when a "fail-safe" system fails, it remains at least as safe as it was before the failure. Since many types of failure are possible, failure mode and effects analysis is used to examine failure situations and recommend safety design and procedures. Some systems can never be made fail-safe, as continuous availability is needed. Redundancy, fault tolerance, or contingency plans are used for these situations (e.g. multiple independently controlled and fuel-fed engines). Examples Mechanical or physical Examples inc ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Error Detection And Correction
In information theory and coding theory with applications in computer science and telecommunication, error detection and correction (EDAC) or error control are techniques that enable reliable delivery of digital data over unreliable communication channels. Many communication channels are subject to channel noise, and thus errors may be introduced during transmission from the source to a receiver. Error detection techniques allow detecting such errors, while error correction enables reconstruction of the original data in many cases. Definitions ''Error detection'' is the detection of errors caused by noise or other impairments during transmission from the transmitter to the receiver. ''Error correction'' is the detection of errors and reconstruction of the original, error-free data. History In classical antiquity, copyists of the Hebrew Bible were paid for their work according to the number of stichs (lines of verse). As the prose books of the Bible were hardly ever ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  




Elegant Degradation
Elegant degradation is a term used in engineering to describe what occurs to machines which are subject to constant, repetitive stress. Externally, such a machine maintains the same appearance to the user, appearing to function properly. Internally, the machine slowly weakens over time. Eventually, unable to withstand the stress, it eventually breaks down. Compared to graceful degradation, the operational quality does not decrease at all, but the breakdown may be just as sudden. This term's meaning varies depending on context and field, and may not be strictly considered exclusive to engineering. For instance, this is used as a mechanism in the food industry as applied in the degradation of lignin, cellulose, pentosan, and polymers, among others. The concept is also used to extract chemicals such as the elegant degradation of ''Paederus fuscipes'' to obtain pederin and hemiacetal pseuodopederin. In this process degradation is induced by heat. A play with the same name also used it a ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Cluster (computing)
A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software. The components of a cluster are usually connected to each other through fast local area networks, with each node (computer used as a server) running its own instance of an operating system. In most circumstances, all of the nodes use the same hardware and the same operating system, although in some setups (e.g. using Open Source Cluster Application Resources (OSCAR)), different operating systems can be used on each computer, or different hardware. Clusters are usually deployed to improve performance and availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability. Computer clusters emerged as a result of convergence of a number of computing trends including th ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


picture info

Server (computing)
In computing, a server is a piece of computer hardware or software (computer program) that provides functionality for other programs or devices, called " clients". This architecture is called the client–server model. Servers can provide various functionalities, often called "services", such as sharing data or resources among multiple clients, or performing computation for a client. A single server can serve multiple clients, and a single client can use multiple servers. A client process may run on the same device or may connect over a network to a server on a different device. Typical servers are database servers, file servers, mail servers, print servers, web servers, game servers, and application servers. Client–server systems are usually most frequently implemented by (and often identified with) the request–response model: a client sends a request to the server, which performs some action and sends a response back to the client, typically with a result or acknowledg ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Capillary Routing
Multipath routing is a routing technique simultaneously using multiple alternative paths through a network. This can yield a variety of benefits such as fault tolerance, increased bandwidth, and improved security. Mobile networks To improve performance or fault tolerance, concurrent multipath routing (CMR) is often taken to mean simultaneous management and utilization of multiple available paths for the transmission of streams of data. The streams may be emanating from a single application or multiple applications. A stream is assigned a separate path, as uniquely possible given the number of paths available. If there are more streams than available paths, some streams will share paths. CMR provides better utilization of bandwidth by creating multiple transmission queues. It provides a degree of fault tolerance in that should a path fail, only the traffic assigned to that path is affected. There is also, ideally, an alternative path immediately available upon which to continue o ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]  


Intrusion Tolerance
{{Primary sources, date=December 2013 Intrusion tolerance is a fault-tolerant design approach to defending information systems against malicious attacks. In that sense, it is also a computer security approach. Abandoning the conventional aim of preventing all intrusions, intrusion tolerance instead calls for triggering mechanisms that prevent intrusions from leading to a system security failure. There are two major variants of intrusion tolerance mechanisms: mechanisms based on redundancy (e.g., as in Byzantine fault tolerance); mechanisms based on intrusion detection (e.g., with an intrusion detection system An intrusion detection system (IDS; also intrusion prevention system or IPS) is a device or software application that monitors a network or systems for malicious activity or policy violations. Any intrusion activity or violation is typically rep ...) and reaction. See also * Byzantine fault tolerance External links *Paulo Veríssimo, Nuno Ferreira Neves, Miguel Pupo Correi ...
[...More Info...]      
[...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]