Failover

	Failover Failover is switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network in a computer network. Failover and switchover are essentially the same operation, except that failover is automatic and usually operates without warning, while switchover requires human intervention. Systems designers usually provide failover capability in servers, systems or networks requiring near-continuous availability and a high degree of reliability. At the server level, failover automation usually uses a " heartbeat" system that connects two servers, either through using a separate cable (for example, RS-232 serial ports/cable) or a network connection. As long as a regular "pulse" or "heartbeat" continues between the main server and the second server, the second server will not bring its systems online. There may also be a third "spare parts" se ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	High-availability Cluster High-availability clusters (also known as HA clusters, fail-over clusters) are groups of computers that support server applications that can be reliably utilized with a minimum amount of down-time. They operate by using high availability software to harness redundant computers in groups or clusters that provide continued service when system components fail. Without clustering, if a server running a particular application crashes, the application will be unavailable until the crashed server is fixed. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on it. For example, appropriate file systems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Load Balancing (computing) In computing, load balancing is the process of distributing a set of tasks over a set of resources (computing units), with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle. Load balancing is the subject of research in the field of parallel computers. Two main approaches exist: static algorithms, which do not take into account the state of the different machines, and dynamic algorithms, which are usually more general and more efficient but require exchanges of information between the different computing units, at the risk of a loss of efficiency. Problem overview A load-balancing algorithm always tries to answer a specific problem. Among other things, the nature of the tasks, the algorithmic complexity, the hardware architecture on which the algorithms will run as well as required error tolerance, must be taken into account. Therefore c ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Migration (virtualization) In the context of virtualization, where a ''guest'' simulation of an entire computer is actually merely a software virtual machine (VM) running on a ''host'' computer under a hypervisor, migration (also known as teleportation) is the process by which a ''running'' virtual machine is moved from one physical host to another, with little or no disruption in service. Subjective effects Ideally, the process is completely transparent, resulting in no disruption of service (or downtime). In practice, there is always some minor pause in availability, though it may be low enough that only hard real-time systems are affected. Virtualization is far more frequently used with network services and user applications, and these can generally tolerate the brief delays which may be involved. The perceived impact, if any, is similar to a longer-than-usual kernel delay. Objective effects The actual process is heavily dependent on the particular virtualization package in use, but in general, the pr ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Teleportation (virtualization) In the context of virtualization, where a ''guest'' simulation of an entire computer is actually merely a software virtual machine (VM) running on a ''host'' computer under a hypervisor, migration (also known as teleportation) is the process by which a ''running'' virtual machine is moved from one physical host to another, with little or no disruption in service. Subjective effects Ideally, the process is completely transparent, resulting in no disruption of service (or downtime). In practice, there is always some minor pause in availability, though it may be low enough that only hard real-time systems are affected. Virtualization is far more frequently used with network services and user applications, and these can generally tolerate the brief delays which may be involved. The perceived impact, if any, is similar to a longer-than-usual kernel delay. Objective effects The actual process is heavily dependent on the particular virtualization package in use, but in general, the p ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Log Shipping Log shipping is the process of automating the backup of transaction log files on a primary (production) database server, and then restoring them onto a standby server. This technique is supported by Microsoft SQL Server, How to Perform SQL Server Log Shipping , "What is Log Shipping". Retrieved on 2008-12-16. 4D Server, MySQL , and PostgreSQL . Similar to [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Switchover Switchover is the manual switch from one system to a redundant or standby computer server, system, or network upon the failure or abnormal termination of the previously active server, system, or network, or to perform system maintenance, such as installing patches, and upgrading software or hardware. Automatic switchover of a redundant system on an error condition, without human intervention, is called failover. Manual switchover on error would be used if automatic failover is not available, possibly because the overall system is too complex. See also * Safety engineering * Data integrity * Fault-tolerance * HA Clusters * Business continuity planning Business continuity may be defined as "the capability of an organization to continue the delivery of products or services at pre-defined acceptable levels following a disruptive incident", and business continuity planning (or business continuity a ... References Engineering failures Fault-tolerant computer systems {{c ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	High Availability High availability (HA) is a characteristic of a system which aims to ensure an agreed level of operational performance, usually uptime, for a higher than normal period. Modernization has resulted in an increased reliance on these systems. For example, hospitals and data centers require high availability of their systems to perform routine daily activities. Availability refers to the ability of the user community to obtain a service or good, access the system, whether to submit new work, update or alter existing work, or collect the results of previous work. If a user cannot access the system, it is – from the user's point of view – ''unavailable''. Generally, the term ''downtime'' is used to refer to periods when a system is unavailable. Principles There are three principles of systems design in reliability engineering which can help achieve high availability. # Elimination of single points of failure. This means adding or building redundancy into the system so that ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Fault-tolerance Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of one or more faults within some of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system, in which even a small failure can cause total breakdown. Fault tolerance is particularly sought after in high-availability, mission-critical, or even life-critical systems. The ability of maintaining functionality when portions of a system break down is referred to as graceful degradation. A fault-tolerant design enables a system to continue its intended operation, possibly at a reduced level, rather than failing completely, when some part of the system fails. The term is most commonly used to describe computer systems designed to continue more or less fully operational with, perhaps, a reduction in throughput or an increase in response time in the event of some partial ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Heartbeat (computing) In computer science, a heartbeat is a periodic signal generated by hardware or software to indicate normal operation or to synchronize other parts of a computer system. Heartbeat mechanism is one of the common techniques in mission critical systems for providing high availability and fault tolerance of network services by detecting the network or systems failures of nodes or daemons which belongs to a network cluster—administered by a master server—for the purpose of automatic adaptation and rebalancing of the system by using the remaining redundant nodes on the cluster to take over the load of failed nodes for providing constant services. Usually a heartbeat is sent between machines at a regular interval in the order of seconds; a heartbeat message. If the endpoint does not receive a heartbeat for a time—usually a few heartbeat intervals—the machine that should have sent the heartbeat is assumed to have failed. Heartbeat messages are typically sent non-stop on a perio ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Redundancy (engineering) In engineering, redundancy is the intentional duplication of critical components or functions of a system with the goal of increasing reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance, such as in the case of GNSS receivers, or multi-threaded computer processing. In many safety-critical systems, such as fly-by-wire and hydraulic systems in aircraft, some parts of the control system may be triplicated, which is formally termed triple modular redundancy (TMR). An error in one component may then be out-voted by the other two. In a triply redundant system, the system has three sub components, all three of which must fail before the system fails. Since each one rarely fails, and the sub components are expected to fail independently, the probability of all three failing is calculated to be extraordinarily small; it is often outweighed by other risk factors, such as human error. Redundancy may also be known by the terms "m ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Safety Engineering Safety engineering is an engineering discipline which assures that engineered systems provide acceptable levels of safety. It is strongly related to industrial engineering/systems engineering, and the subset system safety engineering. Safety engineering assures that a life-critical system behaves as needed, even when components fail. Analysis techniques Analysis techniques can be split into two categories: qualitative and quantitative methods. Both approaches share the goal of finding causal dependencies between a hazard on system level and failures of individual components. Qualitative approaches focus on the question "What must go wrong, such that a system hazard may occur?", while quantitative methods aim at providing estimations about probabilities, rates and/or severity of consequences. The complexity of the technical systems such as Improvements of Design and Materials, Planned Inspections, Fool-proof design, and Backup Redundancy decreases risk and increases the cost. T ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Fencing (computing) Fencing is the process of isolating a node of a computer cluster or protecting shared resources when a node appears to be malfunctioning.''Sun Cluster environment: Sun Cluster 2.2'' by Enrique Vargas, Joseph Bianco, David Deeths 2001 ISBN page 58 As the number of nodes in a cluster increases, so does the likelihood that one of them may fail at some point. The failed node may have control over shared resources that need to be reclaimed and if the node is acting erratically, the rest of the system needs to be protected. Fencing may thus either disable the node, or disallow shared storage access, thus ensuring data integrity. Basic concepts A node fence (or I/O fence) is a virtual "fence" that separates nodes which must not have access to a shared resource from that resource. It may separate an active node from its backup. If the backup crosses the fence and, for example, tries to control the same disk array as the primary, a data hazard may occur. Mechanisms such as STONITH are d ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]