Fail-over

Failover is switching to a redundant or standby computer server, system, hardware component or network upon the failure or abnormal termination of the previously active application, server, system, hardware component, or network in a computer network. Failover and switchover are essentially the same operation, except that failover is automatic and usually operates without warning, while switchover requires human intervention. Systems designers usually provide failover capability in servers, systems or networks requiring near-continuous availability and a high degree of reliability. At the server level, failover automation usually uses a "heartbeat" system that connects two servers, either through a separate cable (for example, RS-232 serial ports/cable) or a network connection. As long as a regular "pulse" or "heartbeat" continues between the main server and the second server, the second server will not bring its systems online. There may also be a third "spare parts" se ...
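
The heartbeat rule described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's implementation: the class name `StandbyMonitor`, the `timeout_s` parameter, and the use of explicit timestamps instead of a real serial or network link are all assumptions made for the example.

```python
class StandbyMonitor:
    """Standby-side view of a heartbeat link (hypothetical class)."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s  # silence tolerated before taking over (assumed value)
        self.last_beat = 0.0        # timestamp of the most recent heartbeat
        self.active = False         # True once the standby has brought itself online

    def on_heartbeat(self, now: float) -> None:
        """Record a pulse received from the main server."""
        self.last_beat = now

    def check(self, now: float) -> bool:
        """Bring the standby online if the pulse has been silent for too long."""
        if not self.active and now - self.last_beat > self.timeout_s:
            self.active = True
        return self.active


monitor = StandbyMonitor(timeout_s=3.0)
monitor.on_heartbeat(now=1.0)
still_standby = monitor.check(now=2.0)  # pulse is recent: standby stays offline
took_over = monitor.check(now=5.5)      # 4.5 s of silence: standby takes over
```

Passing the clock in as an argument keeps the sketch deterministic; a real monitor would read a monotonic clock and would normally combine the timeout with fencing before taking over.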

High-availability Cluster

High-availability clusters (also known as HA clusters or fail-over clusters) are groups of computers that support server applications that can be reliably utilized with a minimum amount of down-time. They operate by using high availability software to harness redundant computers in groups or clusters that provide continued service when system components fail. Without clustering, if a server running a particular application crashes, the application will be unavailable until the crashed server is fixed. HA clustering remedies this situation by detecting hardware/software faults, and immediately restarting the application on another system without requiring administrative intervention, a process known as failover. As part of this process, clustering software may configure the node before starting the application on it. For example, appropriate file systems may need to be imported and mounted, network hardware may have to be configured, and some supporting applications may need to ...
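
As a rough sketch of the sequence just described (detect the fault, configure a healthy node, then restart the application there), the following Python models nodes as plain objects. The names `Node`, `prepare`, and `fail_over` are hypothetical, and the "configuration" steps are only recorded as log strings rather than performed.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    healthy: bool = True
    log: list = field(default_factory=list)  # actions performed on this node


def prepare(node: Node) -> None:
    """Stand-in for node configuration done before the application starts."""
    node.log.append("mount file systems")
    node.log.append("configure network")


def fail_over(app: str, nodes: list, current: Node) -> Node:
    """Return the node that runs `app` after a health check of `current`."""
    if current.healthy:
        return current  # no fault detected, nothing to do
    for candidate in nodes:
        if candidate is not current and candidate.healthy:
            prepare(candidate)
            candidate.log.append(f"start {app}")
            return candidate
    raise RuntimeError("no healthy node available")


node_a, node_b = Node("a"), Node("b")
node_a.healthy = False  # simulate a crash of the active node
active = fail_over("db", [node_a, node_b], node_a)
```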

Safety Engineering

Safety engineering is an engineering discipline which assures that engineered systems provide acceptable levels of safety. It is strongly related to industrial engineering/systems engineering, and the subset system safety engineering. Safety engineering assures that a life-critical system behaves as needed, even when components fail.

Analysis techniques

Analysis techniques can be split into two categories: qualitative and quantitative methods. Both approaches share the goal of finding causal dependencies between a hazard on system level and failures of individual components. Qualitative approaches focus on the question "What must go wrong, such that a system hazard may occur?", while quantitative methods aim at providing estimations about probabilities, rates and/or severity of consequences. Measures that add to the complexity of technical systems, such as improvements of design and materials, planned inspections, fool-proof design, and backup redundancy, decrease risk and increase cost. T ...

Disaster Recovery

Disaster recovery is the process of maintaining or reestablishing vital infrastructure and systems following a natural or human-induced disaster, such as a storm or battle. It employs policies, tools, and procedures. Disaster recovery focuses on the information technology (IT) or technology systems supporting critical business functions, as opposed to business continuity, which involves keeping all essential aspects of a business functioning despite significant disruptive events; disaster recovery can therefore be considered a subset of business continuity. Disaster recovery assumes that the primary site is not immediately recoverable and restores data and services to a secondary site.

IT service continuity

IT Service Continuity (ITSC) is a subset of business continuity planning (BCP) that focuses on Recovery Point Objective (RPO) and Recovery Time Objective (RTO). It encompasses IT disaster recovery planning and wider IT resilience planning. It also incorporates IT infrastructure and services rela ...

Redundancy (engineering)

In engineering, redundancy is the intentional duplication of critical components or functions of a system with the goal of increasing reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance, such as in the case of GNSS receivers, or multi-threaded computer processing. In many safety-critical systems, such as fly-by-wire and hydraulic systems in aircraft, some parts of the control system may be triplicated, which is formally termed triple modular redundancy (TMR). An error in one component may then be out-voted by the other two. In a triply redundant system, the system has three subcomponents, all three of which must fail before the system fails. Since each one rarely fails, and the subcomponents are expected to fail independently, the probability of all three failing is calculated to be extraordinarily small; it is often outweighed by other risk factors, such as human error. Redundancy may also be known by the terms "m ...
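
Two points from this summary lend themselves to a short illustration: majority voting among three replicas, and the product rule for independent failures (if each subcomponent fails with probability p, all three fail with probability p^3). The sketch below is an assumption-laden toy: the function names and the example failure probability of 0.001 are made up for illustration.

```python
from collections import Counter


def vote(a, b, c):
    """Return the value reported by at least two of the three replicas."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count < 2:
        raise ValueError("no majority: all three replicas disagree")
    return value


def all_fail_probability(p: float, n: int = 3) -> float:
    """Probability that n independent components, each failing with probability p, all fail."""
    return p ** n


voted = vote(42, 42, 7)              # the faulty third replica is out-voted
p_all = all_fail_probability(0.001)  # illustrative per-component failure probability
```

The product rule holds only under the independence assumption stated in the text; common-cause failures make the real figure larger.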

Migration (virtualization)

In the context of virtualization, where a guest simulation of an entire computer is actually merely a software virtual machine (VM) running on a host computer under a hypervisor, migration (also known as teleportation) is the process by which a running virtual machine is moved from one physical host to another, with little or no disruption in service.

Subjective effects

Ideally, the process is completely transparent, resulting in no disruption of service (or downtime). In practice, there is always some minor pause in availability, though it may be low enough that only hard real-time systems are affected. Virtualization is far more frequently used with network services and user applications, and these can generally tolerate the brief delays which may be involved. The perceived impact, if any, is similar to a longer-than-usual kernel delay.

Objective effects

The actual process is heavily dependent on the particular virtualization package in use, but in general, the pr ...

Log Shipping

Log shipping is the process of automating the backup of transaction log files on a primary (production) database server, and then restoring them onto a standby server. This technique is supported by Microsoft SQL Server ("How to Perform SQL Server Log Shipping", "What is Log Shipping", retrieved on 2008-12-16), 4D Server, MySQL, and PostgreSQL. Similar to ...
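
The idea can be sketched as a toy in-memory model, in which each log segment is a list of key/value changes and the standby replays, in order, the segments it has not yet applied. This is not how any of the databases named above implement the feature; `ship_logs` and the data layout are assumptions for illustration only.

```python
def ship_logs(primary_logs: list, standby_state: dict, applied: int) -> int:
    """Replay unshipped log segments on the standby; return the new applied count."""
    for segment in primary_logs[applied:]:
        for key, value in segment:
            standby_state[key] = value  # restore the change on the standby
        applied += 1
    return applied


primary_logs = [[("x", 1)]]                          # first backed-up log segment
standby = {}
applied = ship_logs(primary_logs, standby, 0)        # standby now holds x=1

primary_logs.append([("x", 2), ("y", 5)])            # primary produces a second segment
applied = ship_logs(primary_logs, standby, applied)  # only the new segment is replayed
```

Tracking the count of applied segments is what keeps the replay ordered and prevents a segment from being restored twice.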

Load Balancing (computing)

In computing, load balancing is the process of distributing a set of tasks over a set of resources (computing units), with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenly overloading some compute nodes while other compute nodes are left idle. Load balancing is the subject of research in the field of parallel computers. Two main approaches exist: static algorithms, which do not take into account the state of the different machines, and dynamic algorithms, which are usually more general and more efficient but require exchanges of information between the different computing units, at the risk of a loss of efficiency.

Problem overview

A load-balancing algorithm always tries to answer a specific problem. Among other things, the nature of the tasks, the algorithmic complexity, the hardware architecture on which the algorithms will run as well as required error tolerance, must be taken into account. Therefore c ...
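
The contrast between the two approaches can be made concrete with a small sketch. Round-robin is used here as one example of a static policy and "least loaded" as one example of a dynamic policy; both choices, the function names, and the task costs are assumptions for illustration.

```python
from itertools import cycle


def static_round_robin(tasks: list, n_workers: int) -> list:
    """Static policy: assign tasks in fixed rotation, ignoring worker state."""
    loads = [0] * n_workers
    workers = cycle(range(n_workers))
    for cost in tasks:
        loads[next(workers)] += cost
    return loads


def dynamic_least_loaded(tasks: list, n_workers: int) -> list:
    """Dynamic policy: send each task to the currently least-loaded worker."""
    loads = [0] * n_workers
    for cost in tasks:
        target = loads.index(min(loads))  # requires knowing every worker's load
        loads[target] += cost
    return loads


tasks = [8, 1, 8, 1, 8, 1]                      # alternating heavy and light tasks
static_loads = static_round_robin(tasks, 2)     # [24, 3]
dynamic_loads = dynamic_least_loaded(tasks, 2)  # [17, 10]
```

The dynamic policy ends with a lower maximum load on this input, but only because it inspects every worker's state before each assignment, which is the information exchange the text warns can cost efficiency.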

Fencing (computing)

Fencing is the process of isolating a node of a computer cluster or protecting shared resources when a node appears to be malfunctioning (Enrique Vargas, Joseph Bianco, David Deeths, Sun Cluster environment: Sun Cluster 2.2, 2001, page 58). As the number of nodes in a cluster increases, so does the likelihood that one of them may fail at some point. The failed node may have control over shared resources that need to be reclaimed, and if the node is acting erratically, the rest of the system needs to be protected. Fencing may thus either disable the node, or disallow shared storage access, thus ensuring data integrity.

Basic concepts

A node fence (or I/O fence) is a virtual "fence" that separates nodes which must not have access to a shared resource from that resource. It may separate an active node from its backup. If the backup crosses the fence and, for example, tries to control the same disk array as the primary, a data hazard may occur. Mechanisms such as STONITH are d ...
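
Of the two options mentioned (disabling the node or disallowing shared storage access), the second is easy to sketch: the shared resource itself refuses I/O from any node that has been fenced. The `SharedDisk` class and the node names are hypothetical; this does not model STONITH, which powers the node off instead.

```python
class SharedDisk:
    """Toy shared resource that enforces an I/O fence (hypothetical class)."""

    def __init__(self):
        self.fenced = set()  # nodes that must not touch the disk
        self.blocks = {}     # stored data

    def fence(self, node: str) -> None:
        """Cut the node off from the shared resource."""
        self.fenced.add(node)

    def write(self, node: str, key: str, value: str) -> None:
        if node in self.fenced:
            raise PermissionError(f"{node} is fenced off")
        self.blocks[key] = value


disk = SharedDisk()
disk.write("node-a", "k", "v1")
disk.fence("node-a")  # node-a appears to be malfunctioning
try:
    disk.write("node-a", "k", "stale")  # a late write from the fenced node
    rejected = False
except PermissionError:
    rejected = True
disk.write("node-b", "k", "v2")  # the surviving node keeps working
```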

Fault-tolerance

Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of, or one or more faults within, some of its components. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system, in which even a small failure can cause total breakdown. Fault tolerance is particularly sought after in high-availability, mission-critical, or even life-critical systems. The ability to maintain functionality when portions of a system break down is referred to as graceful degradation. A fault-tolerant design enables a system to continue its intended operation, possibly at a reduced level, rather than failing completely, when some part of the system fails. The term is most commonly used to describe computer systems designed to continue more or less fully operational with, perhaps, a reduction in throughput or an increase in response time in the event of some partial ...

Data Integrity

Data integrity is the maintenance of, and the assurance of, data accuracy and consistency over its entire life-cycle and is a critical aspect of the design, implementation, and usage of any system that stores, processes, or retrieves data. The term is broad in scope and may have widely different meanings depending on the specific context, even under the same general umbrella of computing. It is at times used as a proxy term for data quality, while data validation is a prerequisite for data integrity. Data integrity is the opposite of data corruption. The overall intent of any data integrity technique is the same: ensure data is recorded exactly as intended (such as a database correctly rejecting mutually exclusive possibilities). Moreover, upon later retrieval, ensure the data is the same as when it was originally recorded. In short, data integrity aims to prevent unintentional changes to information. Data integrity is not to be confus ...
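
One common way to check that retrieved data matches what was originally recorded is to store a digest alongside it and recompute the digest on retrieval. The sketch below uses SHA-256 from Python's standard `hashlib` module; the helper names `store` and `verify` and the sample records are assumptions, and a checksum is only one of many integrity techniques.

```python
import hashlib


def store(data: bytes) -> tuple:
    """Return the data together with a digest computed at write time."""
    return data, hashlib.sha256(data).hexdigest()


def verify(data: bytes, digest: str) -> bool:
    """Check on retrieval that the data still matches the recorded digest."""
    return hashlib.sha256(data).hexdigest() == digest


record, digest = store(b"balance=100")
intact = verify(record, digest)           # unchanged data passes the check
changed = verify(b"balance=900", digest)  # altered data is detected
```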

NASA

The National Aeronautics and Space Administration (NASA) is an independent agency of the US federal government responsible for the civil space program, aeronautics research, and space research. NASA was established in 1958, succeeding the National Advisory Committee for Aeronautics (NACA), to give the U.S. space development effort a distinctly civilian orientation, emphasizing peaceful applications in space science. NASA has since led most American space exploration, including Project Mercury, Project Gemini, the 1968-1972 Apollo Moon landing missions, the Skylab space station, and the Space Shuttle. NASA supports the International Space Station and oversees the development of the Orion spacecraft and the Space Launch System for the crewed lunar Artemis program, Commercial Crew spacecraft, and the planned Lunar Gateway space station. The agency is also responsible for the Launch Services Program, which provides oversight of launch operations and countdown management f ...