Sun Microsystems Solaris computer cluster

A computer cluster is a set of

computer A computer is a machine that can be Computer programming, programmed to automatically Execution (computing), carry out sequences of arithmetic or logical operations (''computation''). Modern digital electronic computers can perform generic set ...

s that work together so that they can be viewed as a single system. Unlike grid computers, computer clusters have each

node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics * Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines ...

set to perform the same task, controlled and scheduled by software. The newest manifestation of cluster computing is

cloud computing Cloud computing is "a paradigm for enabling network access to a scalable and elastic pool of shareable physical or virtual resources with self-service provisioning and administration on-demand," according to International Organization for ...

. The components of a cluster are usually connected to each other through fast

local area network A local area network (LAN) is a computer network that interconnects computers within a limited area such as a residence, campus, or building, and has its network equipment and interconnects locally managed. LANs facilitate the distribution of da ...

s, with each

(computer used as a server) running its own instance of an

operating system An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ...

. In most circumstances, all of the nodes use the same hardware and the same operating system, although in some setups (e.g. using Open Source Cluster Application Resources (OSCAR)), different operating systems can be used on each computer, or different hardware. Clusters are usually deployed to improve performance and availability over that of a single computer, while typically being much more cost-effective than single computers of comparable speed or availability. Computer clusters emerged as a result of the convergence of a number of computing trends including the availability of low-cost microprocessors, high-speed networks, and software for high-performance

distributed computing Distributed computing is a field of computer science that studies distributed systems, defined as computer systems whose inter-communicating components are located on different networked computers. The components of a distributed system commu ...

. They have a wide range of applicability and deployment, ranging from small business clusters with a handful of nodes to some of the fastest

supercomputer A supercomputer is a type of computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instruc ...

s in the world such as IBM's Sequoia. Prior to the advent of clusters, single-unit fault tolerant mainframes with modular redundancy were employed; but the lower upfront cost of clusters, and increased speed of network fabric has favoured the adoption of clusters. In contrast to high-reliability mainframes, clusters are cheaper to scale out, but also have increased complexity in error handling, as in clusters error modes are not opaque to running programs.

Basic concepts

The desire to get more computing power and better reliability by orchestrating a number of low-cost

commercial off-the-shelf Commercial-off-the-shelf or commercially available off-the-shelf (COTS) products are packaged or canned (ready-made) hardware or software, which are adapted aftermarket to the needs of the purchasing organization, rather than the commissioning of ...

computers has given rise to a variety of architectures and configurations. The computer clustering approach usually (but not always) connects a number of readily available computing nodes (e.g. personal computers used as servers) via a fast

. The activities of the computing nodes are orchestrated by "clustering middleware", a software layer that sits atop the nodes and allows the users to treat the cluster as by and large one cohesive computing unit, e.g. via a single system image concept. Computer clustering relies on a centralized management approach which makes the nodes available as orchestrated shared servers. It is distinct from other approaches such as

peer-to-peer Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network, forming a peer-to-peer network of Node ...

grid computing Grid computing is the use of widely distributed computer resources to reach a common goal. A computing grid can be thought of as a distributed system with non-interactive workloads that involve many files. Grid computing is distinguished fro ...

which also use many nodes, but with a far more distributed nature. A computer cluster may be a simple two-node system which just connects two personal computers, or may be a very fast

. A basic approach to building a cluster is that of a

Beowulf ''Beowulf'' (; ) is an Old English poetry, Old English poem, an Epic poetry, epic in the tradition of Germanic heroic legend consisting of 3,182 Alliterative verse, alliterative lines. It is one of the most important and List of translat ...

cluster which may be built with a few personal computers to produce a cost-effective alternative to traditional

high-performance computing High-performance computing (HPC) is the use of supercomputers and computer clusters to solve advanced computation problems. Overview HPC integrates systems administration (including network and security knowledge) and parallel programming into ...

. An early project that showed the viability of the concept was the 133-node Stone Soupercomputer. The developers used

Linux Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...

, the Parallel Virtual Machine toolkit and the

Message Passing Interface The Message Passing Interface (MPI) is a portable message-passing standard designed to function on parallel computing architectures. The MPI standard defines the syntax and semantics of library routines that are useful to a wide range of use ...

library to achieve high performance at a relatively low cost. Although a cluster may consist of just a few personal computers connected by a simple network, the cluster architecture may also be used to achieve very high levels of performance. The

TOP500 The TOP500 project ranks and details the 500 most powerful non-distributed computing, distributed computer systems in the world. The project was started in 1993 and publishes an updated list of the supercomputers twice a year. The first of these ...

organization's semiannual list of the 500 fastest supercomputers often includes many clusters, e.g. the world's fastest machine in 2011 was the K computer which has a

distributed memory In computer science, distributed memory refers to a Multiprocessing, multiprocessor computer system in which each Central processing unit, processor has its own private Computer memory, memory. Computational tasks can only operate on local data ...

, cluster architecture.

History

Greg Pfister has stated that clusters were not invented by any specific vendor but by customers who could not fit all their work on one computer, or needed a backup. Pfister estimates the date as some time in the 1960s. The formal engineering basis of cluster computing as a means of doing parallel work of any sort was arguably invented by Gene Amdahl of

IBM International Business Machines Corporation (using the trademark IBM), nicknamed Big Blue, is an American Multinational corporation, multinational technology company headquartered in Armonk, New York, and present in over 175 countries. It is ...

, who in 1967 published what has come to be regarded as the seminal paper on parallel processing: Amdahl's Law. The history of early computer clusters is more or less directly tied to the history of early networks, as one of the primary motivations for the development of a network was to link computing resources, creating a de facto computer cluster. The first production system designed as a cluster was the Burroughs B5700 in the mid-1960s. This allowed up to four computers, each with either one or two processors, to be tightly coupled to a common disk storage subsystem in order to distribute the workload. Unlike standard multiprocessor systems, each computer could be restarted without disrupting overall operation. TNSII

The first commercial loosely coupled clustering product was Datapoint Corporation's "Attached Resource Computer" (ARC) system, developed in 1977, and using ARCnet as the cluster interface. Clustering per se did not really take off until

Digital Equipment Corporation Digital Equipment Corporation (DEC ), using the trademark Digital, was a major American company in the computer industry from the 1960s to the 1990s. The company was co-founded by Ken Olsen and Harlan Anderson in 1957. Olsen was president until ...

released their VAXcluster product in 1984 for the VMS operating system. The ARC and VAXcluster products not only supported

parallel computing Parallel computing is a type of computing, computation in which many calculations or Process (computing), processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. ...

, but also shared file systems and

peripheral A peripheral device, or simply peripheral, is an auxiliary hardware device that a computer uses to transfer information externally. A peripheral is a hardware component that is accessible to and controlled by a computer but is not a core compo ...

devices. The idea was to provide the advantages of parallel processing, while maintaining data reliability and uniqueness. Two other noteworthy early commercial clusters were the ''Tandem NonStop'' (a 1976 high-availability commercial product) and the ''IBM S/390 Parallel Sysplex'' (circa 1994, primarily for business use). Within the same time frame, while computer clusters used parallelism outside the computer on a commodity network,

s began to use them within the same computer. Following the success of the CDC 6600 in 1964, the Cray 1 was delivered in 1976, and introduced internal parallelism via

vector processing In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its Instruction (computer science), instructions are designed to operate efficiently and effectively on large Array d ...

. While early supercomputers excluded clusters and relied on shared memory, in time some of the fastest supercomputers (e.g. the K computer) relied on cluster architectures.

Attributes of clusters

Computer clusters may be configured for different purposes ranging from general purpose business needs such as web-service support, to computation-intensive scientific calculations. In either case, the cluster may use a high-availability approach. Note that the attributes described below are not exclusive and a "computer cluster" may also use a high-availability approach, etc. " Load-balancing" clusters are configurations in which cluster-nodes share computational workload to provide better overall performance. For example, a web server cluster may assign different queries to different nodes, so the overall response time will be optimized. However, approaches to load-balancing may significantly differ among applications, e.g. a high-performance cluster used for scientific computations would balance load with different algorithms from a web-server cluster which may just use a simple round-robin method by assigning each new request to a different node. Computer clusters are used for computation-intensive purposes, rather than handling IO-oriented operations such as web service or databases. For instance, a computer cluster might support computational simulations of vehicle crashes or weather. Very tightly coupled computer clusters are designed for work that may approach "

supercomputing A supercomputer is a type of computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instruc ...

". " High-availability clusters" (also known as failover clusters, or HA clusters) improve the availability of the cluster approach. They operate by having redundant nodes, which are then used to provide service when system components fail. HA cluster implementations attempt to use redundancy of cluster components to eliminate single points of failure. There are commercial implementations of High-Availability clusters for many operating systems. The Linux-HA project is one commonly used

free software Free software, libre software, libreware sometimes known as freedom-respecting software is computer software distributed open-source license, under terms that allow users to run the software for any purpose as well as to study, change, distribut ...

HA package for the

operating system.

Benefits

Clusters are primarily designed with performance in mind, but installations are based on many other factors. Fault tolerance (''the ability of a system to continue operating despite a malfunctioning node'') enables

scalability Scalability is the property of a system to handle a growing amount of work. One definition for software systems specifies that this may be done by adding resources to the system. In an economic context, a scalable business model implies that ...

, and in high-performance situations, allows for a low frequency of maintenance routines, resource consolidation (e.g.,

RAID RAID (; redundant array of inexpensive disks or redundant array of independent disks) is a data storage virtualization technology that combines multiple physical Computer data storage, data storage components into one or more logical units for th ...

), and centralized management. Advantages include enabling data recovery in the event of a disaster and providing parallel data processing and high processing capacity. In terms of scalability, clusters provide this in their ability to add nodes horizontally. This means that more computers may be added to the cluster, to improve its performance, redundancy and fault tolerance. This can be an inexpensive solution for a higher performing cluster compared to scaling up a single node in the cluster. This property of computer clusters can allow for larger computational loads to be executed by a larger number of lower performing computers. When adding a new node to a cluster, reliability increases because the entire cluster does not need to be taken down. A single node can be taken down for maintenance, while the rest of the cluster takes on the load of that individual node. If you have a large number of computers clustered together, this lends itself to the use of

distributed file systems A clustered file system (CFS) is a file system which is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system (only direct attached stora ...

and

, both of which can increase the reliability and speed of a cluster.

Design and configuration

One of the issues in designing a cluster is how tightly coupled the individual nodes may be. For instance, a single computer job may require frequent communication among nodes: this implies that the cluster shares a dedicated network, is densely located, and probably has homogeneous nodes. The other extreme is where a computer job uses one or few nodes, and needs little or no inter-node communication, approaching

. In a

Beowulf cluster A Beowulf cluster is a computer cluster of normally identical, commodity-grade computers networked into a small local area network with libraries and programs installed that allow processing to be shared among them. The result is a high-performa ...

, the application programs never see the computational nodes (also called slave computers) but only interact with the "Master" which is a specific computer handling the scheduling and management of the slaves. In a typical implementation the Master has two network interfaces, one that communicates with the private Beowulf network for the slaves, the other for the general purpose network of the organization. The slave computers typically have their own version of the same operating system, and local memory and disk space. However, the private slave network may also have a large and shared file server that stores global persistent data, accessed by the slaves as needed. A special purpose 144-node DEGIMA cluster is tuned to running astrophysical N-body simulations using the Multiple-Walk parallel tree code, rather than general purpose scientific computations. Due to the increasing computing power of each generation of

game console A video game console is an electronic device that outputs a video signal or image to display a video game that can typically be played with a game controller. These may be home consoles, which are generally placed in a permanent location conne ...

s, a novel use has emerged where they are repurposed into

High-performance computing High-performance computing (HPC) is the use of supercomputers and computer clusters to solve advanced computation problems. Overview HPC integrates systems administration (including network and security knowledge) and parallel programming into ...

(HPC) clusters. Some examples of game console clusters are Sony PlayStation clusters and

Microsoft Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...

Xbox Xbox is a video gaming brand that consists of four main home video game console lines, as well as application software, applications (games), the streaming media, streaming service Xbox Cloud Gaming, and online services such as the Xbox networ ...

clusters. Another example of consumer game product is the Nvidia Tesla Personal Supercomputer workstation, which uses multiple graphics accelerator processor chips. Besides game consoles, high-end graphics cards too can be used instead. The use of graphics cards (or rather their GPU's) to do calculations for grid computing is vastly more economical than using CPU's, despite being less precise. However, when using double-precision values, they become as precise to work with as CPU's and are still much less costly (purchase cost). Computer clusters have historically run on separate physical

s with the same

. With the advent of

virtualization In computing, virtualization (abbreviated v12n) is a series of technologies that allows dividing of physical computing resources into a series of virtual machines, operating systems, processes or containers. Virtualization began in the 1960s wit ...

, the cluster nodes may run on separate physical computers with different operating systems which are painted above with a virtual layer to look similar. The cluster may also be virtualized on various configurations as maintenance takes place; an example implementation is Xen as the virtualization manager with Linux-HA.

Data sharing and communication

Data sharing

As the computer clusters were appearing during the 1980s, so were

s. One of the elements that distinguished the three classes at that time was that the early supercomputers relied on shared memory. Clusters do not typically use physically shared memory, while many supercomputer architectures have also abandoned it. However, the use of a clustered file system is essential in modern computer clusters. Examples include the

IBM General Parallel File System GPFS (General Parallel File System, brand name IBM Storage Scale and previously IBM Spectrum Scale) is a high-performance clustered file system software developed by IBM. It can be deployed in shared-disk or shared-nothing distributed parallel ...

, Microsoft's

Cluster Shared Volumes Cluster Shared Volumes (CSV) is a feature of Failover Clustering first introduced in Windows Server 2008 R2 for use with the Hyper-V role. A Cluster Shared Volume is a shared disk containing an NTFS or ReFS (ReFS: Windows Server 2012 R2 or newer) v ...

or the Oracle Cluster File System.

Message passing and communication

Two widely used approaches for communication between cluster nodes are MPI (

) and PVM ( Parallel Virtual Machine). PVM was developed at the

Oak Ridge National Laboratory Oak Ridge National Laboratory (ORNL) is a federally funded research and development centers, federally funded research and development center in Oak Ridge, Tennessee, United States. Founded in 1943, the laboratory is sponsored by the United Sta ...

around 1989 before MPI was available. PVM must be directly installed on every cluster node and provides a set of software libraries that paint the node as a "parallel virtual machine". PVM provides a run-time environment for message-passing, task and resource management, and fault notification. PVM can be used by user programs written in C, C++, or Fortran, etc. MPI emerged in the early 1990s out of discussions among 40 organizations. The initial effort was supported by ARPA and

National Science Foundation The U.S. National Science Foundation (NSF) is an Independent agencies of the United States government#Examples of independent agencies, independent agency of the Federal government of the United States, United States federal government that su ...

. Rather than starting anew, the design of MPI drew on various features available in commercial systems of the time. The MPI specifications then gave rise to specific implementations. MPI implementations typically use

TCP/IP The Internet protocol suite, commonly known as TCP/IP, is a framework for organizing the communication protocols used in the Internet and similar computer networks according to functional criteria. The foundational protocols in the suite are ...

and socket connections. MPI is now a widely available communications model that enables parallel programs to be written in languages such as C, Fortran, Python, etc. Thus, unlike PVM which provides a concrete implementation, MPI is a specification which has been implemented in systems such as MPICH and

Open MPI Open MPI is a Message Passing Interface (MPI) library project combining technologies and resources from several other projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI). It is used by many TOP500 supercomputers including Roadrunner, which was th ...

Cluster management

One of the challenges in the use of a computer cluster is the cost of administrating it which can at times be as high as the cost of administrating N independent machines, if the cluster has N nodes. In some cases this provides an advantage to

shared memory architecture A shared-memory architecture (SM) is a distributed computing architecture in which the nodes share the same memory as well as the same storage.{{Cite web , title=Memory: Shared vs Distributed - UFRC , url=https://help.rc.ufl.edu/doc/Memory:_Shared_ ...

s with lower administration costs. This has also made

virtual machine In computing, a virtual machine (VM) is the virtualization or emulator, emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve ...

s popular, due to the ease of administration.

Task scheduling

When a large multi-user cluster needs to access very large amounts of data,

task scheduling In computing, scheduling is the action of assigning resources to perform tasks. The resources may be processors, network links or expansion cards. The tasks may be threads, processes or data flows. The scheduling activity is carried out by ...

becomes a challenge. In a heterogeneous CPU-GPU cluster with a complex application environment, the performance of each job depends on the characteristics of the underlying cluster. Therefore, mapping tasks onto CPU cores and GPU devices provides significant challenges. This is an area of ongoing research; algorithms that combine and extend

MapReduce MapReduce is a programming model and an associated implementation for processing and generating big data sets with a parallel and distributed algorithm on a cluster. A MapReduce program is composed of a ''map'' procedure, which performs filte ...

and

Hadoop Apache Hadoop () is a collection of Open-source software, open-source software utilities for reliable, scalable, distributed computing. It provides a software framework for Clustered file system, distributed storage and processing of big data usin ...

have been proposed and studied.

Node failure management

When a node in a cluster fails, strategies such as "

fencing Fencing is a combat sport that features sword fighting. It consists of three primary disciplines: Foil (fencing), foil, épée, and Sabre (fencing), sabre (also spelled ''saber''), each with its own blade and set of rules. Most competitive fe ...

" may be employed to keep the rest of the system operational. Fencing is the process of isolating a node or protecting shared resources when a node appears to be malfunctioning. There are two classes of fencing methods; one disables a node itself, and the other disallows access to resources such as shared disks. The STONITH method stands for "Shoot The Other Node In The Head", meaning that the suspected node is disabled or powered off. For instance, ''power fencing'' uses a power controller to turn off an inoperable node. The ''resources fencing'' approach disallows access to resources without powering off the node. This may include ''persistent reservation fencing'' via the SCSI3, fibre channel fencing to disable the

fibre channel Fibre Channel (FC) is a high-speed data transfer protocol providing in-order, lossless delivery of raw block data. Fibre Channel is primarily used to connect computer data storage to Server (computing), servers in storage area networks (SAN) in ...

port, or global network block device (GNBD) fencing to disable access to the GNBD server.

Software development and administration

Parallel programming

Load balancing clusters such as web servers use cluster architectures to support a large number of users and typically each user request is routed to a specific node, achieving

task parallelism Task parallelism (also known as function parallelism and control parallelism) is a form of parallelization of computer code across multiple processors in parallel computing environments. Task parallelism focuses on distributing tasks—concurre ...

without multi-node cooperation, given that the main goal of the system is providing rapid user access to shared data. However, "computer clusters" which perform complex computations for a small number of users need to take advantage of the parallel processing capabilities of the cluster and partition "the same computation" among several nodes. Automatic parallelization of programs remains a technical challenge, but

parallel programming model In computing, a parallel programming model is an abstraction of parallel computer architecture, with which it is convenient to express algorithms and their composition in programs. The value of a programming model can be judged on its ''generalit ...

s can be used to effectuate a higher degree of parallelism via the simultaneous execution of separate portions of a program on different processors.

Debugging and monitoring

Developing and debugging parallel programs on a cluster requires parallel language primitives and suitable tools such as those discussed by the ''High Performance Debugging Forum'' (HPDF) which resulted in the HPD specifications. Tools such as TotalView were then developed to debug parallel implementations on computer clusters which use

(MPI) or Parallel Virtual Machine (PVM) for message passing. The

University of California, Berkeley The University of California, Berkeley (UC Berkeley, Berkeley, Cal, or California), is a Public university, public Land-grant university, land-grant research university in Berkeley, California, United States. Founded in 1868 and named after t ...

''Network of Workstations'' (NOW) system gathers cluster data and stores them in a database, while a system such as PARMON, developed in India, allows visually observing and managing large clusters.

Application checkpointing Checkpointing is a technique that provides fault tolerance for computing systems. It involves saving a Snapshot (computer storage), snapshot of an application software, application's state, so that it can restart from that point in case of failure ...

can be used to restore a given state of the system when a node fails during a long multi-node computation. This is essential in large clusters, given that as the number of nodes increases, so does the likelihood of node failure under heavy computational loads. Checkpointing can restore the system to a stable state so that processing can resume without needing to recompute results.

Implementations

The Linux world supports various cluster software; for application clustering, there is distcc, and MPICH.

Linux Virtual Server Linux Virtual Server (LVS) is load balancing software for Linux kernel–based operating systems. LVS is a free and open-source project started by Wensong Zhang in May 1998, subject to the requirements of the GNU General Public License (GPL ...

, Linux-HA – director-based clusters that allow incoming requests for services to be distributed across multiple cluster nodes. MOSIX, LinuxPMI, Kerrighed,

OpenSSI OpenSSI is an open-source single-system image clustering system. It allows a collection of computers to be treated as one large system, allowing applications running on any one machine access to the resources of all the machines in the cluster ...

are full-blown clusters integrated into the kernel that provide for automatic process migration among homogeneous nodes.

, openMosix and Kerrighed are single-system image implementations.

Microsoft Windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...

computer cluster Server 2003 based on the

Windows Server Windows Server (formerly Windows NT Server) is a brand name for Server (computing), server-oriented releases of the Windows NT operating system (OS) that have been developed by Microsoft since 1993. The first release under this brand name i ...

platform provides pieces for high-performance computing like the job scheduler, MSMPI library and management tools. gLite is a set of middleware technologies created by the Enabling Grids for E-sciencE (EGEE) project. slurm is also used to schedule and manage some of the largest supercomputer clusters (see top500 list).

Other approaches

Although most computer clusters are permanent fixtures, attempts at flash mob computing have been made to build short-lived clusters for specific computations. However, larger-scale

volunteer computing Volunteer computing is a type of distributed computing in which people donate their computers' unused resources to a research-oriented project, and sometimes in exchange for credit points. The fundamental idea behind it is that a modern desktop ...

systems such as

BOINC The Berkeley Open Infrastructure for Network Computing (BOINC, pronounced rhymes with "oink") is an open-source middleware system for volunteer computing (a type of distributed computing). Developed originally to support SETI@home, it became the ...

-based systems have had more followers.

References

External links

IEEE Technical Committee on Scalable Computing (TCSC)

Tivoli System Automation Wiki

Large-scale cluster management at Google with Borg
April 2015, by Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune and John Wilkes {{DEFAULTSORT:Cluster (Computing) Parallel computing Concurrent computing *Computer cluster Local area networks Classes of computers Fault-tolerant computer systems Server hardware