HOME

TheInfoList



OR:

In
distributed computing Distributed computing is a field of computer science that studies distributed systems, defined as computer systems whose inter-communicating components are located on different networked computers. The components of a distributed system commu ...
, a single system image (SSI) cluster is a cluster of machines that appears to be one single system. The concept is often considered synonymous with that of a
distributed operating system A distributed operating system is system software over a collection of independent software, networked, communicating, and physically separate computational nodes. They handle jobs which are serviced by multiple CPUs. Each individual node holds a ...
, but a single image may be presented for more limited purposes, just
job scheduling A job scheduler is a computer application for controlling unattended background program execution of job (computing), jobs. This is commonly called batch scheduling, as execution of non-interactive jobs is often called batch processing, though tr ...
for instance, which may be achieved by means of an additional layer of software over conventional operating system images running on each
node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics * Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines ...
. The interest in SSI clusters is based on the perception that they may be simpler to use and administer than more specialized clusters. Different SSI systems may provide a more or less complete
illusion An illusion is a distortion of the senses, which can reveal how the mind normally organizes and interprets sensory stimulation. Although illusions distort the human perception of reality, they are generally shared by most people. Illusions may ...
of a single system.


Features of SSI clustering systems

Different SSI systems may, depending on their intended usage, provide some subset of these features.


Process migration

Many SSI systems provide
process migration In computing, process migration is a specialized form of process management whereby processes are moved from one computing environment to another. This originated in distributed computing, but is now used more widely. On multicore machines (mul ...
. Processes may start on one
node In general, a node is a localized swelling (a "knot") or a point of intersection (a vertex). Node may refer to: In mathematics * Vertex (graph theory), a vertex in a mathematical graph *Vertex (geometry), a point where two or more curves, lines ...
and be moved to another node, possibly for resource balancing or administrative reasons.for example it may be necessary to move long running processes off a node that is to be closed down for maintenance As processes are moved from one node to another, other associated resources (for example IPC resources) may be moved with them.


Process checkpointing

Some SSI systems allow checkpointing of running processes, allowing their current state to be saved and reloaded at a later date.Checkpointing is particularly useful in clusters used for
high-performance computing High-performance computing (HPC) is the use of supercomputers and computer clusters to solve advanced computation problems. Overview HPC integrates systems administration (including network and security knowledge) and parallel programming into ...
, avoiding lost work in case of a cluster or node restart.
Checkpointing can be seen as related to migration, as migrating a process from one node to another can be implemented by first checkpointing the process, then restarting it on another node. Alternatively checkpointing can be considered as ''migration to disk''.


Single process space

Some SSI systems provide the illusion that all processes are running on the same machine - the process management tools (e.g. "ps", "kill" on
Unix Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
like systems) operate on all processes in the cluster.


Single root

Most SSI systems provide a single view of the file system. This may be achieved by a simple NFS server, shared disk devices or even file replication. The advantage of a single root view is that processes may be run on any available node and access needed files with no special precautions. If the cluster implements process migration a single root view enables direct accesses to the files from the node where the process is currently running. Some SSI systems provide a way of "breaking the illusion", having some node-specific files even in a single root. HP
TruCluster TruCluster is a closed-source high-availability clustering solution for the Tru64 UNIX operating system. It was originally developed by Digital Equipment Corporation (DEC), but was transferred to Compaq in 1998 when Digital was acquired by the co ...
provides a "context dependent symbolic link" (CDSL) which points to different files depending on the node that accesses it. HP VMScluster provides a search list logical name with node specific files occluding cluster shared files where necessary. This capability may be necessary to deal with ''heterogeneous'' clusters, where not all nodes have the same configuration. In more complex configurations such as multiple nodes of multiple architectures over multiple sites, several local disks may combine to form the logical single root.


Single I/O space

Some SSI systems allow all nodes to access the I/O devices (e.g. tapes, disks, serial lines and so on) of other nodes. There may be some restrictions on the kinds of accesses allowed (For example,
OpenSSI OpenSSI is an open-source single-system image clustering system. It allows a collection of computers to be treated as one large system, allowing applications running on any one machine access to the resources of all the machines in the cluster ...
can't mount disk devices from one node on another node).


Single IPC space

Some SSI systems allow processes on different nodes to communicate using
inter-process communication In computer science, interprocess communication (IPC) is the sharing of data between running Process (computing), processes in a computer system. Mechanisms for IPC may be provided by an operating system. Applications which use IPC are often cat ...
s mechanisms as if they were running on the same machine. On some SSI systems this can even include shared memory (can be emulated in software with
distributed shared memory In computer science, distributed shared memory (DSM) is a form of memory architecture where physically separated memories can be addressed as a single shared address space. The term "shared" does not mean that there is a single centralized memo ...
). In most cases inter-node IPC will be slower than IPC on the same machine, possibly drastically slower for shared memory. Some SSI clusters include special hardware to reduce this slowdown.


Cluster IP address

Some SSI systems provide a "
cluster IP A cluster IP is a term in cloud computing to refer to a proxy that represents a computer cluster with a single IP address. It is a term used by the cloud computing system Kubernetes Kubernetes (), also known as K8s is an open-source software, o ...
address", a single address visible from outside the cluster that can be used to contact the cluster as if it were one machine. This can be used for load balancing inbound calls to the cluster, directing them to lightly loaded nodes, or for redundancy, moving the cluster address from one machine to another as nodes join or leave the cluster."leaving a cluster" is often a euphemism for crashing


Examples

Examples here vary from commercial platforms with scaling capabilities, to packages/frameworks for creating distributed systems, as well as those that actually implement a single system image.


See also

* Diskless shared-root cluster *
Distributed lock manager A distributed lock manager (DLM) runs in every machine in a cluster, with an identical copy of a cluster-wide lock database. Operating systems use lock managers to organise and serialise the access to resources. In this way a DLM provides software ...
*
Distributed cache In computing, a distributed cache is an extension of the traditional concept of cache used in a single locale. A distributed cache may span multiple servers so that it can grow in size and in transactional capacity. It is mainly used to store ap ...
* Parallel Virtual Machine - multiple system image alternative *
Message Passing Interface The Message Passing Interface (MPI) is a portable message-passing standard designed to function on parallel computing architectures. The MPI standard defines the syntax and semantics of library routines that are useful to a wide range of use ...
- multiple system image alternative


Notes


References

{{DEFAULTSORT:Single System Image Cluster computing