OpenSSI

OpenSSI is an open-source single-system image clustering system. It allows a collection of computers to be treated as one large system, giving applications running on any one machine access to the resources of all the machines in the cluster. OpenSSI is based on the Linux operating system and was released as an open source project by Compaq in 2001. It is the final stage of a long process of development, stretching back to LOCUS, developed in the early 1980s.


Description

OpenSSI allows a cluster of individual computers (''nodes'') to be treated as one large system. Processes running on any node have full access to the resources of all nodes. Processes can be migrated from node to node automatically to balance system utilization. Inbound network connections can be directed to the least loaded node available. OpenSSI is designed to be used for both high performance and high availability clusters. It is possible to create an OpenSSI cluster with no single point of failure. For example, the file system can be mirrored between two nodes, so that if one node crashes, a process accessing a file will ''fail over'' to the other node. Alternatively, the cluster can be designed so that every node has direct access to the file system.


Features


Single process space

OpenSSI provides a single process space: every process is visible from every node and can be managed from any node using the normal Linux commands (ps, kill, renice and so on). The Linux /proc virtual filesystem shows all running processes on all nodes. The single process space is implemented using the VPROC abstraction invented by Locus for the OSF/1 AD operating system.


Migration

OpenSSI allows migration of running processes between nodes. Migrated processes continue to have access to any open files, IPC objects or network connections. Processes can be migrated ''manually'', either by the process calling the special OpenSSI ''migrate(2)'' system call or by writing a node number to a special file in the process's /proc directory. If the user wants, processes may also be migrated automatically in order to balance load across the cluster. OpenSSI uses an algorithm developed by the MOSIX project for determining the load on each node.
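The /proc-based form of manual migration can be sketched as a small helper. This is only an illustration: the text does not name the special file, so the name "goto" used here is an assumption, as is the idea that node numbers start at 1. The demo runs against a scratch directory rather than a real /proc.

```python
import os
import tempfile


def request_migration(pid, node, proc_root="/proc", control_file="goto"):
    """Write a node number to a per-process control file.

    `control_file` is an assumed name; the article only says that a
    "special file" in the process's /proc directory accepts a node number.
    """
    if node < 1:
        raise ValueError("node numbers are assumed to start at 1")
    path = os.path.join(proc_root, str(pid), control_file)
    with open(path, "w") as f:
        f.write(f"{node}\n")
    return path


def demo():
    # Exercise the helper against a temporary directory standing in for /proc.
    with tempfile.TemporaryDirectory() as fake_proc:
        os.makedirs(os.path.join(fake_proc, "1234"))
        path = request_migration(1234, 2, proc_root=fake_proc)
        with open(path) as f:
            return f.read().strip()
```

On a real cluster the same call would be made with the default proc_root, provided the control file actually exists under that name.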


Single root

OpenSSI provides a single root for the cluster: the same files and directories are available from any node. OpenSSI uses several mechanisms to provide the single root: CFS (the OpenSSI Cluster File System), SAN cluster filesystems and parallel mounts of network file systems. OpenSSI uses the context dependent symbolic link (CDSL) feature, inspired by HP's TruCluster system, to allow access to node-specific files in a manner transparent to applications that are not cluster-aware. A CDSL may point to different files on each node in the cluster.
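The idea of a link that resolves differently on each node can be modelled in a few lines. This is a toy sketch, not the OpenSSI implementation: the "{nodenum}" placeholder and the /cluster/node... layout are hypothetical names chosen for illustration.

```python
import posixpath

PLACEHOLDER = "{nodenum}"  # hypothetical token expanded per node


def resolve_cdsl(link_target, node_number):
    """Expand a context-dependent link target for the given node."""
    expanded = link_target.replace(PLACEHOLDER, str(node_number))
    return posixpath.normpath(expanded)


def open_path(path, links, node_number):
    """Follow at most one CDSL, looked up in a table of path -> target template."""
    target = links.get(path)
    if target is None:
        return path  # an ordinary file: identical on every node
    return resolve_cdsl(target, node_number)
```

With links = {"/etc/hostname": "/cluster/node{nodenum}/etc/hostname"}, an application on node 1 and one on node 2 both open /etc/hostname but reach different files, without either being aware of the cluster.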


CFS

CFS, the OpenSSI Cluster File System, provides transparent inter-node access to an underlying ''real'' file system on one node. CFS is ''stacked'' on top of the real file system and co-ordinates access from different nodes using a ''token'' mechanism. One node has physical access to the underlying file system and performs all read and write operations. At any one time, one node ''owns'' a token representing a part of the underlying file, which implies that that part of the file is in the cache of the owning node. If another node tries to access that part of the file, the token is ''stolen'' and the cache contents are copied to the stealing node. The OpenSSI CFS implementation is remarkably similar to that used by HP TruCluster. CFS is also used to co-ordinate access to shared memory segments. CFS can be used in a fault tolerant system by using shared disk subsystems (dual ported SCSI or SAN), or by using DRBD. If the node that is currently directly accessing the file system crashes, the CFS mount ''fails over'' to the other node that is directly connected to the disk, and the cluster then accesses the file system via that node.
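The token mechanism described above can be illustrated with a toy in-memory model. It assumes one token per fixed-size block and treats a write as replacing the whole block; the class and attribute names are invented for this sketch and do not come from the CFS source.

```python
class TokenedFile:
    """Toy model of token-based cache ownership for one file."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.disk = {}    # block index -> bytes on the underlying file system
        self.owner = {}   # block index -> node currently holding the token
        self.cache = {}   # (node, block index) -> cached bytes

    def _acquire(self, node, block):
        holder = self.owner.get(block)
        if holder is None:
            # First access: load the block from the underlying file system.
            self.cache[(node, block)] = self.disk.get(block, b"")
        elif holder != node:
            # Steal the token: move the cached contents to the stealing node.
            self.cache[(node, block)] = self.cache.pop((holder, block))
        self.owner[block] = node

    def read(self, node, offset):
        block = offset // self.block_size
        self._acquire(node, block)
        return self.cache[(node, block)]

    def write(self, node, offset, data):
        block = offset // self.block_size
        self._acquire(node, block)
        self.cache[(node, block)] = data  # simplified: replaces the whole block
```

After node 1 writes a block and node 2 reads it, the token and the cached data both belong to node 2, and node 1 no longer holds a copy.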


SAN clustered file systems

OpenSSI can use SAN based clustered file systems for its root, provided they offer a POSIX compatible file system interface. Currently Lustre and GFS have been tested. With a clustered file system, each node mounts the file system in parallel and access to the files goes directly from the node to the file system.


NFS

OpenSSI mounts NFS file systems in parallel on each node. Every node accesses the NFS server directly.


Single I/O space

OpenSSI provides cluster-wide access to all I/O devices on the system, with some limitations: it is not possible for a node to mount a block device from another node. The udev device manager is used to manage the /dev directory. Each node runs its own copy of udev to create the appropriate device nodes in a subdirectory of /dev: /dev/1 for node 1, /dev/2 for node 2 and so on.


Single IPC space

OpenSSI provides internode access to all the standard Linux inter-process communication mechanisms: shared memory, semaphores, SYSV message queues, pipes and Unix domain sockets. In order to implement cluster wide shared memory (distributed shared memory), OpenSSI uses the CFS ''token'' system. At any one time a memory segment may be readable by one or more nodes, or writable by one node. If a node without write access to a segment tries to write, the segment is marked unreadable on all other nodes and writable on the current node. If a node without read access tries to read a segment, the current value is copied from a node where it is valid, and if the segment was writable there it is marked readable.
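The read and write transitions just described amount to a small state machine, sketched below under simplifying assumptions: a segment is a single value, and the class and field names are invented for illustration rather than taken from OpenSSI.

```python
class SharedSegment:
    """Toy model: many readers or one writer, tracked per node."""

    def __init__(self, initial_node, value=b""):
        self.copies = {initial_node: value}  # nodes holding a valid copy
        self.writer = None                   # node with write access, if any

    def read(self, node):
        if node not in self.copies:
            # Copy the current value from any node where it is valid.
            source = next(iter(self.copies))
            self.copies[node] = self.copies[source]
            # A writable copy is downgraded to readable once it is shared.
            self.writer = None
        return self.copies[node]

    def write(self, node, value):
        # Invalidate every other copy and make this node the only writer.
        self.copies = {node: value}
        self.writer = node
```

A write by any node leaves exactly one valid copy. A later read from another node restores sharing and removes write access until the next write.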


Cluster IP address

OpenSSI uses LVS (Linux Virtual Server) to provide fault-tolerant load balanced IP services. Inbound network connections are received by a ''director'' node, which redirects them to the least loaded server node. (A node may be both a director and a server.) In the event of director node failure, another director node takes over and the system continues to accept inbound connections.
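The director and server roles can be modelled roughly as follows. This sketch does not use any LVS interface: the function names, the priority-ordered director list and the numeric load values are all assumptions made for illustration.

```python
def pick_server(loads):
    """Return the least loaded node; ties go to the lowest node number."""
    if not loads:
        raise ValueError("no server nodes available")
    return min(sorted(loads), key=loads.__getitem__)


def active_director(directors, alive):
    """Return the first director, in priority order, that is still alive."""
    for node in directors:
        if node in alive:
            return node
    raise RuntimeError("no director node available")


def route_connection(directors, alive, loads):
    """Pick the director that accepts a connection and the server that gets it."""
    director = active_director(directors, alive)
    candidates = {n: load for n, load in loads.items() if n in alive}
    return director, pick_server(candidates)
```

If node 1 is the preferred director and fails, the next call to route_connection selects node 2 as director, and connections continue to be routed to the least loaded surviving server.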


Distributions

The OpenSSI software is available for various Linux distributions. The OpenSSI kernel is distribution independent, but various distribution specific Linux user level systems need to be modified, for example the init process and the system startup scripts. Currently the supported distributions are:
# Fedora Core 3
# Debian Sarge
In 2008, work was in progress to port OpenSSI to Debian Etch and Lenny.


History

The origins of OpenSSI date back to the early 1980s, when the LOCUS distributed operating system was developed at UCLA. The team that developed LOCUS went on to form the Locus Computing Corporation and produced various versions of the LOCUS technology under several names, culminating in the development of the UnixWare NonStop Clusters product at Tandem Computers, which had by that time acquired the LOCUS team and rights to the technology. NonStop Clusters for UnixWare was commercialized by SCO as an add-on for UnixWare. When SCO stopped selling NonStop Clusters, the former Locus team, now working for Compaq (which had acquired Tandem in the interim), ported the NonStop Clusters code to Linux and released it as open source. The team at Compaq continued to develop the system, now called OpenSSI, for some time after HP acquired Compaq. OpenSSI is currently developed by an independent team.


See also

* Kerrighed
* OpenMosix
* LinuxPMI
* DIPC




External links




Sourceforge.net Project Summary Page

Cluster Infrastructure Project

OpenSSI Webview homepage

Popcorn Linux