Cluster Management

Within cluster and parallel computing, a cluster manager is usually backend graphical user interface (GUI) or command-line interface (CLI) software that runs on the set of cluster nodes it manages (in some cases it runs on a different server or on a cluster of management servers). The cluster manager works together with cluster management agents. These agents run on each node of the cluster to manage and configure a service, a set of services, or the complete cluster server itself (see supercomputing). In some cases the cluster manager is mostly used to dispatch work for the cluster (or cloud) to perform. In that case a subset of the cluster manager can be a remote desktop application that is used not for configuration but only to send work to a cluster and get the results back. In other cases the cluster is concerned more with availability and load balancing than with computation or a specific service.
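The division of labour between a manager and its per-node agents can be illustrated with a minimal Python sketch. It is not taken from any particular cluster management product: the names `NodeAgent`, `ClusterManager`, `configure_all` and `dispatch` are hypothetical, the agents are plain in-process objects rather than daemons on separate machines, and round-robin is only one of many possible dispatch policies.

```python
from dataclasses import dataclass, field
from itertools import cycle
from typing import Callable, Dict, List, Tuple


@dataclass
class NodeAgent:
    """Hypothetical per-node agent: stores service settings and runs work."""
    name: str
    services: Dict[str, dict] = field(default_factory=dict)

    def configure(self, service: str, settings: dict) -> None:
        # Record the desired settings for one service on this node.
        self.services[service] = dict(settings)

    def run(self, job: Callable[[], object]) -> object:
        # Execute the job locally and return the result to the caller.
        return job()


class ClusterManager:
    """Hypothetical manager: configures every agent and dispatches jobs."""

    def __init__(self, agents: List[NodeAgent]) -> None:
        if not agents:
            raise ValueError("a cluster needs at least one node agent")
        self.agents = agents
        self._next_agent = cycle(agents)  # assumed policy: round-robin

    def configure_all(self, service: str, settings: dict) -> None:
        # Push the same service configuration to every node.
        for agent in self.agents:
            agent.configure(service, settings)

    def dispatch(self, jobs: List[Callable[[], object]]) -> List[Tuple[str, object]]:
        # Send each job to the next agent in turn and collect the results.
        results = []
        for job in jobs:
            agent = next(self._next_agent)
            results.append((agent.name, agent.run(job)))
        return results


manager = ClusterManager([NodeAgent("node-a"), NodeAgent("node-b")])
manager.configure_all("web", {"port": 8080})
results = manager.dispatch([lambda: 1 + 1, lambda: 2 * 3, lambda: "done"])
print(results)
```

In this sketch `configure_all` stands in for the configuration role described above, while `dispatch` stands in for the case where the manager is used mainly to send work to the cluster and get the results back.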


See also

* List of cluster management software
* Grid network


Further reading


Cluster management


Adaptive Control of Extreme-scale Stream Processing Systems
Proceedings of the 26th IEEE International Conference on Distributed Computing Systems.
Design, implementation, and evaluation of the linear road benchmark on the stream processing core
Proceedings of the 2006 ACM SIGMOD international conference on Management of data.
Parallel Job Scheduling: A Status Report (2004)
10th Workshop on Job Scheduling Strategies for Parallel Processing, New York, NY, June 2004.
Condor-G: A Computation Management Agent for Multi-Institutional Grids
Springer Journal Cluster Computing Volume 5, Number 3 / July, 2002
From clusters to the fabric: the job management perspective
Cluster Computing, 2003. Proceedings. 2003 IEEE International Conference on
An Overview of the Galaxy Management Framework for Scalable Enterprise Cluster Computing
IEEE International Conference on Cluster Computing (Cluster'00), 2000.
Performance and Interoperability Issues in Incorporating Cluster Management Systems within a Wide-Area Network-Computing Environment
ACM/IEEE Supercomputing 2000: High Performance Networking and Computing.
DIRAC: a scalable lightweight architecture for high throughput computing
Grid Computing, 2004. Proceedings. Fifth IEEE/ACM International Workshop on
AgentTeamwork: Coordinating grid-computing jobs with mobile agents
Springer Journal Applied Intelligence Volume 25, Number 2 / October, 2006
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
UC Berkeley Tech Report, May, 2010


Autonomic computing


The Laundromat Model for Autonomic Cluster Computing
Autonomic Computing, 2006. ICAC '06. IEEE International Conference on.
Distributed Stream Management using Utility-Driven Self-Adaptive Middleware
Proceedings of the Second International Conference on Autonomic Computing (2005).


Fault tolerance


Fault-tolerance in the Borealis distributed stream processing system
Proceedings of the 2005 ACM SIGMOD international conference on Management of data.
A Global-State-Triggered Fault Injector for Distributed System Evaluation
IEEE Transactions on Parallel and Distributed Systems / July, 2004
Job-Site Level Fault Tolerance for Cluster and Grid environments
IEEE International Conference on Cluster Computing (Cluster 2005)
Fault Injection in Distributed Java Applications
Parallel and Distributed Processing Symposium, 2006. IPDPS 2006. 20th International
Load balancing and fault tolerance in workstation clusters migrating groups of communicating processes
ACM SIGOPS Operating Systems Review, October 1995.


Background


A Short Survey of Commercial Cluster Batch Schedulers