HTCondor is an open-source high-throughput computing software framework for coarse-grained distributed parallelization of computationally intensive tasks. It can be used to manage workload on a dedicated cluster of computers, or to farm out work to idle desktop computers, so-called cycle scavenging. HTCondor runs on Linux, Unix, Mac OS X, FreeBSD, and Microsoft Windows operating systems. HTCondor can integrate both dedicated resources (rack-mounted clusters) and non-dedicated desktop machines (cycle scavenging) into one computing environment.
HTCondor is developed by the HTCondor team at the University of Wisconsin–Madison and is freely available for use. HTCondor follows an open-source philosophy and is licensed under the Apache License 2.0.
While HTCondor makes use of unused computing time, leaving computers turned on for use with HTCondor will increase energy consumption and associated costs. Starting from version 7.1.1, HTCondor can hibernate and wake machines based on user-specified policies, a feature previously available only via third-party software.
History
The development of HTCondor started in 1988.
HTCondor was formerly known as Condor; the name was changed in October 2012 to resolve a trademark lawsuit.
HTCondor was the scheduler software used to distribute jobs for the first draft assembly of the Human Genome.
Example of use
The NASA Advanced Supercomputing facility (NAS) HTCondor pool consists of approximately 350 SGI and Sun workstations purchased and used for software development, visualization, email, document preparation, and other tasks. Each workstation runs a daemon that watches user I/O and CPU load. When a workstation has been idle for two hours, a job from the batch queue is assigned to the workstation and will run until the daemon detects a keystroke, mouse motion, or high non-HTCondor CPU usage. At that point, the job will be removed from the workstation and placed back on the batch queue.
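The idle-then-evict policy described for the NAS pool can be sketched as a small state machine. This is an illustrative simulation only, not HTCondor's daemon code: the `Sample` structure, the polling interval, and the CPU cutoff are assumptions introduced here, and only the two-hour idle period comes from the description above.

```python
from dataclasses import dataclass

IDLE_THRESHOLD_S = 2 * 60 * 60  # two hours of inactivity, as in the NAS description
POLL_INTERVAL_S = 60            # hypothetical: how often the daemon samples the machine
CPU_BUSY_THRESHOLD = 0.3        # hypothetical cutoff for "high" non-HTCondor CPU load


@dataclass
class Sample:
    """One observation of a workstation (hypothetical structure)."""
    idle_seconds: float    # time since the last keystroke or mouse motion
    other_cpu_load: float  # CPU fraction used by non-batch processes


def next_state(running: bool, s: Sample) -> bool:
    """Return True if a batch job should be running after this sample."""
    # Input within the last polling interval, or heavy foreign CPU load,
    # means the owner is using the machine.
    owner_active = (s.idle_seconds < POLL_INTERVAL_S
                    or s.other_cpu_load > CPU_BUSY_THRESHOLD)
    if running:
        # A running job stays until the owner comes back; then it is evicted.
        return not owner_active
    # An idle machine accepts a job only after the full idle period.
    return s.idle_seconds >= IDLE_THRESHOLD_S and not owner_active


def simulate(samples):
    """Replay a list of samples and record whether a job runs after each one."""
    running = False
    history = []
    for s in samples:
        running = next_state(running, s)
        history.append(running)
    return history
```

In this sketch an evicted job simply stops running; in the pool described above it would also be placed back on the batch queue for another workstation to pick up.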
Features
HTCondor can run both sequential and parallel jobs. Sequential jobs can run in several different "universes": the "vanilla" universe runs most "batch ready" programs, while in the "standard" universe the target application is re-linked with the HTCondor I/O library, which provides remote job I/O and job checkpointing. HTCondor also provides a "local" universe, which allows jobs to run on the "submit host".
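A job's universe is chosen in its submit description file. The helper below is a minimal sketch that renders such a file as text; the function name `render_submit`, the executable name, and the output file names are hypothetical, and the exact set of submit commands available should be checked against the HTCondor manual for the installed version.

```python
def render_submit(universe, executable, arguments="", count=1):
    """Build the text of a simple submit description (illustrative only)."""
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments  = {arguments}")
    lines += [
        # $(Process) expands to the index of each queued job instance.
        "output     = job.$(Process).out",
        "error      = job.$(Process).err",
        "log        = job.log",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: three instances of a script in the vanilla universe.
    print(render_submit("vanilla", "analyze.sh", "input.dat", count=3))
```

The resulting text would typically be saved to a file and handed to `condor_submit`.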
In the world of parallel jobs, HTCondor supports the standard Message Passing Interface and Parallel Virtual Machine (Goux, et al. 2000) in addition to its own Master Worker "MW" library for extremely parallel tasks.
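The master-worker pattern itself can be illustrated in a few lines. The sketch below uses Python's standard thread pool and is not the API of HTCondor's MW library; the `work` and `master` functions are hypothetical stand-ins for a task and its coordinator.

```python
from concurrent.futures import ThreadPoolExecutor


def work(task):
    """Hypothetical independent work unit: sum of squares below `task`."""
    return sum(i * i for i in range(task))


def master(tasks, n_workers=4):
    """Hand each task to a free worker and collect results in task order."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(work, tasks))


if __name__ == "__main__":
    print(master([1, 2, 3, 4]))  # prints [0, 1, 5, 14]
```

Because the tasks share no state, the master only distributes work and gathers results, which is what makes the pattern suitable for extremely parallel workloads.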
HTCondor-G allows HTCondor jobs to use resources not under its direct control. It is mostly used to talk to grid and cloud resources, like pre-WS and WS Globus, Nordugrid ARC, UNICORE and Amazon Elastic Compute Cloud. But it can also be used to talk to other batch systems, like Torque/PBS and LSF. Support for Sun Grid Engine is currently under development as part of the EGEE project.
HTCondor supports the DRMAA job API. This allows DRMAA-compliant clients to submit and monitor HTCondor jobs. The SAGA C++ Reference Implementation provides an HTCondor plug-in (adaptor), which makes HTCondor job submission and monitoring available via SAGA's Python and C++ APIs.
Other HTCondor features include "DAGMan", which provides a mechanism to describe job dependencies.
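Job dependencies of this kind form a directed acyclic graph, and a valid execution order can be derived by topological sorting. The sketch below uses Python's standard `graphlib` module (Python 3.9 or later) on a hypothetical four-job workflow; it illustrates the idea of dependency ordering and is not DAGMan's implementation or input format.

```python
from graphlib import TopologicalSorter

# Hypothetical diamond-shaped workflow: each job maps to the jobs it depends on.
dependencies = {
    "prepare": set(),
    "simulate_a": {"prepare"},
    "simulate_b": {"prepare"},
    "merge": {"simulate_a", "simulate_b"},
}


def run_order(deps):
    """Return batches of jobs; every job in a batch has all its parents done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs that could run in parallel
        batches.append(ready)
        sorter.done(*ready)                 # mark them finished to unlock children
    return batches


if __name__ == "__main__":
    print(run_order(dependencies))
    # prints [['prepare'], ['simulate_a', 'simulate_b'], ['merge']]
```

Jobs within one batch have no dependencies on each other, so a scheduler would be free to run them at the same time.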
See also
* List of volunteer computing projects
* Sun Grid Engine
* IBM Spectrum LSF
* High-throughput computing
References
External links
* Official website: research.cs.wisc.edu/htcondor/