Quasi-opportunistic supercomputing

Quasi-opportunistic supercomputing is a computational paradigm for supercomputing on a large number of geographically dispersed computers. Quasi-opportunistic supercomputing aims to provide a higher quality of service than opportunistic resource sharing. The quasi-opportunistic approach coordinates computers which are often under different ownerships to achieve reliable and fault-tolerant high performance with more control than opportunistic computer grids, in which computational resources are used whenever they may become available.''Quasi-opportunistic supercomputing in grids'' by Valentin Kravtsov, David Carmeli, Werner Dubitzky, Ariel Orda, Assaf Schuster, Benny Yoshpa, in IEEE International Symposium on High Performance Distributed Computing, 2007, pages 233-24

While the "opportunistic match-making" approach to task scheduling on computer grids is simpler, in that it merely matches tasks to whatever resources may be available at a given time, demanding supercomputer applications such as weather simulations or computational fluid dynamics have remained out of reach, partly due to the barriers in reliable sub-assignment of a large number of tasks as well as the reliable availability of resources at a given time.''Computational Science - ICCS 2009: 9th International Conference'' edited by Gabrielle Allen, Jarek Nabrzyski, 2009, pages 387-38

The quasi-opportunistic approach enables the execution of demanding applications within computer grids by establishing grid-wise resource allocation agreements and fault tolerant message passing to abstractly shield against the failures of the underlying resources, thus maintaining some opportunism while allowing a higher level of control.


Opportunistic supercomputing on grids

The general principle of grid computing is to use distributed computing resources from diverse administrative domains to solve a single task, by using resources as they become available. Traditionally, most grid systems have approached the task scheduling challenge by using an "opportunistic match-making" approach in which tasks are matched to whatever resources may be available at a given time.''Grid computing: experiment management, tool integration, and scientific workflows'' by Radu Prodan, Thomas Fahringer 2007 pages 1-4
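The match-making idea can be pictured with a minimal sketch (the function and host names below are invented for illustration; real grid matchmakers such as Condor's are far more elaborate): queued tasks are simply handed to whichever resources happen to be available at the moment, with no reservations and no guarantees.

<syntaxhighlight lang="python">
from collections import deque

def opportunistic_schedule(tasks, available_resources):
    """Assign each task to whatever resource is free right now.

    Purely opportunistic: a task waits until some resource
    volunteers itself; nothing is reserved in advance.
    """
    queue = deque(tasks)
    assignments = []
    while queue and available_resources:
        task = queue.popleft()
        resource = available_resources.pop()  # any free resource will do
        assignments.append((task, resource))
    return assignments, list(queue)           # unmatched tasks keep waiting

# Example: three work units, two volunteer hosts currently online.
done, waiting = opportunistic_schedule(
    ["wu-1", "wu-2", "wu-3"], ["host-a", "host-b"])
print(done)     # [('wu-1', 'host-b'), ('wu-2', 'host-a')]
print(waiting)  # ['wu-3'] -- runs whenever another host appears
</syntaxhighlight>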
BOINC, developed at the University of California, Berkeley, is an example of a volunteer-based, opportunistic grid computing system.''Parallel and Distributed Computational Intelligence'' by Francisco Fernández de Vega 2010 pages 65-68 Applications based on the BOINC grid have reached multi-petaflop levels by using close to half a million computers connected to the internet, whenever volunteer resources become available. Another system, Folding@home, which is not based on BOINC, computes protein folding and has reached 8.8 petaflops by using clients that include GPU and PlayStation 3 systems. However, these results are not applicable to the TOP500 ratings because they do not run the general purpose Linpack benchmark. A key strategy for grid computing is the use of middleware that partitions pieces of a program among the different computers on the network.''Languages and Compilers for Parallel Computing'' by Guang R. Gao 2010 pages 10-11
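The partitioning step can be sketched as follows (a simplified illustration with invented names, not the actual BOINC API; real middleware also handles redundancy, validation, deadlines, and result collection): the middleware slices a large input into independent work units and spreads them across the hosts currently on the network.

<syntaxhighlight lang="python">
def partition_work(data, hosts):
    """Split a large input into per-host work units (round-robin).

    Only the partitioning idea is shown; real grid middleware
    also tracks host reliability and reissues lost units.
    """
    units = {host: [] for host in hosts}
    for i, item in enumerate(data):
        units[hosts[i % len(hosts)]].append(item)
    return units

# Example: 10 input blocks spread over 3 networked computers.
units = partition_work(list(range(10)), ["node1", "node2", "node3"])
for host, work in units.items():
    print(host, work)  # node1 [0, 3, 6, 9], node2 [1, 4, 7], node3 [2, 5, 8]
</syntaxhighlight>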
Although general grid computing has had success in parallel task execution, demanding supercomputer applications such as weather simulations or computational fluid dynamics have remained out of reach, partly due to the barriers in reliable sub-assignment of a large number of tasks as well as the reliable availability of resources at a given time. The opportunistic Internet PrimeNet Server supports GIMPS, one of the earliest grid computing projects, which has been researching Mersenne prime numbers since 1997. GIMPS's distributed research achieves about 60 teraflops as a volunteer-based computing project. The use of computing resources on "volunteer grids" such as GIMPS is usually purely opportunistic: geographically dispersed, distributively owned computers contribute whenever they become available, with no preset commitments that any resources will be available at any given time. Hence, hypothetically, if many of the volunteers unwittingly decide to switch their computers off on a certain day, grid resources will become significantly reduced.''Euro-Par 2010, Parallel Processing Workshops'' edited by Mario R. Guarracino 2011 pages 274-277 Furthermore, users will find it exceedingly costly to organize a very large number of opportunistic computing resources in a manner that can achieve reasonable high performance computing.''Grid Computing: Towards a Global Interconnected Infrastructure'' edited by Nikolaos P. Preve 2011 page 71


Quasi-control of computational resources

An example of a more structured grid for high performance computing is DEISA, a supercomputer project organized by the European Community which uses computers in seven European countries. Although different parts of a program executing within DEISA may be running on computers located in different countries under different ownerships and administrations, there is more control and coordination than with a purely opportunistic approach. DEISA has a two level integration scheme: the "inner level" consists of a number of strongly connected high performance computer clusters that share similar operating systems and scheduling mechanisms and provide a ''homogeneous computing'' environment; while the "outer level" consists of ''heterogeneous systems'' that have supercomputing capabilities.''Euro-Par 2006 Workshops: Parallel Processing: CoreGRID 2006'' edited by Wolfgang Lehner 2007 Thus DEISA can provide somewhat controlled, yet dispersed high performance computing services to users.''Grid computing: International Symposium on Grid Computing'' (ISGC 2007) edited by Stella Shen 2008 page 170
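The two level scheme can be pictured with a small data model (an illustrative sketch with invented class and site names, not DEISA's actual software): homogeneous clusters form the inner level, while heterogeneous supercomputing systems attach at the outer level.

<syntaxhighlight lang="python">
from dataclasses import dataclass, field

@dataclass
class Cluster:
    name: str
    os: str         # inner-level clusters share similar operating systems
    scheduler: str  # ...and similar scheduling mechanisms

@dataclass
class TwoLevelGrid:
    inner: list = field(default_factory=list)  # homogeneous clusters
    outer: list = field(default_factory=list)  # heterogeneous systems

grid = TwoLevelGrid()
# Inner level: uniform environment, so jobs can migrate freely between sites.
grid.inner += [Cluster("site-a", "os-1", "scheduler-1"),
               Cluster("site-b", "os-1", "scheduler-1")]
# Outer level: capability systems with different software stacks,
# reached through extra adaptation layers.
grid.outer += [Cluster("site-c", "os-2", "scheduler-2")]
</syntaxhighlight>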
The quasi-opportunistic paradigm aims to achieve more control over the assignment of tasks to distributed resources, together with the use of pre-negotiated scenarios for the availability of systems within the network. Quasi-opportunistic distributed execution of demanding parallel computing software in grids focuses on the implementation of grid-wise allocation agreements, co-allocation subsystems, communication topology-aware allocation mechanisms, fault tolerant message passing libraries and data pre-conditioning. In this approach, fault tolerant message passing is essential to abstractly shield against the failures of the underlying resources.
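The shielding role of such a layer can be sketched in a few lines (a hypothetical API, not any real fault tolerant MPI implementation, which would operate at a much lower level): a failed send is retried against replacement resources, so the application never observes the individual failure.

<syntaxhighlight lang="python">
import random

class NodeFailure(Exception):
    pass

def unreliable_send(node, message):
    """Stand-in for a network send that sometimes fails."""
    if random.random() < 0.3:  # simulated failure of an underlying resource
        raise NodeFailure(node)
    return f"{node} acked {message!r}"

def ft_send(nodes, message, retries=5):
    """Fault tolerant send: retry against replacement nodes, so the
    caller is abstractly shielded from individual resource failures."""
    candidates = list(nodes)
    for _ in range(retries):
        try:
            return unreliable_send(candidates[0], message)
        except NodeFailure:
            candidates.append(candidates.pop(0))  # rotate to a replacement
    raise RuntimeError("all replacement resources failed")

print(ft_send(["n1", "n2", "n3"], "partial result"))
</syntaxhighlight>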
The quasi-opportunistic approach goes beyond volunteer computing on highly distributed systems such as BOINC, or general grid computing on a system such as Globus, by allowing the middleware to provide almost seamless access to many computing clusters, so that existing programs in languages such as Fortran or C can be distributed among multiple computing resources. A key component of the quasi-opportunistic approach, as in the Qoscos Grid, is an economic-based resource allocation model in which resources are provided based on agreements among specific supercomputer administration sites. Unlike volunteer systems that rely on altruism, specific contractual terms are stipulated for the performance of specific types of tasks. However, "tit-for-tat" paradigms, in which computations are paid back via future computations, are not suitable for supercomputing applications and are avoided.''Algorithms and Architectures for Parallel Processing'' by Anu G. Bourgeois 2008 pages 234-242
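A toy sketch of agreement based allocation follows (the field names and sites are invented; the actual Qoscos Grid mechanisms are far richer): a request is matched only against resources whose owning sites hold a standing agreement covering that task type, rather than against whoever happens to be online.

<syntaxhighlight lang="python">
from dataclasses import dataclass

@dataclass
class Agreement:
    site: str
    task_types: tuple  # what the contract covers
    cpu_hours: int     # pre-negotiated capacity

def allocate(task_type, hours, agreements):
    """Grant resources only under a standing agreement -- neither
    altruistic volunteering nor tit-for-tat payback is involved."""
    for a in agreements:
        if task_type in a.task_types and a.cpu_hours >= hours:
            a.cpu_hours -= hours  # draw down the contracted capacity
            return a.site
    return None                   # reject: no agreement covers the request

agreements = [Agreement("site-x", ("cfd", "weather"), 10_000),
              Agreement("site-y", ("genomics",), 5_000)]
print(allocate("weather", 2_000, agreements))  # 'site-x'
print(allocate("rendering", 100, agreements))  # None -- not contracted
</syntaxhighlight>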
The other key component of the quasi-opportunistic approach is a reliable message passing system to provide distributed checkpoint restart mechanisms when computer hardware or networks inevitably experience failures. In this way, if some part of a large computation fails, the entire run need not be abandoned, but can restart from the last saved checkpoint.
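The checkpoint restart idea can be illustrated with a simplified, single-process sketch (the file name and loop body are placeholders; distributed checkpointing must additionally coordinate in-flight messages across nodes): the state of a long computation is saved periodically, and after a failure the run resumes from the last saved state instead of starting over.

<syntaxhighlight lang="python">
import json, os

CKPT = "run.ckpt"  # hypothetical checkpoint file

def save_checkpoint(step, state):
    with open(CKPT, "w") as f:
        json.dump({"step": step, "state": state}, f)

def load_checkpoint():
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            c = json.load(f)
        return c["step"], c["state"]
    return 0, 0  # no checkpoint: fresh start

def run(total_steps=1_000, every=100):
    step, acc = load_checkpoint()       # resume from the last checkpoint
    while step < total_steps:
        acc += step                     # stand-in for real computation
        step += 1
        if step % every == 0:
            save_checkpoint(step, acc)  # a failure now loses < `every` steps
    return acc

print(run())  # after a crash, rerunning continues from run.ckpt
</syntaxhighlight>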


See also

* Grid computing
* History of supercomputing
* Qoscos Grid
* Supercomputer architecture
* Supercomputer operating systems


References

{{Reflist|2}}

Category:Supercomputing
Category:Grid computing