Simple API For Grid Applications

The Simple API for Grid Applications (SAGA) is a family of related standards specified by the Open Grid Forum to define an application programming interface (API) for common distributed computing functionality.


Overview

The SAGA specification for distributed computing originally consisted of a single document, GFD.90, which was released in 2009.

The SAGA API does not strive to replace Globus or similar grid computing middleware systems, and it targets not middleware developers but application developers with no background in grid computing. Such developers typically wish to devote their time to their own goals and minimize the time spent coding infrastructure functionality. The API insulates application developers from middleware.

The specification of services, and of the protocols to interact with them, is out of the scope of SAGA. Rather, the API seeks to hide the details of any service infrastructure that may or may not be used to implement the functionality that the application developer needs. The API aligns, however, with all middleware standards within the Open Grid Forum (OGF).

The SAGA API defines a mechanism to specify additional API ''packages'' which expand its scope. The SAGA Core API itself defines a number of packages: job management, file management, replica management, remote procedure calls, and streams. SAGA covers the most important and frequently used distributed functionality and is supported and available on every major grid system, including the Extreme Science and Engineering Discovery Environment (XSEDE), EGI and FutureGrid. SAGA not only supports a wide range of distributed programming and coordination models but is also easily extensible to support new and emerging middleware.


Standardization

The SAGA API is standardised in the SAGA Working Group of the Open Grid Forum. Based on a set of use cases, the SAGA Core API specification defines a set of general API principles (the 'SAGA Look and Feel') and a set of API packages which render commonly used Grid programming patterns (job management, file management and access, replica management, etc.). The SAGA Core specification also defines how additional API packages are to be defined, and how they relate to the Core API and to its 'Look and Feel'. Based on that, a number of API extensions have been defined, and are in various states of the standardisation process.

All SAGA specifications are defined in (a flavor of) IDL, and are thus object oriented but language neutral. Different language bindings exist (Java, C++, Python), but are, at this point, not standardised. Nevertheless, different implementations of these language bindings have a relatively coherent API definition (in particular, the different Java implementations share the same abstract API classes).

The 'Look and Feel' part of the SAGA Core API specification covers the following areas:
* security and session management
* permission management
* asynchronous operations
* monitoring
* asynchronous notifications
* attribute management
* I/O buffer management
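
Three of the 'Look and Feel' areas (attribute management, notifications and asynchronous operations) can be illustrated with a small Python sketch. This is not code from any SAGA binding: the class `Monitored`, its method names and the helper `copy_file` are hypothetical, and a standard-library `Future` stands in for an asynchronous task.

```python
from concurrent.futures import ThreadPoolExecutor


class Monitored:
    """Hypothetical object with named attributes and change callbacks."""

    def __init__(self):
        self._attributes = {}
        self._callbacks = []

    def set_attribute(self, key, value):
        # Store the value, then notify every registered callback.
        self._attributes[key] = value
        for callback in self._callbacks:
            callback(key, value)

    def get_attribute(self, key):
        return self._attributes[key]

    def list_attributes(self):
        return sorted(self._attributes)

    def add_callback(self, callback):
        self._callbacks.append(callback)


def copy_file(source, target):
    """Stand-in for a slow remote operation; only returns a description."""
    return f"copied {source} to {target}"


# Attribute management with a notification on every change.
events = []
obj = Monitored()
obj.add_callback(lambda key, value: events.append((key, value)))
obj.set_attribute("Executable", "/bin/date")

# Synchronous flavor: the call blocks until the result is available.
sync_result = copy_file("a.txt", "b.txt")

# Asynchronous flavor: the call returns a task (here a Future) immediately,
# and the result is collected later.
with ThreadPoolExecutor(max_workers=1) as pool:
    task = pool.submit(copy_file, "a.txt", "b.txt")
    async_result = task.result()
```

The point of the sketch is that the same operation is available in a blocking and a non-blocking form, while attributes and callbacks behave identically on every object that supports them.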


Architecture

SAGA is designed as an object oriented interface. It encapsulates related functionality in a set of objects that are grouped in functional namespaces, which are called ''packages'' in SAGA. The SAGA core implementation defines the following packages:
* saga::advert - interface for Advert Service access
* saga::filesystem - interface for file and directory access
* saga::job - interface for job definition, management and control
* saga::namespace - abstract interface (used by advert, filesystem and replica interfaces)
* saga::replica - interface for replica management
* saga::rpc - interface for remote procedure call clients and servers
* saga::sd - interface for service discovery in distributed environments
* saga::stream - interface for data stream clients and servers

The overall architecture of SAGA follows the adaptor pattern, a software design pattern which is used for translating one interface into another. In SAGA it translates the calls from the API packages to the interfaces of the underlying middleware. The SAGA run-time system uses late binding to decide at run-time which plug-in (''middleware adaptor'') to load and bind (see "SAGA: How it works", on Vimeo).
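
The combination of adaptors and late binding can be illustrated with a minimal Python sketch. It is not the SAGA run-time: the registry `ADAPTORS`, the classes `JobAdaptor`, `LocalAdaptor`, `BatchAdaptor` and `JobService`, and the URL schemes `local` and `batch` are all hypothetical names chosen for illustration. The API-level `JobService` picks an adaptor at run-time from the URL scheme and forwards every call to it.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse

ADAPTORS = {}  # hypothetical registry: URL scheme -> adaptor class


def register(scheme):
    """Class decorator that records an adaptor under a URL scheme."""
    def wrap(cls):
        ADAPTORS[scheme] = cls
        return cls
    return wrap


class JobAdaptor(ABC):
    """Hypothetical adaptor interface: one implementation per backend."""

    def __init__(self, host):
        self.host = host

    @abstractmethod
    def submit(self, executable, arguments):
        """Translate the API call into a backend-specific request."""


@register("local")
class LocalAdaptor(JobAdaptor):
    def submit(self, executable, arguments):
        command = " ".join([executable, *arguments])
        return f"local@{self.host}: run {command}"


@register("batch")
class BatchAdaptor(JobAdaptor):
    def submit(self, executable, arguments):
        command = " ".join([executable, *arguments])
        return f"batch@{self.host}: queue {command}"


class JobService:
    """API-level object; the adaptor is chosen when the service is created."""

    def __init__(self, url):
        parsed = urlparse(url)
        adaptor_cls = ADAPTORS.get(parsed.scheme)
        if adaptor_cls is None:
            raise ValueError(f"no adaptor registered for scheme {parsed.scheme!r}")
        self._adaptor = adaptor_cls(parsed.hostname or "localhost")

    def submit(self, executable, arguments=()):
        return self._adaptor.submit(executable, list(arguments))


print(JobService("local://localhost").submit("/bin/date", ["-u"]))
print(JobService("batch://cluster.example.org").submit("sim"))
```

Application code only ever sees `JobService`. Supporting another backend means registering one more adaptor class, without changing the calling code.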


Supported middleware

The following table lists the distributed middleware systems that are currently supported by SAGA. The column labeled ''Adaptor Suite'' names the collection (release package) of the (set of) middleware adaptors that provides support for the middleware system.


Implementations

Since the SAGA interface definitions are not bound to any specific programming language, several implementations of the SAGA standards exist in different programming languages. Apart from the implementation language, they differ from each other in their completeness in terms of standard coverage, as well as in their support for distributed middleware.


SAGA C++


SAGA C++ was the first complete implementation of the SAGA Core specification, written in C++. The C++ implementation is currently not under active development.


RADICAL-SAGA (Python)


RADICAL-SAGA is a light-weight Python package that implements parts of the OGF GFD.90 interface specification and provides plug-ins for different distributed middleware systems and services. RADICAL-SAGA implements the most commonly used features of GFD.90 based upon extensive use-case analysis, and focuses on usability and simple deployment in real-world heterogeneous distributed computing environments and application scenarios. RADICAL-SAGA currently implements the job and the file management core APIs as well as the resource management API extension. Its plug-ins include support for the Portable Batch System (PBS), Sun Grid Engine, SSH, SFTP and others. RADICAL-SAGA can be used to develop distributed applications and frameworks that run on distributed cyber-infrastructure including XSEDE, LONI and FutureGrid, other clouds and local clusters.


JavaSAGA

JavaSAGA is a Java implementation of SAGA. The status of JavaSAGA remains uncertain.


jSAGA


jSAGA is another Java implementation of the SAGA Core specification. jSAGA is currently under active development.


DESHL

DESHL (DEISA Services for Heterogeneous management Layer) provides functionality for submission and management of computational jobs within DEISA. DESHL is implemented as a set of command-line tools on top of a SAGA-inspired API implemented in Java. On the back-end, it interfaces with HiLA, a generic grid access client library, which is part of the UNICORE system.


Examples


Job submission

A typical task in a distributed application is to submit a ''job'' to a local or remote distributed resource manager. SAGA provides a high-level API called the ''job package'' for this. The following two simple examples show how the SAGA job package API can be used to submit a Message Passing Interface (MPI) job to a remote Globus GRAM resource manager.


C++

    #include <saga/saga.hpp>

    int main (int argc, char** argv)
    {
        // ...
    }


Python

    #!/usr/bin/env python3
    import sys
    import time

    import bliss.saga as saga


    def main(jobno: int, session, jobservice) -> None:
        bfast_base_dir = saga.Url("sftp://india.futuregrid.org/N/u/oweidner/software/bfast/")

        try:
            # Create a unique working directory on the remote file system.
            workdir = "%s/tmp/run/%s" % (bfast_base_dir.path, str(int(time.time())))
            basedir = saga.filesystem.Directory(bfast_base_dir, session=session)
            basedir.make_dir(workdir)

            # Describe the job to run.
            jd = saga.job.Description()
            jd.wall_time_limit = 5  # wall-time in minutes
            jd.total_cpu_count = 1
            # BFAST_DIR is referenced by the executable and arguments below.
            jd.environment = {'BFAST_DIR': bfast_base_dir.path}
            jd.working_directory = workdir
            jd.executable = '$BFAST_DIR/bin/bfast'
            jd.arguments = ['match', '-A 1',
                            '-r $BFAST_DIR/data/small/reads_5K/reads.10.fastq',
                            '-f $BFAST_DIR/data/small/reference/hg_2122.fa']

            # Submit the job and block until it has finished.
            myjob = jobservice.create_job(jd)
            myjob.run()
            print("Job #%s started with ID '%s' and working directory: '%s'"
                  % (jobno, myjob.jobid, workdir))

            myjob.wait()
            print("Job #%s with ID '%s' finished (RC: %s). Output available in: '%s'"
                  % (jobno, myjob.jobid, myjob.exitcode, workdir))

            basedir.close()

        except saga.Exception as ex:
            print(f"An error occurred during job execution: {ex}")
            sys.exit(-1)


    if __name__ == "__main__":
        execution_host = saga.Url("pbs+ssh://india.futuregrid.org")

        # SSH credentials used to reach the execution host.
        ctx = saga.Context()
        ctx.type = saga.Context.SSH
        ctx.userid = 'oweidner'  # like 'ssh username@host ...'
        ctx.userkey = '/Users/oweidner/.ssh/rsa_work'  # like 'ssh -i ...'

        session = saga.Session()
        session.contexts.append(ctx)

        js = saga.job.Service(execution_host, session)

        for i in range(0, 4):
            main(i, session, js)

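The describe, run and wait lifecycle used in the example above can be tried without any grid middleware. The following sketch is not Bliss or RADICAL-SAGA code: `Description`, `LocalJob`, the state names and all field names are hypothetical stand-ins, and the job is simply run as a local child process with Python's `subprocess` module.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class Description:
    """Hypothetical job description: what to run and with which arguments."""
    executable: str
    arguments: list = field(default_factory=list)


class LocalJob:
    """Hypothetical job handle with a New -> Running -> Done/Failed lifecycle."""

    def __init__(self, description):
        self.description = description
        self.state = "New"
        self.exitcode = None
        self.output = None
        self._process = None

    def run(self):
        # Start the child process without waiting for it to finish.
        command = [self.description.executable, *self.description.arguments]
        self._process = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
        self.state = "Running"

    def wait(self):
        # Block until the process exits, then record output and exit code.
        self.output, _ = self._process.communicate()
        self.exitcode = self._process.returncode
        self.state = "Done" if self.exitcode == 0 else "Failed"


# Use the current Python interpreter as a portable "executable".
jd = Description(executable=sys.executable,
                 arguments=["-c", "print('hello from job')"])
job = LocalJob(jd)
job.run()
job.wait()
print(job.state, job.exitcode, job.output.strip())
```

Separating `run()` from `wait()` mirrors the example above: submission returns immediately, so an application can start several jobs before it waits for any of them.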

Grants

The work related to the SAGA Project is funded by the following grants: NSF-CHE 1125332 (CDI) and NSF-EPS 1003897 (LaSIGMA). Previous grants include NSF-OCI 0710874 (HPCOPS), NIH grant number P20RR016456, and UK EPSRC grant number GR/D0766171/1 via OMII-UK.


External links


SAGA-Bliss - A Python implementation of SAGA

jSAGA - A Java implementation of SAGA

SAGA C++ - A C++ implementation of SAGA

SAGA-GLib - A Vala implementation of SAGA for GLib

SAGA PROJECT
POSIX

