SHMEM
SHMEM (from Cray Research's "shared memory" library) is a family of parallel programming libraries providing one-sided, RDMA, parallel-processing interfaces for low-latency distributed-memory supercomputers. The SHMEM acronym was later reinterpreted as a backronym for "Symmetric Hierarchical MEMory". Its use subsequently expanded to distributed-memory parallel computer clusters, where it serves either as a parallel programming interface in its own right or as a low-level layer on which partitioned global address space (PGAS) systems and languages are built. "Libsma", the first SHMEM library, was created by Richard Smith at Cray Research in 1993 as a set of thin interfaces for accessing the CRAY T3D's inter-processor-communication hardware. SHMEM has been implemented by Cray Research, SGI, Cray Inc., Quadrics, HP, IBM, QLogic, and Mellanox, as well as by the Universities of Houston and Florida (e.g. GSHMEM); there is also the open-source OpenSHMEM specification. SHMEM laid the foundations for low-latency (sub-microsecond) one-sided communication. After its use ...
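As a concrete illustration, the following is a minimal OpenSHMEM-style sketch (assuming an OpenSHMEM 1.x implementation and its shmem.h header) of the one-sided model: PE 0 writes directly into PE 1's symmetric memory without PE 1 issuing any receive call. It is illustrative only; build and launch commands vary by implementation.

/* Minimal OpenSHMEM-style sketch of one-sided communication. */
#include <stdio.h>
#include <shmem.h>

int main(void) {
    static long target = 0;            /* symmetric: exists at the same address on every PE */

    shmem_init();
    int me   = shmem_my_pe();
    int npes = shmem_n_pes();

    if (me == 0 && npes > 1) {
        long value = 42;
        shmem_long_put(&target, &value, 1, 1);   /* one-sided put into PE 1's memory */
    }

    shmem_barrier_all();               /* ensure the put has completed and is visible */

    if (me == 1)
        printf("PE 1 received %ld\n", target);

    shmem_finalize();
    return 0;
}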


Partitioned Global Address Space
In computer science, partitioned global address space (PGAS) is a parallel programming model paradigm. PGAS is typified by communication operations involving a global memory address space abstraction that is logically partitioned, where a portion is local to each process, thread, or processing element. The novelty of PGAS is that the portions of the shared memory space may have an affinity for a particular process, thereby exploiting locality of reference to improve performance. A PGAS memory model is featured in various parallel programming languages and libraries, including Coarray Fortran, Unified Parallel C, Split-C, Fortress, Chapel, X10, UPC++, and Coarray C++.
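To make the partitioning and affinity idea concrete, here is a small hypothetical C sketch (not any particular PGAS runtime; names such as owner_of and pgas_read are invented for illustration) of how a logically global array index maps to an owning processing element and a local offset, so that accesses to locally owned elements avoid communication.

/* Hypothetical sketch of a block-distributed "global" array. */
#include <stdio.h>
#include <stddef.h>

#define NPES        4                      /* number of processing elements (assumed) */
#define GLOBAL_LEN  1024                   /* length of the logically shared array    */
#define LOCAL_LEN   (GLOBAL_LEN / NPES)

/* Block distribution: element g lives on PE g / LOCAL_LEN at offset g % LOCAL_LEN. */
static int    owner_of(size_t g)     { return (int)(g / LOCAL_LEN); }
static size_t local_offset(size_t g) { return g % LOCAL_LEN; }

/* Each PE holds only its own partition of the global array. */
static double local_part[LOCAL_LEN];

/* Read element g from the perspective of PE 'me': locally owned elements are
 * touched directly; remote ones would need a one-sided get (omitted here). */
static double pgas_read(int me, size_t g) {
    if (owner_of(g) == me)
        return local_part[local_offset(g)];   /* affinity: no communication needed */
    return 0.0;                               /* placeholder for a remote access   */
}

int main(void) {
    int me = 0;                               /* pretend we are PE 0 */
    local_part[3] = 2.5;                      /* global index 3 is owned by PE 0 */
    printf("PE %d reads global[3] = %f (owner %d)\n", me, pgas_read(me, 3), owner_of(3));
    return 0;
}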




Distributed Memory
In computer science, distributed memory refers to a multiprocessor computer system in which each processor has its own private memory. Computational tasks can only operate on local data; if remote data are required, the computational task must communicate with one or more remote processors. In contrast, a shared-memory multiprocessor offers a single memory space used by all processors. Processors do not have to be aware of where data resides, except that there may be performance penalties and that race conditions are to be avoided. In a distributed-memory system there is typically a processor, a memory, and some form of interconnection that allows programs on each processor to interact with each other. The interconnect can be organised with point-to-point links, or separate hardware can provide a switching network. The network topology is a key factor in determining how the multiprocessor machine scales. The links between nodes can be implemented using some standard network pro ...
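A minimal message-passing sketch, assuming MPI as the communication interface (a common choice on distributed-memory machines): data held in rank 0's private memory is not directly addressable by rank 1, so it must be sent explicitly over the interconnect.

/* Explicit communication of remote data between two private memories. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int payload = 0;
    if (rank == 0 && size > 1) {
        payload = 123;                           /* lives only in rank 0's private memory */
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&payload, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", payload);
    }

    MPI_Finalize();
    return 0;
}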




SPMD
In computing, single program, multiple data (SPMD) is a technique employed to achieve parallelism; it is a subcategory of MIMD. Tasks are split up and run simultaneously on multiple processors with different input in order to obtain results faster. SPMD is the most common style of parallel programming. It is also a prerequisite for research concepts such as active messages and distributed shared memory. SPMD vs. SIMD: in SPMD, multiple autonomous processors simultaneously execute the same program at independent points, rather than in the lockstep that SIMD or SIMT imposes on different data. With SPMD, tasks can be executed on general-purpose CPUs; SIMD requires vector processors to manipulate data streams. Note that the two are not mutually exclusive. Distributed memory: SPMD usually refers to message-passing programming on distributed-memory computer architectures. A distributed-memory computer consists of a collection of independent computers, called nodes. Each node starts ...
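The following small sketch, again assuming MPI as the runtime, shows the SPMD pattern: every process runs the same program, each uses its rank to select a different slice of the work, and the partial results are combined at the end. The problem size and distribution are illustrative.

/* SPMD: one program, rank-dependent work, combined result. */
#include <stdio.h>
#include <mpi.h>

#define N 1000000

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Same code everywhere; the rank decides which indices this process handles. */
    long long local = 0, total = 0;
    for (long long i = rank; i < N; i += size)
        local += i;                              /* partial sum of 0..N-1 */

    MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum(0..%d) = %lld\n", N - 1, total);

    MPI_Finalize();
    return 0;
}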


Compare-and-swap
In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of that memory location to a new given value. This is done as a single atomic operation. The atomicity guarantees that the new value is calculated based on up-to-date information; if the value had been updated by another thread in the meantime, the write would fail. The result of the operation must indicate whether it performed the substitution; this can be done either with a simple boolean response (this variant is often called compare-and-set) or by returning the value read from the memory location (not the value written to it). Overview: a compare-and-swap operation is an atomic version of the following pseudocode, where *p denotes access through a pointer: function cas(p: pointer to int, old: int, new: int) is if *p ≠ old ...
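A concrete example using C11's <stdatomic.h>, whose atomic_compare_exchange_strong is a standard-library form of CAS: the retry loop below implements a lock-free increment, and if another thread updates the counter in between, the CAS fails and the loop retries with the freshly read value.

/* Lock-free increment built on compare-and-swap (C11 atomics). */
#include <stdio.h>
#include <stdatomic.h>

static atomic_int counter = 0;

void lock_free_increment(void) {
    int old = atomic_load(&counter);
    /* On failure, 'old' is refreshed with the value actually found in 'counter'. */
    while (!atomic_compare_exchange_strong(&counter, &old, old + 1)) {
        /* another thread won the race; retry with the updated 'old' */
    }
}

int main(void) {
    lock_free_increment();
    lock_free_increment();
    printf("counter = %d\n", atomic_load(&counter));
    return 0;
}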


Portals Network Programming API
Portals is a low-level network API for high-performance networking on high-performance computing systems, developed by Sandia National Laboratories and the University of New Mexico. Portals is currently the lowest-level network programming interface on the commercially successful XT line of supercomputers from Cray. Overview: Portals is based on the concept of elementary building blocks that can be combined to support a wide variety of upper-level network transport semantics. Portals provides one-sided data movement operations, but unlike other one-sided programming interfaces, the target of a remote operation is not a virtual address. Instead, the ultimate destination in memory of an incoming message is determined at the receiver by comparing the contents of the message header with the contents of structures at the destination. This flexibility allows for efficient implementations of both one-sided and two-sided communications. In particular, Portals is aimed at providing the fundam ...
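The sketch below is purely hypothetical (it does not use the real Portals API; all struct and function names are invented) and only illustrates the matching idea described above: the receiver posts descriptors, and the header of an incoming message is compared against them to decide where in memory the payload lands.

/* Hypothetical receiver-side matching sketch, not the Portals API. */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <stddef.h>

struct match_entry {
    uint64_t match_bits;    /* what the sender must place in its header     */
    uint64_t ignore_bits;   /* header bits the receiver does not care about */
    void    *buffer;        /* where a matching payload is deposited        */
    size_t   length;        /* capacity of that buffer                      */
};

struct msg_header {
    uint64_t match_bits;
    size_t   payload_len;
};

/* Walk the posted entries; the first whose (masked) bits match receives the data. */
static void *deliver(struct match_entry *list, size_t n,
                     const struct msg_header *hdr, const void *payload) {
    for (size_t i = 0; i < n; i++) {
        uint64_t mask = ~list[i].ignore_bits;
        if ((list[i].match_bits & mask) == (hdr->match_bits & mask) &&
            hdr->payload_len <= list[i].length) {
            memcpy(list[i].buffer, payload, hdr->payload_len);
            return list[i].buffer;               /* destination chosen at the receiver */
        }
    }
    return NULL;                                 /* no match: message queued or dropped */
}

int main(void) {
    char inbox[64] = {0};
    struct match_entry posted = { 0xABCDu, 0, inbox, sizeof inbox };
    struct msg_header  hdr    = { 0xABCDu, 6 };  /* 6 bytes: "hello" plus its terminator */
    if (deliver(&posted, 1, &hdr, "hello"))
        printf("matched entry received: %s\n", inbox);
    return 0;
}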


Open MPI
Open MPI is a Message Passing Interface (MPI) library project combining technologies and resources from several other projects (FT-MPI, LA-MPI, LAM/MPI, and PACX-MPI). It is used by many TOP500 supercomputers, including Roadrunner, which was the world's fastest supercomputer from June 2008 to November 2009, and the K computer, the fastest supercomputer from June 2011 to June 2012. Overview: Open MPI represents the merger of three well-known MPI implementations:
* FT-MPI from the University of Tennessee
* LA-MPI from Los Alamos National Laboratory
* LAM/MPI from Indiana University
with contributions from the PACX-MPI team at the University of Stuttgart. These four institutions comprise the founding members of the Open MPI development team. The Open MPI developers selected these MPI implementations as excelling in one or more areas. Open MPI aims to use the best ideas and technologies from the individual projects and create one world-class open-source MPI implementation that ...




Vendor Lock-in
In economics, vendor lock-in, also known as proprietary lock-in or customer lock-in, makes a customer dependent on a vendor for products, unable to use another vendor without substantial switching costs. The use of open standards and alternative options makes systems tolerant of change, so that decisions can be postponed until more information is available or unforeseen events are addressed. Vendor lock-in does the opposite: it makes it difficult to move from one solution to another. Lock-in costs that create barriers to market entry may result in antitrust action against a monopoly. Lock-in types: Monopolistic: whether a single vendor controls the market for the method or technology being locked into. This distinguishes between being locked to the mere technology and being locked to its specific vendor. This class of lock-in is potentially technologically hard to overcome if the monopoly is held up by barriers to market that are nontrivial to circumvent, such as patents, secrecy, ...


Active Messages
An active message (in computing) is a messaging object capable of performing processing on its own. It is a lightweight messaging protocol used to optimize network communications, with an emphasis on reducing latency by removing the software overheads associated with buffering and providing applications with direct user-level access to the network hardware. This contrasts with traditional computer-based messaging systems in which messages are passive entities with no processing power. Distributed-memory programming: active messages are a communications primitive for exploiting the full performance and flexibility of modern computer interconnects. They are often classified as one of the three main types of distributed-memory programming, the other two being data parallel and message passing. The view is that active messages are actually a lower-level mechanism that can be used to implement data parallel or message passing efficiently. The basic idea is that each message has a header con ...
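A hypothetical sketch of that idea (the handler table and function names are invented, not a real active-message API): each message header names a handler, and the receiving side invokes that handler on the payload as soon as the message arrives, rather than buffering it for a later receive call.

/* Hypothetical active-message dispatch: the header selects the handler to run. */
#include <stdio.h>
#include <stddef.h>

typedef void (*am_handler)(const void *payload, size_t len);

static void print_handler(const void *payload, size_t len) {
    printf("got %zu-byte message: %s\n", len, (const char *)payload);
}

static void noop_handler(const void *payload, size_t len) {
    (void)payload; (void)len;
}

/* Handler table agreed on by sender and receiver; the header carries an index into it. */
static am_handler handlers[] = { print_handler, noop_handler };

struct am_header {
    unsigned handler_id;      /* which handler to run on arrival */
    size_t   payload_len;
};

/* Called by the (hypothetical) network layer when a message arrives. */
void on_message_arrival(const struct am_header *hdr, const void *payload) {
    if (hdr->handler_id < sizeof handlers / sizeof handlers[0])
        handlers[hdr->handler_id](payload, hdr->payload_len);
}

int main(void) {
    struct am_header hdr = { 0, 6 };
    on_message_arrival(&hdr, "hello");   /* simulate an incoming active message */
    return 0;
}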


Unified Parallel C
Unified Parallel C (UPC) is an extension of the C programming language designed for high-performance computing on large-scale parallel machines, including those with a common global address space (SMP and NUMA) and those with distributed memory (e.g. clusters). The programmer is presented with a single partitioned global address space, where shared variables may be directly read and written by any processor, but each variable is physically associated with a single processor. UPC uses a single program, multiple data (SPMD) model of computation in which the amount of parallelism is fixed at program startup time, typically with a single thread of execution per processor. In order to express parallelism, UPC extends ISO C99 with the following constructs:
* An explicitly parallel execution model
* A shared address space (the shared storage qualifier) with thread-local parts (normal variables)
* Synchronization primitives and a memory consistency model
* Explicit communication ...
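A small UPC sketch illustrating the constructs above (UPC is ISO C plus keywords such as shared, upc_forall, MYTHREAD, and upc_barrier; the array size and work here are illustrative): each element of the shared array has affinity to one thread, and upc_forall runs each iteration on the thread that owns the element named in its affinity expression.

/* Shared array with per-element affinity, filled in parallel. */
#include <upc.h>
#include <stdio.h>

#define N 100

shared int a[N];        /* one logically shared array, partitioned across threads */

int main(void) {
    int i;

    /* The affinity expression &a[i] makes the owning thread execute iteration i. */
    upc_forall (i = 0; i < N; i++; &a[i])
        a[i] = i * i;

    upc_barrier;        /* all threads wait until every element is written */

    if (MYTHREAD == 0)
        printf("a[%d] = %d (computed by %d threads)\n", N - 1, a[N - 1], THREADS);

    return 0;
}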


Parallel Computing
Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. S.V. Adve et al. (November 2008), "Parallel Computing Research at Illinois: The UPCRC Agenda" (PDF), Parallel@Illinois, University of Illinois at Urbana-Champaign: "The main techniques for these performance benefits—increased clock frequency and smarter but increasingly complex architectures—are now hitting the so-called power wall. The computer industry has accepted that future performance increases must largely come from increasing the number of processors (or cores) on a die, rather than m ...