Approaches to supercomputer architecture have taken dramatic turns since the earliest systems were introduced in the 1960s. Early supercomputer architectures pioneered by Seymour Cray relied on compact, innovative designs and local parallelism to achieve superior computational peak performance. However, in time the demand for increased computational power ushered in the age of massively parallel systems.

While the supercomputers of the 1970s used only a few processors, in the 1990s machines with thousands of processors began to appear, and by the end of the 20th century massively parallel supercomputers with tens of thousands of "off-the-shelf" processors were the norm. Supercomputers of the 21st century can use over 100,000 processors (some being graphics units) connected by fast interconnects.

Throughout the decades, the management of heat density has remained a key issue for most centralized supercomputers. The large amount of heat generated by a system may also have other effects, such as reducing the lifetime of other system components. There have been diverse approaches to heat management, from pumping Fluorinert through the system, to hybrid liquid-air cooling systems, to air cooling with normal air-conditioning temperatures.

Systems with a massive number of processors generally take one of two paths. In one approach, e.g. grid computing, the processing power of a large number of computers in distributed, diverse administrative domains is opportunistically used whenever a computer is available. In another approach, a large number of processors are used in close proximity to each other, e.g. in a computer cluster. In such a centralized massively parallel system the speed and flexibility of the interconnect becomes very important, and modern supercomputers have used various approaches ranging from enhanced InfiniBand systems to three-dimensional torus interconnects.


Context and overview

Since the late 1960s the growth in the power and proliferation of supercomputers has been dramatic, and the underlying architectural directions of these systems have taken significant turns. While the early supercomputers relied on a small number of closely connected processors that accessed shared memory, the supercomputers of the 21st century use over 100,000 processors connected by fast networks.

Throughout the decades, the management of heat density has remained a key issue for most centralized supercomputers. Seymour Cray's "get the heat out" motto was central to his design philosophy and has continued to be a key issue in supercomputer architectures, e.g., in large-scale experiments such as Blue Waters. The large amount of heat generated by a system may also have other effects, such as reducing the lifetime of other system components. There have been diverse approaches to heat management, e.g., the Cray 2 pumped Fluorinert through the system, while System X used a hybrid liquid-air cooling system and the Blue Gene/P is air-cooled at normal air-conditioning temperatures. The heat from the Aquasar supercomputer is used to warm a university campus.

The heat density generated by a supercomputer has a direct dependence on the processor type used in the system, with more powerful processors typically generating more heat, given similar underlying semiconductor technologies. While early supercomputers used a few fast, closely packed processors that took advantage of local parallelism (e.g., pipelining and vector processing), in time the number of processors grew and computing nodes could be placed further away, e.g. in a computer cluster, or could be geographically dispersed in grid computing. As the number of processors in a supercomputer grows, component failure rate begins to become a serious issue: if a supercomputer uses thousands of nodes, each of which may fail once per year on average, then the system will experience several node failures each day.
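As a rough, illustrative calculation (the figures are hypothetical, not those of any particular machine): a system of 5,000 nodes in which each node fails on average once per year experiences about 5,000 / 365 ≈ 14 node failures per day, so failure handling must be a routine part of the system's operation rather than an exception.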
As the price/performance of general-purpose graphics processing units (GPGPUs) has improved, a number of petaflop supercomputers such as Tianhe-I and Nebulae have started to rely on them. However, other systems such as the K computer continue to use conventional processors such as SPARC-based designs, and the overall applicability of GPGPUs in general-purpose high-performance computing applications has been the subject of debate: while a GPGPU may be tuned to score well on specific benchmarks, its applicability to everyday algorithms may be limited unless significant effort is spent tuning the application towards it. Nevertheless, GPUs are gaining ground, and in 2012 the Jaguar supercomputer was transformed into Titan by replacing CPUs with GPUs.

As the number of independent processors in a supercomputer increases, the way they access data in the file system and how they share and access secondary storage resources becomes prominent. Over the years a number of systems for distributed file management were developed, e.g., the IBM General Parallel File System, BeeGFS, the Parallel Virtual File System, Hadoop, etc. A number of supercomputers on the TOP100 list, such as the Tianhe-I, use Linux's Lustre file system.


Early systems with a few processors

The CDC 6600 series of computers were very early attempts at supercomputing and gained their advantage over the existing systems by relegating work to peripheral devices, freeing the CPU (central processing unit) to process actual data. With the Minnesota FORTRAN compiler the 6600 could sustain 500 kiloflops on standard mathematical operations.

Other early supercomputers such as the Cray 1 and Cray 2 that appeared afterwards used a small number of fast processors that worked in harmony and were uniformly connected to the largest amount of shared memory that could be managed at the time. These early architectures introduced parallel processing at the processor level, with innovations such as vector processing, in which the processor can perform several operations during one clock cycle rather than having to wait for successive cycles.
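The pattern that vector processing exploits can be illustrated with a minimal, generic C sketch (an illustration of the principle only, not Cray-specific code): the same arithmetic is applied uniformly across an array, which a vector unit, or an auto-vectorizing compiler targeting SIMD hardware, can execute several elements at a time.

    /* Generic illustration of a vectorizable loop: one operation applied
       uniformly to whole arrays. A vector processor (or SIMD unit) can
       execute many of these iterations per clock cycle instead of one. */
    #include <stddef.h>

    void saxpy(size_t n, float a, const float *x, float *y)
    {
        for (size_t i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }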
In time, as the number of processors increased, different architectural issues emerged. Two issues that need to be addressed as the number of processors increases are the distribution of memory and of processing. In the distributed memory approach, each processor is physically packaged close to some local memory; the memory associated with other processors is then "further away", based on bandwidth and latency parameters, as in non-uniform memory access.

In the 1960s pipelining was viewed as an innovation, and by the 1970s the use of vector processors had been well established. By the 1980s, many supercomputers used parallel vector processors. The relatively small number of processors in early systems allowed them to easily use a shared-memory architecture, in which processors access a common pool of memory. In the early days a common approach was uniform memory access (UMA), in which access time to a memory location was similar between processors. The use of non-uniform memory access (NUMA) allowed a processor to access its own local memory faster than other memory locations, while cache-only memory architectures (COMA) allowed the local memory of each processor to be used as cache, thus requiring coordination as memory values changed.
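One widely used way of working with NUMA from application code is "first-touch" placement: on many systems a memory page is placed on the node of the thread that first writes to it, so having each thread initialize the data it will later use keeps its accesses local. The following is a hedged C/OpenMP sketch of the idea; the exact placement policy is operating-system dependent.

    /* Sketch of NUMA-aware "first-touch" initialization (OpenMP).
       Each thread writes the part of the array it will later compute on,
       so those pages tend to be placed in that thread's local memory.
       Placement behaviour depends on the OS and its NUMA policy. */
    #include <stdlib.h>

    double *alloc_first_touch(long n)
    {
        double *a = malloc((size_t)n * sizeof *a);
        if (!a)
            return NULL;

        /* Use the same static scheduling as the later compute loops. */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < n; i++)
            a[i] = 0.0;

        return a;
    }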
As the number of processors increases, efficient interprocessor communication and synchronization on a supercomputer becomes a challenge. A number of approaches may be used to achieve this goal. For instance, in the early 1980s the Cray X-MP system used shared registers: all processors had access to registers that did not move data back and forth but were used only for interprocessor communication and synchronization. However, inherent challenges in managing a large amount of shared memory among many processors resulted in a move to more distributed architectures.
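As a loose modern analogy to such synchronization-only shared state (this sketch uses C11 atomics, not the actual Cray X-MP shared-register hardware), a small shared counter can coordinate processors without ever carrying bulk data:

    /* Analogy only: a shared atomic counter used purely for coordination,
       in the spirit of synchronization-only shared registers. This is a
       single-use barrier; each participant announces arrival and waits
       until all NPROCS participants have arrived. */
    #include <stdatomic.h>

    #define NPROCS 4

    static atomic_int arrived;

    void barrier_wait(void)
    {
        atomic_fetch_add(&arrived, 1);
        while (atomic_load(&arrived) < NPROCS)
            ;   /* busy-wait, kept short for illustration */
    }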


Massive centralized parallelism

During the 1980s, as the demand for computing power increased, the trend to a much larger number of processors began, ushering in the age of massively parallel systems, with distributed memory and distributed file systems, given that shared-memory architectures could not scale to a large number of processors. Hybrid approaches such as distributed shared memory also appeared after the early systems.

The computer clustering approach connects a number of readily available computing nodes (e.g. personal computers used as servers) via a fast, private local area network. The activities of the computing nodes are orchestrated by "clustering middleware", a software layer that sits atop the nodes and allows the users to treat the cluster as, by and large, one cohesive computing unit, e.g. via a single system image concept. Computer clustering relies on a centralized management approach which makes the nodes available as orchestrated shared servers. It is distinct from other approaches such as peer-to-peer or grid computing, which also use many nodes but with a far more distributed nature.
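One common way application programs address such a cluster as a whole is through a message-passing library such as MPI (shown here as a general illustration of cluster programming, not of any specific clustering middleware). The same program runs on every node, and each process learns its rank and the size of the job:

    /* Minimal MPI sketch: each process discovers its rank and the total
       number of processes in the job, which is the starting point for
       dividing work across cluster nodes. Assumes an MPI implementation
       and a launcher such as mpirun. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        printf("process %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }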
By the 21st century, the TOP500 organization's semiannual list of the 500 fastest supercomputers often includes many clusters, e.g. the world's fastest in 2011, the K computer, with a distributed-memory cluster architecture.

When a large number of local, semi-independent computing nodes are used (e.g. in a cluster architecture), the speed and flexibility of the interconnect becomes very important. Modern supercomputers have taken different approaches to address this issue, e.g. Tianhe-1 uses a proprietary high-speed network based on InfiniBand QDR, enhanced with FeiTeng-1000 CPUs. On the other hand, the Blue Gene/L system uses a three-dimensional torus interconnect with auxiliary networks for global communications; in this approach each node is connected to its six nearest neighbors. A similar torus was used by the Cray T3E.
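The neighbor relation in such a torus is simple modular arithmetic, as the following C sketch shows (the 8 × 8 × 8 dimensions are illustrative, not those of any particular machine): a node at coordinates (x, y, z) has one neighbor in each direction along each axis, with wrap-around at the edges.

    /* Compute the six nearest neighbors of node (x, y, z) in a 3-D torus.
       Coordinates wrap around, so edge nodes still have six neighbors.
       Grid dimensions are illustrative only. */
    #define DX 8
    #define DY 8
    #define DZ 8

    void torus_neighbors(int x, int y, int z, int out[6][3])
    {
        int dim[3] = { DX, DY, DZ };
        int pos[3] = { x, y, z };
        int k = 0;

        for (int d = 0; d < 3; d++) {
            for (int step = -1; step <= 1; step += 2) {
                out[k][0] = pos[0];
                out[k][1] = pos[1];
                out[k][2] = pos[2];
                out[k][d] = (pos[d] + step + dim[d]) % dim[d];
                k++;
            }
        }
    }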
Massive centralized systems at times use special-purpose processors designed for a specific application, and may use field-programmable gate array (FPGA) chips to gain performance by sacrificing generality. Examples of special-purpose supercomputers include Belle, Deep Blue, and Hydra for playing chess (Condon, J.H. and K. Thompson, "Belle Chess Hardware", in ''Advances in Computer Chess 3'', ed. M.R.B. Clarke, Pergamon Press, 1982), Gravity Pipe for astrophysics, MDGRAPE-3 for protein-structure computation via molecular dynamics, and Deep Crack for breaking the DES cipher.


Massive distributed parallelism

Grid computing uses a large number of computers in distributed, diverse administrative domains. It is an opportunistic approach which uses resources whenever they are available. An example is BOINC, a volunteer-based, opportunistic grid system. Some BOINC applications have reached multi-petaflop levels by using close to half a million computers connected to the internet, whenever volunteer resources become available. However, these types of results often do not appear in the TOP500 ratings because they do not run the general-purpose Linpack benchmark.

Although grid computing has had success in parallel task execution, demanding supercomputer applications such as weather simulations or computational fluid dynamics have remained out of reach, partly due to the barriers in reliable sub-assignment of a large number of tasks as well as the reliable availability of resources at a given time.

In quasi-opportunistic supercomputing a large number of geographically dispersed computers are orchestrated with built-in safeguards. The quasi-opportunistic approach goes beyond volunteer computing on highly distributed systems such as BOINC, or general grid computing on a system such as Globus, by allowing the middleware to provide almost seamless access to many computing clusters, so that existing programs in languages such as Fortran or C can be distributed among multiple computing resources. Quasi-opportunistic supercomputing aims to provide a higher quality of service than opportunistic resource sharing. It enables the execution of demanding applications within computer grids by establishing grid-wise resource-allocation agreements, and by using fault-tolerant message passing to abstractly shield against the failures of the underlying resources, thus maintaining some opportunism while allowing a higher level of control.
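The intent of such fault-tolerant message passing can be sketched in a few lines of C (try_send() and the replica list are hypothetical placeholders, not a real library API): if delivery to one resource fails, the runtime falls back to another, so the application does not observe the failure directly.

    /* Illustrative only: hide resource failure behind a retry/fallback
       loop. try_send() is a hypothetical placeholder for whatever
       transport the middleware actually uses. */
    #include <stdbool.h>
    #include <stddef.h>

    bool try_send(int resource_id, const void *msg, size_t len); /* hypothetical */

    bool send_fault_tolerant(const int *replicas, size_t nreplicas,
                             const void *msg, size_t len)
    {
        for (size_t i = 0; i < nreplicas; i++)
            if (try_send(replicas[i], msg, len))
                return true;    /* delivered; the failure stays hidden */
        return false;           /* every replica failed */
    }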


21st-century architectural trends

The air-cooled IBM Blue Gene supercomputer architecture trades processor speed for low power consumption so that a larger number of processors can be used at room temperature, with normal air conditioning. The second-generation Blue Gene/P system has processors with integrated node-to-node communication logic. It is energy-efficient, achieving 371 MFLOPS/W.

The K computer is a water-cooled, homogeneous-processor, distributed-memory system with a cluster architecture. It uses more than 80,000 SPARC64 VIIIfx processors, each with eight cores, for a total of over 700,000 cores, almost twice as many as any other system. It comprises more than 800 cabinets, each with 96 computing nodes (each with 16 GB of memory) and 6 I/O nodes. Although it is more powerful than the next five systems on the TOP500 list combined, at 824.56 MFLOPS/W it had the lowest power-to-performance ratio (that is, the highest energy efficiency) of any major supercomputer system at the time.
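As an illustrative calculation of this metric (with made-up figures rather than those of any specific system): a machine sustaining 1 petaflop, i.e. 10^9 MFLOPS, while drawing 2.5 MW of power achieves 10^9 / (2.5 × 10^6) = 400 MFLOPS/W.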
The follow-up system to the K computer, the PRIMEHPC FX10, uses the same six-dimensional torus interconnect, but still only one processor per node (''Fujitsu Unveils Post-K Supercomputer'', HPC Wire, 7 November 2011).
Unlike the K computer, the Tianhe-1A system uses a hybrid architecture and integrates CPUs and GPUs. It uses more than 14,000 Xeon general-purpose processors and more than 7,000 Nvidia Tesla general-purpose graphics processing units (GPGPUs) on about 3,500 blades. It has 112 computer cabinets and 262 terabytes of distributed memory; 2 petabytes of disk storage are implemented via the Lustre clustered file system. Tianhe-1 uses a proprietary high-speed communication network to connect the processors, based on InfiniBand QDR and enhanced with Chinese-made FeiTeng-1000 CPUs. The interconnect is twice as fast as InfiniBand, but slower than the interconnects on some other supercomputers.

The limits of specific approaches continue to be tested, as boundaries are reached through large-scale experiments; e.g., in 2011 IBM ended its participation in the Blue Waters petaflops project at the University of Illinois (''The Register'': "IBM yanks chain on 'Blue Waters' super"). The Blue Waters architecture was based on the IBM POWER7 processor and was intended to have 200,000 cores with a petabyte of "globally addressable memory" and 10 petabytes of disk space. The goal of a sustained petaflop led to design choices that optimized single-core performance, and hence a lower number of cores. The lower number of cores was then expected to help performance on programs that did not scale well to a large number of processors. The large globally addressable memory architecture aimed to solve memory-address problems in an efficient manner for the same type of programs. Blue Waters had been expected to run at sustained speeds of at least one petaflop, and relied on a specific water-cooling approach to manage heat. In the first four years of operation, the National Science Foundation spent about $200 million on the project. IBM released the Power 775 computing node derived from that project's technology soon thereafter, but effectively abandoned the Blue Waters approach.

Architectural experiments are continuing in a number of directions, e.g. the Cyclops64 system uses a "supercomputer on a chip" approach, in a direction away from the use of massive distributed processors. Each 64-bit Cyclops64 chip contains 80 processors, and the entire system uses a globally addressable memory architecture. The processors are connected with a non-internally-blocking crossbar switch and communicate with each other via global interleaved memory. There is no data cache in the architecture, but half of each SRAM bank can be used as scratchpad memory. Although this type of architecture allows unstructured parallelism in a dynamically non-contiguous memory system, it also produces challenges in the efficient mapping of parallel algorithms to a many-core system.


See also

* Supercomputer operating systems
* Supercomputing in China
* Supercomputing in Europe
* History of supercomputing
* Supercomputing in India
* Supercomputing in Japan

