QPACE (QCD Parallel Computing on the Cell Broadband Engine) is a massively parallel and scalable supercomputer designed for applications in lattice quantum chromodynamics.
Overview
The QPACE supercomputer is a research project carried out by several academic institutions in collaboration with the IBM Research and Development Laboratory in Böblingen, Germany, and other industrial partners including Eurotech, Knürr, and Xilinx. The academic design team of about 20 junior and senior scientists, mostly physicists, came from the University of Regensburg (project lead), the University of Wuppertal, DESY Zeuthen, Jülich Research Centre, and the University of Ferrara. The main goal was the design of an application-optimized scalable architecture that beats industrial products in terms of compute performance, price-performance ratio, and energy efficiency. The project officially started in 2008. Two installations were deployed in the summer of 2009. The final design was completed in early 2010. Since then QPACE has been used for lattice QCD calculations. The system architecture is also suitable for other applications that mainly rely on nearest-neighbor communication, e.g., lattice Boltzmann methods.
In November 2009 QPACE was the leading architecture on the Green500 list of the most energy-efficient supercomputers in the world. The title was defended in June 2010, when the architecture achieved an energy efficiency of 773 MFLOPS per watt in the Linpack benchmark. In the Top500 list of the most powerful supercomputers, QPACE ranked #110-#112 in November 2009, and #131-#133 in June 2010.
QPACE was funded by the German Research Foundation (DFG) in the framework of SFB/TRR-55 and by IBM. Additional contributions were made by Eurotech, Knürr, and Xilinx.
Architecture
In 2008 IBM released the PowerXCell 8i multi-core processor, an enhanced version of the IBM Cell Broadband Engine used, e.g., in the PlayStation 3. The processor received much attention in the scientific community due to its outstanding floating-point performance. It is one of the building blocks of the IBM Roadrunner cluster, which was the first supercomputer architecture to break the PFLOPS barrier. Cluster architectures based on the PowerXCell 8i typically rely on IBM BladeCenter blade servers interconnected by industry-standard networks such as InfiniBand. For QPACE an entirely different approach was chosen. A custom-designed network co-processor implemented on Xilinx Virtex-5 FPGAs is used to connect the compute nodes. FPGAs are re-programmable semiconductor devices that allow for a customized specification of the functional behavior. The QPACE network processor is tightly coupled to the PowerXCell 8i via a Rambus-proprietary I/O interface.
The smallest building block of QPACE is the node card, which hosts the PowerXCell 8i and the FPGA. Node cards are mounted on backplanes, each of which can host up to 32 node cards. One QPACE rack houses up to eight backplanes, four mounted on the front side and four on the back side. The maximum number of node cards per rack is 256. QPACE relies on a water-cooling solution to achieve this packaging density.
Sixteen node cards are monitored and controlled by a separate administration card, called the root card. One more administration card per rack, called the superroot card, is used to monitor and control the power supplies. The root cards and superroot cards are also used for synchronization of the compute nodes.
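The packaging figures above can be checked with a few lines of arithmetic. The following Python sketch uses only the numbers quoted in the text. The function name `rack_layout` is an illustrative choice, and the root-card count is a derived value that assumes a fully populated rack in which every node card belongs to exactly one group of sixteen.

```python
# Packaging figures quoted in the text.
NODE_CARDS_PER_BACKPLANE = 32
BACKPLANES_PER_RACK = 8
NODE_CARDS_PER_ROOT_CARD = 16
SUPERROOT_CARDS_PER_RACK = 1


def rack_layout():
    """Return card counts for one fully populated rack (an assumption)."""
    node_cards = NODE_CARDS_PER_BACKPLANE * BACKPLANES_PER_RACK
    # Derived value: one root card per group of sixteen node cards.
    root_cards = node_cards // NODE_CARDS_PER_ROOT_CARD
    return {
        "node_cards": node_cards,
        "root_cards": root_cards,
        "superroot_cards": SUPERROOT_CARDS_PER_RACK,
    }


if __name__ == "__main__":
    print(rack_layout())
```

The node-card count reproduces the stated maximum of 256 per rack; the root-card count is not stated in the text and follows only from the assumption above.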
Node card
The heart of QPACE is the IBM PowerXCell 8i multi-core processor. Each node card hosts one PowerXCell 8i, 4 GB of DDR2 SDRAM with ECC, one Xilinx Virtex-5 FPGA and seven network transceivers. A single 1 Gigabit Ethernet transceiver connects the node card to the I/O network. Six 10 Gigabit transceivers are used for passing messages between neighboring nodes in a three-dimensional toroidal mesh.
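The six torus links correspond to the two directions along each of the three axes. As a minimal sketch of that topology, the Python function below lists the nearest neighbors of a node. The coordinate scheme, the function name `torus_neighbors`, and the example dimensions are hypothetical; the text does not describe how QPACE numbers or partitions its nodes.

```python
def torus_neighbors(coord, dims):
    """Return the six nearest neighbors of `coord` in a 3D torus of size `dims`.

    `coord` and `dims` are (x, y, z) tuples; both are illustrative assumptions.
    """
    neighbors = []
    for axis in range(3):
        for step in (+1, -1):
            neighbor = list(coord)
            # The modulo provides the wrap-around links that close the mesh into a torus.
            neighbor[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(neighbor))
    return neighbors


if __name__ == "__main__":
    # A corner node of a hypothetical 4 x 4 x 8 partition still has six neighbors.
    for n in torus_neighbors((0, 0, 0), (4, 4, 8)):
        print(n)
```

Because of the wrap-around, every node has the same number of links regardless of its position, which is why a fixed set of six transceivers per node card suffices.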
The QPACE network co-processor is implemented on a Xilinx Virtex-5 FPGA, which is directly connected to the I/O interface of the PowerXCell 8i. The functional behavior of the FPGA is defined by a hardware description language and can be changed at any time at the cost of rebooting the node card. Most entities of the QPACE network co-processor are coded in VHDL.
Networks
The QPACE network co-processor connects the PowerXCell 8i to three communications networks:
* The torus network is a high-speed communication path that allows for nearest-neighbor communication in a three-dimensional toroidal mesh. The torus network relies on the physical layer of 10 Gigabit Ethernet, while a custom-designed communications protocol optimized for small message sizes is used for message passing. A unique feature of the torus network design is the support for zero-copy communication between the private memory areas, called the Local Stores, of the Synergistic Processing Elements (SPEs) by direct memory access. The latency for communication between two SPEs on neighboring nodes is 3 μs. The peak bandwidth per link and direction is about 1 GB/s.
* Switched 1 Gigabit Ethernet is used for file I/O and maintenance.
* The global signals network is a simple 2-wire system arranged as a tree network. This network is used for evaluation of global conditions and synchronization of the nodes.
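The latency and bandwidth quoted for the torus network suggest why small messages matter. The Python sketch below applies a simple linear cost model (latency plus size divided by bandwidth) to those two figures. The model itself, the function names, and the reading of 1 GB/s as 10^9 bytes per second are assumptions for illustration, not measured QPACE behavior.

```python
LATENCY_S = 3e-6          # SPE-to-SPE latency between neighboring nodes (from the text)
PEAK_BANDWIDTH_BPS = 1e9  # about 1 GB/s per link and direction, read as 1e9 bytes/s


def transfer_time(message_bytes):
    """Estimate the time in seconds to send one message (linear model, an assumption)."""
    return LATENCY_S + message_bytes / PEAK_BANDWIDTH_BPS


def effective_bandwidth(message_bytes):
    """Return delivered bytes per second for a single message under the same model."""
    return message_bytes / transfer_time(message_bytes)


if __name__ == "__main__":
    for size in (128, 1024, 3000, 16384):
        print(size, transfer_time(size), effective_bandwidth(size))
```

Under this model a message of about 3000 bytes reaches only half of the peak bandwidth, so for the small messages typical of nearest-neighbor exchange the latency term dominates the cost.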
Cooling
The compute nodes of the QPACE supercomputer are cooled by water. Roughly 115 watts must be dissipated from each node card.
The cooling solution is based on a two-component design. Each node card is mounted to a thermal box, which acts as a large heat sink for heat-critical components. The thermal box interfaces to a coldplate, which is connected to the water-cooling circuit. The performance of the coldplate allows for the removal of the heat from up to 32 nodes. The node cards are mounted on both sides of the coldplate, i.e., 16 nodes each are mounted on the top and bottom of the coldplate. The efficiency of the cooling solution allows for the cooling of the compute nodes with warm water. The QPACE cooling solution also influenced other supercomputer designs such as SuperMUC.
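The per-node figure translates into a heat load per coldplate and per rack. The Python sketch below multiplies out the numbers given in the text. The function name `heat_load_watts` is illustrative, and the result covers node cards only; heat from power supplies, root cards, and other components is not given in the text and is left out.

```python
HEAT_PER_NODE_CARD_W = 115     # "roughly 115 watts" per node card (from the text)
NODE_CARDS_PER_COLDPLATE = 32  # 16 on the top and 16 on the bottom
NODE_CARDS_PER_RACK = 256      # maximum per rack


def heat_load_watts(node_cards):
    """Return the approximate heat in watts dissipated by `node_cards` node cards."""
    return HEAT_PER_NODE_CARD_W * node_cards


if __name__ == "__main__":
    print("per coldplate:", heat_load_watts(NODE_CARDS_PER_COLDPLATE), "W")
    print("per full rack:", heat_load_watts(NODE_CARDS_PER_RACK), "W")
```

For a fully loaded coldplate this gives roughly 3.7 kW, and roughly 29 kW for the node cards of a full rack.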
Installations
Two identical installations of QPACE with four racks have been operating since 2009:
* Jülich Research Centre
* University of Wuppertal
The aggregate peak performance is about 200 TFLOPS in double precision, and 400 TFLOPS in single precision. The installations are operated by the University of Regensburg, Jülich Research Centre, and the University of Wuppertal.
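The aggregate figures can be related to a single node with a rough calculation. The Python sketch below assumes that each of the two installations has four fully populated racks of 256 node cards and that the quoted peak values cover both installations together; neither assumption is stated explicitly in the text. The function names are illustrative.

```python
INSTALLATIONS = 2
RACKS_PER_INSTALLATION = 4   # assumption: four racks at each site
NODE_CARDS_PER_RACK = 256    # maximum per rack; full population is an assumption
PEAK_DP_FLOPS = 200e12       # "about 200 TFLOPS" in double precision
PEAK_SP_FLOPS = 400e12       # "400 TFLOPS" in single precision


def total_nodes():
    """Return the node count across both installations under the stated assumptions."""
    return INSTALLATIONS * RACKS_PER_INSTALLATION * NODE_CARDS_PER_RACK


def per_node_peak(aggregate_flops):
    """Return the implied peak FLOPS of one node for a given aggregate value."""
    return aggregate_flops / total_nodes()


if __name__ == "__main__":
    print("nodes:", total_nodes())
    print("double precision per node (GFLOPS):", per_node_peak(PEAK_DP_FLOPS) / 1e9)
    print("single precision per node (GFLOPS):", per_node_peak(PEAK_SP_FLOPS) / 1e9)
```

Since the aggregate values are themselves approximate, the per-node results should be read as order-of-magnitude estimates of roughly 100 GFLOPS in double precision and 200 GFLOPS in single precision.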
See also
* QPACE2, a follow-up project to QPACE
* Supercomputer
* Cell (microprocessor)
* Torus interconnect
* FPGA
* Lattice QCD
References
* The Green500 list, November 2009, http://www.green500.org/lists/green200911
* The Top500 list, November 2009
* The Green500 list, June 2010, http://www.green500.org/lists/green201006
* The Top500 list, June 2010
* H. Baier et al., ''QPACE - a QCD parallel computer based on Cell processors'', Proceedings of Science (LAT2009), 001
* G. Bilardi et al., ''The Potential of On-Chip Multiprocessing for QCD Machines'', Lecture Notes in Computer Science 3769 (2005) 386
* I. Ouda, K. Schleupen, ''Application Note: FPGA to IBM Power Processor Interface Setup'', IBM Research report, 2008
* G. Goldrian et al., ''QPACE: Quantum Chromodynamics Parallel Computing on the Cell Broadband Engine'', Computing in Science and Engineering 10 (2008) 46
* L. Biferale et al., ''Lattice Boltzmann fluid-dynamics on the QPACE supercomputer'', Procedia Computer Science 1 (2010) 1075
* S. Williams et al., ''The Potential of the Cell Processor for Scientific Computing'', Proceedings of the 3rd conference on Computing frontiers (2006) 9
* S. Solbrig, ''Synchronization in QPACE'', STRONGnet Conference, Cyprus, 2010
* B. Michel et al., ''Aquasar: Der Weg zu optimal effizienten Rechenzentren'', 2011