Blue Gene is an IBM project aimed at designing supercomputers that can reach operating speeds in the petaFLOPS (PFLOPS) range with low power consumption. The project created three generations of supercomputers: Blue Gene/L, Blue Gene/P, and Blue Gene/Q. During their deployment, Blue Gene systems often led the TOP500 and Green500 rankings of the most powerful and most power-efficient supercomputers, respectively. Blue Gene systems have also consistently scored top positions in the Graph500 list. The project was awarded the 2009 National Medal of Technology and Innovation. As of 2015, IBM seems to have ended development of the Blue Gene family, though no public announcement has been made. IBM's continuing efforts in the supercomputer scene seem to be concentrated around OpenPOWER, using accelerators such as FPGAs and GPUs to battle the end of Moore's law.


History

In December 1999, IBM announced a US$100 million research initiative for a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding. The project had two main goals: to advance our understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software. Major areas of investigation included how to use this novel platform to effectively meet its scientific goals, how to make such massively parallel machines more usable, and how to achieve performance targets at a reasonable cost through novel machine architectures. The initial design for Blue Gene was based on an early version of the Cyclops64 architecture, designed by Monty Denneau. The initial research and development work was pursued at the IBM T.J. Watson Research Center and led by William R. Pulleyblank.

At IBM, Alan Gara started working on an extension of the QCDOC architecture into a more general-purpose supercomputer: the 4D nearest-neighbor interconnection network was replaced by a network supporting routing of messages from any node to any other, and a parallel I/O subsystem was added. The DOE started funding the development of this system and it became known as Blue Gene/L (L for Light); development of the original Blue Gene system continued under the name Blue Gene/C (C for Cyclops) and, later, Cyclops64.

In November 2004 a 16-rack system, with each rack holding 1,024 compute nodes, achieved first place in the TOP500 list, with a Linpack performance of 70.72 TFLOPS. It thereby overtook NEC's Earth Simulator, which had held the title of the fastest computer in the world since 2002. From 2004 through 2007 the Blue Gene/L installation at Lawrence Livermore National Laboratory (LLNL) gradually expanded to 104 racks, achieving 478 TFLOPS Linpack and 596 TFLOPS peak. The LLNL Blue Gene/L installation held the first position in the TOP500 list for 3.5 years, until in June 2008 it was overtaken by IBM's Cell-based Roadrunner system at Los Alamos National Laboratory, which was the first system to surpass the 1 petaFLOPS mark. The system was built at the IBM plant in Rochester, Minnesota.

While the LLNL installation was the largest Blue Gene/L installation, many smaller installations followed. In November 2006, there were 27 computers on the TOP500 list using the Blue Gene/L architecture. All these computers were listed as having an architecture of "eServer Blue Gene Solution". For example, three racks of Blue Gene/L were housed at the San Diego Supercomputer Center.

While the TOP500 measures performance on a single benchmark application, Linpack, Blue Gene/L also set records for performance on a wider set of applications. Blue Gene/L was the first supercomputer to sustain over 100 TFLOPS on a real-world application, namely a three-dimensional molecular dynamics code (ddcMD) simulating solidification (nucleation and growth processes) of molten metal under high pressure and temperature conditions. This achievement won the 2005 Gordon Bell Prize. In June 2006, NNSA and IBM announced that Blue Gene/L achieved 207.3 TFLOPS on a quantum chemical application (Qbox). At Supercomputing 2006, Blue Gene/L was awarded the winning prize in all HPC Challenge classes of awards. In 2007, a team from the IBM Almaden Research Center and the University of Nevada ran an artificial neural network almost half as complex as the brain of a mouse for the equivalent of one second (the network was run at 1/10 of normal speed for 10 seconds).


The name

The name Blue Gene comes from what it was originally designed to do: help biologists understand the processes of protein folding and gene development. "Blue" is a traditional moniker that IBM uses for many of its products and the company itself. The original Blue Gene design was renamed "Blue Gene/C" and eventually Cyclops64. The "L" in Blue Gene/L comes from "Light", as that design's original name was "Blue Light". The "P" version was designed to be a petascale design. "Q" is just the letter after "P". There is no Blue Gene/R.


Major features

The Blue Gene/L supercomputer was unique in the following aspects:
* Trading the speed of processors for lower power consumption. Blue Gene/L used low-frequency, low-power embedded PowerPC cores with floating-point accelerators. While the performance of each chip was relatively low, the system could achieve better power efficiency for applications that could use large numbers of nodes.
* Dual processors per node with two working modes: co-processor mode, where one processor handles computation and the other handles communication; and virtual-node mode, where both processors are available to run user code, but the processors share both the computation and the communication load.
* System-on-a-chip design. Components were embedded on a single chip for each node, with the exception of 512 MB external DRAM.
* A large number of nodes (scalable in increments of 1024 up to at least 65,536).
* Three-dimensional torus interconnect with auxiliary networks for global communications (broadcast and reductions), I/O, and management.
* Lightweight OS per node for minimum system overhead (system noise).
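To illustrate what a three-dimensional torus interconnect means for a single node, the following minimal Python sketch lists the six nearest neighbours of a node, with wrap-around at the edges of the grid. The function name `torus_neighbors` and the 8 x 8 x 16 grid are hypothetical choices made for illustration only; they are not taken from any Blue Gene documentation or software.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def torus_neighbors(node: Coord, dims: Coord) -> List[Coord]:
    """Return the six nearest neighbours of `node` on a 3D torus of size `dims`.

    Every dimension wraps around, so a node on one face of the grid is
    linked directly to the node on the opposite face.
    """
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            coord = list(node)
            # Modulo arithmetic implements the wrap-around link.
            coord[axis] = (coord[axis] + step) % dims[axis]
            neighbors.append(tuple(coord))
    return neighbors


# Hypothetical 8 x 8 x 16 grid (1024 nodes), chosen only for illustration.
corner_links = torus_neighbors((0, 0, 0), (8, 8, 16))
print(corner_links)
```

Even the corner node `(0, 0, 0)` has six links in this sketch, which is the property that distinguishes a torus from a plain mesh: no node sits on a boundary.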


Architecture

The Blue Gene/L architecture was an evolution of the QCDSP and QCDOC architectures. Each Blue Gene/L compute or I/O node was a single ASIC with associated DRAM memory chips. The ASIC integrated two 700 MHz PowerPC 440 embedded processors, each with a double-pipeline, double-precision floating-point unit (FPU), a cache sub-system with built-in DRAM controller, and the logic to support multiple communication sub-systems. The dual FPUs gave each Blue Gene/L node a theoretical peak performance of 5.6 GFLOPS (gigaFLOPS). The two CPUs were not cache coherent with one another.

Compute nodes were packaged two per compute card, with 16 compute cards plus up to 2 I/O nodes per node board. There were 32 node boards per cabinet/rack. By integrating all essential sub-systems on a single chip and using low-power logic, each compute or I/O node dissipated little power (about 17 watts, including DRAMs). This allowed aggressive packaging of up to 1024 compute nodes, plus additional I/O nodes, in a standard 19-inch rack, within reasonable limits of electrical power supply and air cooling. The performance metrics, in terms of FLOPS per watt, FLOPS per m² of floorspace and FLOPS per unit cost, allowed scaling up to very high performance. With so many nodes, component failures were inevitable. The system was able to electrically isolate faulty components, down to a granularity of half a rack (512 compute nodes), to allow the machine to continue to run.

Each Blue Gene/L node was attached to three parallel communications networks: a 3D toroidal network for peer-to-peer communication between compute nodes, a collective network for collective communication (broadcasts and reduce operations), and a global interrupt network for fast barriers. The I/O nodes, which ran the Linux operating system, provided communication to storage and external hosts via an Ethernet network. The I/O nodes handled filesystem operations on behalf of the compute nodes. Finally, a separate and private Ethernet network provided access to any node for configuration, booting and diagnostics.

To allow multiple programs to run concurrently, a Blue Gene/L system could be partitioned into electronically isolated sets of nodes. The number of nodes in a partition had to be a positive integer power of 2, with at least 2^5 = 32 nodes. To run a program on Blue Gene/L, a partition of the computer first had to be reserved. The program was then loaded and run on all the nodes within the partition, and no other program could access nodes within the partition while it was in use. Upon completion, the partition nodes were released for future programs to use.

Blue Gene/L compute nodes used a minimal operating system supporting a single user program. Only a subset of POSIX calls was supported, and only one process could run at a time on a node in co-processor mode, or one process per CPU in virtual mode. Programmers needed to implement green threads in order to simulate local concurrency. Application development was usually performed in C, C++, or Fortran using MPI for communication. However, some scripting languages such as Ruby and Python have been ported to the compute nodes.

IBM has published BlueMatter, the application developed to exercise Blue Gene/L, as open source. This serves to document how the torus and collective interfaces were used by applications, and may serve as a base for others to exercise the current generation of supercomputers.
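The partition-size rule described above (a power of 2, and at least 32 nodes) can be expressed as a short check. This is a minimal Python sketch of the rule as stated in the text; the function names are hypothetical and do not correspond to any Blue Gene control-system API.

```python
from typing import List


def is_valid_partition_size(nodes: int, minimum: int = 32) -> bool:
    """Return True if `nodes` is a power of two and at least `minimum`."""
    # A positive integer is a power of two exactly when it has one bit set.
    is_power_of_two = nodes > 0 and (nodes & (nodes - 1)) == 0
    return is_power_of_two and nodes >= minimum


def valid_partition_sizes(total_nodes: int, minimum: int = 32) -> List[int]:
    """List the allowed partition sizes that fit in `total_nodes`.

    Assumes `minimum` is itself a power of two, as in the text (32).
    """
    sizes = []
    size = minimum
    while size <= total_nodes:
        sizes.append(size)
        size *= 2
    return sizes


# Allowed partition sizes within a single 1024-node rack.
print(valid_partition_sizes(1024))
```

For one 1024-node rack this yields six possible sizes, from 32 nodes up to the full rack; a request for, say, 48 nodes would be rejected under the stated rule.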


Blue Gene/P

In June 2007, IBM unveiled Blue Gene/P, the second generation of the Blue Gene series of supercomputers, designed through a collaboration that included IBM, LLNL, and Argonne National Laboratory's Leadership Computing Facility.


Design

The design of Blue Gene/P is a technology evolution from Blue Gene/L. Each Blue Gene/P compute chip contains four PowerPC 450 processor cores running at 850 MHz. The cores are cache coherent and the chip can operate as a 4-way symmetric multiprocessor (SMP). The memory subsystem on the chip consists of small private L2 caches, a central shared 8 MB L3 cache, and dual DDR2 memory controllers. The chip also integrates the logic for node-to-node communication, using the same network topologies as Blue Gene/L but at more than twice the bandwidth. A compute card contains a Blue Gene/P chip with 2 or 4 GB of DRAM, comprising a "compute node". A single compute node has a peak performance of 13.6 GFLOPS. 32 compute cards are plugged into an air-cooled node board. A rack contains 32 node boards (thus 1024 nodes, 4096 processor cores). By using many small, low-power, densely packaged chips, Blue Gene/P exceeded the power efficiency of other supercomputers of its generation, and at 371 MFLOPS/W Blue Gene/P installations ranked at or near the top of the Green500 lists in 2007-2008.
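The per-node peak figures quoted in this article can be reproduced with simple arithmetic. The sketch below is a hedged illustration in Python: it assumes that each core can complete 4 floating-point operations per cycle (two FPU pipelines, each performing a fused multiply-add), an assumption that is not stated in the text above but is consistent with both the 5.6 GFLOPS and 13.6 GFLOPS figures. The function names are hypothetical.

```python
def node_peak_mflops(cores: int, clock_mhz: int, flops_per_cycle: int = 4) -> int:
    """Theoretical node peak in MFLOPS: cores x clock (MHz) x flops per cycle.

    flops_per_cycle=4 is an assumption (two pipelines x one fused
    multiply-add each), chosen because it matches the quoted peaks.
    """
    return cores * clock_mhz * flops_per_cycle


def rack_peak_mflops(node_mflops: int, boards_per_rack: int = 32,
                     nodes_per_board: int = 32) -> int:
    """Theoretical rack peak in MFLOPS for 32 boards of 32 nodes each."""
    return node_mflops * boards_per_rack * nodes_per_board


bgl_node = node_peak_mflops(cores=2, clock_mhz=700)   # Blue Gene/L node
bgp_node = node_peak_mflops(cores=4, clock_mhz=850)   # Blue Gene/P node
bgp_rack = rack_peak_mflops(bgp_node)                 # one Blue Gene/P rack

print(bgl_node, "MFLOPS per Blue Gene/L node")
print(bgp_node, "MFLOPS per Blue Gene/P node")
print(bgp_rack, "MFLOPS per Blue Gene/P rack")
```

Under these assumptions a Blue Gene/P rack peaks at roughly 13.9 TFLOPS, so about 72 racks are needed to reach a theoretical peak of 1 petaFLOPS.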


Installations

The following is an incomplete list of Blue Gene/P installations. As of November 2009, the TOP500 list contained 15 Blue Gene/P installations of 2 racks (2048 nodes, 8192 processor cores, 23.86 TFLOPS Linpack) and larger.
* On November 12, 2007, the first Blue Gene/P installation, JUGENE, with 16 racks (16,384 nodes, 65,536 processors), was running at Forschungszentrum Jülich in Germany with a performance of 167 TFLOPS. When inaugurated it was the fastest supercomputer in Europe and the sixth fastest in the world. In 2009, JUGENE was upgraded to 72 racks (73,728 nodes, 294,912 processor cores) with 144 terabytes of memory and 6 petabytes of storage, and achieved a peak performance of 1 petaFLOPS. This configuration incorporated new air-to-water heat exchangers between the racks, reducing the cooling cost substantially. JUGENE was shut down in July 2012 and replaced by the Blue Gene/Q system JUQUEEN.
* The 40-rack (40,960 nodes, 163,840 processor cores) "Intrepid" system at Argonne National Laboratory was ranked #3 on the June 2008 TOP500 list. The Intrepid system is one of the major resources of the INCITE program, in which processor hours are awarded to "grand challenge" science and engineering projects in a peer-reviewed competition.
* Lawrence Livermore National Laboratory installed a 36-rack Blue Gene/P installation, "Dawn", in 2009.
* The King Abdullah University of Science and Technology (KAUST) installed a 16-rack Blue Gene/P installation, "Shaheen", in 2009.
* In 2012, a 6-rack Blue Gene/P was installed at Rice University, to be jointly administered with the University of São Paulo.
* A 2.5-rack Blue Gene/P system is the central processor for the Low Frequency Array for Radio astronomy (LOFAR) project in the Netherlands and surrounding European countries. This application uses the streaming data capabilities of the machine.
* A 2-rack Blue Gene/P was installed in September 2008 in Sofia, Bulgaria, and is operated by the Bulgarian Academy of Sciences and Sofia University.
* In 2010, a 2-rack (8192-core) Blue Gene/P was installed at the University of Melbourne for the Victorian Life Sciences Computation Initiative.
* In 2011, a 2-rack Blue Gene/P was installed at the University of Canterbury in Christchurch, New Zealand.
* In 2012, a 2-rack Blue Gene/P was installed at Rutgers University in Piscataway, New Jersey. It was dubbed "Excalibur" as an homage to the Rutgers mascot, the Scarlet Knight.
* In 2008, a 1-rack (1024 nodes) Blue Gene/P with 180 TB of storage was installed at the University of Rochester in Rochester, New York.
* The first Blue Gene/P in the ASEAN region was installed in 2010 at the UBD-IBM Centre, the research centre of Universiti Brunei Darussalam. The installation has prompted research collaboration between the university and IBM Research on climate modeling that will investigate the impact of climate change on flood forecasting, crop yields, renewable energy and the health of rainforests in the region, among others.
* In 2013, a 1-rack Blue Gene/P was donated to the Department of Science and Technology for weather forecasts, disaster management, precision agriculture, and health. It is housed in the National Computer Center, Diliman, Quezon City, under the auspices of the Philippine Genome Center (PGC) Core Facility for Bioinformatics (CFB) at UP Diliman, Quezon City.


Applications

* Veselin Topalov, the challenger to the World Chess Champion title in 2010, confirmed in an interview that he had used a Blue Gene/P supercomputer during his preparation for the match.
* The Blue Gene/P computer has been used to simulate approximately one percent of a human cerebral cortex, containing 1.6 billion neurons with approximately 9 trillion connections.
* The IBM Kittyhawk project team has ported Linux to the compute nodes and demonstrated generic Web 2.0 workloads running at scale on a Blue Gene/P. Their paper, published in the ACM Operating Systems Review, describes a kernel driver that tunnels Ethernet over the tree network, which results in all-to-all TCP/IP connectivity. Running standard Linux software like MySQL, their performance results on SpecJBB rank among the highest on record.
* In 2011, a Rutgers University / IBM / University of Texas team linked the KAUST Shaheen installation together with a Blue Gene/P installation at the IBM Watson Research Center into a "federated high performance computing cloud", winning the IEEE SCALE 2011 challenge with an oil reservoir optimization application.


Blue Gene/Q

The third supercomputer design in the Blue Gene series, Blue Gene/Q has a peak performance of 20 petaflops, reaching a LINPACK benchmark performance of 17 petaflops. Blue Gene/Q continues to expand and enhance the Blue Gene/L and /P architectures.


Design

The Blue Gene/Q Compute chip is an 18-core chip. The 64-bit A2 processor cores are 4-way simultaneously multithreaded and run at 1.6 GHz. Each processor core has a SIMD quad-vector double-precision floating-point unit (IBM QPX). Sixteen processor cores are used for computing, and a 17th core handles operating system assist functions such as interrupts, asynchronous I/O, MPI pacing and RAS (reliability, availability and serviceability). The 18th core is a redundant spare, included to increase manufacturing yield. The spared-out core is shut down in functional operation. The processor cores are linked by a crossbar switch to a 32 MB eDRAM L2 cache, operating at half core speed. The L2 cache is multi-versioned, supporting transactional memory and speculative execution, and has hardware support for atomic operations. L2 cache misses are handled by two built-in DDR3 memory controllers running at 1.33 GHz. The chip also integrates logic for chip-to-chip communications in a 5D torus configuration, with 2 GB/s chip-to-chip links.

The Blue Gene/Q chip is manufactured on IBM's copper SOI process at 45 nm. It delivers a peak performance of 204.8 GFLOPS at 1.6 GHz, drawing about 55 watts. The chip measures about 19×19 mm (359.5 mm²) and comprises 1.47 billion transistors. The chip is mounted on a compute card along with 16 GB of DDR3 DRAM (i.e., 1 GB for each user processor core).

A Q32 compute drawer contains 32 compute cards, each water cooled. A "midplane" (crate) contains 16 Q32 compute drawers for a total of 512 compute nodes, electrically interconnected in a 5D torus configuration (4x4x4x4x2). Beyond the midplane level, all connections are optical. Racks have two midplanes, thus 32 compute drawers, for a total of 1024 compute nodes, 16,384 user cores and 16 TB of RAM. Separate I/O drawers, placed at the top of a rack or in a separate rack, are air cooled and contain 8 compute cards and 8 PCIe expansion slots for InfiniBand or 10 Gigabit Ethernet networking.
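The chip and midplane figures above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes each SIMD lane completes one fused multiply-add (two floating-point operations) per cycle, and it treats a single midplane as a closed 4x4x4x4x2 torus, which is a simplification for systems that span several midplanes. All function and constant names are hypothetical.

```python
# Figures quoted in the Design section; FLOPS_PER_LANE is an assumption.
CLOCK_GHZ = 1.6        # A2 core clock
USER_CORES = 16        # cores available to applications
SIMD_LANES = 4         # quad-vector double-precision unit
FLOPS_PER_LANE = 2     # assumption: one fused multiply-add per lane per cycle

MIDPLANE_DIMS = (4, 4, 4, 4, 2)  # 5D torus inside one midplane


def chip_peak_gflops():
    """Peak double-precision GFLOPS of one compute chip."""
    return USER_CORES * SIMD_LANES * FLOPS_PER_LANE * CLOCK_GHZ


def node_count(dims):
    """Number of nodes in a torus with the given extent per dimension."""
    total = 1
    for extent in dims:
        total *= extent
    return total


def torus_neighbors(coord, dims):
    """Distinct nodes one hop from coord, wrapping around in every dimension."""
    found = set()
    for axis, extent in enumerate(dims):
        for step in (-1, 1):
            hop = list(coord)
            hop[axis] = (coord[axis] + step) % extent
            if tuple(hop) != tuple(coord):
                found.add(tuple(hop))
    return sorted(found)


if __name__ == "__main__":
    print(f"chip peak: {chip_peak_gflops():.1f} GFLOPS")
    print(f"nodes per midplane: {node_count(MIDPLANE_DIMS)}")
    print(f"nodes per rack: {2 * node_count(MIDPLANE_DIMS)}")
    for neighbor in torus_neighbors((0, 0, 0, 0, 0), MIDPLANE_DIMS):
        print(neighbor)
```

Under these assumptions the chip peak comes out at 204.8 GFLOPS and the midplane at 512 nodes, matching the text. In this model a node has nine distinct neighbors rather than ten, because both directions along the dimension of extent 2 reach the same node.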


Performance

At the time of the Blue Gene/Q system announcement in November 2011, an initial 4-rack Blue Gene/Q system (4096 nodes, 65,536 user processor cores) achieved #17 in the TOP500 list with 677.1 TeraFLOPS Linpack, outperforming the original 2007 104-rack Blue Gene/L installation described above. The same 4-rack system achieved the top position in the Graph500 list with over 250 GTEPS (giga traversed edges per second). Blue Gene/Q systems also topped the Green500 list of most energy-efficient supercomputers with up to 2.1 GFLOPS/W. In June 2012, Blue Gene/Q installations took the top positions in all three lists: TOP500, Graph500 and Green500.
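As a rough illustration of how the 4-rack Linpack result relates to nominal peak, the following sketch scales the per-node peak of 204.8 GFLOPS and the 1024 nodes per rack from the Design section. The helper names are hypothetical, and the calculation assumes every node contributes its full nominal peak.

```python
NODES_PER_RACK = 1024        # two midplanes of 512 nodes
USER_CORES_PER_NODE = 16
NODE_PEAK_GFLOPS = 204.8     # per-chip peak quoted in the Design section


def system_peak_tflops(racks):
    """Nominal peak in TFLOPS for a given number of racks."""
    return racks * NODES_PER_RACK * NODE_PEAK_GFLOPS / 1000.0


def linpack_efficiency(linpack_tflops, racks):
    """Fraction of nominal peak achieved on Linpack."""
    return linpack_tflops / system_peak_tflops(racks)


if __name__ == "__main__":
    racks = 4
    print(racks * NODES_PER_RACK, "nodes")
    print(racks * NODES_PER_RACK * USER_CORES_PER_NODE, "user cores")
    print(f"{system_peak_tflops(racks):.1f} TFLOPS nominal peak")
    print(f"{linpack_efficiency(677.1, racks):.1%} of peak on Linpack")
```

With these inputs the 4-rack system has a nominal peak of about 839 TFLOPS, so the 677.1 TFLOPS Linpack figure corresponds to roughly 81% of peak.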


Installations

The following is an incomplete list of Blue Gene/Q installations. As of June 2012, the TOP500 list contained 20 Blue Gene/Q installations of 1/2-rack (512 nodes, 8192 processor cores, 86.35 TFLOPS Linpack) and larger. At a (size-independent) power efficiency of about 2.1 GFLOPS/W, all these systems also populated the top of the June 2012 Green500 list.
* A Blue Gene/Q system called Sequoia was delivered to the Lawrence Livermore National Laboratory (LLNL) beginning in 2011 and was fully deployed in June 2012. It is part of the Advanced Simulation and Computing Program, running nuclear simulations and advanced scientific research. It consists of 96 racks (comprising 98,304 compute nodes with 1.6 million processor cores and 1.6 PB of memory). In June 2012, the system was ranked as the world's fastest supercomputer, at 20.1 PFLOPS peak and 16.32 PFLOPS sustained (Linpack), drawing up to 7.9 megawatts of power. In June 2013, its performance was listed at 17.17 PFLOPS sustained (Linpack).
* A 10 PFLOPS (peak) Blue Gene/Q system called ''Mira'' was installed at Argonne National Laboratory in the Argonne Leadership Computing Facility in 2012. It consists of 48 racks (49,152 compute nodes), with 70 PB of disk storage (470 GB/s I/O bandwidth).
* ''JUQUEEN'' at the Forschungszentrum Jülich is a 28-rack Blue Gene/Q system, and was the highest-ranked machine in Europe in the TOP500 from June 2013 to November 2015.
* ''Vulcan'' at Lawrence Livermore National Laboratory (LLNL) is a 24-rack, 5 PFLOPS (peak) Blue Gene/Q system that was commissioned in 2012 and decommissioned in 2019. Vulcan served Lab-industry projects through Livermore's High Performance Computing (HPC) Innovation Center as well as academic collaborations in support of DOE/National Nuclear Security Administration (NNSA) missions.
* ''Fermi'' at the CINECA Supercomputing facility, Bologna, Italy, is a 10-rack, 2 PFLOPS (peak) Blue Gene/Q system.
* As part of DiRAC, the EPCC hosts a 6-rack (6144-node) Blue Gene/Q system at the University of Edinburgh.
* A five-rack Blue Gene/Q system with additional compute hardware called ''AMOS'' was installed at Rensselaer Polytechnic Institute in 2013. The system was rated at 1048.6 teraflops, making it the most powerful supercomputer at any private university and the third most powerful supercomputer among all universities in 2014.
* An 838 TFLOPS (peak) Blue Gene/Q system called ''Avoca'' was installed at the Victorian Life Sciences Computation Initiative (VLSCI) in June 2012. This system is part of a collaboration between IBM and VLSCI, with the aims of improving diagnostics, finding new drug targets, refining treatments and furthering our understanding of diseases. The system consists of 4 racks, with 350 TB of storage, 65,536 cores and 64 TB of RAM.
* A 209 TFLOPS (peak) Blue Gene/Q system was installed at the University of Rochester in July 2012. This system is part of the Health Sciences Center for Computational Innovation, which is dedicated to the application of high-performance computing to research programs in the health sciences. The system consists of a single rack (1,024 compute nodes) with 400 TB of high-performance storage.
* A 209 TFLOPS peak (172 TFLOPS LINPACK) Blue Gene/Q system called ''Lemanicus'' was installed at the EPFL in March 2013. This system belongs to the Center for Advanced Modeling Science (CADMOS), a collaboration between the three main research institutions on the shore of Lake Geneva in the French-speaking part of Switzerland: University of Lausanne, University of Geneva and EPFL. The system consists of a single rack (1,024 compute nodes) with 2.1 PB of IBM GPFS-GSS storage.
* A half-rack Blue Gene/Q system, with about 100 TFLOPS (peak), called ''Cumulus'' was installed at the A*STAR Computational Resource Centre, Singapore, in early 2011.
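The peak figures quoted for these installations follow closely from their rack counts. The sketch below is a consistency check rather than an authoritative source: it takes 1024 nodes per rack and 204.8 GFLOPS per node from the Design section, uses the rack counts given in the list, and estimates Sequoia's energy efficiency from its June 2012 Linpack result and its stated maximum power draw. The names `RACKS`, `peak_pflops` and `gflops_per_watt` are hypothetical.

```python
NODE_PEAK_GFLOPS = 204.8   # per-node peak from the Design section
NODES_PER_RACK = 1024

# Rack counts as given in the list above (Cumulus is a half rack).
RACKS = {
    "Sequoia": 96, "Mira": 48, "JUQUEEN": 28, "Vulcan": 24, "Fermi": 10,
    "AMOS": 5, "Avoca": 4, "Lemanicus": 1, "Cumulus": 0.5,
}


def peak_pflops(racks):
    """Nominal peak in PFLOPS for a given number of racks."""
    return racks * NODES_PER_RACK * NODE_PEAK_GFLOPS / 1e6


def gflops_per_watt(linpack_pflops, megawatts):
    """Energy efficiency from a Linpack result and a power draw."""
    return (linpack_pflops * 1e6) / (megawatts * 1e6)


if __name__ == "__main__":
    for name, racks in RACKS.items():
        nodes = int(racks * NODES_PER_RACK)
        print(f"{name:10s} {nodes:6d} nodes  {peak_pflops(racks):7.3f} PFLOPS peak")
    # Sequoia, June 2012: 16.32 PFLOPS Linpack at up to 7.9 MW.
    print(f"Sequoia: {gflops_per_watt(16.32, 7.9):.2f} GFLOPS/W")
```

Under these assumptions the nominal peaks come out at about 20.1 PFLOPS for Sequoia, 10.1 PFLOPS for Mira and 1.049 PFLOPS for AMOS, in line with the quoted values. The Sequoia efficiency estimate is roughly 2.07 GFLOPS/W, close to the Green500 figure of about 2.1 GFLOPS/W.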


Applications

Record-breaking science applications have been run on the BG/Q, the first to cross 10 petaflops of sustained performance. The cosmology simulation framework HACC achieved almost 14 petaflops with a 3.6 trillion particle benchmark run, while the Cardioid code, which models the electrophysiology of the human heart, achieved nearly 12 petaflops with a near real-time simulation, both on Sequoia. A fully compressible flow solver has also achieved 14.4 PFLOP/s (originally 11 PFLOP/s) on Sequoia, 72% of the machine's nominal peak performance.


See also

* CNK operating system
* INK operating system
* Deep Blue (chess computer)


References


External links


IBM Research: Blue Gene

Next generation supercomputers - Blue Gene/P overview (pdf)