National Center for Computational Sciences

The National Center for Computational Sciences (NCCS) is a United States Department of Energy (DOE) Leadership Computing Facility that houses the Oak Ridge Leadership Computing Facility (OLCF), a DOE Office of Science User Facility charged with helping researchers solve challenging scientific problems of global interest with a combination of leading high-performance computing (HPC) resources and international expertise in scientific computing. The NCCS provides resources for calculation and simulation in fields including astrophysics, materials science, and climate research to users from government, academia, and industry who have many of the largest computing problems in science. The OLCF's flagship supercomputer, the IBM AC922 Summit, is supported by advanced data management and analysis tools. The center hosted the Cray XK7 Titan system, one of the most powerful scientific tools of its time, from 2012 through its retirement in August 2019. The same year, construction began for Frontier, which is slated to debut as the OLCF's first exascale system in 2021.


History

On December 9, 1991, the High-Performance Computing Act (HPCA) of 1991, created by Senator Al Gore, was signed into law. The HPCA proposed a national information infrastructure to build communications networks and databases, and it called for proposals to build new high-performance computing facilities to serve science. Oak Ridge National Laboratory (ORNL) joined with three other national laboratories and seven universities to submit the Partnership in Computational Science (PICS) proposal to the US Department of Energy as part of the High-Performance Computing and Communications Initiative. On May 24, 1992, ORNL was awarded a high-performance computing research center, the Center for Computational Sciences (CCS), as part of the HPCA. ORNL also received a 66-processor, serial #1 Intel Paragon XP/S 5 for code development the same year; the system had a peak performance of 5 gigaflops (5 billion floating-point operations per second).

With the High-End Computing Revitalization Act of 2004, the CCS was tasked with carrying out the Leadership Computing Facility (LCF) project at ORNL, with the goal of developing and installing a petaflops-speed supercomputer by the end of 2008. The center officially changed its name from the Center for Computational Sciences to the NCCS the same year.

On December 9, 2019, Georgia Tourassi, who previously served as the director of ORNL's Health Data Sciences Institute and as group leader for ORNL's Biomedical Sciences, Engineering, and Computing Group, was appointed director of the NCCS, succeeding James Hack.


Previous Systems


Intel Paragons

The creation of the CCS in 1992 ushered in a series of Intel Paragon computers, including:

* Intel Paragon XP/S 5 (1992): The XP/S 5 provided 128 GP compute nodes arranged in a 16-row by 8-column rectangular mesh, consisting of one 8-by-8 group of 16 MB nodes and one 8-by-8 group of 32 MB nodes. Also available were four 128 MB MP compute nodes in a 2-row by 2-column mesh. In addition, there were a 128 MB MP boot node, four 32 MB GP service nodes, and six I/O nodes, five of which were connected to 4.8 GB RAID disks and the sixth to a 16 GB RAID disk. This provided a total of 40 GB of system disk space.
* Intel Paragon XP/S 35 (1992): The XP/S 35 provided 512 compute processors arranged in a 16-row by 32-column rectangular mesh. In addition, there were five service nodes and 27 I/O nodes, each connected to a 4.8 GB RAID disk, providing a total of 130 GB of system disk space. Each of the five service nodes and the 512 compute nodes had 32 MB of memory.
* Intel Paragon XP/S 150 (1995): The fastest computer in the world at the time of its delivery to ORNL, the XP/S 150 provided 1,024 nodes arranged in a 16-row by 64-column rectangular mesh. These were MP nodes, meaning two compute processors per node. Most of the nodes had 64 MB of memory, but 64 of the nodes had 128 MB. In addition, there were five service nodes and 127 I/O nodes (119 regular I/O nodes and 4 high-performance SCSI-16 I/O nodes), each connected to a 4.8 GB RAID disk, providing a total of 610 GB of system disk space.
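The quoted disk totals follow directly from the RAID counts above; a quick arithmetic check (all figures taken from the text):

```python
# Disk-space totals for the three Paragons, from the RAID counts quoted above.
xps5 = 5 * 4.8 + 16      # XP/S 5: five 4.8 GB RAIDs plus one 16 GB RAID
xps35 = 27 * 4.8         # XP/S 35: 27 I/O nodes, one 4.8 GB RAID each
xps150 = 127 * 4.8       # XP/S 150: 127 I/O nodes, one 4.8 GB RAID each

print(round(xps5), round(xps35), round(xps150))  # → 40 130 610 (GB)
```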


Eagle (2000–2005)

Eagle was a 184-node IBM RS/6000 SP operated by the Computer Science and Mathematics Division of ORNL. It had 176 Winterhawk-II "thin" nodes, each with four 375 MHz Power3-II processors and 2 GB of memory, and eight Winterhawk-II "wide" nodes, each with two 375 MHz Power3-II processors and 2 GB of memory, used as filesystem servers and for other infrastructure tasks. Eagle's estimated computational power was greater than 1 teraflop in the compute partition.


Falcon (2000)

Falcon was a 64-node Compaq AlphaServer SC operated by the CCS and acquired as part of an early-evaluation project. Each node had four 667 MHz Alpha EV67 processors and 2 GB of memory, with 2 TB of Fibre Channel disk attached, resulting in an estimated computational power of 342 gigaflops.


Cheetah (2001–2008)

Cheetah was a 4.5 TF IBM pSeries system operated by the CCS. The compute partition of Cheetah included 27 p690 nodes, each with thirty-two 1.3 GHz Power4 processors. The login and I/O partitions together included 8 p655 nodes, each with four 1.7 GHz Power4 processors. All nodes were connected via IBM's Federation interconnect.

The Power4 memory hierarchy consisted of three levels of cache. The first and second levels were on the Power4 chip (two processors per chip). The level-1 instruction cache was 128 KB (64 KB per processor) and the data cache was 64 KB (32 KB per processor). The level-2 cache was 1.5 MB, shared between the two processors. The level-3 cache was 32 MB and was off-chip. There were 16 chips per node, or 32 processors.

Most of Cheetah's compute nodes had 32 GB of memory; five had 64 GB, and two had 128 GB. Some of the nodes had approximately 160 GB of local disk space that could be used as temporary scratch space. In June 2002, Cheetah was ranked the eighth-fastest computer in the world by TOP500, the semiannual list of the world's top supercomputers.
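The per-chip cache sizes above can be rolled up to per-node totals; the aggregation below is ours, while the per-chip figures come from the text:

```python
# Cheetah p690 node: 16 Power4 chips, 2 processors per chip (figures from the text).
chips_per_node = 16
procs_per_chip = 2
assert chips_per_node * procs_per_chip == 32  # 32 processors per node

# Per-node cache totals (aggregation is ours; per-chip sizes are from the text)
l2_mb = chips_per_node * 1.5   # 1.5 MB shared L2 per chip
l3_mb = chips_per_node * 32    # 32 MB off-chip L3 per chip
print(l2_mb, l3_mb)            # → 24.0 512 (MB per node)
```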


Ram (2003–2007)

Ram was an SGI Altix supercomputer provided as a support system for the NCCS. Installed in 2003, Ram was used as a pre- and post-processing support system for allocated NCCS projects until 2007. Ram had 256 Intel Itanium 2 processors running at 1.5 GHz, each with 6 MB of L3 cache, 256 KB of L2 cache, and 32 KB of L1 cache. Ram had 8 GB of memory per processor, for a total of 2 TB of shared memory. By contrast, the first supercomputer at ORNL, the Cray X-MP installed in 1985, had one-millionth the memory of the SGI Altix.


Phoenix (OLCF-1) (2003–2008)

Phoenix was a Cray X1E provided as a primary system in the NCCS. The original X1 was installed in 2003 and went through several upgrades, arriving at its final configuration in 2005. From October 2005 until 2008, it provided almost 17 million processor-hours. The system supported more than 40 large projects in research areas including climate, combustion, high-energy physics, fusion, chemistry, computer science, materials science, and astrophysics.

In its final configuration, Phoenix had 1,024 multistreaming vector processors (MSPs). Each MSP had 2 MB of cache and a peak computation rate of 18 gigaflops. Four MSPs formed a node with 8 GB of shared memory. Memory bandwidth was very high, roughly half the cache bandwidth. The interconnect functioned as an extension of the memory system, offering each node direct access to memory on other nodes at high bandwidth and low latency.
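From the per-MSP figures above, the system-level totals can be sketched as follows (all inputs from the text; the roll-up is ours):

```python
# Phoenix at its final configuration (per-MSP figures from the text).
msps = 1_024
peak_tf = msps * 18 / 1_000           # 18 gigaflops per MSP
nodes = msps // 4                     # four MSPs per node
shared_mem_tb = nodes * 8 / 1_024     # 8 GB of shared memory per node
print(peak_tf, nodes, shared_mem_tb)  # → 18.432 256 2.0
```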


Jaguar (OLCF-2) (2005–2012)

Jaguar began as a 25-teraflop Cray XT3 in 2005. Later, it was upgraded to an XT4 containing 7,832 compute nodes, each with a quad-core AMD Opteron 1354 processor running at 2.1 GHz, 8 GB of DDR2-800 memory (some nodes used DDR2-667 memory), and a SeaStar2 router. The resulting partition contained 31,328 processing cores, more than 62 TB of memory, more than 600 TB of disk space, and a peak performance of 263 teraflops (263 trillion floating-point operations per second).

In 2008, Jaguar was upgraded to a Cray XT5 and became the first system to run a scientific application at a sustained petaflop. By the time of its transformation into Titan in 2012, Jaguar contained nearly 300,000 processing cores and had a theoretical peak performance of 3.3 petaflops. Jaguar had 224,256 x86-based AMD Opteron processor cores and ran a version of Linux called the Cray Linux Environment. From November 2009 until November 2010, Jaguar was the world's most powerful computer.
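The XT4-era core, memory, and peak-flops figures are mutually consistent, as a quick check shows. Note that the 4 flops/cycle/core figure below is our assumption about the quad-core Opteron, not something stated in the text:

```python
# Jaguar XT4 partition sanity check (node count, clock, and memory from the text).
nodes = 7_832
cores = nodes * 4                        # quad-core Opteron per node
peak_tf = cores * 2.1 * 4 / 1_000        # cores × GHz × flops/cycle (assumed: 4)
mem_tb = nodes * 8 / 1_000               # 8 GB per node, decimal TB

print(cores)                             # → 31328
print(round(peak_tf), round(mem_tb, 1))  # → 263 62.7
```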


Hawk (2006–2008)

Hawk was a 64-node Linux cluster dedicated to high-end visualization. Hawk was installed in 2006 and was used as the Center’s primary visualization cluster until May 2008 when it was replaced by a 512-core system named Lens. Each node contained two single-core Opteron processors and 2 GB of memory. The cluster was connected by a Quadrics Elan3 network, providing high-bandwidth and low-latency communication. The cluster was populated with two flavors of NVIDIA graphics cards connected with AGP8x: 5900 and QuadroFX 3000G. Nodes with 3000G cards were directly connected to the EVEREST PowerWall and were reserved for PowerWall use.


Ewok (2006–2011)

Ewok was an Intel-based InfiniBand cluster running Linux. The system was provided as an end-to-end resource for center users; it was used for workflow automation for jobs running on the Jaguar supercomputer and for advanced data analysis. The system contained 81 nodes, each with two 3.4 GHz Pentium IV processors, a 3.4 GHz Intel Xeon central processing unit (CPU), and 6 GB of memory. An additional node contained four dual-core AMD processors and 64 GB of memory. The system was configured with a 13 TB Lustre file system for scratch space.


Eugene (2008–2011)

Eugene was a 27-teraflop IBM Blue Gene/P system operated by the NCCS. It provided approximately 45 million processor-hours per year for ORNL staff and for the promotion of research collaborations between ORNL and its core university partner members. The system consisted of 2,048 quad-core 850 MHz IBM PowerPC 450d processors, with 2 GB of memory per node. Eugene had 64 I/O nodes, and each submitted job was required to use at least one I/O node, meaning each job consumed a minimum of 32 nodes per execution. Eugene was officially decommissioned in October 2011; however, on December 13 of the same year, a portion of Eugene's hardware was donated to the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory.


Eos (2013–2019)

Eos was a 736-node Cray XC30 cluster with a total of 47.104 TB of memory. Its processor was the Intel Xeon E5-2670, and it featured 16 I/O service nodes and 2 external login nodes. Its compute nodes were organized in blades, each containing 4 nodes. Every node had 2 sockets with 8 physical cores each; Intel's HyperThreading (HT) technology allowed each physical core to work as 2 logical cores, so each node could function as if it had 32 cores. In total, the Eos compute partition contained 11,776 traditional processor cores (23,552 logical cores with HT enabled). Eos provided a space for tool and application porting, small-scale jobs to prepare capability runs on Titan, and software generation, verification, and optimization.
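The physical and logical core counts above follow from the node layout:

```python
# Eos compute partition core counts (figures from the text).
nodes, sockets, cores_per_socket = 736, 2, 8
physical = nodes * sockets * cores_per_socket
logical = physical * 2            # HyperThreading: 2 logical per physical core
print(physical, logical)          # → 11776 23552
```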


Titan (OLCF-3) (2012–2019)

Titan was a hybrid-architecture Cray XK7 system with a theoretical peak performance exceeding 27,000 trillion calculations per second (27 petaflops). It combined 16-core AMD Opteron CPUs with NVIDIA Kepler graphics processing units (GPUs), a pairing that allowed Titan to achieve 10 times the speed and 5 times the energy efficiency of its predecessor, the Jaguar supercomputer, while using only modestly more energy and occupying the same physical footprint. Titan featured 18,688 compute nodes, a total system memory of 710 TB, and Cray's high-performance Gemini network. Its 299,008 CPU cores guided simulations while the accompanying GPUs handled hundreds of calculations simultaneously. The system provided decreased time to solution, increased complexity of models, and greater realism in simulations. In November 2012, Titan took the No. 1 position on the TOP500 list of supercomputers. After 7 years of service, Titan was decommissioned in August 2019 to make room for the Frontier supercomputer.
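The node count ties the core and memory totals together. The 32 GB + 6 GB per-node memory split below is our assumption (consistent with the stated 710 TB total), not something given in the text:

```python
# Titan node and core counts (node count and 16 cores/node from the text).
nodes = 18_688
print(nodes * 16)                # → 299008 Opteron cores
# Assumed split: 32 GB DDR3 per node plus 6 GB GDDR5 per Kepler GPU
print(nodes * (32 + 6) / 1_000)  # → 710.144 (≈710 TB total system memory)
```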


Current Systems


Spider

The OLCF's center-wide Lustre file system, called Spider, is the operational work file system for most OLCF computational resources. An extremely high-performance system, Spider serves over 20,000 clients, provides 32 PB of disk space, and can move data at more than 1 TB/s. Spider comprises two filesystems, Atlas1 and Atlas2, to provide high availability and to balance load across multiple metadata servers for increased performance.


HPSS

HPSS, ORNL's archival mass-storage resource, consists of tape and disk storage components, Linux servers, and High Performance Storage System (HPSS) software. Tape storage is provided by StorageTek SL8500 robotic tape libraries, each of which can hold up to 10,000 cartridges. Each library has 24 T10K-A drives, 60 T10K-B drives, 36 T10K-C drives, and 72 T10K-D drives.


EVEREST

EVEREST (Exploratory Visualization Environment for Research in Science and Technology) is a large-scale venue for data exploration and analysis. EVEREST measures 30 feet long by 8 feet tall, and its main feature is a 27-projector PowerWall with an aggregate count of 35 million pixels. The projectors are arranged in a 9×3 array, each providing 3,500 lumens for a very bright display. Displaying 11,520 by 3,072 pixels, the wall offers a tremendous amount of visual detail. The wall is integrated with the rest of the computing center, creating a high-bandwidth data path between large-scale high-performance computing and large-scale data visualization.

EVEREST is controlled by a 14-node cluster. Each node contains four dual-core AMD Opteron processors and an NVIDIA QuadroFX 3000G graphics card connected to the projectors, providing a very-high-throughput visualization capability. The visualization lab acts as an experimental facility for the development of future visualization capabilities. It houses a 12-panel tiled LCD display, test cluster nodes, interaction devices, and video equipment.
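The PowerWall geometry above is self-consistent: the 9×3 projector array and the total wall resolution imply a per-projector resolution, as this quick check shows:

```python
# PowerWall geometry (wall resolution and projector grid from the text).
wall_w, wall_h = 11_520, 3_072
cols, rows = 9, 3
print(wall_w * wall_h)                 # → 35389440 (≈35 million pixels)
print(wall_w // cols, wall_h // rows)  # → 1280 1024 (per-projector resolution)
```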


Rhea

Rhea is a 521-node, commodity-type Linux cluster. Rhea provides a conduit for large-scale scientific discovery via pre- and post-processing of simulation data generated on the Titan supercomputer. Each of Rhea's first 512 nodes contains two 8-core 2.0 GHz Intel Xeon processors with Intel's HT technology and 128 GB of main memory. Rhea also has nine large-memory GPU nodes, each with 1 TB of main memory, two 14-core 2.30 GHz Intel Xeon processors with HT technology, and two NVIDIA K80 GPUs. Rhea is connected to the OLCF's high-performance Lustre filesystem, Atlas.


Wombat

Wombat is a single-rack cluster from HPE based on the 64-bit ARM architecture instead of traditional x86-based architecture. This system is available to support computer science research projects aimed at exploring the ARM architecture. The Wombat cluster has 16 compute nodes, four of which have two AMD GPU accelerators attached (eight GPUs total in the system). Each compute node has two 28-core Cavium ThunderX2 processors, 256 GB RAM (16 DDR4 DIMMs) and a 480 GB SSD for node-local storage. Nodes are connected with EDR InfiniBand (~100 Gbit/s).


Summit (OLCF-4)

The IBM AC922 Summit, or OLCF-4, is ORNL's 200-petaflop flagship supercomputer. Summit was originally launched in June 2018 and, as of the November 2019 TOP500 list, is the fastest computer in the world, with a High Performance Linpack (HPL) performance of 148.6 petaflops. Summit is also the first computer to reach exascale performance, achieving a peak throughput of 1.88 exaops through a mixture of single- and half-precision floating-point operations. Like its predecessor Titan, Summit makes use of a hybrid architecture that integrates its 9,216 IBM Power9 CPUs and 27,648 NVIDIA Volta V100 GPUs using NVIDIA's NVLink. Summit features 4,608 nodes (nearly a quarter of Titan's 18,688 nodes), each with 512 GB of Double Data Rate 4 (DDR4) memory and 96 GB of High Bandwidth Memory (HBM2), and a total storage capacity of 250 petabytes.
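The CPU and GPU totals above follow from the node count, assuming the well-known AC922 layout of 2 CPUs and 6 GPUs per node (the per-node split is implied by the totals rather than stated directly):

```python
# Summit CPU/GPU counts (node count from the text; 2 CPUs and 6 GPUs per node).
nodes = 4_608
print(nodes * 2)   # → 9216 Power9 CPUs
print(nodes * 6)   # → 27648 V100 GPUs
```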


Frontier (OLCF-5)

Scheduled for delivery in 2021, with user access becoming available the following year, Frontier will be ORNL's first sustained-exascale system, meaning it will be capable of performing more than one quintillion (one billion billion) operations per second. The system will be composed of more than 100 Cray Shasta cabinets with an anticipated peak performance of around 1.5 exaflops.


Research areas

* Biology – With OLCF supercomputing resources, researchers can use knowledge of the molecular scale to develop new drugs and medical therapies, study complex biological systems, and model gene regulation.
* Chemistry – Supercomputers like Summit can explore the intricacies of matter at the atomic level, allowing for first-principles discoveries and detailed molecular models.
* Computer Science – Researchers are developing the tools necessary to evaluate a range of supercomputing systems, with the goals of discovering how best to use each, how to find the best fit for any given application, and how to tailor applications to get the best performance.
* Earth Science – High-performance computing allows for large-scale computation of complex environmental and geographical systems, and NCCS researchers use this information to better understand the changes in Earth's climate brought on by global warming.
* Engineering – OLCF resources like Summit are being used for engineering applications such as simulations of gas turbines and combustion engines.
* Fusion – Understanding the behavior of fusion plasmas and simulating various device aspects gives researchers insight into the construction of ITER, a prototype fusion power plant.
* Materials Science – Research into materials science at ORNL aims to improve various areas of modern life, from power generation and transmission to transportation to the production of faster, smaller, more versatile computers and storage devices.
* Nuclear Energy – The development of new nuclear reactors that employ advanced fuel cycles and adhere to modern safety and nonproliferation constraints requires complex modeling and simulation; often, the complexity of these simulations necessitates the use of supercomputers to ensure the accuracy of the models.
* Physics – Physicists use NCCS's high-performance computing power to reveal the fundamental nature of matter, including the behavior of quarks, electrons, and other fundamental particles that make up atoms.



External links

* The Oak Ridge Leadership Computing Facility site
* The website for Oak Ridge National Laboratory