Cray-3

The Cray-3 was a vector supercomputer, designed by Seymour Cray as the successor to the Cray-2. The system was one of the first major applications of gallium arsenide (GaAs) semiconductors in computing, using hundreds of custom-built ICs packed into the CPU. The design goal was performance around 16 GFLOPS, about 12 times that of the Cray-2.

Work started on the Cray-3 in 1988 at Cray Research's (CRI) development labs in Chippewa Falls, Wisconsin. Other teams at the lab were working on designs with similar performance. To focus the teams, the Cray-3 effort was moved to a new lab in Colorado Springs, Colorado, later that year. Shortly thereafter, the corporate headquarters in Minneapolis decided to end work on the Cray-3 in favor of another design, the Cray C90. In 1989 the Cray-3 effort was spun off to a newly formed company, Cray Computer Corporation (CCC).

The launch customer, Lawrence Livermore National Laboratory, cancelled its order in 1991, and a number of company executives left shortly thereafter. The first machine was finally ready in 1993, but with no launch customer, it was instead loaned as a demonstration unit to the nearby National Center for Atmospheric Research (NCAR) in Boulder. The company went bankrupt in March 1995, and the machine was officially decommissioned.

With the delivery of the first Cray-3, Seymour Cray moved on to the Cray-4 design, but the company went bankrupt before it was completely tested. The Cray-3 was Cray's last completed design; after CCC's bankruptcy he formed SRC Computers to concentrate on parallel designs, but died in a car accident in 1996 before this work was delivered.


History


Background

Seymour Cray began the design of the Cray-3 in 1985, as soon as the Cray-2 reached production. Cray generally set himself the goal of producing new machines with ten times the performance of the previous models. Although the machines did not always meet this goal, this was a useful technique in defining the project and clarifying what sort of process improvements would be needed to meet it. For the Cray-3, he decided to set an even higher performance improvement goal, an increase of 12x over the Cray-2.

Cray had always attacked the problem of increased speed with three simultaneous advances: more execution units to give the system higher parallelism, tighter packaging to decrease signal delays, and faster components to allow for a higher clock speed. Of the three, Cray was normally least aggressive on the last; his designs tended to use components that were already in widespread use, as opposed to leading-edge designs.

For the Cray-2, he introduced a novel 3D-packaging system for its integrated circuits to allow higher densities, and it appeared that there was some room for improvement in this process. For the new design, he stated that all wires would be limited to a maximum length of . This would demand the processor be able to fit into a block, about that of the Cray-2 CPU. This would not only increase performance but make the system 27 times smaller.

For a 12x performance increase, the packaging alone would not be enough; the circuits on the chips themselves would also have to speed up. The Cray-2 appeared to be pushing the limits of the speed of silicon-based transistors at 4.1 ns (244 MHz), and it did not appear that anything more than another 2x would be possible. If the goal of 12x was to be met, more radical changes would be needed, and a "high tech" approach would have to be used.

Cray had intended to use gallium arsenide circuitry in the Cray-2, which would not only offer much higher switching speeds but also use less energy and thus run cooler. At the time the Cray-2 was being designed, the state of GaAs manufacturing simply was not up to the task of supplying a supercomputer. By the mid-1980s, things had changed and Cray decided it was the only way forward. Given a lack of investment on the part of large chip makers, Cray decided to invest in a GaAs chipmaking startup, GigaBit Logic, and use it as an internal supplier.

Describing the system in November 1988, Cray stated that the 12 times performance increase would be made up of a three times increase due to GaAs circuits, and four times due to the use of more processors. One of the problems with the Cray-2 had been poor multiprocessing performance due to limited bandwidth between the processors, and to address this the Cray-3 would adopt the much faster architecture used in the Cray Y-MP. This would provide a design performance of 8000 MIPS, or 16 GFLOPS.
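The arithmetic behind two of the figures in this section can be checked with a short Python sketch. It is only an illustration: the function and variable names are invented here, and the only inputs are the 4.1 ns cycle time and the three-times and four-times factors quoted above.

```python
def clock_mhz(cycle_time_ns: float) -> float:
    """Convert a clock cycle time in nanoseconds to a frequency in MHz."""
    return 1_000.0 / cycle_time_ns


# Cray-2 cycle time quoted in the text.
cray2_mhz = clock_mhz(4.1)

# Cray's stated breakdown of the 12x goal (November 1988).
gaas_speedup = 3        # faster GaAs circuits
processor_speedup = 4   # more processors
target_speedup = gaas_speedup * processor_speedup

print(f"Cray-2 clock: {cray2_mhz:.0f} MHz")
print(f"Target speedup over the Cray-2: {target_speedup}x")
```

The two factors multiply rather than add because each processor runs faster and there are also more of them.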


Development

The Cray-3 was originally slated for delivery in 1991. This was during a time when growth in the supercomputer market was slowing rapidly, from 50% annually in 1980 to 10% in 1988. At the same time, Cray Research was also working on the Y-MP, a faster multi-processor version of the system architecture tracing its ancestry to the original Cray-1. In order to focus the Y-MP and Cray-3 groups, and with Cray's personal support, the Cray-3 project moved to a new research center in Colorado Springs.

By 1989, the Y-MP was starting deliveries, and the main CRI lab in Chippewa Falls, Wisconsin, moved on to the C90, a further improvement in the Y-MP series. With only 25 Cray-2s sold, management decided that the Cray-3 should be put on "low priority" development. In November 1988, the Colorado Springs lab was spun off as Cray Computer Corporation (CCC). As CRI retained the lease on the original building, the new company had to move once again, introducing further delays.

By 1991, development was behind schedule. Development slowed even more when Lawrence Livermore National Laboratory cancelled its order for the first machine, in favor of the C90. Several executives, including the CEO, left the company. The company then announced it would be looking for a customer that needed a smaller version of the machine, with four to eight processors.

The first (and only) production model (serial number S5, named ''Graywolf'') was loaned to NCAR as a demonstration system in May 1993. NCAR's version was configured with four processors and a 128 MWord (64-bit words, 1 GB) common memory. In service, the static RAM proved to be problematic. It was also discovered that the square root code contained a bug that resulted in 1 in 60 million calculations being wrong. Additionally, one of the four CPUs was not running reliably.

CCC declared bankruptcy in March 1995, after spending about $300 million of financing. NCAR's machine was officially decommissioned the next day.

Seven system cabinets, or "tanks", serial numbers S1 to S7, were built for Cray-3 machines. Most were for smaller two-CPU machines. Three of the smaller tanks were used on the Cray-4 project, essentially a Cray-3 with 64 faster CPUs running at 1 ns (1 GHz) and packed into an even smaller space. Another was used for the Cray-3/SSS project.

The failure of the Cray-3 was in large part due to the changing political and technical climate. The machine was being designed during the collapse of the Warsaw Pact and the end of the Cold War, which led to a massive downsizing in supercomputer purchases. At the same time, the market was increasingly investing in massively parallel (MP or MPP) designs. Cray was critical of this approach, and was quoted by ''The Wall Street Journal'' as saying that MPP systems had not yet proven their supremacy over vector computers, noting the difficulty many users have had programming for large parallel machines. "I don't think they'll ever be universally successful, at least not in my lifetime".


Architecture


Logical design

The Cray-3 system architecture comprised a ''foreground processing system'', up to 16 ''background processors'' and up to 2 gigawords (16 GB) of ''common memory''. The foreground system was dedicated to input/output and system management. It included a 32-bit processor and four synchronous data channels for mass storage and network devices, primarily via HiPPI channels.

Each background processor consisted of a ''computation section'', a ''control section'' and ''local memory''. The computation section performed 64-bit scalar, floating point and vector arithmetic. The control section provided instruction buffers, memory management functions, and a real-time clock. 16 kwords (128 kbytes) of high-speed local memory was incorporated into each background processor for use as temporary scratch memory.

Common memory consisted of silicon CMOS SRAM, organized into ''octants'' of 64 banks each, with up to eight octants possible. The word size was 64 bits plus eight error-correction bits, and total memory bandwidth was rated at 128 gigabytes per second.
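The capacity figures in this section follow from the 64-bit word size, as the Python sketch below shows. It rests on two assumptions not stated in the text: that the kilo and giga prefixes are binary (powers of two), and that the 128 GB/s bandwidth rating applies to a full 16-processor system. All names are illustrative.

```python
# Assumption: binary prefixes; the text does not say which convention it uses.
KILO, GIGA = 2**10, 2**30

DATA_BITS = 64   # data bits per word (from the text)
ECC_BITS = 8     # error-correction bits per word (from the text)
BYTES_PER_WORD = DATA_BITS // 8


def words_to_bytes(words: int) -> int:
    """Usable capacity in bytes, counting data bits only (no ECC)."""
    return words * BYTES_PER_WORD


local_memory_kbytes = words_to_bytes(16 * KILO) // KILO   # per background processor
max_common_gbytes = words_to_bytes(2 * GIGA) // GIGA      # largest configuration
total_banks = 8 * 64                                      # eight octants of 64 banks
stored_bits_per_word = DATA_BITS + ECC_BITS

# Assumption: the 128 GB/s rating refers to a full 16-processor system.
bandwidth_per_processor_gbs = 128 / 16

print(f"Local memory: {local_memory_kbytes} kbytes per processor")
print(f"Maximum common memory: {max_common_gbytes} GB in {total_banks} banks")
print(f"Stored bits per word: {stored_bits_per_word}")
print(f"Bandwidth per processor: {bandwidth_per_processor_gbs:.0f} GB/s")
```

Under the second assumption, the per-processor share comes to 8 GB/s, which agrees with the burst rate quoted for each processor in the CPU design section.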


CPU design

As with previous designs, the core of the Cray-3 consisted of a number of modules, each containing several circuit boards packed with parts. In order to increase density, the individual GaAs chips were not packaged, and instead several were mounted directly with ultrasonic gold bonding to a board approximately square. The boards were then turned over and mated to a second board carrying the electrical wiring, with wires on this card running through holes to the "bottom" (opposite the chips) side of the chip carrier where they were bonded, hence sandwiching the chip between the two layers of board. These ''submodules'' were then stacked four-deep and, as in the Cray-2, wired to each other to make a 3D circuit.

Unlike the Cray-2, the Cray-3 modules also included edge connectors. Sixteen such submodules were connected together in a 4×4 array to make a single module measuring . Even with this advanced packaging the circuit density was low even by 1990s standards, at about 96,000 gates per cubic inch. Modern CPUs offer gate counts of millions per square inch, and the move to 3D circuits was still just being considered.

Thirty-two such modules were then stacked and wired together with a mass of twisted-pair wires into a single processor. The basic cycle time was 2.11 ns, or 474 MHz, allowing each processor to reach about 0.948 GFLOPS, and a 16-processor machine a theoretical 15.17 GFLOPS. Key to the high performance was the high-speed access to main memory, which allowed each processor to burst up to 8 GB/s.
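The peak-performance figures above can be reproduced from the cycle time alone, as in the Python sketch below. The value of two floating-point results per clock cycle is an inference from the quoted numbers (0.948 GFLOPS at 474 MHz), not something the text states, and the names are illustrative.

```python
CYCLE_TIME_NS = 2.11   # basic cycle time (from the text)
PROCESSORS = 16        # largest configuration (from the text)

# Assumption: two floating-point results per cycle, inferred from
# 0.948 GFLOPS per processor at a 474 MHz clock.
FLOPS_PER_CYCLE = 2

clock_ghz = 1.0 / CYCLE_TIME_NS
per_processor_gflops = clock_ghz * FLOPS_PER_CYCLE
peak_system_gflops = per_processor_gflops * PROCESSORS

print(f"Clock: {clock_ghz * 1000:.0f} MHz")
print(f"Per processor: {per_processor_gflops:.3f} GFLOPS")
print(f"{PROCESSORS} processors: {peak_system_gflops:.2f} GFLOPS")
```

These are theoretical peaks: they assume every processor delivers a result on every cycle, which real workloads seldom sustain.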


Mechanical design

The modules were held together in an aluminum chassis known as a "brick". The bricks were immersed in liquid Fluorinert for cooling, as in the Cray-2. A four-processor system with 64 memory modules dissipated about 88 kW of power. The entire four-processor system was about tall and front-to-back, and a little over wide.

For systems with up to four processors, the processor assembly sat under a translucent bronzed acrylic cover at the top of a cabinet wide, deep and high, with the memory below it, and then the power supplies and cooling systems on the bottom. Eight- and 16-processor systems would have been housed in a larger octagonal cabinet. All in all, the Cray-3 was considerably smaller than the Cray-2, itself relatively small compared to other supercomputers.

In addition to the system cabinet, a Cray-3 system also needed one or two (depending on number of processors) ''system control pods'' (or "C-Pods"), square and high, containing power and cooling control equipment.


System configurations

The following possible Cray-3 configurations were officially specified:


Software

The Cray-3 ran the Colorado Springs Operating System (''CSOS''), which was based upon Cray Research's UNICOS operating system version 5.0. A major difference between CSOS and UNICOS was that CSOS was ported to standard C with all PCC extensions that were used in UNICOS removed. Much of the software available under the Cray-3 was derived from Cray Research and included, for instance, the X Window System, vectorizing FORTRAN and C compilers, NFS and a TCP/IP stack.




External links


Digibarn's Cray-3 Modules



Cray-2 and -3 instruction sets (archived)