S-1, short for Stanford-1, was a supercomputer designed at Lawrence Livermore National Laboratory (LLNL) by Lowell Wood's "O-group" beginning in 1975. It was developed primarily by the engineering department at Stanford University, while the MIT AI Lab designed its Amber operating system. Funding was provided by the US Navy.

The basic design used a "uniprocessor" core that could be connected to others in a multiprocessor configuration. Early designs supported up to sixteen uniprocessors connected through a crossbar switch to sixteen memory banks of up to 1 GiB each. The uniprocessors also had cache memory to reduce the number of trips through the switch, and there was a separate system for quickly passing small amounts of data between the processors.

The immediate goal was to produce a single-processor machine with the performance of the CDC 7600 at much lower cost. This would be quickly followed by one with 16 faster processors, each with the performance of the Cray-1, for an aggregate performance about 10 times that of the Cray. Process shrinks would follow, culminating in the Mark V design, planned for 1985, which would be a "supercomputer on a wafer".

The single-processor Mark I was completed in 1978, but the follow-up Mark IIA multiprocessor was repeatedly delayed until around 1985. The IIA proved highly unreliable and was abandoned after about a year of use. None of the later generations of the design were built, and the S-1 project ended in 1988. The only lasting legacy of the project was the CAD program used to design it, SCALD, which became a successful third-party product.
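The crossbar arrangement of sixteen processors and sixteen memory banks can be illustrated with a toy model of one switch cycle. This is a sketch, not the S-1's actual switch logic: it assumes each processor issues at most one request per cycle and that each bank uses a hypothetical fixed-priority arbiter (the lowest-numbered requester wins). The function name `crossbar_cycle` is likewise hypothetical. What it shows is the defining property of a crossbar: processors addressing different banks are served simultaneously, and only requests for the same bank conflict.

```python
from typing import Dict, List, Optional

NUM_PROCESSORS = 16  # up to sixteen uniprocessors in the early design
NUM_BANKS = 16       # sixteen memory banks

def crossbar_cycle(requests: List[Optional[int]]) -> Dict[int, int]:
    """Grant each memory bank to at most one processor for one cycle.

    requests[p] is the bank that processor p wants, or None if idle.
    Returns a mapping of bank -> granted processor. The arbitration
    rule (lowest-numbered requester wins) is a hypothetical choice.
    """
    if len(requests) != NUM_PROCESSORS:
        raise ValueError("expected one request slot per processor")
    grants: Dict[int, int] = {}
    for proc, bank in enumerate(requests):
        if bank is None:
            continue
        if not 0 <= bank < NUM_BANKS:
            raise ValueError(f"bank {bank} out of range")
        # Every bank has its own crosspoints, so only requests for the
        # same bank conflict; different banks are served simultaneously.
        grants.setdefault(bank, proc)
    return grants

# All sixteen processors address distinct banks: all are served at once.
print(len(crossbar_cycle(list(range(16)))))     # 16
# Three processors contend for bank 0: only processor 0 is granted it.
print(crossbar_cycle([0, 0, 0] + [None] * 13))  # {0: 0}
```

In this model a processor that loses arbitration would retry on a later cycle. The per-processor caches mentioned in the text reduce how often that happens, because every cache hit avoids a trip through the switch.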

History

Background

Lowell Wood was a physicist at LLNL and a protégé of Edward Teller. Wood ran the speculative "O-group" within the lab, which was not tied specifically to weapons design. In the early 1970s, Wood noticed that other branches of the military were not using computers in places where he felt they could be useful. In particular, he noticed that the SOSUS system was gathering far more information than the Navy had the ability to process. A computer capable of processing the data cost around $10 million, which was too expensive to be practical in an era of tightening military budgets.

Wood's concern with contemporary designs was that they were built in a way that made them difficult to adapt to the latest advances in chip fabrication. He proposed a concept specifically intended to take advantage of these advances as they became available. The idea was to use a simplified CPU design that could first be implemented with medium-scale integration (MSI) chips, then move to large-scale integration (LSI), and finally to a single-chip implementation. Ultimately, the goal was to combine a number of these single-chip processors, along with memory, onto a single wafer to produce a wafer-scale integration supercomputer.

Wood was an interviewer for the Hertz Foundation scholarships, which put him in touch with many of the brightest students in many fields, and he often offered summer jobs at Livermore to Hertz applicants in relevant fields. Through this process, Wood hired Curt Widdoes in 1973. In 1975, Widdoes was using his Hertz scholarship to pay his way through a Ph.D. program in computer science, in which he was working on the Minerva multiprocessor system. During the summer of 1975, Tom McWilliams was also given a summer job at the lab. Wood called Widdoes, introduced the two, and convinced them to begin designing a supercomputer. Wood then sold the concept to the Navy, which funded development.

The Livermore Computer Center, the organization within Livermore that provided computing support, was openly hostile to Wood's S-1. The Center had been leading the development of supercomputer systems with a variety of commercial vendors, notably Control Data Corporation (CDC) and Cray, and had recently been involved in the aborted development of the Westinghouse SOLOMON, the first massively parallel computer. When SOLOMON ended, the Center funded development of the CDC Star-100, one of the first vector computers.

SCALD

Widdoes had used SUDS (Stanford University Drawing System) while working on the Minerva design. SUDS allowed a user to lay out individual chips on-screen and then provide a list of connections between them, which it would then plot as a complete schematic diagram.

For the S-1 project, Widdoes expanded on this system to produce SCALD, short for Structured Computer-Aided Logic Design. The basic idea behind SCALD was to build SUDS models out of other SUDS models in a programmed fashion. For instance, an individual adder circuit in a design might consist of five LSI chips connected together with various wires, and the uniprocessor's arithmetic logic unit (ALU) might contain 36 of these adders. Using SCALD, the adder could be designed once and imported 36 times into the ALU design, and the ALU could then be imported into the machine's overall design. In this way, the design could be worked on hierarchically, and corrections to a macro would propagate up through the entire design. The system later added the ability to take the designs and produce a list of instructions for an automated wire wrap machine, allowing SCALD to be used from initial design through to actual physical boards.

During the early design stages, Widdoes and McWilliams had to use borrowed time on the DEC PDP-10 at the Stanford Artificial Intelligence Laboratory (SAIL). They were given time by John McCarthy, who allowed them to use it from 5 AM until the "real users" showed up at 9. They wrote the system in Pascal, with the aim of allowing it to be ported to other machines such as the IBM 370. At the time, only Pascal and FORTRAN were highly portable, and FORTRAN was deemed unsuitable.
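The hierarchical reuse that SCALD introduced can be sketched in a few lines. This is not SCALD's data format or its Pascal implementation. It is a hypothetical Python model in which a `Macro` (an illustrative name) counts its own primitive chips plus those of the sub-macros it instantiates. It uses the figures from the text: a five-chip adder imported 36 times into the ALU, which is then imported into the overall design. Because the hierarchy is expanded on demand, editing the adder's definition changes every level that uses it.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Macro:
    """Hypothetical design unit: primitive chips plus sub-macro instances."""
    name: str
    chips: int = 0  # primitive chips placed directly in this macro
    instances: List["Macro"] = field(default_factory=list)

    def total_chips(self) -> int:
        # Expand the hierarchy each time it is queried, so an edit to a
        # sub-macro is reflected everywhere that macro is instantiated.
        return self.chips + sum(m.total_chips() for m in self.instances)

adder = Macro("adder", chips=5)             # designed once
alu = Macro("alu", instances=[adder] * 36)  # imported 36 times
cpu = Macro("cpu", instances=[alu])         # ALU placed in the overall design

print(cpu.total_chips())  # 180

# A correction to the adder (here, a hypothetical extra chip) propagates upward.
adder.chips = 6
print(cpu.total_chips())  # 216
```

All 36 ALU entries refer to the same `adder` object, which is what lets a single correction reach every copy.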

Mark I

The first-generation S-1, the Mark I, was completed in 1978. The core logic was deliberately based on the PDP-10, using an expanded version of the PDP-10 instruction set. Like the PDP-10, the Mark I used a 36-bit word length with 9-bit bytes. The FPU supported three formats: 18-bit halfword, 36-bit single word, and 72-bit double word. There were 32 registers, mapped onto the first 32 words of memory, and register 3 (R3) was the program counter.

Because the Mark I lacked support for virtual memory address translation, a key concept was the use of 18-bit relative pointers, which gave an offset from the current address. These could be used as pointers into blocks of memory whose base address might move but whose relative addresses would not.

The CPU included a simple branch prediction unit, which allows the CPU to guess which side of a branch will be taken and begin processing those instructions before the test has completed. Branch prediction is a key concept in processor pipelining, because it keeps the pipeline busy across branches and so adds parallelism to the CPU. The Mark I's system was based on two status bits, the "prediction bit" and the "dynamic reverse bit". The reverse bit was set if the branch prediction failed, that is, if the CPU guessed the wrong outcome. If this occurred a second time, the reverse bit would already be set, in which case it was cleared and the prediction bit was changed to match the new outcome. Each misprediction caused the processor to flush the pipeline.

The system also relied on caches that were large for the era to reduce the need to fetch from main memory. Each processor had a 4 kW (kiloword) data cache and a separate 4 kW instruction cache. In a multiprocessor machine, all the processors shared access to a common main memory of up to 16 GB, although any one processor could access only one 1 GB bank at a time. In contrast to the other high-performance machines emerging at the same time, like the Star-100 and Cray-1, the S-1 did not have an explicit vector processor and lacked a scatter/gather system. Its arithmetic was entirely scalar, although it did include several unusual hardware implementations such as single-instruction trigonometry, matrix transpose, and even a Fourier transform.

The processor was constructed using chips from the ECL-10k family of emitter-coupled logic (ECL) parts. Although the system was designed to have as many as 16 processors, the Mark I was built with only one. This required 5,300 chips, arranged on twelve boards, each about 18 by 24 inches. The boards were mounted on either side of three vertical cabinets, known as "pages". The pages were hinged along the back so they could be closed up book-like into a relatively compact form, while still being able to be opened for servicing.

Like the CDC designs, the S-1 used I/O processors (IOPs, or channel controllers) to handle input and output. As most devices worked on 8-bit bytes, the IOPs also handled translation between the 9-bit and 8-bit formats, as well as dealing with endianness. The original IOP was simply a custom-programmed DEC PDP-11 connected to the S-1 using the Unibus.

The goal was to have the single-CPU machine match the performance of the CDC 7600, the fastest machine available at the time. Benchmarks placed it at about  of that, around 10 MIPS. This was respectable performance for a machine of its size, and especially its cost. Although it did not meet its original performance goal, it did meet the goal of producing a useful high-performance machine for much less cost, and was a success in that regard.
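The 18-bit relative pointers described for the Mark I can be illustrated with a small model. This is a sketch under stated assumptions, not the Mark I's actual encoding. Memory is modelled as a Python list of words. The offset is treated as a two's-complement signed value, measured from the address of the word holding the pointer. The helper names `store_rel`, `load_rel` and `move_block` are hypothetical. Because each offset is relative to the pointer's own location, copying a whole block to a new base address leaves the pointers inside it valid.

```python
from typing import List

OFFSET_BITS = 18
MIN_OFF = -(1 << (OFFSET_BITS - 1))     # -131072 (assumed signed range)
MAX_OFF = (1 << (OFFSET_BITS - 1)) - 1  # 131071

def store_rel(mem: List[int], ptr_addr: int, target_addr: int) -> None:
    """Store a pointer at ptr_addr as an offset from ptr_addr itself."""
    off = target_addr - ptr_addr
    if not MIN_OFF <= off <= MAX_OFF:
        raise ValueError("target out of 18-bit relative range")
    mem[ptr_addr] = off & ((1 << OFFSET_BITS) - 1)  # keep the low 18 bits

def load_rel(mem: List[int], ptr_addr: int) -> int:
    """Resolve the relative pointer at ptr_addr to an absolute address."""
    raw = mem[ptr_addr]
    # Sign-extend the 18-bit field.
    off = raw - (1 << OFFSET_BITS) if raw & (1 << (OFFSET_BITS - 1)) else raw
    return ptr_addr + off

def move_block(mem: List[int], src: int, dst: int, length: int) -> None:
    """Copy a block verbatim; relative pointers inside need no fix-up."""
    mem[dst:dst + length] = mem[src:src + length]

mem = [0] * 64
store_rel(mem, 10, 13)       # word 10 points at word 13 (offset +3)
mem[13] = 42
move_block(mem, 10, 40, 4)   # relocate the 4-word block to base 40
print(mem[load_rel(mem, 40)])  # 42: the copied pointer now finds word 43
```

An absolute pointer stored in the same block would still refer to word 13 after the move and would need to be patched. The relative form avoids that patching, which matters on a machine without address translation.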
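The Mark I's two-bit branch prediction scheme can be modelled as a small per-branch state machine. This is a hedged sketch, not the Mark I's logic. The text describes only what happens on mispredictions, so the sketch assumes that a correct prediction leaves both bits unchanged. It also reads the second-miss update as switching the prediction to the direction actually taken. The class and method names are hypothetical.

```python
class BranchPredictor:
    """Per-branch state: a prediction bit and a 'dynamic reverse' bit."""

    def __init__(self, predict_taken: bool = False) -> None:
        self.prediction = predict_taken  # True means "predict taken"
        self.reverse = False             # set after one misprediction

    def predict(self) -> bool:
        return self.prediction

    def update(self, taken: bool) -> bool:
        """Record the real outcome; return True if the pipeline is flushed."""
        if taken == self.prediction:
            return False                 # assumed: correct guesses change nothing
        if not self.reverse:
            self.reverse = True          # first miss: remember it, keep the guess
        else:
            self.reverse = False         # second miss: clear the reverse bit
            self.prediction = taken      # and switch the prediction
        return True                      # every miss flushes the pipeline

bp = BranchPredictor(predict_taken=True)
flushes = [bp.update(taken=False) for _ in range(3)]
print(flushes, bp.predict())  # [True, True, False] False
```

Under this reading, a branch must mispredict twice before its predicted direction changes, so a single unusual outcome costs one pipeline flush but does not overturn the prediction.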
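The IOPs' translation between 9-bit and 8-bit bytes can be sketched as follows. The text does not give the conversion rules, so everything here is an assumption:

- A 36-bit word is treated as four 9-bit bytes, most significant first.
- A 9-bit byte is passed to an 8-bit device only if its top bit is zero.
- The endianness option simply reverses the byte order within each word.

The function names are hypothetical.

```python
from typing import List

def word_to_9bit_bytes(word: int) -> List[int]:
    """Split a 36-bit word into four 9-bit bytes, most significant first."""
    if not 0 <= word < (1 << 36):
        raise ValueError("not a 36-bit word")
    return [(word >> shift) & 0x1FF for shift in (27, 18, 9, 0)]

def words_to_octets(words: List[int], little_endian: bool = False) -> bytes:
    """Translate 36-bit words into 8-bit bytes for an external device."""
    out = bytearray()
    for word in words:
        nines = word_to_9bit_bytes(word)
        if any(b > 0xFF for b in nines):
            # Assumed policy: refuse bytes that would lose their ninth bit.
            raise ValueError("9-bit byte does not fit in 8 bits")
        out.extend(reversed(nines) if little_endian else nines)
    return bytes(out)

def octets_to_words(data: bytes, little_endian: bool = False) -> List[int]:
    """Reverse translation: pack each group of four octets into a word."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4")
    words = []
    for i in range(0, len(data), 4):
        group = data[i:i + 4]
        if little_endian:
            group = group[::-1]
        word = 0
        for b in group:
            word = (word << 9) | b  # each octet occupies one 9-bit slot
        words.append(word)
    return words

print(words_to_octets(octets_to_words(b"S-1!")))  # b'S-1!'
```

The round trip is lossless only for 9-bit bytes whose top bit is clear. A real IOP would need some policy for the other case, and this sketch simply rejects it.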

Mark IIA

Work began on the second-generation machine, the Mark II, in the fall of 1978, with delivery of a working multiprocessor version expected some time in 1983. The Mark IIA was designed using an updated version of SCALD running on the Mark I. This version added the Timing Verifier module, which could simulate signals travelling through the circuitry to ensure there was enough time for the inputs to become valid. It also included a fully automated wire wrap system. This was built because manual wiring left bits of metal in the dense mats of twisted-pair wires, and these would sometimes pierce the insulation on surrounding wires.

The main team on the Mark II started with Widdoes and McWilliams, who were later joined by Mike Farmwald and Jeff Rubin. Including the technicians who actually built the machine, the team numbered about 20 to 30 people in total. The logic design itself was done by the team of four and took two and a half years to complete. During this process, Widdoes and McWilliams left the project to commercialize SCALD at their new company, Valid Logic Systems.

One simple change in the Mark II was the move from ECL-10k to ECL-100k chips, whose higher clock speeds raised performance to 15 MIPS. The design also included a 4 kW instruction cache and a 16k data cache to improve memory performance. The main change was a greatly expanded floating-point unit (FPU) with support for transcendental functions, as well as the first implementation of vector instructions. These included a hardware implementation of a fast Fourier transform (FFT) solver. The vector instructions were memory-memory, like the Star-100's but unlike the Cray-1's register-register architecture. The machine also lacked a scatter/gather unit, so it worked best on large contiguous data sets. This greatly expanded uniprocessor required 64 boards, with a total of 25,000 ECL-100k chips.

The effort was continually pushed back, missing several completion dates announced at conferences or in reports. The machine did not become fully functional until 1985, and then only in single-processor form. While it did meet its goal of matching the performance of the Cray-1, much faster computers were available by this time, including the Cray-2. The machine proved very difficult to keep running because of the enormous number of wire wrap connections, leading to failures roughly every week. Each failure required the machine to be opened up and the offending wrap fixed. After about a year, the team tired of the constant maintenance and the machine was abandoned. Additional processors were apparently under construction, but it is not clear whether any were completed or connected to the first.
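The Timing Verifier's goal was to confirm that every signal settles before it is needed. That idea can be illustrated with a simple longest-path calculation over a small combinational netlist. This is a swapped-in technique chosen for brevity (static arrival-time propagation), not a description of how SCALD's module worked. The gate names, delays, clock period and setup time below are invented for illustration.

```python
from graphlib import TopologicalSorter
from typing import Dict, List

def arrival_times(delays: Dict[str, float],
                  fanin: Dict[str, List[str]]) -> Dict[str, float]:
    """Latest time (ns) at which each node's output becomes valid.

    delays[n] is node n's propagation delay; fanin[n] lists its inputs.
    Nodes with no fan-in (primary inputs) start at t = 0.
    """
    times: Dict[str, float] = {}
    # static_order() yields every node after all of its inputs.
    for node in TopologicalSorter(fanin).static_order():
        start = max((times[src] for src in fanin.get(node, [])), default=0.0)
        times[node] = start + delays.get(node, 0.0)
    return times

def late_endpoints(times: Dict[str, float], endpoints: List[str],
                   clock_period: float, setup: float) -> List[str]:
    """Endpoints whose data is not valid `setup` ns before the clock edge."""
    return [e for e in endpoints if times[e] + setup > clock_period]

# Hypothetical netlist: two inputs and three gates, delays in ns.
delays = {"in_a": 0.0, "in_b": 0.0, "g1": 1.5, "g2": 2.0, "g3": 1.0}
fanin = {"g1": ["in_a", "in_b"], "g2": ["g1"], "g3": ["g1", "in_b"]}
t = arrival_times(delays, fanin)
print(t["g2"], t["g3"])                           # 3.5 2.5
print(late_endpoints(t, ["g2", "g3"], 4.0, 0.8))  # ['g2']
```

Here `g2` arrives at 3.5 ns and needs 0.8 ns of setup, so it misses a 4.0 ns clock. This is the kind of violation a timing check is meant to catch before boards are wired.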
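The difference between memory-memory and register-register vector instructions can be sketched as follows. In the memory-memory style that the text attributes to the Mark IIA and the Star-100, one instruction streams operands from memory and writes results straight back. In the Cray-1's register-register style, data is first loaded into fixed-length vector registers (64 elements on the Cray-1), so long vectors are processed in strips. Both functions are hypothetical illustrations, not either machine's instruction set. The unit-stride addressing reflects the lack of scatter/gather.

```python
from typing import List

VREG_LEN = 64  # each Cray-1 vector register held 64 elements

def vadd_mem_mem(mem: List[float], dst: int, a: int, b: int, n: int) -> None:
    """Memory-memory style: one operation covers all n elements.

    Computes mem[dst+i] = mem[a+i] + mem[b+i] with unit stride. Without
    scatter/gather, scattered data must first be copied into a block.
    """
    for i in range(n):
        mem[dst + i] = mem[a + i] + mem[b + i]

def vadd_reg_reg(mem: List[float], dst: int, a: int, b: int, n: int) -> int:
    """Register-register style: load strips, add in registers, store.

    Returns the number of strips (load/add/store groups) needed.
    """
    strips = 0
    for start in range(0, n, VREG_LEN):
        length = min(VREG_LEN, n - start)
        v1 = mem[a + start:a + start + length]      # vector load
        v2 = mem[b + start:b + start + length]      # vector load
        v3 = [x + y for x, y in zip(v1, v2)]        # register-to-register add
        mem[dst + start:dst + start + length] = v3  # vector store
        strips += 1
    return strips

n = 200
mem1 = [float(i) for i in range(2 * n)] + [0.0] * n
mem2 = list(mem1)
vadd_mem_mem(mem1, 2 * n, 0, n, n)
strips = vadd_reg_reg(mem2, 2 * n, 0, n, n)
print(strips, mem1 == mem2)  # 4 True
```

Both styles compute the same result. The memory-memory form avoids strip-mining, but every element travels to and from memory. This is one reason such designs favoured long, contiguous operands.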

Mark IIB/AAP

At some point during the construction of the Mark IIA, attention turned to an improved Mark IIB, which was later known as the "Advanced Architecture Processor", or AAP. This version abandoned the original PDP-10-like instruction set in favor of a RISC-like one. It also replaced the original crossbar switch with a ring-like interconnect that allowed up to 256 processors to be connected. It is not clear how much actual work on the AAP was carried out, and no complete machine was ever built. By the time it was being designed, many of the original S-1 team had left for industry, where RISC designs were starting to come to market. The MIPS R2000, released in January 1986, offered roughly the same performance as the Mark IIA's CPU in a four-chip implementation that easily fit in a desktop case.
