Cell is a multi-core microprocessor microarchitecture that combines a general-purpose PowerPC core of modest performance with streamlined coprocessing elements which greatly accelerate multimedia and vector processing applications, as well as many other forms of dedicated computation. It was developed by Sony, Toshiba, and IBM, an alliance known as "STI". The architectural design and first implementation were carried out at the STI Design Center in Austin, Texas over a four-year period beginning March 2001 on a budget reported by Sony as approaching US$400 million. Cell is shorthand for Cell Broadband Engine Architecture, commonly abbreviated CBEA in full or Cell BE in part.

The first major commercial application of Cell was in Sony's PlayStation 3 game console, released in 2006. In May 2008, the Cell-based IBM Roadrunner supercomputer became the first TOP500 system to sustain 1.0 petaflops on LINPACK. Mercury Computer Systems also developed designs based on the Cell.

The Cell architecture includes a memory coherence architecture that emphasizes power efficiency, prioritizes bandwidth over low latency, and favors peak computational throughput over simplicity of program code. For these reasons, Cell is widely regarded as a challenging environment for software development. IBM provides a Linux-based development platform to help developers program for Cell chips.


History

In mid-2000, Sony Computer Entertainment, Toshiba Corporation, and IBM formed an alliance known as "STI" to design and manufacture the processor. The STI Design Center opened in March 2001. The Cell was designed over a period of four years, using enhanced versions of the design tools for the POWER4 processor. Over 400 engineers from the three companies worked together in Austin, with critical support from eleven of IBM's design centers. During this period, IBM filed many patents pertaining to the Cell architecture, manufacturing process, and software environment.

An early patent version of the Broadband Engine was shown to be a chip package comprising four "Processing Elements", which was the patent's description for what is now known as the Power Processing Element (PPE). Each Processing Element would contain 8 "Synergistic Processing Elements" (SPEs) on the chip. This chip package was supposed to run at a clock speed of 4 GHz, and with 32 SPEs providing 32 gigaFLOPS each (single precision), the Broadband Engine was meant to have 1 teraFLOPS of raw computing power in theory.

The design with 4 PPEs and 32 SPEs was never realized. Instead, Sony and IBM only manufactured a design with one PPE and 8 SPEs. This smaller design, the Cell Broadband Engine or Cell/BE, was fabricated using a 90 nm SOI process.

In March 2007, IBM announced that the 65 nm version of Cell/BE was in production at its plant (now GlobalFoundries') in East Fishkill, New York, with Bandai Namco Entertainment using the Cell/BE processor for their System 357 arcade board as well as the subsequent System 369. In February 2008, IBM announced that it would begin to fabricate Cell processors with the 45 nm process.

In May 2008, IBM introduced the high-performance double-precision floating-point version of the Cell processor, the PowerXCell 8i, at the 65 nm feature size. In the same month, an Opteron- and PowerXCell 8i-based supercomputer, the IBM Roadrunner system, became the world's first system to achieve one petaFLOPS, and was the fastest computer in the world until the third quarter of 2009. The world's three most energy-efficient supercomputers, as represented by the Green500 list, are similarly based on the PowerXCell 8i. In August 2009 the 45 nm Cell processor was introduced in concert with Sony's PlayStation 3 Slim. By November 2009, IBM had discontinued the development of a Cell processor with 32 APUs but was still developing other Cell products.


Commercialization

On May 17, 2005, Sony Computer Entertainment confirmed some specifications of the Cell processor that would be shipping in the then-forthcoming PlayStation 3 console. This Cell configuration has one PPE on the chip, with eight physical SPEs in silicon. In the PlayStation 3, one SPE is locked out during the test process, a practice which helps to improve manufacturing yields, and another one is reserved for the OS, leaving 6 free SPEs to be used by games' code. The target clock frequency at introduction is 3.2 GHz. The introductory design is fabricated using a 90 nm SOI process, with initial volume production slated for IBM's facility in East Fishkill, New York.

The relationship between cores and threads is a common source of confusion. The PPE core is dual-threaded and manifests in software as two independent threads of execution, while each active SPE manifests as a single thread. In the PlayStation 3 configuration as described by Sony, the Cell processor provides nine independent threads of execution (two from the PPE plus seven from the active SPEs).

On June 28, 2005, IBM and Mercury Computer Systems announced a partnership agreement to build Cell-based computer systems for embedded applications such as medical imaging, industrial inspection, aerospace and defense, seismic processing, and telecommunications. Mercury has since released blades, conventional rack servers and PCI Express accelerator boards with Cell processors.

In the fall of 2006, IBM released the QS20 blade module, which uses two Cell BE processors and reaches a peak of 410 gigaFLOPS in single precision per module. The QS22, based on the PowerXCell 8i processor, was used for the IBM Roadrunner supercomputer. Mercury and IBM use the fully utilized Cell processor with eight active SPEs. On April 8, 2008, Fixstars Corporation released a PCI Express accelerator board based on the PowerXCell 8i processor.

Sony's high-performance media computing server ZEGO uses a 3.2 GHz Cell/B.E. processor.
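
Because the core and thread counts in this section are easy to mix up, the following minimal Python sketch tallies them for the two configurations described above. The class name `CellConfig` and its field names are hypothetical, chosen only for illustration. The blade configuration assumes that no SPE is reserved for an operating system, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellConfig:
    """Illustrative tally of SPEs and hardware threads (names are hypothetical)."""
    physical_spes: int      # SPEs present in silicon
    disabled_spes: int      # SPEs locked out during test to improve yield
    os_reserved_spes: int   # SPEs set aside for the operating system
    ppe_threads: int = 2    # the PPE is dual-threaded

    @property
    def active_spes(self) -> int:
        return self.physical_spes - self.disabled_spes

    @property
    def hardware_threads(self) -> int:
        # Each active SPE appears to software as one thread of execution.
        return self.ppe_threads + self.active_spes

    @property
    def application_spes(self) -> int:
        return self.active_spes - self.os_reserved_spes


# PlayStation 3: eight SPEs in silicon, one locked out, one reserved for the OS.
PS3 = CellConfig(physical_spes=8, disabled_spes=1, os_reserved_spes=1)

# Mercury/IBM systems: all eight SPEs active (OS reservation of 0 is an assumption).
BLADE = CellConfig(physical_spes=8, disabled_spes=0, os_reserved_spes=0)

if __name__ == "__main__":
    print(PS3.active_spes, PS3.hardware_threads, PS3.application_spes)
    print(BLADE.active_spes, BLADE.hardware_threads)
```

For the PlayStation 3 this gives seven active SPEs, nine hardware threads, and six SPEs available to game code, matching the figures quoted from Sony.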


Overview

The Cell Broadband Engine, or Cell as it is more commonly known, is a microprocessor intended as a hybrid of conventional desktop processors (such as the Athlon 64 and Core 2 families) and more specialized high-performance processors, such as the NVIDIA and ATI graphics processors (GPUs). The longer name indicates its intended use, namely as a component in current and future online distribution systems; as such it may be utilized in high-definition displays and recording equipment, as well as HDTV systems. Additionally, the processor may be suited to digital imaging systems (medical, scientific, etc.) and physical simulation (e.g., scientific and structural engineering modeling).

In a simple analysis, the Cell processor can be split into four components: external input and output structures; the main processor, called the Power Processing Element (PPE), a two-way simultaneous-multithreaded PowerPC 2.02 core; eight fully functional co-processors called the Synergistic Processing Elements, or SPEs; and a specialized high-bandwidth circular data bus connecting the PPE, input/output elements and the SPEs, called the Element Interconnect Bus or EIB.

To achieve the high performance needed for mathematically intensive tasks, such as decoding/encoding MPEG streams, generating or transforming three-dimensional data, or undertaking Fourier analysis of data, the Cell processor marries the SPEs and the PPE via the EIB to give access, via fully cache-coherent DMA (direct memory access), to both main memory and other external data storage. To make the best of the EIB, and to overlap computation and data transfer, each of the nine processing elements (PPE and SPEs) is equipped with a DMA engine. Since an SPE's load/store instructions can only access its own local scratchpad memory, each SPE depends entirely on DMAs to transfer data to and from the main memory and other SPEs' local memories. A DMA operation can transfer either a single block of up to 16 KB, or a list of 2 to 2048 such blocks. One of the major design decisions in the architecture of Cell is the use of DMAs as a central means of intra-chip data transfer, with a view to enabling maximal asynchrony and concurrency in data processing inside a chip.

The PPE, which is capable of running a conventional operating system, has control over the SPEs and can start, stop, interrupt, and schedule processes running on the SPEs. To this end, the PPE has additional instructions relating to the control of the SPEs. Unlike SPEs, the PPE can read and write the main memory and the local memories of SPEs through the standard load/store instructions. Despite having Turing-complete architectures, the SPEs are not fully autonomous and require the PPE to prime them before they can do any useful work. As most of the "horsepower" of the system comes from the synergistic processing elements, the use of DMA as a method of data transfer and the limited local memory footprint of each SPE pose a major challenge to software developers who wish to make the most of this horsepower, demanding careful hand-tuning of programs to extract maximal performance from this CPU.

The PPE and bus architecture includes various modes of operation giving different levels of memory protection, allowing areas of memory to be protected from access by specific processes running on the SPEs or the PPE.

Both the PPE and SPE are RISC architectures with a fixed-width 32-bit instruction format. The PPE contains a 64-bit general-purpose register set (GPR), a 64-bit floating-point register set (FPR), and a 128-bit AltiVec register set. The SPE contains 128-bit registers only. These can be used for scalar data types ranging from 8 bits to 64 bits in size or for SIMD computations on a variety of integer and floating-point formats. System memory addresses for both the PPE and SPE are expressed as 64-bit values for a theoretical address range of 2^64 bytes (16 exabytes or 16,777,216 terabytes). In practice, not all of these bits are implemented in hardware. Local store addresses internal to the SPU (Synergistic Processor Unit) processor are expressed as a 32-bit word. In documentation relating to Cell, a word is always taken to mean 32 bits, a doubleword means 64 bits, and a quadword means 128 bits.
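
To make the DMA size limits above concrete, here is a minimal Python sketch that splits one large transfer into DMA commands under the two constraints given in the text: a block is at most 16 KB, and a list command holds at most 2048 blocks. The function `plan_dma` is hypothetical and is not part of any Cell SDK. The sketch also ignores the alignment and minimum-size rules that real hardware imposes.

```python
MAX_BLOCK_BYTES = 16 * 1024   # a single DMA block is at most 16 KB
MAX_LIST_ENTRIES = 2048       # a DMA list holds at most 2048 blocks


def plan_dma(total_bytes: int) -> list[list[int]]:
    """Return DMA commands as lists of block sizes (illustrative only).

    A command with one entry stands for a single-block transfer;
    a command with two or more entries stands for a DMA list.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")

    # Cut the transfer into blocks no larger than MAX_BLOCK_BYTES.
    blocks = []
    remaining = total_bytes
    while remaining > 0:
        size = min(remaining, MAX_BLOCK_BYTES)
        blocks.append(size)
        remaining -= size

    # Group the blocks into commands of at most MAX_LIST_ENTRIES entries.
    return [
        blocks[i:i + MAX_LIST_ENTRIES]
        for i in range(0, len(blocks), MAX_LIST_ENTRIES)
    ]


if __name__ == "__main__":
    print(plan_dma(4096))             # one single-block command
    print(plan_dma(40 * 1024))        # one list command with three blocks
    big = plan_dma(MAX_BLOCK_BYTES * MAX_LIST_ENTRIES + 1)
    print(len(big), len(big[0]), big[1])
```

Under these assumptions, one list command moves at most 2048 × 16 KB = 32 MB, so anything larger needs several commands. Since an SPE can only compute on data already in its local store, code on Cell typically issues such commands ahead of time to overlap transfer with computation.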


PowerXCell 8i

In 2008, IBM announced a revised variant of the Cell called the PowerXCell 8i, which is available in QS22 blade servers from IBM. The PowerXCell is manufactured on a 65 nm process, and adds support for up to 32 GB of slotted DDR2 memory, as well as dramatically improving double-precision floating-point performance on the SPEs from a peak of about 12.8 GFLOPS to 102.4 GFLOPS total for eight SPEs, which, coincidentally, is the same peak performance as the NEC SX-9 vector processor released around the same time. The IBM Roadrunner supercomputer, the world's fastest during 2008–2009, consisted of 12,240 PowerXCell 8i processors, along with 6,562 AMD Opteron processors. PowerXCell 8i-powered supercomputers also held all of the top 6 places among the "greenest" systems in the Green500 list, with the highest MFLOPS/watt ratios in the world. Besides the QS22 and supercomputers, the PowerXCell processor is also available as an accelerator on a PCI Express card and is used as the core processor in the QPACE project.

Since the PowerXCell 8i removed the Rambus memory interface and added significantly larger DDR2 interfaces and enhanced SPEs, the chip layout had to be reworked, which resulted in both a larger die and larger packaging.
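
As a rough consistency check on the figures in this section, the sketch below scales the 102.4 GFLOPS double-precision peak by the number of PowerXCell 8i processors quoted for Roadrunner. The function names are hypothetical. The result is a theoretical SPE-only peak: it ignores the PPEs and the Opterons, and it is not the sustained LINPACK figure.

```python
SPE_COUNT = 8              # SPEs per PowerXCell 8i (from the text)
TOTAL_DP_GFLOPS = 102.4    # peak double precision for eight SPEs (from the text)
ROADRUNNER_CELLS = 12_240  # PowerXCell 8i processors in Roadrunner (from the text)


def per_spe_dp_gflops(total_gflops: float = TOTAL_DP_GFLOPS,
                      spes: int = SPE_COUNT) -> float:
    """Peak double-precision GFLOPS contributed by one SPE."""
    return total_gflops / spes


def aggregate_spe_pflops(chips: int = ROADRUNNER_CELLS,
                         gflops_per_chip: float = TOTAL_DP_GFLOPS) -> float:
    """Theoretical SPE-only peak across all chips, in PFLOPS."""
    return chips * gflops_per_chip / 1e6  # 1 PFLOPS = 1e6 GFLOPS


if __name__ == "__main__":
    print(f"{per_spe_dp_gflops():.1f} GFLOPS per SPE")
    print(f"{aggregate_spe_pflops():.3f} PFLOPS theoretical SPE peak")
```

With the numbers above, the SPEs alone come to roughly 1.25 PFLOPS of theoretical peak, which is in line with the system being the first to pass one petaFLOPS.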


Architecture

While the Cell chip can have a number of different configurations, the basic configuration is a multi-core chip composed of one "Power Processor Element" ("PPE") (sometimes called "Processing Element", or "PE"), and multiple "Synergistic Processing Elements" ("SPE"). The PPE and SPEs are linked together by an internal high-speed bus dubbed the "Element Interconnect Bus" ("EIB").


Power Processor Element (PPE)

The ''PPE'' is the PowerPC-based, dual-issue, in-order, two-way simultaneous-multithreaded CPU core with a 23-stage pipeline. It acts as the controller for the eight SPEs, which handle most of the computational workload. The PPE has limited out-of-order execution capabilities: it can perform loads out of order and has delayed execution pipelines. The PPE works with conventional operating systems because of its similarity to other 64-bit PowerPC processors, while the SPEs are designed for vectorized floating-point code execution. The PPE contains a 64 KiB level 1 cache (32 KiB instruction and 32 KiB data) and a 512 KiB level 2 cache. The size of a cache line is 128 bytes. Additionally, IBM has included an AltiVec (VMX) unit that is fully pipelined for single-precision floating point (AltiVec 1 does not support double-precision floating-point vectors), a 32-bit Fixed Point Unit (FXU) with a 64-bit register file per thread, a Load and Store Unit (LSU), a 64-bit Floating-Point Unit (FPU), a Branch Unit (BRU) and a Branch Execution Unit (BXU).

The PPE consists of three main units: the Instruction Unit (IU), the Execution Unit (XU), and the vector/scalar execution unit (VSU). The IU contains the L1 instruction cache, branch prediction hardware, instruction buffers, and dependency checking logic. The XU contains the integer execution units (FXU) and the load-store unit (LSU). The VSU contains all of the execution resources for the FPU and VMX. Each PPE can complete two double-precision operations per clock cycle using a scalar fused multiply-add instruction, which translates to 6.4 GFLOPS at 3.2 GHz, or eight single-precision operations per clock cycle with a vector fused multiply-add instruction, which translates to 25.6 GFLOPS at 3.2 GHz.
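The peak figures quoted for the PPE follow directly from operations per cycle multiplied by clock rate. The short sketch below reproduces that arithmetic; the function name `peak_gflops` is hypothetical, and the split of the eight single-precision operations into four 32-bit lanes times two operations per fused multiply-add is an assumption consistent with a 128-bit vector unit.

```python
# Peak floating-point throughput of the PPE, from the per-cycle figures in the text.
CLOCK_GHZ = 3.2  # clock rate quoted in the text


def peak_gflops(flops_per_cycle: int, clock_ghz: float = CLOCK_GHZ) -> float:
    """Peak GFLOPS = floating-point operations per cycle x clock rate in GHz."""
    return flops_per_cycle * clock_ghz


# Scalar double-precision fused multiply-add: one multiply plus one add per cycle.
dp_scalar = peak_gflops(2)

# Vector single-precision fused multiply-add: assumed 4 lanes x 2 operations per cycle.
sp_vector = peak_gflops(4 * 2)

print(f"PPE double precision: {dp_scalar:.1f} GFLOPS")
print(f"PPE single precision: {sp_vector:.1f} GFLOPS")
```

These are theoretical peaks; they assume a fused multiply-add issues every cycle, which real code rarely sustains.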


Xenon in Xbox 360

The PPE was designed specifically for the Cell processor, but during development Microsoft approached IBM wanting a high-performance processor core for its Xbox 360. IBM complied and made the tri-core Xenon processor, based on a slightly modified version of the PPE with added VMX128 extensions.


Synergistic Processing Elements (SPE)

Each SPE is a dual-issue, in-order processor composed of a "Synergistic Processing Unit" (SPU) and a "Memory Flow Controller" (MFC), which comprises the DMA engine, MMU, and bus interface. SPEs do not have any branch prediction hardware, which places a heavy burden on the compiler. Each SPE has six execution units divided among odd and even pipelines. The SPU runs a specially developed instruction set (ISA) with 128-bit SIMD organization for single- and double-precision instructions. With the current generation of the Cell, each SPE contains 256 KiB of embedded SRAM for instructions and data, called "Local Storage" (not to be mistaken for "Local Memory" in Sony's documents, which refers to the VRAM), which is visible to the PPE and can be addressed directly by software. Each SPE can support up to 4 GiB of local store memory. The local store does not operate like a conventional CPU cache, since it is neither transparent to software nor does it contain hardware structures that predict which data to load. Each SPE contains a 128-bit, 128-entry register file and measures 14.5 mm² on a 90 nm process. An SPE can operate on sixteen 8-bit integers, eight 16-bit integers, four 32-bit integers, or four single-precision floating-point numbers in a single clock cycle, as well as perform a memory operation. The SPU cannot directly access system memory; the 64-bit virtual memory addresses formed by the SPU must be passed from the SPU to the SPE memory flow controller (MFC) to set up a DMA operation within the system address space.

In one typical usage scenario, the system loads the SPEs with small programs (similar to threads), chaining the SPEs together to handle each step in a complex operation. For instance, a set-top box might load programs for reading a DVD, video and audio decoding, and display, and the data would be passed from SPE to SPE until finally ending up on the TV. Another possibility is to partition the input data set and have several SPEs perform the same kind of operation in parallel. At 3.2 GHz, each SPE gives a theoretical 25.6 GFLOPS of single-precision performance.

Compared to its personal computer contemporaries, the relatively high overall floating-point performance of a Cell processor seemingly dwarfs the abilities of the SIMD unit in CPUs like the Pentium 4 and the Athlon 64. However, comparing only the floating-point abilities of a system is a one-dimensional and application-specific metric. Unlike a Cell processor, such desktop CPUs are more suited to the general-purpose software usually run on personal computers. In addition to executing multiple instructions per clock, processors from Intel and AMD feature branch predictors. The Cell is designed to compensate for the lack of branch prediction in its SPEs with compiler assistance, in which prepare-to-branch instructions are generated. For double-precision floating-point operations, as sometimes used in personal computers and often used in scientific computing, Cell performance drops by an order of magnitude, but still reaches 20.8 GFLOPS (1.8 GFLOPS per SPE, 6.4 GFLOPS per PPE). The PowerXCell 8i variant, which was specifically designed for double precision, reaches 102.4 GFLOPS in double-precision calculations. Tests by IBM show that the SPEs can reach 98% of their theoretical peak performance running optimized parallel matrix multiplication.

Toshiba has developed a co-processor powered by four SPEs, but no PPE, called the SpursEngine, designed to accelerate 3D and movie effects in consumer electronics. Each SPE has a local memory of 256 KB. In total, the SPEs have 2 MB of local memory.
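The SPE figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch rather than a performance model: the helper name `simd_lanes` is hypothetical, and counting a fused multiply-add as two operations per lane is an assumption that happens to reproduce the quoted 25.6 GFLOPS.

```python
# Cross-check of the SPE figures quoted in the text.
REGISTER_BITS = 128      # SPE register width
CLOCK_GHZ = 3.2          # clock rate quoted in the text
NUM_SPES = 8             # SPEs on a full Cell chip
LOCAL_STORE_KIB = 256    # local store per SPE


def simd_lanes(element_bits: int) -> int:
    """Number of elements of the given width that fit in one 128-bit register."""
    return REGISTER_BITS // element_bits


# Sixteen 8-bit, eight 16-bit, or four 32-bit elements per instruction.
lanes = {bits: simd_lanes(bits) for bits in (8, 16, 32)}

# Single-precision peak per SPE: 4 lanes x 2 operations (assumed multiply-add) per cycle.
spe_sp_gflops = simd_lanes(32) * 2 * CLOCK_GHZ

# Double-precision total for the chip, using the per-unit figures from the text.
SPE_DP_GFLOPS = 1.8
PPE_DP_GFLOPS = 6.4
cell_dp_gflops = NUM_SPES * SPE_DP_GFLOPS + PPE_DP_GFLOPS

# Combined local store across all eight SPEs, in KiB.
total_local_store_kib = NUM_SPES * LOCAL_STORE_KIB

print(lanes)
print(f"SPE single precision: {spe_sp_gflops:.1f} GFLOPS")
print(f"Cell double precision: {cell_dp_gflops:.1f} GFLOPS")
print(f"Total local store: {total_local_store_kib} KiB")
```

The 2048 KiB total corresponds to the 2 MB of local memory mentioned for eight SPEs.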


Element Interconnect Bus (EIB)

The EIB is a communication bus internal to the Cell processor which connects the various on-chip system elements: the PPE processor, the memory controller (MIC), the eight SPE coprocessors, and two off-chip I/O interfaces, for a total of 12 participants in the PS3 (the number of SPUs can vary in industrial applications). The EIB also includes an arbitration unit which functions as a set of traffic lights. In some documents, IBM refers to EIB participants as 'units'.

The EIB is presently implemented as a circular ring consisting of four 16-byte-wide unidirectional channels which counter-rotate in pairs. When traffic patterns permit, each channel can convey up to three transactions concurrently. As the EIB runs at half the system clock rate, the effective channel rate is 16 bytes every two system clocks. At maximum concurrency, with three active transactions on each of the four rings, the peak ''instantaneous'' EIB bandwidth is 96 bytes per clock (12 concurrent transactions × 16 bytes wide / 2 system clocks per transfer). While this figure is often quoted in IBM literature, it is unrealistic to simply scale this number by processor clock speed. The arbitration unit imposes additional constraints. IBM Senior Engineer David Krolak, EIB lead designer, explains the concurrency model:

Each participant on the EIB has one 16-byte read port and one 16-byte write port. The limit for a single participant is to read and write at a rate of 16 bytes per EIB clock (for simplicity often regarded as 8 bytes per system clock). Each SPU processor contains a dedicated DMA management queue capable of scheduling long sequences of transactions to various endpoints without interfering with the SPU's ongoing computations; these DMA queues can be managed locally or remotely as well, providing additional flexibility in the control model.

Data flows on an EIB channel stepwise around the ring. Since there are twelve participants, the total number of steps around the channel back to the point of origin is twelve. Six steps is the longest distance between any pair of participants. An EIB channel is not permitted to convey data requiring more than six steps; such data must take the shorter route around the circle in the other direction. The number of steps involved in sending a packet has very little impact on transfer latency: the clock speed driving the steps is very fast relative to other considerations. However, longer communication distances ''are'' detrimental to the overall performance of the EIB as they reduce available concurrency.

Despite IBM's original desire to implement the EIB as a more powerful cross-bar, the circular configuration they adopted to spare resources rarely represents a limiting factor on the performance of the Cell chip as a whole. In the worst case, the programmer must take extra care to schedule communication patterns where the EIB is able to function at high concurrency levels. David Krolak explained:


Bandwidth assessment

At 3.2 GHz, each channel flows at a rate of 25.6 GB/s. Viewing the EIB in isolation from the system elements it connects, achieving twelve concurrent transactions at this flow rate works out to an abstract EIB bandwidth of 307.2 GB/s. Based on this view, many IBM publications depict available EIB bandwidth as "greater than 300 GB/s". This number reflects the peak ''instantaneous'' EIB bandwidth scaled by processor frequency. However, other technical restrictions are involved in the arbitration mechanism for packets accepted onto the bus. The IBM Systems Performance group explained: This quote apparently represents the full extent of IBM's public disclosure of this mechanism and its impact. The EIB arbitration unit, the snooping mechanism, and interrupt generation on segment or page translation faults are not well described in the documentation set that IBM has made public so far.

In practice, effective EIB bandwidth can also be limited by the ring participants involved. While each of the nine processing cores can sustain 25.6 GB/s read and write concurrently, the memory interface controller (MIC) is tied to a pair of XDR memory channels permitting a maximum flow of 25.6 GB/s for reads and writes combined, and the two I/O controllers are documented as supporting a peak combined input speed of 25.6 GB/s and a peak combined output speed of 35 GB/s.

To add further to the confusion, some older publications cite EIB bandwidth assuming a 4 GHz system clock. This reference frame results in an instantaneous EIB bandwidth figure of 384 GB/s and an arbitration-limited bandwidth figure of 256 GB/s. All things considered, the theoretical 204.8 GB/s number most often cited is the best one to bear in mind. The ''IBM Systems Performance'' group has demonstrated SPU-centric data flows achieving 197 GB/s on a Cell processor running at 3.2 GHz, so this number is a fair reflection of practice as well.
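The ring distances and bandwidth figures for the EIB can be reproduced with a small calculation. The sketch below is illustrative only: the numbering of ring positions and the function names are hypothetical, and the arbitration-limited figure assumes one 128-byte transfer granted per bus cycle, an assumption chosen because it reproduces both the 204.8 GB/s and 256 GB/s values quoted above.

```python
# EIB ring distance and bandwidth arithmetic, using figures from the text.
PARTICIPANTS = 12            # ring participants in the PS3 configuration
CHANNELS = 4                 # unidirectional 16-byte channels
TRANSFERS_PER_CHANNEL = 3    # concurrent transactions per channel
BYTES_PER_TRANSFER = 16
ASSUMED_BYTES_PER_GRANT = 128  # assumption: one 128-byte transfer granted per bus cycle


def ring_steps(src: int, dst: int, participants: int = PARTICIPANTS) -> int:
    """Steps between two ring positions, taking the shorter direction."""
    one_way = (dst - src) % participants
    return min(one_way, participants - one_way)


def eib_bandwidth_gbs(system_clock_ghz: float) -> tuple:
    """Return (per-channel, peak instantaneous, arbitration-limited) bandwidth in GB/s."""
    bus_clock_ghz = system_clock_ghz / 2  # the EIB runs at half the system clock
    per_channel = BYTES_PER_TRANSFER * bus_clock_ghz
    instantaneous = per_channel * CHANNELS * TRANSFERS_PER_CHANNEL
    arbitration_limited = ASSUMED_BYTES_PER_GRANT * bus_clock_ghz
    return per_channel, instantaneous, arbitration_limited


longest = max(ring_steps(a, b) for a in range(PARTICIPANTS) for b in range(PARTICIPANTS))
print(f"Longest shortest-path distance: {longest} steps")

for clock in (3.2, 4.0):
    channel, peak, limited = eib_bandwidth_gbs(clock)
    print(f"{clock} GHz: channel {channel:.1f}, peak {peak:.1f}, limited {limited:.1f} GB/s")
```

The demonstrated 197 GB/s sits just under the 204.8 GB/s limit, which is why the text treats that figure as the realistic ceiling.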


Memory and I/O controllers

Cell contains a dual-channel Rambus XIO macro which interfaces to Rambus XDR memory. The memory interface controller (MIC) is separate from the XIO macro and is designed by IBM. The XIO-XDR link runs at 3.2 Gbit/s per pin. Two 32-bit channels can provide a theoretical maximum of 25.6 GB/s. The I/O interface, also a Rambus design, is known as FlexIO. The FlexIO interface is organized into 12 lanes, each lane being a unidirectional 8-bit-wide point-to-point path. Five of these lanes are inbound to Cell, while the remaining seven are outbound. This provides a theoretical peak bandwidth of 62.4 GB/s (36.4 GB/s outbound, 26 GB/s inbound) at 2.6 GHz. The FlexIO interface can be clocked independently, typically at 3.2 GHz. Four inbound and four outbound lanes support memory coherency.
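The memory and I/O bandwidth figures can be derived from the pin rates and lane counts given above. In this sketch the function names are hypothetical, and the rate of two bytes per clock per FlexIO lane is an assumption inferred from the quoted totals (36.4 GB/s over seven lanes at 2.6 GHz); the text does not state it.

```python
# Memory and I/O bandwidth arithmetic for the XDR and FlexIO interfaces.


def xdr_bandwidth_gbs(gbit_per_pin: float = 3.2, channels: int = 2,
                      bits_per_channel: int = 32) -> float:
    """Aggregate XDR bandwidth in GB/s: pin rate x total data pins / 8 bits per byte."""
    return gbit_per_pin * channels * bits_per_channel / 8


def flexio_bandwidth_gbs(lanes: int, clock_ghz: float = 2.6,
                         bytes_per_clock: int = 2) -> float:
    """FlexIO bandwidth in GB/s for a number of 8-bit lanes.

    bytes_per_clock=2 is an assumption inferred from the quoted totals.
    """
    return lanes * bytes_per_clock * clock_ghz


memory = xdr_bandwidth_gbs()
outbound = flexio_bandwidth_gbs(7)   # seven outbound lanes
inbound = flexio_bandwidth_gbs(5)    # five inbound lanes

print(f"XDR memory: {memory:.1f} GB/s")
print(f"FlexIO outbound: {outbound:.1f} GB/s, inbound: {inbound:.1f} GB/s")
print(f"FlexIO total: {outbound + inbound:.1f} GB/s")
```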


Possible applications


Video processing card

Some companies, such as Leadtek, have released PCI-E cards based upon the Cell to allow for "faster than real time" transcoding of H.264, MPEG-2 and MPEG-4 video.


Blade server

On August 29, 2007, IBM announced the BladeCenter QS21. Generating a measured 1.05 gigaFLOPS (billion floating-point operations per second) per watt, with peak performance of approximately 460 GFLOPS, it was one of the most power-efficient computing platforms at the time of its announcement. A single BladeCenter chassis can achieve 6.4 teraFLOPS, and a standard 42U rack over 25.8 teraFLOPS. On May 13, 2008, IBM announced the BladeCenter QS22. The QS22 introduced the PowerXCell 8i processor with five times the double-precision floating-point performance of the QS21, and the capacity for up to 32 GB of DDR2 memory on-blade. IBM discontinued the blade server line based on Cell processors as of January 12, 2012.
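The QS21 figures are roughly consistent with a simple scaling calculation. The sketch below relies on several assumptions that are not in the text and should be treated as such: two Cell processors per blade, nine cores per processor each peaking at 25.6 single-precision GFLOPS, 14 blades per chassis, and four chassis per 42U rack.

```python
# Rough scaling check for the QS21 blade figures (all counts below are assumptions).
GFLOPS_PER_CORE = 25.6       # single-precision peak per PPE or SPE at 3.2 GHz
CORES_PER_CHIP = 9           # one PPE plus eight SPEs
CHIPS_PER_BLADE = 2          # assumption: dual-processor blade
BLADES_PER_CHASSIS = 14      # assumption: chassis capacity
CHASSIS_PER_RACK = 4         # assumption: chassis per 42U rack

blade_gflops = GFLOPS_PER_CORE * CORES_PER_CHIP * CHIPS_PER_BLADE
chassis_tflops = blade_gflops * BLADES_PER_CHASSIS / 1000
rack_tflops = chassis_tflops * CHASSIS_PER_RACK

print(f"Blade:   {blade_gflops:.1f} GFLOPS")
print(f"Chassis: {chassis_tflops:.2f} TFLOPS")
print(f"Rack:    {rack_tflops:.2f} TFLOPS")
```

Under these assumptions the results land close to the quoted values of approximately 460 GFLOPS, 6.4 teraFLOPS and over 25.8 teraFLOPS.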


PCI Express board

Several companies provide PCI-e boards utilising the IBM PowerXCell 8i. The performance is reported as 179.2 GFLOPS single precision (SP) and 89.6 GFLOPS double precision (DP) at 2.8 GHz.
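The reported board figures match a calculation that counts only the eight SPEs. In the sketch below, the per-cycle rates (eight single-precision and four double-precision operations per SPE per cycle) are assumptions inferred from the quoted numbers, and the function name is hypothetical.

```python
# PowerXCell 8i peak throughput, counting only the eight SPEs (an assumption).
NUM_SPES = 8


def spe_array_gflops(clock_ghz: float, flops_per_cycle_per_spe: int) -> float:
    """Peak GFLOPS of the SPE array: SPEs x operations per cycle x clock in GHz."""
    return NUM_SPES * flops_per_cycle_per_spe * clock_ghz


sp_board = spe_array_gflops(2.8, 8)   # assumed 8 single-precision ops per cycle per SPE
dp_board = spe_array_gflops(2.8, 4)   # assumed 4 double-precision ops per cycle per SPE
dp_full_clock = spe_array_gflops(3.2, 4)

print(f"2.8 GHz board: {sp_board:.1f} GFLOPS SP, {dp_board:.1f} GFLOPS DP")
print(f"3.2 GHz chip:  {dp_full_clock:.1f} GFLOPS DP")
```

At 3.2 GHz the same assumption gives 102.4 GFLOPS, which agrees with the double-precision figure quoted elsewhere for the PowerXCell 8i.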


Console video games

Sony's PlayStation 3 video game console was the first production application of the Cell processor, clocked at 3.2 GHz and containing seven out of eight operational SPEs, which allows Sony to increase the yield of processor manufacturing. Only six of the seven SPEs are accessible to developers, as one is reserved by the OS.


Home cinema

Toshiba has produced HDTVs using Cell. They presented a system to decode 48 standard-definition MPEG-2 streams simultaneously on a 1920×1080 screen. This can enable a viewer to choose a channel based on dozens of thumbnail videos displayed simultaneously on the screen.


Supercomputing

IBM's supercomputer, IBM Roadrunner, was a hybrid of general-purpose x86-64 Opteron processors and Cell processors. This system assumed the #1 spot on the June 2008 Top 500 list as the first supercomputer to run at petaFLOPS speeds, having reached a sustained 1.026 petaFLOPS using the standard LINPACK benchmark. IBM Roadrunner used the PowerXCell 8i version of the Cell processor, manufactured using 65 nm technology, with enhanced SPUs that can handle double-precision calculations in the 128-bit registers, reaching 102 GFLOPS in double precision per chip.


Cluster computing

Clusters of PlayStation 3 consoles are an attractive alternative to high-end systems based on Cell blades. Innovative Computing Laboratory, a group led by Jack Dongarra in the Computer Science Department at the University of Tennessee, investigated such an application in depth. Terrasoft Solutions is selling 8-node and 32-node PS3 clusters with Yellow Dog Linux pre-installed, an implementation of Dongarra's research.

As first reported by ''Wired'' on October 17, 2007, an interesting application of the PlayStation 3 in a cluster configuration was implemented by astrophysicist Gaurav Khanna, of the Physics department at the University of Massachusetts Dartmouth, who replaced time used on supercomputers with a cluster of eight PlayStation 3s. The next generation of this machine, now called the ''PlayStation 3 Gravity Grid'', uses a network of 16 machines and exploits the Cell processor for its intended application, which is binary black hole coalescence using perturbation theory. In particular, the cluster performs astrophysical simulations of large supermassive black holes capturing smaller compact objects and has generated numerical data that has been published multiple times in the relevant scientific research literature. The Cell processor version used by the PlayStation 3 has a main CPU and 6 SPEs available to the user, giving the Gravity Grid machine a net of 16 general-purpose processors and 96 vector processors. The machine has a one-time cost of $9,000 to build and is adequate for black-hole simulations which would otherwise cost $6,000 per run on a conventional supercomputer. The black hole calculations are not memory-intensive and are highly localizable, and so are well suited to this architecture. Khanna claims that the cluster's performance on his simulations exceeds that of a traditional Linux cluster based on more than 100 Intel Xeon cores. The PS3 Gravity Grid gathered significant media attention through 2007, 2008, 2009, and 2010.

The computational Biochemistry and Biophysics lab at the Universitat Pompeu Fabra, in Barcelona, deployed in 2007 a BOINC system called PS3GRID for collaborative computing based on the CellMD software, the first one designed specifically for the Cell processor.

The United States Air Force Research Laboratory has deployed a PlayStation 3 cluster of over 1700 units, nicknamed the "Condor Cluster", for analyzing high-resolution satellite imagery. The Air Force claims the Condor Cluster would be the 33rd largest supercomputer in the world in terms of capacity. The lab has opened up the supercomputer for use by universities for research.


Distributed computing

With the help of the computing power of over half a million PlayStation 3 consoles, the distributed computing project Folding@home has been recognized by Guinness World Records as the most powerful distributed network in the world. The first record was achieved on September 16, 2007, when the project surpassed one petaFLOPS, which had never previously been attained by a distributed computing network. Additionally, the collective efforts enabled the PS3 clients alone to reach the petaFLOPS mark on September 23, 2007. In comparison, the world's second-most powerful supercomputer at the time, IBM's BlueGene/L, performed at around 478.2 teraFLOPS, which means Folding@home's computing power was approximately twice BlueGene/L's (although the CPU interconnect in BlueGene/L is more than one million times faster than the mean network speed in Folding@home). As of May 7, 2011, Folding@home ran at about 9.3 x86 petaFLOPS, with 1.6 petaFLOPS generated by 26,000 active PS3s alone.


Mainframes

IBM announced on April 25, 2007, that it would begin integrating its Cell Broadband Engine Architecture microprocessors into the company's line of mainframes. This has led to a gameframe.


Password cracking

The architecture of the processor makes it better suited to hardware-assisted cryptographic brute-force attack applications than conventional processors.


Software engineering

Due to the flexible nature of the Cell, there are several possibilities for the utilization of its resources, not limited to just different computing paradigms:


Job queue

The PPE maintains a job queue, schedules jobs in SPEs, and monitors progress. Each SPE runs a "mini kernel" whose role is to fetch a job, execute it, and synchronize with the PPE.
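The job-queue model can be illustrated with ordinary threads standing in for the SPEs. This is a minimal sketch of the pattern, not Cell code: `spe_mini_kernel`, `ppe_run`, the worker count and the `None` stop sentinel are all hypothetical names and choices, and Python's `queue.Queue` plays the role of the PPE-managed queue.

```python
import queue
import threading

NUM_WORKERS = 6  # hypothetical number of SPE-like workers


def spe_mini_kernel(spe_id, jobs, results):
    """Mini-kernel loop: fetch a job, execute it, report back; stop on None."""
    while True:
        job = jobs.get()
        if job is None:          # sentinel posted by the controller
            jobs.task_done()
            return
        job_id, func, arg = job
        results.put((job_id, spe_id, func(arg)))
        jobs.task_done()


def ppe_run(work_items, func):
    """Controller: queue the jobs, start the workers, wait, and collect results in order."""
    jobs = queue.Queue()
    results = queue.Queue()
    workers = [threading.Thread(target=spe_mini_kernel, args=(i, jobs, results))
               for i in range(NUM_WORKERS)]
    for worker in workers:
        worker.start()
    for job_id, item in enumerate(work_items):
        jobs.put((job_id, func, item))
    for _ in workers:
        jobs.put(None)           # one stop sentinel per worker
    jobs.join()                  # monitor progress: block until every job is done
    for worker in workers:
        worker.join()
    collected = {}
    while not results.empty():
        job_id, _spe_id, value = results.get()
        collected[job_id] = value
    return [collected[i] for i in range(len(work_items))]


def square(x):
    return x * x


squares = ppe_run(list(range(10)), square)
print(squares)
```

The controller stays responsible for ordering and completion tracking, which mirrors the division of labor described above; on real hardware the jobs and results would travel by DMA rather than through shared Python objects.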


Self-multitasking of SPEs

The mini kernel and scheduling are distributed across the SPEs. Tasks are synchronized using mutexes or semaphores, as in a conventional operating system. Ready-to-run tasks wait in a queue for an SPE to execute them. The SPEs use shared memory for all tasks in this configuration.
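In the self-multitasking model there is no central scheduler: every worker pulls from, and may add to, a shared ready queue protected by a lock. The sketch below shows the idea with threads; the class and function names are hypothetical, and a `threading.Condition` (a mutex with wait and notify) stands in for the mutexes or semaphores mentioned above. The example task sums a range by recursively splitting it into subtasks.

```python
import threading
from collections import deque


class SharedTaskPool:
    """Ready queue in shared memory, guarded by one condition variable."""

    def __init__(self, first_task):
        self.cond = threading.Condition()
        self.ready = deque([first_task])
        self.pending = 1   # tasks that are queued or currently running
        self.total = 0


def worker(pool, leaf_size=4):
    """Each worker schedules itself: take a task, split it or finish it."""
    while True:
        with pool.cond:
            while not pool.ready and pool.pending > 0:
                pool.cond.wait()
            if pool.pending == 0:
                return                     # all work finished
            lo, hi = pool.ready.popleft()
        if hi - lo > leaf_size:
            mid = (lo + hi) // 2
            with pool.cond:
                pool.ready.extend([(lo, mid), (mid, hi)])
                pool.pending += 1          # two new tasks replace the one just finished
                pool.cond.notify_all()
        else:
            partial = sum(range(lo, hi))
            with pool.cond:
                pool.total += partial
                pool.pending -= 1
                if pool.pending == 0:
                    pool.cond.notify_all() # wake idle workers so they can exit


def run_self_scheduled(n, num_workers=6):
    """Sum the integers 0..n-1 using self-scheduling workers."""
    pool = SharedTaskPool((0, n))
    threads = [threading.Thread(target=worker, args=(pool,)) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return pool.total


print(run_self_scheduled(100))
```

Compared with the job-queue model, no single thread owns the schedule, so the design trades a simpler controller for contention on the shared lock.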


Stream processing

Each SPE runs a distinct program. Data comes from an input stream and is sent to SPEs. When an SPE has finished processing, the output data is sent to an output stream. This provides a flexible and powerful architecture for stream processing, and allows explicit scheduling for each SPE separately. Other processors are also able to perform streaming tasks but are limited by the kernel loaded.
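
One possible arrangement of this model is a pipeline in which each stage runs its own program and passes its output to the next stage. The sketch below assumes that chained layout (the text above does not prescribe one), uses Python threads in place of SPEs, and introduces the hypothetical names `stage`, `run_pipeline`, `scale`, `offset` and `clamp` purely for illustration.

```python
import queue
import threading

END = object()   # end-of-stream marker passed down the pipeline


def stage(program, inbox, outbox):
    """One pipeline stage: apply its own program to every item it receives."""
    while True:
        item = inbox.get()
        if item is END:
            outbox.put(END)      # forward the marker so later stages also stop
            return
        outbox.put(program(item))


def run_pipeline(programs, stream):
    """Chain one stage per program; feed the input stream, collect the output stream."""
    queues = [queue.Queue() for _ in range(len(programs) + 1)]
    threads = [threading.Thread(target=stage, args=(p, queues[i], queues[i + 1]))
               for i, p in enumerate(programs)]
    for t in threads:
        t.start()
    for item in stream:
        queues[0].put(item)
    queues[0].put(END)
    output = []
    while True:
        item = queues[-1].get()
        if item is END:
            break
        output.append(item)
    for t in threads:
        t.join()
    return output


def scale(x):
    return x * 2


def offset(x):
    return x + 1


def clamp(x):
    return min(x, 255)
```

Because each stage is a single thread reading from a FIFO queue, items leave the pipeline in the order they entered: `run_pipeline([scale, offset, clamp], [1, 2, 3, 200])` yields `[3, 5, 7, 255]`.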


Open source software development

In 2005, patches enabling Cell support in the Linux kernel were submitted for inclusion by IBM developers. Arnd Bergmann (one of the developers of the aforementioned patches) also described the Linux-based Cell architecture at LinuxTag 2005. As of release 2.6.16 (March 20, 2006), the Linux kernel officially supports the Cell processor. Both PPE and SPEs are programmable in C/C++ using a common API provided by libraries.

Fixstars Solutions provides Yellow Dog Linux for IBM and Mercury Cell-based systems, as well as for the PlayStation 3. Terra Soft strategically partnered with Mercury to provide a Linux Board Support Package for Cell, and support and development of software applications on various other Cell platforms, including the IBM BladeCenter JS21 and Cell QS20, and Mercury Cell-based solutions. Terra Soft also maintains the Y-HPC (High Performance Computing) Cluster Construction and Management Suite and Y-Bio gene sequencing tools. Y-Bio is built upon the RPM Linux standard for package management, and offers tools which help bioinformatics researchers conduct their work with greater efficiency.

IBM has developed a pseudo-filesystem for Linux coined "Spufs" that simplifies access to and use of the SPE resources. IBM is currently maintaining a Linux kernel and GDB ports, while Sony maintains the GNU toolchain (GCC, binutils). In November 2005, IBM released a "Cell Broadband Engine (CBE) Software Development Kit Version 1.0", consisting of a simulator and assorted tools, to its web site. Development versions of the latest kernel and tools for Fedora Core 4 are maintained at the Barcelona Supercomputing Center website.

In August 2007, Mercury Computer Systems released a Software Development Kit for PlayStation 3 for High-Performance Computing. In November 2007, Fixstars Corporation released the new "CVCell" module aiming to accelerate several important OpenCV APIs for Cell. In a series of software calculation tests, they recorded execution times on a 3.2 GHz Cell processor that were between 6x and 27x faster compared with the same software on a 2.4 GHz Intel Core 2 Duo.


Gallery

Illustrations of the different generations of Cell/B.E. processors and the PowerXCell 8i. The images are not to scale; all Cell/B.E. packages measure 42.5×42.5 mm and the PowerXCell 8i measures 47.5×47.5 mm.

* The 90 nm Cell/B.E. that shipped with the first PlayStation 3, with its lid on. This is how it is usually seen, as the lid is glued on and not easily removed.
* The 90 nm Cell/B.E. that shipped with the first PlayStation 3. It has its lid removed to show the size of the processor die underneath.
* The underside of the 90 nm Cell/B.E. processor showing its 1242 solder balls, each 0.6 mm in diameter, and its array of 35 capacitors.
* The 65 nm Cell/B.E. that shipped with updated PlayStation 3s. It has its lid removed to show the size of the processor die underneath.
* The 45 nm Cell/B.E. that shipped with updated PlayStation 3s such as the Slim and Super Slim versions. It has its lid removed to show the size of the processor die underneath.
* The 65 nm high-performance PowerXCell 8i with extra capacitors on top due to decoupling needed for noise introduced by the DDR2 interface.


See also

* STI Center of Competence for the Cell Processor
* Adapteva Epiphany architecture, a similar network-on-a-chip with local stores and DMA, but more cores and easier off-core communication
* Vision Processing Unit, an emerging class of processor with some similar features
* MPSoC
* Octopiler
* Xenon (processor)
* IBM PowerPC


References


External links


Cell Broadband Engine resource center

Sony Computer Entertainment Incorporated's Cell resource page

Cmpware Configurable Multiprocessor Development Kit for Cell BE

ISSCC 2005: The CELL Microprocessor, a comprehensive overview of the CELL microarchitecture





Introducing the IBM/Sony/Toshiba Cell Processor — Part I: the SIMD processing units

Introducing the IBM/Sony/Toshiba Cell Processor -- Part II: The Cell Architecture

The Soul of Cell: An interview with Dr. H. Peter Hofstee