
The Power Processing Element (PPE) comprises a Power Processing Unit (PPU) and a 512 KB L2 cache. In most instances the PPU is used in a PPE. The PPU is a 64-bit, dual-threaded, in-order PowerPC 2.02 microprocessor core designed by IBM for use primarily in the game consoles PlayStation 3 and Xbox 360, but it has also found applications in high-performance computing in supercomputers such as the record-setting IBM Roadrunner.

The PPU is used as the main CPU core in three different processor designs:

* The Cell Broadband Engine (Cell BE), which is used primarily in Sony's PlayStation 3 gaming console. It uses the PPE and comes in three versions: a 90 nm, a 65 nm and a 45 nm part.
* The PowerXCell 8i, which is a version of the Cell BE with an enhanced FPU and memory subsystem. It was only manufactured as a single 65 nm version.
* The XCPU, which uses the PPU in a three-core configuration with a unified 1 MB L2 cache inside Microsoft's Xbox 360. It comes in three versions: the 90 nm and 65 nm versions, and the 45 nm XCGPU with an integrated graphics processor from ATI.


Main features

* 64-bit, dual-threaded core
* 3.2 GHz typical clock rate
* 32 KB L1 instruction cache
* 32 KB L1 data cache
* 512 KB unified L2 cache, 8-way set associative, in the PPE variant
* Compatible with the 64-bit PowerPC ISA v.2.02 (POWER4 and PowerPC 970)
* AltiVec SIMD functionality
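The L2 figures in the list above determine how the cache is organised into sets. The following is a minimal sketch of that arithmetic, not a description of the actual hardware layout. The function name `cache_geometry` is hypothetical, and the 128-byte line size is an assumption that the feature list does not state.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int) -> dict:
    """Derive set count and address-bit split for a set-associative cache."""
    if size_bytes % (ways * line_bytes) != 0:
        raise ValueError("size must be a multiple of ways * line_bytes")
    sets = size_bytes // (ways * line_bytes)
    # Both values must be powers of two for a simple bit-field split.
    if sets & (sets - 1) or line_bytes & (line_bytes - 1):
        raise ValueError("sets and line_bytes must be powers of two")
    return {
        "sets": sets,
        "offset_bits": line_bytes.bit_length() - 1,  # selects a byte in the line
        "index_bits": sets.bit_length() - 1,         # selects the set
    }


# 512 KB, 8-way set associative (from the list); 128-byte lines (ASSUMED).
l2 = cache_geometry(512 * 1024, ways=8, line_bytes=128)
print(l2)
```

Under the assumed line size this yields 512 sets; a different line size would change the set count but not the method.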


Execution units

* Branch Unit (BRU)
* Fixed Point Integer Unit (FXU)
* Load and Store Unit (LSU)
* Floating-Point Unit (FPU)
* Vector Media Extension Unit (VMX)


In-order

The PPU is an in-order processor, but it has some unique traits that allow it to achieve some of the benefits of out-of-order execution without expensive reordering hardware. On an L1 cache miss it can continue executing past the miss, stopping only when an instruction actually depends on the pending load. It can send up to 8 load instructions to the L2 cache out of order. It also has an instruction delay pipe, a side path that lets it execute instructions that would normally cause pipeline stalls without holding up the rest of the pipeline. The instruction delay pipe is used for the out-of-order loads and stores: cache misses are put there while the processor moves on.
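The behaviour described above can be illustrated with a toy model: instructions issue in program order, a missing load only marks its destination register as pending, and issue stops at the first instruction that reads a pending register or when the limit of 8 outstanding loads is reached. This is a simplified sketch, not a model of the real pipeline. The `Instr` type and `issued_before_stall` function are hypothetical, misses never complete in this model, and destination registers are assumed to be distinct.

```python
from dataclasses import dataclass

MAX_OUTSTANDING_LOADS = 8  # from the text: up to 8 loads sent to the L2


@dataclass(frozen=True)
class Instr:
    dest: str                # register written by the instruction
    srcs: tuple = ()         # registers read by the instruction
    load_miss: bool = False  # True if this is a load that misses the L1


def issued_before_stall(program) -> int:
    """Count instructions issued in order before the first stall."""
    pending = set()  # destination registers of loads still in flight
    for i, ins in enumerate(program):
        # Stall only when an instruction needs data that has not arrived.
        if any(src in pending for src in ins.srcs):
            return i
        if ins.load_miss:
            # Stall if no more loads can be sent to the L2.
            if len(pending) == MAX_OUTSTANDING_LOADS:
                return i
            pending.add(ins.dest)
    return len(program)


program = [
    Instr("r1", ("r9",), load_miss=True),  # load misses the L1
    Instr("r2", ("r8",)),                  # independent, issues past the miss
    Instr("r3", ("r7",), load_miss=True),  # second miss, also tolerated
    Instr("r4", ("r1",)),                  # reads r1: first dependent instruction
    Instr("r5", ("r2",)),
]
print(issued_before_stall(program))  # prints 3
```

A strictly blocking in-order design would have stalled at the first miss; in this model three instructions issue before the dependency on `r1` forces a stall.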


The PPE's Pipeline

The PPE has a 23-stage general pipeline, with up to 11 additional stages for microcode and up to 4 additional stages for branch prediction.


Multithreading

The PPU runs two hardware threads simultaneously. The main registers for code execution are duplicated, as are the exception and interrupt-handling registers and several essential arrays and queues. The two threads can generate exceptions simultaneously and perform branch prediction on their individual branch histories. The execution engine and caches are not duplicated, however, so it is still a single-core design. (Chapter 2, The Power Processing Element (PPE))
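The split between duplicated and shared resources can be sketched as a small data model. This is only an illustration under stated assumptions: the class names `HardwareThread` and `Core`, their methods, and the dictionary standing in for the cache are all hypothetical, and no real issue policy is modelled.

```python
from dataclasses import dataclass, field


@dataclass
class HardwareThread:
    """Per-thread state, which the text says is duplicated."""
    registers: dict = field(default_factory=dict)
    branch_history: list = field(default_factory=list)


@dataclass
class Core:
    """One core: two thread contexts, one execution engine, one cache."""
    threads: tuple = field(
        default_factory=lambda: (HardwareThread(), HardwareThread())
    )
    cache: dict = field(default_factory=dict)  # shared by both threads
    executed: int = 0                          # work done by the single engine

    def set_reg(self, tid: int, reg: str, value: int) -> None:
        self.threads[tid].registers[reg] = value
        self.executed += 1

    def store(self, tid: int, addr: int, reg: str) -> None:
        self.cache[addr] = self.threads[tid].registers[reg]
        self.executed += 1

    def load(self, tid: int, addr: int, reg: str) -> None:
        self.threads[tid].registers[reg] = self.cache[addr]
        self.executed += 1


core = Core()
core.set_reg(0, "r1", 10)   # thread 0 has its own r1
core.set_reg(1, "r1", 20)   # thread 1's r1 is a separate register
core.store(0, 0x100, "r1")  # thread 0 writes through the shared cache
core.load(1, 0x100, "r2")   # thread 1 sees that value
print(core.threads[0].registers, core.threads[1].registers, core.executed)
```

Each thread keeps a private `r1`, while the stored value is visible to both and every operation is counted against the same engine, which is the sense in which the design remains single-core.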


Floating point capacity

Its 64-bit double-precision floating-point unit and 128-bit VMX unit (using the AltiVec instruction set) can together perform a theoretical 12 floating-point operations per cycle, as the floating-point unit can execute floating-point multiply-adds. At 3.2 GHz this gives 3.2 billion clock cycles per second × 12 = 38.4 billion floating-point operations per second (38.4 GFLOPS). The PPU is enhanced in the PowerXCell 8i processor to perform single-cycle double-precision floating-point operations, tailored for high-performance computing in supercomputers. The VMX unit of the XCPU in the Xbox 360 is enhanced with 128 registers and is not entirely compatible with regular AltiVec.
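The peak figure above is a straightforward product of clock rate and operations per cycle. A minimal sketch of the calculation follows; the function name `peak_flops` is hypothetical, and the result is a theoretical ceiling rather than a measured throughput.

```python
def peak_flops(clock_hz: int, flops_per_cycle: int) -> int:
    """Theoretical peak floating-point operations per second."""
    return clock_hz * flops_per_cycle


PPU_CLOCK_HZ = 3_200_000_000  # 3.2 GHz typical clock rate
PPU_FLOPS_PER_CYCLE = 12      # theoretical figure quoted in the text

peak = peak_flops(PPU_CLOCK_HZ, PPU_FLOPS_PER_CYCLE)
print(f"{peak / 1e9:.1f} GFLOPS")  # prints 38.4 GFLOPS
```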

