HOME

TheInfoList



OR:

Power10 is a
superscalar A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a sup ...
, multithreading,
multi-core A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions (such a ...
microprocessor A microprocessor is a computer processor where the data processing logic and control is included on a single integrated circuit, or a small number of integrated circuits. The microprocessor contains the arithmetic, logic, and control circu ...
family, based on the
open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
Power ISA Power ISA is a reduced instruction set computer (RISC) instruction set architecture (ISA) currently developed by the OpenPOWER Foundation, led by IBM. It was originally developed by IBM and the now-defunct Power.org industry group. Power IS ...
, and announced in August 2020 at the
Hot Chips The Institute of Electrical and Electronics Engineers (IEEE) is a 501(c)(3) professional association for electronic engineering and electrical engineering (and associated disciplines) with its corporate office in New York City and its operation ...
conference; systems with Power10 CPUs. Generally available from September 2021 in the IBM Power10 Enterprise E1080 server. The processor is designed to have 15 cores available, but a spare core will be included during manufacture to cost-effectively allow for yield issues. Power10-based processors will be manufactured by Samsung using a
7 nm In semiconductor manufacturing, the International Technology Roadmap for Semiconductors defines the 7  nm process as the MOSFET technology node following the 10 nm node. It is based on FinFET (fin field-effect transistor) technology, ...
process with 18 layers of metal and 18 billion transistors on a 602 mm2
silicon die A die, in the context of integrated circuits, is a small block of semiconducting material on which a given functional circuit is fabricated. Typically, integrated circuits are produced in large batches on a single wafer of electronic-grade sili ...
. The main features of Power10 are higher
performance per watt In computing, performance per watt is a measure of the energy efficiency of a particular computer architecture or computer hardware. Literally, it measures the rate of computation that can be delivered by a computer for every watt of power consume ...
and better
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
and I/O architectures, with a focus on
artificial intelligence Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech re ...
(AI) workloads.


Design

Each Power10 core has doubled up on most functional units compared to its predecessor
POWER9 POWER9 is a family of superscalar, multithreading, multi-core microprocessors produced by IBM, based on the Power ISA. It was announced in August 2016. The POWER9-based processors are being manufactured using a 14 nm FinFET process, in ...
. The core is eight-way multithreaded (SMT8) and has 48 KB instruction and 32 KB data L1 caches, a 2 MB large L2 cache and a very large
translation lookaside buffer A translation lookaside buffer (TLB) is a memory cache that stores the recent translations of virtual memory to physical memory. It is used to reduce the time taken to access a user memory location. It can be called an address-translation cache. ...
(TLB) with 4096 entries. Latency cycles to the different cache stages and TLB has been reduced significantly. Each core has eight execution slices each with one
floating-point unit In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
(FPU),
arithmetic logic unit In computing, an arithmetic logic unit (ALU) is a Combinational logic, combinational digital circuit that performs arithmetic and bitwise operations on integer binary numbers. This is in contrast to a floating-point unit (FPU), which operates on ...
(ALU),
branch predictor In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow ...
,
load–store unit In computer engineering, a load–store unit (LSU) is a specialized execution unit responsible for executing all load and store instructions, generating virtual addresses of load and store operations and loading data from memory or storing it back ...
and SIMD-engine, able to be fed 128-bit (64+64) instructions from the new prefix/fuse instructions of the Power ISA v.3.1. Each execution slice can handle 20 instructions each, backed up by a shared 512-entry instruction table, and fed to 128-entry-wide (64 single-threaded) load queue and 80-entry (40 single-threaded) wide store queue. Better branch prediction features have doubled the accuracy. A core has four matrix math assist (MMA) engines, for better handling of SIMD code, especially for
matrix multiplication In mathematics, particularly in linear algebra, matrix multiplication is a binary operation that produces a matrix from two matrices. For matrix multiplication, the number of columns in the first matrix must be equal to the number of rows in the s ...
instructions where AI inference workloads have a 20-fold performance increase. The processor has two "hemispheres" with eight cores each, sharing a 64 MB L3 cache for a total of 16 cores and 128 MB L3 caches. Due to yield issues, at least one core is always disabled, reducing L3 cache by 8 MB to a usable total of 15 cores and 120 MB L3 cache. Each chip also has eight
crypto Crypto commonly refers to: * Cryptocurrency, a type of digital currency secured by cryptography and decentralization * Cryptography, the practice and study of hiding information Crypto or Krypto may also refer to: Cryptography * Cryptanalysis, ...
accelerators offloading common algorithms such as AES and
SHA-3 SHA-3 (Secure Hash Algorithm 3) is the latest member of the Secure Hash Algorithm family of standards, released by NIST on August 5, 2015. Although part of the same series of standards, SHA-3 is internally different from the MD5-like struct ...
. Increased
clock gating Clock gating is a popular technique used in many synchronous circuits for reducing dynamic power dissipation, by removing the clock signal when the circuit is not in use or ignores clock signal. Clock gating saves power by pruning the clock tree, ...
and reworked microarchitecture at every stage, together with the fuse/prefix instructions enabling more work with fewer work units, and smarter cache with lower memory latencies and effective address tagging reducing cache misses, enables the Power10 core to consume half the power as POWER9. Combined with the improvements in the compute facilities by up to 30% makes the whole processor perform 2.6× better per watt than its predecessor. And in the case of mounting two cores on the same module, up to 3 times as fast in the same power budget. As the cores can act like eight logical processors the 15-core processor looks like 120 cores to the
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...
. On a dual-chip module, that becomes 240 simultaneous threads per
socket Socket may refer to: Mechanics * Socket wrench, a type of wrench that uses separate, removable sockets to fit different sizes of nuts and bolts * Socket head screw, a screw (or bolt) with a cylindrical head containing a socket into which the hexag ...
.


I/O

The chips have completely reworked memory and I/O architectures, using the open
Coherent Accelerator Processor Interface Coherent Accelerator Processor Interface (CAPI), is a high-speed processor expansion bus standard for use in large data center computers, initially designed to be layered on top of PCI Express, for directly connecting central processing units (CPU ...
(OpenCAPI) and Open Memory Interface (OMI). Using serial memory communications to off chip controllers reduces signaling lanes to and from the chip, increases the
bandwidth Bandwidth commonly refers to: * Bandwidth (signal processing) or ''analog bandwidth'', ''frequency bandwidth'', or ''radio bandwidth'', a measure of the width of a frequency range * Bandwidth (computing), the rate of data transfer, bit rate or thr ...
and allows the processor to be flexible in its memory technology,. Power10 supports a wide range of memory types, including DDR3 through DDR5, GDDR, HBM, or Persistent Storage Memory. These configurations can be changed by the customer to best fit the use case intended for the system. *
DDR4 Double Data Rate 4 Synchronous Dynamic Random-Access Memory (DDR4 SDRAM) is a type of synchronous dynamic random-access memory with a high bandwidth (" double data rate") interface. Released to the market in 2014, it is a variant of dynamic ra ...
– support for up to 16  TB RAM, 410 GB/s, 10 ns latency *
GDDR6 Graphics Double Data Rate 6 Synchronous Dynamic Random-Access Memory (GDDR6 SDRAM) is a type of synchronous graphics random-access memory (SGRAM) with a high bandwidth, " double data rate" interface, designed for use in graphics cards, game cons ...
– up to 800 GB/s * Persistent storage – up to 2 PB Power10 enables encrypting of data with no performance penalty at every stage from RAM, across accelerators and cluster nodes to data at rest. Power10 comes with PowerAXON facility enabling chip to chip, system to system and OpenCAPI bus for accelerators, I/O and other high performance cache coherent peripherals. It manages the communications between nodes in a 16x socket single chip module (SCM) cluster or a 4x socket dual chip module (DCM) cluster. It also manages the memory semantics for clustering of systems enabling load/store access from the core up to 2 PB of RAM on the entire Power10 cluster. IBM calls this feature ''Memory Inception''. Both OMI and PowerAXON can handle 1 TB/s communications off the chip. Power10 includes PCIe 5. The SCM has 32x and the DCM has 64x PCIe 5 lanes. IBM and
Nvidia Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
agreed that including
NVLink NVLink is a wire-based serial multi-lane near-range communications link developed by Nvidia. Unlike PCI Express, a device can consist of multiple NVLinks, and devices use mesh networking to communicate instead of a central hub. The protocol was f ...
in Power10 would be redundant since PCIe 5 is fast enough for attaching
GPUs A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobil ...
so NVLink is not present. Support for NVLink on-chip was previously a unique selling point for
POWER8 POWER8 is a family of superscalar multi-core microprocessors based on the Power ISA, announced in August 2013 at the Hot Chips conference. The designs are available for licensing under the OpenPOWER Foundation, which is the first time for s ...
and POWER9.


Variants

The Power10 chip is available in two variants, defined by
firmware In computing, firmware is a specific class of computer software that provides the low-level control for a device's specific hardware. Firmware, such as the BIOS of a personal computer, may contain basic functions of a device, and may provide h ...
in the packaging. Even though the chips are physically identical and the difference is set in firmware, it cannot be changed by the user nor IBM after manufacturing. * 15× SMT8 cores ** Optimized for high throughput but less compute intensive applications * 30× SMT4 cores ** Optimized for highly compute intensive applications that require complex instruction sets and multiple cycles for information loaded into cache


Modules

The Power10 comes in three flip-chip plastic land grid array (FC-PLGA)
packages Package may refer to: Containers or Enclosures * Packaging and labeling, enclosing or protecting products * Mail, items larger than a letter * Chip package or chip carrier * Electronic packaging, in electrical engineering * Automotive package, ...
: one single chip module (SCM) and two dual-chip modules (DCM and eSCM). * SCM, singe chip module – 3.6-4.15 GHz, up to 15 SMT8 cores. Can be clustered up to 16 sockets. x32 PCIe 5 lanes. Module size: 68.5×77.5 mm. The module has a unique configuration with 8 connectors on the substrate (OTF) for symmetric multiprocessing (SMP) cables directly connecting other Power10 SCM modules. * DCM, dual chip module – 3.4-4.0 GHz, up to 24 SMT8 cores. Can be clustered up to four sockets. x64 PCIe 5 lanes. The DCM is in the same thermal range as previous offerings. Module size: 74.5×85.75 mm. The DCM comes in four variants. **EPEU - 12 cores, 3.36-4.0 GHz **EPEV - 18 cores, 3.2-4.0 GHz **EPGW - 24 cores, 2.95-3.9 GHz **EHC8 - 24 cores, 2.95-3.9 GHz (for North American healthcare) * eSCM, entry single chip module – 3.0-3.9 GHz, up to 8 SMT8 cores. It combines two Power10 chips. The first chip is fully functional with 4-8 active cores. The other chip only uses the PCIe functionality acting as a IO switch with plenty of more PCIe lanes. These eSCM modules can be clustered up to four sockets. Module size: 74.5×85.75 mm. The eSCM is also called the "ioscm".


Systems


Enterprise

The IBM Power E1080, codename ''Denali'', is the top end Power10 computer by IBM. It's made of 1-4× Central Electronics Complex (CEC) nodes, each one taking up 5Us of space. Each node has 4× Power10 SCM, configurable with October 12, 2015 SMT8 cores per processor, and up to 16 TB OMI- DDR4 RAM. The Power E1080 natively runs
PowerVM PowerVM, formerly known as Advanced Power Virtualization (APV), is a chargeable feature of IBM POWER5, POWER6, POWER7, POWER8, POWER9 and Power10 servers and is required for support of micro-partitions and other advanced features. Support is pr ...
running
AIX Aix or AIX may refer to: Computing * AIX, a line of IBM computer operating systems *An Alternate Index, for a Virtual Storage Access Method Key Sequenced Data Set * Athens Internet Exchange, a European Internet exchange point Places Belgi ...
,
IBM i IBM i (the ''i'' standing for ''integrated'') is an operating system developed by IBM for IBM Power Systems. It was originally released in 1988 as OS/400, as the sole operating system of the IBM AS/400 line of systems. It was renamed to i5/OS in ...
and
little-endian In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most si ...
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
. An E1080 system also needs a 2U high System Control Unit for monitoring and configuration. The Power E1080 also supports up to sixteen I/O expansion drawers, four per CEC node. Each expansion drawer is connected to the respective CEC node by two PCIe fanout modules, and has twelve FHFL PCIe slots. Four of these slots are
PCIe PCI Express (Peripheral Component Interconnect Express), officially abbreviated as PCIe or PCI-e, is a high-speed serial computer expansion bus standard, designed to replace the older PCI, PCI-X and AGP bus standards. It is the common mo ...
3.0 x16, while the remaining eight are PCIe 3.0 x8. A maximum specification configuration allows the Power E1080 to support 192 single slot PCIe cards across a 16 socket system.


Mid-range

*IBM Power E1050 - 4U case. 2-4× CPU sockets for 2-4× DCM modules, 24-96 cores. 64× OMI memory slots which support up to 16 TB RAM. 11× PCIe slots, 8× gen.5 and 3× gen.4. 10 slots for up to 64 TB of NVMe based SSDs. Run a mix of Linux, AIX or IBM i operating systems.


Scale-out

The S-models can run Linux, IBM i and AIX. The L-models are made for Linux, but are allowed to run AIX and IBM i on up to 25% of available CPU cores. *IBM Power S1024 & L1024 - 4U case. 1-2× CPU sockets for 1-2× DCM modules, 24-48 cores. 32× OMI memory slots which support up to 8 TB RAM. 10× PCIe slots, 8× gen.5 and 2× gen.4. 16 slots for up to 102 TB of NVMe based SSDs. *IBM Power S1022 & L1022 - 2U case. 1-2× CPU sockets for 1-2× DCM modules, 24-40 cores. 32× OMI memory slots which support up to 4 TB RAM. 10× PCIe slots, 8× gen.5 and 2× gen.4. 8 slots for up to 51 TB of NVMe based SSDs. *IBM Power S1022s - 2U case. 1-2× CPU sockets for 1-2× eSCM modules, 4-16 cores. 16× OMI memory slots which support up to 2 TB RAM. 10× PCIe slots, 8× gen.5 and 2× gen.4. 8 slots for up to 51 TB of NVMe based SSDs. *IBM Power S1014 - 4U case or a tower. 1× Power10 eSCM module with 4 or 8 cores. 8× OMI memory slots which support up to 1 TB RAM. 5× PCIe slots, 4× gen.5 and 1× gen.4. 16 slots for up to 102 TB of NVMe based SSDs.


Operating system support

*
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
, version 5.9 *
PowerVM PowerVM, formerly known as Advanced Power Virtualization (APV), is a chargeable feature of IBM POWER5, POWER6, POWER7, POWER8, POWER9 and Power10 servers and is required for support of micro-partitions and other advanced features. Support is pr ...
*
AIX Aix or AIX may refer to: Computing * AIX, a line of IBM computer operating systems *An Alternate Index, for a Virtual Storage Access Method Key Sequenced Data Set * Athens Internet Exchange, a European Internet exchange point Places Belgi ...
*
IBM i IBM i (the ''i'' standing for ''integrated'') is an operating system developed by IBM for IBM Power Systems. It was originally released in 1988 as OS/400, as the sole operating system of the IBM AS/400 line of systems. It was renamed to i5/OS in ...


Comparison with earlier POWER CPUs

The change to a 7-nm fabrication process results in significantly higher performance per watt. The PowerAXON facility now extends all the way to 2  PB of unified clustered memory space, shared across multiple cluster nodes, and includes support for PCIe 5. New
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
instructions and new data types including bfloat16, INT4(INTEGER) and INT8(BIGINT). are aimed at improving AI workloads. Unlike earlier POWER9 and POWER8 CPUs, Power10 requires closed source, third party firmware in security sensitive areas of the CPU module, along with additional closed source, third party firmware in the required off-module memory controller.


Branding

Power10 is unusual in that its name is not capitalised like POWER9 and all other previous POWER processors are. This change is one part in IBM's rebranding of their Power Systems offering, which beginning with Power10 is now just "Power". Power10 also has a logo.No More Shouting The Name “Power” (Well, Except In Our Title Here)
/ref>


See also

*
IBM Power microprocessors IBM Power microprocessors (originally POWER prior to Power10) are designed and sold by IBM for servers and supercomputers. The name "POWER" was originally presented as an acronym for "Performance Optimization With Enhanced RISC". The Power l ...
*
OpenPOWER Foundation The OpenPOWER Foundation is a collaboration around Power ISA-based products initiated by IBM and announced as the "OpenPOWER Consortium" on August 6, 2013. IBM is opening up technology surrounding their Power Architecture offerings, such as pro ...
*
POWER9 POWER9 is a family of superscalar, multithreading, multi-core microprocessors produced by IBM, based on the Power ISA. It was announced in August 2016. The POWER9-based processors are being manufactured using a 14 nm FinFET process, in ...


References

{{Reflist Computer-related introductions in 2020 IBM microprocessors OpenPower IP cores Parallel computing Power microprocessors 64-bit microprocessors