POWER9

picture info	POWER9 POWER9 is a family of superscalar, multithreading, multi-core microprocessors produced by IBM, based on the Power ISA. It was announced in August 2016. The POWER9-based processors are being manufactured using a 14 nm FinFET process, in 12- and 24-core versions, for scale out and scale up applications, and possibly other variations, since the POWER9 architecture is open for licensing and modification by the OpenPOWER Foundation members. Summit, the fourth fastest supercomputer in the world (based on the Top500 list as of June 2022), is based on POWER9, while also using Nvidia Tesla GPUs as accelerators. Design Core The POWER9 core comes in two variants, a four-way multithreaded one called ''SMT4'' and an eight-way one called ''SMT8''. The SMT4- and SMT8-cores are similar, in that they consist of a number of so-called ''slices'' fed by common schedulers. A slice is a rudimentary 64-bit single-threaded processing core with load store unit (LSU), integer unit (ALU) and a ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Power10 Power10 is a superscalar, multithreading, multi-core microprocessor family, based on the open source Power ISA, and announced in August 2020 at the Hot Chips conference; systems with Power10 CPUs. Generally available from September 2021 in the IBM Power10 Enterprise E1080 server. The processor is designed to have 15 cores available, but a spare core will be included during manufacture to cost-effectively allow for yield issues. Power10-based processors will be manufactured by Samsung using a 7 nm process with 18 layers of metal and 18 billion transistors on a 602 mm2 silicon die. The main features of Power10 are higher performance per watt and better memory and I/O architectures, with a focus on artificial intelligence (AI) workloads. Design Each Power10 core has doubled up on most functional units compared to its predecessor POWER9. The core is eight-way multithreaded (SMT8) and has 48 KB instruction and 32 KB data L1 caches, a 2 MB large L2 cache ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	NVLink NVLink is a wire-based serial multi-lane near-range communications link developed by Nvidia. Unlike PCI Express, a device can consist of multiple NVLinks, and devices use mesh networking to communicate instead of a central hub. The protocol was first announced in March 2014 and uses a proprietary high-speed signaling interconnect (NVHS). Principle NVLink is a wire-based communications protocol for near-range semiconductor communications developed by Nvidia that can be used for data and control code transfers in processor systems between CPUs and GPUs and solely between GPUs. NVLink specifies a point-to-point connection with data rates of 20, 25 and 50 Gbit/s (v1.0/v2.0/v3.0 resp.) per differential pair. Eight differential pairs form a "sub-link" and two "sub-links", one for each direction, form a "link". The total data rate for a sub-link is 25 GByte/s and the total data rate for a link is 50 GByte/s. Each V100 GPU supports up to six links. Thus, each GPU is capable of supporti ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Coherent Accelerator Processor Interface Coherent Accelerator Processor Interface (CAPI), is a high-speed processor expansion bus standard for use in large data center computers, initially designed to be layered on top of PCI Express, for directly connecting central processing units (CPUs) to external accelerators like graphics processing units (GPUs), ASICs, FPGAs or fast storage. It offers low latency, high speed, direct memory access connectivity between devices of different instruction set architectures. History The performance scaling traditionally associated with Moore's Law—dating back to 1965—began to taper off around 2004, as both Intel's Prescott architecture and IBM's Cell processor pushed toward a 4 GHz operating frequency. Here both projects ran into a thermal scaling wall, whereby heat extraction problems associated with further increases in operating frequency largely outweighed gains from shorter cycle times. Over the decade that followed, few commercial CPU products exceeded 4 GHz, wi ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Summit (supercomputer) Summit or OLCF-4 is a supercomputer developed by IBM for use at Oak Ridge Leadership Computing Facility (OLCF), a facility at the Oak Ridge National Laboratory, capable of 200 petaFLOPS thus making it the 4th fastest supercomputer in the world after Frontier (OLCF-5), Fugaku, and LUMI. It held the number 1 position from November 2018 to June 2020. Its current LINPACK benchmark is clocked at 148.6 petaFLOPS. As of November 2019, the supercomputer had ranked as the 5th most energy efficient in the world with a measured power efficiency of 14.668 gigaFLOPS/watt. Summit was the first supercomputer to reach exaflop (a quintillion operations per second) speed, achieving 1.88 exaflops during a genomic analysis and is expected to reach 3.3 exaflops using mixed-precision calculations. History The United States Department of Energy awarded a $325 million contract in November 2014 to IBM, NVIDIA and Mellanox. The effort resulted in construction of Summit and Sierra. Summit is ta ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Power ISA Power ISA is a reduced instruction set computer (RISC) instruction set architecture (ISA) currently developed by the OpenPOWER Foundation, led by IBM. It was originally developed by IBM and the now-defunct Power.org industry group. Power ISA is an evolution of the PowerPC ISA, created by the mergers of the core PowerPC ISA and the optional Book E for embedded applications. The merger of these two components in 2006 was led by Power.org founders IBM and Freescale Semiconductor. The ISA is divided into several ''categories'' which are described in a certain ''Book''. Processors implement a set of these categories as required for their task. Different classes of processors are required to implement certain categories, for example a server-class processor includes the categories: ''Base'', ''Server'', ''Floating-Point'', ''64-Bit'', etc. All processors implement the Base category. Power ISA is a RISC load/store architecture. It has multiple sets of registers: * ''32'' × 32-b ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	PCI Express PCI Express (Peripheral Component Interconnect Express), officially abbreviated as PCIe or PCI-e, is a high-speed serial computer expansion bus standard, designed to replace the older PCI, PCI-X and AGP bus standards. It is the common motherboard interface for personal computers' graphics cards, hard disk drive host adapters, SSDs, Wi-Fi and Ethernet hardware connections. PCIe has numerous improvements over the older standards, including higher maximum system bus throughput, lower I/O pin count and smaller physical footprint, better performance scaling for bus devices, a more detailed error detection and reporting mechanism (Advanced Error Reporting, AER), and native hot-swap functionality. More recent revisions of the PCIe standard provide hardware support for I/O virtualization. The PCI Express electrical interface is measured by the number of simultaneous lanes. (A lane is a single send/receive line of data. The analogy is a highway with traffic in both directions. ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	POWER8 POWER8 is a family of superscalar multi-core microprocessors based on the Power ISA, announced in August 2013 at the Hot Chips conference. The designs are available for licensing under the OpenPOWER Foundation, which is the first time for such availability of IBM's highest-end processors. Systems based on POWER8 became available from IBM in June 2014. Systems and POWER8 processor designs made by other OpenPOWER members were available in early 2015. Design POWER8 is designed to be a massively multithreaded chip, with each of its cores capable of handling eight hardware threads simultaneously, for a total of 96 threads executed simultaneously on a 12-core chip. The processor makes use of very large amounts of on- and off-chip eDRAM caches, and on-chip memory controllers enable very high bandwidth to memory and system I/O. For most workloads, the chip is said to perform two to three times as fast as its predecessor, the POWER7. POWER8 chips comes in 6- or 12-core variants; e ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	PowerVM PowerVM, formerly known as Advanced Power Virtualization (APV), is a chargeable feature of IBM POWER5, POWER6, POWER7, POWER8, POWER9 and Power10 servers and is required for support of micro-partitions and other advanced features. Support is provided for IBM i, AIX and Linux. Description IBM PowerVM has the following components: * A "VET" code, which activates firmware required to support resource sharing and other features. * Installation media for the Virtual I/O Server (VIOS), which is a service partition providing sharing services for disk and network adapters. * Installation media for Lx86, x86 binary translation software, which allows Linux applications compiled for the Intel x86 platform to run in POWER-emulation mode. A supported Linux distribution is a co-requisite for use of this feature. IBM PowerVM comes in three editions: IBM PowerVM Express * Only supported on "Express" servers (e.g. Power 710/730, 720/740, 750 and Power Blades). * Limited to three partitio ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Multithreading (computer Architecture) In computer architecture, multithreading is the ability of a central processing unit (CPU) (or a single core in a multi-core processor) to provide multiple threads of execution concurrently, supported by the operating system. This approach differs from multiprocessing. In a multithreaded application, the threads share the resources of a single or multiple cores, which include the computing units, the CPU caches, and the translation lookaside buffer (TLB). Where multiprocessing systems include multiple complete processing units in one or more cores, multithreading aims to increase utilization of a single core by using thread-level parallelism, as well as instruction-level parallelism. As the two techniques are complementary, they are combined in nearly all modern systems architectures with multiple multithreading CPUs and with CPUs with multiple multithreading cores. Overview The multithreading paradigm has become more popular as efforts to further exploit instruction-level p ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Multi-core A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions (such as add, move data, and branch) but the single processor can run instructions on separate cores at the same time, increasing overall speed for programs that support multithreading or other parallel computing techniques. Manufacturers typically integrate the cores onto a single integrated circuit die (known as a chip multiprocessor or CMP) or onto multiple dies in a single chip package. The microprocessors currently used in almost all personal computers are multi-core. A multi-core processor implements multiprocessing in a single physical package. Designers may couple cores in a multi-core device tightly or loosely. For example, cores may or may not share caches, and they may implement message passing or shared-memory inter-core communicat ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Input/output In computing, input/output (I/O, or informally io or IO) is the communication between an information processing system, such as a computer, and the outside world, possibly a human or another information processing system. Inputs are the signals or data received by the system and outputs are the signals or data sent from it. The term can also be used as part of an action; to "perform I/O" is to perform an input or output operation. are the pieces of hardware used by a human (or other system) to communicate with a computer. For instance, a keyboard or computer mouse is an input device for a computer, while monitors and printers are output devices. Devices for communication between computers, such as modems and network cards, typically perform both input and output operations. Any interaction with the system by a interactor is an input and the reaction the system responds is called the output. The designation of a device as either input or output depends on perspective. Mice a ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	EDRAM Embedded DRAM (eDRAM) is dynamic random-access memory (DRAM) integrated on the same die or multi-chip module (MCM) of an application-specific integrated circuit (ASIC) or microprocessor. eDRAM's cost-per-bit is higher when compared to equivalent standalone DRAM chips used as external memory, but the performance advantages of placing eDRAM onto the same chip as the processor outweigh the cost disadvantages in many applications. In performance and size, eDRAM is positioned between level 3 cache and conventional DRAM on the memory bus, and effectively functions as a level 4 cache, though architectural descriptions may not explicitly refer to it in those terms. Embedding memory on the ASIC or processor allows for much wider buses and higher operation speeds, and due to much higher density of DRAM in comparison to SRAM, larger amounts of memory can be installed on smaller chips if eDRAM is used instead of eSRAM. eDRAM requires additional fab process steps compared with embedded SR ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]