Manycore

Manycore processors are special kinds of multi-core processors designed for a high degree of parallel processing, containing numerous simpler, independent processor cores (from a few tens of cores to thousands or more). Manycore processors are used extensively in embedded computers and high-performance computing.


Contrast with multicore architecture

Manycore processors are distinct from multi-core processors in being optimized from the outset for a higher degree of explicit parallelism, and for higher throughput (or lower power consumption) at the expense of latency and single-thread performance. The broader category of multi-core processors, by contrast, is usually designed to run both parallel and serial code efficiently, and therefore places more emphasis on high single-thread performance (e.g. devoting more silicon to out-of-order execution, deeper pipelines, more superscalar execution units, and larger, more general caches) and on shared memory. These techniques devote runtime resources toward figuring out implicit parallelism in a single thread. They are used in systems that have evolved continuously (with backward compatibility) from single-core processors. Such processors usually have a few cores (e.g. 2, 4, 8), and may be complemented by a manycore accelerator (such as a GPU) in a heterogeneous system.
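
The throughput-versus-single-thread trade-off can be illustrated with Amdahl's law. The Python sketch below is illustrative only: the function name `run_time`, the core counts, and the relative core speeds are assumptions chosen for the example, not measurements of any real processor.

```python
def run_time(parallel_fraction, cores, core_speed):
    """Normalised run time under Amdahl's law.

    parallel_fraction: share of the work that can be spread over all cores.
    cores:             number of cores (hypothetical).
    core_speed:        single-thread speed of one core, relative to a
                       simple core with speed 1.0 (hypothetical).
    """
    serial_part = (1.0 - parallel_fraction) / core_speed
    parallel_part = parallel_fraction / (cores * core_speed)
    return serial_part + parallel_part


# Hypothetical designs: a few fast cores versus many simple cores.
FEW_FAST = {"cores": 8, "core_speed": 4.0}
MANY_SIMPLE = {"cores": 256, "core_speed": 1.0}

if __name__ == "__main__":
    for fraction in (0.5, 0.9, 0.999):
        few = run_time(fraction, **FEW_FAST)
        many = run_time(fraction, **MANY_SIMPLE)
        print(f"parallel fraction {fraction}: few fast = {few:.5f}, many simple = {many:.5f}")
```

Under these assumed numbers, the design with a few fast cores finishes sooner when half of the work is serial, while the design with many simple cores finishes sooner when almost all of the work is parallel.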


Motivation

Cache coherency is an issue limiting the scaling of multicore processors. Manycore processors may bypass this with methods such as message passing, scratchpad memory, DMA, partitioned global address space, or read-only/non-coherent caches. A manycore processor using a network on a chip and local memories gives software the opportunity to explicitly optimise the spatial layout of tasks (e.g. as seen in tooling developed for TrueNorth). Manycore processors may have more in common (conceptually) with technologies originating in high-performance computing, such as clusters and vector processors. GPUs may be considered a form of manycore processor having multiple shader processing units, and only being suitable for highly parallel code (high throughput, but extremely poor single-thread performance).
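
Optimising the spatial layout of tasks on a network on a chip can be sketched as a cost comparison between placements. The Python example below is a minimal sketch, not the tooling of any particular chip: it assumes a 2D mesh in which a message travels the Manhattan distance between two cores, and the task names, traffic volumes, and coordinates are all hypothetical.

```python
def mesh_hops(a, b):
    """Hops between two cores at (row, column) positions on a 2D mesh,
    assuming messages travel the Manhattan distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def communication_cost(placement, traffic):
    """Sum of (message volume x hop count) over all communicating task pairs."""
    return sum(
        volume * mesh_hops(placement[src], placement[dst])
        for (src, dst), volume in traffic.items()
    )


# Hypothetical four-stage pipeline: each task sends 10 units to the next one.
TRAFFIC = {("t0", "t1"): 10, ("t1", "t2"): 10, ("t2", "t3"): 10}

# Neighbouring pipeline stages sit on neighbouring cores.
ADJACENT = {"t0": (0, 0), "t1": (0, 1), "t2": (1, 1), "t3": (1, 0)}

# The same tasks scattered to the corners of a 4x4 mesh.
SCATTERED = {"t0": (0, 0), "t1": (3, 3), "t2": (0, 3), "t3": (3, 0)}

if __name__ == "__main__":
    print("adjacent placement cost: ", communication_cost(ADJACENT, TRAFFIC))
    print("scattered placement cost:", communication_cost(SCATTERED, TRAFFIC))
```

With these assumed values, the adjacent placement costs 30 volume-hops and the scattered one 150, which is the kind of difference a placement tool would try to exploit.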


Suitable programming models

* Message Passing Interface
* OpenCL or other APIs supporting compute kernels
* Partitioned global address space
* Actor model
* OpenMP
* Dataflow
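
The message-passing style in the list above can be sketched with Python's standard `threading` and `queue` modules. This is a minimal sketch of the programming model only, and it does not use MPI or any manycore hardware: each worker stands in for a core with private local storage, and the names `worker` and `sum_of_squares` are hypothetical. Python threads also do not run CPU-bound code in parallel in the standard interpreter, so the example shows the structure of the model, not its performance.

```python
import queue
import threading


def worker(core_id, inbox, outbox):
    """Stand-in for one core: receive a chunk, work on it locally, send a result."""
    local_store = inbox.get()                    # data arrives as a message
    partial = sum(x * x for x in local_store)    # compute on private data only
    outbox.put((core_id, partial))               # result leaves as a message


def sum_of_squares(data, num_cores=4):
    """Distribute data over the workers and combine their partial results."""
    inboxes = [queue.Queue() for _ in range(num_cores)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(i, inboxes[i], results))
        for i in range(num_cores)
    ]
    for thread in threads:
        thread.start()
    for i, inbox in enumerate(inboxes):
        inbox.put(data[i::num_cores])            # one strided chunk per worker
    for thread in threads:
        thread.join()
    return sum(results.get()[1] for _ in range(num_cores))


if __name__ == "__main__":
    print(sum_of_squares(list(range(10))))
```

No worker reads another worker's data directly, so no cache-coherent shared state is needed; all exchange happens through the queues.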


Classes of manycore systems

* GPUs, which can be described as manycore vector processors
* Massively parallel processor array
* Asynchronous array of simple processors


Specific manycore architectures

* ZettaScaler, Japanese PEZY Computing 2048-core modules
* Xeon Phi coprocessor, which has MIC (Many Integrated Cores) architecture
* Tilera
* Adapteva Epiphany Architecture, a manycore chip using PGAS scratchpad memory
* Coherent Logix hx3100 Processor, a 100-core DSP/GPP processor based on HyperX Architecture
* Movidius Myriad 2, a manycore vision processing unit (VPU)
* Kalray, a manycore PCI-e accelerator for data-intensive tasks
* Teraflops Research Chip, a manycore processor using message passing
* TrueNorth, an AI accelerator with a manycore network on a chip architecture
* Green arrays, a manycore processor using message passing aimed at low power applications
* Sunway SW26010, a 260-core manycore processor used in Sunway TaihuLight, at the time the top-ranked supercomputer
** SW52020, an improved 520-core variant of SW26010, with 512-bit SIMD (also adding support for half-precision), used in a prototype meant for an exascale system (and, in the future, a 10-exascale system); according to Data Center Dynamics, China is rumored to already have two separate exascale systems in secret
* Eyeriss, a manycore processor designed for running convolutional neural nets for embedded vision applications
* Graphcore, a manycore AI accelerator


Specific manycore computers with 1M+ CPU cores

A number of computers built from multicore processors have one million or more individual CPU cores. Examples include:
* Gyoukou (Japanese: 暁光, Hepburn: gyōkō, "dawn light"), a supercomputer developed by ExaScaler and PEZY Computing, with 20,480,000 processing elements in total plus 1,250 Intel Xeon D host processors.
* SpiNNaker, a massively parallel (1M CPU cores) manycore processor (ARM-based) built as part of the Human Brain Project.


Specific computers with 5M+ CPU cores

Quite a few supercomputers have over a million, or even over 5 million, CPU cores. Where coprocessors such as GPUs are also used, their cores are not included in the core count; if they were, quite a few more computers would reach those thresholds.
* Frontier
* Fugaku, a Japanese supercomputer using Fujitsu A64FX ARM-based cores, 7,630,848 in total.
* Sunway TaihuLight, a massively parallel (10M CPU cores) Chinese supercomputer, once one of the fastest supercomputers in the world, using a custom manycore architecture. As of November 2018, it was the world's third-fastest supercomputer (as ranked by the TOP500 list), obtaining its performance from 40,960 SW26010 manycore processors, each containing 256 cores.
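
The "10M CPU cores" figure for Sunway TaihuLight can be checked from the processor count. The Python sketch below is a worked calculation, with the helper name `total_cores` chosen for illustration. It rests on one assumption not stated in this section: that the 260 cores of each SW26010 consist of the 256 cores mentioned above plus 4 additional management cores.

```python
def total_cores(processors, cores_per_processor):
    """Total core count of a system built from identical processors."""
    return processors * cores_per_processor


PROCESSORS = 40_960        # SW26010 chips in Sunway TaihuLight (from the text)
COMPUTE_CORES = 256        # cores per chip as given in the text
MANAGEMENT_CORES = 4       # assumption: accounts for the "260-core" figure

if __name__ == "__main__":
    compute_only = total_cores(PROCESSORS, COMPUTE_CORES)
    all_cores = total_cores(PROCESSORS, COMPUTE_CORES + MANAGEMENT_CORES)
    print(f"compute cores only: {compute_only:,}")
    print(f"all cores:          {all_cores:,}")
```

Either way of counting gives a total slightly above ten million cores, consistent with the rounded figure in the list.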


See also

* Multicore
* Vector processor
* SIMD
* High-performance computing
* Computer cluster
* Multiprocessor system on a chip
* Vision processing unit
* Memory access pattern
* Cache coherency
* Embarrassingly parallel
* Massively parallel
* CUDA


External links


* Architecting solutions for the Manycore future, published on Feb 19, 2010 (more than one dead link in the slide)
* Eyeriss architecture