Heterogeneous computing

Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding more processors of the same type, but by adding dissimilar coprocessors, usually incorporating specialized processing capabilities to handle particular tasks.


Heterogeneity

Usually, heterogeneity in the context of computing refers to different instruction-set architectures (ISAs), where the main processor has one ISA and other processors have another, usually very different, architecture (possibly more than one), not just a different microarchitecture (floating-point number processing is a special case of this and is not usually referred to as heterogeneous). In the past, heterogeneous computing meant that different ISAs had to be handled differently. In a modern example, Heterogeneous System Architecture (HSA) systems hide this difference from the user while using multiple processor types (typically CPUs and GPUs), usually on the same integrated circuit, to provide the best of both worlds: the GPU performs general-purpose processing (beyond its well-known 3D graphics rendering capabilities, it can perform mathematically intensive computations on very large data sets), while the CPUs run the operating system and perform traditional serial tasks.

The level of heterogeneity in modern computing systems is gradually increasing as further scaling of fabrication technologies allows formerly discrete components to become integrated parts of a system-on-chip, or SoC. For example, many new processors now include built-in logic for interfacing with other devices (SATA, PCI, Ethernet, USB, RFID, radios, UARTs, and memory controllers), as well as programmable functional units and hardware accelerators (GPUs, cryptography co-processors, programmable network processors, A/V encoders/decoders, etc.).

Research has shown that a heterogeneous-ISA chip multiprocessor that exploits the diversity offered by multiple ISAs can outperform the best same-ISA homogeneous architecture by as much as 21%, with 23% energy savings and a 32% reduction in Energy Delay Product (EDP). AMD's 2014 announcement of pin-compatible ARM and x86 SoCs, codenamed Project Skybridge, suggested that a heterogeneous-ISA (ARM+x86) chip multiprocessor was in the making.
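
Energy Delay Product multiplies the energy a workload consumes by the time it takes to run, so a design that is both faster and more frugal improves EDP on both counts, which is why an EDP reduction can exceed the energy savings alone. A minimal sketch, assuming energy in joules and delay in seconds; the sample figures are hypothetical and are not the data behind the 21%, 23%, and 32% results above.

```python
def energy_delay_product(energy_j: float, delay_s: float) -> float:
    """EDP = energy (J) x execution time (s); lower is better."""
    return energy_j * delay_s


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fractional reduction of `candidate` relative to `baseline`."""
    return 1.0 - candidate / baseline


# Hypothetical measurements of one workload on two chip designs.
baseline_edp = energy_delay_product(energy_j=10.0, delay_s=2.0)   # 20.0 J*s
candidate_edp = energy_delay_product(energy_j=8.0, delay_s=1.5)   # 12.0 J*s

print(f"EDP reduction: {relative_reduction(baseline_edp, candidate_edp):.0%}")
```

Here a 20% energy cut combined with a 25% shorter run time compounds into a 40% EDP reduction (0.8 × 0.75 = 0.6).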


Heterogeneous CPU topology

A system with heterogeneous CPU topology is one in which all cores implement the same ISA but differ in speed. The setup is closer to a symmetric multiprocessor: although such systems are technically asymmetric multiprocessors, the cores do not differ in roles or device access. There are typically two types of cores: a higher-performance core, usually known as the "big" or P-core, and a more power-efficient core, usually known as the "small" or E-core. A common use of such a topology is to provide better power efficiency in mobile SoCs.
* ARM big.LITTLE (succeeded by DynamIQ) is the prototypical case, where faster high-power cores are combined with slower low-power cores.
* Apple has produced Apple silicon SoCs whose ARM cores follow a similar organization.
* Intel has also produced hybrid x86-64 processors codenamed Lakefield, although with major limitations in instruction set support. The newer Alder Lake reduces this sacrifice by adding more instruction set support to the "small" core.
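
Because big and small cores share an ISA, any task can run on either, and the scheduler's job becomes deciding where each task should go. The sketch below shows one simple policy, greedy earliest-finish-time assignment of independent tasks considered largest first, assuming each core's speed is a single fixed number and ignoring power entirely. The core names, speeds, and task sizes are hypothetical, and real operating system schedulers (for example, Linux's Energy Aware Scheduling) weigh many more factors.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    name: str
    speed: float            # hypothetical relative throughput (work units per time unit)
    ready_at: float = 0.0   # time at which the core becomes free
    tasks: list = field(default_factory=list)


def schedule(work_items: dict[str, float], cores: list[Core]) -> float:
    """Greedy earliest-finish-time assignment of independent tasks.

    Tasks are considered largest first (the LPT heuristic); each goes to the
    core on which it would finish soonest. Returns the makespan.
    """
    for task, work in sorted(work_items.items(), key=lambda kv: kv[1], reverse=True):
        best = min(cores, key=lambda c: c.ready_at + work / c.speed)
        best.ready_at += work / best.speed
        best.tasks.append(task)
    return max(c.ready_at for c in cores)


# One "big" core twice as fast as each of two "small" cores (hypothetical).
cores = [Core("P0", speed=2.0), Core("E0", speed=1.0), Core("E1", speed=1.0)]
tasks = {"render": 8.0, "decode": 4.0, "sync": 2.0, "log": 1.0}
makespan = schedule(tasks, cores)
for c in cores:
    print(c.name, c.tasks, c.ready_at)
print("makespan:", makespan)
```

With these numbers the single fast core takes the largest task while the two slow cores absorb the rest, so all three finish within 4 time units.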


Challenges

Heterogeneous computing systems present new challenges not found in typical homogeneous systems. The presence of multiple processing elements raises all of the issues involved with homogeneous parallel processing systems, while the level of heterogeneity in the system can introduce non-uniformity in system development, programming practices, and overall system capability. Areas of heterogeneity can include:
; ISA or instruction-set architecture
: Compute elements may have different instruction set architectures, leading to binary incompatibility.
; ABI or application binary interface
: Compute elements may interpret memory in different ways. This may include endianness, calling conventions, and memory layout, and depends on both the architecture and the compiler being used.
; API or application programming interface
: Library and OS services may not be uniformly available to all compute elements.
; Low-Level Implementation of Language Features
: Language features such as functions and threads are often implemented using function pointers, a mechanism that requires additional translation or abstraction when used in heterogeneous environments.
; Memory Interface and Hierarchy
: Compute elements may have different cache structures and cache coherency protocols, and memory access may be uniform (UMA) or non-uniform (NUMA). Differences can also be found in the ability to read arbitrary data lengths, as some processors/units can only perform byte-, word-, or burst accesses.
; Interconnect
: Compute elements may have differing types of interconnect aside from basic memory/bus interfaces. This may include dedicated network interfaces, direct memory access (DMA) devices, mailboxes, FIFOs, and scratchpad memories. Furthermore, certain portions of a heterogeneous system may be cache-coherent, whereas others may require explicit software involvement to maintain consistency and coherency.
; Performance
: A heterogeneous system may have CPUs that are identical in terms of architecture but have underlying microarchitectural differences that lead to different levels of performance and power consumption. Asymmetries in capabilities, paired with opaque programming models and operating system abstractions, can sometimes lead to performance predictability problems, especially with mixed workloads.
; Data Partitioning
: While partitioning data on homogeneous platforms is often trivial, the problem has been shown to be NP-complete for the general heterogeneous case. For small numbers of partitions, optimal partitionings that perfectly balance load and minimize communication volume have been shown to exist.
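
To see why a shared ABI matters, consider two compute elements that exchange a 32-bit integer through shared memory. A minimal sketch using Python's standard `struct` module, assuming the producer writes little-endian and a misconfigured consumer reads big-endian; the value 0x12345678 is arbitrary.

```python
import struct

# Four bytes written to shared memory by a little-endian producer (e.g. an x86 host).
raw = struct.pack("<I", 0x12345678)

as_little = struct.unpack("<I", raw)[0]   # consumer agrees on byte order
as_big = struct.unpack(">I", raw)[0]      # consumer wrongly assumes big-endian

print(hex(as_little))  # 0x12345678
print(hex(as_big))     # 0x78563412
```

The usual remedy is for both sides to agree on an explicit byte order for shared data, as the `<` and `>` format prefixes do here, rather than relying on each element's native order.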
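
A raw function pointer is an address in one compute element's code, so it means nothing on an element with a different ISA or address space. One common abstraction is to pass a portable name and resolve it on the target, similar in spirit to how OpenCL's `clCreateKernel` looks up a kernel by name in a compiled program. A minimal sketch; the registry, device names, kernel name, and `remote_call` helper are all hypothetical.

```python
from typing import Callable, Sequence

Kernel = Callable[[Sequence[float]], float]

# Hypothetical per-device registries: each compute element holds its own build
# of the same logical function, so an address valid on one is useless on another.
REGISTRY: dict[str, dict[str, Kernel]] = {
    "host_cpu": {"sum_squares": lambda xs: sum(x * x for x in xs)},
    "accelerator": {"sum_squares": lambda xs: float(sum(x * x for x in xs))},
}


def remote_call(device: str, kernel_name: str, args: Sequence[float]) -> float:
    """Ship a portable kernel name, not a pointer, and resolve it on the target."""
    try:
        kernel = REGISTRY[device][kernel_name]
    except KeyError:
        raise LookupError(f"no {kernel_name!r} kernel for device {device!r}") from None
    return kernel(args)


print(remote_call("accelerator", "sum_squares", [1.0, 2.0, 3.0]))  # 14.0
```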
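
Where a unit can only perform aligned word accesses, software on that unit must build arbitrary-length reads out of whole words. A minimal sketch, assuming a hypothetical little-endian memory port (`WordOnlyMemory`) that supports only aligned 32-bit reads; real hardware would expose such a restriction through its bus or driver, not a Python class.

```python
WORD_BYTES = 4  # assumed word size of the hypothetical word-only unit


class WordOnlyMemory:
    """Hypothetical memory port that only supports aligned 32-bit word reads."""

    def __init__(self, data: bytes):
        # Pad to a whole number of words so every aligned read stays in range.
        pad = (-len(data)) % WORD_BYTES
        self._data = data + bytes(pad)

    def read_word(self, addr: int) -> int:
        if addr % WORD_BYTES:
            raise ValueError(f"unaligned word access at {addr:#x}")
        return int.from_bytes(self._data[addr:addr + WORD_BYTES], "little")


def read_bytes(mem: WordOnlyMemory, addr: int, length: int) -> bytes:
    """Read an arbitrary byte range using only aligned word reads."""
    out = bytearray()
    first = addr - addr % WORD_BYTES  # round down to the enclosing word
    for word_addr in range(first, addr + length, WORD_BYTES):
        out += mem.read_word(word_addr).to_bytes(WORD_BYTES, "little")
    offset = addr - first
    return bytes(out[offset:offset + length])


mem = WordOnlyMemory(b"heterogeneous")
print(read_bytes(mem, 3, 6))  # b'erogen'
```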
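
Although the general partitioning problem is NP-complete, the simplest special case, splitting one array of independent, equally costly items across processors whose speeds are known and constant, has a straightforward heuristic: give each processor a share proportional to its speed. A minimal sketch under those assumptions; it ignores communication volume and is not one of the optimal partitioning methods mentioned above. The speed ratios are hypothetical.

```python
def partition(n_items: int, speeds: list[float]) -> list[int]:
    """Split n_items into chunk sizes proportional to processor speed.

    Uses largest-remainder rounding so the sizes sum exactly to n_items.
    """
    total = sum(speeds)
    exact = [n_items * s / total for s in speeds]
    sizes = [int(x) for x in exact]  # floor of each ideal share
    leftover = n_items - sum(sizes)
    # Hand the remaining items to the processors with the largest fractional parts.
    by_remainder = sorted(range(len(speeds)), key=lambda i: exact[i] - sizes[i], reverse=True)
    for i in by_remainder[:leftover]:
        sizes[i] += 1
    return sizes


# Hypothetical platform: one GPU about 6x as fast as each of two CPU cores.
print(partition(1000, [6.0, 1.0, 1.0]))  # [750, 125, 125]
```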


Example hardware

Heterogeneous computing hardware can be found in every domain of computing, from high-end servers and high-performance computing machines all the way down to low-power embedded devices including mobile phones and tablets.
* High Performance Computing
** Cydra-5 (Numeric coprocessor)
** Cray XD1 (FPGA)
** SRC Computers SRC-6 and SRC-7 (FPGA)
* Embedded Systems (DSP and Mobile Platforms)
** Texas Instruments OMAP (Media coprocessor)
** Analog Devices Blackfin (DSP and media coprocessors)
** Qualcomm Snapdragon (GPU, DSP, image, sometimes AI coprocessor; Modem, Sensors)
** Nvidia Tegra (GPU; Modem, Sensors)
** Samsung Exynos (GPU; Modem, Sensors)
** Apple "A" series (CPU, GPU; Modem)
** Movidius Myriad Vision processing units, which include several symmetric processors complemented by fixed-function units and a pair of SPARC-based controllers
** HiSilicon Kirin SoCs (GPU; Modem, Sensors)
** MediaTek SoCs (GPU; Modem, Sensors)
** Cadence Design Systems Tensilica DSPs
* Reconfigurable Computing
** Xilinx field-programmable gate arrays (FPGA; e.g., Virtex-II Pro, Virtex-4 FX, Virtex-5 FXT) and Zynq and Versal platforms
** Intel "Stellarton" (Atom + Altera FPGA)
* Networking
** Intel IXP Network Processors
** Netronome NFP Network Processors
* General Purpose Computing, Gaming, and Entertainment Devices
** Intel Sandy Bridge, Ivy Bridge, and Haswell CPUs (Integrated GPU, OpenCL-capable since Ivy Bridge)
** AMD Excavator and Ryzen APUs (Integrated GPU, OpenCL-capable)
** IBM Cell, found in the PlayStation 3 (Vector coprocessor)
*** SpursEngine, a variant of the IBM Cell processor
** Emotion Engine, found in the PlayStation 2 (Vector and media coprocessors)
** ARM big.LITTLE/DynamIQ CPU architecture (heterogeneous topology)
*** Nearly all ARM vendors offer heterogeneous solutions: ARM, Qualcomm, Nvidia, Apple, Samsung, HiSilicon, MediaTek, etc.


See also

* GPGPU
* MPSoC
* big.LITTLE/DynamIQ

