Heterogenous System Architecture
   HOME

TheInfoList



OR:

Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allow for the integration of
central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, an ...
s and
graphics processors A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobil ...
on the same bus, with shared
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
and tasks. The HSA is being developed by the
HSA Foundation The HSA Foundation is a not-for-profit engineering organization of industry and academia that works on the development of the Heterogeneous System Architecture (HSA), a set of royalty-free computer hardware specifications, as well as open source so ...
, which includes (among many others)
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets. While it initially manufactur ...
and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other
compute device OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-pr ...
s, and make these various devices more compatible from a programmer's perspective, relieving the programmer of the task of planning the moving of data between devices' disjoint memories (as must currently be done with
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
or
CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ca ...
). CUDA and OpenCL as well as most other fairly advanced programming languages can use HSA to increase their execution performance. Heterogeneous computing is widely used in system-on-chip devices such as tablets,
smartphone A smartphone is a portable computer device that combines mobile telephone and computing functions into one unit. They are distinguished from feature phones by their stronger hardware capabilities and extensive mobile operating systems, whic ...
s, other mobile devices, and
video game console A video game console is an electronic device that Input/output, outputs a video signal or image to display a video game that can be played with a game controller. These may be home video game console, home consoles, which are generally placed i ...
s. HSA allows programs to use the graphics processor for
floating point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can be ...
calculations without separate memory or scheduling.


Rationale

The rationale behind HSA is to ease the burden on programmers when offloading calculations to the GPU. Originally driven solely by AMD and called the FSA, the idea was extended to encompass processing units other than GPUs, such as other manufacturers' DSPs, as well. Modern GPUs are very well suited to perform
single instruction, multiple data Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
(SIMD) and single instruction, multiple threads (SIMT), while modern CPUs are still being optimized for branching. etc.


Overview

Originally introduced by
embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' as ...
s such as the
Cell Broadband Engine Cell is a multi-core microprocessor microarchitecture that combines a general-purpose PowerPC core of modest performance with streamlined coprocessing elements which greatly accelerate multimedia and vector processing applications, as well as ma ...
, sharing system memory directly between multiple system actors makes heterogeneous computing more mainstream. Heterogeneous computing itself refers to systems that contain multiple processing units
central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, an ...
s (CPUs),
graphics processing unit A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobi ...
s (GPUs),
digital signal processor A digital signal processor (DSP) is a specialized microprocessor chip, with its architecture optimized for the operational needs of digital signal processing. DSPs are fabricated on MOS integrated circuit chips. They are widely used in audio si ...
s (DSPs), or any type of
application-specific integrated circuit An application-specific integrated circuit (ASIC ) is an integrated circuit (IC) chip customized for a particular use, rather than intended for general-purpose use, such as a chip designed to run in a digital voice recorder or a high-efficie ...
s (ASICs). The system architecture allows any accelerator, for instance a
graphics processor A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobil ...
, to operate at the same processing level as the system's CPU. Among its main features, HSA defines a unified virtual address space for compute devices: where GPUs traditionally have their own memory, separate from the main (CPU) memory, HSA requires these devices to share
page tables A page table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses. Virtual addresses are used by the program executed by the accessing process, ...
so that devices can exchange data by sharing pointers. This is to be supported by custom memory management units. To render interoperability possible and also to ease various aspects of programming, HSA is intended to be
ISA Isa or ISA may refer to: Places * Isa, Amur Oblast, Russia * Isa, Kagoshima, Japan * Isa, Nigeria * Isa District, Kagoshima, former district in Japan * Isa Town, middle class town located in Bahrain * Mount Isa, Queensland, Australia * Mount Is ...
-agnostic for both CPUs and accelerators, and to support high-level programming languages. So far, the HSA specifications cover:


HSA Intermediate Layer

HSAIL (Heterogeneous System Architecture Intermediate Language), a virtual instruction set for parallel programs * similar to
LLVM Intermediate Representation LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate represen ...
and SPIR (used by
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
and Vulkan) * finalized to a specific instruction set by a
JIT compiler In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is a way of executing computer code that involves compilation during execution of a program (at run time) rather than before execution. This may cons ...
* make late decisions on which core(s) should run a task * explicitly parallel * supports exceptions, virtual functions and other high-level features * debugging support


HSA memory model

* compatible with
C++11 C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions by ...
, OpenCL,
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
and .NET memory models * relaxed consistency * designed to support both managed languages (e.g. Java) and unmanaged languages (e.g. C) * will make it much easier to develop 3rd-party compilers for a wide range of heterogeneous products programmed in Fortran, C++,
C++ AMP C, or c, is the third letter in the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is ''cee'' (pronounced ), plural ''cees''. History "C" ...
, Java, et al.


HSA dispatcher and run-time

* designed to enable heterogeneous task queueing: a work queue per core, distribution of work into queues, load balancing by work stealing * any core can schedule work for any other, including itself * significant reduction of overhead of scheduling work for a core Mobile devices are one of the HSA's application areas, in which it yields improved power efficiency.


Block diagrams

The illustrations below compare CPU-GPU coordination under HSA versus under traditional architectures.


Software support

Some of the HSA-specific features implemented in the hardware need to be supported by the
operating system kernel The kernel is a computer program at the core of a computer's operating system and generally has complete control over everything in the system. It is the portion of the operating system code that is always resident in memory and facilitates in ...
and specific device drivers. For example, support for AMD
Radeon Radeon () is a brand of computer products, including graphics processing units, random-access memory, RAM disk software, and solid-state drives, produced by Radeon Technologies Group, a division of AMD. The brand was launched in 2000 by ATI Tech ...
and
AMD FirePro AMD FirePro was AMD's brand of graphics cards designed for use in workstations and servers running professional Computer-aided design (CAD), Computer-generated imagery (CGI), Digital content creation (DCC), and High-performance computing/GPGP ...
graphics cards, and
APUs Apus is a small constellation in the Southern Celestial Hemisphere, southern sky. It represents a bird-of-paradise, and its name means "without feet" in Greek language, Greek because the bird-of-paradise was once wrongly believed to lack feet. ...
based on Graphics Core Next (GCN), was merged into version 3.19 of the Linux kernel mainline, released on 8 February 2015. Programs do not interact directly with , but queue their jobs utilizing the HSA runtime. This very first implementation, known as , focuses on "Kaveri" or "Berlin" APUs and works alongside the existing Radeon kernel graphics driver. Additionally, supports ''heterogeneous queuing'' (HQ), which aims to simplify the distribution of computational jobs among multiple CPUs and GPUs from the programmer's perspective. Support for ''heterogeneous memory management'' (''HMM''), suited only for graphics hardware featuring version 2 of the AMD's IOMMU, was accepted into the Linux kernel mainline version 4.14. Integrated support for HSA platforms has been announced for the "Sumatra" release of OpenJDK, due in 2015.
AMD APP SDK AMD APP SDK is a software development kit by AMD for "Accelerated Parallel Processing" (APP). AMD APP SDK also targets Heterogeneous System Architecture (not only GPU). AMD APP SDK was available for 32-bit and 64-bit versions of Microsoft Windows ...
is AMD's proprietary software development kit targeting parallel computing, available for Microsoft Windows and Linux. Bolt is a C++ template library optimized for heterogeneous computing. GPUOpen comprehends a couple of other software tools related to HSA.
CodeXL CodeXL (formerly AMD CodeXL) was an open-source software development tool suite which included a GPU debugger, a GPU profiler, a CPU profiler, Graphics frame analyzer and a static shader/kernel analyzer. CodeXL was mainly developed by AMD. With ...
version 2.0 includes an HSA profiler.


Hardware support


AMD

, only AMD's "Kaveri" A-series APUs (cf. "Kaveri" desktop processors and "Kaveri" mobile processors) and Sony's
PlayStation 4 The PlayStation 4 (PS4) is a home video game console developed by Sony Interactive Entertainment. Announced as the successor to the PlayStation 3 in February 2013, it was launched on November 15, 2013, in North America, November 29, 2013 in ...
allowed the
integrated GPU A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobil ...
to access memory via version 2 of the AMD's IOMMU. Earlier APUs (Trinity and Richland) included the version 2 IOMMU functionality, but only for use by an external GPU connected via PCI Express. Post-2015 Carrizo and Bristol Ridge APUs also include the version 2 IOMMU functionality for the integrated GPU.


ARM

ARM's Bifrost microarchitecture, as implemented in the Mali-G71, is fully compliant with the HSA 1.1 hardware specifications. , ARM has not announced software support that would use this hardware feature.


See also

*
General-purpose computing on graphics processing units General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditiona ...
(GPGPU) * Non-Uniform Memory Access (NUMA) * OpenMP * Shared memory *
Zero-copy "Zero-copy" describes computer operations in which the CPU does not perform the task of copying data from one memory area to another or in which unnecessary data copies are avoided. This is frequently used to save CPU cycles and memory bandwid ...


References


External links

* by Vinod Tipparaju at SC13 in November 2013
HSA and the software ecosystem

2012 – HSA by Michael Houston
{{Use dmy dates, date=July 2019 Heterogeneous computing