Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allow for the integration of central processing units and graphics processors on the same bus, with shared memory and tasks. The HSA is being developed by the HSA Foundation, which includes (among many others) AMD and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other compute devices, and to make these various devices more compatible from a programmer's perspective, relieving the programmer of the task of planning the movement of data between the devices' disjoint memories (as must currently be done with OpenCL or CUDA).
CUDA and OpenCL, as well as most other sufficiently advanced programming languages, can use HSA to increase their execution performance.
Heterogeneous computing is widely used in system-on-chip devices such as tablets, smartphones, other mobile devices, and video game consoles.
HSA allows programs to use the graphics processor for floating-point calculations without separate memory or scheduling.
Rationale
The rationale behind HSA is to ease the burden on programmers when offloading calculations to the GPU. Originally driven solely by AMD and called the FSA (Fusion System Architecture), the idea was extended to encompass processing units other than GPUs, such as other manufacturers' DSPs.
Modern GPUs are very well suited to single instruction, multiple data (SIMD) and single instruction, multiple threads (SIMT) workloads, while modern CPUs are still optimized for branching.
Overview
Sharing system memory directly between multiple system actors, an approach originally introduced by embedded systems such as the Cell Broadband Engine, makes heterogeneous computing more mainstream. Heterogeneous computing itself refers to systems that contain multiple processing units: central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), or any type of application-specific integrated circuits (ASICs). The system architecture allows any accelerator, for instance a graphics processor, to operate at the same processing level as the system's CPU.
Among its main features, HSA defines a unified virtual address space for compute devices: whereas GPUs traditionally have their own memory, separate from the main (CPU) memory, HSA requires these devices to share page tables so that devices can exchange data by sharing pointers. This is to be supported by custom memory management units.
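The difference between the two programming models can be illustrated with a minimal Python sketch. It is only a host-side simulation under stated assumptions: a worker thread stands in for the accelerator, and the hypothetical helpers `offload_with_copies` and `offload_shared` are not part of any HSA API. In the discrete-memory model the data is staged into a separate buffer and copied back; in the shared-address-space model the accelerator simply receives a reference (the Python analogue of a pointer) to the caller's buffer.

```python
import threading

def square_in_place(buf):
    """Example kernel: squares every element of the buffer it is given."""
    for i, x in enumerate(buf):
        buf[i] = x * x

def offload_with_copies(host_data, kernel):
    """Hypothetical discrete-memory model: stage data in a separate buffer,
    run the kernel on that copy, then copy the results back.

    Returns the number of explicit staging copies performed.
    """
    device_buf = list(host_data)   # host-to-device copy
    worker = threading.Thread(target=kernel, args=(device_buf,))
    worker.start()
    worker.join()
    host_data[:] = device_buf      # device-to-host copy
    return 2

def offload_shared(host_data, kernel):
    """Hypothetical shared-address-space model: the worker receives a
    reference to the caller's own buffer, so no staging copies are needed.

    Returns the number of explicit staging copies performed (none).
    """
    worker = threading.Thread(target=kernel, args=(host_data,))
    worker.start()
    worker.join()
    return 0
```

Both calls leave the caller's list holding `[1, 4, 9]` when given `[1, 2, 3]`; only the first needs two staging copies to get there, which is the overhead HSA aims to remove at the hardware level through shared page tables.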
To make interoperability possible and to ease various aspects of programming, HSA is intended to be agnostic to the instruction set architecture (ISA) of both CPUs and accelerators, and to support high-level programming languages.
So far, the HSA specifications cover:
HSA Intermediate Layer
HSAIL (Heterogeneous System Architecture Intermediate Language), a virtual instruction set for parallel programs
* similar to LLVM Intermediate Representation and SPIR (used by OpenCL and Vulkan)
* finalized to a specific instruction set by a JIT compiler
* allows late decisions on which core(s) should run a task
* explicitly parallel
* supports exceptions, virtual functions and other high-level features
* debugging support
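The idea of a portable intermediate program that is finalized for a specific target only at dispatch time can be sketched as follows. This is a toy model, not HSAIL: the two-opcode "virtual instruction set", the `finalize_scalar` and `finalize_vector` lowerings, and the target names `"cpu"` and `"gpu"` are all hypothetical stand-ins for a real JIT finalizer.

```python
# Hypothetical portable program: a list of (opcode, operand) "virtual instructions".
PROGRAM = [("mul", 3), ("add", 1)]

_OPS = {"mul": lambda x, k: x * k, "add": lambda x, k: x + k}

def _validate(program):
    """Reject opcodes the finalizers do not know how to lower."""
    for op, _ in program:
        if op not in _OPS:
            raise ValueError(f"unknown opcode {op!r}")

def finalize_scalar(program):
    """Lower the program to a per-element loop (stand-in for a CPU target)."""
    _validate(program)
    steps = [(_OPS[op], k) for op, k in program]

    def kernel(data):
        out = []
        for x in data:
            for fn, k in steps:
                x = fn(x, k)
            out.append(x)
        return out
    return kernel

def finalize_vector(program):
    """Lower the program to whole-array passes (stand-in for a wide SIMD/SIMT target)."""
    _validate(program)

    def kernel(data):
        values = list(data)
        for op, k in program:
            fn = _OPS[op]
            values = [fn(x, k) for x in values]  # one pass over all elements per instruction
        return values
    return kernel

FINALIZERS = {"cpu": finalize_scalar, "gpu": finalize_vector}

def dispatch(program, data, target):
    """Choose the target late (at dispatch time), finalize, then run."""
    return FINALIZERS[target](program)(data)
```

The same `PROGRAM` applied to `[1, 2, 3]` yields `[4, 7, 10]` whichever target is chosen; only the lowering differs, which is what lets the choice of core be deferred until the task is dispatched.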
HSA memory model
* compatible with C++11, OpenCL, Java and .NET memory models
* relaxed consistency
* designed to support both managed languages (e.g. Java) and unmanaged languages (e.g. C)
* intended to make it much easier to develop third-party compilers for a wide range of heterogeneous products programmed in Fortran, C++, C++ AMP, Java, et al.
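Relaxed-consistency models such as the one in C++11 guarantee ordering between agents only through explicit synchronization, typically a release operation by the producer paired with an acquire operation by the consumer. The following Python sketch shows that message-passing pattern; it assumes `threading.Event` as a stand-in for a release/acquire flag, and the helper `publish_and_consume` is hypothetical, not an HSA interface.

```python
import threading

def publish_and_consume(payload):
    """Message passing: the producer writes data, then publishes a flag;
    the consumer waits for the flag, then reads the data."""
    shared = {}
    ready = threading.Event()    # stand-in for a release/acquire synchronization flag
    received = []

    def producer():
        shared["data"] = payload           # ordinary (non-synchronizing) write
        ready.set()                        # plays the role of a store-release

    def consumer():
        ready.wait()                       # plays the role of a load-acquire
        received.append(shared["data"])    # the producer's write is now visible

    consumer_thread = threading.Thread(target=consumer)
    producer_thread = threading.Thread(target=producer)
    consumer_thread.start()
    producer_thread.start()
    producer_thread.join()
    consumer_thread.join()
    return received[0]
```

Without the flag, a relaxed model gives the consumer no guarantee of observing the payload write in order; it is the pairing of the publishing and waiting operations that establishes the ordering.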
HSA dispatcher and run-time
* designed to enable heterogeneous task queueing: a work queue per core, distribution of work into queues, load balancing by work stealing
* any core can schedule work for any other, including itself
* significantly reduced overhead for scheduling work on a core
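Per-core work queues with work stealing can be sketched as a deterministic, single-threaded simulation. The `Scheduler` class and its round-robin `run` loop are hypothetical illustrations of the technique, not the HSA runtime: each worker takes its newest local task first and, when its own queue is empty, steals the oldest task from another worker's queue.

```python
from collections import deque

class Scheduler:
    """Hypothetical per-worker queues with work stealing (deterministic simulation)."""

    def __init__(self, num_workers):
        self.queues = [deque() for _ in range(num_workers)]

    def submit(self, worker, task):
        """Any worker, or the host, may enqueue a task on any worker's queue."""
        self.queues[worker].append(task)

    def _next_task(self, worker):
        own = self.queues[worker]
        if own:
            return own.pop()              # newest local task first (LIFO)
        for victim, queue in enumerate(self.queues):
            if victim != worker and queue:
                return queue.popleft()    # steal the oldest task from a busy worker
        return None                       # nothing left anywhere

    def run(self):
        """Let workers take turns until every queue is empty.

        Returns a dict mapping each worker index to the results it produced.
        """
        done = {w: [] for w in range(len(self.queues))}
        while any(self.queues):
            for w in range(len(self.queues)):
                task = self._next_task(w)
                if task is not None:
                    done[w].append(task())
        return done
```

Submitting nine tasks to worker 0 alone ends with each of three workers having run three of them: worker 0 works from the newest end of its own queue while workers 1 and 2 steal from the oldest end. Taking from opposite ends is a common design choice because it keeps the owner and the thieves from contending for the same tasks.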
Mobile devices are one of HSA's application areas, where it yields improved power efficiency.
Block diagrams
The illustrations below compare CPU-GPU coordination under HSA versus under traditional architectures.
Software support
Some of the HSA-specific features implemented in the hardware need to be supported by the operating system kernel and specific device drivers. For example, support for AMD Radeon and AMD FirePro graphics cards, and APUs based on Graphics Core Next (GCN), was merged into version 3.19 of the Linux kernel mainline, released on 8 February 2015.
Programs do not interact directly with the kernel driver, but instead queue their jobs through the HSA runtime. This very first implementation focuses on "Kaveri" or "Berlin" APUs and works alongside the existing Radeon kernel graphics driver.
This implementation additionally supports ''heterogeneous queuing'' (HQ), which aims to simplify the distribution of computational jobs among multiple CPUs and GPUs from the programmer's perspective. Support for ''heterogeneous memory management'' (''HMM''), suited only for graphics hardware featuring version 2 of AMD's IOMMU, was accepted into the Linux kernel mainline version 4.14.
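Queuing a job can be pictured as a ring buffer that the application fills with job packets and that a packet processor drains. The sketch below assumes a single producer and a single consumer and uses a power-of-two size so that indices can be wrapped with a bit mask; the `UserModeQueue` class and its methods are hypothetical and do not reproduce the HSA runtime's actual queue API.

```python
class UserModeQueue:
    """Hypothetical single-producer/single-consumer ring buffer of job packets."""

    def __init__(self, size):
        if size <= 0 or size & (size - 1):
            raise ValueError("size must be a power of two")
        self.size = size
        self.slots = [None] * size
        self.write_index = 0   # advanced only by the producer (the application)
        self.read_index = 0    # advanced only by the consumer (the packet processor)

    def try_enqueue(self, packet):
        """Write the packet into the next free slot, then publish it by
        advancing write_index. Returns False if the queue is full."""
        if self.write_index - self.read_index == self.size:
            return False
        self.slots[self.write_index & (self.size - 1)] = packet
        self.write_index += 1
        # A real HSA-style queue would also notify the consumer here
        # (e.g. via a doorbell signal); omitted in this sketch.
        return True

    def drain(self):
        """Consumer side: run every published packet in order and collect results."""
        results = []
        while self.read_index < self.write_index:
            packet = self.slots[self.read_index & (self.size - 1)]
            results.append(packet())
            self.read_index += 1
        return results
```

Because both indices only ever increase, their difference gives the number of pending packets, and the mask maps them back onto the fixed set of slots; a queue of size 4 therefore accepts four packets, rejects a fifth, and reuses slot 0 after being drained.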
Integrated support for HSA platforms has been announced for the "Sumatra" release of OpenJDK, due in 2015.
AMD APP SDK is AMD's proprietary software development kit targeting parallel computing, available for Microsoft Windows and Linux. Bolt is a C++ template library optimized for heterogeneous computing.
GPUOpen includes a couple of other software tools related to HSA.
CodeXL version 2.0 includes an HSA profiler.
Hardware support
AMD
At the time of writing, only AMD's "Kaveri" A-series APUs (cf. "Kaveri" desktop processors and "Kaveri" mobile processors) and Sony's PlayStation 4 allowed the integrated GPU to access memory via version 2 of AMD's IOMMU. Earlier APUs (Trinity and Richland) included the version 2 IOMMU functionality, but only for use by an external GPU connected via PCI Express.
Post-2015 Carrizo and Bristol Ridge APUs also include the version 2 IOMMU functionality for the integrated GPU.
ARM
ARM's Bifrost microarchitecture, as implemented in the Mali-G71, is fully compliant with the HSA 1.1 hardware specifications. At the time of writing, ARM has not announced software support that would use this hardware feature.
See also
* General-purpose computing on graphics processing units (GPGPU)
* Non-Uniform Memory Access (NUMA)
* OpenMP
* Shared memory
* Zero-copy
External links
* by Vinod Tipparaju at SC13 in November 2013
* HSA and the software ecosystem
* 2012 – HSA by Michael Houston