Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allow for the integration of

central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, a ...

s and

graphics processors A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobil ...

on the same bus, with shared

memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remember ...

and tasks. The HSA is being developed by the

HSA Foundation The HSA Foundation is a not-for-profit engineering organization of industry and academia that works on the development of the Heterogeneous System Architecture (HSA), a set of royalty-free computer hardware specifications, as well as open source so ...

, which includes (among many others) AMD and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other compute devices, and make these various devices more compatible from a programmer's perspective, relieving the programmer of the task of planning the moving of data between devices' disjoint memories (as must currently be done with

OpenCL OpenCL (Open Computing Language) is a software framework, framework for writing programs that execute across heterogeneous computing, heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), d ...

CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ...

). CUDA and OpenCL as well as most other fairly advanced programming languages can use HSA to increase their execution performance.

Heterogeneous computing Heterogeneous computing refers to systems that use more than one kind of processor or cores. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incor ...

is widely used in

system-on-chip A system on a chip or system-on-chip (SoC ; pl. ''SoCs'' ) is an integrated circuit that integrates most or all components of a computer or other electronic system. These components almost always include a central processing unit (CPU), memor ...

devices such as tablets,

smartphone A smartphone is a portable computer device that combines mobile telephone and computing functions into one unit. They are distinguished from feature phones by their stronger hardware capabilities and extensive mobile operating systems, whi ...

s, other mobile devices, and

video game console A video game console is an electronic device that outputs a video signal or image to display a video game that can be played with a game controller. These may be home consoles, which are generally placed in a permanent location connected to ...

s. HSA allows programs to use the graphics processor for

floating point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can ...

calculations without separate memory or scheduling.

Rationale

The rationale behind HSA is to ease the burden on programmers when offloading calculations to the GPU. Originally driven solely by AMD and called the FSA, the idea was extended to encompass processing units other than GPUs, such as other manufacturers' DSPs, as well. Modern GPUs are very well suited to perform

single instruction, multiple data Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...

(SIMD) and

single instruction, multiple threads Single instruction, multiple threads (SIMT) is an execution model used in parallel computing where single instruction, multiple data (SIMD) is combined with multithreading. It is different from SPMD in that all instructions in all "threads" are exe ...

(SIMT), while modern CPUs are still being optimized for branching. etc.

Overview

Originally introduced by

embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded ...

s such as the Cell Broadband Engine, sharing system memory directly between multiple system actors makes heterogeneous computing more mainstream. Heterogeneous computing itself refers to systems that contain multiple processing units

s (CPUs),

graphics processing unit A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, m ...

s (GPUs),

digital signal processor A digital signal processor (DSP) is a specialized microprocessor chip, with its architecture optimized for the operational needs of digital signal processing. DSPs are fabricated on MOS integrated circuit chips. They are widely used in audio s ...

s (DSPs), or any type of

application-specific integrated circuit An application-specific integrated circuit (ASIC ) is an integrated circuit (IC) chip customized for a particular use, rather than intended for general-purpose use, such as a chip designed to run in a digital voice recorder or a high-effici ...

s (ASICs). The system architecture allows any accelerator, for instance a graphics processor, to operate at the same processing level as the system's CPU. Among its main features, HSA defines a unified

virtual address space In computing, a virtual address space (VAS) or address space is the set of ranges of virtual addresses that an operating system makes available to a process. The range of virtual addresses usually starts at a low address and can extend to the hig ...

for compute devices: where GPUs traditionally have their own memory, separate from the main (CPU) memory, HSA requires these devices to share page tables so that devices can exchange data by sharing pointers. This is to be supported by custom

memory management unit A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a computer hardware unit having all memory references passed through itself, primarily performing the translation of virtual memory addresses to physical a ...

s. To render interoperability possible and also to ease various aspects of programming, HSA is intended to be

ISA Isa or ISA may refer to: Places * Isa, Amur Oblast, Russia * Isa, Kagoshima, Japan * Isa, Nigeria * Isa District, Kagoshima, former district in Japan * Isa Town, middle class town located in Bahrain * Mount Isa, Queensland, Australia * Mount ...

-agnostic for both CPUs and accelerators, and to support high-level programming languages. So far, the HSA specifications cover:

HSA Intermediate Layer

HSAIL (Heterogeneous System Architecture Intermediate Language), a virtual instruction set for parallel programs * similar to LLVM Intermediate Representation and SPIR (used by

and Vulkan) * finalized to a specific instruction set by a JIT compiler * make late decisions on which core(s) should run a task * explicitly parallel * supports exceptions, virtual functions and other high-level features * debugging support

HSA memory model

* compatible with

C++11 C11, C.XI, C-11 or C.11 may refer to: Transport * C-11 Fleetster, a 1920s American light transport aircraft for use of the United States Assistant Secretary of War * Fokker C.XI, a 1935 Dutch reconnaissance seaplane * LET C-11, a license-build ...

, OpenCL,

Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...

and .NET memory models * relaxed consistency * designed to support both managed languages (e.g. Java) and unmanaged languages (e.g. C) * will make it much easier to develop 3rd-party compilers for a wide range of heterogeneous products programmed in Fortran, C++,

C++ AMP C, or c, is the third letter in the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is ''cee'' (pronounced ), plural ''cees''. History "C" ...

, Java, et al.

HSA dispatcher and run-time

* designed to enable heterogeneous task queueing: a work queue per core, distribution of work into queues, load balancing by work stealing * any core can schedule work for any other, including itself * significant reduction of overhead of scheduling work for a core Mobile devices are one of the HSA's application areas, in which it yields improved power efficiency.

Block diagrams

The illustrations below compare CPU-GPU coordination under HSA versus under traditional architectures.

Software support

Some of the HSA-specific features implemented in the hardware need to be supported by the

operating system kernel The kernel is a computer program at the core of a computer's operating system and generally has complete control over everything in the system. It is the portion of the operating system code that is always resident in memory and facilitates ...

and specific device drivers. For example, support for AMD

Radeon Radeon () is a brand of computer products, including graphics processing units, random-access memory, RAM disk software, and solid-state drives, produced by Radeon Technologies Group, a division of AMD. The brand was launched in 2000 by ATI Tech ...

and AMD FirePro graphics cards, and

APUs Apus is a small constellation in the southern sky. It represents a bird-of-paradise, and its name means "without feet" in Greek because the bird-of-paradise was once wrongly believed to lack feet. First depicted on a celestial globe by Pet ...

based on

Graphics Core Next Graphics Core Next (GCN) is the codename for a series of microarchitectures and an instruction set architecture that were developed by AMD for its GPUs as the successor to its TeraScale microarchitecture. The first product featuring GCN was la ...

(GCN), was merged into version 3.19 of the

Linux kernel mainline The Linux kernel is a free and open-source, monolithic, modular, multitasking, Unix-like operating system kernel. It was originally authored in 1991 by Linus Torvalds for his i386-based PC, and it was soon adopted as the kernel for the GNU o ...

, released on 8 February 2015. Programs do not interact directly with , but queue their jobs utilizing the HSA runtime. This very first implementation, known as , focuses on "Kaveri" or "Berlin" APUs and works alongside the existing Radeon kernel graphics driver. Additionally, supports ''heterogeneous queuing'' (HQ), which aims to simplify the distribution of computational jobs among multiple CPUs and GPUs from the programmer's perspective. Support for ''heterogeneous memory management'' (''HMM''), suited only for graphics hardware featuring version 2 of the AMD's IOMMU, was accepted into the Linux kernel mainline version 4.14. Integrated support for HSA platforms has been announced for the "Sumatra" release of

OpenJDK OpenJDK (Open Java Development Kit) is a free and open-source implementation of the Java Platform, Standard Edition (Java SE). It is the result of an effort Sun Microsystems began in 2006. The implementation is licensed under the GPL-2.0-only w ...

, due in 2015. AMD APP SDK is AMD's proprietary software development kit targeting parallel computing, available for Microsoft Windows and Linux. Bolt is a C++ template library optimized for heterogeneous computing. GPUOpen comprehends a couple of other software tools related to HSA.

CodeXL CodeXL (formerly AMD CodeXL) was an open-source software development tool suite which included a GPU debugger, a GPU profiler, a CPU profiler, Graphics frame analyzer and a static shader/kernel analyzer. CodeXL was mainly developed by AMD. Wit ...

version 2.0 includes an HSA profiler.

Hardware support

AMD

, only AMD's "Kaveri" A-series APUs (cf. "Kaveri" desktop processors and "Kaveri" mobile processors) and Sony's

PlayStation 4 The PlayStation 4 (PS4) is a home video game console developed by Sony Interactive Entertainment. Announced as the successor to the PlayStation 3 in February 2013, it was launched on November 15, 2013, in North America, November 29, 2013 i ...

allowed the

integrated GPU A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, ...

to access memory via version 2 of the AMD's IOMMU. Earlier APUs (Trinity and Richland) included the version 2 IOMMU functionality, but only for use by an external GPU connected via PCI Express. Post-2015 Carrizo and Bristol Ridge APUs also include the version 2 IOMMU functionality for the integrated GPU.

ARM

ARM's Bifrost microarchitecture, as implemented in the Mali-G71, is fully compliant with the HSA 1.1 hardware specifications. , ARM has not announced software support that would use this hardware feature.

References

External links

* by Vinod Tipparaju at SC13 in November 2013
HSA and the software ecosystem

2012 – HSA by Michael Houston
{{Use dmy dates, date=July 2019 Heterogeneous computing