Heterogeneous System Architecture
   HOME

TheInfoList



OR:

Heterogeneous System Architecture (HSA) is a cross-vendor set of specifications that allow for the integration of
central processing unit A central processing unit (CPU), also called a central processor, main processor, or just processor, is the primary Processor (computing), processor in a given computer. Its electronic circuitry executes Instruction (computing), instructions ...
s and graphics processors on the same bus, with shared
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembe ...
and tasks. The HSA is being developed by the
HSA Foundation The HSA Foundation is a not-for-profit engineering organization of industry and academia that works on the development of the Heterogeneous System Architecture (HSA), a set of royalty-free computer hardware specifications, as well as open-source so ...
, which includes (among many others)
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational corporation and technology company headquartered in Santa Clara, California and maintains significant operations in Austin, Texas. AMD is a hardware and fabless company that de ...
and ARM. The platform's stated aim is to reduce communication latency between CPUs, GPUs and other compute devices, and make these various devices more compatible from a programmer's perspective, relieving the programmer of the task of planning the moving of data between devices' disjoint memories (as must currently be done with
OpenCL OpenCL (Open Computing Language) is a software framework, framework for writing programs that execute across heterogeneous computing, heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), di ...
or
CUDA In computing, CUDA (Compute Unified Device Architecture) is a proprietary parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for accelerated gene ...
). CUDA and OpenCL as well as most other fairly advanced programming languages can use HSA to increase their execution performance.
Heterogeneous computing Heterogeneous computing refers to systems that use more than one kind of processor or core. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incor ...
is widely used in
system-on-chip A system on a chip (SoC) is an integrated circuit that combines most or all key components of a computer or electronic system onto a single microchip. Typically, an SoC includes a central processing unit (CPU) with memory, input/output, and da ...
devices such as tablets,
smartphone A smartphone is a mobile phone with advanced computing capabilities. It typically has a touchscreen interface, allowing users to access a wide range of applications and services, such as web browsing, email, and social media, as well as multi ...
s, other mobile devices, and
video game console A video game console is an electronic device that Input/output, outputs a video signal or image to display a video game that can typically be played with a game controller. These may be home video game console, home consoles, which are generally ...
s. HSA allows programs to use the graphics processor for
floating point In computing, floating-point arithmetic (FP) is arithmetic on subsets of real numbers formed by a ''significand'' (a signed sequence of a fixed number of digits in some base) multiplied by an integer power of that base. Numbers of this form ...
calculations without separate memory or scheduling.


Rationale

The rationale behind HSA is to ease the burden on programmers when offloading calculations to the GPU. Originally driven solely by AMD and called the FSA, the idea was extended to encompass processing units other than GPUs, such as other manufacturers' DSPs, as well. Modern GPUs are very well suited to perform
single instruction, multiple data Single instruction, multiple data (SIMD) is a type of parallel computer, parallel processing in Flynn's taxonomy. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneousl ...
(SIMD) and
single instruction, multiple threads Single instruction, multiple threads (SIMT) is an execution model used in parallel computing where single instruction, multiple data (SIMD) is combined with zero-overhead multithreading, i.e. multithreading where the hardware is capable of switch ...
(SIMT), while modern CPUs are still being optimized for branching. etc.


Overview

Originally introduced by
embedded system An embedded system is a specialized computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is e ...
s such as the Cell Broadband Engine, sharing system memory directly between multiple system actors makes heterogeneous computing more mainstream. Heterogeneous computing itself refers to systems that contain multiple processing units
central processing unit A central processing unit (CPU), also called a central processor, main processor, or just processor, is the primary Processor (computing), processor in a given computer. Its electronic circuitry executes Instruction (computing), instructions ...
s (CPUs),
graphics processing unit A graphics processing unit (GPU) is a specialized electronic circuit designed for digital image processing and to accelerate computer graphics, being present either as a discrete video card or embedded on motherboards, mobile phones, personal ...
s (GPUs),
digital signal processor A digital signal processor (DSP) is a specialized microprocessor chip, with its architecture optimized for the operational needs of digital signal processing. DSPs are fabricated on metal–oxide–semiconductor (MOS) integrated circuit chips. ...
s (DSPs), or any type of
application-specific integrated circuit An application-specific integrated circuit (ASIC ) is an integrated circuit (IC) chip customized for a particular use, rather than intended for general-purpose use, such as a chip designed to run in a digital voice recorder or a high-efficienc ...
s (ASICs). The system architecture allows any accelerator, for instance a graphics processor, to operate at the same processing level as the system's CPU. Among its main features, HSA defines a unified
virtual address space In computing, a virtual address space (VAS) or address space is the set of ranges of virtual addresses that an operating system makes available to a process. The range of virtual addresses usually starts at a low address and can extend to the h ...
for compute devices: where GPUs traditionally have their own memory, separate from the main (CPU) memory, HSA requires these devices to share page tables so that devices can exchange data by sharing
pointers Pointer may refer to: People with the name * Pointer (surname), a surname (including a list of people with the name) * Pointer Williams (born 1974), American former basketball player Arts, entertainment, and media * ''Pointer'' (journal), the ...
. This is to be supported by custom
memory management unit A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a computer hardware unit that examines all references to computer memory, memory, and translates the memory addresses being referenced, known as virtual mem ...
s. To render interoperability possible and also to ease various aspects of programming, HSA is intended to be ISA-agnostic for both CPUs and accelerators, and to support high-level programming languages. So far, the HSA specifications cover:


HSA Intermediate Layer

HSAIL (Heterogeneous System Architecture Intermediate Language), a virtual instruction set for parallel programs * similar to
LLVM Intermediate Representation LLVM, also called LLVM Core, is a target-independent optimizer and code generator. It can be used to develop a frontend for any programming language and a backend for any instruction set architecture. LLVM is designed around a language-indepen ...
and SPIR (used by
OpenCL OpenCL (Open Computing Language) is a software framework, framework for writing programs that execute across heterogeneous computing, heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), di ...
and
Vulkan Vulkan is a cross-platform API and open standard for 3D graphics and computing. It was intended to address the shortcomings of OpenGL, and allow developers more control over the GPU. It is designed to support a wide variety of GPUs, CPUs and o ...
) * finalized to a specific instruction set by a
JIT compiler In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is compiler, compilation (of Source code, computer code) during execution of a program (at run time (program lifecycle phase), run time) rather than b ...
* make late decisions on which core(s) should run a task * explicitly parallel * supports exceptions, virtual functions and other high-level features * debugging support


HSA memory model

* compatible with
C++11 C++11 is a version of a joint technical standard, ISO/IEC 14882, by the International Organization for Standardization (ISO) and International Electrotechnical Commission (IEC), for the C++ programming language. C++11 replaced the prior vers ...
, OpenCL,
Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
and
.NET The .NET platform (pronounced as "''dot net"'') is a free and open-source, managed code, managed computer software framework for Microsoft Windows, Windows, Linux, and macOS operating systems. The project is mainly developed by Microsoft emplo ...
memory models * relaxed consistency * designed to support both managed languages (e.g. Java) and unmanaged languages (e.g. C) * will make it much easier to develop 3rd-party compilers for a wide range of heterogeneous products programmed in Fortran, C++,
C++ AMP C, or c, is the third letter of the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is ''cee'' (pronounced ), plural ''cees''. History "C ...
, Java, et al.


HSA dispatcher and run-time

* designed to enable heterogeneous task queueing: a work queue per core, distribution of work into queues, load balancing by work stealing * any core can schedule work for any other, including itself * significant reduction of overhead of scheduling work for a core Mobile devices are one of the HSA's application areas, in which it yields improved power efficiency.


Block diagrams

The illustrations below compare CPU-GPU coordination under HSA versus under traditional architectures.


Software support

Some of the HSA-specific features implemented in the hardware need to be supported by the
operating system kernel A kernel is a computer program at the core of a computer's operating system that always has complete control over everything in the system. The kernel is also responsible for preventing and mitigating conflicts between different processes. It is ...
and specific device drivers. For example, support for AMD
Radeon Radeon () is a brand of computer products, including graphics processing units, random-access memory, RAM disk software, and solid-state drives, produced by Radeon Technologies Group, a division of AMD. The brand was launched in 2000 by ATI Tech ...
and AMD FirePro graphics cards, and
APUs Apus is a small constellation in the Southern Celestial Hemisphere, southern sky. It represents a bird-of-paradise, and its name means "without feet" in Greek language, Greek because the bird-of-paradise was once wrongly believed to lack feet. ...
based on
Graphics Core Next Graphics Core Next (GCN) is the codename for a series of microarchitectures and an instruction set architecture that were developed by AMD for its GPUs as the successor to its TeraScale microarchitecture. The first product featuring GCN was lau ...
(GCN), was merged into version 3.19 of the Linux kernel mainline, released on 8 February 2015. Programs do not interact directly with , but queue their jobs utilizing the HSA runtime. This very first implementation, known as , focuses on "Kaveri" or "Berlin" APUs and works alongside the existing Radeon kernel graphics driver. Additionally, supports ''heterogeneous queuing'' (HQ), which aims to simplify the distribution of computational jobs among multiple CPUs and GPUs from the programmer's perspective. Support for ''heterogeneous memory management'' (''HMM''), suited only for graphics hardware featuring version 2 of the AMD's IOMMU, was accepted into the Linux kernel mainline version 4.14. Integrated support for HSA platforms has been announced for the "Sumatra" release of
OpenJDK OpenJDK (Open Java Development Kit) is a free and open-source implementation of the Java Platform, Standard Edition (Java SE). It is the result of an effort Sun Microsystems began in 2006, four years before the company was acquired by Oracle Corp ...
, due in 2015. AMD APP SDK is AMD's proprietary software development kit targeting
parallel computing Parallel computing is a type of computing, computation in which many calculations or Process (computing), processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. ...
, available for Microsoft Windows and Linux. Bolt is a C++ template library optimized for heterogeneous computing. GPUOpen comprehends a couple of other software tools related to HSA. CodeXL version 2.0 includes an HSA profiler.


Hardware support


AMD

, only AMD's "Kaveri" A-series APUs (cf. "Kaveri" desktop processors and "Kaveri" mobile processors) and Sony's
PlayStation 4 The PlayStation 4 (PS4) is a home video game console developed by Sony Interactive Entertainment. Announced as the successor to the PlayStation 3 in February 2013, it was launched on November 15, 2013, in North America, November 29, 2013, in ...
allowed the integrated GPU to access memory via version 2 of the AMD's IOMMU. Earlier APUs (Trinity and Richland) included the version 2 IOMMU functionality, but only for use by an external GPU connected via PCI Express. Post-2015 Carrizo and Bristol Ridge APUs also include the version 2 IOMMU functionality for the integrated GPU.


ARM

ARM's Bifrost microarchitecture, as implemented in the Mali-G71, is fully compliant with the HSA 1.1 hardware specifications. , ARM has not announced software support that would use this hardware feature.


See also

*
General-purpose computing on graphics processing units General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditional ...
(GPGPU) *
Non-Uniform Memory Access Non-uniform memory access (NUMA) is a computer storage, computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory ...
(NUMA) *
OpenMP OpenMP is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating systems, including Solaris, ...
* Shared memory *
Zero-copy In computer science, zero-copy refers to techniques that enable data transfer between memory spaces without requiring the CPU to copy the data. By avoiding redundant copying, zero-copy methods minimize CPU usage and memory bandwidth, leading ...


References


External links

* by Vinod Tipparaju at SC13 in November 2013
HSA and the software ecosystem

2012 – HSA by Michael Houston
{{Use dmy dates, date=July 2019 Heterogeneous computing