ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), and heterogeneous computing. It offers several programming models: HIP (GPU-kernel-based programming), OpenMP / Message Passing Interface (MPI) (directive-based programming), and OpenCL. ROCm is free, libre and open-source software (except the GPU firmware blobs), and it is distributed under various licenses.

Background

The first GPGPU software stack from ATI/AMD was Close to Metal, which became Stream. ROCm was launched around 2016 with the Boltzmann Initiative. The ROCm stack builds upon previous AMD GPU stacks: some tools trace back to GPUOpen, others to the Heterogeneous System Architecture (HSA).

Heterogeneous System Architecture

HSA was aimed at producing a middle-level, hardware-agnostic intermediate representation that could be JIT-compiled to the eventual hardware (GPU, FPGA...) using the appropriate finalizer. This approach was dropped for ROCm: it now builds only GPU code, using LLVM and its upstreamed AMDGPU backend, although there is still research on such enhanced modularity with LLVM MLIR.

Microsoft AMP C++ 1.2

Programming abilities

ROCm as a stack ranges from the kernel driver to the end-user applications. AMD has introductory videos about AMD GCN hardware and ROCm programming on its learning portal. One of the best technical introductions to the stack and to ROCm/HIP programming is, to date, still to be found on Reddit.

High-level programming

HIP programming

HIP(HCC) kernel language

Memory allocation

NUMA

Heterogeneous Memory Model and Shared Virtual Memory

ROCm code objects

Compute/Graphics interop

Low-level programming

Hardware support

ROCm is primarily targeted at discrete professional GPUs, but unofficial support includes Vega-family and RDNA2 consumer GPUs. AMD Accelerated Processing Units (APUs) are "enabled", but not officially supported. Getting ROCm to work on them is involved.

Professional-grade GPUs

AMD Instinct accelerators are the first-class ROCm citizens, alongside the prosumer Radeon Pro GPU series: they mostly see full support. The only consumer-grade GPU that has relatively equal support is, as of January 2022, the Radeon VII (GCN 5 - Vega).

Consumer-grade GPUs

Software ecosystem

Learning resources

An AMD ROCm product manager gave a tour of the stack.

Third-party integration

The main consumers of the stack are machine learning and high-performance computing/GPGPU applications.

Machine learning

Various deep learning frameworks have a ROCm backend:
* PyTorch
* TensorFlow
* ONNX
* MXNet
* CuPy
* MIOpen
* Caffe
* IREE (which uses LLVM Multi-Level Intermediate Representation (MLIR))

Supercomputing

ROCm is gaining significant traction in the TOP500. ROCm is used with the exascale supercomputers El Capitan and Frontier. Some related software is to be found at AMD Infinity Hub.

Other acceleration & graphics interoperation

As of version 3.0, Blender can use HIP compute kernels for its renderer Cycles.

Other Languages

Julia

Julia has the AMDGPU.jl package, which integrates with LLVM and selects components of the ROCm stack. Instead of compiling code through HIP, AMDGPU.jl uses Julia's compiler to generate LLVM IR directly, which is later consumed by LLVM to generate native device code. AMDGPU.jl uses ROCr's HSA implementation to upload native code onto the device and execute it, similar to how HIP loads its own generated device code. AMDGPU.jl also supports integration with ROCm's rocBLAS (for BLAS), rocRAND (for random number generation), and rocFFT (for FFTs). Future integration with rocALUTION, rocSOLVER, MIOpen, and certain other ROCm libraries is planned.

Software distribution

Official

ROCm software is currently spread across dozens of public GitHub repositories. Within the main public meta-repository, there is an XML manifest for each official release: using git-repo, a version control tool built on top of git, is the recommended way to synchronize with the stack locally. The release of ROCm 5.1 is imminent, probably mid-February, given a minor release each month. AMD has started distributing containerized applications for ROCm, notably scientific research applications gathered under AMD Infinity Hub. AMD itself distributes packages tailored to various Linux distributions.

Third-party

There is a growing third-party ecosystem packaging ROCm. Linux distributions are officially (natively) packaging ROCm, with various degrees of advancement: Arch, Gentoo, Debian, Fedora, GNU Guix, and NixOS. There are also Spack packages.

Components

There is one kernel-space component, ROCk, and the rest (there are roughly a hundred components in the stack) is made of user-space modules. The unofficial typographic policy is to use uppercase ROC followed by a lowercase suffix for low-level libraries, e.g. ROCt, and the contrary for user-facing libraries, e.g. rocBLAS. AMD is actively developing with the LLVM community, but upstreaming is not instantaneous and, as of January 2022, is still lagging. AMD still officially packages various LLVM forks for parts that are not yet upstreamed: compiler optimizations destined to remain proprietary, debug support, OpenMP offloading...
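The unofficial typographic policy described above can be illustrated with a small sketch. The function name `classify_component` and the three labels it returns are hypothetical, chosen only for illustration; since the convention is unofficial, real component names may not always follow it.

```python
def classify_component(name: str) -> str:
    """Guess a ROCm component's layer from the unofficial casing convention.

    Hypothetical helper for illustration only; it is not part of ROCm.
    """
    # Uppercase "ROC" followed by a lowercase suffix, e.g. ROCt, ROCr, ROCk.
    if name.startswith("ROC") and name[3:].islower():
        return "low-level"
    # Lowercase "roc" followed by an uppercase letter, e.g. rocBLAS, rocFFT.
    if name.startswith("roc") and name[3:4].isupper():
        return "user-facing"
    return "unknown"


for component in ("ROCt", "ROCr", "rocBLAS", "rocFFT"):
    print(component, "->", classify_component(component))
```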

Low-level

ROCk - Kernel driver

ROCm - Device libraries

Support libraries implemented as LLVM bitcode. These provide various utilities and functions for math operations, atomics, queries for launch parameters, on-device kernel launch, etc.

ROCt - Thunk

The thunk is responsible for all the thinking and queuing that goes into the stack.

ROCr - Runtime

The ROC runtime is a separate component from the ROC Common Language Runtime: the two are not the same thing.

ROCm - CompilerSupport

The ROCm code object manager is in charge of interacting with the LLVM intermediate representation.

Mid-level

ROCclr Common Language Runtime

The common language runtime is an indirection layer adapting calls to ROCr on Linux and PAL on Windows. It used to be able to route between different compilers, such as the HSAIL compiler. It is now being absorbed by the upper indirection layers (HIP, OpenCL).

OpenCL

ROCm ships its Installable Client Driver (ICD) loader and an OpenCL implementation bundled together. As of January 2022, ROCm 4.5.2 ships OpenCL 2.2, and is lagging behind the competition.

HIP
Heterogeneous Interface for Portability

The AMD implementation for its GPUs is called HIPAMD. There is also a CPU implementation, mostly for demonstration purposes.

HIPCC

HIP builds a `HIPCC` compiler that either wraps Clang and compiles with the open LLVM AMDGPU backend, or redirects to the NVIDIA compiler.
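The two-way dispatch described above can be sketched as follows. This is a simplified illustration, not the actual `hipcc` logic: it assumes the choice is driven only by the `HIP_PLATFORM` environment variable (with the values `amd` and `nvidia`), and the function name `select_backend_compiler`, the default of `amd`, and the bare driver names returned are assumptions of the sketch.

```python
import os


def select_backend_compiler(env=None) -> str:
    """Return the compiler driver a hipcc-like wrapper would delegate to.

    Hypothetical helper: the real wrapper also handles detection, paths
    and flags, none of which is modelled here.
    """
    env = os.environ if env is None else env
    platform = env.get("HIP_PLATFORM", "amd")  # default is an assumption
    if platform == "amd":
        return "clang++"  # Clang front end with the LLVM AMDGPU backend
    if platform == "nvidia":
        return "nvcc"  # redirect to the NVIDIA compiler
    raise ValueError(f"unsupported HIP_PLATFORM: {platform!r}")


print(select_backend_compiler({"HIP_PLATFORM": "amd"}))
print(select_backend_compiler({"HIP_PLATFORM": "nvidia"}))
```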

HIPIFY

HIPIFY is a source-to-source compiling tool: it translates CUDA to HIP and reverse, either using a Clang-based tool or a sed-like Perl script.
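The sed-like approach can be illustrated with a minimal text-substitution sketch in the CUDA-to-HIP direction. It is not the HIPIFY Perl script: the mapping table below is a small hand-picked subset of well-known runtime names, and the function name `hipify_text` is hypothetical. A purely textual rewrite like this has no knowledge of syntax, which is presumably why a Clang-based variant also exists.

```python
import re

# Illustrative subset of CUDA -> HIP renames; the real tools cover far more.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaError_t": "hipError_t",
    "cudaSuccess": "hipSuccess",
}

# Match only whole identifiers, so names such as my_cudaFree are left alone.
_PATTERN = re.compile(
    r"(?<![A-Za-z0-9_])(?:"
    + "|".join(re.escape(name) for name in CUDA_TO_HIP)
    + r")(?![A-Za-z0-9_])"
)


def hipify_text(source: str) -> str:
    """Rewrite the known CUDA names in `source` to their HIP equivalents."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


cuda_snippet = (
    "#include <cuda_runtime.h>\n"
    "cudaMalloc(&d_buf, n);\n"
    "cudaMemcpy(d_buf, h_buf, n, cudaMemcpyHostToDevice);\n"
    "cudaFree(d_buf);\n"
)
print(hipify_text(cuda_snippet))
```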

GPUFORT

Like HIPIFY, GPUFORT is a tool compiling source code into other third-generation-language sources, allowing users to migrate from CUDA Fortran to HIP Fortran. Even more so than HIPIFY, it belongs to the repertoire of research projects.

High-level

ROCm high-level libraries are usually consumed directly by application software, such as machine learning frameworks. Most of the following libraries are in the General Matrix Multiply (GEMM) category, which GPU architectures excel at. The majority of these user-facing libraries come in dual form: ''hip'' for the indirection layer that can route to Nvidia hardware, and ''roc'' for the AMD implementation.
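To make the GEMM category concrete, the sketch below computes the standard BLAS operation C ← αAB + βC in plain Python. It is a reference for what the operation means, not the rocBLAS or hipBLAS API, and it makes no attempt at performance. The function name `gemm` and the list-of-lists matrix layout are choices made for this illustration.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices given as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    shapes_ok = (
        all(len(row) == k for row in a)
        and all(len(row) == n for row in b)
        and len(c) == m
        and all(len(row) == n for row in c)
    )
    if not shapes_ok:
        raise ValueError("incompatible matrix shapes")
    # Each output element is a scaled dot product plus a scaled element of c.
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


result = gemm(2, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 3, [[1, 1], [1, 1]])
print(result)
```

Every output element is independent of the others, which is the kind of regular, data-parallel workload the paragraph above refers to.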

rocBLAS / hipBLAS

rocBLAS and hipBLAS are central among the high-level libraries: they are the AMD implementation of Basic Linear Algebra Subprograms (BLAS). rocBLAS uses the Tensile library privately.

rocSOLVER / hipSOLVER

This pair of libraries constitutes the LAPACK implementation for ROCm and is strongly coupled to rocBLAS.

Utilities

* ROCm developer tools: debugger, tracer, profiler, System Management Interface, validation suite, cluster management.
* GPUOpen tools: GPU analyzer, memory visualizer...
* External tools: radeontop (TUI overview)

Comparison with competitors

ROCm is a competitor to similar stacks aimed at GPU computing: Nvidia CUDA and Intel OneAPI.

Nvidia CUDA

Nvidia's stack is closed-source up to cuBLAS and similar high-level libraries. Nvidia vendors the Clang frontend and its Parallel Thread Execution (PTX) LLVM GPU backend as the Nvidia CUDA Compiler (NVCC). There is an open-source layer above it, for example RAPIDS.

Intel OneAPI

References

External links
