ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), and heterogeneous computing. It offers several programming models: HIP (GPU-kernel-based programming), OpenMP/Message Passing Interface (MPI) (directive-based programming), and OpenCL.
ROCm is free, libre and open-source software (except for the GPU firmware blobs) and is distributed under various licenses.
Background
The first GPGPU software stack from ATI/AMD was Close to Metal, which became Stream.
ROCm was launched around 2016 with the Boltzmann Initiative. The ROCm stack builds upon previous AMD GPU stacks; some tools trace back to GPUOpen, others to the Heterogeneous System Architecture (HSA).
Heterogeneous System Architecture
HSA was aimed at producing a middle-level, hardware-agnostic intermediate representation that could be JIT-compiled to the eventual hardware (GPU, FPGA...) using the appropriate finalizer. This approach was dropped for ROCm: it now builds only GPU code, using LLVM and its AMDGPU backend, which was upstreamed, although there is still research on such enhanced modularity with LLVM MLIR.
Microsoft AMP C++ 1.2
Programming abilities
ROCm as a stack ranges from the kernel driver to the end-user applications.
AMD provides introductory videos about AMD GCN hardware and ROCm programming through its learning portal.
One of the best technical introductions to the stack and to ROCm/HIP programming is, to date, still to be found on Reddit.
High-level programming
HIP programming
HIP(HCC) kernel language
Memory allocation
NUMA
Heterogeneous Memory Model and Shared Virtual Memory
ROCm code objects
Compute/Graphics interop
Low-level programming
Hardware support
ROCm is primarily targeted at discrete professional GPUs, but unofficial support includes Vega-family and RDNA2 consumer GPUs.
AMD Accelerated Processing Units (APUs) are "enabled", but not officially supported. Getting ROCm to work on them is involved.
Professional-grade GPUs
AMD Instinct accelerators are the first-class ROCm citizens, alongside the prosumer Radeon Pro GPU series: they mostly see full support.
The only consumer-grade GPU that has relatively equal support is, as of January 2022, the Radeon VII (GCN 5 - Vega).
Consumer-grade GPUs
Software ecosystem
Learning resources
AMD's ROCm product manager gave a tour of the stack.
Third-party integration
The main consumers of the stack are machine learning and high-performance computing/GPGPU applications.
Machine learning
Various deep learning frameworks have a ROCm backend:
* PyTorch
* TensorFlow
* ONNX
* MXNet
* CuPy
* MIOpen
* Caffe
* IREE (which uses LLVM Multi-Level Intermediate Representation (MLIR))
Supercomputing
ROCm is gaining significant traction in the TOP500 list.
ROCm is used with the exascale supercomputers El Capitan and Frontier.
Some related software is to be found at AMD Infinity Hub.
Other acceleration & graphics interoperation
As of version 3.0, Blender can use HIP compute kernels for its renderer, Cycles.
Other Languages
Julia
Julia has the AMDGPU.jl package, which integrates with LLVM and select components of the ROCm stack. Instead of compiling code through HIP, AMDGPU.jl uses Julia's compiler to generate LLVM IR directly, which is later consumed by LLVM to generate native device code. AMDGPU.jl uses ROCr's HSA implementation to upload native code onto the device and execute it, similar to how HIP loads its own generated device code.
AMDGPU.jl also supports integration with ROCm's rocBLAS (for BLAS), rocRAND (for random number generation), and rocFFT (for FFTs). Future integration with rocALUTION, rocSOLVER, MIOpen, and certain other ROCm libraries is planned.
Software distribution
Official
ROCm software is currently spread across dozens of public GitHub repositories. Within the main public meta-repository there is an XML manifest for each official release. Using git-repo, a version control tool built on top of Git, is the recommended way to synchronize with the stack locally.
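To make the role of the per-release manifest more concrete, the following minimal sketch reads a manifest in the git-repo XML format and lists the projects it pins. The manifest shown is hypothetical: the remote URL, revision and project names are placeholders chosen for illustration, not the contents of any real ROCm release.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest in the git-repo XML format. The remote, revision and
# project names are illustrative placeholders, not a real ROCm release.
EXAMPLE_MANIFEST = """
<manifest>
  <remote name="example-remote" fetch="https://example.invalid/org/" />
  <default revision="refs/tags/example-release" remote="example-remote" />
  <project name="component-a" />
  <project name="component-b" path="libs/component-b" />
</manifest>
"""


def list_projects(manifest_xml):
    """Return (name, checkout path, revision) for each <project> element."""
    root = ET.fromstring(manifest_xml.strip())
    default = root.find("default")
    default_revision = default.get("revision") if default is not None else None
    projects = []
    for project in root.findall("project"):
        name = project.get("name")
        # git-repo checks a project out under its name when no path is given.
        path = project.get("path", name)
        # A project without its own revision inherits the manifest default.
        revision = project.get("revision", default_revision)
        projects.append((name, path, revision))
    return projects


for entry in list_projects(EXAMPLE_MANIFEST):
    print(entry)
```

A manifest of this kind lets one tag in the meta-repository describe a consistent set of revisions across many separate repositories.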
The release of ROCm 5.1 is imminent, probably in mid-February, given the cadence of one minor release per month.
AMD has started distributing containerized applications for ROCm, notably scientific research applications gathered under AMD Infinity Hub.
AMD itself distributes packages tailored to various Linux distributions.
Third-party
There is a growing third-party ecosystem packaging ROCm.
Linux distributions are officially (natively) packaging ROCm, with various degrees of advancement: Arch, Gentoo, Debian, Fedora, GNU Guix, and NixOS.
There are also Spack packages.
Components
There is one kernel-space component, ROCk; the rest of the stack, roughly a hundred components, is made of user-space modules.
The unofficial typographic policy is to use an uppercase ROC followed by lowercase letters for low-level libraries (e.g. ROCt), and the reverse, a lowercase roc followed by uppercase letters, for user-facing libraries (e.g. rocBLAS).
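The naming convention can be expressed as a small check. This is only a sketch of the unofficial policy described above; the function name `classify_component` is hypothetical, and because the policy is informal, real component names may not always follow it.

```python
def classify_component(name):
    """Guess a component's layer from the unofficial ROC/roc naming policy."""
    suffix = name[3:]
    # Uppercase "ROC" followed by lowercase letters: low-level (e.g. ROCt).
    if name.startswith("ROC") and suffix and suffix.islower():
        return "low-level"
    # Lowercase "roc" followed by uppercase letters: user-facing (e.g. rocBLAS).
    if name.startswith("roc") and suffix and suffix.isupper():
        return "user-facing"
    # Anything else (hip-prefixed libraries, MIOpen, ...) is outside the rule.
    return "unknown"


for component in ("ROCt", "ROCr", "rocBLAS", "rocFFT", "MIOpen"):
    print(component, classify_component(component))
```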
AMD is actively developing with the LLVM community, but upstreaming is not instantaneous and, as of January 2022, is still lagging. AMD still officially packages various LLVM forks for parts that are not yet upstreamed: compiler optimizations destined to remain proprietary, debug support, OpenMP offloading...
Low-level
ROCk - Kernel driver
ROCm - Device libraries
Support libraries implemented as LLVM bitcode. These provide various utilities and functions for math operations, atomics, queries for launch parameters, on-device kernel launch, etc.
ROCt - Thunk
The thunk is responsible for all the thinking and queuing that goes into the stack.
ROCr - Runtime
The ROC runtime is different from the ROC Common Language Runtime in that it is not the same thing.
ROCm - CompilerSupport
The ROCm code object manager is in charge of interacting with LLVM intermediate representation.
Mid-level
ROCclr Common Language Runtime
The common language runtime is an indirection layer adapting calls to ROCr on Linux and PAL on Windows.
It used to be able to route between different compilers like the HSAIL-compiler. It is now being absorbed by the upper indirection layers (HIP, OpenCL).
OpenCL
ROCm ships its Installable Client Driver (ICD) loader and an OpenCL implementation bundled together.
As of January 2022, ROCm 4.5.2 ships OpenCL 2.2, and is lagging behind the competition.
HIP
Heterogeneous Interface for Portability
The AMD implementation for its GPUs is called HIPAMD. There is also a CPU implementation, mostly for demonstration purposes.
HIPCC
HIP builds a `HIPCC` compiler that either wraps Clang and compiles with LLVM's open AMDGPU backend, or redirects to the NVIDIA compiler.
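The wrap-or-redirect behaviour can be pictured as a simple dispatch. The sketch below is not how `HIPCC` is implemented; it assumes the choice is driven by a `HIP_PLATFORM` environment variable with the values `amd` and `nvidia`, and the helper names `select_compiler` and `build_command` are hypothetical. No real compiler flags are modelled.

```python
import os


def select_compiler(environ=None):
    """Pick the underlying compiler driver for the target platform.

    Assumption: the platform is read from HIP_PLATFORM ("amd" or "nvidia"),
    defaulting to "amd" when the variable is unset.
    """
    env = os.environ if environ is None else environ
    platform = env.get("HIP_PLATFORM", "amd").lower()
    if platform == "amd":
        return "clang++"  # Clang front end with the LLVM AMDGPU backend
    if platform == "nvidia":
        return "nvcc"  # redirect to the NVIDIA compiler
    raise ValueError(f"unsupported platform: {platform!r}")


def build_command(sources, environ=None):
    """Return the command line a wrapper would hand to the chosen compiler."""
    return [select_compiler(environ), *sources]


print(build_command(["saxpy.cpp"], {"HIP_PLATFORM": "nvidia"}))
```

Keeping the dispatch in the wrapper is what lets the same HIP source be built for either vendor without changes to the build scripts.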
HIPIFY
HIPIFY is a source-to-source compiling tool. It translates CUDA to HIP and the reverse, using either a Clang-based tool or a sed-like Perl script.
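The sed-like approach amounts to renaming identifiers. The following toy sketch illustrates the idea in Python with a handful of CUDA runtime names and their HIP counterparts. It is not the HIPIFY script itself: the table is deliberately tiny, and the function name `hipify_sketch` is hypothetical.

```python
import re

# A deliberately small rename table; the real tools cover far more of the API.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
    "cudaFree": "hipFree",
}

# Match whole identifiers only, trying longer names before their prefixes.
_PATTERN = re.compile(
    r"\b(?:"
    + "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
    + r")\b"
)


def hipify_sketch(source):
    """Rename the known CUDA identifiers in `source` to their HIP equivalents."""
    return _PATTERN.sub(lambda match: CUDA_TO_HIP[match.group(0)], source)


cuda_source = (
    "#include <cuda_runtime.h>\n"
    "cudaMemcpy(d, h, n, cudaMemcpyHostToDevice);"
)
print(hipify_sketch(cuda_source))
```

Inverting the dictionary gives the reverse direction. A purely textual rewrite like this cannot reason about types or macros, which is where a Clang-based translator has the advantage.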
GPUFORT
Like HIPIFY, GPUFORT is a tool that compiles source code into other third-generation-language sources, allowing users to migrate from CUDA Fortran to HIP Fortran. It is also in the repertoire of research projects, even more so.
High-level
ROCm high-level libraries are usually consumed directly by application software, such as machine learning frameworks. Most of the following libraries are in the General Matrix Multiply (GEMM) category, which GPU architecture excels at.
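For reference, GEMM in the BLAS sense computes C ← αAB + βC. The plain-Python sketch below is a CPU reference for that definition only, using lists of rows. It is not how any ROCm library implements the operation, and the function name `gemm` is chosen here for illustration.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices given as lists of rows."""
    m, k, n = len(a), len(b), len(b[0])
    shapes_ok = (
        all(len(row) == k for row in a)
        and all(len(row) == n for row in b)
        and len(c) == m
        and all(len(row) == n for row in c)
    )
    if not shapes_ok:
        raise ValueError("incompatible matrix shapes")
    # Every output element is an independent dot product, so all m * n of
    # them can be computed in parallel.
    return [
        [
            alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
            for j in range(n)
        ]
        for i in range(m)
    ]


result = gemm(2, [[1, 2], [3, 4]], [[5, 6], [7, 8]], 3, [[1, 1], [1, 1]])
print(result)
```

The independence of the output elements is what makes the operation a natural fit for the many parallel lanes of a GPU.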
The majority of these user-facing libraries come in dual form: hip for the indirection layer that can route to Nvidia hardware, and roc for the AMD implementation.
rocBLAS / hipBLAS
rocBLAS and hipBLAS are central among the high-level libraries: together they form the AMD implementation of Basic Linear Algebra Subprograms (BLAS). The implementation uses the Tensile library privately.
rocSOLVER / hipSOLVER
This pair of libraries constitutes the LAPACK implementation for ROCm and is strongly coupled to rocBLAS.
Utilities
* ROCm developer tools: debugger, tracer, profiler, System Management Interface, validation suite, cluster management.
* GPUOpen tools: GPU analyzer, memory visualizer...
* External tools: radeontop (TUI overview)
Comparison with competitors
ROCm is a competitor to similar stacks aimed at GPU computing: Nvidia CUDA and Intel OneAPI.
Nvidia CUDA
Nvidia's stack is closed-source up to cuBLAS and similar high-level libraries.
Nvidia vendors the Clang frontend and its Parallel Thread Execution (PTX) LLVM GPU backend as the Nvidia CUDA Compiler (NVCC).
There is an open-source layer above it, for example RAPIDS.
Intel OneAPI
See also
* AMD Software – a general overview of AMD's drivers, APIs, and development endeavors.
* GPUOpen – AMD's complementary graphics stack
* AMD Radeon Software – AMD's software distribution channel
References
External links
* Docker containers for scientific applications.