In computing, a compute kernel is a routine compiled for high-throughput accelerators (such as graphics processing units (GPUs), digital signal processors (DSPs) or field-programmable gate arrays (FPGAs)), separate from but used by a main program (typically running on a central processing unit). Compute kernels are sometimes called compute shaders, sharing execution units with vertex shaders and pixel shaders on GPUs, but they are not limited to execution on one class of device or to graphics APIs.

Description

Compute kernels roughly correspond to inner loops when implementing algorithms in traditional languages (except there is no implied sequential operation), or to code passed to internal iterators. They may be specified in a separate programming language such as "OpenCL C" (managed by the OpenCL API), as "compute shaders" written in a shading language (managed by a graphics API such as OpenGL), or embedded directly in application code written in a high-level language, as in the case of C++ AMP. Microsoft supports compute kernels through DirectCompute.
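The correspondence between an inner loop and a kernel can be sketched in plain Python. This is only an illustration under stated assumptions: `saxpy_loop`, `saxpy_kernel` and `launch` are hypothetical names, not part of OpenCL, OpenGL or any other API, and `launch` merely stands in for a device dispatch by calling the kernel once per index.

```python
# Hypothetical names throughout; nothing here belongs to a real compute API.

def saxpy_loop(a, x, y):
    """Traditional form: the loop body runs sequentially, index by index."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def saxpy_kernel(i, a, x, y, out):
    """Kernel form: only the loop body, parameterised by the invocation index."""
    out[i] = a * x[i] + y[i]


def launch(kernel, n, *args):
    """Stand-in for a dispatch: one invocation per index.

    The indices are visited in reverse to stress that a kernel must not
    rely on any particular execution order.
    """
    for i in reversed(range(n)):
        kernel(i, *args)


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
out = [0.0] * len(x)
launch(saxpy_kernel, len(x), 2.0, x, y, out)
```

Because each invocation touches only its own element of `out`, the kernel form gives the same result as the sequential loop regardless of the order in which the indices are visited.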

Vector processing

This programming paradigm maps well to vector processors: each invocation of a kernel within a batch is assumed to be independent, allowing for data-parallel execution. However, atomic operations may sometimes be used for synchronization between elements (for interdependent work). Individual invocations are given indices (in one or more dimensions) from which arbitrary addressing of buffer data may be performed (including scatter/gather operations), so long as the non-overlapping assumption is respected.
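Indexed invocations with computed addressing can be sketched as follows. This is a minimal illustration, not a real dispatch mechanism: `transpose_kernel` and `launch_2d` are hypothetical names, and the shuffled sequential loop only imitates the absence of a guaranteed execution order.

```python
import itertools
import random


def transpose_kernel(gid, src, dst, width, height):
    """One invocation per element, identified by a 2-D index (col, row)."""
    col, row = gid
    # Gather: read from an address computed from the invocation index.
    value = src[row * width + col]
    # Every invocation writes a distinct element, so writes never overlap.
    dst[col * height + row] = value


def launch_2d(kernel, width, height, *args, seed=0):
    """Hypothetical stand-in for a 2-D dispatch, run in shuffled order."""
    ids = list(itertools.product(range(width), range(height)))
    random.Random(seed).shuffle(ids)
    for gid in ids:
        kernel(gid, *args)


width, height = 3, 2
src = [1, 2, 3,
       4, 5, 6]          # 2 rows x 3 columns, row-major
dst = [0] * (width * height)
launch_2d(transpose_kernel, width, height, src, dst, width, height)
```

The result is the row-major layout of the transposed matrix whatever the seed, because no two invocations write to the same location. If several invocations had to update a shared element (for example, a running total), the independence assumption would no longer hold and an atomic operation would be needed.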

Vulkan API

The Vulkan API provides the intermediate SPIR-V representation to describe both graphical shaders and compute kernels in a language-independent and machine-independent manner. The intention is to facilitate language evolution and provide a more natural ability to leverage GPU compute capabilities, in line with hardware developments such as Unified Memory Architecture and Heterogeneous System Architecture. This allows closer cooperation between a CPU and GPU.

LLM Kernel Generation

Considerable work has been done on generating kernels with large language models (LLMs) as a means of optimizing code. KernelBench, created by the Scaling Intelligence Lab at Stanford, provides a framework to evaluate the ability of LLMs to generate efficient GPU kernels.

Cognition has created Kevin-32B (https://cognition.ai/blog/kevin-32b), a model for generating efficient CUDA kernels, which is currently the highest performing model on KernelBench.
