HOME

TheInfoList



OR:

General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a
graphics processing unit A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobi ...
(GPU), which typically handles computation only for
computer graphics Computer graphics deals with generating images with the aid of computers. Today, computer graphics is a core technology in digital photography, film, video games, cell phone and computer displays, and many specialized applications. A great de ...
, to perform computation in applications traditionally handled by the
central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, an ...
(CPU). The use of multiple
video card A graphics card (also called a video card, display card, graphics adapter, VGA card/VGA, video adapter, display adapter, or mistakenly GPU) is an expansion card which generates a feed of output images to a display device, such as a computer moni ...
s in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing. Essentially, a GPGPU
pipeline Pipeline may refer to: Electronics, computers and computing * Pipeline (computing), a chain of data-processing stages or a CPU optimization found on ** Instruction pipelining, a technique for implementing instruction-level parallelism within a s ...
is a kind of parallel processing between one or more GPUs and CPUs that analyzes data as if it were in image or other graphic form. While GPUs operate at lower frequencies, they typically have many times the number of cores. Thus, GPUs can process far more pictures and graphical data per second than a traditional CPU. Migrating data into graphical form and then using the GPU to scan and analyze it can create a large
speedup In computer architecture, speedup is a number that measures the relative performance of two systems processing the same problem. More technically, it is the improvement in speed of execution of a task executed on two similar architectures with d ...
. GPGPU pipelines were developed at the beginning of the 21st century for
graphics processing Computer graphics is a sub-field of computer science which studies methods for digitally synthesizing and manipulating visual content. Although the term often refers to the study of three-dimensional computer graphics, it also encompasses two-di ...
(e.g. for better
shader In computer graphics, a shader is a computer program that calculates the appropriate levels of light, darkness, and color during the rendering of a 3D scene - a process known as ''shading''. Shaders have evolved to perform a variety of spec ...
s). These pipelines were found to fit
scientific computing Computational science, also known as scientific computing or scientific computation (SC), is a field in mathematics that uses advanced computing capabilities to understand and solve complex problems. It is an area of science that spans many disc ...
needs well, and have since been developed in this direction.


History

In principle, any arbitrary
boolean function In mathematics, a Boolean function is a function whose arguments and result assume values from a two-element set (usually , or ). Alternative names are switching function, used especially in older computer science literature, and truth function ( ...
, including addition, multiplication, and other mathematical functions, can be built up from a
functionally complete In logic, a functionally complete set of logical connectives or Boolean operators is one which can be used to express all possible truth tables by combining members of the set into a Boolean expression.. ("Complete set of logical connectives").. (" ...
set of logic operators. In 1987,
Conway's Game of Life The Game of Life, also known simply as Life, is a cellular automaton devised by the British mathematician John Horton Conway in 1970. It is a zero-player game, meaning that its evolution is determined by its initial state, requiring no further ...
became one of the first examples of general-purpose computing using an early stream processor called a
blitter A blitter is a circuit, sometimes as a coprocessor or a logic block on a microprocessor, dedicated to the rapid movement and modification of data within a computer's memory. A blitter can copy large quantities of data from one memory area to anot ...
to invoke a special sequence of
logical operations In logic, a logical connective (also called a logical operator, sentential connective, or sentential operator) is a logical constant. They can be used to connect logical formulas. For instance in the syntax of propositional logic, the binary c ...
on bit vectors. General-purpose computing on GPUs became more practical and popular after about 2001, with the advent of both programmable
shader In computer graphics, a shader is a computer program that calculates the appropriate levels of light, darkness, and color during the rendering of a 3D scene - a process known as ''shading''. Shaders have evolved to perform a variety of spec ...
s and
floating point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can be ...
support on graphics processors. Notably, problems involving
matrices Matrix most commonly refers to: * ''The Matrix'' (franchise), an American media franchise ** ''The Matrix'', a 1999 science-fiction action film ** "The Matrix", a fictional setting, a virtual reality environment, within ''The Matrix'' (franchis ...
and/or
vector Vector most often refers to: *Euclidean vector, a quantity with a magnitude and a direction *Vector (epidemiology), an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematic ...
s especially two-, three-, or four-dimensional vectors were easy to translate to a GPU, which acts with native speed and support on those types. A significant milestone for GPGPU was the year 2003 when two research groups independently discovered GPU-based approaches for the solution of general linear algebra problems on GPUs that ran faster than on CPUs. These early efforts to use GPUs as general-purpose processors required reformulating computational problems in terms of graphics primitives, as supported by the two major APIs for graphics processors,
OpenGL OpenGL (Open Graphics Library) is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics. The API is typically used to interact with a graphics processing unit (GPU), to achieve hardwa ...
and
DirectX Microsoft DirectX is a collection of application programming interfaces (APIs) for handling tasks related to multimedia, especially game programming and video, on Microsoft platforms. Originally, the names of these APIs all began with "Direct", ...
. This cumbersome translation was obviated by the advent of general-purpose programming languages and APIs such as Sh/
RapidMind RapidMind Inc. was a privately held company founded and headquartered in Waterloo, Ontario, Canada, acquired by Intel in 2009. It provided a software product that aims to make it simpler for software developers to target multi-core processors an ...
,
Brook A brook is a small river or natural stream of fresh water. It may also refer to: Computing *Brook, a programming language for GPU programming based on C *Brook+, an explicit data-parallel C compiler *BrookGPU, a framework for GPGPU programming ...
and Accelerator. These were followed by Nvidia's
CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ca ...
, which allowed programmers to ignore the underlying graphical concepts in favor of more common
high-performance computing High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. Overview HPC integrates systems administration (including network and security knowledge) and parallel programming into a mult ...
concepts. Newer, hardware-vendor-independent offerings include Microsoft's
DirectCompute Microsoft DirectCompute is an application programming interface (API) that supports running compute kernels on general-purpose computing on graphics processing units on Microsoft's Windows Vista, Windows 7 and later versions. DirectCompute is part ...
and Apple/Khronos Group's
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
. This means that modern GPGPU pipelines can leverage the speed of a GPU without requiring full and explicit conversion of the data to a graphical form. Mark Harris, the founder of GPGPU.org, coined the term ''GPGPU''.


Implementations

Any language that allows the code running on the CPU to poll a GPU
shader In computer graphics, a shader is a computer program that calculates the appropriate levels of light, darkness, and color during the rendering of a 3D scene - a process known as ''shading''. Shaders have evolved to perform a variety of spec ...
for return values, can create a GPGPU framework.
Programming standards for parallel computing include
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
(vendor-independent),
OpenACC OpenACC (for ''open accelerators'') is a programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI. The standard is designed to simplify parallel programming of heterogeneous CPU/GPU systems. As in OpenMP, the programme ...
,
OpenMP OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating syste ...
and
OpenHMPP OpenHMPP (HMPP for Hybrid Multicore Parallel Programming) - programming standard for heterogeneous computing. Based on a set of compiler directives, standard is a programming model designed to handle hardware accelerators without the complexity a ...
. , OpenCL is the dominant open general-purpose GPU computing language, and is an open standard defined by the
Khronos Group The Khronos Group, Inc. is an open, non-profit, member-driven consortium of 170 organizations developing, publishing and maintaining royalty-free interoperability standards for 3D graphics, virtual reality, augmented reality, parallel computation ...
. OpenCL provides a
cross-platform In computing, cross-platform software (also called multi-platform software, platform-agnostic software, or platform-independent software) is computer software that is designed to work in several computing platforms. Some cross-platform software r ...
GPGPU platform that additionally supports data parallel compute on CPUs. OpenCL is actively supported on Intel, AMD, Nvidia, and ARM platforms. The Khronos Group has also standardised and implemented
SYCL SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators. It is a single-source embedded domain-specific language (eDSL) based on pure C++17. It is a standard developed by Khronos Group, anno ...
, a higher-level programming model for
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
as a single-source domain specific embedded language based on pure C++11. The dominant proprietary framework is
Nvidia Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ca ...
. Nvidia launched CUDA in 2006, a
software development kit A software development kit (SDK) is a collection of software development tools in one installable package. They facilitate the creation of applications by having a compiler, debugger and sometimes a software framework. They are normally specific to ...
(SDK) and
application programming interface An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how t ...
(API) that allows using the programming language C to code algorithms for execution on GeForce 8 series and later GPUs.
ROCm ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), heterogeneous ...
, launched in 2016, is AMD's open-source response to CUDA. It is, as of 2022, on par with CUDA with regards to features, and still lacking in consumer support. OpenVIDIA was developed at
University of Toronto The University of Toronto (UToronto or U of T) is a public research university in Toronto, Ontario, Canada, located on the grounds that surround Queen's Park. It was founded by royal charter in 1827 as King's College, the first institution ...
between 2003–2005, James Fung, Steve Mann, Chris Aimone,
OpenVIDIA: Parallel GPU Computer Vision
, Proceedings of the ACM Multimedia 2005, Singapore, 6–11 November 2005, pages 849–852
in collaboration with Nvidia. Altimesh Hybridizer created by Altimesh compiles
Common Intermediate Language Common Intermediate Language (CIL), formerly called Microsoft Intermediate Language (MSIL) or Intermediate Language (IL), is the intermediate language binary instruction set defined within the Common Language Infrastructure (CLI) specification. ...
to CUDA binaries. It supports generics and virtual functions. Debugging and profiling is integrated with
Visual Studio Visual Studio is an integrated development environment (IDE) from Microsoft. It is used to develop computer programs including web site, websites, web apps, web services and mobile apps. Visual Studio uses Microsoft software development platfor ...
and Nsight. It is available as a Visual Studio extension on Visual Studio Marketplace.
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washing ...
introduced the
DirectCompute Microsoft DirectCompute is an application programming interface (API) that supports running compute kernels on general-purpose computing on graphics processing units on Microsoft's Windows Vista, Windows 7 and later versions. DirectCompute is part ...
GPU computing API, released with the
DirectX Microsoft DirectX is a collection of application programming interfaces (APIs) for handling tasks related to multimedia, especially game programming and video, on Microsoft platforms. Originally, the names of these APIs all began with "Direct", ...
11 API. ' created by QuantAlea introduces native GPU computing capabilities for the Microsoft .NET language F# and C#. Alea GPU also provides a simplified GPU programming model based on GPU parallel-for and parallel aggregate using delegates and automatic memory management.
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation ...
supports GPGPU acceleration using the ''Parallel Computing Toolbox'' and ''MATLAB Distributed Computing Server'', and third-party packages like
Jacket A jacket is a garment for the upper body, usually extending below the hips. A jacket typically has sleeves, and fastens in the front or slightly on the side. A jacket is generally lighter, tighter-fitting, and less insulating than a coat, which ...
. GPGPU processing is also used to simulate
Newtonian physics Classical mechanics is a physical theory describing the motion of macroscopic objects, from projectiles to parts of machinery, and astronomical objects, such as spacecraft, planets, stars, and galaxies. For objects governed by classical mec ...
by
physics engine A physics engine is computer software that provides an approximate simulation of certain physical systems, such as rigid body dynamics (including collision detection), soft body dynamics, and fluid dynamics, of use in the domains of computer gr ...
s, and commercial implementations include Havok Physics, FX and
PhysX PhysX is an open-source realtime physics engine middleware SDK developed by Nvidia as a part of Nvidia GameWorks software suite. Initially, video games supporting PhysX were meant to be accelerated by PhysX PPU (expansion cards designed by Ag ...
, both of which are typically used for computer and
video game Video games, also known as computer games, are electronic games that involves interaction with a user interface or input device such as a joystick, controller, keyboard, or motion sensing device to generate visual feedback. This fee ...
s. C++ Accelerated Massive Parallelism (
C++ AMP C, or c, is the third letter in the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is ''cee'' (pronounced ), plural ''cees''. History "C" ...
) is a library that accelerates execution of
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
code by exploiting the data-parallel hardware on GPUs.


Mobile computers

Due to a trend of increasing power of mobile GPUs, general-purpose programming became available also on the mobile devices running major
mobile operating system A mobile operating system is an operating system for mobile phones, tablets, smartwatches, smartglasses, or other non-laptop personal mobile computing devices. While computers such as typical laptops are "mobile", the operating systems used on ...
s.
Google Google LLC () is an American multinational technology company focusing on search engine technology, online advertising, cloud computing, computer software, quantum computing, e-commerce, artificial intelligence, and consumer electronics. ...
Android 4.2 enabled running
RenderScript RenderScript is a component of the Android operating system for mobile devices that offers an API for acceleration that takes advantage of heterogeneous hardware. It allows developers to increase the performance of their applications at the cost ...
code on the mobile device GPU.
Apple An apple is an edible fruit produced by an apple tree (''Malus domestica''). Apple fruit tree, trees are agriculture, cultivated worldwide and are the most widely grown species in the genus ''Malus''. The tree originated in Central Asia, wh ...
introduced the proprietary
Metal A metal (from Greek μέταλλον ''métallon'', "mine, quarry, metal") is a material that, when freshly prepared, polished, or fractured, shows a lustrous appearance, and conducts electricity and heat relatively well. Metals are typicall ...
API for
iOS iOS (formerly iPhone OS) is a mobile operating system created and developed by Apple Inc. exclusively for its hardware. It is the operating system that powers many of the company's mobile devices, including the iPhone; the term also includes ...
applications, able to execute arbitrary code through Apple's GPU compute shaders.


Hardware support

Computer
video card A graphics card (also called a video card, display card, graphics adapter, VGA card/VGA, video adapter, display adapter, or mistakenly GPU) is an expansion card which generates a feed of output images to a display device, such as a computer moni ...
s are produced by various vendors, such as
Nvidia Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
,
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets. While it initially manufactur ...
. Cards from such vendors differ on implementing data-format support, such as
integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
and
floating-point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...
formats (32-bit and 64-bit).
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washing ...
introduced a ''
Shader Model The High-Level Shader Language or High-Level Shading Language (HLSL) is a proprietary shading language developed by Microsoft for the Direct3D 9 API to augment the shader assembly language, and went on to become the required shading language ...
'' standard, to help rank the various features of graphic cards into a simple Shader Model version number (1.0, 2.0, 3.0, etc.).


Integer numbers

Pre-DirectX 9 video cards only supported paletted or integer color types. Various formats are available, each containing a red element, a green element, and a blue element. Sometimes another alpha value is added, to be used for transparency. Common formats are: * 8 bits per pixel – Sometimes palette mode, where each value is an index in a table with the real color value specified in one of the other formats. Sometimes three bits for red, three bits for green, and two bits for blue. * 16 bits per pixel – Usually the bits are allocated as five bits for red, six bits for green, and five bits for blue. * 24 bits per pixel – There are eight bits for each of red, green, and blue. * 32 bits per pixel – There are eight bits for each of red, green, blue, and
alpha Alpha (uppercase , lowercase ; grc, ἄλφα, ''álpha'', or ell, άλφα, álfa) is the first letter of the Greek alphabet. In the system of Greek numerals, it has a value of one. Alpha is derived from the Phoenician letter aleph , whic ...
.


Floating-point numbers

For early
fixed-function Fixed-function is a term canonically used to contrast 3D graphics APIs and earlier GPUs designed prior to the advent of shader-based 3D graphics APIs and GPU architectures. History Historically fixed-function APIs consisted of a set of functio ...
or limited programmability graphics (i.e., up to and including DirectX 8.1-compliant GPUs) this was sufficient because this is also the representation used in displays. It is important to note that this representation does have certain limitations. Given sufficient graphics processing power even graphics programmers would like to use better formats, such as
floating point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can be ...
data formats, to obtain effects such as
high-dynamic-range imaging In photography and videography, multi-exposure HDR capture is a technique that creates extended or high dynamic range (HDR) images by taking and combining multiple exposures of the same subject matter at different exposure levels. Combining mu ...
. Many GPGPU applications require floating point accuracy, which came with video cards conforming to the DirectX 9 specification. DirectX 9 Shader Model 2.x suggested the support of two precision types: full and partial precision. Full precision support could either be FP32 or FP24 (floating point 32- or 24-bit per component) or greater, while partial precision was FP16. ATI's
Radeon R300 The R300 GPU, introduced in August 2002 and developed by ATI Technologies, is its third generation of GPU used in ''Radeon'' graphics cards. This GPU features 3D acceleration based upon Direct3D 9.0 and OpenGL 2.0, a major improvement in featur ...
series of GPUs supported FP24 precision only in the programmable fragment pipeline (although FP32 was supported in the vertex processors) while
Nvidia Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
's NV30 series supported both FP16 and FP32; other vendors such as
S3 Graphics S3 Graphics, Ltd (commonly referred to as S3) was an American computer graphics company. The company sold the Trio, ViRGE, Savage 3D, and Chrome series of graphics processors. Struggling against competition from 3dfx Interactive, ATI and Nvid ...
and
XGI XGI Technology Inc. () is based upon the old graphics division of Silicon Integrated Systems (SiS) spun off as a separate company, and the graphics assets of Trident Microsystems. It existed from 2003 to 2010. History Founded in June 2003 and h ...
supported a mixture of formats up to FP24. The implementations of floating point on Nvidia GPUs are mostly
IEEE The Institute of Electrical and Electronics Engineers (IEEE) is a 501(c)(3) professional association for electronic engineering and electrical engineering (and associated disciplines) with its corporate office in New York City and its operation ...
compliant; however, this is not true across all vendors. This has implications for correctness which are considered important to some scientific applications. While 64-bit floating point values (double precision float) are commonly available on CPUs, these are not universally supported on GPUs. Some GPU architectures sacrifice IEEE compliance, while others lack double-precision. Efforts have occurred to emulate double-precision floating point values on GPUs; however, the speed tradeoff negates any benefit to offloading the computing onto the GPU in the first place.Double precision on GPUs (Proceedings of ASIM 2005)
: Dominik Goddeke, Robert Strzodka, and Stefan Turek. Accelerating Double Precision (FEM) Simulations with (GPUs). Proceedings of ASIM 2005 18th Symposium on Simulation Technique, 2005.


Vectorization

Most operations on the GPU operate in a vectorized fashion: one operation can be performed on up to four values at once. For example, if one color is to be modulated by another color , the GPU can produce the resulting color in one operation. This functionality is useful in graphics because almost every basic data type is a vector (either 2-, 3-, or 4-dimensional). Examples include vertices, colors, normal vectors, and texture coordinates. Many other applications can put this to good use, and because of their higher performance, vector instructions, termed single instruction, multiple data (
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
), have long been available on CPUs.


GPU vs. CPU

Originally, data was simply passed one-way from a
central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, an ...
(CPU) to a
graphics processing unit A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobi ...
(GPU), then to a
display device A display device is an output device for presentation of information in visual or tactile form (the latter used for example in tactile electronic displays for blind people). When the input information that is supplied has an electrical signal the ...
. As time progressed, however, it became valuable for GPUs to store at first simple, then complex structures of data to be passed back to the CPU that analyzed an image, or a set of scientific-data represented as a 2D or 3D format that a video card can understand. Because the GPU has access to every draw operation, it can analyze data in these forms quickly, whereas a CPU must poll every pixel or data element much more slowly, as the speed of access between a CPU and its larger pool of
random-access memory Random-access memory (RAM; ) is a form of computer memory that can be read and changed in any order, typically used to store working Data (computing), data and machine code. A Random access, random-access memory device allows data items to b ...
(or in an even worse case, a
hard drive A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnet ...
) is slower than GPUs and video cards, which typically contain smaller amounts of more expensive memory that is much faster to access. Transferring the portion of the data set to be actively analyzed to that GPU memory in the form of textures or other easily readable GPU forms results in speed increase. The distinguishing feature of a GPGPU design is the ability to transfer information bidirectionally back from the GPU to the CPU; generally the data throughput in both directions is ideally high, resulting in a multiplier effect on the speed of a specific high-use
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
. GPGPU pipelines may improve efficiency on especially large data sets and/or data containing 2D or 3D imagery. It is used in complex graphics pipelines as well as
scientific computing Computational science, also known as scientific computing or scientific computation (SC), is a field in mathematics that uses advanced computing capabilities to understand and solve complex problems. It is an area of science that spans many disc ...
; more so in fields with large data sets like
genome mapping Gene mapping describes the methods used to identify the locus of a gene and the distances between genes. Gene mapping can also describe the distances between different sites within a gene. The essence of all genome mapping is to place a co ...
, or where two- or three-dimensional analysis is useful especially at present
biomolecule A biomolecule or biological molecule is a loosely used term for molecules present in organisms that are essential to one or more typically biological processes, such as cell division, morphogenesis, or development. Biomolecules include large ...
analysis,
protein Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, respo ...
study, and other complex
organic chemistry Organic chemistry is a subdiscipline within chemistry involving the scientific study of the structure, properties, and reactions of organic compounds and organic materials, i.e., matter in its various forms that contain carbon atoms.Clayden, J.; ...
. Such pipelines can also vastly improve efficiency in
image processing An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensiona ...
and
computer vision Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the hum ...
, among other fields; as well as parallel processing generally. Some very heavily optimized pipelines have yielded speed increases of several hundred times the original CPU-based pipeline on one high-use task. A simple example would be a GPU program that collects data about average
lighting Lighting or illumination is the deliberate use of light to achieve practical or aesthetic effects. Lighting includes the use of both artificial light sources like lamps and light fixtures, as well as natural illumination by capturing daylig ...
values as it renders some view from either a camera or a computer graphics program back to the main program on the CPU, so that the CPU can then make adjustments to the overall screen view. A more advanced example might use
edge detection Edge detection includes a variety of mathematical methods that aim at identifying edges, curves in a digital image at which the image brightness changes sharply or, more formally, has discontinuities. The same problem of finding discontinuitie ...
to return both numerical information and a processed image representing outlines to a
computer vision Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the hum ...
program controlling, say, a mobile robot. Because the GPU has fast and local hardware access to every
pixel In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a raster image, or the smallest point in an all points addressable display device. In most digital display devices, pixels are the smal ...
or other picture element in an image, it can analyze and average it (for the first example) or apply a Sobel edge filter or other
convolution In mathematics (in particular, functional analysis), convolution is a operation (mathematics), mathematical operation on two function (mathematics), functions ( and ) that produces a third function (f*g) that expresses how the shape of one is ...
filter (for the second) with much greater speed than a CPU, which typically must access slower
random-access memory Random-access memory (RAM; ) is a form of computer memory that can be read and changed in any order, typically used to store working Data (computing), data and machine code. A Random access, random-access memory device allows data items to b ...
copies of the graphic in question. GPGPU is fundamentally a software concept, not a hardware concept; it is a type of
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
, not a piece of equipment. Specialized equipment designs may, however, even further enhance the efficiency of GPGPU pipelines, which traditionally perform relatively few algorithms on very large amounts of data. Massively parallelized, gigantic-data-level tasks thus may be parallelized even further via specialized setups such as rack computing (many similar, highly tailored machines built into a ''rack''), which adds a third layer many computing units each using many CPUs to correspond to many GPUs. Some
Bitcoin Bitcoin ( abbreviation: BTC; sign: ₿) is a decentralized digital currency that can be transferred on the peer-to-peer bitcoin network. Bitcoin transactions are verified by network nodes through cryptography and recorded in a public distr ...
"miners" used such setups for high-quantity processing.


Caches

Historically, CPUs have used hardware-managed caches, but the earlier GPUs only provided software-managed local memories. However, as GPUs are being increasingly used for general-purpose applications, state-of-the-art GPUs are being designed with hardware-managed multi-level caches which have helped the GPUs to move towards mainstream computing. For example,
GeForce 200 series The GeForce 200 series is a series of Tesla-based GeForce graphics processing units developed by Nvidia. Architecture The GeForce 200 Series introduced Nvidia's second generation of Tesla (microarchitecture), Nvidia's unified shader architec ...
GT200 architecture GPUs did not feature an L2 cache, the
Fermi Enrico Fermi (; 29 September 1901 – 28 November 1954) was an Italian (later naturalized American) physicist and the creator of the world's first nuclear reactor, the Chicago Pile-1. He has been called the "architect of the nuclear age" and ...
GPU has 768 KiB last-level cache, the
Kepler Johannes Kepler (; ; 27 December 1571 – 15 November 1630) was a German astronomer, mathematician, astrologer, natural philosopher and writer on music. He is a key figure in the 17th-century Scientific Revolution, best known for his laws o ...
GPU has 1.5 MiB last-level cache, the Maxwell GPU has 2 MiB last-level cache, and the
Pascal Pascal, Pascal's or PASCAL may refer to: People and fictional characters * Pascal (given name), including a list of people with the name * Pascal (surname), including a list of people and fictional characters with the name ** Blaise Pascal, Fren ...
GPU has 4 MiB last-level cache.


Register file

GPUs have very large register files, which allow them to reduce context-switching latency. Register file size is also increasing over different GPU generations, e.g., the total register file size on Maxwell (GM200), Pascal and Volta GPUs are 6 MiB, 14 MiB and 20 MiB, respectively. By comparison, the size of a register file on CPUs is small, typically tens or hundreds of kilobytes.


Energy efficiency

The high performance of GPUs comes at the cost of high power consumption, which under full load is in fact as much power as the rest of the PC system combined. The maximum power consumption of the Pascal series GPU (Tesla P100) was specified to be 250W.


Stream processing

GPUs are designed specifically for graphics and thus are very restrictive in operations and programming. Due to their design, GPUs are only effective for problems that can be solved using
stream processing In computer science, stream processing (also known as event stream processing, data stream processing, or distributed stream processing) is a programming paradigm which views data streams, or sequences of events in time, as the central input and ou ...
and the hardware can only be used in certain ways. The following discussion referring to vertices, fragments and textures concerns mainly the legacy model of GPGPU programming, where graphics APIs (
OpenGL OpenGL (Open Graphics Library) is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics. The API is typically used to interact with a graphics processing unit (GPU), to achieve hardwa ...
or
DirectX Microsoft DirectX is a collection of application programming interfaces (APIs) for handling tasks related to multimedia, especially game programming and video, on Microsoft platforms. Originally, the names of these APIs all began with "Direct", ...
) were used to perform general-purpose computation. With the introduction of the
CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ca ...
(Nvidia, 2007) and
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
(vendor-independent, 2008) general-purpose computing APIs, in new GPGPU codes it is no longer necessary to map the computation to graphics primitives. The stream processing nature of GPUs remains valid regardless of the APIs used. (See e.g.,) GPUs can only process independent vertices and fragments, but can process many of them in parallel. This is especially effective when the programmer wants to process many vertices or fragments in the same way. In this sense, GPUs are stream processors processors that can operate in parallel by running one kernel on many records in a stream at once. A ''stream'' is simply a set of records that require similar computation. Streams provide data parallelism. ''
Kernels Kernel may refer to: Computing * Kernel (operating system), the central component of most operating systems * Kernel (image processing), a matrix used for image convolution * Compute kernel, in GPGPU programming * Kernel method, in machine learnin ...
'' are the functions that are applied to each element in the stream. In the GPUs, ''vertices'' and ''fragments'' are the elements in streams and vertex and fragment shaders are the kernels to be run on them. For each element we can only read from the input, perform operations on it, and write to the output. It is permissible to have multiple inputs and multiple outputs, but never a piece of memory that is both readable and writable. Arithmetic intensity is defined as the number of operations performed per word of memory transferred. It is important for GPGPU applications to have high arithmetic intensity else the memory access latency will limit computational speedup. Ideal GPGPU applications have large data sets, high parallelism, and minimal dependency between data elements.


GPU programming concepts


Computational resources

There are a variety of computational resources available on the GPU: * Programmable processors – vertex, primitive, fragment and mainly compute pipelines allow programmer to perform kernel on streams of data * Rasterizer – creates fragments and interpolates per-vertex constants such as texture coordinates and color * Texture unit – read-only memory interface * Framebuffer – write-only memory interface In fact, a program can substitute a write only texture for output instead of the framebuffer. This is done either through
Render to Texture This is a glossary of terms relating to computer graphics. For more general computer hardware terms, see glossary of computer hardware terms. 0–9 A B ...
(RTT), Render-To-Backbuffer-Copy-To-Texture (RTBCTT), or the more recent stream-out.


Textures as stream

The most common form for a stream to take in GPGPU is a 2D grid because this fits naturally with the rendering model built into GPUs. Many computations naturally map into grids: matrix algebra, image processing, physically based simulation, and so on. Since textures are used as memory, texture lookups are then used as memory reads. Certain operations can be done automatically by the GPU because of this.


Kernels

Compute kernel In computing, a compute kernel is a routine compiled for high throughput accelerators (such as graphics processing units (GPUs), digital signal processors (DSPs) or field-programmable gate arrays (FPGAs)), separate from but used by a main progr ...
s can be thought of as the body of
loop Loop or LOOP may refer to: Brands and enterprises * Loop (mobile), a Bulgarian virtual network operator and co-founder of Loop Live * Loop, clothing, a company founded by Carlos Vasquez in the 1990s and worn by Digable Planets * Loop Mobile, an ...
s. For example, a programmer operating on a grid on the CPU might have code that looks like this: // Input and output grids have 10000 x 10000 or 100 million elements. void transform_10k_by_10k_grid(float in 000010000], float out 000010000]) On the GPU, the programmer only specifies the body of the loop as the kernel and what data to loop over by invoking geometry processing.


Flow control

In sequential code it is possible to control the flow of the program using if-then-else statements and various forms of loops. Such flow control structures have only recently been added to GPUs. Conditional writes could be performed using a properly crafted series of arithmetic/bit operations, but looping and conditional branching were not possible. Recent GPUs allow branching, but usually with a performance penalty. Branching should generally be avoided in inner loops, whether in CPU or GPU code, and various methods, such as static branch resolution, pre-computation, predication, loop splitting,Future Chips
"Tutorial on removing branches", 2011
and Z-cullGPGPU survey paper
: John D. Owens, David Luebke, Naga Govindaraju, Mark Harris, Jens Krüger, Aaron E. Lefohn, and Tim Purcell. "A Survey of General-Purpose Computation on Graphics Hardware". Computer Graphics Forum, volume 26, number 1, 2007, pp. 80–113.
can be used to achieve branching when hardware support does not exist.


GPU methods


Map

The map operation simply applies the given function (the kernel) to every element in the stream. A simple example is multiplying each value in the stream by a constant (increasing the brightness of an image). The map operation is simple to implement on the GPU. The programmer generates a fragment for each pixel on screen and applies a fragment program to each one. The result stream of the same size is stored in the output buffer.


Reduce

Some computations require calculating a smaller stream (possibly a stream of only one element) from a larger stream. This is called a reduction of the stream. Generally, a reduction can be performed in multiple steps. The results from the prior step are used as the input for the current step and the range over which the operation is applied is reduced until only one stream element remains.


Stream filtering

Stream filtering is essentially a non-uniform reduction. Filtering involves removing items from the stream based on some criteria.


Scan

The scan operation, also termed '' prefix sum#Parallel algorithm, parallel prefix sum'', takes in a vector (stream) of data elements and an (arbitrary) associative binary function '+' with an identity element 'i'. If the input is 0, a1, a2, a3, ... an ''exclusive scan'' produces the output , a0, a0 + a1, a0 + a1 + a2, ... while an ''inclusive scan'' produces the output 0, a0 + a1, a0 + a1 + a2, a0 + a1 + a2 + a3, ...and does not require an identity to exist. While at first glance the operation may seem inherently serial, efficient parallel scan algorithms are possible and have been implemented on graphics processing units. The scan operation has uses in e.g., quicksort and sparse matrix-vector multiplication.


Scatter

The scatter operation is most naturally defined on the vertex processor. The vertex processor is able to adjust the position of the
vertex Vertex, vertices or vertexes may refer to: Science and technology Mathematics and computer science *Vertex (geometry), a point where two or more curves, lines, or edges meet * Vertex (computer graphics), a data structure that describes the positio ...
, which allows the programmer to control where information is deposited on the grid. Other extensions are also possible, such as controlling how large an area the vertex affects. The fragment processor cannot perform a direct scatter operation because the location of each fragment on the grid is fixed at the time of the fragment's creation and cannot be altered by the programmer. However, a logical scatter operation may sometimes be recast or implemented with another gather step. A scatter implementation would first emit both an output value and an output address. An immediately following gather operation uses address comparisons to see whether the output value maps to the current output slot. In dedicated
compute kernel In computing, a compute kernel is a routine compiled for high throughput accelerators (such as graphics processing units (GPUs), digital signal processors (DSPs) or field-programmable gate arrays (FPGAs)), separate from but used by a main progr ...
s, scatter can be performed by indexed writes.


Gather

Gather Gather, gatherer, or gathering may refer to: Anthropology and sociology * Hunter-gatherer, a person or a society whose subsistence depends on hunting and gathering of wild foods *Intensive gathering, the practice of cultivating wild plants as a s ...
is the reverse of scatter. After scatter reorders elements according to a map, gather can restore the order of the elements according to the map scatter used. In dedicated compute kernels, gather may be performed by indexed reads. In other shaders, it is performed with texture-lookups.


Sort

The sort operation transforms an unordered set of elements into an ordered set of elements. The most common implementation on GPUs is using
radix sort In computer science, radix sort is a non-comparative sorting algorithm. It avoids comparison by creating and distributing elements into buckets according to their radix. For elements with more than one significant digit, this bucketing process i ...
for integer and floating point data and coarse-grained
merge sort In computer science, merge sort (also commonly spelled as mergesort) is an efficient, general-purpose, and comparison-based sorting algorithm. Most implementations produce a stable sort, which means that the order of equal elements is the same ...
and fine-grained sorting networks for general comparable data.Merrill, Duane. Allocation-oriented Algorithm Design with Application to GPU Computing
Ph.D. dissertation, Department of Computer Science, University of Virginia. Dec. 2011.

, 2013.


Search

The search operation allows the programmer to find a given element within the stream, or possibly find neighbors of a specified element. The GPU is not used to speed up the search for an individual element, but instead is used to run multiple searches in parallel. Mostly the search method used is
binary search In computer science, binary search, also known as half-interval search, logarithmic search, or binary chop, is a search algorithm that finds the position of a target value within a sorted array. Binary search compares the target value to the m ...
on sorted elements.


Data structures

A variety of data structures can be represented on the GPU: * Dense
arrays An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
*
Sparse matrices In numerical analysis and scientific computing, a sparse matrix or sparse array is a matrix in which most of the elements are zero. There is no strict definition regarding the proportion of zero-value elements for a matrix to qualify as sparse b ...
(
sparse array Sparse is a computer software tool designed to find possible coding faults in the Linux kernel. Unlike other such tools, this static analysis tool was initially designed to only flag constructs that were likely to be of interest to kernel deve ...
) static or dynamic * Adaptive structures (
union type In computer science, a union is a value that may have any of several representations or formats within the same position in memory; that consists of a variable that may hold such a data structure. Some programming languages support special data t ...
)


Applications

The following are some of the areas where GPUs have been used for general purpose computing: *
Automatic parallelization Automatic may refer to: Music Bands * Automatic (band), Australian rock band * Automatic (American band), American rock band * The Automatic, a Welsh alternative rock band Albums * ''Automatic'' (Jack Bruce album), a 1983 electronic rock ...
* Physical based simulation and
physics engine A physics engine is computer software that provides an approximate simulation of certain physical systems, such as rigid body dynamics (including collision detection), soft body dynamics, and fluid dynamics, of use in the domains of computer gr ...
sJoselli, Mark, et al.
A new physics engine with automatic process distribution between CPU-GPU
" Proceedings of the 2008 ACM SIGGRAPH symposium on Video games. ACM, 2008.
(usually based on
Newtonian physics Classical mechanics is a physical theory describing the motion of macroscopic objects, from projectiles to parts of machinery, and astronomical objects, such as spacecraft, planets, stars, and galaxies. For objects governed by classical mec ...
models) **
Conway's Game of Life The Game of Life, also known simply as Life, is a cellular automaton devised by the British mathematician John Horton Conway in 1970. It is a zero-player game, meaning that its evolution is determined by its initial state, requiring no further ...
,
cloth simulation Textile is an umbrella term that includes various fiber-based materials, including fibers, yarns, filaments, threads, different fabric types, etc. At first, the word "textiles" only referred to woven fabrics. However, weaving is not the ...
, fluid
incompressible flow In fluid mechanics or more generally continuum mechanics, incompressible flow ( isochoric flow) refers to a flow in which the material density is constant within a fluid parcel—an infinitesimal volume that moves with the flow velocity. A ...
by solution of
Euler equations (fluid dynamics) In fluid dynamics, the Euler equations are a set of quasilinear partial differential equations governing adiabatic and inviscid flow. They are named after Leonhard Euler. In particular, they correspond to the Navier–Stokes equations with zer ...
or
Navier–Stokes equations In physics, the Navier–Stokes equations ( ) are partial differential equations which describe the motion of viscous fluid substances, named after French engineer and physicist Claude-Louis Navier and Anglo-Irish physicist and mathematician Geo ...
*
Statistical physics Statistical physics is a branch of physics that evolved from a foundation of statistical mechanics, which uses methods of probability theory and statistics, and particularly the Mathematics, mathematical tools for dealing with large populations ...
**
Ising model The Ising model () (or Lenz-Ising model or Ising-Lenz model), named after the physicists Ernst Ising and Wilhelm Lenz, is a mathematical model of ferromagnetism in statistical mechanics. The model consists of discrete variables that represent ...
*
Lattice gauge theory In physics, lattice gauge theory is the study of gauge theories on a spacetime that has been discretized into a lattice. Gauge theories are important in particle physics, and include the prevailing theories of elementary particles: quantum elec ...
* Segmentation 2D and 3D *
Level set methods Level-set methods (LSM) are a conceptual framework for using level sets as a tool for numerical analysis of surfaces and shapes. The advantage of the level-set model is that one can perform numerical computations involving curves and surfaces on a ...
* CT reconstruction * Fast Fourier transform * GPU learning
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
and data mining computations, e.g., with software BIDMach *
k-nearest neighbor algorithm In statistics, the ''k''-nearest neighbors algorithm (''k''-NN) is a non-parametric supervised learning method first developed by Evelyn Fix and Joseph Hodges in 1951, and later expanded by Thomas Cover. It is used for classification and reg ...
*
Fuzzy logic Fuzzy logic is a form of many-valued logic in which the truth value of variables may be any real number between 0 and 1. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely ...
*
Tone mapping Tone mapping is a technique used in image processing and computer graphics to map one set of colors to another to approximate the appearance of high-dynamic-range images in a medium that has a more limited dynamic range. Print-outs, CRT or L ...
*
Audio signal processing Audio signal processing is a subfield of signal processing that is concerned with the electronic manipulation of audio signals. Audio signals are electronic representations of sound waves—longitudinal waves which travel through air, consisting ...
** Audio and sound effects processing, to use a
GPU A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobi ...
for
digital signal processing Digital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are ...
(DSP) **
Analog signal processing Analog signal processing is a type of signal processing conducted on continuous analog signals by some analog means (as opposed to the discrete digital signal processing where the signal processing is carried out by a digital process). "Analog" indi ...
**
Speech processing Speech processing is the study of speech signals and the processing methods of signals. The signals are usually processed in a digital representation, so speech processing can be regarded as a special case of digital signal processing, applied t ...
*
Digital image processing Digital image processing is the use of a digital computer to process digital images through an algorithm. As a subcategory or field of digital signal processing, digital image processing has many advantages over analog image processing. It allo ...
*
Video processing In electronics engineering, video processing is a particular case of signal processing, in particular image processing, which often employs video filters and where the input and output signals are video files or video streams. Video processing tec ...
** Hardware accelerated video decoding and post-processing ***
Motion compensation Motion compensation in computing, is an algorithmic technique used to predict a frame in a video, given the previous and/or future frames by accounting for motion of the camera and/or objects in the video. It is employed in the encoding of video d ...
(mo comp) *** Inverse discrete cosine transform (iDCT) *** Variable-length decoding (VLD),
Huffman coding In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly used for lossless data compression. The process of finding or using such a code proceeds by means of Huffman coding, an algori ...
*** Inverse quantization ( IQ (not to be confused by Intelligence Quotient)) *** In-loop deblocking *** Bitstream processing (
CAVLC Context-adaptive variable-length coding (CAVLC) is a form of entropy coding used in H.264/MPEG-4 AVC video encoding. It is an inherently lossless compression technique, like almost all entropy-coders. In H.264/MPEG-4 AVC, it is used to encode r ...
/
CABAC Context-adaptive binary arithmetic coding (CABAC) is a form of entropy encoding used in the H.264/MPEG-4 AVC and High Efficiency Video Coding (HEVC) standards. It is a lossless compression technique, although the video coding standards in which it ...
) using special purpose hardware for this task because this is a serial task not suitable for regular GPGPU computation ***
Deinterlacing Deinterlacing is the process of converting interlaced video into a non-interlaced or Progressive scan, progressive form. Interlaced video signals are commonly found in analog television, digital television (HDTV) when in the 1080i format, some D ...
**** Spatial-temporal deinterlacing *** Noise reduction *** Edge enhancement *** Color correction ** Hardware accelerated video encoding and pre-processing *
Global illumination Global illumination (GI), or indirect illumination, is a group of algorithms used in 3D computer graphics that are meant to add more realistic lighting to 3D scenes. Such algorithms take into account not only the light that comes directly from ...
ray tracing,
photon mapping In computer graphics, photon mapping is a two-pass global illumination rendering algorithm developed by Henrik Wann Jensen between 1995 and 2001Jensen, H. (1996). ''Global Illumination using Photon Maps''. nlineAvailable at: http://graphics.stanf ...
, radiosity among others,
subsurface scattering Subsurface scattering (SSS), also known as subsurface light transport (SSLT), is a mechanism of light transport in which light that penetrates the surface of a translucent object is scattered by interacting with the material and exits the surfa ...
* Geometric computing
constructive solid geometry Constructive solid geometry (CSG; formerly called computational binary solid geometry) is a technique used in solid modeling. Constructive solid geometry allows a modeler to create a complex surface or object by using Boolean operators to combi ...
, distance fields,
collision detection Collision detection is the computational problem of detecting the intersection (Euclidean geometry), intersection of two or more objects. Collision detection is a classic issue of computational geometry and has applications in various computing ...
, transparency computation, shadow generation * Scientific computing **
Monte Carlo simulation Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be determini ...
of light propagation **
Weather forecasting Weather forecasting is the application of science and technology forecasting, to predict the conditions of the Earth's atmosphere, atmosphere for a given location and time. People have attempted to predict the weather informally for millennia a ...
**
Climate research Climatology (from Greek , ''klima'', "place, zone"; and , '' -logia'') or climate science is the scientific study of Earth's climate, typically defined as weather conditions averaged over a period of at least 30 years. This modern field of stu ...
**
Molecular modeling on GPU Molecular modeling on GPU is the technique of using a graphics processing unit (GPU) for molecular simulations. In 2007, NVIDIA introduced video cards that could be used not only to show graphics but also for scientific calculations. These cards ...
**
Quantum mechanical Quantum mechanics is a fundamental theory in physics that provides a description of the physical properties of nature at the scale of atoms and subatomic particles. It is the foundation of all quantum physics including quantum chemistry, qua ...
physics **
Astrophysics Astrophysics is a science that employs the methods and principles of physics and chemistry in the study of astronomical objects and phenomena. As one of the founders of the discipline said, Astrophysics "seeks to ascertain the nature of the h ...
*
Bioinformatics Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
*
Computational finance Computational finance is a branch of applied computer science that deals with problems of practical interest in finance.Rüdiger U. Seydel, '' tp://nozdr.ru/biblio/kolxo3/F/FN/Seydel%20R.U.%20Tools%20for%20Computational%20Finance%20(4ed.,%20Spring ...
*
Medical imaging Medical imaging is the technique and process of imaging the interior of a body for clinical analysis and medical intervention, as well as visual representation of the function of some organs or tissues (physiology). Medical imaging seeks to rev ...
*
Clinical decision support system A clinical decision support system (CDSS) is a health information technology, provides clinicians, staff, patients, or other individuals with knowledge and person-specific information, to help health and health care. CDSS encompasses a variety of ...
(CDSS) *
Computer vision Computer vision is an interdisciplinary scientific field that deals with how computers can gain high-level understanding from digital images or videos. From the perspective of engineering, it seeks to understand and automate tasks that the hum ...
*
Digital signal processing Digital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are ...
/
signal processing Signal processing is an electrical engineering subfield that focuses on analyzing, modifying and synthesizing ''signals'', such as audio signal processing, sound, image processing, images, and scientific measurements. Signal processing techniq ...
*
Control engineering Control engineering or control systems engineering is an engineering discipline that deals with control systems, applying control theory to design equipment and systems with desired behaviors in control environments. The discipline of controls o ...
*
Operations research Operations research ( en-GB, operational research) (U.S. Air Force Specialty Code: Operations Analysis), often shortened to the initialism OR, is a discipline that deals with the development and application of analytical methods to improve deci ...
** Implementations of: the GPU Tabu Search algorithm solving the Resource Constrained Project Scheduling problem is freely available on GitHub; the GPU algorithm solving the Nurse Rerostering problem is freely available on GitHub. *
Neural network A neural network is a network or circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a biological neural network, made up of biological ...
s *
Database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
operations *
Computational Fluid Dynamics Computational fluid dynamics (CFD) is a branch of fluid mechanics that uses numerical analysis and data structures to analyze and solve problems that involve fluid flows. Computers are used to perform the calculations required to simulate th ...
especially using
Lattice Boltzmann methods The lattice Boltzmann methods (LBM), originated from the lattice gas automata (LGA) method (Hardy- Pomeau-Pazzis and Frisch- Hasslacher- Pomeau models), is a class of computational fluid dynamics (CFD) methods for fluid simulation. Instead of solv ...
*
Cryptography Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adver ...
and
cryptanalysis Cryptanalysis (from the Greek ''kryptós'', "hidden", and ''analýein'', "to analyze") refers to the process of analyzing information systems in order to understand hidden aspects of the systems. Cryptanalysis is used to breach cryptographic sec ...
* Performance modeling: computationally intensive tasks on GPU ** Implementations of:
MD6 The MD6 Message-Digest Algorithm is a cryptographic hash function. It uses a Merkle tree-like structure to allow for immense parallel computation of hashes for very long inputs. Authors claim a performance of 28 cycles per byte for MD6-256 on an ...
,
Advanced Encryption Standard The Advanced Encryption Standard (AES), also known by its original name Rijndael (), is a specification for the encryption of electronic data established by the U.S. National Institute of Standards and Technology (NIST) in 2001. AES is a variant ...
(AES),
Data Encryption Standard The Data Encryption Standard (DES ) is a symmetric-key algorithm for the encryption of digital data. Although its short key length of 56 bits makes it too insecure for modern applications, it has been highly influential in the advancement of cry ...
(DES), RSA,
elliptic curve cryptography Elliptic-curve cryptography (ECC) is an approach to public-key cryptography based on the algebraic structure of elliptic curves over finite fields. ECC allows smaller keys compared to non-EC cryptography (based on plain Galois fields) to provide e ...
(ECC) **
Password cracking In cryptanalysis and computer security, password cracking is the process of recovering passwords from data that has been stored in or transmitted by a computer system in scrambled form. A common approach (brute-force attack) is to repeatedly try ...
**
Cryptocurrency A cryptocurrency, crypto-currency, or crypto is a digital currency designed to work as a medium of exchange through a computer network that is not reliant on any central authority, such as a government or bank, to uphold or maintain it. It i ...
transactions processing ("mining") (
Bitcoin mining The bitcoin network is a peer-to-peer payment network that operates on a cryptographic protocol. Users send and receive bitcoins, the units of currency, by broadcasting digitally-signed messages to the network using bitcoin cryptocurrency ...
) *
Electronic design automation Electronic design automation (EDA), also referred to as electronic computer-aided design (ECAD), is a category of software tools for designing Electronics, electronic systems such as integrated circuits and printed circuit boards. The tools wo ...
*
Antivirus software Antivirus software (abbreviated to AV software), also known as anti-malware, is a computer program used to prevent, detect, and remove malware. Antivirus software was originally developed to detect and remove computer viruses, hence the nam ...
*
Intrusion detection An intrusion detection system (IDS; also intrusion prevention system or IPS) is a device or software application that monitors a network or systems for malicious activity or policy violations. Any intrusion activity or violation is typically rep ...
Regular Expression Matching on Graphics Hardware for Intrusion Detection
. Giorgos Vasiliadis et al., Regular Expression Matching on Graphics Hardware for Intrusion Detection. In proceedings of RAID 2009.
* Increase computing power for
distributed computing A distributed system is a system whose components are located on different computer network, networked computers, which communicate and coordinate their actions by message passing, passing messages to one another from any system. Distributed com ...
projects like SETI@home,
Einstein@home Einstein@Home is a volunteer computing project that searches for signals from spinning neutron stars in data from gravitational-wave detectors, from large radio telescopes, and from a gamma-ray telescope. Neutron stars are detected by their pulse ...


Bioinformatics

GPGPU usage in Bioinformatics:


Molecular dynamics

† Expected speedups are highly dependent on system configuration. GPU performance compared against multi-core
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
CPU socket. GPU performance benchmarked on GPU supported features and may be a
kernel Kernel may refer to: Computing * Kernel (operating system), the central component of most operating systems * Kernel (image processing), a matrix used for image convolution * Compute kernel, in GPGPU programming * Kernel method, in machine learnin ...
to kernel performance comparison. For details on configuration used, view application website. Speedups as per Nvidia in-house testing or ISV's documentation. ‡ Q=
Quadro GPU Quadro was Nvidia's brand for Graphics processing unit, graphics cards intended for use in workstations running professional computer-aided design (CAD), computer-generated imagery (CGI), digital content creation (DCC) applications, scientific ...
, T= Tesla GPU. Nvidia recommended GPUs for this application. Check with developer or ISV to obtain certification information.


See also

*
Fastra II The Fastra II is a desktop computer, desktop supercomputer designed for tomography. It was built in late 2009 by the ASTRA (All Scale Tomographic Reconstruction Antwerp) group of researchers of the IBBT (Interdisciplinary institute for BroadBand ...
* Physics engine **
Advanced Simulation Library Advanced Simulation Library (ASL) is free and open-source hardware-accelerated multiphysics simulation platform. It enables users to write customized numerical solvers in C++ and deploy them on a variety of massively parallel architecture ...
**
Physics processing unit A physics processing unit (PPU) is a dedicated microprocessor designed to handle the calculations of physics, especially in the physics engine of video games. It is an example of hardware acceleration. Examples of calculations involving a PPU mig ...
(PPU) *
Close to Metal In computing, Close To Metal ("CTM" in short, originally called ''Close-to-the-Metal'') is the name of a beta version of a low-level programming interface developed by ATI, now the AMD Graphics Product Group, aimed at enabling GPGPU computing. CT ...
*
Audio processing unit A sound card (also known as an audio card) is an internal expansion card that provides input and output of audio signals to and from a computer under the control of computer programs. The term ''sound card'' is also applied to external audio ...
*
Larrabee (microarchitecture) Larrabee is the codename for a cancelled GPGPU chip that Intel was developing separately from its current line of integrated graphics accelerators. It is named after either Mount Larrabee or Larrabee State Park in Whatcom County, Washington, near ...
*
AI accelerator An AI accelerator is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and machine vision. Typical applications i ...
* Deep learning processor (DLP)


References

{{DEFAULTSORT:Gpgpu Emerging technologies Graphics hardware Graphics cards Instruction processing Parallel computing Video game development