HOME

TheInfoList



OR:

CUDA (or Compute Unified Device Architecture) is a
parallel computing Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different fo ...
platform and
application programming interface An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how t ...
(API) that allows software to use certain types of
graphics processing units A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mob ...
(GPUs) for general purpose processing, an approach called general-purpose computing on GPUs (
GPGPU General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditiona ...
). CUDA is a software layer that gives direct access to the GPU's virtual
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
and parallel computational elements, for the execution of
compute kernel In computing, a compute kernel is a routine compiled for high throughput accelerators (such as graphics processing units (GPUs), digital signal processors (DSPs) or field-programmable gate arrays (FPGAs)), separate from but used by a main progr ...
s. CUDA is designed to work with programming languages such as C,
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
, and Fortran. This accessibility makes it easier for specialists in parallel programming to use GPU resources, in contrast to prior APIs like
Direct3D Direct3D is a graphics application programming interface (API) for Microsoft Windows. Part of DirectX, Direct3D is used to render three-dimensional graphics in applications where performance is important, such as games. Direct3D uses hardware a ...
and
OpenGL OpenGL (Open Graphics Library) is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics. The API is typically used to interact with a graphics processing unit (GPU), to achieve hardwa ...
, which required advanced skills in graphics programming. CUDA-powered GPUs also support programming frameworks such as
OpenMP OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating syste ...
,
OpenACC OpenACC (for ''open accelerators'') is a programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI. The standard is designed to simplify parallel programming of heterogeneous CPU/GPU systems. As in OpenMP, the programme ...
and
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
; and HIP by compiling such code to CUDA. CUDA was created by
Nvidia Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
. When it was first introduced, the name was an acronym for Compute Unified Device Architecture, but Nvidia later dropped the common use of the acronym.


Background

The graphics processing unit (GPU), as a specialized computer processor, addresses the demands of
real-time Real-time or real time describes various operations in computing or other processes that must guarantee response times within a specified time (deadline), usually a relatively short time. A real-time process is generally one that happens in defined ...
high-resolution 3D graphics compute-intensive tasks. By 2012, GPUs had evolved into highly parallel multi-core systems allowing efficient manipulation of large blocks of data. This design is more effective than general-purpose
central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, an ...
(CPUs) for
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
s in situations where processing large blocks of data is done in parallel, such as: *
cryptographic hash function A cryptographic hash function (CHF) is a hash algorithm (a map of an arbitrary binary string to a binary string with fixed size of n bits) that has special properties desirable for cryptography: * the probability of a particular n-bit output re ...
s *
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
*
molecular dynamics Molecular dynamics (MD) is a computer simulation method for analyzing the physical movements of atoms and molecules. The atoms and molecules are allowed to interact for a fixed period of time, giving a view of the dynamic "evolution" of the ...
simulations *
physics engine A physics engine is computer software that provides an approximate simulation of certain physical systems, such as rigid body dynamics (including collision detection), soft body dynamics, and fluid dynamics, of use in the domains of computer gr ...
s *
sort algorithm In computer science, a sorting algorithm is an algorithm that puts elements of a list into an order. The most frequently used orders are numerical order and lexicographical order, and either ascending or descending. Efficient sorting is important ...
s


Programming abilities

The CUDA platform is accessible to software developers through CUDA-accelerated libraries,
compiler directives In computer programming, a directive or pragma (from "pragmatic") is a language construct that specifies how a compiler (or other Translator (computing), translator) should process its input. Directives are not part of the Formal grammar, grammar ...
such as
OpenACC OpenACC (for ''open accelerators'') is a programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI. The standard is designed to simplify parallel programming of heterogeneous CPU/GPU systems. As in OpenMP, the programme ...
, and extensions to industry-standard programming languages including C,
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
and Fortran. C/C++ programmers can use 'CUDA C/C++', compiled to PTX with nvcc, Nvidia's
LLVM LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate represen ...
-based C/C++ compiler, or by clang itself. Fortran programmers can use 'CUDA Fortran', compiled with the PGI CUDA Fortran compiler from
The Portland Group PGI (formerly The Portland Group, Inc.) was a company that produced a set of commercially available Fortran, C and C++ compilers for high-performance computing systems. On July 29, 2013, Nvidia acquired The Portland Group, Inc.Khronos Group The Khronos Group, Inc. is an open, non-profit, member-driven consortium of 170 organizations developing, publishing and maintaining royalty-free interoperability standards for 3D graphics, virtual reality, augmented reality, parallel computation ...
's
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
, Microsoft's
DirectCompute Microsoft DirectCompute is an application programming interface (API) that supports running compute kernels on general-purpose computing on graphics processing units on Microsoft's Windows Vista, Windows 7 and later versions. DirectCompute is part ...
,
OpenGL OpenGL (Open Graphics Library) is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics. The API is typically used to interact with a graphics processing unit (GPU), to achieve hardwa ...
Compute Shader and
C++ AMP C, or c, is the third letter in the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is ''cee'' (pronounced ), plural ''cees''. History "C" ...
. Third party wrappers are also available for
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
,
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offici ...
, Fortran,
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
,
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sa ...
,
Lua Lua or LUA may refer to: Science and technology * Lua (programming language) * Latvia University of Agriculture * Last universal ancestor, in evolution Ethnicity and language * Lua people, of Laos * Lawa people, of Thailand sometimes referred t ...
,
Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ''ANSI INCITS 226-1994 (S20018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived fro ...
,
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
, R,
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation ...
, IDL,
Julia Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g ...
, and native support in
Mathematica Wolfram Mathematica is a software system with built-in libraries for several areas of technical computing that allow machine learning, statistics, symbolic computation, data manipulation, network analysis, time series analysis, NLP, optimizat ...
. In the
computer game Video games, also known as computer games, are electronic games that involves interaction with a user interface or input device such as a joystick, game controller, controller, computer keyboard, keyboard, or motion sensing device to gener ...
industry, GPUs are used for graphics rendering, and for game physics calculations (physical effects such as debris, smoke, fire, fluids); examples include
PhysX PhysX is an open-source realtime physics engine middleware SDK developed by Nvidia as a part of Nvidia GameWorks software suite. Initially, video games supporting PhysX were meant to be accelerated by PhysX PPU (expansion cards designed by ...
and
Bullet A bullet is a kinetic projectile, a component of firearm ammunition that is shot from a gun barrel. Bullets are made of a variety of materials, such as copper, lead, steel, polymer, rubber and even wax. Bullets are made in various shapes and co ...
. CUDA has also been used to accelerate non-graphical applications in
computational biology Computational biology refers to the use of data analysis, mathematical modeling and computational simulations to understand biological systems and relationships. An intersection of computer science, biology, and big data, the field also has fo ...
,
cryptography Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or ''-logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adver ...
and other fields by an
order of magnitude An order of magnitude is an approximation of the logarithm of a value relative to some contextually understood reference value, usually 10, interpreted as the base of the logarithm and the representative of values of magnitude one. Logarithmic dis ...
or more. CUDA provides both a low level
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
(CUDA Driver API, non single-source) and a higher level API (CUDA Runtime API, single-source). The initial CUDA SDK was made public on 15 February 2007, for
Microsoft Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...
and
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...
.
Mac OS X macOS (; previously OS X and originally Mac OS X) is a Unix operating system developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac (computer), Mac computers. Within the market of ...
support was later added in version 2.0, which supersedes the beta released February 14, 2008. CUDA works with all Nvidia GPUs from the G8x series onwards, including
GeForce GeForce is a brand of graphics processing units (GPUs) designed by Nvidia. As of the GeForce 40 series, there have been eighteen iterations of the design. The first GeForce products were discrete GPUs designed for add-on graphics boards, inte ...
,
Quadro Quadro was Nvidia's brand for graphics cards intended for use in workstations running professional computer-aided design (CAD), computer-generated imagery (CGI), digital content creation (DCC) applications, scientific calculations and machine ...
and the Tesla line. CUDA is compatible with most standard operating systems. CUDA 8.0 comes with the following libraries (for compilation & runtime, in alphabetical order): * cuBLAS – CUDA Basic Linear Algebra Subroutines library * CUDART – CUDA Runtime library * cuFFT – CUDA Fast Fourier Transform library * cuRAND – CUDA Random Number Generation library * cuSOLVER – CUDA based collection of dense and sparse direct solvers * cuSPARSE – CUDA Sparse Matrix library * NPP – NVIDIA Performance Primitives library * nvGRAPH – NVIDIA Graph Analytics library * NVML – NVIDIA Management Library * NVRTC – NVIDIA Runtime Compilation library for CUDA C++ CUDA 8.0 comes with these other software components: * nView – NVIDIA nView Desktop Management Software * NVWMI – NVIDIA Enterprise Management Toolkit * GameWorks
PhysX PhysX is an open-source realtime physics engine middleware SDK developed by Nvidia as a part of Nvidia GameWorks software suite. Initially, video games supporting PhysX were meant to be accelerated by PhysX PPU (expansion cards designed by ...
– is a multi-platform game physics engine CUDA 9.0–9.2 comes with these other components: * CUTLASS 1.0 – custom linear algebra algorithms, * NVCUVID – NVIDIA Video Decoder was deprecated in CUDA 9.2; it is now available in NVIDIA Video Codec SDK CUDA 10 comes with these other components: * nvJPEG – Hybrid (CPU and GPU) JPEG processing CUDA 11.0-11.8 comes with these other components: * CUB is new one of more supported C++ libraries * MIG multi instance GPU support * nvJPEG2000 –
JPEG 2000 JPEG 2000 (JP2) is an image compression standard and coding system. It was developed from 1997 to 2000 by a Joint Photographic Experts Group committee chaired by Touradj Ebrahimi (later the JPEG president), with the intention of superseding the ...
encoder and decoder


Advantages

CUDA has several advantages over traditional general-purpose computation on GPUs (GPGPU) using graphics APIs: * Scattered reads code can read from arbitrary addresses in memory. * Unified virtual memory (CUDA 4.0 and above) * Unified memory (CUDA 6.0 and above) *
Shared memory In computer science, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between progr ...
CUDA exposes a fast
shared memory In computer science, shared memory is memory that may be simultaneously accessed by multiple programs with an intent to provide communication among them or avoid redundant copies. Shared memory is an efficient means of passing data between progr ...
region that can be shared among threads. This can be used as a user-managed cache, enabling higher bandwidth than is possible using texture lookups. * Faster downloads and readbacks to and from the GPU * Full support for integer and bitwise operations, including integer texture lookups


Limitations

* Whether for the host computer or the GPU device, all CUDA source code is now processed according to C++ syntax rules. This was not always the case. Earlier versions of CUDA were based on C syntax rules. As with the more general case of compiling C code with a C++ compiler, it is therefore possible that old C-style CUDA source code will either fail to compile or will not behave as originally intended. * Interoperability with rendering languages such as OpenGL is one-way, with OpenGL having access to registered CUDA memory but CUDA not having access to OpenGL memory. * Copying between host and device memory may incur a performance hit due to system bus bandwidth and latency (this can be partly alleviated with asynchronous memory transfers, handled by the GPU's DMA engine). * Threads should be running in groups of at least 32 for best performance, with total number of threads numbering in the thousands. Branches in the program code do not affect performance significantly, provided that each of 32 threads takes the same execution path; the
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
execution model becomes a significant limitation for any inherently divergent task (e.g. traversing a
space partitioning In geometry, space partitioning is the process of dividing a space (usually a Euclidean space) into two or more disjoint subsets (see also partition of a set). In other words, space partitioning divides a space into non-overlapping regions. Any ...
data structure during ray tracing). * No emulator or fallback functionality is available for modern revisions. * Valid C++ may sometimes be flagged and prevent compilation due to the way the compiler approaches optimization for target GPU device limitations. * C++
run-time type information In computer programming, run-time type information or run-time type identification (RTTI) is a feature of some programming languages (such as C++, Object Pascal, and Ada) that exposes information about an object's data type at runtime. Run-time typ ...
(RTTI) and C++-style exception handling are only supported in host code, not in device code. * In
single-precision Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. A floatin ...
on first generation CUDA compute capability 1.x devices,
denormal number In computer science, subnormal numbers are the subset of denormalized numbers (sometimes called denormals) that fill the underflow gap around zero in floating-point arithmetic. Any non-zero number with magnitude smaller than the smallest normal ...
s are unsupported and are instead flushed to zero, and the precision of both the division and square root operations are slightly lower than IEEE 754-compliant single precision math. Devices that support compute capability 2.0 and above support denormal numbers, and the division and square root operations are IEEE 754 compliant by default. However, users can obtain the prior faster gaming-grade math of compute capability 1.x devices if desired by setting compiler flags to disable accurate divisions and accurate square roots, and enable flushing denormal numbers to zero. * Unlike
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-progra ...
, CUDA-enabled GPUs are only available from Nvidia. Attempts to implement CUDA on other GPUs include: ** Project Coriander: Converts CUDA C++11 source to OpenCL 1.2 C. A fork of CUDA-on-CL intended to run
TensorFlow TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks. "It is machine learnin ...
. ** CU2CL: Convert CUDA 3.2 C++ to OpenCL C. ** GPUOpen HIP: A thin abstraction layer on top of CUDA and ROCm intended for AMD and Nvidia GPUs. Has a conversion tool for importing CUDA C++ source. Supports CUDA 4.0 plus C++11 and float16.


GPUs supported

Supported CUDA level of GPU and card. * CUDA SDK 1.0 support for compute capability 1.0 – 1.1 (Tesla) * CUDA SDK 1.1 support for compute capability 1.0 – 1.1+x (Tesla) * CUDA SDK 2.0 support for compute capability 1.0 – 1.1+x (Tesla) * CUDA SDK 2.1 – 2.3.1 support for compute capability 1.0 – 1.3 (Tesla) * CUDA SDK 3.0 – 3.1 support for compute capability 1.0 – 2.0 (Tesla, Fermi) * CUDA SDK 3.2 support for compute capability 1.0 – 2.1 (Tesla, Fermi) * CUDA SDK 4.0 – 4.2 support for compute capability 1.0 – 2.1+x (Tesla, Fermi, more?). * CUDA SDK 5.0 – 5.5 support for compute capability 1.0 – 3.5 (Tesla, Fermi, Kepler). * CUDA SDK 6.0 support for compute capability 1.0 – 3.5 (Tesla, Fermi, Kepler). * CUDA SDK 6.5 support for compute capability 1.1 – 5.x (Tesla, Fermi, Kepler, Maxwell). Last version with support for compute capability 1.x (Tesla). * CUDA SDK 7.0 – 7.5 support for compute capability 2.0 – 5.x (Fermi, Kepler, Maxwell). * CUDA SDK 8.0 support for compute capability 2.0 – 6.x (Fermi, Kepler, Maxwell, Pascal). Last version with support for compute capability 2.x (Fermi). * CUDA SDK 9.0 – 9.2 support for compute capability 3.0 – 7.0 (Kepler, Maxwell, Pascal, Volta) * CUDA SDK 10.0 – 10.2 support for compute capability 3.0 – 7.5 (Kepler, Maxwell, Pascal, Volta, Turing). Last version with support for compute capability 3.0 and 3.2 (Kepler in part). 10.2 is the last official release for macOS, as support will not be available for macOS in newer releases. * CUDA SDK 11.0 support for compute capability 3.5 – 8.0 (Kepler (in part), Maxwell, Pascal, Volta, Turing, Ampere (in part)). * CUDA SDK 11.1 – 11.4 support for compute capability 3.5 – 8.6 (Kepler (in part), Maxwell, Pascal, Volta, Turing, Ampere (in part)). * CUDA SDK 11.5 – 11.7.1 support for compute capability 3.5 – 8.7 (Kepler (in part), Maxwell, Pascal, Volta, Turing, Ampere). * CUDA SDK 11.8 support for compute capability 3.5 – 9.0 (Kepler (in part), Maxwell, Pascal, Volta, Turing, Ampere, Ada Lovelace, Hopper). * CUDA SDK 12.0 support for compute capability 5.0 – 9.0 (Maxwell, Pascal, Volta, Turing, Ampere, Ada Lovelace, Hopper) '*' –
OEM An original equipment manufacturer (OEM) is generally perceived as a company that produces non-aftermarket parts and equipment that may be marketed by another manufacturer. It is a common industry term recognized and used by many professional or ...
-only products


Version features and specifications


Data types

Note: Any missing lines or empty entries do reflect some lack of information on that exact item.


Tensor cores

Note: Any missing lines or empty entries do reflect some lack of information on that exact item.


Technical Specification


Multiprocessor Architecture

For more information read the Nvidia CUDA programming guide.


Example

This example code in
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
loads a texture from an image into an array on the GPU: texture tex; void foo() //end foo() __global__ void kernel(float* odata, int height, int width) Below is an example given in
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
that computes the product of two arrays on the GPU. The unofficial Python language bindings can be obtained from ''PyCUDA''. import pycuda.compiler as comp import pycuda.driver as drv import numpy import pycuda.autoinit mod = comp.SourceModule( """ __global__ void multiply_them(float *dest, float *a, float *b) """ ) multiply_them = mod.get_function("multiply_them") a = numpy.random.randn(400).astype(numpy.float32) b = numpy.random.randn(400).astype(numpy.float32) dest = numpy.zeros_like(a) multiply_them(drv.Out(dest), drv.In(a), drv.In(b), block=(400, 1, 1)) print(dest - a * b) Additional Python bindings to simplify matrix multiplication operations can be found in the program ''pycublas''. import numpy from pycublas import CUBLASMatrix A = CUBLASMatrix(numpy.mat(
1, 2, 3 1-2-3; 1, 2, 3; or One, Two, Three may refer to: Brands * 1-2-3 (fuel station), in Norway * Lotus 1-2-3, a computer spreadsheet program * .123, a file extension used by Lotus 1-2-3 * Jell-O 1-2-3, a dessert Film, TV and books * ''One, Two, Three'' ...
, 5, 6, numpy.float32)) B = CUBLASMatrix(numpy.mat(
2, 3 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline ...
, 5 The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline ...
, 7, numpy.float32)) C = A * B print(C.np_mat())
while
CuPy CuPy is an open source library for GPU-accelerated computing with Python programming language, providing support for multi-dimensional arrays, sparse matrices, and a variety of numerical algorithms implemented on top of them. CuPy shares the sa ...
directly replaces NumPy: import cupy a = cupy.random.randn(400) b = cupy.random.randn(400) dest = cupy.zeros_like(a) print(dest - a * b)


Current and future usages of CUDA architecture

* Accelerated rendering of 3D graphics * Accelerated interconversion of video file formats * Accelerated
encryption In cryptography, encryption is the process of encoding information. This process converts the original representation of the information, known as plaintext, into an alternative form known as ciphertext. Ideally, only authorized parties can decip ...
,
decryption In cryptography, encryption is the process of encoding information. This process converts the original representation of the information, known as plaintext, into an alternative form known as ciphertext. Ideally, only authorized parties can decip ...
and
compression Compression may refer to: Physical science *Compression (physics), size reduction due to forces *Compression member, a structural element such as a column *Compressibility, susceptibility to compression * Gas compression *Compression ratio, of a ...
*
Bioinformatics Bioinformatics () is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combi ...
, e.g.
NGS NGS may refer to: Places * NGSO (NGS orbit), non-geostationary orbit * Nagasaki Airport (IATA airport code: NGS) in Omura, Nagasaki, Japan Organisations * National Galleries of Scotland, representing the national art collection of Scotland * Na ...
DNA sequencing
BarraCUDA A barracuda, or cuda for short, is a large, predatory, ray-finned fish known for its fearsome appearance and ferocious behaviour. The barracuda is a saltwater fish of the genus ''Sphyraena'', the only genus in the family Sphyraenidae, which was ...
* Distributed calculations, such as predicting the native conformation of
proteins Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residues. Proteins perform a vast array of functions within organisms, including catalysing metabolic reactions, DNA replication, respo ...
* Medical analysis simulations, for example
virtual reality Virtual reality (VR) is a simulated experience that employs pose tracking and 3D near-eye displays to give the user an immersive feel of a virtual world. Applications of virtual reality include entertainment (particularly video games), educ ...
based on CT and
MRI Magnetic resonance imaging (MRI) is a medical imaging technique used in radiology to form pictures of the anatomy and the physiological processes of the body. MRI scanners use strong magnetic fields, magnetic field gradients, and radio waves ...
scan images * Physical simulations, in particular in
fluid dynamics In physics and engineering, fluid dynamics is a subdiscipline of fluid mechanics that describes the flow of fluids— liquids and gases. It has several subdisciplines, including ''aerodynamics'' (the study of air and other gases in motion) an ...
*
Neural network A neural network is a network or circuit of biological neurons, or, in a modern sense, an artificial neural network, composed of artificial neurons or nodes. Thus, a neural network is either a biological neural network, made up of biological ...
training in
machine learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machine ...
problems *
Face recognition A facial recognition system is a technology capable of matching a human face from a digital image or a video frame against a database of faces. Such a system is typically employed to authenticate users through ID verification services, and wo ...
*
Volunteer computing Volunteer computing is a type of distributed computing in which people donate their computers' unused resources to a research-oriented project, and sometimes in exchange for credit points. The fundamental idea behind it is that a modern desktop co ...
projects, such as SETI@home and other projects using
BOINC The Berkeley Open Infrastructure for Network Computing (BOINC, pronounced – rhymes with "oink") is an open-source middleware system for volunteer computing (a type of distributed computing). Developed originally to support SETI@home, it beca ...
software *
Molecular dynamics Molecular dynamics (MD) is a computer simulation method for analyzing the physical movements of atoms and molecules. The atoms and molecules are allowed to interact for a fixed period of time, giving a view of the dynamic "evolution" of the ...
* Mining
cryptocurrencies A cryptocurrency, crypto-currency, or crypto is a digital currency designed to work as a medium of exchange through a computer network that is not reliant on any central authority, such as a government or bank A bank is a financial i ...
*
Structure from motion Structure from motion (SfM) is a photogrammetric range imaging technique for estimating three-dimensional structures from two-dimensional image sequences that may be coupled with local motion signals. It is studied in the fields of computer visio ...
(SfM) software


See also

*
SYCL SYCL is a higher-level programming model to improve programming productivity on various hardware accelerators. It is a single-source embedded domain-specific language (eDSL) based on pure C++17. It is a standard developed by Khronos Group, anno ...
– an open standard from
Khronos Group The Khronos Group, Inc. is an open, non-profit, member-driven consortium of 170 organizations developing, publishing and maintaining royalty-free interoperability standards for 3D graphics, virtual reality, augmented reality, parallel computation ...
for programming a variety of platforms, including GPUs, with ''single-source'' modern C++, similar to higher-level CUDA Runtime API (''single-source'') *
BrookGPU The Brook programming language and its implementation BrookGPU were early and influential attempts to enable general-purpose computing on graphics processing units. Brook, developed at Stanford University graphics group, was a compiler and runtim ...
– the Stanford University graphics group's compiler *
Array programming In computer science, array programming refers to solutions which allow the application of operations to an entire set of values at once. Such solutions are commonly used in scientific and engineering settings. Modern programming languages that s ...
*
Parallel computing Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different fo ...
*
Stream processing In computer science, stream processing (also known as event stream processing, data stream processing, or distributed stream processing) is a programming paradigm which views data streams, or sequences of events in time, as the central input and ou ...
* rCUDA – an API for computing on remote computers *
Molecular modeling on GPU Molecular modeling on GPU is the technique of using a graphics processing unit (GPU) for molecular simulations. In 2007, NVIDIA introduced video cards that could be used not only to show graphics but also for scientific calculations. These cards ...
*
Vulkan Vulkan is a low- overhead, cross-platform API, open standard for 3D graphics and computing. Vulkan targets high-performance real-time 3D graphics applications, such as video games and interactive media. Vulkan is intended to offer higher perfor ...
– low-level, high-performance 3D graphics and computing API *
OptiX Nvidia OptiX (OptiX Application Acceleration Engine) is a ray tracing API that was first developed around 2009. The computations are offloaded to the GPUs through either the low-level or the high-level API introduced with CUDA. CUDA is only av ...
– ray tracing API by NVIDIA * CUDA binary (cubin) – a type of fat binary


References


External links

* __FORCETOC__ {{DEFAULTSORT:Cuda Computer physics engines GPGPU GPGPU libraries Graphics hardware Nvidia software Parallel computing Graphics cards Video game hardware