Parallel Thread Execution (PTX or NVPTX) is a low-level parallel thread execution virtual machine and instruction set architecture used in Nvidia's CUDA programming environment. The NVCC compiler translates code written in CUDA, a C++-like language, into PTX instructions (an assembly language represented as ASCII text), and the graphics driver contains a compiler which translates the PTX instructions into the executable binary code that can be run on the processing cores of Nvidia GPUs. The GNU Compiler Collection also has basic ability to generate PTX in the context of OpenMP offloading. Inline PTX assembly can be used in CUDA.
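As a minimal sketch of that last point, the fragment below embeds a single PTX instruction in a CUDA kernel via the asm statement; the kernel name, the surrounding host code, and the choice of add.u32 are assumptions made for illustration rather than anything prescribed by PTX itself.

    // Minimal sketch: a CUDA kernel using inline PTX assembly (names are illustrative).
    #include <cstdio>

    __global__ void add_inline_ptx(unsigned int *out, const unsigned int *in) {
        unsigned int v = in[threadIdx.x];
        unsigned int r;
        // Emit one PTX instruction directly: r = v + 42.
        asm("add.u32 %0, %1, 42;" : "=r"(r) : "r"(v));
        out[threadIdx.x] = r;
    }

    int main() {
        const int n = 32;
        unsigned int h_in[n], h_out[n];
        for (int i = 0; i < n; ++i) h_in[i] = i;

        unsigned int *d_in, *d_out;
        cudaMalloc(&d_in, n * sizeof(unsigned int));
        cudaMalloc(&d_out, n * sizeof(unsigned int));
        cudaMemcpy(d_in, h_in, n * sizeof(unsigned int), cudaMemcpyHostToDevice);

        add_inline_ptx<<<1, n>>>(d_out, d_in);
        cudaMemcpy(h_out, d_out, n * sizeof(unsigned int), cudaMemcpyDeviceToHost);

        printf("h_out[1] = %u\n", h_out[1]);   // expected: 43
        cudaFree(d_in);
        cudaFree(d_out);
        return 0;
    }

Compiling the same source with nvcc -ptx emits the generated PTX text, which is what the driver later translates into machine code for the installed GPU.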


Registers

PTX uses an arbitrarily large register set; the output from the compiler is almost pure single-assignment form, with consecutive lines generally referring to consecutive registers. Programs start with declarations of the form:

    .reg .u32 %r<335>;           // declare 335 registers %r0, %r1, ..., %r334 of type unsigned 32-bit integer

It is a three-argument assembly language, and almost all instructions explicitly list the data type (in terms of sign and width) on which they operate. Register names are preceded with a % character and constants are literal, e.g.:

    shr.u64 %rd14, %rd12, 32;    // shift right an unsigned 64-bit integer from %rd12 by 32 positions, result in %rd14
    cvt.u64.u32 %rd142, %r112;   // convert an unsigned 32-bit integer to 64-bit

There are predicate registers, but compiled code in shader model 1.0 uses these only in conjunction with branch commands; the conditional branch is:

    @%p14 bra $label;            // branch to $label

The setp.cc.type instruction sets a predicate register to the result of comparing two registers of appropriate type. There is also a set instruction, where

    set.le.u32.u64 %r101, %rd12, %rd28;

sets the 32-bit register %r101 to 0xffffffff if the 64-bit register %rd12 is less than or equal to the 64-bit register %rd28; otherwise %r101 is set to 0x00000000.

There are a few predefined identifiers that denote pseudoregisters. Among others, %tid, %ntid, %ctaid, and %nctaid contain, respectively, thread indices, block dimensions, block indices, and grid dimensions.
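As a rough illustration, the set and setp instructions described above can also be exercised from CUDA inline PTX. The function names below are assumptions for the example, and selp is used instead of a branch so the fragments stay self-contained:

    // flag = 0xffffffff if a <= b, else 0 -- mirrors the set.le.u32.u64 example above.
    __device__ unsigned int set_le(unsigned long long a, unsigned long long b) {
        unsigned int flag;
        asm("set.le.u32.u64 %0, %1, %2;" : "=r"(flag) : "l"(a), "l"(b));
        return flag;
    }

    // flag = 1 if a <= b, else 0 -- uses a predicate register written by setp,
    // then selects a value with selp rather than branching on the predicate.
    __device__ unsigned int le_flag(unsigned int a, unsigned int b) {
        unsigned int flag;
        asm("{\n\t"
            ".reg .pred p;\n\t"              // declare a scratch predicate register
            "setp.le.u32 p, %1, %2;\n\t"     // p = (a <= b)
            "selp.u32 %0, 1, 0, p;\n\t"      // flag = p ? 1 : 0
            "}"
            : "=r"(flag) : "r"(a), "r"(b));
        return flag;
    }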


State spaces

Load (ld) and store (st) commands refer to one of several distinct state spaces (memory banks), e.g. ld.param. There are eight state spaces:

* .reg : registers
* .sreg : special, read-only, platform-specific registers
* .const : shared, read-only memory
* .global : global memory, shared by all threads
* .local : local memory, private to each thread
* .param : parameters passed to the kernel
* .shared : memory shared between threads in a block
* .tex : global texture memory (deprecated)

Shared memory is declared in the PTX file via lines at the start of the form:

    .shared .align 8 .b8 pbatch_cache[15744];   // define 15,744 bytes, aligned to an 8-byte boundary

Writing kernels in PTX requires explicitly registering PTX modules via the CUDA Driver API, which is typically more cumbersome than using the CUDA Runtime API and Nvidia's CUDA compiler, nvcc; a sketch of such registration is given below. The GPU Ocelot project provided an API to register PTX modules alongside CUDA Runtime API kernel invocations, though GPU Ocelot is no longer actively maintained.
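The following is a minimal sketch of that registration path, assuming a driver that accepts PTX ISA 7.0 for an sm_50 target and using illustrative names such as inc_kernel and pdata; error checking is omitted for brevity. The embedded kernel touches the .param and .global state spaces and the %tid pseudoregister discussed above.

    // Minimal sketch: loading a hand-written PTX kernel through the CUDA Driver API.
    #include <cuda.h>
    #include <stdio.h>

    static const char *ptx_src =
        ".version 7.0\n"
        ".target sm_50\n"
        ".address_size 64\n"
        ".visible .entry inc_kernel(.param .u64 pdata)\n"
        "{\n"
        "  .reg .u64 %rd<4>;\n"
        "  .reg .u32 %r<3>;\n"
        "  ld.param.u64   %rd1, [pdata];\n"       // read the kernel parameter (.param space)
        "  cvta.to.global.u64 %rd2, %rd1;\n"      // convert to a global-space address
        "  mov.u32        %r1, %tid.x;\n"         // thread index pseudoregister
        "  mul.wide.u32   %rd3, %r1, 4;\n"        // byte offset of this thread's element
        "  add.u64        %rd2, %rd2, %rd3;\n"
        "  ld.global.u32  %r2, [%rd2];\n"         // load from the .global state space
        "  add.u32        %r2, %r2, 1;\n"
        "  st.global.u32  [%rd2], %r2;\n"         // store back to .global
        "  ret;\n"
        "}\n";

    int main(void) {
        CUdevice dev; CUcontext ctx; CUmodule mod; CUfunction fn;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);

        // Register the PTX module; the driver JIT-compiles it for the local GPU.
        cuModuleLoadData(&mod, ptx_src);
        cuModuleGetFunction(&fn, mod, "inc_kernel");

        unsigned int host[32] = {0};
        CUdeviceptr dbuf;
        cuMemAlloc(&dbuf, sizeof(host));
        cuMemcpyHtoD(dbuf, host, sizeof(host));

        void *args[] = { &dbuf };
        cuLaunchKernel(fn, 1, 1, 1, 32, 1, 1, 0, NULL, args, NULL);
        cuCtxSynchronize();

        cuMemcpyDtoH(host, dbuf, sizeof(host));
        printf("host[0] = %u\n", host[0]);        // expected: 1

        cuMemFree(dbuf);
        cuModuleUnload(mod);
        cuCtxDestroy(ctx);
        return 0;
    }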


See also

* Standard Portable Intermediate Representation (SPIR)
* CUDA binary (cubin) – a type of fat binary


External links


PTX ISA page on NVIDIA Developer Zone