Tesla is the codename for a GPU microarchitecture developed by Nvidia and released in 2006 as the successor to the Curie microarchitecture. It was named after the pioneering electrical engineer Nikola Tesla. It was used in the GeForce 8 Series, GeForce 9 Series, GeForce 100 Series, GeForce 200 Series, and GeForce 300 Series of GPUs, collectively manufactured in 90 nm, 80 nm, 65 nm, 55 nm, and 40 nm processes. It was also used in the GeForce 405 and in the Quadro FX, Quadro x000, Quadro NVS series, and Nvidia Tesla computing modules.

Tesla replaced the old fixed-pipeline microarchitectures, represented at the time of introduction by the GeForce 7 series. It competed directly with AMD's first unified shader microarchitecture, TeraScale, a development of ATI's work on the Xbox 360, which used a similar design. Tesla was followed by Fermi.

Overview

Tesla is Nvidia's first microarchitecture implementing the unified shader model. The driver supports Direct3D 10 (Shader Model 4.0) and OpenGL 2.1; later drivers added OpenGL 3.3 support. The design is a major shift for Nvidia in GPU functionality and capability, the most obvious change being the move from the separate functional units (pixel shaders, vertex shaders) of previous GPUs to a homogeneous collection of universal floating-point processors (called "stream processors") that can perform a more universal set of tasks.

GeForce 8's unified shader architecture consists of a number of stream processors (SPs). Unlike the vector processing approach taken with older shader units, each SP is scalar and thus can operate on only one component at a time. This makes the SPs less complex to build while still being quite flexible and universal. Scalar shader units are also more efficient in a number of cases than previous-generation vector shader units, which rely on an ideal instruction mixture and ordering to reach peak throughput. The lower maximum throughput of these scalar processors is compensated for by efficiency and by running them at a high clock speed (made possible by their simplicity). GeForce 8 runs the various parts of its core at differing clock speeds (clock domains), similar to the operation of the previous GeForce 7 Series GPUs. For example, the stream processors of the GeForce 8800 GTX operate at a 1.35 GHz clock rate while the rest of the chip operates at 575 MHz. (Wasson, Scott. NVIDIA's GeForce 8800 graphics processor, Tech Report, 8 November 2007.)

GeForce 8 performs significantly better texture filtering than its predecessors, which used various optimizations and visual tricks to speed up rendering without impairing filtering quality. The GeForce 8 line correctly renders an angle-independent anisotropic filtering algorithm along with full trilinear texture filtering. G80, though not its smaller brethren, is equipped with much more texture filtering arithmetic ability than the GeForce 7 series. This allows high-quality filtering with a much smaller performance hit than previously.

Nvidia also introduced new polygon edge anti-aliasing methods, including the ability of the GPU's ROPs to perform both multisample anti-aliasing (MSAA) and HDR lighting at the same time, correcting various limitations of previous generations. GeForce 8 can perform MSAA with both FP16 and FP32 texture formats. GeForce 8 supports 128-bit HDR rendering, an increase from prior cards' 64-bit support. The chip's new anti-aliasing technology, called coverage sampling AA (CSAA), uses Z, color, and coverage information to determine final pixel color. This technique of color optimization allows 16X CSAA to look crisp and sharp. (Sommefeldt, Rys. NVIDIA G80: Image Quality Analysis, Beyond3D, 12 December 2006.)
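
The efficiency argument for scalar stream processors made earlier in this section can be illustrated with a toy model. The sketch below is not a description of the actual hardware scheduler: the function names, the 4-lane vector width, and the instruction mix are all hypothetical, and the model ignores dependencies, co-issue, and latency. It only shows why a vector unit loses throughput when instructions do not fill every lane, while scalar units can be kept busy with individual components.

```python
import math

def vector_unit_cycles(op_widths, lanes=4):
    """Cycles on one vector ALU: every instruction issues on its own and
    occupies all lanes, even if it uses fewer components (toy assumption)."""
    return sum(math.ceil(width / lanes) for width in op_widths)

def scalar_units_cycles(op_widths, units=4):
    """Cycles on `units` scalar ALUs, assuming the component-level work
    can be spread evenly across them (idealized packing)."""
    return math.ceil(sum(op_widths) / units)

def utilization(op_widths, cycles, lanes=4):
    """Fraction of available lane-cycles that did useful work."""
    return sum(op_widths) / (cycles * lanes)

# Hypothetical instruction mix: number of components each instruction touches.
mix = [4, 3, 1, 2, 1, 1]

vec_cycles = vector_unit_cycles(mix)
sca_cycles = scalar_units_cycles(mix)

print("vector:", vec_cycles, "cycles, utilization", utilization(mix, vec_cycles))
print("scalar:", sca_cycles, "cycles, utilization", utilization(mix, sca_cycles))
```

With this made-up mix, the vector unit spends half of its lane-cycles idle, whereas the scalar units stay fully occupied. Real shader workloads and real hardware behave less ideally in both cases.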

Performance

The claimed theoretical single-precision processing power for Tesla-based cards, given in FLOPS, may be hard to reach in real-world workloads. In G80/G90/GT200, each Streaming Multiprocessor (SM) contains 8 Shader Processors (SP, also called Unified Shader or CUDA Core) and 2 Special Function Units (SFU). Each SP can fulfill up to two single-precision operations per clock: 1 Multiply and 1 Add, using a single MAD instruction. Each SFU can fulfill up to four operations per clock: four MUL (Multiply) instructions. So one SM as a whole can execute 8 MADs (16 operations) and 8 MULs (8 operations) per clock, or 24 operations per clock, which is 3 operations per SP per clock. Therefore, the theoretical dual-issue MAD+MUL performance in GFLOPS (FLOPS_sp+sfu) of a graphics card with SP count n and shader frequency f (in GHz) is:

FLOPS_sp+sfu = 3 × n × f

However, leveraging dual-issue performance like MAD+MUL is problematic:

* Dual-issuing the MUL is not available in graphics mode on G80/G90, though it was much improved in GT200. (Sommefeldt, Rys. NVIDIA G80: Architecture and GPU Analysis - Page 11, Beyond3D, 8 November 2006.)
* Not all combinations of instructions like MAD+MUL can be executed in parallel on the SP and SFU, because the SFU is rather specialized and can only handle a specific subset of instructions: 32-bit floating-point multiplication, transcendental functions, interpolation for parameter blending, reciprocal, reciprocal square root, sine, cosine, etc.
* The SFU could become busy for many cycles when executing these instructions, in which case it is unavailable for dual-issuing MUL instructions.

For these reasons, in order to estimate the performance of real-world workloads, it may be more helpful to ignore the SFU and to assume only 1 MAD (2 operations) per SP per cycle. In this case the formula for the theoretical performance in GFLOPS becomes:

FLOPS_sp = 2 × n × f

The theoretical double-precision processing power of a Tesla GPU is 1/8 of the single-precision performance on GT200; there is no double-precision support on G8x and G9x.
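
The two formulas above can be worked through in a short sketch. The per-SM unit counts and per-clock operation counts come from the text; the function names are illustrative only, and the SP count of 128 used for the GeForce 8800 GTX is an assumption that is not stated in this article (the 1.35 GHz shader clock is the figure quoted in the Overview).

```python
# Per-SM figures as described in the text for G80/G90/GT200.
SPS_PER_SM = 8        # shader processors per streaming multiprocessor
SFUS_PER_SM = 2       # special function units per streaming multiprocessor
OPS_PER_SP_MAD = 2    # one MAD = 1 multiply + 1 add per clock
OPS_PER_SFU = 4       # up to four MULs per clock

def ops_per_sm_per_clock(include_sfu=True):
    """Operations one SM can issue per clock, with or without the SFU MULs."""
    ops = SPS_PER_SM * OPS_PER_SP_MAD
    if include_sfu:
        ops += SFUS_PER_SM * OPS_PER_SFU
    return ops

def peak_gflops(n_sp, shader_ghz, include_sfu=True):
    """Theoretical single-precision GFLOPS for n_sp SPs at shader_ghz GHz.

    include_sfu=True  gives FLOPS_sp+sfu = 3 * n * f (dual-issue MAD+MUL)
    include_sfu=False gives FLOPS_sp     = 2 * n * f (MAD only)
    """
    ops_per_sp = ops_per_sm_per_clock(include_sfu) / SPS_PER_SM
    return ops_per_sp * n_sp * shader_ghz

# Assumed example: 128 SPs (not stated in the text) at 1.35 GHz.
n, f = 128, 1.35
print(f"MAD+MUL peak: {peak_gflops(n, f, include_sfu=True):.1f} GFLOPS")
print(f"MAD-only peak: {peak_gflops(n, f, include_sfu=False):.1f} GFLOPS")
```

Under these assumptions the dual-issue figure comes to about 518.4 GFLOPS and the MAD-only figure to about 345.6 GFLOPS. For the reasons listed above, the MAD-only figure is likely the more realistic upper bound.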

Video decompression/compression

NVDEC

NVENC

NVENC was only introduced in later chips.

Chips

* G80
* G84
* G86
* G92
* G92B
* G94
* G94B
* G96
* G96B
* G96C
* G98
* C77
* C78
* C79
* C7A
* C7A-ION
* ION
* GT200
* GT200B
* GT215
* GT216
* GT218
* C87
* C89
