A graphics processing unit (GPU) is a specialized

electronic circuit An electronic circuit is composed of individual electronic components, such as resistors, transistors, capacitors, inductors and diodes, connected by conductive wires or traces through which electric current can flow. It is a type of electrical ...

designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a

display device A display device is an output device for presentation of information in visual or tactile form (the latter used for example in tactile electronic displays for blind people). When the input information that is supplied has an electrical signal the ...

. GPUs are used in embedded systems, mobile phones, personal computers, workstations, and

game console A video game console is an electronic device that outputs a video signal or image to display a video game that can be played with a game controller. These may be home consoles, which are generally placed in a permanent location connected to a t ...

s. Modern GPUs are efficient at manipulating computer graphics and

image processing An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensiona ...

. Their parallel structure makes them more efficient than general-purpose central processing units (CPUs) for algorithms that process large blocks of data in parallel. In a personal computer, a GPU can be present on a video card or embedded on the

motherboard A motherboard (also called mainboard, main circuit board, mb, mboard, backplane board, base board, system board, logic board (only in Apple computers) or mobo) is the main printed circuit board (PCB) in general-purpose computers and other expand ...

. In some CPUs, they are embedded on the CPU

die Die, as a verb, refers to death, the cessation of life. Die may also refer to: Games * Die, singular of dice, small throwable objects used for producing random numbers Manufacturing * Die (integrated circuit), a rectangular piece of a semicondu ...

. In the 1970s, the term "GPU" originally stood for ''graphics processor unit'' and described a programmable processing unit independently working from the CPU and responsible for graphics manipulation and output. Later, in 1994, Sony used the term (now standing for ''graphics processing unit'') in reference to the

PlayStation is a video gaming brand that consists of five home video game consoles, two handhelds, a media center, and a smartphone, as well as an online service and multiple magazines. The brand is produced by Sony Interactive Entertainment, a divisi ...

console's Toshiba-designed Sony GPU in 1994. The term was popularized by Nvidia in 1999, who marketed the GeForce 256 as "the world's first GPU". It was presented as a "single-chip processor with integrated transform, lighting, triangle setup/clipping, and rendering engines". Rival ATI Technologies coined the term "visual processing unit" or VPU with the release of the

Radeon 9700 The R300 GPU, introduced in August 2002 and developed by ATI Technologies, is its third generation of GPU used in ''Radeon'' graphics cards. This GPU features 3D acceleration based upon Direct3D 9.0 and OpenGL 2.0, a major improvement in feat ...

in 2002.

History

1970s

Arcade system board An arcade video game takes player input from its controls, processes it through electrical or computerized components, and displays output to an electronic monitor or similar display. Most arcade video games are coin-operated, housed in an arc ...

s have been using specialized graphics circuits since the 1970s. In early video game hardware, the RAM for frame buffers was expensive, so video chips composited data together as the display was being scanned out on the monitor. A specialized

barrel shifter A barrel shifter is a digital circuit that can shift a data word by a specified number of bits without the use of any sequential logic, only pure combinational logic, i.e. it inherently provides a binary operation. It can however in theory also ...

circuit was used to help the CPU animate the framebuffer graphics for various 1970s

arcade game An arcade game or coin-op game is a coin-operated entertainment machine typically installed in public businesses such as restaurants, bars and amusement arcades. Most arcade games are presented as primarily games of skill and include arcade v ...

s from Midway and Taito, such as '' Gun Fight'' (1975), '' Sea Wolf'' (1976) and '' Space Invaders'' (1978). The Namco Galaxian arcade system in 1979 used specialized graphics hardware supporting

RGB color The RGB color model is an additive color model in which the red, green and blue primary colors of light are added together in various ways to reproduce a broad array of colors. The name of the model comes from the initials of the three additiv ...

, multi-colored sprites and tilemap backgrounds. The Galaxian hardware was widely used during the golden age of arcade video games, by game companies such as Namco, Centuri, Gremlin, Irem, Konami, Midway, Nichibutsu,

Sega is a Japanese multinational corporation, multinational video game and entertainment company headquartered in Shinagawa, Tokyo. Its international branches, Sega of America and Sega Europe, are headquartered in Irvine, California and London, r ...

and Taito. ANTIC chip on an Atari 130XE motherboard

In the home market, the Atari 2600 in 1977 used a video shifter called the Television Interface Adaptor. The

Atari 8-bit computers The Atari 8-bit family is a series of 8-bit home computers introduced by Atari, Inc. in 1979 as the Atari 400 and Atari 800. The series was successively upgraded to Atari 1200XL , Atari 600XL, Atari 800XL, Atari 65XE, Atari 130XE, Atari 800XE, ...

(1979) had ANTIC, a video processor which interpreted instructions describing a "display list"—the way the scan lines map to specific bitmapped or character modes and where the memory is stored (so there did not need to be a contiguous frame buffer). 6502 machine code

subroutine In computer programming, a function or subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. Functions may ...

s could be triggered on scan lines by setting a bit on a display list instruction. ANTIC also supported smooth vertical and

horizontal scrolling In computer displays, filmmaking, television production, and other kinetic displays, scrolling is sliding text, images or video across a monitor or display, vertically or horizontally. "Scrolling," as such, does not change the layout of the text ...

independent of the CPU.

1980s

The NEC µPD7220 was the first implementation of a PC graphics display processor as a single Large Scale Integration (LSI)

integrated circuit An integrated circuit or monolithic integrated circuit (also referred to as an IC, a chip, or a microchip) is a set of electronic circuits on one small flat piece (or "chip") of semiconductor material, usually silicon. Large numbers of tiny ...

chip, enabling the design of low-cost, high-performance video graphics cards such as those from Number Nine Visual Technology. It became the best-known GPU up until the mid-1980s. It was the first fully integrated VLSI (very large-scale integration) metal-oxide-semiconductor ( NMOS) graphics display processor for PCs, supported up to 1024x1024 resolution, and laid the foundations for the emerging PC graphics market. It was used in a number of graphics cards and was licensed for clones such as the Intel 82720, the first of Intel's graphics processing units. The Williams Electronics arcade games '' Robotron 2084'', '' Joust'', '' Sinistar'', and ''

Bubbles Bubble, Bubbles or The Bubble may refer to: Common uses * Bubble (physics), a globule of one substance in another, usually gas in a liquid ** Soap bubble * Economic bubble, a situation where asset prices are much higher than underlying fundame ...

'', all released in 1982, contain custom blitter chips for operating on 16-color bitmaps. In 1984,

Hitachi () is a Japanese multinational corporation, multinational Conglomerate (company), conglomerate corporation headquartered in Chiyoda, Tokyo, Japan. It is the parent company of the Hitachi Group (''Hitachi Gurūpu'') and had formed part of the Ni ...

released ARTC HD63484, the first major

CMOS Complementary metal–oxide–semiconductor (CMOS, pronounced "sea-moss", ) is a type of metal–oxide–semiconductor field-effect transistor (MOSFET) fabrication process that uses complementary and symmetrical pairs of p-type and n-type MOSFE ...

graphics processor for PC. The ARTC was capable of displaying up to

4K resolution 4K resolution refers to a horizontal display resolution of approximately 4,000 pixels. Digital television and digital cinematography commonly use several different 4K resolutions. In television and consumer media, 38402160 (4K Ultra-high-definitio ...

when in monochrome mode, and it was used in a number of PC graphics cards and terminals during the late 1980s. In 1985, the Commodore Amiga featured a custom graphics chip, with a blitter unit accelerating bitmap manipulation, line draw, and area fill functions. Also included is a

coprocessor A coprocessor is a computer processor used to supplement the functions of the primary processor (the CPU). Operations performed by the coprocessor may be floating-point arithmetic, graphics, signal processing, string processing, cryptography o ...

with its own simple instruction set, capable of manipulating graphics hardware registers in sync with the video beam (e.g. for per-scanline palette switches, sprite multiplexing, and hardware windowing), or driving the blitter. In 1986, Texas Instruments released the TMS34010, the first fully programmable graphics processor. It could run general-purpose code, but it had a graphics-oriented instruction set. During 1990–1992, this chip became the basis of the Texas Instruments Graphics Architecture ("TIGA")

Windows accelerator A Windows accelerator was a type of Graphics processing unit for personal computers with additional acceleration features like 2D line-drawings, blitter, clipping, font caching, hardware cursor support, color expansion, linear addressing, and patt ...

cards.

In 1987, the IBM 8514 graphics system was released as one of the first video cards for IBM PC compatibles to implement fixed-function 2D primitives in electronic hardware. Sharp's X68000, released in 1987, used a custom graphics chipset with a 65,536 color palette and hardware support for sprites, scrolling, and multiple playfields, eventually serving as a development machine for

Capcom is a Japanese video game developer and video game publisher, publisher. It has created a number of List of best-selling video game franchises, multi-million-selling game franchises, with its most commercially successful being ''Resident Evil' ...

's CP System arcade board. Fujitsu later competed with the

FM Towns The is a Japanese personal computer, built by Fujitsu from February 1989 to the summer of 1997. It started as a proprietary PC variant intended for multimedia applications and PC games, but later became more compatible with IBM PC compatibles. ...

computer, released in 1989 with support for a full 16,777,216 color palette. In 1988, the first dedicated polygonal 3D graphics boards were introduced in arcades with the Namco System 21 and Taito Air System. IBM VGA 90X8941 on PS55

IBM's proprietary Video Graphics Array (VGA) display standard was introduced in 1987, with a maximum resolution of 640×480 pixels. In November 1988, NEC Home Electronics announced its creation of the Video Electronics Standards Association (VESA) to develop and promote a Super VGA (SVGA) computer display standard as a successor to IBM's proprietary VGA display standard. Super VGA enabled

graphics display resolution The graphics display resolution is the width and height dimension of an electronic visual display device, measured in pixels. This information is used for electronic devices such as a computer monitor. Certain combinations of width and height a ...

s up to 800×600 pixels, a 36% increase.

1990s

In 1991, S3 Graphics introduced the '' S3 86C911'', which its designers named after the

Porsche 911 The Porsche 911 (pronounced ''Nine Eleven'' or in german: Neunelfer) is a two-door 2+2 high performance rear-engined sports car introduced in September 1964 by Porsche AG of Stuttgart, Germany. It has a rear-mounted flat-six engine and origin ...

as an indication of the performance increase it promised. The 86C911 spawned a host of imitators: by 1995, all major PC graphics chip makers had added 2D acceleration support to their chips. By this time, fixed-function ''Windows accelerators'' had surpassed expensive general-purpose graphics coprocessors in Windows performance, and these coprocessors faded away from the PC market. Throughout the 1990s, 2D GUI acceleration continued to evolve. As manufacturing capabilities improved, so did the level of integration of graphics chips. Additional

application programming interface An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how t ...

s (APIs) arrived for a variety of tasks, such as Microsoft's WinG graphics library for Windows 3.x, and their later DirectDraw interface for

hardware acceleration Hardware acceleration is the use of computer hardware designed to perform specific functions more efficiently when compared to software running on a general-purpose central processing unit (CPU). Any transformation of data that can be calcula ...

of 2D games within Windows 95 and later. In the early- and mid-1990s, real-time 3D graphics were becoming increasingly common in arcade, computer, and console games, which led to increasing public demand for hardware-accelerated 3D graphics. Early examples of mass-market 3D graphics hardware can be found in arcade system boards such as the Sega Model 1,

Namco System 22 The Namco System 22 is the successor to the Namco System 21 arcade system board. It debuted in 1992 with '' Sim Drive'' in Japan, followed by a worldwide debut in 1993 with ''Ridge Racer''. The System 22 was designed by Namco with assistance fr ...

, and Sega Model 2, and the fifth-generation video game consoles such as the

Saturn Saturn is the sixth planet from the Sun and the second-largest in the Solar System, after Jupiter. It is a gas giant with an average radius of about nine and a half times that of Earth. It has only one-eighth the average density of Earth; h ...

and Nintendo 64. Arcade systems such as the Sega Model 2 and SGI Onyx-based Namco Magic Edge Hornet Simulator in 1993 were capable of hardware T&L ( transform, clipping, and lighting) years before appearing in consumer graphics cards. Some systems used DSPs to accelerate transformations.

Fujitsu is a Japanese multinational information and communications technology equipment and services corporation, established in 1935 and headquartered in Tokyo. Fujitsu is the world's sixth-largest IT services provider by annual revenue, and the la ...

, which worked on the Sega Model 2 arcade system, began working on integrating T&L into a single

LSI LSI may refer to: Science and technology * Large-scale integration, integrated circuits with tens of thousands of transistors * Latent semantic indexing, a technique in natural language processing * LSI-11, an early large-scale integration com ...

solution for use in home computers in 1995; the Fujitsu Pinolite, the first 3D geometry processor for personal computers, released in 1997. The first hardware T&L GPU on

home A home, or domicile, is a space used as a permanent or semi-permanent residence for one or many humans, and sometimes various companion animals. It is a fully or semi sheltered space and can have both interior and exterior aspects to it. H ...

video game consoles was the Nintendo 64's Reality Coprocessor, released in 1996. In 1997,

Mitsubishi The is a group of autonomous Japanese multinational companies in a variety of industries. Founded by Yatarō Iwasaki in 1870, the Mitsubishi Group historically descended from the Mitsubishi zaibatsu, a unified company which existed from 1870 ...

released the 3Dpro/2MP, a fully featured GPU capable of transformation and lighting, for workstations and Windows NT desktops; ATi utilized it for their FireGL 4000

graphics card A graphics card (also called a video card, display card, graphics adapter, VGA card/VGA, video adapter, display adapter, or mistakenly GPU) is an expansion card which generates a feed of output images to a display device, such as a computer moni ...

, released in 1997. The term "GPU" was coined by Sony in reference to the 32-bit Sony GPU (designed by Toshiba) in the

video game console, released in 1994. In the PC world, notable failed first tries for low-cost 3D graphics chips were the S3 '' ViRGE'',

ATI Rage The ATI Rage (stylized as RAGE or rage) is a series of graphics chipsets developed by ATI Technologies offering graphical user interface (GUI) 2D acceleration, video acceleration, and 3D acceleration developed by ATI Technologies. It is the s ...

, and Matrox ''Mystique''. These chips were essentially previous-generation 2D accelerators with 3D features bolted on. Many were even pin-compatible with the earlier-generation chips for ease of implementation and minimal cost. Initially, performance 3D graphics were possible only with discrete boards dedicated to accelerating 3D functions (and lacking 2D GUI acceleration entirely) such as the

PowerVR PowerVR is a division of Imagination Technologies (formerly VideoLogic) that develops hardware and software for 2D and 3D rendering, and for video encoding, decoding, associated image processing and DirectX, OpenGL ES, OpenVG, and OpenCL accelera ...

and the

3dfx 3dfx Interactive was an American technology company headquartered in San Jose, California, founded in 1994, that specialized in the manufacturing of 3D graphics processing units, and later, video cards. It was a pioneer in the field from the l ...

''Voodoo''. However, as manufacturing technology continued to progress, video, 2D GUI acceleration and 3D functionality were all integrated into one chip. Rendition's ''Verite'' chipsets were among the first to do this well enough to be worthy of note. In 1997, Rendition went a step further by collaborating with Hercules and Fujitsu on a "Thriller Conspiracy" project which combined a Fujitsu FXG-1 Pinolite geometry processor with a Vérité V2200 core to create a graphics card with a full T&L engine years before Nvidia's GeForce 256. This card, designed to reduce the load placed upon the system's CPU, never made it to market.

OpenGL OpenGL (Open Graphics Library) is a cross-language, cross-platform application programming interface (API) for rendering 2D and 3D vector graphics. The API is typically used to interact with a graphics processing unit (GPU), to achieve hardwa ...

appeared in the early '90s as a professional graphics API, but originally suffered from performance issues which allowed the Glide API to step in and become a dominant force on the PC in the late '90s. 3dfx Glide API However, these issues were quickly overcome and the Glide API fell by the wayside. Software implementations of OpenGL were common during this time, although the influence of OpenGL eventually led to widespread hardware support. Over time, a parity emerged between features offered in hardware and those offered in OpenGL.

DirectX Microsoft DirectX is a collection of application programming interfaces (APIs) for handling tasks related to multimedia, especially game programming and video, on Microsoft platforms. Originally, the names of these APIs all began with "Direct", ...

became popular among Windows game developers during the late 90s. Unlike OpenGL, Microsoft insisted on providing strict one-to-one support of hardware. The approach made DirectX less popular as a standalone graphics API initially, since many GPUs provided their own specific features, which existing OpenGL applications were already able to benefit from, leaving DirectX often one generation behind. (See: Comparison of OpenGL and Direct3D.) Over time, Microsoft began to work more closely with hardware developers and started to target the releases of DirectX to coincide with those of the supporting graphics hardware.

Direct3D Direct3D is a graphics application programming interface (API) for Microsoft Windows. Part of DirectX, Direct3D is used to render three-dimensional graphics in applications where performance is important, such as games. Direct3D uses hardware a ...

5.0 was the first version of the burgeoning API to gain widespread adoption in the gaming market, and it competed directly with many more-hardware-specific, often proprietary graphics libraries, while OpenGL maintained a strong following. Direct3D 7.0 introduced support for hardware-accelerated transform and lighting (T&L) for Direct3D, while OpenGL had this capability already exposed from its inception. 3D accelerator cards moved beyond being just simple rasterizers to add another significant hardware stage to the 3D rendering pipeline. The Nvidia '' GeForce 256'' (also known as NV10) was the first consumer-level card released on the market with hardware-accelerated T&L, while professional 3D cards already had this capability. Hardware transform and lighting, both already existing features of OpenGL, came to consumer-level hardware in the '90s and set the precedent for later pixel shader and vertex shader units which were far more flexible and programmable.

2000 to 2010

Nvidia was first to produce a chip capable of programmable shading; the '' GeForce 3'' (code named NV20). Each pixel could now be processed by a short "program" that could include additional image textures as inputs, and each geometric vertex could likewise be processed by a short program before it was projected onto the screen. Used in the Xbox console, it competed with the

PlayStation 2 The PlayStation 2 (PS2) is a home video game console developed and marketed by Sony Computer Entertainment. It was first released in Japan on 4 March 2000, in North America on 26 October 2000, in Europe on 24 November 2000, and in Australia on 3 ...

, which used a custom vector unit for hardware accelerated vertex processing (commonly referred to as VU0/VU1). The earliest incarnations of shader execution engines used in Xbox were not general purpose and could not execute arbitrary pixel code. Vertices and pixels were processed by different units which had their own resources with pixel shaders having much tighter constraints (being as they are executed at much higher frequencies than with vertices). Pixel shading engines were actually more akin to a highly customizable function block and didn't really "run" a program. Many of these disparities between vertex and pixel shading were not addressed until much later with the Unified Shader Model. By October 2002, with the introduction of the ATI ''

'' (also known as R300), the world's first

9.0 accelerator, pixel and vertex shaders could implement looping and lengthy floating point math, and were quickly becoming as flexible as CPUs, yet orders of magnitude faster for image-array operations. Pixel shading is often used for bump mapping, which adds texture, to make an object look shiny, dull, rough, or even round or extruded. With the introduction of the Nvidia GeForce 8 series, and then new generic stream processing unit GPUs became a more generalized computing devices. Today, parallel GPUs have begun making computational inroads against the CPU, and a subfield of research, dubbed GPU Computing or GPGPU for ''General Purpose Computing on GPU'', has found its way into fields as diverse as machine learning, oil exploration, scientific

, linear algebra,

statistics Statistics (from German language, German: ''wikt:Statistik#German, Statistik'', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of ...

3D reconstruction In computer vision and computer graphics, 3D reconstruction is the process of capturing the shape and appearance of real objects. This process can be accomplished either by active or passive methods. If the model is allowed to change its shape i ...

and even stock options pricing determination. GPGPU at the time was the precursor to what is now called a compute shader (e.g. CUDA, OpenCL, DirectCompute) and actually abused the hardware to a degree by treating the data passed to algorithms as texture maps and executing algorithms by drawing a triangle or quad with an appropriate pixel shader. This obviously entails some overheads since units like the Scan Converter are involved where they aren't really needed (nor are triangle manipulations even a concern—except to invoke the pixel shader). Nvidia's CUDA platform, first introduced in 2007, was the earliest widely adopted programming model for GPU computing. More recently OpenCL has become broadly supported. OpenCL is an open standard defined by the Khronos Group which allows for the development of code for both GPUs and CPUs with an emphasis on portability. OpenCL solutions are supported by Intel, AMD, Nvidia, and ARM, and according to a recent report by Evan's Data, OpenCL is the GPGPU development platform most widely used by developers in both the US and Asia Pacific.

2010 to present

In 2010, Nvidia began a partnership with Audi to power their cars' dashboards, using the Tegra GPUs to provide increased functionality to cars' navigation and entertainment systems. Advances in GPU technology in cars has helped push self-driving technology. AMD's Radeon HD 6000 Series cards were released in 2010 and in 2011, AMD released their 6000M Series discrete GPUs to be used in mobile devices. The Kepler line of graphics cards by Nvidia came out in 2012 and were used in the Nvidia's 600 and 700 series cards. A feature in this new GPU microarchitecture included GPU boost, a technology that adjusts the clock-speed of a video card to increase or decrease it according to its power draw. The Kepler microarchitecture was manufactured on the 28 nm process. The PS4 and Xbox One were released in 2013, they both use GPUs based on AMD's Radeon HD 7850 and 7790. Nvidia's Kepler line of GPUs was followed by the

Maxwell Maxwell may refer to: People * Maxwell (surname), including a list of people and fictional characters with the name ** James Clerk Maxwell, mathematician and physicist * Justice Maxwell (disambiguation) * Maxwell baronets, in the Baronetage o ...

line, manufactured on the same process. 28 nm chips by Nvidia were manufactured by TSMC, the Taiwan Semiconductor Manufacturing Company, that was manufacturing using the 28 nm process at the time. Compared to the 40 nm technology from the past, this new manufacturing process allowed a 20 percent boost in performance while drawing less power. Virtual reality headsets have very high system requirements. VR headset manufacturers recommended the GTX 970 and the R9 290X or better at the time of their release.

Pascal Pascal, Pascal's or PASCAL may refer to: People and fictional characters * Pascal (given name), including a list of people with the name * Pascal (surname), including a list of people and fictional characters with the name ** Blaise Pascal, Fren ...

is the next generation of consumer graphics cards by Nvidia released in 2016. The GeForce 10 series of cards are under this generation of graphics cards. They are made using the 16 nm manufacturing process which improves upon previous microarchitectures. Nvidia has released one non-consumer card under the new

Volta Volta may refer to: Persons * Alessandro Volta (1745–1827), Italian physicist and inventor of the electric battery, count and eponym of the volt * Giovanni Volta (1928–2012), Italian Roman Catholic bishop * Giovanni Serafino Volta (1764–184 ...

architecture, the Titan V. Changes from the Titan XP, Pascal's high-end card, include an increase in the number of CUDA cores, the addition of tensor cores, and HBM2. Tensor cores are cores specially designed for deep learning, while high-bandwidth memory is on-die, stacked, lower-clocked memory that offers an extremely wide memory bus that is useful for the Titan V's intended purpose. To emphasize that the Titan V is not a gaming card, Nvidia removed the "GeForce GTX" suffix it adds to consumer gaming cards. On August 20, 2018, Nvidia launched the RTX 20 series GPUs that add ray-tracing cores to GPUs, improving their performance on lighting effects.

Polaris 11 The Radeon 400 series is a series of graphics processors developed by AMD. These cards were the first to feature the Polaris GPUs, using the new 14 nm FinFET manufacturing process, developed by Samsung Electronics and licensed to GlobalFoun ...

and Polaris 10 GPUs from AMD are fabricated by a 14-nanometer process. Their release results in a substantial increase in the performance per watt of AMD video cards. AMD has also released the Vega GPUs series for the high end market as a competitor to Nvidia's high end Pascal cards, also featuring HBM2 like the Titan V. In 2019, AMD released the successor to their Graphics Core Next (GCN) microarchitecture/instruction set. Dubbed as RDNA, the first product lineup featuring the first generation of RDNA was the Radeon RX 5000 series of video cards, which later launched on July 7, 2019.AMD press release: AMD.com. Retrieved October 5th, 2019 Later, the company announced that the successor to the RDNA microarchitecture would be a refresh. Dubbed as RDNA 2, the new microarchitecture was reportedly scheduled for release in Q4 2020. AMD unveiled the Radeon RX 6000 series, its next-gen RDNA 2 graphics cards with support for hardware-accelerated ray tracing at an online event on October 28, 2020. The lineup initially consists of the RX 6800, RX 6800 XT and RX 6900 XT. The RX 6800 and 6800 XT launched on November 18, 2020, with the RX 6900 XT being released on December 8, 2020. The RX 6700 XT, which is based on Navi 22, was launched on March 18, 2021. The

PlayStation 5 The PlayStation 5 (PS5) is a home video game console developed by Sony Interactive Entertainment. Announced as the successor to the PlayStation 4 in April 2019, it was launched on November 12, 2020, in Australia, Japan, New Zealand, North Ame ...

and Xbox Series X and Series S were released in 2020, they both use GPUs based on the RDNA 2 microarchitecture with proprietary tweaks and different GPU configurations in each system's implementation.

GPU companies

Many companies have produced GPUs under a number of brand names. In 2009, Intel, Nvidia and AMD/ ATI were the market share leaders, with 49.4%, 27.8% and 20.6% market share respectively. However, those numbers include Intel's integrated graphics solutions as GPUs. Not counting those, Nvidia and AMD control nearly 100% of the market as of 2018. Their respective market shares are 66% and 33%. In addition, Matrox produces GPUs. Modern smartphones also use mostly Adreno GPUs from

Qualcomm Qualcomm () is an American multinational corporation headquartered in San Diego, California, and incorporated in Delaware. It creates semiconductors, software, and services related to wireless technology. It owns patents critical to the 5G, 4 ...

GPUs from Imagination Technologies and Mali GPUs from ARM.

Computational functions

Modern GPUs use most of their transistors to do calculations related to

3D computer graphics 3D computer graphics, or “3D graphics,” sometimes called CGI, 3D-CGI or three-dimensional computer graphics are graphics that use a three-dimensional representation of geometric data (often Cartesian) that is stored in the computer for th ...

. In addition to the 3D hardware, today's GPUs include basic 2D acceleration and framebuffer capabilities (usually with a VGA compatibility mode). Newer cards such as AMD/ATI HD5000-HD7000 even lack dedicated 2D acceleration; it has to be emulated by 3D hardware. GPUs were initially used to accelerate the memory-intensive work of texture mapping and rendering polygons, later adding units to accelerate

geometric Geometry (; ) is, with arithmetic, one of the oldest branches of mathematics. It is concerned with properties of space such as the distance, shape, size, and relative position of figures. A mathematician who works in the field of geometry is ca ...

calculations such as the

rotation Rotation, or spin, is the circular movement of an object around a '' central axis''. A two-dimensional rotating object has only one possible central axis and can rotate in either a clockwise or counterclockwise direction. A three-dimensional ...

and translation of vertices into different

coordinate system In geometry, a coordinate system is a system that uses one or more numbers, or coordinates, to uniquely determine the position of the points or other geometric elements on a manifold such as Euclidean space. The order of the coordinates is sig ...

s. Recent developments in GPUs include support for programmable shaders which can manipulate vertices and textures with many of the same operations supported by CPUs, oversampling and

interpolation In the mathematical field of numerical analysis, interpolation is a type of estimation, a method of constructing (finding) new data points based on the range of a discrete set of known data points. In engineering and science, one often has a n ...

techniques to reduce aliasing, and very high-precision color spaces. Given that most of these computations involve

matrix Matrix most commonly refers to: * ''The Matrix'' (franchise), an American media franchise ** '' The Matrix'', a 1999 science-fiction action film ** "The Matrix", a fictional setting, a virtual reality environment, within ''The Matrix'' (franchi ...

and vector operations, engineers and scientists have increasingly studied the use of GPUs for non-graphical calculations; they are especially suited to other embarrassingly parallel problems. Several factors of the GPU's construction enter into the performance of the card for real-time rendering. Common factors can include the size of the connector pathways in the semiconductor device fabrication, the

clock signal In electronics and especially synchronous digital circuits, a clock signal (historically also known as ''logic beat'') oscillates between a high and a low state and is used like a metronome to coordinate actions of digital circuits. A clock sign ...

frequency, and the number and size of various on-chip memory caches. Additionally, the number of Streaming Multiprocessors (SM) for NVidia GPUs, or Compute Units (CU) for AMD GPUs, which describe the number of core on-silicon processor units within the GPU chip that perform the core calculations, typically working in parallel with other SM/CUs on the GPU. Performance of GPUs are typically measured in floating point operations per second or

FLOPS In computing, floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate meas ...

, with GPUs in the 2010s and 2020s typically delivering performance measured in teraflops (TFLOPS). This is an estimated performance measure as other factors can impact the actual display rate. With the emergence of deep learning, the importance of GPUs has increased. In research done by Indigo, it was found that while training deep learning neural networks, GPUs can be 250 times faster than CPUs. There has been some level of competition in this area with ASICs, most prominently the Tensor Processing Unit (TPU) made by Google. However, ASICs require changes to existing code and GPUs are still very popular.

GPU accelerated video decoding and encoding

Most GPUs made since 1995 support the YUV color space and hardware overlays, important for

digital video Digital video is an electronic representation of moving visual images (video) in the form of encoded digital data. This is in contrast to analog video, which represents moving visual images in the form of analog signals. Digital video comprises ...

playback, and many GPUs made since 2000 also support MPEG primitives such as motion compensation and iDCT. This process of hardware accelerated video decoding, where portions of the video decoding process and video post-processing are offloaded to the GPU hardware, is commonly referred to as "GPU accelerated video decoding", "GPU assisted video decoding", "GPU hardware accelerated video decoding" or "GPU hardware assisted video decoding". More recent graphics cards even decode

high-definition video High-definition video (HD video) is video of higher resolution and quality than standard-definition. While there is no standardized meaning for ''high-definition'', generally any video image with considerably more than 480 vertical scan lines (No ...

on the card, offloading the central processing unit. The most common APIs for GPU accelerated video decoding are DxVA for

Microsoft Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...

operating system and VDPAU, VAAPI, XvMC, and

XvBA X-Video Bitstream Acceleration (XvBA), designed by AMD Graphics for its Radeon GPU and APU, is an arbitrary extension of the X video extension (Xv) for the X Window System on Linux operating-systems. XvBA API allows video programs to offload porti ...

for Linux-based and UNIX-like operating systems. All except XvMC are capable of decoding videos encoded with MPEG-1,

MPEG-2 MPEG-2 (a.k.a. H.222/H.262 as was defined by the ITU) is a standard for "the generic video coding format, coding of moving pictures and associated audio information". It describes a combination of Lossy compression, lossy video compression and ...

, MPEG-4 ASP (MPEG-4 Part 2), MPEG-4 AVC (H.264 / DivX 6), VC-1, WMV3/ WMV9, Xvid / OpenDivX (DivX 4), and DivX 5 codecs, while XvMC is only capable of decoding MPEG-1 and MPEG-2. There are several dedicated hardware video decoding and encoding solutions.

Video decoding processes that can be accelerated

The video decoding processes that can be accelerated by today's modern GPU hardware are: * Motion compensation (mocomp) * Inverse discrete cosine transform (iDCT) ** Inverse telecine 3:2 and 2:2 pull-down correction * Inverse

modified discrete cosine transform The modified discrete cosine transform (MDCT) is a transform based on the type-IV discrete cosine transform (DCT-IV), with the additional property of being lapped transform, lapped: it is designed to be performed on consecutive blocks of a larger ...

(iMDCT) * In-loop deblocking filter * Intra-frame prediction * Inverse quantization (IQ) * Variable-length decoding (VLD), more commonly known as slice-level acceleration * Spatial-temporal deinterlacing and automatic interlace/

progressive Progressive may refer to: Politics * Progressivism, a political philosophy in support of social reform ** Progressivism in the United States, the political philosophy in the American context * Progressive realism, an American foreign policy par ...

source detection * Bitstream processing ( Context-adaptive variable-length coding/ Context-adaptive binary arithmetic coding) and perfect pixel positioning. The above operations also have applications in video editing, encoding and transcoding

GPU forms

Terminology

In personal computers, there are two main forms of GPUs. Each has many synonyms: * '' Dedicated graphics card'' - also called ''discrete''. * '' Integrated graphics'' - also called: ''shared graphics solutions'', ''integrated graphics processors'' (IGP), or ''unified memory architecture'' (UMA).

Usage specific GPU

Most GPUs are designed for a specific usage, real-time 3D graphics or other mass calculations: # Gaming #* GeForce GTX, RTX #*

Nvidia Titan Nvidia Titan is a series of video cards developed by Nvidia including: *GTX Titan The GeForce 700 series (stylized as GEFORCE GTX 700 SERIES) is a series of graphics processing units developed by Nvidia. While mainly a refresh of the Kepler m ...

#* Radeon HD, R5, R7, R9, RX, Vega and Navi series #* Radeon VII # Cloud Gaming #* Nvidia GRID #* Radeon Sky # Workstation #* Nvidia Quadro #* Nvidia RTX #* AMD FirePro #* AMD Radeon Pro #* Intel Arc Pro # Cloud Workstation #* Nvidia Tesla #* AMD FireStream # Artificial Intelligence training and Cloud #* Nvidia Tesla #* AMD Radeon Instinct # Automated/Driverless car #* Nvidia Drive PX

Dedicated graphics cards

The GPUs of the most powerful class typically interface with the

by means of an expansion slot such as

PCI Express PCI Express (Peripheral Component Interconnect Express), officially abbreviated as PCIe or PCI-e, is a high-speed serial computer expansion bus standard, designed to replace the older PCI, PCI-X and AGP bus standards. It is the common ...

(PCIe) or Accelerated Graphics Port (AGP) and can usually be replaced or upgraded with relative ease, assuming the motherboard is capable of supporting the upgrade. A few graphics cards still use

Peripheral Component Interconnect Peripheral Component Interconnect (PCI) is a local computer bus for attaching hardware devices in a computer and is part of the PCI Local Bus standard. The PCI bus supports the functions found on a processor bus but in a standardized format th ...

(PCI) slots, but their bandwidth is so limited that they are generally used only when a PCIe or AGP slot is not available. A dedicated GPU is not necessarily removable, nor does it necessarily interface with the motherboard in a standard fashion. The term "dedicated" refers to the fact that dedicated graphics cards have RAM that is dedicated to the card's use, not to the fact that ''most'' dedicated GPUs are removable. Further, this RAM is usually specially selected for the expected serial workload of the graphics card (see GDDR). Sometimes, systems with dedicated, ''discrete'' GPUs were called "DIS" systems, as opposed to "UMA" systems (see next section). Dedicated GPUs for portable computers are most commonly interfaced through a non-standard and often proprietary slot due to size and weight constraints. Such ports may still be considered PCIe or AGP in terms of their logical host interface, even if they are not physically interchangeable with their counterparts. Technologies such as SLI and NVLink by Nvidia and CrossFire by AMD allow multiple GPUs to draw images simultaneously for a single screen, increasing the processing power available for graphics. These technologies, however, are increasingly uncommon, as most games do not fully utilize multiple GPUs, as most users cannot afford them. Multiple GPUs are still used on supercomputers (like in

Summit A summit is a point on a surface that is higher in elevation than all points immediately adjacent to it. The topography, topographic terms acme, apex, peak (mountain peak), and zenith are synonymous. The term (mountain top) is generally used ...

), on workstations to accelerate video (processing multiple videos at once) and 3D rendering, for VFX and for simulations, and in AI to expedite training, as is the case with Nvidia's lineup of DGX workstations and servers and Tesla GPUs and Intel's upcoming Ponte Vecchio GPUs.

Integrated graphics processing unit

''Integrated graphics processing unit'' (IGPU), ''Integrated graphics'', ''shared graphics solutions'', ''integrated graphics processors'' (IGP) or ''unified memory architecture'' (UMA) utilize a portion of a computer's system RAM rather than dedicated graphics memory. IGPs can be integrated onto the motherboard as part of the (northbridge) chipset, or on the same die (integrated circuit) with the CPU (like AMD APU or Intel HD Graphics). On certain motherboards, AMD's IGPs can use dedicated sideport memory. This is a separate fixed block of high performance memory that is dedicated for use by the GPU. In early 2007, computers with integrated graphics account for about 90% of all PC shipments. They are less costly to implement than dedicated graphics processing, but tend to be less capable. Historically, integrated processing was considered unfit to play 3D games or run graphically intensive programs but could run less intensive programs such as Adobe Flash. Examples of such IGPs would be offerings from SiS and VIA circa 2004. However, modern integrated graphics processors such as

AMD Accelerated Processing Unit AMD Accelerated Processing Unit (APU), formerly known as Fusion, is a series of 64-bit microprocessors from Advanced Micro Devices (AMD), combining a general-purpose AMD64 central processing unit ( CPU) and integrated graphics processing unit ...

and Intel Graphics Technology (HD, UHD, Iris, Iris Pro, Iris Plus, and Xe-LP) are more than capable of handling 2D graphics or low stress 3D graphics. Since the GPU computations are extremely memory-intensive, integrated processing may find itself competing with the CPU for the relatively slow system RAM, as it has minimal or no dedicated video memory. IGPs can have up to 29.856 GB/s of memory bandwidth from system RAM, whereas a graphics card may have up to 264 GB/s of bandwidth between its RAM and GPU core. This memory bus bandwidth can limit the performance of the GPU, though multi-channel memory can mitigate this deficiency. Older integrated graphics chipsets lacked hardware transform and lighting, but newer ones include it.

Hybrid graphics processing

This newer class of GPUs competes with integrated graphics in the low-end desktop and notebook markets. The most common implementations of this are ATI's HyperMemory and Nvidia's TurboCache. Hybrid graphics cards are somewhat more expensive than integrated graphics, but much less expensive than dedicated graphics cards. These share memory with the system and have a small dedicated memory cache, to make up for the high latency of the system RAM. Technologies within PCI Express can make this possible. While these solutions are sometimes advertised as having as much as 768 MB of RAM, this refers to how much can be shared with the system memory.

Stream processing and general purpose GPUs (GPGPU)

It is becoming increasingly common to use a general purpose graphics processing unit (GPGPU) as a modified form of stream processor (or a vector processor), running compute kernels. This concept turns the massive computational power of a modern graphics accelerator's shader pipeline into general-purpose computing power, as opposed to being hardwired solely to do graphical operations. In certain applications requiring massive vector operations, this can yield several orders of magnitude higher performance than a conventional CPU. The two largest discrete (see " Dedicated graphics cards" above) GPU designers, AMD and Nvidia, are beginning to pursue this approach with an array of applications. Both Nvidia and AMD have teamed with

Stanford University Stanford University, officially Leland Stanford Junior University, is a private research university in Stanford, California. The campus occupies , among the largest in the United States, and enrolls over 17,000 students. Stanford is consider ...

to create a GPU-based client for the Folding@home distributed computing project, for protein folding calculations. In certain circumstances, the GPU calculates forty times faster than the CPUs traditionally used by such applications. GPGPU can be used for many types of embarrassingly parallel tasks including ray tracing. They are generally suited to high-throughput type computations that exhibit data-parallelism to exploit the wide vector width SIMD architecture of the GPU. Furthermore, GPU-based high performance computers are starting to play a significant role in large-scale modelling. Three of the 10 most powerful supercomputers in the world take advantage of GPU acceleration. GPUs support API extensions to the C programming language such as OpenCL and OpenMP. Furthermore, each GPU vendor introduced its own API which only works with their cards, AMD APP SDK and CUDA from AMD and Nvidia, respectively. These technologies allow specified functions called compute kernels from a normal C program to run on the GPU's stream processors. This makes it possible for C programs to take advantage of a GPU's ability to operate on large buffers in parallel, while still using the CPU when appropriate. CUDA is also the first API to allow CPU-based applications to directly access the resources of a GPU for more general purpose computing without the limitations of using a graphics API. Since 2005 there has been interest in using the performance offered by GPUs for

evolutionary computation In computer science, evolutionary computation is a family of algorithms for global optimization inspired by biological evolution, and the subfield of artificial intelligence and soft computing studying these algorithms. In technical terms, they ...

in general, and for accelerating the fitness evaluation in genetic programming in particular. Most approaches compile linear or tree programs on the host PC and transfer the executable to the GPU to be run. Typically the performance advantage is only obtained by running the single active program simultaneously on many example problems in parallel, using the GPU's SIMD architecture. However, substantial acceleration can also be obtained by not compiling the programs, and instead transferring them to the GPU, to be interpreted there. Acceleration can then be obtained by either interpreting multiple programs simultaneously, simultaneously running multiple example problems, or combinations of both. A modern GPU can readily simultaneously interpret hundreds of thousands of very small programs. Some modern workstation GPUs, such as the Nvidia Quadro workstation cards using the Volta and Turing architectures, feature dedicating processing cores for tensor-based deep learning applications. In Nvidia's current series of GPUs these cores are called Tensor Cores. These GPUs usually have significant FLOPS performance increases, utilizing 4x4 matrix multiplication and division, resulting in hardware performance up to 128 TFLOPS in some applications. These tensor cores are also supposed to appear in consumer cards running the Turing architecture, and possibly in the Navi series of consumer cards from AMD.

External GPU (eGPU)

An external GPU is a graphics processor located outside of the housing of the computer, similar to a large external hard drive. External graphics processors are sometimes used with laptop computers. Laptops might have a substantial amount of RAM and a sufficiently powerful central processing unit (CPU), but often lack a powerful graphics processor, and instead have a less powerful but more energy-efficient on-board graphics chip. On-board graphics chips are often not powerful enough for playing video games, or for other graphically intensive tasks, such as editing video or 3D animation/rendering. Therefore, it is desirable to be able to attach a GPU to some external bus of a notebook.

is the only bus used for this purpose. The port may be, for example, an ExpressCard or mPCIe port (PCIe ×1, up to 5 or 2.5 Gbit/s respectively) or a Thunderbolt 1, 2, or 3 port (PCIe ×4, up to 10, 20, or 40 Gbit/s respectively). Those ports are only available on certain notebook systems. eGPU enclosures include their own power supply (PSU), because powerful GPUs can easily consume hundreds of watts. Official vendor support for external GPUs has gained traction recently. One notable milestone was Apple's decision to officially support external GPUs with MacOS High Sierra 10.13.4. There are also several major hardware vendors (HP, Alienware, Razer) releasing Thunderbolt 3 eGPU enclosures. This support has continued to fuel eGPU implementations by enthusiasts.

Sales

In 2013, 438.3 million GPUs were shipped globally and the forecast for 2014 was 414.2 million.

References

External links

NVIDIA - What is GPU computing?
* Th
''GPU Gems'' book series

How GPUs work

GPU Caps Viewer - Video card information utility

OpenGPU-GPU Architecture(In Chinese)

ARM Mali GPUs Overview
{{DEFAULTSORT:Graphics Processing Unit GPGPU Graphics hardware Virtual reality OpenCL compute devices Artificial intelligence Application-specific integrated circuits Hardware acceleration Digital electronics Electronic design Electronic design automation