Graphics Core Next (GCN) is the codename for a series of microarchitectures and an instruction set architecture that were developed by AMD for its GPUs as the successor to its TeraScale microarchitecture. The first product featuring GCN was launched on January 9, 2012.
GCN is a reduced instruction set SIMD microarchitecture, in contrast to the very long instruction word SIMD architecture of TeraScale. GCN requires considerably more transistors than TeraScale, but offers advantages for general-purpose GPU (GPGPU) computation due to a simpler compiler.
GCN graphics chips were fabricated with CMOS at 28 nm, and with FinFET at 14 nm (by Samsung Electronics and GlobalFoundries) and 7 nm (by TSMC), available on selected models in AMD's Radeon HD 7000, HD 8000, 200, 300, 400, 500 and Vega series of graphics cards, including the separately released Radeon VII. GCN was also used in the graphics portion of Accelerated Processing Units (APUs), such as those in the PlayStation 4 and Xbox One.
Instruction set
The GCN instruction set is owned by AMD and was developed specifically for GPUs. It has no micro-operation for division.
Documentation is available for:
* the Graphics Core Next 1 instruction set
* the Graphics Core Next 2 instruction set
* the Graphics Core Next 3 and 4 instruction sets
* the Graphics Core Next 5 instruction set
* the "Vega" 7nm instruction set architecture (also referred to as Graphics Core Next 5.1)
An LLVM compiler back end is available for the GCN instruction set. It is used by Mesa 3D.
GNU Compiler Collection 9 has supported GCN 3 and GCN 5 since 2019 for single-threaded, stand-alone programs, with GCC 10 also offloading via OpenMP and OpenACC.
MIAOW is an open-source RTL implementation of the AMD Southern Islands GPGPU microarchitecture.
In November 2015, AMD announced its Boltzmann Initiative, which aims to enable the porting of CUDA-based applications to a common C++ programming model.
At the Super Computing 15 event, AMD displayed a Heterogeneous Compute Compiler (HCC), a headless Linux driver and HSA runtime infrastructure for cluster-class high-performance computing, and a Heterogeneous-compute Interface for Portability (HIP) tool for porting CUDA applications to the aforementioned common C++ model.
Microarchitectures
As of July 2017, the Graphics Core Next instruction set has seen five iterations. The differences between the first four generations are rather minimal, but the fifth-generation GCN architecture features heavily modified stream processors to improve performance and support the simultaneous processing of two lower-precision numbers in place of a single higher-precision number.
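The idea of handling two lower-precision numbers in place of one higher-precision number can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the lower-precision values are 16-bit floats and the higher-precision value is a 32-bit float. The helper names `pack_two_halves` and `unpack_two_halves` are hypothetical and do not correspond to any AMD interface. The sketch only demonstrates that two 16-bit values fit in the storage of one 32-bit value; it does not model the arithmetic hardware.

```python
import struct
from typing import Tuple


def pack_two_halves(a: float, b: float) -> bytes:
    """Pack two half-precision floats into a single 4-byte slot.

    Hypothetical helper: "<2e" means two little-endian IEEE 754
    half-precision values, which together occupy 32 bits.
    """
    return struct.pack("<2e", a, b)


def unpack_two_halves(slot: bytes) -> Tuple[float, float]:
    """Recover the two half-precision floats from a 4-byte slot."""
    return struct.unpack("<2e", slot)


# One single-precision value and two half-precision values use the same space.
single_slot = struct.pack("<f", 1.5)
packed_slot = pack_two_halves(1.5, -2.25)
```

Both `single_slot` and `packed_slot` are four bytes long, and `unpack_two_halves(packed_slot)` returns the two original values because 1.5 and -2.25 are exactly representable in half precision.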
Command processing
Graphics Command Processor
The Graphics Command Processor (GCP) is a functional unit of the GCN microarchitecture. Among other tasks, it is responsible for the handling of asynchronous shaders.
Asynchronous Compute Engine
The Asynchronous Compute Engine (ACE) is a distinct functional block dedicated to compute work. Its purpose is similar to that of the Graphics Command Processor.
Schedulers
Since the third iteration of GCN, the hardware contains two schedulers: one to schedule "wavefronts" during shader execution (the CU Scheduler, or Compute Unit Scheduler) and the other to schedule execution of draw and compute queues. The latter helps performance by executing compute operations when the compute units (CUs) are underutilized due to graphics commands limited by fixed function pipeline speed or bandwidth. This functionality is known as Async Compute.
For a given shader, the GPU drivers may also schedule instructions on the CPU to minimize latency.
Geometric processor
The geometry processor contains a Geometry Assembler, a Tessellator, and a Vertex Assembler.
The Tessellator is capable of doing tessellation in hardware as defined by Direct3D 11 and OpenGL 4.5 (see AMD January 21, 2017), and succeeded ATI TruForm and hardware tessellation in TeraScale as AMD's then-latest semiconductor intellectual property core.
Compute units
One compute unit (CU) combines 64 shader processors with 4 texture mapping units (TMUs).
The compute units are separate from, but feed into, the render output units (ROPs).
Each compute unit consists of the following:
* a CU scheduler
* a Branch & Message Unit
* 4 16-lane-wide SIMD Vector Units (SIMD-VUs)
* 4 64 KiB vector general-purpose register (VGPR) files
* 1 scalar unit (SU)
* a 4 KiB GPR file
* a local data share of 64 KiB
* 4 Texture Filter Units
* 16 Texture Fetch Load/Store Units
* a 16 KiB level 1 (L1) cache
Four compute units are wired to share a 16 KiB L1 instruction cache and a 32 KiB L1 data cache, both of which are read-only. A SIMD-VU operates on 16 elements at a time (per cycle), while an SU can operate on one at a time (one per cycle). In addition, the SU handles some other operations, such as branching.
Every SIMD-VU has some private memory where it stores its registers. There are two types of registers: scalar registers (S0, S1, etc.), which each hold a 4-byte number, and vector registers (V0, V1, etc.), which each represent a set of 64 4-byte numbers. On the vector registers, every operation is done in parallel on the 64 numbers, which correspond to 64 inputs. For example, it may work on 64 different pixels at a time (for each of them the inputs are slightly different, and thus the resulting color is slightly different).
Every SIMD-VU has room for 512 scalar registers and 256 vector registers.
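The vector register figures given above can be cross-checked with simple arithmetic. The following Python sketch uses only numbers stated in this section (64 values per vector register, 4 bytes per value, 256 vector registers per SIMD-VU); the constant and function names are hypothetical and chosen for readability.

```python
# Figures taken from the text above; names are illustrative only.
VALUES_PER_VECTOR_REGISTER = 64   # one value per thread of a wavefront
BYTES_PER_VALUE = 4               # each value is a 4-byte number
VECTOR_REGISTERS_PER_SIMD = 256   # vector registers available per SIMD-VU


def vector_register_bytes() -> int:
    """Size of a single vector register in bytes."""
    return VALUES_PER_VECTOR_REGISTER * BYTES_PER_VALUE


def vgpr_file_bytes() -> int:
    """Total size of the vector register file of one SIMD-VU in bytes."""
    return VECTOR_REGISTERS_PER_SIMD * vector_register_bytes()


vgpr_file_kib = vgpr_file_bytes() // 1024
```

One vector register therefore occupies 256 bytes, and 256 such registers add up to 64 KiB, which agrees with the 64 KiB VGPR file listed for each SIMD-VU.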
CU scheduler
The CU scheduler is the hardware functional block that chooses which wavefronts the SIMD-VU executes. It picks one SIMD-VU per cycle for scheduling. This is not to be confused with other hardware or software schedulers.
Wavefront
A shader is a small program written in GLSL that performs graphics processing, and a kernel is a small program written in OpenCL that performs GPGPU processing. These programs do not need many registers, but they do need to load data from system or graphics memory, an operation that comes with significant latency. AMD and Nvidia chose similar approaches to hide this unavoidable latency: the grouping of multiple threads. AMD calls such a group a "wavefront", whereas Nvidia calls it a "warp". A group of threads is the most basic unit of scheduling of GPUs that implement this approach to hide latency. It is the minimum size of the data processed in SIMD fashion, the smallest executable unit of code, and the way to process a single instruction over all of the threads in it at the same time.
In all GCN GPUs, a "wavefront" consists of 64 threads, and in all Nvidia GPUs, a "warp" consists of 32 threads.
AMD's solution is to assign multiple wavefronts to each SIMD-VU. The hardware distributes the registers among the different wavefronts, and when one wavefront is waiting on a result that lies in memory, the CU Scheduler gives the SIMD-VU another wavefront to execute. Wavefronts are assigned per SIMD-VU, and SIMD-VUs do not exchange wavefronts. A maximum of 10 wavefronts can be assigned per SIMD-VU (thus 40 per CU).
AMD CodeXL shows tables relating the number of SGPRs and VGPRs to the number of wavefronts. Essentially, the number of SGPRs available to each wavefront is the smaller of 104 and 512 divided by the number of wavefronts, and the number of VGPRs available to each wavefront is 256 divided by the number of wavefronts.
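The relationship between register usage and the number of resident wavefronts can be sketched in Python. This is a simplified model under stated assumptions, not AMD's actual allocation logic: it uses the limits quoted in this section (at most 10 wavefronts, 256 VGPRs and 512 SGPRs per SIMD-VU), reads the figure of 104 as a per-wavefront SGPR cap, and ignores any allocation granularity the hardware may apply. The function name `max_wavefronts` is hypothetical.

```python
MAX_WAVEFRONTS_PER_SIMD = 10   # limit stated in the text
VGPRS_PER_SIMD = 256           # vector registers per SIMD-VU
SGPRS_PER_SIMD = 512           # scalar registers per SIMD-VU
MAX_SGPRS_PER_WAVEFRONT = 104  # assumption: 104 read as a per-wavefront cap


def max_wavefronts(vgprs_per_wavefront: int, sgprs_per_wavefront: int) -> int:
    """Estimate how many wavefronts fit on one SIMD-VU.

    Simplified model: each wavefront reserves a fixed number of registers,
    and the register files are split evenly with no rounding granularity.
    """
    if vgprs_per_wavefront < 1 or sgprs_per_wavefront < 1:
        raise ValueError("register counts must be positive")
    if sgprs_per_wavefront > MAX_SGPRS_PER_WAVEFRONT:
        raise ValueError("more SGPRs requested than one wavefront may use")
    limit_by_vgprs = VGPRS_PER_SIMD // vgprs_per_wavefront
    limit_by_sgprs = SGPRS_PER_SIMD // sgprs_per_wavefront
    return min(MAX_WAVEFRONTS_PER_SIMD, limit_by_vgprs, limit_by_sgprs)


light_kernel = max_wavefronts(vgprs_per_wavefront=24, sgprs_per_wavefront=48)
heavy_kernel = max_wavefronts(vgprs_per_wavefront=64, sgprs_per_wavefront=32)
```

In this model a kernel using 24 VGPRs and 48 SGPRs reaches the full 10 wavefronts, while a kernel using 64 VGPRs is limited to 4 wavefronts per SIMD-VU, leaving fewer wavefronts available to hide memory latency.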
Note that in connection with SSE instructions, this concept of the most basic level of parallelism is often called a "vector width". The vector width is characterized by the total number of bits in it.
SIMD Vector Unit
Each SIMD Vector Unit has:
* a 16-lane integer and floating point vector Arithmetic Logic Unit (ALU)
* a 64 KiB Vector General Purpose Register (VGPR) file
* a 48-bit Program Counter
* an instruction buffer for 10 wavefronts (each wavefront is a group of 64 threads, or the size of one logical VGPR)
* A 64-thread wavefront issues to a 16-lane SIMD Unit over four cycles
Each SIMD-VU has 10 wavefront instruction buffers, and it takes 4 cycles to execute one wavefront.
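The four-cycle issue pattern follows directly from dividing a 64-thread wavefront across 16 lanes. The Python sketch below is a toy model of that arithmetic only: the function `execute_wavefront` is hypothetical, applies one operation to all 64 values in groups of 16, and counts one cycle per group. It does not model pipelining, instruction fetch, or memory access.

```python
from typing import Callable, List, Tuple

WAVEFRONT_SIZE = 64  # threads per wavefront
SIMD_LANES = 16      # lanes per SIMD Vector Unit


def execute_wavefront(
    operation: Callable[[int], int], values: List[int]
) -> Tuple[List[int], int]:
    """Apply one instruction to a whole wavefront, 16 lanes per cycle.

    Returns the per-thread results and the number of cycles used.
    """
    if len(values) != WAVEFRONT_SIZE:
        raise ValueError("a wavefront holds exactly 64 values")
    results: List[int] = []
    cycles = 0
    for start in range(0, WAVEFRONT_SIZE, SIMD_LANES):
        lane_inputs = values[start:start + SIMD_LANES]
        # All 16 lanes perform the same operation on different data.
        results.extend(operation(value) for value in lane_inputs)
        cycles += 1
    return results, cycles


doubled, cycles_used = execute_wavefront(lambda x: x * 2, list(range(64)))
```

Here `cycles_used` is 4, matching the statement that one instruction for a 64-thread wavefront takes four cycles on a 16-lane unit.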
Audio and video acceleration blocks
Many implementations of GCN are accompanied by several of AMD's other ASIC blocks, including but not limited to the Unified Video Decoder, Video Coding Engine, and AMD TrueAudio.
Video Coding Engine
The Video Coding Engine is a video encoding ASIC, first introduced with the Radeon HD 7000 Series.
The initial version of the VCE added support for encoding H.264 I and P frames in the YUV420 pixel format, along with SVC temporal encode and Display Encode Mode, while the second version added B-frame support for YUV420 and YUV444 I-frames.
VCE 3.0 formed a part of the third generation of GCN, adding high-quality video scaling and the HEVC (H.265) codec.
VCE 4.0 was part of the Vega architecture, and was subsequently succeeded by Video Core Next.
TrueAudio
Unified virtual memory
In a preview in 2011, AnandTech wrote about the unified virtual memory supported by Graphics Core Next.
Heterogeneous System Architecture (HSA)
Some of the specific HSA features implemented in the hardware need support from the operating system's kernel (its subsystems) and/or from specific device drivers. For example, in July 2014, AMD published a set of 83 patches to be merged into Linux kernel mainline 3.17 for supporting their Graphics Core Next-based Radeon graphics cards. The so-called HSA kernel driver resides in the directory , while the DRM graphics device drivers reside in and augment the already existing DRM drivers for Radeon cards. This very first implementation focuses on a single "Kaveri" APU and works alongside the existing Radeon kernel graphics driver (kgd).
Lossless Delta Color Compression
Hardware schedulers
Hardware schedulers (HWS) perform scheduling and offload the assignment of compute queues to the ACEs from the driver to hardware. They buffer these queues until there is at least one empty queue in at least one ACE; the HWS then immediately assigns buffered queues to the ACEs until all queues are full or there are no more queues to safely assign.
Part of the scheduling work involves prioritized queues, which allow critical tasks to run at a higher priority than other tasks without requiring the lower-priority tasks to be preempted. The tasks therefore run concurrently: the high-priority tasks are scheduled to occupy the GPU as much as possible, while other tasks use the resources that the high-priority tasks are not using.
These schedulers are essentially Asynchronous Compute Engines that lack dispatch controllers.
They were first introduced in the fourth generation GCN microarchitecture, but were present in the third generation GCN microarchitecture for internal testing purposes. A driver update has enabled the hardware schedulers in third generation GCN parts for production use.
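The buffering behaviour described in this section can be illustrated with a toy Python model. It is a sketch under assumptions, not a description of the real hardware: the class `HardwareScheduler`, its methods, the number of ACEs and the number of queue slots per ACE are all hypothetical, and the choice of the least-loaded ACE is an arbitrary placement policy picked for the example. Priorities are not modelled.

```python
from collections import deque
from typing import Deque, List


class HardwareScheduler:
    """Toy model: buffer compute queues and hand them to ACEs with free slots."""

    def __init__(self, num_aces: int, slots_per_ace: int) -> None:
        self.aces: List[List[str]] = [[] for _ in range(num_aces)]
        self.slots_per_ace = slots_per_ace
        self.buffered: Deque[str] = deque()

    def submit(self, queue_name: str) -> None:
        """Accept a new compute queue and try to place it immediately."""
        self.buffered.append(queue_name)
        self._assign()

    def retire(self, queue_name: str) -> None:
        """Free the slot held by a finished queue, then refill from the buffer."""
        for ace in self.aces:
            if queue_name in ace:
                ace.remove(queue_name)
                break
        self._assign()

    def _assign(self) -> None:
        # Keep assigning until the buffer is empty or every slot is occupied.
        while self.buffered:
            ace = min(self.aces, key=len)  # assumption: pick least-loaded ACE
            if len(ace) >= self.slots_per_ace:
                return
            ace.append(self.buffered.popleft())


hws = HardwareScheduler(num_aces=2, slots_per_ace=1)
for name in ("a", "b", "c"):
    hws.submit(name)
waiting_before = list(hws.buffered)
hws.retire("a")
waiting_after = list(hws.buffered)
```

With two single-slot ACEs, queue "c" stays buffered until "a" retires, at which point it is assigned to the freed slot without any driver involvement in this model.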
Primitive Discard Accelerator
This unit discards degenerate triangles before they enter the vertex shader and triangles that do not cover any fragments before they enter the fragment shader.
This unit was introduced with the fourth generation GCN microarchitecture.
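The two kinds of discard mentioned above can be illustrated with a small Python sketch. The tests used here are assumptions chosen for illustration, since the criteria applied by the hardware are not described in this article: a triangle is treated as degenerate when it references the same vertex index more than once, and a zero-area triangle in integer screen coordinates is used as the simplest case of a triangle that covers no fragments. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]
Triangle = Tuple[Point, Point, Point]


def has_repeated_index(indices: Sequence[int]) -> bool:
    """True if a triangle references the same vertex index more than once."""
    return len(set(indices)) < 3


def twice_signed_area(a: Point, b: Point, c: Point) -> int:
    """Twice the signed area of triangle abc (2D cross product)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def discard_by_index(triangles: List[Tuple[int, int, int]]) -> List[Tuple[int, int, int]]:
    """Drop triangles that can be rejected before any vertex is shaded."""
    return [t for t in triangles if not has_repeated_index(t)]


def discard_zero_area(triangles: List[Triangle]) -> List[Triangle]:
    """Drop triangles whose screen-space area is exactly zero."""
    return [t for t in triangles if twice_signed_area(*t) != 0]


kept_indices = discard_by_index([(0, 1, 2), (3, 3, 4), (5, 6, 5)])
kept_triangles = discard_zero_area([
    ((0, 0), (4, 0), (0, 3)),   # proper triangle
    ((0, 0), (1, 1), (2, 2)),   # three collinear points
])
```

The index check needs no vertex positions, which is why it can run before vertex shading, whereas the area check requires transformed screen-space positions.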
Generations
Graphics Core Next 1
The GCN 1 microarchitecture was used in several Radeon HD 7000 series graphics cards.
* support for 64-bit addressing (x86-64 address space) with unified address space for CPU and GPU
** support for PCI-E 3.0
** GPU sends interrupt requests to CPU on various events (such as page faults)
* support for Partially Resident Textures, which enable virtual memory support through DirectX and OpenGL extensions
* AMD PowerTune support, which dynamically adjusts performance to stay within a specific TDP
* support for Mantle (API)
There are Asynchronous Compute Engines controlling computation and dispatching.
ZeroCore Power
ZeroCore Power is a long idle power saving technology, shutting off functional units of the GPU when not in use. AMD ZeroCore Power technology supplements AMD PowerTune.
Chips
Discrete GPUs (Southern Islands family):
* Hainan
* Oland
* Cape Verde
* Pitcairn
* Tahiti
Graphics Core Next 2
The 2nd generation of GCN was introduced with the Radeon HD 7790 and is also found in the Radeon HD 8770, R7 260/260X, R9 290/290X, R9 295X2, R7 360, and R9 390/390X, as well as Steamroller-based desktop "Kaveri" APUs and mobile "Kaveri" APUs and in the Puma-based "Beema" and "Mullins" APUs. It has multiple advantages over the original GCN, including FreeSync support, AMD TrueAudio and a revised version of AMD PowerTune technology.
GCN 2nd generation introduced an entity called "Shader Engine" (SE). A Shader Engine comprises one geometry processor, up to 11 CUs (as in the Hawaii chip, whose four Shader Engines total 44 CUs), rasterizers, ROPs, and L1 cache. Not part of a Shader Engine are the Graphics Command Processor, the 8 ACEs, the L2 cache and memory controllers, the audio and video accelerators, the display controllers, the 2 DMA controllers and the PCIe interface.
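The grouping described above can be sketched as a small data model. This is only an illustration: the class and field names are hypothetical, and the Hawaii figures assume the commonly cited layout of four Shader Engines with 11 CUs each (44 CUs in total).

```python
from dataclasses import dataclass


@dataclass
class ShaderEngine:
    """Per-engine resources (hypothetical model, not an AMD data structure)."""
    compute_units: int
    geometry_processors: int = 1  # one geometry processor per Shader Engine


@dataclass
class Gpu:
    """Chip-level view: Shader Engines plus the blocks shared outside them."""
    name: str
    shader_engines: list
    aces: int = 8             # Asynchronous Compute Engines, outside the SEs
    dma_controllers: int = 2  # also outside the SEs

    def total_compute_units(self) -> int:
        return sum(se.compute_units for se in self.shader_engines)


# Assumed Hawaii layout: 4 Shader Engines x 11 CUs.
hawaii = Gpu("Hawaii", [ShaderEngine(compute_units=11) for _ in range(4)])
total_cus = hawaii.total_compute_units()
print(total_cus)
```

Keeping the ACEs and DMA controllers on the chip-level object mirrors the text's distinction between per-engine and shared blocks.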
The A10-7850K "Kaveri" contains 8 CUs (compute units) and 8 Asynchronous Compute Engines for independent scheduling and work item dispatching.
At the AMD Developer Summit (APU) in November 2013, Michael Mantor presented the Radeon R9 290X.
Chips
Discrete GPUs (Sea Islands family):
* Bonaire
* Hawaii
integrated into APUs:
* Temash
* Kabini
* Liverpool (i.e. the APU found in the PlayStation 4)
* Durango (i.e. the APU found in the Xbox One and Xbox One S)
* Kaveri
* Godavari
* Mullins
* Beema
* Carrizo-L
Graphics Core Next 3
GCN 3rd generation was introduced in 2014 with the Radeon R9 285 and R9 M295X, which have the "Tonga" GPU. It features improved tessellation performance, lossless delta color compression to reduce memory bandwidth usage, an updated and more efficient instruction set, a new high-quality scaler for video, and a new multimedia engine (video encoder/decoder). Delta color compression is supported in Mesa. However, its double-precision performance is worse than that of the previous generation.
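The general idea behind lossless delta compression can be sketched as follows. This is a generic illustration and not AMD's actual hardware scheme, whose block layout and encoding are not described here. The function names and the sample pixel values are hypothetical.

```python
def delta_encode(block):
    """Keep the first value and store each later value as a difference."""
    base = block[0]
    deltas = [b - a for a, b in zip(block, block[1:])]
    return base, deltas


def delta_decode(base, deltas):
    """Rebuild the original values exactly (the scheme is lossless)."""
    values = [base]
    for d in deltas:
        values.append(values[-1] + d)
    return values


def bits_needed(deltas):
    """Smallest two's-complement width that can hold every delta."""
    bits = 1
    while any(not (-(1 << (bits - 1)) <= d <= (1 << (bits - 1)) - 1) for d in deltas):
        bits += 1
    return bits


# Eight neighbouring 8-bit channel values from a smooth gradient (made-up data).
pixels = [200, 201, 203, 202, 202, 204, 205, 205]
base, deltas = delta_encode(pixels)
width = bits_needed(deltas)
compressed_bits = 8 + len(deltas) * width  # one 8-bit base plus narrow deltas
raw_bits = 8 * len(pixels)
print(deltas, width, compressed_bits, raw_bits)
```

Because neighbouring pixels tend to have similar colors, the differences fit in fewer bits than the raw values, which is where the bandwidth saving comes from.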
Chips
discrete GPUs:
* Tonga (Volcanic Islands family), comes with UVD 5.0 (Unified Video Decoder)
* Fiji (Pirate Islands family), comes with UVD 6.0 and High Bandwidth Memory (HBM 1)
integrated into APUs:
* Carrizo, comes with UVD 6.0
* Bristol Ridge
* Stoney Ridge
Graphics Core Next 4
GPUs of the Arctic Islands family were introduced in Q2 of 2016 with the AMD Radeon 400 series. The 3D engine (i.e. GCA (Graphics and Compute Array) or GFX) is identical to that found in the Tonga chips, but Polaris features a newer display controller engine, UVD version 6.3, etc.
All Polaris-based chips other than Polaris 30 are produced on the 14 nm FinFET process, developed by Samsung Electronics and licensed to GlobalFoundries. The slightly newer, refreshed Polaris 30 is built on the 12 nm LP FinFET process node, developed by Samsung and GlobalFoundries. The fourth-generation GCN instruction set architecture is compatible with the third generation. It is an optimization for the 14 nm FinFET process, enabling higher GPU clock speeds than the 3rd GCN generation.
Architectural improvements include new hardware schedulers, a new primitive discard accelerator, a new display controller, and an updated UVD that can decode HEVC at 4K resolutions at 60 frames per second with 10 bits per color channel.
Chips
discrete GPUs:
* Polaris 10 (also codenamed Ellesmere) found on "Radeon RX 470" and "Radeon RX 480"-branded graphics cards
* Polaris 11 (also codenamed Baffin) found on "Radeon RX 460"-branded graphics cards (also Radeon RX 560D)
* Polaris 12 (also codenamed Lexa) found on "Radeon RX 550" and "Radeon RX 540"-branded graphics cards
* Polaris 20, which is a refreshed (14 nm LPP Samsung/GloFo FinFET process) Polaris 10 with higher clocks, used for "Radeon RX 570" and "Radeon RX 580"-branded graphics cards
* Polaris 21, which is a refreshed (14 nm LPP Samsung/GloFo FinFET process) Polaris 11, used for "Radeon RX 560"-branded graphics cards
* Polaris 22, found on "Radeon RX Vega M GH" and "Radeon RX Vega M GL"-branded graphics cards (as part of Kaby Lake-G)
* Polaris 23, which is a refreshed (14 nm LPP Samsung/GloFo FinFET process) Polaris 12, used for "Radeon Pro WX 3200" and "Radeon RX 540X"-branded graphics cards (also Radeon RX 640)
* Polaris 30, which is a refreshed (12 nm LP GloFo FinFET process) Polaris 20 with higher clocks, used for "Radeon RX 590"-branded graphics cards
In addition to dedicated GPUs, Polaris is utilized in the APUs of the PlayStation 4 Pro and Xbox One X, titled "Neo" and "Scorpio", respectively.
Precision performance
FP64 performance of all GCN 4th generation GPUs is 1/16 of FP32 performance.
Graphics Core Next 5
AMD began releasing details of their next generation of GCN Architecture, termed the 'Next-Generation Compute Unit', in January 2017.
The new design was expected to deliver higher instructions per clock, higher clock speeds, support for HBM2, and a larger memory address space. The discrete graphics chipsets also include "HBCC (High Bandwidth Cache Controller)", but not when integrated into APUs. Additionally, the new chips were expected to include improvements in the rasterisation and render output units. The stream processors are heavily modified from the previous generations to support Rapid Packed Math technology for 8-bit, 16-bit, and 32-bit numbers. This gives a significant performance advantage when lower precision is acceptable (for example, processing two half-precision numbers at the same rate as a single single-precision number).
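The packing idea can be illustrated in software: two half-precision values share one 32-bit word, so a single "packed" operation handles both lanes. The sketch below only models the data layout with Python's standard `struct` module. The helper names are hypothetical, the choice of which half goes in the low bits is an assumption, and it says nothing about how the hardware implements the operation.

```python
import struct


def pack_half2(lo, hi):
    """Pack two FP16 values into one 32-bit word (lo in the low 16 bits)."""
    (lo_bits,) = struct.unpack("<H", struct.pack("<e", lo))
    (hi_bits,) = struct.unpack("<H", struct.pack("<e", hi))
    return (hi_bits << 16) | lo_bits


def unpack_half2(word):
    """Split a 32-bit word back into its two FP16 lanes."""
    (lo,) = struct.unpack("<e", struct.pack("<H", word & 0xFFFF))
    (hi,) = struct.unpack("<e", struct.pack("<H", word >> 16))
    return lo, hi


def packed_add(a, b):
    """Add both FP16 lanes of two packed words, lane by lane."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)


x = pack_half2(1.5, 2.0)
y = pack_half2(0.25, -1.0)
result = unpack_half2(packed_add(x, y))
print(hex(x), result)
```

One 32-bit register slot therefore carries two results per operation, which is the source of the doubled throughput at half precision.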
Nvidia introduced tile-based rasterization and binning with Maxwell, and this was a major reason for Maxwell's efficiency increase. In January 2017, AnandTech assumed that Vega would finally catch up with Nvidia regarding energy efficiency optimizations due to the new "DSBR (Draw Stream Binning Rasterizer)" to be introduced with Vega.
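Binning in a tile-based rasterizer can be sketched generically: each primitive is assigned to the screen tiles its bounding box overlaps, so that each tile can later be processed with good cache locality. The sketch below is a textbook-style illustration, not AMD's DSBR or Nvidia's implementation. The function name, the 32-pixel tile size and the sample triangles are all assumptions.

```python
import math


def bin_triangles(triangles, width, height, tile=32):
    """Map each tile (tx, ty) to the indices of triangles whose bounding box overlaps it."""
    bins = {}
    for idx, tri in enumerate(triangles):
        xs = [x for x, _ in tri]
        ys = [y for _, y in tri]
        # Skip triangles whose bounding box lies entirely off-screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the screen, then convert to tile indices.
        x0 = max(0, math.floor(min(xs))) // tile
        x1 = min(width - 1, math.floor(max(xs))) // tile
        y0 = max(0, math.floor(min(ys))) // tile
        y1 = min(height - 1, math.floor(max(ys))) // tile
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                bins.setdefault((tx, ty), []).append(idx)
    return bins


triangles = [
    [(2, 2), (20, 4), (10, 25)],          # fits inside tile (0, 0)
    [(30, 30), (70, 35), (40, 60)],       # spans several tiles
    [(-50, -50), (-10, -20), (-30, -5)],  # entirely off-screen
]
bins = bin_triangles(triangles, width=128, height=64)
print(sorted(bins.items()))
```

A bounding-box test is conservative: it may list a triangle in a tile it does not actually touch, which costs some extra work but never misses coverage.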
Vega also added support for a new shader stage: primitive shaders. Primitive shaders provide more flexible geometry processing and replace the vertex and geometry shaders in a rendering pipeline. As of December 2018, primitive shaders could not be used because the required API changes had not yet been made.
Vega 10 and Vega 12 use the 14 nm FinFET process, developed by Samsung Electronics and licensed to GlobalFoundries. Vega 20 uses the 7 nm FinFET process developed by TSMC.
Chips
discrete GPUs:
* Vega 10 (14 nm Samsung/GloFo FinFET process) (also codenamed Greenland) found on "Radeon RX Vega 64", "Radeon RX Vega 56", "Radeon Vega Frontier Edition", "Radeon Pro V340", "Radeon Pro WX 9100", and "Radeon Pro WX 8200"-branded graphics cards
* Vega 12 (14 nm Samsung/GloFo FinFET process) found on "Radeon Pro Vega 20" and "Radeon Pro Vega 16"-branded mobile graphics cards
* Vega 20 (7 nm TSMC FinFET process) found on "Radeon Instinct MI50" and "Radeon Instinct MI60"-branded accelerator cards, and on "Radeon Pro Vega II" and "Radeon VII"-branded graphics cards
integrated into APUs:
* Raven Ridge came with VCN 1 which supersedes VCE and UVD and allows full fixed-function VP9 decode.
Precision performance
Double-precision floating-point (FP64) performance of all GCN 5th generation GPUs, except for Vega 20, is 1/16 of FP32 performance. For Vega 20 with Radeon Instinct this is 1/2 of FP32 performance. For Vega 20 with Radeon VII this is 1/4 of FP32 performance.
All GCN 5th generation GPUs support half-precision floating-point (FP16) calculations at double the FP32 rate.
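As a worked example of these ratios, the sketch below derives FP64 and FP16 throughput from an FP32 figure. The 12.0 TFLOPS input is a hypothetical round number chosen for easy arithmetic, not the specification of any card, and the dictionary keys and function name are likewise illustrative.

```python
# FP64 throughput as a fraction of FP32, per the ratios stated above.
FP64_RATIO = {
    "default": 1 / 16,                 # all GCN 5 GPUs except Vega 20
    "Vega 20 (Radeon Instinct)": 1 / 2,
    "Vega 20 (Radeon VII)": 1 / 4,
}
FP16_RATIO = 2.0  # FP16 runs at double the FP32 rate on GCN 5


def derived_rates(fp32_tflops, variant="default"):
    """Return FP64 and FP16 throughput (TFLOPS) derived from an FP32 figure."""
    return {
        "fp64": fp32_tflops * FP64_RATIO[variant],
        "fp16": fp32_tflops * FP16_RATIO,
    }


# Hypothetical FP32 figure, used only to make the arithmetic concrete.
fp32 = 12.0
default_rates = derived_rates(fp32)
instinct_rates = derived_rates(fp32, "Vega 20 (Radeon Instinct)")
radeon_vii_rates = derived_rates(fp32, "Vega 20 (Radeon VII)")
print(default_rates, instinct_rates, radeon_vii_rates)
```

For the same FP32 input, the three variants differ only in FP64 throughput, while the FP16 figure is identical across them.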
Comparison of GCN chips
* Table contains only discrete GPU chips (including mobile). APU(IGP) and console chips are not listed.
1 Old code names such as Treasure (Lexa) or Hawaii Refresh (Ellesmere) are not listed.
2 Initial launch date. Launch dates of variant chips such as Polaris 20 (April 2017) are not listed.
See also
* List of AMD graphics processing units
External links
Official AMD.com Graphics Core Next (GCN) website