AMD FireStream was AMD's brand name for its Radeon-based product line targeting stream processing and/or GPGPU in supercomputers. Originally developed by ATI Technologies around the Radeon X1900 XTX in 2006, the product line was previously branded as both ATI FireSTREAM and AMD Stream Processor. The AMD FireStream can also be used as a floating-point co-processor for offloading CPU calculations, as part of the Torrenza initiative. The FireStream line was discontinued in 2012, when GPGPU workloads were folded entirely into the AMD FirePro line.
Overview
The FireStream line is a series of add-on expansion cards released from 2006 to 2010, based on standard Radeon GPUs but designed to serve as general-purpose co-processors rather than to render and output 3D graphics. Like the FireGL/FirePro line, they were given more memory and memory bandwidth, but FireStream cards do not necessarily have video output ports. All support 32-bit single-precision floating point, and all but the first release support 64-bit double precision. The line was partnered with new APIs that offered higher performance than the existing OpenGL and Direct3D shader APIs, beginning with Close to Metal, followed by the Stream Computing SDK and OpenCL, and eventually integrated into the APP SDK.
For highly parallel floating-point workloads, the cards can speed up large computations by more than a factor of 10; Folding@home, the earliest and one of the most visible users of GPGPU, obtained 20 to 40 times the CPU performance.
Each pixel and vertex shader, or unified shader in later models, can perform arbitrary floating-point calculations.
History
Following the release of the Radeon R520 and GeForce G70 GPU cores with programmable shaders, their large floating-point throughput drew attention from academic and commercial groups, which experimented with using them for non-graphics work. The interest led ATI (and Nvidia) to create GPGPU products, able to evaluate general-purpose mathematical formulas in a massively parallel way, to handle heavy calculations traditionally done on CPUs and specialized floating-point math co-processors. GPGPUs were projected to deliver immediate performance gains of a factor of 10 or more compared with contemporary multi-socket CPU-only calculation.
With development of the high-performance X1900 XTX nearly finished, ATI based its first Stream Processor design on it, announcing it as the upcoming ATI FireSTREAM together with the new Close to Metal API at SIGGRAPH 2006. The core itself was mostly unchanged, except for doubled onboard memory and bandwidth, similar to the FireGL V7350; new driver and software support made up most of the difference.
Folding@home began using the X1900 for general computation with a pre-release of version 6.5 of the ATI Catalyst driver, and reported a 20 to 40 times improvement on the GPU over the CPU.
The first product was released in late 2006, rebranded as AMD Stream Processor after the merger with AMD.
The brand became AMD FireStream with the second generation of stream processors in 2007, based on the RV670 chip with new unified shaders and double-precision support.
Asynchronous DMA also improved performance by allowing the card to use a larger memory pool without the CPU's help. One model was released, the 9170, at an initial price of $1999. AMD also planned a stream processor on an MXM module for laptop computing by 2008, but it was never released.
The third generation quickly followed in 2008 with dramatic performance improvements from the RV770 core; the 9250 had nearly double the performance of the 9170 and became the first single-chip teraflop processor, despite its price dropping to under $1000.
A faster sibling, the 9270, was released shortly after, for $1999.
In 2010 the final generation of FireStream cards came out, the 9350 and 9370, based on the Cypress chip used in the Radeon HD 5800 series. This generation again doubled performance relative to the previous one, reaching 2 teraflops in the 9350 and 2.6 teraflops in the 9370, and was the first designed from the ground up for OpenCL. It was also the only generation to feature fully passive cooling; no actively cooled version was available.
The Northern and Southern Islands generations were skipped, and in 2012, AMD announced that the new FirePro W (workstation) and S (server) series based on the new
Graphics Core Next architecture would take the place of FireStream cards.
Models
* FireStream 9170: supports Direct3D 10.1, OpenGL 3.3 and APP Stream
* FireStream 92x0: supports Direct3D 10.1, OpenGL 3.3 and OpenCL 1.0
* FireStream 93x0: supports Direct3D 11, OpenGL 4.3 and OpenCL 1.2 with the last driver updates
Software
The AMD FireStream was launched with a wide range of software platform support. One of the supporting firms was PeakStream (acquired by Google in June 2007), which was the first to provide an open beta version of software supporting CTM and AMD FireStream as well as x86 and Cell (Cell Broadband Engine) processors. The FireStream was claimed to be 20 times faster than regular CPUs in typical applications when running PeakStream's software.
RapidMind also provided stream processing software that worked with ATI and Nvidia GPUs, as well as Cell processors.
Software Development Kit
After abandoning its short-lived Close to Metal API, AMD focused on OpenCL. AMD first released its Stream Computing SDK (v1.0) in December 2007 under the AMD EULA, to be run on Windows XP. [AMD APP SDK download page; Stream Computing SDK EULA, retrieved December 29, 2007] The SDK includes "Brook+", an AMD hardware-optimized version of the Brook language developed by Stanford University, itself a variant of ANSI C (the C language), open-sourced and optimized for stream computing. The AMD Core Math Library (ACML) and AMD Performance Library (APL), with optimizations for the AMD FireStream, and the COBRA video library (later renamed "Accelerated Video Transcoding", or AVT) for video transcoding acceleration were also to be included. Another important part of the SDK, the Compute Abstraction Layer (CAL), is a software development layer providing low-level access to the GPU architecture through the CTM hardware interface, for performance tuning of software written in various high-level programming languages.
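Brook+ expresses a computation as a kernel that is applied independently to each element of one or more input streams. The Python sketch below only models that execution idea on the CPU: it is not Brook+ syntax, the CAL API, or AMD code, and `run_kernel` and `saxpy_kernel` are hypothetical names chosen for illustration. Because each output element depends only on the input elements at the same index, the per-element calls have no ordering constraints, which is the property that lets a GPU spread them across its shader units.

```python
from typing import Callable, Sequence


def run_kernel(kernel: Callable[..., float], *streams: Sequence[float]) -> list[float]:
    """Apply `kernel` element-wise across equally sized input streams.

    Hypothetical CPU model of a stream kernel launch: each output element
    depends only on the elements at the same index, so the calls could run
    in any order (or in parallel) without changing the result.
    """
    lengths = {len(s) for s in streams}
    if len(lengths) != 1:
        raise ValueError("all streams must have the same length")
    return [kernel(*elems) for elems in zip(*streams)]


def saxpy_kernel(alpha: float) -> Callable[[float, float], float]:
    """Build a per-element kernel computing alpha * x + y (one multiply-add)."""
    def kernel(x: float, y: float) -> float:
        return alpha * x + y
    return kernel


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
print(run_kernel(saxpy_kernel(2.0), x, y))  # [12.0, 24.0, 36.0, 48.0]
```

In Brook+ itself, stream parameters are declared with angle brackets (for example `float x<>`) and the kernel body is compiled for the GPU; the sketch keeps only the data-parallel structure.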
In August 2011, AMD released version 2.5 of the AMD APP Software Development Kit, which includes support for OpenCL 1.1, a parallel computing language developed by the Khronos Group. Compute shaders, officially called DirectCompute, a feature of Microsoft's DirectX 11 API, are included in graphics drivers with DirectX 11 support.
Benchmarks
In a system demonstrated by AMD with two dual-core AMD Opteron processors and two Radeon R600 GPU cores running Microsoft Windows XP Professional, 1 teraflop (TFLOPS) could be achieved using multiply-add (MADD) calculations. By comparison, an Intel Core 2 Quad Q9650 3.0 GHz processor of the time could achieve 48 GFLOPS.
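Peak figures such as the 1 TFLOPS and 48 GFLOPS quoted above are usually derived by multiplying the number of arithmetic units by the clock rate and by the floating-point operations each unit completes per cycle, with a multiply-add counting as two operations. The Python sketch below applies that formula. The unit counts and clocks it uses (320 units at 742 MHz per R600, 800 units at 625 MHz for the FireStream 9250, four cores issuing four double-precision operations per cycle at 3.0 GHz for the Q9650) are assumptions taken from commonly cited specifications, not from this article. Under those assumptions the GPU figures are single precision while the CPU figure is double precision, so the ratio is not a like-for-like comparison, and no peak figure says anything about sustained throughput.

```python
def peak_gflops(units: int, clock_mhz: int, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS: units x clock (MHz) x FLOPs per unit per cycle."""
    return units * clock_mhz * flops_per_cycle / 1000


# A multiply-add (MADD) counts as two floating-point operations.
MADD = 2

# Assumed spec-sheet figures, not taken from this article:
# R600 (Radeon HD 2900 XT): 320 stream processing units at 742 MHz, single precision.
r600_pair = 2 * peak_gflops(units=320, clock_mhz=742, flops_per_cycle=MADD)
# FireStream 9250 (RV770): 800 stream processing units at 625 MHz, single precision.
firestream_9250 = peak_gflops(units=800, clock_mhz=625, flops_per_cycle=MADD)
# Core 2 Quad Q9650: 4 cores at 3.0 GHz, 4 double-precision FLOPs per core per cycle.
q9650 = peak_gflops(units=4, clock_mhz=3000, flops_per_cycle=4)

print(f"Two R600 GPUs:   {r600_pair:.0f} GFLOPS")          # 950
print(f"FireStream 9250: {firestream_9250:.0f} GFLOPS")    # 1000
print(f"Core 2 Q9650:    {q9650:.0f} GFLOPS")              # 48
print(f"9250 / Q9650:    {firestream_9250 / q9650:.1f}x")  # 20.8x
```

Working in integer megahertz keeps the intermediate products exact, so the printed values follow directly from the assumed inputs.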
In a 2007 demonstration, Kaspersky SafeStream anti-virus scanning, optimized for AMD stream processors, was able to scan 21 times faster with RV670-based acceleration than with the search running entirely on an Opteron.
Limitations
* Recursive functions are not supported in Brook+ because all function calls are inlined at compile time. Using CAL, functions (recursive or otherwise) are supported up to 32 levels deep. [AMD Intermediate Language Reference Guide, August 2008]
* Only bilinear texture filtering is supported; mipmapped textures and anisotropic filtering are not supported.
* Functions cannot have a variable number of arguments, a restriction similar to the lack of support for recursive functions.
* Floating-point-to-integer conversion on GPUs differs from that on x86 CPUs and is not fully IEEE 754 compliant.
* Global synchronization on the GPU is not very efficient, which forces the programmer to split the kernel and perform synchronization on the CPU. Given the variable number of multiprocessors and other factors, there may be no perfect solution to this problem.
* The bus bandwidth and latency between the CPU and the GPU may become a bottleneck.
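Splitting work into several kernel launches, with the CPU waiting between them, is the usual way to obtain a global barrier under the synchronization limitation listed above. The Python sketch below models a sum reduction in that style. It runs entirely on the CPU; `reduce_pass` and `multipass_sum` are hypothetical names, each call to `reduce_pass` stands for one kernel launch, and the block size of 4 is an arbitrary assumption rather than a FireStream parameter.

```python
def reduce_pass(data: list[float], block_size: int) -> list[float]:
    """Model of one kernel launch: every block of the input is summed independently.

    Blocks share no data, so on a GPU they could run on different
    multiprocessors with no synchronization between them.
    """
    return [sum(data[i:i + block_size]) for i in range(0, len(data), block_size)]


def multipass_sum(data: list[float], block_size: int = 4) -> tuple[float, int]:
    """Sum `data` with repeated passes; return (total, number of passes).

    The host-side loop stands in for the global barrier: each pass starts
    only after the previous one has produced all of its partial sums.
    """
    if block_size < 2:
        raise ValueError("block_size must be at least 2 so every pass shrinks the data")
    if not data:
        return 0.0, 0
    passes = 0
    while len(data) > 1:
        # On real hardware the CPU would wait here for the previous launch to finish.
        data = reduce_pass(data, block_size)
        passes += 1
    return data[0], passes


total, passes = multipass_sum([float(i) for i in range(1, 101)])
print(total, passes)  # 5050.0 4
```

Each extra pass adds launch overhead and a CPU round trip; with a block size of 4, a 100-element input shrinks to 25, 7, 2 and finally 1 partial sums, so the synchronization cost grows with the depth of the reduction.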
See also
* Stream processing
* ROCm
* Heterogeneous System Architecture
* Nvidia Tesla, a similar solution by Nvidia
* Intel Xeon Phi, a similar solution by Intel
* Open Computing Language (OpenCL), an industry standard
* Compute Unified Device Architecture (CUDA), a proprietary Nvidia-only solution
* List of AMD graphics processing units
References
External links
* ATI Stream Technology FAQ
* ATI Stream published papers and presentations
* ATI Stream SDK
* AnandTech article on distributed computing
* AMD Intermediate Language Reference Guide (CAL) v2.0, Feb '09