AMD FireStream was AMD's brand name for its Radeon-based product line targeting stream processing and/or GPGPU in supercomputers. Originally developed by ATI Technologies around the Radeon X1900 XTX in 2006, the product line was previously branded as both ATI FireSTREAM and AMD Stream Processor. The AMD FireStream can also be used as a floating-point co-processor for offloading CPU calculations, which is part of the Torrenza initiative. The FireStream line has been discontinued since 2012, when GPGPU workloads were entirely folded into the AMD FirePro line.

Overview

The FireStream line is a series of add-on expansion cards released from 2006 to 2010, based on standard Radeon GPUs but designed to serve as a general-purpose co-processor, rather than rendering and outputting 3D graphics. Like the FireGL/FirePro line, they were given more memory and memory bandwidth, but the FireStream cards do not necessarily have video output ports. All support 32-bit single-precision floating point, and all but the first release support 64-bit double-precision. The line was partnered with new APIs to provide higher performance than existing OpenGL and Direct3D shader APIs could provide, beginning with Close to Metal, followed by OpenCL and the Stream Computing SDK, and eventually integrated into the APP SDK.

For highly parallel floating-point math workloads, the cards can speed up large computations by more than 10 times; Folding@home, the earliest and one of the most visible users of GPGPU, obtained 20-40 times the CPU performance. Each pixel and vertex shader, or unified shader in later models, can perform arbitrary floating-point calculations.
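The stream-processing model described above can be illustrated with a minimal sketch in plain Python. It is not Brook+, CAL or OpenCL code; the names `saxpy_kernel` and `run_kernel` are hypothetical and chosen only for illustration. The point is that the kernel computes each output element from its own inputs alone, which is what lets a GPU hand every element to a separate shader invocation.

```python
def saxpy_kernel(a, x, y):
    """Per-element kernel: one multiply-add, independent of all other elements."""
    return a * x + y


def run_kernel(kernel, a, xs, ys):
    """Apply the kernel to every element pair of the two input streams.

    On a GPU each element would be handled by its own shader invocation;
    this sequential loop models only the result, not the speed.
    """
    if len(xs) != len(ys):
        raise ValueError("input streams must have the same length")
    return [kernel(a, x, y) for x, y in zip(xs, ys)]


if __name__ == "__main__":
    # Prints [12.0, 24.0, 36.0]
    print(run_kernel(saxpy_kernel, 2.0, [1.0, 2.0, 3.0], [10.0, 20.0, 30.0]))
```

Because no element depends on another, the loop iterations could run in any order or all at once, which is the property the shaders exploit.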

History

Following the release of the Radeon R520 and GeForce G70 GPU cores with programmable shaders, their large floating-point throughput drew attention from academic and commercial groups, who experimented with using them for non-graphics work. The interest led ATI (and Nvidia) to create GPGPU products, able to calculate general-purpose mathematical formulas in a massively parallel way, to process heavy calculations traditionally done on CPUs and specialized floating-point math co-processors. GPGPUs were projected to have immediate performance gains of a factor of 10 or more compared to contemporary multi-socket CPU-only calculation.

With the development of the high-performance X1900 XTX nearly finished, ATI based its first Stream Processor design on it, announcing it as the upcoming ATI FireSTREAM together with the new Close to Metal API at SIGGRAPH 2006. The core itself was mostly unchanged, except for doubling the onboard memory and bandwidth, similar to the FireGL V7350; new driver and software support made up most of the difference. Folding@home began using the X1900 for general computation, using a pre-release of version 6.5 of the ATI Catalyst driver, and reported a 20-40x improvement in GPU over CPU performance. The first product was released in late 2006, rebranded as AMD Stream Processor after the merger with AMD.

The brand became AMD FireStream with the second generation of stream processors in 2007, based on the RV670 chip with new unified shaders and double-precision support. Asynchronous DMA also improved performance by allowing a larger memory pool without the CPU's help. One model was released, the 9170, for the initial price of $1999. Plans included the development of a stream processor on an MXM module by 2008, for laptop computing, but it was never released.

The third generation quickly followed in 2008 with dramatic performance improvements from the RV770 core; the 9250 had nearly double the performance of the 9170, and became the first single-chip teraflop processor, despite dropping the price to under $1000. A faster sibling, the 9270, was released shortly after, for $1999.

In 2010 the final generation of FireStreams came out, the 9350 and 9370 cards, based on the Cypress chip featured in the HD 5800. This generation again doubled the performance relative to the previous, to 2 teraflops in the 9350 and 2.6 teraflops in the 9370, and was the first built from the ground up for OpenCL. This generation was also the only one to feature fully passive cooling, and active cooling was unavailable. The Northern and Southern Islands generations were skipped, and in 2012, AMD announced that the new FirePro W (workstation) and S (server) series based on the new Graphics Core Next architecture would take the place of FireStream cards.

Models

* FireStream 9170 supports Direct3D 10.1, OpenGL 3.3 and APP Stream
* FireStream 92x0 supports Direct3D 10.1, OpenGL 3.3 and OpenCL 1.0
* FireStream 93x0 supports Direct3D 11, OpenGL 4.3 and OpenCL 1.2 with the last driver updates

Software

The AMD FireStream was launched with a wide range of software platform support. One of the supporting firms was PeakStream (acquired by Google in June 2007), which was the first to provide an open beta version of software to support CTM and AMD FireStream as well as x86 and Cell (Cell Broadband Engine) processors. The FireStream was claimed to be 20 times faster in typical applications than regular CPUs when running PeakStream's software. RapidMind also provided stream processing software that worked with ATI and NVIDIA GPUs, as well as Cell processors.

Software Development Kit

After abandoning its short-lived Close to Metal API, AMD focused on OpenCL. AMD first released its Stream Computing SDK (v1.0) in December 2007 under the AMD EULA, to be run on Windows XP (AMD APP SDK download page and Stream Computing SDK EULA, retrieved December 29, 2007). The SDK includes "Brook+", an AMD hardware-optimized version of the Brook language developed by Stanford University, itself a variant of ANSI C (the C language), open-sourced and optimized for stream computing. The AMD Core Math Library (ACML) and AMD Performance Library (APL) with optimizations for the AMD FireStream, and the COBRA video library (later renamed "Accelerated Video Transcoding" or AVT) for video transcoding acceleration, will also be included. Another important part of the SDK, the Compute Abstraction Layer (CAL), is a software development layer aimed at low-level access, through the CTM hardware interface, to the GPU architecture, for performance-tuning software written in various high-level programming languages.

In August 2011, AMD released version 2.5 of the ATI APP Software Development Kit, which includes support for OpenCL 1.1, a parallel computing language developed by the Khronos Group. The concept of compute shaders, officially called DirectCompute, in Microsoft's next-generation API called DirectX 11 is already included in graphics drivers with DirectX 11 support.

AMD APP SDK

Benchmarks

According to an AMD-demonstrated system with two dual-core AMD Opteron processors and two Radeon R600 GPU cores running on Windows XP Professional, 1 teraflop (TFLOP) can be achieved by a universal multiply-add (MADD) calculation. By comparison, an Intel Core 2 Quad Q9650 3.0 GHz processor at the time could achieve 48 GFLOPS. In a 2007 demonstration, Kaspersky SafeStream anti-virus scanning that had been optimized for AMD stream processors was able to scan 21 times faster with R670-based acceleration than with the search running entirely on an Opteron.
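As a rough check on these figures, theoretical peak throughput is usually estimated as execution units times floating-point operations per unit per cycle times clock rate. The sketch below applies that formula under assumptions that are not stated in the text: a quad-core CPU retiring 4 floating-point operations per core per cycle, and an R600-class GPU with 320 stream processors at 742 MHz (figures assumed from the Radeon HD 2900 XT), each MADD counted as 2 operations. The helper name `peak_gflops` is hypothetical.

```python
def peak_gflops(units, flops_per_unit_per_cycle, clock_ghz):
    """Theoretical peak in GFLOPS: units x FLOPs per unit per cycle x clock (GHz)."""
    return units * flops_per_unit_per_cycle * clock_ghz


# Assumption: 4 cores, 4 FLOPs per core per cycle, 3.0 GHz.
cpu_gflops = peak_gflops(4, 4, 3.0)

# Assumption: 320 stream processors, 1 MADD (2 FLOPs) per cycle, 0.742 GHz, two GPUs.
gpu_pair_gflops = 2 * peak_gflops(320, 2, 0.742)

if __name__ == "__main__":
    print(f"CPU peak:      {cpu_gflops:.1f} GFLOPS")
    print(f"Two GPUs peak: {gpu_pair_gflops:.1f} GFLOPS")
    print(f"Ratio:         {gpu_pair_gflops / cpu_gflops:.1f}x")
```

Under these assumptions the CPU figure comes out at 48 GFLOPS, matching the number quoted above, and the two GPUs at roughly 950 GFLOPS, which is in the neighbourhood of the quoted 1 TFLOP. These are peak values; sustained performance on real workloads is lower.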

Limitations

* Recursive functions are not supported in Brook+ because all function calls are inlined at compile time. Using CAL, functions (recursive or otherwise) are supported to 32 levels (AMD Intermediate Language Reference Guide, August 2008).
* Only bilinear texture filtering is supported; mipmapped textures and anisotropic filtering are not supported.
* Functions cannot have a variable number of arguments. The same problem occurs for recursive functions.
* Conversion of floating-point numbers to integers on GPUs is done differently than on x86 CPUs; it is not fully IEEE-754 compliant.
* Doing "global synchronization" on the GPU is not very efficient, which forces the GPU to divide the kernel and do synchronization on the CPU. Given the variable number of multiprocessors and other factors, there may not be a perfect solution to this problem.
* The bus bandwidth and latency between the CPU and the GPU may become a bottleneck.
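The global-synchronization limitation can be illustrated with a multi-pass reduction, sketched here in plain Python rather than Brook+ or CAL. The function names `reduce_pass` and `multipass_sum` and the default group size are hypothetical. Each call to `reduce_pass` stands in for one kernel launch in which groups work independently; the host-side loop between launches plays the role of the synchronization done on the CPU.

```python
def reduce_pass(values, group_size):
    """Model of one kernel launch: each group sums its own slice independently."""
    return [sum(values[i:i + group_size])
            for i in range(0, len(values), group_size)]


def multipass_sum(values, group_size=4):
    """Host-side driver: relaunch the pass until one partial sum remains.

    Returns (total, number_of_passes). The gap between passes is where the
    CPU acts as the global barrier, since groups cannot wait on each other.
    """
    if group_size < 2:
        raise ValueError("group_size must be at least 2 so each pass shrinks the data")
    partial = list(values)
    if not partial:
        return 0, 0
    passes = 0
    while len(partial) > 1:
        partial = reduce_pass(partial, group_size)
        passes += 1
    return partial[0], passes


if __name__ == "__main__":
    # 16 values with groups of 4: 16 -> 4 -> 1, so two passes.
    print(multipass_sum(list(range(1, 17)), group_size=4))  # (136, 2)
```

Each extra pass costs a launch and a round trip to the host, which is also where the bus latency mentioned in the last item starts to matter.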

External links

* ATI Stream Technology FAQ
* ATI Stream published papers and presentations
* ATI Stream SDK
* AnandTech article on distributed computing
* AMD Intermediate Language Reference Guide (CAL) v2.0 Feb '09