The FMA instruction set is an extension to the 128- and 256-bit Streaming SIMD Extensions instructions in the x86 microprocessor instruction set to perform fused multiply–add (FMA) operations. There are two variants:
* FMA4 is supported in AMD processors starting with the Bulldozer architecture. FMA4 was implemented in hardware before FMA3 was. Support for FMA4 has been removed since Zen 1.
* FMA3 is supported in AMD processors starting with the Piledriver architecture and in Intel processors starting with Haswell, with Broadwell following in 2014.
Instructions
FMA3 and FMA4 instructions have almost identical functionality, but are not compatible. Both contain fused multiply–add (FMA) instructions for floating-point scalar and SIMD operations, but FMA3 instructions have three operands, while FMA4 ones have four. The FMA operation has the form ''d'' = round(''a'' · ''b'' + ''c''), where the round function rounds the result so that it fits within the destination register when it has too many significant bits.
The four-operand form (FMA4) allows ''a'', ''b'', ''c'' and ''d'' to be four different registers, while the three-operand form (FMA3) requires that ''d'' be the same register as ''a'', ''b'' or ''c''. The three-operand form makes the code shorter and the hardware implementation slightly simpler, while the four-operand form provides more programming flexibility.
See XOP instruction set for more discussion of compatibility issues between Intel and AMD.
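The single rounding in ''d'' = round(''a'' · ''b'' + ''c'') can be illustrated with a small sketch. It does not execute any FMA3 or FMA4 instruction; it only emulates the arithmetic with exact rational numbers, and the function names are hypothetical. The emulation assumes finite inputs and does not model signed zeros, infinities or NaNs.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Emulate d = round(a*b + c): exact arithmetic, then a single rounding."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)  # no rounding happens here
    return float(exact)  # the only rounding step (to the nearest double)


def separate_multiply_add(a: float, b: float, c: float) -> float:
    """Unfused version: the product is rounded, then the sum is rounded again."""
    return a * b + c


# Inputs chosen so that the exact product, 1 - 2**-54, is not a double.
a = 1.0 + 2.0**-27
b = 1.0 - 2.0**-27
c = -1.0

print(fused_multiply_add(a, b, c))     # -2**-54, about -5.55e-17
print(separate_multiply_add(a, b, c))  # 0.0, the product was rounded to 1.0 first
```

With separate instructions the low-order bits of the product are lost before the addition, which is why the unfused result collapses to zero in this case.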
FMA3 instruction set
CPUs with FMA3
* AMD
** Piledriver (2012) and newer microarchitectures
*** 2nd gen APUs, "Trinity" (32nm), May 15, 2012
*** 2nd gen "Bulldozer" (bdver2) with Piledriver cores, October 23, 2012
* Intel
** Haswell (2013) and newer processors, except Pentiums and Celerons
Excerpt from FMA3
Supported commands include
;Note:
* VFNMADD is result = − a · b + c, not result = − (a · b + c).
* VFNMSUB generates a −0 when all inputs are zero.
The explicit order of operands is encoded in the mnemonic using the numbers "132", "213", and "231", together with the operand format (packed or scalar) and size (single or double).
This results in
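The operand-order digits can be read with a short model. It assumes the usual Intel convention: the first two digits name the operands that are multiplied, the third names the operand that is added, and the result is written back to operand 1. The function names are hypothetical, and the arithmetic is emulated with exact fractions rather than executed as real FMA3 instructions (finite inputs only, signed zeros not modelled).

```python
from fractions import Fraction


def fused(a: float, b: float, c: float) -> float:
    """round(a*b + c) with a single rounding, emulated with exact fractions."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def vfmadd(order: str, op1: float, op2: float, op3: float) -> float:
    """Value written to operand 1 by a VFMADD<order> form (illustrative model)."""
    operands = {"1": op1, "2": op2, "3": op3}
    x, y, z = (operands[digit] for digit in order)
    return fused(x, y, z)  # first two digits are multiplied, third is added


def vfnmadd(order: str, op1: float, op2: float, op3: float) -> float:
    """VFNMADD<order>: -(x*y) + z, i.e. only the product is negated."""
    operands = {"1": op1, "2": op2, "3": op3}
    x, y, z = (operands[digit] for digit in order)
    return fused(-x, y, z)


print(vfmadd("132", 2.0, 3.0, 4.0))   # 2*4 + 3 = 11.0
print(vfmadd("213", 2.0, 3.0, 4.0))   # 3*2 + 4 = 10.0
print(vfmadd("231", 2.0, 3.0, 4.0))   # 3*4 + 2 = 14.0
print(vfnmadd("231", 2.0, 3.0, 4.0))  # -(3*4) + 2 = -10.0
```

Having three orderings lets a compiler choose which of the three source registers is overwritten by the result, which offsets the lack of a separate destination operand in FMA3.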
FMA4 instruction set
CPUs with FMA4
* AMD
** "Heavy Equipment" processors
*** Bulldozer-based processors, October 12, 2011
*** Piledriver-based processors
*** Steamroller-based processors
*** Excavator-based processors (including "v2")
** Zen: WikiChip's testing shows that FMA4 still appears to work (under the conditions of the tests) despite not being officially supported and not even being reported by CPUID. This has also been confirmed by Agner. However, other tests gave wrong results. (AMD official website FMA4 support note; Zen CPUs: AMD ThreadRipper 1900x, R7 Pro 1800, 1700, R5 Pro 1600, 1500, R3 Pro 1300, 1200, R3 2200G, R5 2400G.)
* Intel
** Intel has not released CPUs with support for FMA4.
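Whether a processor officially reports FMA3 or FMA4 can be checked from its CPUID feature flags. The sketch below assumes a Linux system, where the kernel lists these flags in /proc/cpuinfo and names them "fma" (FMA3) and "fma4". The function name is hypothetical. It only reflects what the CPU advertises, so it cannot detect unofficial behaviour such as Zen executing FMA4 instructions without reporting them.

```python
def fma_flags(cpuinfo_text: str) -> dict:
    """Report whether 'fma' (FMA3) and 'fma4' appear in the CPU flag list."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return {"fma3": "fma" in flags, "fma4": "fma4" in flags}


if __name__ == "__main__":
    try:
        # Assumption: Linux-style /proc/cpuinfo; other systems need another source.
        with open("/proc/cpuinfo") as handle:
            print(fma_flags(handle.read()))
    except OSError:
        print("no /proc/cpuinfo on this system")
```

The flags are matched as whole words, so a CPU that lists only "fma4" is not mistaken for one that supports FMA3.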
Excerpt from FMA4
History
The incompatibility between Intel's FMA3 and AMD's FMA4 is due to both companies changing plans without coordinating coding details with each other. AMD changed their plans from FMA3 to FMA4 while Intel changed their plans from FMA4 to FMA3 almost at the same time. The history can be summarized as follows:
* August 2007: AMD announces the SSE5 instruction set, which includes 3-operand FMA instructions. A new coding scheme (DREX) is introduced for allowing instructions to have three operands.
* April 2008: Intel announces their AVX and FMA instruction sets, including 4-operand FMA instructions. The coding of these instructions uses the new VEX coding scheme, which is more flexible than AMD's DREX scheme.
* December 2008: Intel changes the specification for their FMA instructions from 4-operand to 3-operand instructions. The VEX coding scheme is still used.
* May 2009: AMD changes the specification of their FMA instructions from the 3-operand DREX form to the 4-operand VEX form, compatible with the April 2008 Intel specification rather than the December 2008 Intel specification.
* October 2011: AMD Bulldozer processor supports FMA4.
* January 2012: AMD announces FMA3 support in future processors codenamed Trinity and Vishera; they are based on the Piledriver architecture.
* May 2012: AMD Piledriver processor supports both FMA3 and FMA4.
* June 2013: Intel Haswell processor supports FMA3.
* February 2017: The first generation of AMD Ryzen processors officially supports FMA3, but not FMA4, according to the CPUID instruction. There has been confusion regarding whether FMA4 was implemented or not on this processor due to errata in the initial patch to the GNU Binutils package, which has since been rectified. While the FMA4 instructions seem to work according to some tests, they can also give wrong results. Additionally, the initial Ryzen CPUs could be crashed by a particular sequence of FMA3 instructions. This has since been resolved by updated CPU microcode.
Compiler and assembler support
Different compilers provide different levels of support for FMA:
* GCC supports FMA4 with -mfma4 since version 4.5.0 and FMA3 with -mfma since version 4.7.0.
* Microsoft Visual C++ 2010 SP1 supports FMA4 instructions.
* Microsoft Visual C++ 2012 supports FMA3 instructions (if the processor also supports the AVX2 instruction set extension).
* Microsoft Visual C++ since VC 2013
* PathScale supports FMA4 with -mfma.
* LLVM 3.1 adds FMA4 support, along with preliminary FMA3 support.
* Open64 5.0 adds "limited support".
* Intel compilers support only FMA3 instructions.
* NASM supports FMA3 instructions since version 2.03 and FMA4 instructions since 2.06.
* FASM supports both FMA3 and FMA4 instructions.