SSE5
   HOME

TheInfoList



OR:

The SSE5 (short for Streaming SIMD Extensions version 5) was a
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it shoul ...
instruction set extension proposed by
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets. While it initially manufactur ...
on August 30, 2007 as a supplement to the 128-bit SSE core instructions in the
AMD64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mod ...
architecture. AMD chose not to implement SSE5 as originally proposed. In May 2009, AMD replaced SSE5 with three smaller instruction set extensions named as XOP, FMA4, and
F16C The F16C (previously/informally known as CVT16) instruction set is an x86 instruction set architecture extension which provides support for converting between half-precision and standard IEEE single-precision floating-point formats. History T ...
, which retain the proposed functionality of SSE5, but encode the instructions differently for better compatibility with Intel's proposed
AVX AVX may refer to: Technology * Advanced Vector Extensions, an instruction set extension in the x86 microprocessor architecture ** AVX2, an expansion of the AVX instruction set ** AVX-512, 512-bit extensions to the 256-bit AVX * AVX Corporation, a m ...
instruction set. The three SSE5-derived instruction sets were introduced in the
Bulldozer A bulldozer or dozer (also called a crawler) is a large, motorized machine equipped with a metal blade to the front for pushing material: soil, sand, snow, rubble, or rock during construction work. It travels most commonly on continuous track ...
processor core, released in October 2011 on a
32 nm The 32 nm node is the step following the 45 nm process in CMOS ( MOSFET) semiconductor device fabrication. "32- nanometre" refers to the average half-pitch (i.e., half the distance between identical features) of a memory cell at this technology ...
process.


Compatibility

AMD's SSE5 extension bundle does not include the full set of
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
's SSE4 instructions, making it a competitor to SSE4 rather than a successor. This complicates software development. It is recommended practice for a program to test for the presence of instruction set extensions by means of the CPUID instruction before entering a code path which depends upon those instructions to function correctly. For maximum portability, an optimized application will require three code paths: a base code path for compatibility with older processors (from either vendor), a separately optimized Intel code path exploiting SSE4 or AVX, and a separately optimized AMD code path exploiting SSE5. Due to this proliferation, benchmarks between Intel and AMD processors increasingly reflect the cleverness or implementation quality of the divergent code paths rather than the strength of the underlying platform.


SSE5 enhancements

The proposed SSE5 instruction set consisted of 170 instructions (including 46 base instructions), many of which are designed to improve single-threaded performance. Some SSE5 instructions are 3-operand instructions, the use of which will increase the average number of
instructions per cycle In computer architecture, instructions per cycle (IPC), commonly called instructions per clock is one aspect of a processor's performance: the average number of instructions executed for each clock cycle. It is the multiplicative inverse of cycl ...
achievable by
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
code. Selected new instructions include: *
Fused multiply–accumulate Fuse or FUSE may refer to: Devices * Fuse (electrical), a device used in electrical systems to protect against excessive current ** Fuse (automotive), a class of fuses for vehicles * Fuse (hydraulic), a device used in hydraulic systems to protect ...
(FMACxx) instructions * Integer multiply–accumulate (IMAC, IMADC) instructions * Permutation (PPERM, PERMPx) and conditional move (PCMOV) instructions * Precision control, rounding, and conversion instructions AMD claims SSE5 will provide dramatic performance improvements, particularly in
high-performance computing High-performance computing (HPC) uses supercomputers and computer clusters to solve advanced computation problems. Overview HPC integrates systems administration (including network and security knowledge) and parallel programming into a mult ...
(HPC),
multimedia Multimedia is a form of communication that uses a combination of different content forms such as text, audio, images, animations, or video into a single interactive presentation, in contrast to tradition ...
, and
computer security Computer security, cybersecurity (cyber security), or information technology security (IT security) is the protection of computer systems and networks from attack by malicious actors that may result in unauthorized information disclosure, the ...
applications, including a 5x performance gain for
Advanced Encryption Standard The Advanced Encryption Standard (AES), also known by its original name Rijndael (), is a specification for the encryption of electronic data established by the U.S. National Institute of Standards and Technology (NIST) in 2001. AES is a variant ...
(AES) encryption and a 30% performance gain for discrete cosine transform (DCT) used to process video streams. For more detailed information, consult the instruction sets as subsequently divided. * XOP: A revision of most of the SSE5 instruction set *
FMA3 The FMA instruction set is an extension to the 128 and 256-bit Streaming SIMD Extensions instructions in the x86 microprocessor instruction set to perform fused multiply–add (FMA) operations."FMA3 and FMA4 are not instruction sets, they are i ...
: Floating-point vector multiply–accumulate. *
F16C The F16C (previously/informally known as CVT16) instruction set is an x86 instruction set architecture extension which provides support for converting between half-precision and standard IEEE single-precision floating-point formats. History T ...
:
Half-precision In computing, half precision (sometimes called FP16) is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory. It is intended for storage of floating-point values in applications w ...
floating-point conversion.


2009 revision

The SSE5 specification included a proposed extension to the general coding scheme of
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
instructions in order to allow instructions to have more than two operands. In 2008,
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
announced their planned
AVX AVX may refer to: Technology * Advanced Vector Extensions, an instruction set extension in the x86 microprocessor architecture ** AVX2, an expansion of the AVX instruction set ** AVX-512, 512-bit extensions to the 256-bit AVX * AVX Corporation, a m ...
instruction set which proposed a different way of coding instructions with more than two operands. The two proposed coding schemes, SSE5 and AVX, are mutually incompatible, although the AVX scheme has certain advantages over the SSE5 scheme: most importantly, AVX has plenty of space for future extensions, including larger vector sizes. In May 2009, AMD published a revised specification for the planned future instructions. This revision changes the coding scheme to make it compatible with the AVX scheme, but with a differing prefix byte in order to avoid overlap between instructions introduced by AMD and instructions introduced by Intel. The revised instruction set no longer carries the name SSE5, which has been criticized for being misleading, but most of the instructions in the new revision are functionally identical to the original SSE5 specification—only the way the instructions are coded differs. The planned additions to the AMD instruction set consists of three subsets: # XOP: Integer vector multiply–accumulate instructions, integer vector horizontal addition, integer vector compare, shift and rotate instructions, byte permutation and conditional move instructions, floating point fraction extraction. # FMA4: Floating-point vector multiply–accumulate. #
F16C The F16C (previously/informally known as CVT16) instruction set is an x86 instruction set architecture extension which provides support for converting between half-precision and standard IEEE single-precision floating-point formats. History T ...
:
Half-precision In computing, half precision (sometimes called FP16) is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory. It is intended for storage of floating-point values in applications w ...
floating-point conversion. These new instruction sets include support for future extensions for the vector size from 128 bits to 256 bits. It is unclear from these preliminary specifications whether the
Bulldozer A bulldozer or dozer (also called a crawler) is a large, motorized machine equipped with a metal blade to the front for pushing material: soil, sand, snow, rubble, or rock during construction work. It travels most commonly on continuous track ...
processor will support 256-bit vector registers (YMM registers).


See also

*
x86 instruction listings The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor. The x86 instructi ...
* Fused multiply–add


References


External links


A New SSE Instruction Set: AMD Announces SSE5
AnandTech
AMD64 Architecture Programmer’s Manual Volume 6: 128-Bit and 256-Bit XOP and FMA4 InstructionsAMD and Intel incompatible - What to do? AMD Developer Forums
{{Multimedia extensions, state=uncollapsed X86 instructions SIMD computing AMD technologies