HOME

TheInfoList



OR:

SSE2 (Streaming SIMD Extensions 2) is one of the Intel
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
(Single Instruction, Multiple Data)
processor supplementary instruction A processor supplementary capability is a feature that has been added to an existing central processing unit (CPU) design after the initial introduction of that design to the marketplace. A supplementary capability increases the usefulness o ...
sets first introduced by
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
with the initial version of the
Pentium 4 Pentium 4 is a series of single-core CPUs for desktops, laptops and entry-level servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008. The production of Netburst processors was active from 2000 ...
in 2000. It extends the earlier SSE instruction set, and is intended to fully replace MMX. Intel extended SSE2 to create
SSE3 SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revis ...
in 2004. SSE2 added 144 new instructions to SSE, which has 70 instructions. Competing chip-maker
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets. While it initially manufactur ...
added support for SSE2 with the introduction of their
Opteron Opteron is AMD's x86 former server and workstation processor line, and was the first processor which supported the AMD64 instruction set architecture (known generically as x86-64 or AMD64). It was released on April 22, 2003, with the ''SledgeHa ...
and
Athlon 64 The Athlon 64 is a ninth-generation, AMD64-architecture microprocessor produced by Advanced Micro Devices (AMD), released on September 23, 2003. It is the third processor to bear the name ''Athlon'', and the immediate successor to the Athlon XP. T ...
ranges of
AMD64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mod ...
64-bit CPUs in 2003.


Features

Most of the SSE2 instructions implement the integer vector operations also found in MMX. Instead of the MMX registers they use the XMM registers, which are wider and allow for significant performance improvements in specialized applications. Another advantage of replacing MMX with SSE2 is avoiding the mode switching penalty for issuing x87 instructions present in MMX because it is sharing register space with the x87 FPU. The SSE2 also complements the floating-point vector operations of the SSE instruction set by adding support for the double precision data type. Other SSE2 extensions include a set of
cache control instruction In computing, a cache control instruction is a hint embedded in the instruction stream of a processor intended to improve the performance of hardware caches, using foreknowledge of the memory access pattern supplied by the programmer or compiler ...
s intended primarily to minimize
cache pollution Cache pollution describes situations where an executing computer program loads data into CPU cache unnecessarily, thus causing other useful data to be evicted from the cache into lower levels of the memory hierarchy, degrading performance. For e ...
when processing infinite streams of information, and a sophisticated complement of numeric format conversion instructions. AMD's implementation of SSE2 on the AMD64 (
x86-64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mod ...
) platform includes an additional eight registers, doubling the total number to 16 (XMM0 through XMM15). These additional registers are only visible when running in 64-bit mode. Intel adopted these additional registers as part of their support for x86-64 architecture (or in Intel's parlance, "Intel 64") in 2004.


Differences between x87 FPU and SSE2

FPU (x87) instructions provide higher precision by calculating intermediate results with 80 bits of precision, by default, to minimise roundoff error in numerically unstable algorithms (see IEEE 754 design rationale and references therein). However, the x87 FPU is a scalar unit only whereas SSE2 can process a small vector of operands in parallel. If code designed for x87 is ported to the lower precision double precision SSE2 floating point, certain combinations of math operations or input datasets can result in measurable numerical deviation, which can be an issue in reproducible scientific computations, e.g. if the calculation results must be compared against results generated from a different machine architecture. A related issue is that, historically, language standards and compilers had been inconsistent in their handling of the x87 80-bit registers implementing double extended precision variables, compared with the double and single precision formats implemented in SSE2: the rounding of extended precision intermediate values to double precision variables was not fully defined and was dependent on implementation details such as when registers were spilled to memory.


Differences between MMX and SSE2

SSE2 extends MMX instructions to operate on XMM registers. Therefore, it is possible to convert all existing MMX code to an SSE2 equivalent. Since an SSE2 register is twice as long as an MMX register, loop counters and memory access may need to be changed to accommodate this. However, 8 byte loads and stores to XMM are available, so this is not strictly required. Although one SSE2 instruction can operate on twice as much data as an MMX instruction, performance might not increase significantly. Two major reasons are: accessing SSE2 data in memory not aligned to a 16-byte boundary can incur significant penalty, and the
throughput Network throughput (or just throughput, when in context) refers to the rate of message delivery over a communication channel, such as Ethernet or packet radio, in a communication network. The data that these messages contain may be delivered ov ...
of SSE2 instructions in older
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
implementations was half that for MMX instructions.
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
addressed the first problem by adding an instruction in
SSE3 SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revis ...
to reduce the overhead of accessing unaligned data and improving the overall performance of misaligned loads, and the last problem by widening the execution engine in their
Core microarchitecture The Intel Core microarchitecture (provisionally referred to as Next Generation Micro-architecture, and developed as Merom) is a multi-core processor microarchitecture launched by Intel in mid-2006. It is a major evolution over the Yonah, the p ...
in Core 2 Duo and later products. Since MMX and x87 register files alias one another, using MMX will prevent x87 instructions from working as desired. Once MMX has been used, the programmer must use the emms instruction (C: _mm_empty()) to restore operation to the x87 register file. On some operating systems, x87 is not used very much, but may still be used in some critical areas like pow() where the extra precision is needed. In such cases, the corrupt floating-point state caused by failure to emit emms may go undetected for millions of instructions before ultimately causing the floating-point routine to fail, returning NaN. Since the problem is not locally apparent in the MMX code, finding and correcting the bug can be very time consuming. As SSE2 does not have this problem and it usually provides much better throughput and provides more registers in 64-bit code, it should be preferred for nearly all vectorization work.


Compiler usage

When first introduced in 2000, SSE2 was not supported by software development tools. For example, to use SSE2 in a
Microsoft Visual Studio Visual Studio is an integrated development environment (IDE) from Microsoft. It is used to develop computer programs including websites, web apps, web services and mobile apps. Visual Studio uses Microsoft software development platforms such a ...
project, the programmer had to either manually write inline-assembly or import object-code from an external source. Later the Visual C++ Processor Pack added SSE2 support to
Visual C++ Microsoft Visual C++ (MSVC) is a compiler for the C, C++ and C++/CX programming languages by Microsoft. MSVC is proprietary software; it was originally a standalone product but later became a part of Visual Studio and made available in both tria ...
and
MASM The Microsoft Macro Assembler (MASM) is an x86 assembler that uses the Intel syntax for MS-DOS and Microsoft Windows. Beginning with MASM 8.0, there are two versions of the assembler: One for 16-bit & 32-bit assembly sources, and another (ML64) ...
. The
Intel C++ Compiler Intel oneAPI DPC++/C++ Compiler and Intel C++ Compiler Classic are Intel’s C, C++, SYCL, and Data Parallel C++ (DPC++) compilers for Intel processor-based systems, available for Windows, Linux, and macOS operating systems. Overview Intel o ...
can automatically generate
SSE4 SSE4 (Streaming SIMD Extensions 4) is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper; more ...
,
SSSE3 Supplemental Streaming SIMD Extensions 3 (SSSE3 or SSE3S) is a SIMD instruction set created by Intel and is the fourth iteration of the SSE technology. History SSSE3 was first introduced with Intel processors based on the Core microarchitecture ...
,
SSE3 SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revis ...
, SSE2, and SSE code without the use of hand-coded assembly. Since GCC 3, GCC can automatically generate SSE/SSE2 scalar code when the target supports those instructions.
Automatic vectorization Automatic vectorization, in parallel computing, is a special case of automatic parallelization, where a computer program is converted from a scalar implementation, which processes a single pair of operands at a time, to a vector implementation, ...
for SSE/SSE2 has been added since GCC 4. The
Sun Studio Compiler Suite Oracle Developer Studio, formerly named Oracle Solaris Studio, Sun Studio, Sun WorkShop, Forte Developer, and SunPro Compilers, is Oracle Corporation's flagship software development product for the Solaris and Linux operating systems. It includ ...
can also generate SSE2 instructions when the compiler flag -xvector=simd is used. Since
Microsoft Visual C++ Microsoft Visual C++ (MSVC) is a compiler for the C, C++ and C++/CX programming languages by Microsoft. MSVC is proprietary software; it was originally a standalone product but later became a part of Visual Studio and made available in both tria ...
2012, the compiler option to generate SSE2 instructions is turned on by default.


CPU support

SSE2 is an extension of the
IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnation of ...
architecture, based on the
x86 instruction set The x86 instruction set refers to the set of instructions that x86-compatible microprocessors support. The instructions are usually part of an executable program, often stored as a computer file and executed on the processor. The x86 instruction ...
. Therefore, only x86 processors can include SSE2. The
AMD64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mod ...
architecture supports the
IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnation of ...
as a compatibility mode and includes the SSE2 in its specification. It also doubles the number of XMM registers, allowing for better performance. SSE2 is also a requirement for installing Windows 8 (and later) or Microsoft Office 2013 (and later) "to enhance the reliability of third-party apps and drivers running in Windows 8". The following IA-32 CPUs support SSE2: *
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
NetBurst The NetBurst microarchitecture, called P68 inside Intel, was the successor to the P6 microarchitecture in the x86 family of central processing units (CPUs) made by Intel. The first CPU to use this architecture was the Willamette-core Pentium ...
-based CPUs (
Pentium 4 Pentium 4 is a series of single-core CPUs for desktops, laptops and entry-level servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008. The production of Netburst processors was active from 2000 ...
,
Xeon Xeon ( ) is a brand of x86 microprocessors designed, manufactured, and marketed by Intel, targeted at the non-consumer workstation, server, and embedded system markets. It was introduced in June 1998. Xeon processors are based on the same arc ...
,
Celeron Celeron is Intel's brand name for low-end IA-32 and x86-64 computer microprocessor models targeted at low-cost personal computers. Celeron processors are compatible with IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called ...
,
Pentium D Pentium D is a range of desktop 64-bit x86-64 processors based on the NetBurst microarchitecture, which is the dual-core variant of the Pentium 4 manufactured by Intel. Each CPU comprised two dies, each containing a single core, residing next to ...
,
Celeron D Celeron is Intel's brand name for low-end IA-32 and x86-64 computer microprocessor models targeted at low-cost personal computers. Celeron processors are compatible with IA-32 software. They typically offer less performance per clock speed comp ...
) * Intel
Pentium M The Pentium M is a family of mobile 32-bit single-core x86 microprocessors (with the modified Intel P6 microarchitecture) introduced in March 2003 and forming a part of the Intel Carmel notebook platform under the then new Centrino brand. The '' ...
and
Celeron M Celeron is Intel's brand name for low-end IA-32 and x86-64 computer microprocessor models targeted at low-cost personal computers. Celeron processors are compatible with IA-32 software. They typically offer less performance per clock speed ...
*
Intel Atom Intel Atom is the brand name for a line of IA-32 and x86-64 instruction set ultra-low-voltage processors by Intel Corporation designed to reduce electric consumption and power dissipation in comparison with ordinary processors of the Intel Cor ...
*
AMD Athlon 64 The Athlon 64 is a ninth-generation, AMD64-architecture microprocessor produced by Advanced Micro Devices (AMD), released on September 23, 2003. It is the third processor to bear the name ''Athlon'', and the immediate successor to the Athlon XP. T ...
*
Transmeta Efficeon The Efficeon processor is Transmeta's second-generation 256-bit VLIW design released in 2004 which employs a software engine Code Morphing Software (CMS) to convert code written for x86 processors to the native instruction set of the chip. Like ...
*
VIA C7 The VIA C7 is an x86 central processing unit designed by Centaur Technology and sold by VIA Technologies. Product history The C7 delivers a number of improvements to the older VIA C3 cores but is nearly identical to the latest VIA C3 Nehemiah ...
The following IA-32 CPUs were released after SSE2 was developed, but did not implement it: *
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets. While it initially manufactur ...
CPUs prior to
Athlon 64 The Athlon 64 is a ninth-generation, AMD64-architecture microprocessor produced by Advanced Micro Devices (AMD), released on September 23, 2003. It is the third processor to bear the name ''Athlon'', and the immediate successor to the Athlon XP. T ...
, such as
Athlon XP Athlon is the brand name applied to a series of x86-compatible microprocessors designed and manufactured by Advanced Micro Devices (AMD). The original Athlon (now called Athlon Classic) was the first seventh-generation x86 processor and the fi ...
*
VIA C3 The VIA C3 is a family of x86 central processing units for personal computers designed by Centaur Technology and sold by VIA Technologies. The different CPU cores are built following the design methodology of Centaur Technology. In addition to ...
*
Transmeta Crusoe The Transmeta Crusoe is a family of x86-compatible microprocessors developed by Transmeta and introduced in 2000. Instead of the instruction set architecture being implemented in hardware, or translated by specialized hardware, the Crusoe runs ...
*
Intel Quark Intel Quark is a line of 32-bit x86 SoCs and microcontrollers by Intel, designed for small size and low power consumption, and targeted at new markets including wearable devices. The line was introduced at Intel Developer Forum in 2013, and d ...


See also

* SSE2 instructions


References

{{Multimedia extensions X86 instructions SIMD computing