SSE1
   HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computer, computing machinery. It includes the study and experimentation of algorithmic processes, and the development of both computer hardware, hardware and softw ...
, Streaming SIMD Extensions (SSE) is a single instruction, multiple data (
SIMD Single instruction, multiple data (SIMD) is a type of parallel computer, parallel processing in Flynn's taxonomy. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneousl ...
)
instruction set In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, s ...
extension to the
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
architecture, designed by
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer compo ...
and introduced in 1999 in its
Pentium III The Pentium III (marketed as Intel Pentium III Processor, informally PIII or P3) brand refers to Intel's 32-bit x86 desktop and mobile CPUs based on the sixth-generation P6 (microarchitecture), P6 microarchitecture introduced on February 28, 1999 ...
series of
central processing unit A central processing unit (CPU), also called a central processor, main processor, or just processor, is the primary Processor (computing), processor in a given computer. Its electronic circuitry executes Instruction (computing), instructions ...
s (CPUs) shortly after the appearance of
Advanced Micro Devices Advanced Micro Devices, Inc. (AMD) is an American multinational corporation and technology company headquartered in Santa Clara, California and maintains significant operations in Austin, Texas. AMD is a Information technology, hardware and F ...
(AMD's)
3DNow! 3DNow! is a deprecated extension to the x86 instruction set developed by Advanced Micro Devices (AMD). It adds single instruction multiple data (SIMD) instructions to the base x86 instruction set, enabling it to perform vector processing of float ...
. SSE contains 70 new instructions (65 unique mnemonics using 70 encodings), most of which work on
single precision Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. A floa ...
floating-point In computing, floating-point arithmetic (FP) is arithmetic on subsets of real numbers formed by a ''significand'' (a Sign (mathematics), signed sequence of a fixed number of digits in some Radix, base) multiplied by an integer power of that ba ...
data. SIMD instructions can greatly increase performance when exactly the same operations are to be performed on multiple data objects. Typical applications are
digital signal processing Digital signal processing (DSP) is the use of digital processing, such as by computers or more specialized digital signal processors, to perform a wide variety of signal processing operations. The digital signals processed in this manner are a ...
and graphics processing. Intel's first
IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called ''i386'') is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the i386, 80386 microprocessor in 1985. IA-32 is the first incarn ...
SIMD effort was the MMX instruction set. MMX had two main problems: it re-used existing
x87 x87 is a floating-point-related subset of the x86 architecture instruction set. It originated as an extension of the 8086 instruction set in the form of optional floating-point coprocessors that work in tandem with corresponding x86 CPUs. These m ...
floating-point registers making the CPUs unable to work on both floating-point and SIMD data at the same time, and it only worked on
integers An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ...
. SSE floating-point instructions operate on a new independent register set, the XMM registers, and adds a few integer instructions that work on MMX registers. SSE was subsequently expanded by Intel to
SSE2 SSE2 (Streaming SIMD Extensions 2) is one of the Intel SIMD (Single Instruction, Multiple Data) processor supplementary instruction sets introduced by Intel with the initial version of the Pentium 4 in 2000. SSE2 instructions allow the use of ...
,
SSE3 SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revis ...
,
SSSE3 Supplemental Streaming SIMD Extensions 3 (SSSE3 or SSE3S) is a SIMD instruction set created by Intel and is the fourth iteration of the SSE technology. History SSSE3 was first introduced with Intel processors based on the Core microarchitect ...
and
SSE4 SSE4 (Streaming SIMD Extensions 4) is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper;< ...
. Because it supports floating-point math, it had wider applications than MMX and became more popular. The addition of integer support in SSE2 made MMX largely redundant, though further performance increases can be attained in some situations by using MMX in parallel with SSE operations. SSE was originally called Katmai New Instructions (KNI), Katmai being the code name for the first Pentium III core revision. During the Katmai project Intel sought to distinguish it from its earlier product line, particularly its flagship
Pentium II The Pentium II is a brand of sixth-generation Intel x86 microprocessors based on the P6 (microarchitecture), P6 microarchitecture, introduced on May 7, 1997. It combined the ''P6'' microarchitecture seen on the Pentium Pro with the MMX (instruc ...
. It was later renamed Internet Streaming SIMD Extensions (ISSE), then SSE. AMD added a subset of SSE, 19 of them, called new MMX instructions, and known as several variants and combinations of SSE and MMX, shortly after with the release of the original
Athlon AMD Athlon is the brand name applied to a series of x86, x86-compatible microprocessors designed and manufactured by AMD, Advanced Micro Devices. The original Athlon (now called Athlon Classic) was the first seventh-generation x86 processor a ...
in August 1999, see 3DNow! extensions. AMD eventually added full support for SSE instructions, starting with its
Athlon XP AMD Athlon is the brand name applied to a series of x86-compatible microprocessors designed and manufactured by Advanced Micro Devices. The original Athlon (now called Athlon Classic) was the first seventh-generation x86 processor and the fi ...
and
Duron Duron is a line of budget x86-compatible microprocessors manufactured by Advanced Micro Devices, AMD and released on June 19, 2000. Duron was intended to be a lower-cost offering to complement AMD's then mainstream performance Athlon process ...
( Morgan core) processors.


Registers

SSE originally added eight new 128-bit registers known as XMM0 through XMM7. The
AMD64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit extension of the x86 instruction set. It was announced in 1999 and first available in the AMD Opteron family in 2003. It introduces two new operating modes: 64-bit mode an ...
extensions from AMD added a further eight registers XMM8 through XMM15, and this extension is duplicated in the
Intel 64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit extension of the x86 instruction set. It was announced in 1999 and first available in the AMD Opteron family in 2003. It introduces two new operating modes: 64-bit mode an ...
architecture. There is also a new 32-bit control/status register, MXCSR. The registers XMM8 through XMM15 are accessible only in 64-bit operating mode. SSE used only a single data type for XMM registers: * four 32-bit
single-precision Single-precision floating-point format (sometimes called FP32 or float32) is a computer number format, usually occupying 32 bits in computer memory; it represents a wide dynamic range of numeric values by using a floating radix point. A floati ...
floating-point numbers
SSE2 SSE2 (Streaming SIMD Extensions 2) is one of the Intel SIMD (Single Instruction, Multiple Data) processor supplementary instruction sets introduced by Intel with the initial version of the Pentium 4 in 2000. SSE2 instructions allow the use of ...
would later expand the usage of the XMM registers to include: * two 64-bit
double-precision Double-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide range of numeric values by using a floating radix point. Double prec ...
floating-point numbers or * two 64-bit integers or * four 32-bit integers or * eight 16-bit short integers or * sixteen 8-bit bytes or characters. Because these 128-bit registers are additional machine states that the
operating system An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ...
must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, which is the extended pair of instructions that can save all
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
and SSE register states at once. This support was quickly added to all major IA-32 operating systems. The first CPU to support SSE, the
Pentium III The Pentium III (marketed as Intel Pentium III Processor, informally PIII or P3) brand refers to Intel's 32-bit x86 desktop and mobile CPUs based on the sixth-generation P6 (microarchitecture), P6 microarchitecture introduced on February 28, 1999 ...
, shared execution resources between SSE and the
floating-point unit A floating-point unit (FPU), numeric processing unit (NPU), colloquially math coprocessor, is a part of a computer system specially designed to carry out operations on floating-point numbers. Typical operations are addition, subtraction, multip ...
(FPU). While a compiled application can interleave FPU and SSE instructions side-by-side, the Pentium III will not issue an FPU and an SSE instruction in the same
clock cycle In electronics and especially synchronous digital circuits, a clock signal (historically also known as ''logic beat'') is an electronic logic signal (voltage or current) which oscillates between a high and a low state at a constant frequency and ...
. This limitation reduces the effectiveness of pipelining, but the separate XMM registers do allow SIMD and scalar floating-point operations to be mixed without the performance hit from explicit MMX/floating-point mode switching.


SSE instructions

SSE introduced both
scalar Scalar may refer to: *Scalar (mathematics), an element of a field, which is used to define a vector space, usually the field of real numbers *Scalar (physics), a physical quantity that can be described by a single element of a number field such a ...
and packed floating-point instructions.


Floating-point instructions

Floating operations are IEEE 754-1985 compliant, with the exception of RSQRTSS, which is not specified in the standard. * Memory-to-register/register-to-memory/register-to-register data movement ** Scalar – MOVSS ** Packed – MOVAPS, MOVUPS, MOVLPS, MOVHPS, MOVLHPS, MOVHLPS, MOVMSKPS * Arithmetic ** Scalar – ADDSS, SUBSS, MULSS, DIVSS, RCPSS, SQRTSS, MAXSS, MINSS, RSQRTSS ** Packed – ADDPS, SUBPS, MULPS, DIVPS, RCPPS, SQRTPS, MAXPS, MINPS, RSQRTPS *
Compare Comparison or comparing is the act of evaluating two or more things by determining the relevant, comparable characteristics of each thing, and then determining which characteristics of each are similar to the other, which are different, and t ...
** Scalar – CMPSS, COMISS, UCOMISS ** Packed – CMPPS * Data shuffle and unpacking ** Packed – SHUFPS, UNPCKHPS, UNPCKLPS * Data-type conversion ** Scalar – CVTSI2SS, CVTSS2SI, CVTTSS2SI ** Packed – CVTPI2PS, CVTPS2PI, CVTTPS2PI *
Bitwise In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral (considered as a bit string) at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic operat ...
logical operations ** Packed – ANDPS, ORPS, XORPS, ANDNPS


Integer instructions

* Arithmetic ** PMULHUW, PSADBW, PAVGB, PAVGW, PMAXUB, PMINUB, PMAXSW, PMINSW * Data movement ** PEXTRW, PINSRW * Other ** PMOVMSKB, PSHUFW


Other instructions

* MXCSR management ** LDMXCSR, STMXCSR * Cache and Memory management ** MOVNTQ, MOVNTPS, MASKMOVQ, PREFETCH0, PREFETCH1, PREFETCH2, PREFETCHNTA, SFENCE


Example

The following simple example demonstrates the advantage of using SSE. Consider an operation like vector addition, which is used very often in computer graphics applications. To add two single precision, four-component vectors together using x86 requires four floating-point addition instructions. vec_res.x = v1.x + v2.x; vec_res.y = v1.y + v2.y; vec_res.z = v1.z + v2.z; vec_res.w = v1.w + v2.w; This corresponds to four x86 FADD instructions in the object code. On the other hand, as the following pseudo-code shows, a single 128-bit 'packed-add' instruction can replace the four scalar addition instructions. movaps xmm0, 1;xmm0 = v1.w , v1.z , v1.y , v1.x addps xmm0, 2 ;xmm0 = v1.w+v2.w , v1.z+v2.z , v1.y+v2.y , v1.x+v2.x movaps ec_res xmm0 ;xmm0


Later versions

*
SSE2 SSE2 (Streaming SIMD Extensions 2) is one of the Intel SIMD (Single Instruction, Multiple Data) processor supplementary instruction sets introduced by Intel with the initial version of the Pentium 4 in 2000. SSE2 instructions allow the use of ...
, Willamette New Instructions (WNI), introduced with the
Pentium 4 Pentium 4 is a series of single-core central processing unit, CPUs for Desktop computer, desktops, laptops and entry-level Server (computing), servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 20 ...
, is a major enhancement to SSE. SSE2 adds two major features:
double-precision Double-precision floating-point format (sometimes called FP64 or float64) is a floating-point number format, usually occupying 64 bits in computer memory; it represents a wide range of numeric values by using a floating radix point. Double prec ...
(64-bit) floating-point for all SSE operations, and MMX integer operations on 128-bit XMM registers. In the original SSE instruction set, conversion to and from integers placed the integer data in the 64-bit MMX registers. SSE2 enables the programmer to perform SIMD math on any data type (from 8-bit integer to 64-bit float) entirely with the XMM vector-register file, without the need to use the legacy MMX or FPU registers. It offers an orthogonal set of instructions for dealing with common data types. *
SSE3 SSE3, Streaming SIMD Extensions 3, also known by its Intel code name Prescott New Instructions (PNI), is the third iteration of the SSE instruction set for the IA-32 (x86) architecture. Intel introduced SSE3 in early 2004 with the Prescott revis ...
, also called Prescott New Instructions (PNI), is an incremental upgrade to SSE2, adding a handful of DSP-oriented mathematics instructions and some process (thread) management instructions. It also allowed addition or multiplication of two numbers that are stored in the same register, which wasn't possible in SSE2 and earlier. This capability, known as horizontal in Intel terminology, was the major addition to the SSE3 instruction set. AMD's
3DNow! 3DNow! is a deprecated extension to the x86 instruction set developed by Advanced Micro Devices (AMD). It adds single instruction multiple data (SIMD) instructions to the base x86 instruction set, enabling it to perform vector processing of float ...
extension could do the latter too. *
SSSE3 Supplemental Streaming SIMD Extensions 3 (SSSE3 or SSE3S) is a SIMD instruction set created by Intel and is the fourth iteration of the SSE technology. History SSSE3 was first introduced with Intel processors based on the Core microarchitect ...
, Merom New Instructions (MNI), is an upgrade to SSE3, adding 16 new instructions which include permuting the bytes in a word, multiplying 16-bit fixed-point numbers with correct rounding, and within-word accumulate instructions. SSSE3 is often mistaken for SSE4 as this term was used during the development of the Core
microarchitecture In electronics, computer science and computer engineering, microarchitecture, also called computer organization and sometimes abbreviated as μarch or uarch, is the way a given instruction set architecture (ISA) is implemented in a particular ...
. *
SSE4 SSE4 (Streaming SIMD Extensions 4) is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper;< ...
, Penryn New Instructions (PNI), is another major enhancement, adding a
dot product In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a Scalar (mathematics), scalar as a result". It is also used for other symmetric bilinear forms, for example in a pseudo-Euclidean space. N ...
instruction, additional integer instructions, a popcnt instruction ( Population count: count number of bits set to 1, used extensively e.g. in
cryptography Cryptography, or cryptology (from "hidden, secret"; and ''graphein'', "to write", or ''-logy, -logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of Adversary (cryptography), ...
), and more. * XOP, FMA4 and CVT16 are new iterations announced by
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational corporation and technology company headquartered in Santa Clara, California and maintains significant operations in Austin, Texas. AMD is a hardware and fabless company that de ...
in August 2007 and revised in May 2009. *
Advanced Vector Extensions Advanced Vector Extensions (AVX, also known as Gesher New Instructions and then Sandy Bridge New Instructions) are SIMD extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD). They w ...
(AVX), Gesher New Instructions (GNI), is an advanced version of SSE announced by Intel featuring a widened data path from 128 bits to 256 bits and 3-operand instructions (up from 2). Intel released processors in early 2011 with AVX support. *
AVX2 Advanced Vector Extensions (AVX, also known as Gesher New Instructions and then Sandy Bridge New Instructions) are SIMD extensions to the x86 instruction set architecture for microprocessors from Intel and Advanced Micro Devices (AMD). They w ...
is an expansion of the AVX instruction set. *
AVX-512 AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and first implemented in the 2016 Intel Xeon Phi x200 (Knights Landing), and then ...
(3.1 and 3.2) are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture.


Identifying

The following programs can be used to determine which, if any, versions of SSE are supported on a system * Intel Processor Identification Utility *
CPU-Z CPU-Z is a freeware system profiler, system profiling and system monitor, monitoring application for Microsoft Windows and Android (operating system), Android that detects the central processing unit, Random access memory, RAM, motherboard chipse ...
– CPU, motherboard, and memory identification utility. * lscpu - provided by the util-linux package in most Linux distributions.


See also

* AltiVec - equivalent on
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple Inc., App ...
architecture


References


External links


Intel Intrinsics Guide
{{DEFAULTSORT:Streaming Simd Extensions SIMD computing X86 instructions