SSE4 (Streaming SIMD Extensions 4) is a SIMD CPU instruction set used in the Intel Core microarchitecture and AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vague details in a white paper; more precise details of 47 instructions became available in a presentation at the Spring 2007 Intel Developer Forum in Beijing. SSE4 is fully compatible with software written for previous generations of Intel 64 and IA-32 architecture microprocessors. All existing software continues to run correctly without modification on microprocessors that incorporate SSE4, as well as in the presence of existing and new applications that incorporate SSE4.
SSE4 subsets
Intel SSE4 consists of 54 instructions. A subset consisting of 47 instructions, referred to as ''SSE4.1'' in some Intel documentation, is available in Penryn. Additionally, ''SSE4.2'', a second subset consisting of the 7 remaining instructions, is first available in Nehalem-based Core i7. Intel credits feedback from developers as playing an important role in the development of the instruction set.
Starting with Barcelona-based processors, AMD introduced the ''SSE4a'' instruction set, which has 4 SSE4 instructions and 4 new SSE instructions. These instructions are not found in Intel's processors supporting SSE4.1, and AMD processors only started supporting Intel's SSE4.1 and SSE4.2 (the full SSE4 instruction set) in the Bulldozer-based FX processors. With SSE4a the misaligned SSE feature was also introduced, which meant unaligned load instructions were as fast as aligned versions on aligned addresses. It also allowed disabling the alignment check on non-load SSE operations accessing memory. Intel later introduced similar speed improvements to unaligned SSE in their Nehalem processors, but did not introduce misaligned access by non-load SSE instructions until AVX.
Name confusion
What is now known as SSSE3 (Supplemental Streaming SIMD Extensions 3), introduced in the Intel Core 2 processor line, was referred to as SSE4 by some media until Intel came up with the SSSE3 moniker. The instructions were internally dubbed Merom New Instructions, and Intel originally did not plan to assign a special name to them, which was criticized by some journalists ("My Experience With 'Conroe'", DailyTech). Intel eventually cleared up the confusion and reserved the SSE4 name for their next instruction set extension (''Extending the World's Most Popular Processor Architecture'', Intel, tp://download.intel.com/technology/architecture/new-instructions-paper.pdf).
Intel uses the marketing term ''HD Boost'' to refer to SSE4.
New instructions
Unlike all previous iterations of SSE, SSE4 contains instructions that execute operations which are not specific to multimedia applications. It features a number of instructions whose action is determined by a constant field and a set of instructions that take XMM0 as an implicit third operand.
Several of these instructions are enabled by the single-cycle shuffle engine in Penryn. (Shuffle operations reorder bytes within a register.)
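The two operand styles mentioned above can be illustrated with a short sketch using compiler intrinsics. It assumes GCC or Clang on an x86 CPU with SSE4.1; the function name `blend_demo` and its parameters are hypothetical and chosen only for this illustration. `_mm_blend_epi16` (PBLENDW) takes its lane selection from a compile-time constant, while `_mm_blendv_epi8` (PBLENDVB) takes a mask register; in the legacy (non-VEX) encoding that mask is the implicit XMM0 operand, and the compiler takes care of placing it there.

```c
#include <stdint.h>
#include <smmintrin.h>  /* SSE4.1 intrinsics (GCC/Clang) */

/* Hypothetical helper: blends two vectors of eight 16-bit lanes in two ways.
 * The target attribute lets this compile without -msse4.1 on GCC/Clang;
 * the caller is still responsible for running it only on SSE4.1 hardware. */
__attribute__((target("sse4.1")))
void blend_demo(const int16_t a[8], const int16_t b[8],
                int16_t out_imm[8], int16_t out_var[8])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);

    /* Constant-field form (PBLENDW): bit i of the immediate selects
     * 16-bit lane i from vb. 0x0F takes lanes 0..3 from b, 4..7 from a. */
    _mm_storeu_si128((__m128i *)out_imm, _mm_blend_epi16(va, vb, 0x0F));

    /* Variable form (PBLENDVB): the high bit of each mask byte selects vb.
     * The mask here is all ones in lanes where b > a, so the result is the
     * lane-wise signed maximum of a and b. */
    __m128i mask = _mm_cmpgt_epi16(vb, va);
    _mm_storeu_si128((__m128i *)out_var, _mm_blendv_epi8(va, vb, mask));
}
```

The immediate form fixes the selection pattern at compile time, whereas the mask form lets the selection depend on data computed at run time.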
SSE4.1
These instructions were introduced with Penryn microarchitecture, the 45 nm shrink of Intel's Core microarchitecture. Support is indicated via the CPUID.01H:ECX.SSE41[Bit 19] flag.
SSE4.2
SSE4.2 added STTNI (String and Text New Instructions), several new instructions that perform character searches and comparison on two operands of 16 bytes at a time. These were designed (among other things) to speed up the parsing of XML documents. It also added a CRC32 instruction to compute cyclic redundancy checks as used in certain data transfer protocols. These instructions were first implemented in the Nehalem-based Intel Core i7 product line and complete the SSE4 instruction set. Support is indicated via the CPUID.01H:ECX.SSE42[Bit 20] flag.
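As a rough sketch of how the CRC32 instruction might be used, the following C code computes a CRC-32C checksum (the Castagnoli polynomial, which is the one the instruction implements) both with a bitwise software model and with the `_mm_crc32_u8` intrinsic. The function names `crc32c_sw`, `crc32c_hw` and `crc32c` are hypothetical, and the code assumes GCC or Clang; the initial value and final inversion follow the common CRC-32C convention and are not performed by the instruction itself.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>  /* SSE4.2 intrinsics, including _mm_crc32_u8 */
#endif

/* Software model of one byte step: reflected CRC-32C polynomial 0x82F63B78. */
static uint32_t crc32c_step_sw(uint32_t crc, uint8_t byte)
{
    crc ^= byte;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/* Portable reference implementation. */
uint32_t crc32c_sw(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;          /* conventional initial value */
    while (len--)
        crc = crc32c_step_sw(crc, *p++);
    return ~crc;                         /* conventional final inversion */
}

#if defined(__x86_64__) || defined(__i386__)
/* Hardware path: one CRC32 instruction per input byte. */
__attribute__((target("sse4.2")))
uint32_t crc32c_hw(const void *data, size_t len)
{
    const uint8_t *p = data;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
    return ~crc;
}
#endif

/* Dispatcher: use the instruction when the CPU reports SSE4.2. */
uint32_t crc32c(const void *data, size_t len)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.2"))
        return crc32c_hw(data, len);
#endif
    return crc32c_sw(data, len);
}
```

Calling `crc32c("123456789", 9)` yields `0xE3069283`, the standard CRC-32C check value. A production version would typically process 4 or 8 bytes per instruction with the wider forms of the intrinsic; the byte-wise loop is kept here for clarity.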
POPCNT and LZCNT
These instructions operate on integer registers rather than SSE registers because they are not SIMD instructions, but they appeared at the same time. Although AMD introduced them with the SSE4a instruction set, they are counted as separate extensions with their own dedicated CPUID bits to indicate support. Intel implements POPCNT beginning with the Nehalem microarchitecture and LZCNT beginning with the Haswell microarchitecture. AMD implements both beginning with the Barcelona microarchitecture.
AMD calls this pair of instructions ''Advanced Bit Manipulation'' (ABM).
The encoding of ''lzcnt'' is similar enough to ''bsr'' (bit scan reverse) that if ''lzcnt'' is executed on a CPU that does not support it, such as Intel CPUs prior to Haswell, the CPU performs the ''bsr'' operation instead of raising an invalid instruction error, even though ''lzcnt'' and ''bsr'' return different result values.
Trailing zeros can be counted using the ''bsf'' (bit scan forward) or ''tzcnt'' instructions.
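To make the difference in result values concrete, here is a small portable C model of the 32-bit forms of the three operations. These are illustrative software equivalents with hypothetical names, not the instructions themselves: ''bsr'' returns the index of the highest set bit, ''lzcnt'' returns the number of zero bits above it, and ''popcnt'' returns the number of set bits.

```c
#include <stdint.h>

/* Model of BSR: index of the highest set bit. The real instruction leaves
 * its destination undefined for x == 0; this model returns 0 in that case. */
unsigned bsr32_model(uint32_t x)
{
    unsigned index = 0;
    while (x >>= 1)
        index++;
    return index;
}

/* Model of LZCNT: number of leading zero bits, 32 when x == 0. */
unsigned lzcnt32_model(uint32_t x)
{
    unsigned count = 0;
    if (x == 0)
        return 32;
    while (!(x & 0x80000000u)) {
        x <<= 1;
        count++;
    }
    return count;
}

/* Model of POPCNT: number of set bits (clears the lowest set bit each pass). */
unsigned popcnt32_model(uint32_t x)
{
    unsigned count = 0;
    while (x) {
        x &= x - 1;
        count++;
    }
    return count;
}
```

For any nonzero 32-bit input the two results are related by `lzcnt = 31 - bsr`, so for an input of 1 ''bsr'' gives 0 while ''lzcnt'' gives 31. This is why code compiled for ''lzcnt'' silently produces wrong values when the CPU falls back to ''bsr''.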
SSE4a
The SSE4a instruction group was introduced in AMD's Barcelona microarchitecture. These instructions are not available in Intel processors. Support is indicated via the CPUID.80000001H:ECX.SSE4A[Bit 6] flag.
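The CPUID flags named in the sections above can be queried at run time. The following is a minimal sketch, assuming GCC or Clang and their `<cpuid.h>` helper `__get_cpuid`; the struct and function names are hypothetical. The bit positions for SSE4.1, SSE4.2 and SSE4a are the ones given above; the POPCNT bit (CPUID.01H:ECX bit 23) and the LZCNT/ABM bit (CPUID.80000001H:ECX bit 5) are not stated in this article and are taken from the vendors' CPUID documentation.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper: __get_cpuid() */
#endif

/* Hypothetical result type for this sketch. */
struct sse4_support {
    int sse41;   /* CPUID.01H:ECX bit 19 */
    int sse42;   /* CPUID.01H:ECX bit 20 */
    int popcnt;  /* CPUID.01H:ECX bit 23 (not stated in the article) */
    int sse4a;   /* CPUID.80000001H:ECX bit 6 */
    int lzcnt;   /* CPUID.80000001H:ECX bit 5 (not stated in the article) */
};

/* Extract one feature bit from a CPUID output register. */
static int cpuid_bit(uint32_t reg, unsigned bit)
{
    return (int)((reg >> bit) & 1u);
}

struct sse4_support query_sse4_support(void)
{
    struct sse4_support s = {0, 0, 0, 0, 0};
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Standard feature leaf; __get_cpuid returns 0 if the leaf is absent. */
    if (__get_cpuid(0x00000001u, &eax, &ebx, &ecx, &edx)) {
        s.sse41  = cpuid_bit(ecx, 19);
        s.sse42  = cpuid_bit(ecx, 20);
        s.popcnt = cpuid_bit(ecx, 23);
    }
    /* Extended feature leaf, where AMD reports SSE4a and ABM. */
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) {
        s.sse4a = cpuid_bit(ecx, 6);
        s.lzcnt = cpuid_bit(ecx, 5);
    }
#endif
    return s;
}
```

A program would call `query_sse4_support()` once at startup and select code paths from the returned fields. On non-x86 targets the sketch reports every feature as absent.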
Supporting CPUs
* Intel
** Silvermont processors (SSE4.1, SSE4.2 and POPCNT supported)
** Goldmont processors (SSE4.1, SSE4.2 and POPCNT supported)
** Goldmont Plus processors (SSE4.1, SSE4.2 and POPCNT supported)
** Tremont processors (SSE4.1, SSE4.2 and POPCNT supported)
** Penryn processors (SSE4.1 supported, except Pentium Dual-Core and Celeron)
** Nehalem processors and Westmere processors (SSE4.1, SSE4.2 and POPCNT supported, except Pentium and Celeron)
** Sandy Bridge processors and newer (SSE4.1, SSE4.2 and POPCNT supported, including Pentium and Celeron)
** Haswell processors and newer (SSE4.1, SSE4.2, POPCNT and LZCNT supported)
* AMD
** K10-based processors (SSE4a, POPCNT and LZCNT supported)
** "Cat" low-power processors
*** Bobcat-based processors (SSE4a, POPCNT and LZCNT supported)
*** Jaguar-based processors and newer (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
*** Puma-based processors and newer (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
** "Heavy Equipment" processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
*** Bulldozer-based processors
*** Piledriver-based processors
*** Steamroller-based processors
*** Excavator-based processors and newer
** Zen-based processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
** Zen+-based processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
** Zen 2-based processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
** Zen 3-based processors (SSE4a, SSE4.1, SSE4.2, POPCNT and LZCNT supported)
* VIA
** Nano 3000, X2, QuadCore processors (SSE4.1 supported)
** Nano QuadCore C4000-series processors (SSE4.1, SSE4.2 supported)
** Eden X4 processors (SSE4.1, SSE4.2 supported)
* Zhaoxin
** ZX-C processors and newer (SSE4.1, SSE4.2 supported)
External links
SSE4 Programming Reference by Intel (archived at Ghostarchive.org on May 10, 2022)