



Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should not be confused with an ISA. SIMD describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Such machines exploit data-level parallelism, but not concurrency: there are simultaneous (parallel) computations, but each unit performs exactly the same instruction at any given moment (just with different data). SIMD is particularly applicable to common tasks such as adjusting the contrast in a digital image or adjusting the volume of digital audio. Most modern CPU designs include SIMD instructions to improve the performance of multimedia use. SIMD has three different subcategories in Flynn's 1972 taxonomy, one of which is SIMT. SIMT should not be confused with software threads or hardware threads, both of which are task time-sharing (time-slicing). SIMT is true simultaneous parallel hardware-level execution.
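To make "the same operation on multiple data points" concrete, here is a minimal C sketch of the audio-volume case: every sample is multiplied by one gain factor, four samples per vector operation. It assumes GCC or Clang, whose vector extension (`__attribute__((vector_size(16)))`) lets the compiler emit the target's 128-bit SIMD multiply where one exists; the type name `f32x4` and the function `scale_volume` are illustrative, not part of any standard API.

```c
#include <stddef.h>
#include <string.h>

/* Four 32-bit floats in one 128-bit vector (GCC/Clang vector extension). */
typedef float f32x4 __attribute__((vector_size(16)));

/* Hypothetical helper: multiply every sample by `gain`.
   Interleaved stereo (L, R, L, R, ...) needs no special handling,
   because the same gain is applied to both channels. */
void scale_volume(float *samples, size_t n, float gain)
{
    const f32x4 g = {gain, gain, gain, gain};  /* gain copied into every lane */
    size_t i = 0;

    /* Main loop: one vector multiply scales four samples at once. */
    for (; i + 4 <= n; i += 4) {
        f32x4 v;
        memcpy(&v, samples + i, sizeof v);  /* load without assuming alignment */
        v *= g;                             /* single instruction, four data */
        memcpy(samples + i, &v, sizeof v);  /* store the four results */
    }

    /* Tail: fewer than four samples remain, so finish them one at a time. */
    for (; i < n; i++)
        samples[i] *= gain;
}
```

Calling `scale_volume(buf, 7, 0.5f)` halves seven samples: the first four go through one vector multiply and the last three through the scalar tail loop.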


History

The first use of SIMD instructions was in the ILLIAC IV, which was completed in 1966. SIMD was the basis for vector supercomputers of the early 1970s such as the CDC Star-100 and the Texas Instruments ASC, which could operate on a "vector" of data with a single instruction. Vector processing was especially popularized by Cray in the 1970s and 1980s. Vector processing architectures are now considered separate from SIMD computers: Duncan's taxonomy includes them whereas Flynn's taxonomy does not, because Flynn's work (1966, 1972) predates the Cray-1 (1977).

The first era of modern SIMD computers was characterized by massively parallel processing-style supercomputers such as the Thinking Machines CM-1 and CM-2. These computers had many limited-functionality processors that would work in parallel. For example, each of the 65,536 single-bit processors in a Thinking Machines CM-2 would execute the same instruction at the same time, allowing, for instance, 65,536 pairs of bits to be logically combined at once, using a hypercube-connected network or processor-dedicated RAM to find its operands. Supercomputing moved away from the SIMD approach when inexpensive scalar MIMD approaches based on commodity processors such as the Intel i860 XP became more powerful, and interest in SIMD waned.

The current era of SIMD processors grew out of the desktop-computer market rather than the supercomputer market. As desktop processors became powerful enough to support real-time gaming and audio/video processing during the 1990s, demand grew for this particular type of computing power, and microprocessor vendors turned to SIMD to meet the demand. Hewlett-Packard introduced MAX instructions into PA-RISC 1.1 desktops in 1994 to accelerate MPEG decoding. Sun Microsystems introduced SIMD integer instructions in its "VIS" instruction set extensions in 1995, in its UltraSPARC I microprocessor. MIPS followed suit with its similar MDMX system.

The first widely deployed desktop SIMD was Intel's MMX extensions to the x86 architecture in 1996. This sparked the introduction of the much more powerful AltiVec system in Motorola's PowerPC and IBM's POWER systems. Intel responded in 1999 by introducing the all-new SSE system. Since then, there have been several extensions to the SIMD instruction sets for both architectures. The Advanced Vector Extensions AVX, AVX2 and AVX-512 were developed by Intel; AMD supports AVX and AVX2, and has also supported AVX-512 since its Zen 4 processors.

All of these developments have been oriented toward support for real-time graphics, and are therefore oriented toward processing in two, three, or four dimensions, usually with vector lengths of between two and sixteen words, depending on data type and architecture. When new SIMD architectures need to be distinguished from older ones, the newer architectures are considered "short-vector" architectures, as earlier SIMD and vector supercomputers had vector lengths from 64 to 64,000. A modern supercomputer is almost always a cluster of MIMD computers, each of which implements (short-vector) SIMD instructions.


Advantages

An application that may take advantage of SIMD is one where the same value is being added to (or subtracted from) a large number of data points, a common operation in many multimedia applications. One example would be changing the brightness of an image. Each pixel of an image consists of three values for the brightness of the red (R), green (G) and blue (B) portions of the color. To change the brightness, the R, G and B values are read from memory, a value is added to (or subtracted from) them, and the resulting values are written back out to memory. Audio DSPs would likewise, for volume control, multiply both left and right channels simultaneously.

With a SIMD processor there are two improvements to this process. First, the data is understood to be in blocks, and a number of values can be loaded all at once. Instead of a series of instructions saying "retrieve this pixel, now retrieve the next pixel", a SIMD processor will have a single instruction that effectively says "retrieve n pixels" (where n is a number that varies from design to design). This can take much less time than retrieving each pixel individually, as with a traditional CPU design, because one wide memory access replaces many narrow ones and fewer instructions have to be fetched and decoded. Second, the instruction operates on all loaded data in a single operation. In other words, if the SIMD system works by loading up eight data points at once, the add operation being applied to the data will happen to all eight values at the same time. This parallelism is separate from the parallelism provided by a superscalar processor; the eight values are processed in parallel even on a non-superscalar processor, and a superscalar processor may be able to perform multiple SIMD operations in parallel.
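The brightness example can be sketched in C as follows. This is an illustration under stated assumptions, not a reference implementation: it relies on the GCC/Clang vector extension for 16 lanes of 8-bit values (one 128-bit register), and the names `u8x16`, `brighten_scalar` and `brighten_simd` are hypothetical. Each SIMD step loads sixteen colour-channel values in one go, adds the brightness offset to all of them, clamps any lane that overflowed to 255, and stores them back.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sixteen 8-bit lanes in one 128-bit vector (GCC/Clang vector extension). */
typedef uint8_t u8x16 __attribute__((vector_size(16)));

/* Scalar reference: one colour value per iteration, clamped at 255. */
void brighten_scalar(uint8_t *px, size_t n, uint8_t delta)
{
    for (size_t i = 0; i < n; i++) {
        unsigned v = (unsigned)px[i] + delta;
        px[i] = v > 255 ? 255 : (uint8_t)v;
    }
}

/* SIMD version: sixteen colour values are loaded, brightened and stored per step. */
void brighten_simd(uint8_t *px, size_t n, uint8_t delta)
{
    u8x16 d;
    memset(&d, delta, sizeof d);            /* delta copied into every lane */

    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        u8x16 v;
        memcpy(&v, px + i, sizeof v);       /* "retrieve n pixels" as one load */
        u8x16 sum = v + d;                  /* 16 additions, each wrapping mod 256 */
        u8x16 overflow = (u8x16)(sum < v);  /* 0xFF in lanes that wrapped, else 0 */
        sum |= overflow;                    /* clamp wrapped lanes to 255 */
        memcpy(px + i, &sum, sizeof sum);   /* write all 16 results back */
    }

    brighten_scalar(px + i, n - i, delta);  /* leftover tail of fewer than 16 */
}
```

The compare-and-OR trick emulates saturation portably. Many instruction sets provide a dedicated unsigned saturating byte add instead (for example SSE2's `_mm_adds_epu8` or Neon's `vqaddq_u8`), which would replace the three vector operations in the loop with one.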


Disadvantages

Outside the specialist areas and uses where SIMD offers large savings, its disadvantages for general-purpose computing are considerable:

* Not all algorithms can be vectorized easily. For example, a flow-control-heavy task like code parsing may not easily benefit from SIMD; however, it is theoretically possible to vectorize comparisons and "batch flow" to target maximal cache optimality, though this technique will require more intermediate state. Note: batch-pipeline systems (for example, GPUs or software rasterization pipelines) are most advantageous for cache control when implemented with SIMD intrinsics, but they are not exclusive to SIMD features. Further complexity may arise from the need to avoid dependencies within series such as code strings, since independence is required for vectorization.
* Large register files increase power consumption and required chip area.
* Implementing an algorithm with SIMD instructions often still requires human labor: compilers can vectorize simple loops automatically, but frequently fail to generate SIMD instructions for typical C code. Automatic vectorization in compilers is an active area of computer science research. (Compare vector processing.)
* Programming with particular SIMD instruction sets can involve numerous low-level challenges:
  1. SIMD may have restrictions on data alignment; programmers familiar with one particular architecture may not expect this. Worse, the alignment requirements may change from one revision or "compatible" processor to another.
  2. Gathering data into SIMD registers and scattering it to the correct destination locations is tricky (sometimes requiring permute operations) and can be inefficient.
  3. Specific instructions like rotations or three-operand addition are not available in some SIMD instruction sets.
  4. Instruction sets are architecture-specific: some processors lack SIMD instructions entirely, so programmers must provide non-vectorized implementations (or different vectorized implementations) for them.
  5. Different architectures provide different register sizes (e.g. 64, 128, 256 and 512 bits) and instruction sets, meaning that programmers must provide multiple implementations of vectorized code to operate optimally on any given CPU. In addition, the possible set of SIMD instructions grows with each new register size. Unfortunately, for legacy support reasons, the older versions cannot be retired.
  6. The early MMX instruction set shared a register file with the floating-point stack, which caused inefficiencies when mixing floating-point and MMX code. However, SSE2 corrects this.

To remedy problems 1 and 5, RISC-V's vector extension uses an alternative approach: instead of exposing the sub-register-level details to the programmer, the instruction set abstracts them out as a few "vector registers" that use the same interfaces across all CPUs with this instruction set. The hardware handles all alignment issues and "strip-mining" of loops. Machines with different vector sizes would be able to run the same code. LLVM calls this vector type "".

The disadvantages of SIMD compared to vector processing, and even to scalar processing, for general-purpose use are substantial. An order-of-magnitude increase in code size is not uncommon when compared to equivalent scalar or equivalent vector code, and an order of magnitude or greater effectiveness (work done per instruction) is achievable with vector ISAs.

ARM's Scalable Vector Extension takes another approach, known in Flynn's taxonomy as "associative processing" and more commonly known today as "predicated" (masked) SIMD. This approach is not as compact as vector processing but is still far better than non-predicated SIMD. Detailed comparative examples are given in the Vector processing page.
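The "strip-mining" that a vector ISA such as RISC-V's vector extension handles can be modelled in plain C. In the sketch below, each pass asks how many elements the hardware will process (at most `MAX_VL`), and the same loop body also covers the final partial chunk, so no separate scalar tail loop is needed. `MAX_VL`, `set_vl` and `vadd_vla` are hypothetical names chosen for illustration; on real RISC-V hardware the `vsetvli` instruction plays the role of `set_vl`, and the inner lane loop stands in for what would be a single vector add.

```c
#include <stddef.h>

/* Hypothetical hardware vector length in elements. Portable code must
   not depend on this value; real hardware reports its own. */
#define MAX_VL 8

/* Models vsetvli: grant at most MAX_VL of the remaining elements. */
static size_t set_vl(size_t remaining)
{
    return remaining < MAX_VL ? remaining : MAX_VL;
}

/* c[i] = a[i] + b[i], strip-mined into chunks whose size set_vl() picks. */
void vadd_vla(const int *a, const int *b, int *c, size_t n)
{
    while (n > 0) {
        size_t vl = set_vl(n);           /* elements handled this pass */

        /* Stand-in for one vector add over vl active lanes. */
        for (size_t lane = 0; lane < vl; lane++)
            c[lane] = a[lane] + b[lane];

        a += vl;                         /* advance past the finished strip */
        b += vl;
        c += vl;
        n -= vl;
    }
}
```

Changing `MAX_VL` to 4 or 16 leaves the results unchanged, which is the property that lets machines with different vector sizes run the same code. Fixed-width SIMD code, by contrast, bakes the register width into both the main loop and a separate cleanup loop, as in the brightness sketch's scalar tail.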


Chronology


Hardware

Small-scale (64 or 128 bits) SIMD became popular on general-purpose CPUs in the early 1990s and continued through 1997 and later with Motion Video Instructions (MVI) for Alpha. SIMD instructions can be found, to one degree or another, on most CPUs, including IBM's AltiVec and SPE for PowerPC, HP's PA-RISC Multimedia Acceleration eXtensions (MAX), Intel's MMX and iwMMXt, SSE, SSE2, SSE3, SSSE3 and SSE4.x, AMD's 3DNow!, ARC's ARC Video subsystem, SPARC's VIS and VIS2, Sun's MAJC, ARM's Neon technology, and MIPS' MDMX (MaDMaX) and MIPS-3D. The instruction set of the SPUs in the Cell processor, co-developed by IBM, Sony and Toshiba, is heavily SIMD-based. Philips, now NXP, developed several SIMD processors named Xetal. The Xetal has 320 16-bit processor elements especially designed for vision tasks. Modern graphics processing units (GPUs) are often wide SIMD implementations, capable of branches, loads, and stores on 128 or 256 bits at a time. Intel's AVX-512 SIMD instructions process 512 bits of data at once.


Software

SIMD instructions are widely used to process 3D graphics, although modern
graphics card A graphics card (also called a video card, display card, graphics adapter, video adapter, or display adapter) is an expansion card Modern EEPROM chip suitable for storing expansion card configuration electronically In computing Compu ...

graphics card
s with embedded SIMD have largely taken over this task from the CPU. Some systems also include permute functions that re-pack elements inside vectors, making them particularly useful for data processing and compression. They are also used in cryptography. The trend of general-purpose computing on GPUs (
GPGPU General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit A graphics processing unit (GPU) is a specialized designed to rapidly manipulate and alter to accelerate the creatio ...
) may lead to wider use of SIMD in the future. Adoption of SIMD systems in
personal computer A personal computer (PC) is a multi-purpose computer whose size, capabilities, and price make it feasible for individual use. Personal computers are intended to be operated directly by an end user, rather than by a computer expert or technician ...
software was at first slow, due to a number of problems. One was that many of the early SIMD instruction sets tended to slow overall performance of the system due to the re-use of existing floating point registers. Other systems, like
MMX MMX may refer to: * 2010 2010 was designated as: * * * *International Year for the Rapprochement of Cultures Pronunciation There is a debate among experts and the general public on how to pronounce specific years of the 21st century in En ...
and
3DNow!, offered support for data types that were not interesting to a wide audience and had expensive context-switching instructions to switch between using the FPU and MMX registers. Compilers also often lacked support, requiring programmers to resort to assembly-language coding.

SIMD on x86 had a slow start. The introduction of 3DNow! by AMD and SSE by Intel confused matters somewhat, but the situation has since settled down (after AMD adopted SSE), and newer compilers should result in more SIMD-enabled software. Intel and AMD now both provide optimized math libraries that use SIMD instructions, and open-source alternatives like libSIMD, SIMDx86 and SLEEF have started to appear (see also libm).
Apple Computer had somewhat more success, even though it entered the SIMD market later than the rest. AltiVec offered a rich system and can be programmed using increasingly sophisticated compilers from Motorola, IBM and GNU, so assembly-language programming is rarely needed. Additionally, many of the systems that would benefit from SIMD were supplied by Apple itself, for example iTunes and QuickTime. However, in 2006, Apple computers moved to Intel x86 processors. Apple's APIs and development tools (Xcode) were modified to support SSE2 and SSE3 as well as AltiVec. Apple was the dominant purchaser of PowerPC chips from IBM and Freescale Semiconductor, and even though it abandoned the platform, further development of AltiVec continues in several PowerPC and Power ISA designs from Freescale and IBM.

''SIMD within a register'', or SWAR, is a range of techniques and tricks used for performing SIMD in general-purpose registers on hardware that doesn't provide any direct support for SIMD instructions. This lets certain algorithms exploit data-level parallelism even without dedicated SIMD hardware.
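As a minimal sketch of the SWAR idea, assuming a 64-bit general-purpose register that holds eight unsigned bytes, the following C function adds two such packed words lane by lane. The masking trick keeps carries from spilling into neighbouring lanes; the name `swar_add_u8x8` is illustrative, not taken from any library.

```c
#include <stdint.h>

/* Mask selecting the low 7 bits of every byte lane. */
#define LOW7  UINT64_C(0x7F7F7F7F7F7F7F7F)
/* Mask selecting the high bit (bit 7) of every byte lane. */
#define HIGH1 UINT64_C(0x8080808080808080)

/* Add eight unsigned bytes packed into each 64-bit word, lane by lane,
 * modulo 256 per lane: no carry may cross from one byte into the next. */
static uint64_t swar_add_u8x8(uint64_t a, uint64_t b) {
    /* Adding only the low 7 bits of each lane yields at most 254, so any
     * carry lands in bit 7 of the same lane and never reaches the next. */
    uint64_t low = (a & LOW7) + (b & LOW7);
    /* Bit 7 of each lane is a7 XOR b7 XOR carry-in; the carry-out of
     * bit 7 is simply discarded, giving wrap-around per lane. */
    return low ^ ((a ^ b) & HIGH1);
}
```

For example, adding `0x01FF` and `0x0101` yields `0x0200`: the low lane wraps from `0xFF` to `0x00` without disturbing the lane above it.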


Programmer interface

It is common for publishers of SIMD instruction sets to make their own C/C++ language extensions with intrinsic functions or special datatypes (with operator overloading) guaranteeing the generation of vector code. Intel, AltiVec, and ARM NEON provide extensions widely adopted by the compilers targeting their CPUs. (More complex operations are the task of vector math libraries.)

The GNU C Compiler (GCC) takes the extensions a step further by abstracting them into a universal interface that can be used on any platform by providing a way of defining SIMD datatypes. The LLVM Clang compiler also implements the feature, with an analogous interface defined in LLVM's intermediate representation (IR). Rust's packed_simd crate uses this interface, and so does Swift 2.0+. C++ has an experimental interface that works similarly to the GCC extension; LLVM's libcxx seems to implement it. For GCC and libstdc++, a wrapper library that builds on top of the GCC extension is available.
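A minimal sketch of the GCC-style vector extension described above (also accepted by Clang), assuming a compiler from either family. The type name `v4sf` and the function `axpy4` are illustrative choices, not part of any standard API; ordinary C operators act on all four lanes at once.

```c
/* A 4-lane single-precision vector type via the GCC/Clang vector extension.
 * vector_size is given in bytes: 16 bytes = 4 floats. */
typedef float v4sf __attribute__((vector_size(16)));

/* Computes a*x + y on all four lanes. The compiler lowers the ordinary
 * operators to SIMD instructions where the target has them (for example
 * SSE or NEON), or to equivalent scalar code otherwise. */
static v4sf axpy4(float a, v4sf x, v4sf y) {
    v4sf va = {a, a, a, a};   /* broadcast the scalar into every lane */
    return va * x + y;
}
```

For instance, `axpy4(2.0f, x, y)` with `x = {1, 2, 3, 4}` and `y = {10, 20, 30, 40}` yields `{12, 24, 36, 48}`; individual lanes can be read with ordinary subscripts such as `r[0]`.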
Microsoft added SIMD to .NET in RyuJIT. A package available on NuGet implements SIMD datatypes. Java also has a proposed API for SIMD instructions, available in OpenJDK 17 as an incubator module; on unsupported CPUs it falls back safely to simple loops.

Instead of providing a SIMD datatype, compilers can also be hinted to auto-vectorize some loops, possibly relying on programmer assertions that the loop iterations have no data dependencies. This is not as flexible as manipulating SIMD variables directly, but is easier to use. OpenMP 4.0+ has such a hint, and Cilk has a similar feature. GCC and Clang also have their own private pragmas for making loops vectorizable, but all three (Cilk's, GCC's and Clang's) have been made obsolete by OpenMP.
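As a sketch of the loop-hint approach, here is a C loop annotated with OpenMP's `#pragma omp simd` directive (OpenMP 4.0+), together with C99 `restrict` qualifiers that assert the arrays do not alias. The function name `saxpy` is only illustrative; the pragma is a hint, so a compiler without OpenMP SIMD support enabled simply ignores it.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for every i.
 * `restrict` promises that x and y do not overlap, and `#pragma omp simd`
 * asks the compiler to vectorize the loop without proving independence
 * itself. GCC and Clang honour the pragma with -fopenmp or -fopenmp-simd;
 * otherwise it is ignored and the loop is compiled as usual. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
#pragma omp simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Because the directive only changes how the loop is compiled, the result is the same whether or not vectorization actually happens.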


SIMD multi-versioning

Consumer software is typically expected to work on a range of CPUs covering multiple generations, which could limit the programmer's ability to use new SIMD instructions to improve the computational performance of a program. The solution is to include multiple versions of the same code that use either older or newer SIMD technologies, and pick the one that best fits the user's CPU at run time (dynamic dispatch). There are two main camps of solutions:
* Function multi-versioning: a subroutine in the program or a library is duplicated and compiled for many instruction set extensions, and the program decides which one to use at run time.
* Library multi-versioning: the entire programming library is duplicated for many instruction set extensions, and the operating system or the program decides which one to load at run time.
The former solution is supported by the Intel C++ Compiler, the GNU Compiler Collection since GCC 6, and Clang since Clang 7. However, since GCC and Clang require explicit labels to "clone" functions, an easier way is to compile multiple versions of the library and let the system's glibc choose one, as the Intel-backed Clear Linux project does. The Rust programming language also supports multi-versioning; there, cloning can be done by having a feature-specific function call the original function, so that inlining takes over.
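A hand-rolled sketch of function multi-versioning with dynamic dispatch, assuming GCC or Clang; on x86 it uses the `target` function attribute and the `__builtin_cpu_supports` builtin, while other targets fall back to the baseline. The function names (`add_default`, `add_avx2`, `select_add`) are hypothetical.

```c
#include <stddef.h>

/* Baseline version: compiled for the default target of the build. */
static void add_default(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source, but the compiler may use AVX2 instructions when it
 * vectorizes this copy of the loop. */
__attribute__((target("avx2")))
static void add_avx2(float *dst, const float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
#endif

typedef void (*add_fn)(float *, const float *, const float *, size_t);

/* Pick an implementation at run time based on the CPU actually present. */
static add_fn select_add(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add_avx2;
#endif
    return add_default;
}
```

Resolving the pointer once and reusing it keeps the feature check out of hot loops. GCC's `target_clones` attribute (GCC 6+) can generate this kind of dispatcher automatically.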


SIMD on the web

In 2013 John McCutchan announced that he had created a high-performance interface to SIMD instruction sets for the Dart programming language, bringing the benefits of SIMD to web programs for the first time. The interface consists of two types:
* Float32x4, four single-precision floating-point values.
* Int32x4, four 32-bit integer values.
Instances of these types are immutable and in optimized code are mapped directly to SIMD registers. Operations expressed in Dart are typically compiled into a single instruction without any overhead, similar to C and C++ intrinsics. Benchmarks for 4×4 matrix multiplication, 3D vertex transformation, and Mandelbrot set visualization show near 400% speedup compared to scalar code written in Dart. McCutchan's work on Dart, in its JavaScript form called SIMD.js, was adopted by ECMAScript, and Intel announced at IDF 2013 that it was implementing McCutchan's specification for both V8 and SpiderMonkey. However, by 2017, SIMD.js had been taken out of the ECMAScript standard queue in favor of pursuing a similar interface in WebAssembly. As of August 2020, the WebAssembly interface remains unfinished, but its portable 128-bit SIMD feature has already seen some use in many engines.

With extensions, Emscripten, Mozilla's C/C++-to-JavaScript compiler, can compile C++ programs that use SIMD intrinsics or GCC-style vector code to the SIMD API of JavaScript, resulting in equivalent speedups over scalar code. It also supports the WebAssembly 128-bit SIMD proposal.
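The 3D vertex transformation used in the benchmarks above reduces to a handful of 4-lane multiply-adds. The following is a minimal C sketch using the GCC/Clang vector extension rather than Dart or JavaScript, assuming a 4×4 matrix stored as four column vectors; `f32x4` and `transform_vertex` are illustrative names, not part of SIMD.js or Dart.

```c
/* Four packed floats, comparable in spirit to a Float32x4 value. */
typedef float f32x4 __attribute__((vector_size(16)));

/* Transform vertex v by a 4x4 matrix given as four column vectors:
 * result = col0*v.x + col1*v.y + col2*v.z + col3*v.w.
 * Each term is one lane-wise multiply over all four components. */
static f32x4 transform_vertex(const f32x4 cols[4], f32x4 v) {
    f32x4 x = {v[0], v[0], v[0], v[0]};   /* splat each component */
    f32x4 y = {v[1], v[1], v[1], v[1]};
    f32x4 z = {v[2], v[2], v[2], v[2]};
    f32x4 w = {v[3], v[3], v[3], v[3]};
    return cols[0] * x + cols[1] * y + cols[2] * z + cols[3] * w;
}
```

With an identity matrix whose last column is `{5, 6, 7, 1}` (a translation by (5, 6, 7)), the point `{1, 2, 3, 1}` is transformed to `{6, 8, 10, 1}`.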


Commercial applications

Though it has generally proven difficult to find sustainable commercial applications for SIMD-only processors, one that has had some measure of success is the Geometric-Arithmetic Parallel Processor (GAPP), which was developed by Lockheed Martin and taken to the commercial sector by its spin-off Teranex. The GAPP's recent incarnations have become a powerful tool in real-time digital image and video processing applications such as conversion between various video standards and frame rates (NTSC to/from PAL, NTSC to/from HDTV formats, etc.), deinterlacing, image noise reduction, adaptive video compression, and image enhancement.

A more ubiquitous application for SIMD is found in video games: nearly every modern video game console since 1998 has incorporated a SIMD processor somewhere in its architecture. The PlayStation 2 was unusual in that one of its vector-float units could function as an autonomous DSP executing its own instruction stream, or as a coprocessor driven by ordinary CPU instructions. 3D graphics applications tend to lend themselves well to SIMD processing as they rely heavily on operations with 4-dimensional vectors. Microsoft's Direct3D 9.0 chooses processor-specific implementations of its own math operations at run time, including the use of SIMD-capable instructions.

One of the recent processors to use vector processing is the Cell Processor developed by IBM in cooperation with Toshiba and Sony. It uses a number of SIMD processors (a NUMA architecture, each with independent local store, controlled by a general-purpose CPU) and is geared towards the huge datasets required by 3D and video processing applications. It differs from traditional ISAs by being SIMD from the ground up with no separate scalar registers.

ZiiLABs produced a SIMD-type processor for use on mobile devices, such as media players and mobile phones.

Larger-scale commercial SIMD processors are available from ClearSpeed Technology, Ltd. and Stream Processors, Inc. ClearSpeed's CSX600 (2004) has 96 cores, each with two double-precision floating-point units, while the CSX700 (2008) has 192. Stream Processors is headed by computer architect Bill Dally. Its Storm-1 processor (2007) contains 80 SIMD cores controlled by a MIPS CPU.


See also

* Streaming SIMD Extensions, MMX, SSE2, SSE3, Advanced Vector Extensions, AVX-512
* Instruction set architecture
* Flynn's taxonomy
* SIMD within a register (SWAR)
* Single Program, Multiple Data (SPMD)
* OpenCL



External links


* SIMD architectures (2000)
* Cracking Open The Pentium 3 (1999)
* Short Vector Extensions in Commercial Microprocessor
* Article about Optimizing the Rendering Pipeline of Animated Models Using the Intel Streaming SIMD Extensions
* "Yeppp!": cross-platform, open-source SIMD library from Georgia Tech
* Introduction to Parallel Computing from LLNL (Lawrence Livermore National Laboratory)
* : A portable implementation of platform-specific intrinsics for other platforms (e.g. SSE intrinsics for ARM NEON), using C/C++ headers