SIMD within a register (SWAR), also known as "packed SIMD", is a technique for performing parallel operations on data contained in a processor register. SIMD stands for single instruction, multiple data. Flynn's 1972 taxonomy categorises SWAR as "pipelined processing". Many modern general-purpose computer processors have some provisions for SIMD, in the form of a group of registers and instructions to make use of them. SWAR refers to the use of those registers and instructions, as opposed to using specialized processing engines designed to be better at SIMD operations. It also refers to the use of SIMD with general-purpose registers and instructions that were not designed for it, by way of various software tricks.
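The core SWAR idea, treating one general-purpose register as several independent lanes, can be sketched in C by adding eight packed bytes held in a 64-bit integer. This is a minimal sketch that assumes a 64-bit unsigned type and wrap-around (modulo 256) arithmetic in each lane; the names swar_add_u8 and SWAR_HIGH_BITS are hypothetical, chosen for illustration. The trick is to clear each byte's top bit before adding so that no carry can cross a lane boundary, then restore the top bits with XOR.

```c
#include <stdint.h>

/* Mask selecting the most significant bit of each of the eight byte lanes. */
#define SWAR_HIGH_BITS 0x8080808080808080ULL

/* Adds eight unsigned 8-bit lanes packed into a and b, modulo 256 per lane.
   Clearing each lane's top bit before the add guarantees that a carry can
   reach at most that lane's own top bit, never the neighbouring lane; the
   top bits are then recombined with XOR (a sum bit with no carry-out). */
static uint64_t swar_add_u8(uint64_t a, uint64_t b) {
    uint64_t low7 = (a & ~SWAR_HIGH_BITS) + (b & ~SWAR_HIGH_BITS);
    return low7 ^ ((a ^ b) & SWAR_HIGH_BITS);
}
```

For example, swar_add_u8(0x0102030405060708, 0x1010101010101010) yields 0x1112131415161718, and a lane holding 0xFF plus 1 wraps to 0x00 without disturbing the lane above it. A single ordinary 64-bit add thus performs eight byte additions.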


SWAR architectures

A SWAR architecture is one that includes instructions explicitly intended to perform parallel operations across data stored in the independent subwords or fields of a register. A SWAR-capable architecture is one whose instruction set is sufficient to let data stored in these fields be treated independently, even though it does not include instructions explicitly intended for that purpose. An early example of a SWAR architecture was the Intel Pentium with MMX, which implemented the MMX extension set. The original Intel Pentium, by contrast, did not include such instructions, but could still act as a SWAR-capable architecture through careful hand-coding or compiler techniques. Early SWAR architectures include DEC Alpha MVI, Hewlett-Packard's PA-RISC MAX, Silicon Graphics Incorporated's MIPS MDMX, and Sun's SPARC V9 VIS. Like MMX, many of these SWAR instruction sets are intended to speed up video coding.
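On a SWAR-capable machine without dedicated packed instructions, ordinary integer instructions can still treat bytes as independent fields. A widely used illustration, sketched here in C under the assumption of 64-bit words (the helper name has_zero_byte is hypothetical), tests whether any byte of a word is zero, which is useful when scanning strings a word at a time.

```c
#include <stdint.h>

/* Returns nonzero if any of the eight bytes in x is 0x00, using only
   ordinary subtract, AND and NOT operations on one 64-bit value. */
static int has_zero_byte(uint64_t x) {
    const uint64_t ones  = 0x0101010101010101ULL; /* 0x01 in every byte */
    const uint64_t highs = 0x8080808080808080ULL; /* 0x80 in every byte */
    /* The lowest zero byte always leaves its lane's high bit set here.
       Lanes above it may be flagged spuriously by the borrow, so only
       the "is any byte zero?" answer is exact, not the flag positions. */
    return ((x - ones) & ~x & highs) != 0;
}
```

The same word-wide subtract examines all eight bytes at once, which is the essence of SWAR on general-purpose instructions.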


History of the SWAR programming model

Wesley A. Clark introduced partitioned subword data operations in the 1950s. This can be seen as a very early predecessor to SWAR. Leslie Lamport presented SWAR techniques in his 1975 paper "Multiple byte processing with full-word instructions". With the introduction of Intel's MMX multimedia instruction set extensions in 1996, desktop processors with SIMD parallel processing capabilities became common. Early on, these instructions could only be used via hand-written assembly code.

In the fall of 1996, Professor Hank Dietz was the instructor for the undergraduate Compiler Construction course at Purdue University's School of Electrical and Computer Engineering. For this course, he assigned a series of projects in which the students would build a simple compiler targeting MMX. The input language was a subset dialect of MasPar's MPL called NEMPL (Not Exactly MPL). During the semester, it became clear to the course teaching assistant, Randall (Randy) Fisher, that a number of issues with MMX would make it difficult to build the back end of the NEMPL compiler. For example, MMX has an instruction for multiplying 16-bit data but not for multiplying 8-bit data. The NEMPL language did not account for this problem, allowing the programmer to write programs that required 8-bit multiplies.

Intel's x86 architecture was not the only architecture to include SIMD-like parallel instructions. Sun's VIS, SGI's MDMX, and other multimedia instruction sets had been added to other manufacturers' existing instruction set architectures to support so-called "new media" applications. These extensions differed significantly in the precision of data and the types of instructions they supported. Dietz and Fisher began developing the idea of a well-defined parallel programming model that would allow the programmer to target the model without knowing the specifics of the target architecture. This model would become the basis of Fisher's dissertation. The acronym "SWAR" was coined by Dietz and Fisher one day in Dietz's office in the MSEE building at Purdue University. It refers to this form of parallel processing, to architectures designed to perform this type of processing natively, and to the general-purpose programming model that is the subject of Fisher's dissertation. The problem of compiling for these widely varying architectures was discussed in a paper presented at LCPC98.
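The missing 8-bit multiply mentioned above can be emulated in software. The sketch below shows one possible approach, not necessarily what the NEMPL compiler generated. It models a packed 16-bit low-half multiply in plain C (the helper mul16x4_lo is hypothetical and only mimics the lane behaviour of an instruction such as MMX's PMULLW), then builds an eight-lane 8-bit multiply from two such operations by spreading the even and odd bytes into 16-bit lanes.

```c
#include <stdint.h>

/* Hypothetical stand-in for a packed 16-bit multiply: four 16-bit lanes,
   keeping only the low 16 bits of each product, as PMULLW does. */
static uint64_t mul16x4_lo(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int i = 0; i < 64; i += 16) {
        uint16_t x = (uint16_t)(a >> i);
        uint16_t y = (uint16_t)(b >> i);
        uint16_t p = (uint16_t)((uint32_t)x * y); /* low 16 bits only */
        r |= (uint64_t)p << i;
    }
    return r;
}

/* Eight 8-bit lane multiply (low 8 bits of each product) built from two
   16-bit lane multiplies. Each byte is placed alone in the low half of a
   16-bit lane, so its full 16-bit product cannot spill into another byte. */
static uint64_t mul8x8_lo(uint64_t a, uint64_t b) {
    const uint64_t even_mask = 0x00FF00FF00FF00FFULL;
    uint64_t even = mul16x4_lo(a & even_mask, b & even_mask) & even_mask;
    uint64_t odd  = mul16x4_lo((a >> 8) & even_mask,
                               (b >> 8) & even_mask) & even_mask;
    return even | (odd << 8);
}
```

Two multiplies plus masking and shifting replace the single instruction the hardware lacked; this kind of cost is what a portable SWAR programming model has to hide from the programmer.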


Some applications of SWAR

SWAR processing has been used in image processing, cryptographic pairings, raster processing, computational fluid dynamics, and communications (Lawrence A. Spracklen, "SWAR Systems and Communications Applications", Ph.D. thesis, University of Aberdeen, 2001, http://www.spracklen.info/publications/thesis.pdf).
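As a small illustration of the image-processing case, averaging two rows of 8-bit pixels is a typical packed-byte operation. This is a sketch under the assumption of eight pixels per 64-bit word, with a hypothetical helper name swar_avg_u8; it is not taken from the cited thesis. It relies on the identity a + b = 2(a AND b) + (a XOR b), masking each lane's lowest bit before the shift so no bit slides into the lane below.

```c
#include <stdint.h>

/* Per-byte floor((a + b) / 2) for eight packed 8-bit pixels.
   Uses a + b = 2*(a & b) + (a ^ b); the 0xFE mask clears each lane's
   lowest bit before the right shift so no bit moves into the lane below.
   Each lane's result is at most 255, so the final add cannot carry
   across lanes. */
static uint64_t swar_avg_u8(uint64_t a, uint64_t b) {
    const uint64_t not_low = 0xFEFEFEFEFEFEFEFEULL;
    return (a & b) + (((a ^ b) & not_low) >> 1);
}
```

Applied to corresponding words of two image rows, this averages eight pixel pairs per call, for example as one step of a vertical downscale.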


See also

*SIMD engines: vector processor, array processor, digital signal processor, stream processor
*SWAR on x86 processors: MMX, 3DNow!, SSE, SSE2, SSE3


External links


The Aggregate - SWAR: SIMD Within A Register