Swar Sadhana Samiti
   HOME

TheInfoList



OR:

SIMD within a register (SWAR), also known by the name "packed SIMD" is a technique for performing parallel operations on data contained in a
processor register A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only. ...
. SIMD stands for ''single instruction, multiple data''. Flynn's 1972 taxonomy categorises SWAR as "pipelined processing". Many modern general-purpose computer processors have some provisions for SIMD, in the form of a group of registers and instructions to make use of them. SWAR refers to the use of those registers and instructions, as opposed to using specialized processing engines designed to be better at SIMD operations. It also refers to the use of SIMD with general-purpose registers and instructions that were not meant to do it at the time, by way of various novel software tricks.


SWAR architectures

A SWAR architecture is one that includes instructions explicitly intended to perform parallel operations across data that is stored in the independent subwords or fields of a register. A SWAR-capable architecture is one that includes a set of instructions that is sufficient to allow data stored in these fields to be treated independently even though the architecture does not include instructions that are explicitly intended for that purpose. An early example of a SWAR architecture was the Intel Pentium with MMX, which implemented the MMX extension set. The
Intel Pentium Pentium is a brand used for a series of x86 architecture-compatible microprocessors produced by Intel. The original Pentium processor from which the brand took its name was first released on March 22, 1993. After that, the Pentium II and Pe ...
, by contrast, did not include such instructions, but could still act as a SWAR architecture through careful hand-coding or compiler techniques. Early SWAR architectures include
DEC Alpha Alpha (original name Alpha AXP) is a 64-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). Alpha was designed to replace 32-bit VAX complex instruction set computers ...
, Hewlett-Packard's PA-RISC
MAX Max or MAX may refer to: Animals * Max (dog) (1983–2013), at one time purported to be the world's oldest living dog * Max (English Springer Spaniel), the first pet dog to win the PDSA Order of Merit (animal equivalent of OBE) * Max (gorilla) ...
, Silicon Graphics Incorporated's MIPS
MDMX The MDMX (MIPS Digital Media eXtension), also known as MaDMaX, is an extension to the MIPS architecture released in October 1996 at the Microprocessor Forum. History MDMX was developed to accelerate multimedia applications that were becoming m ...
, and Sun's
SPARC SPARC (Scalable Processor Architecture) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system developed ...
V9 VIS. Like MMX, many of the SWAR instruction sets are intended for faster video coding.


History of the SWAR programming model

Wesley A. Clark Wesley Allison Clark (April 10, 1927 – February 22, 2016) was an American physicist who is credited for designing the first modern personal computer. He was also a computer designer and the main participant, along with Charles Molnar, in the ...
introduced partitioned subword data operations in the 1950s. This can be seen as a very early predecessor to SWAR. Leslie Lamport presented SWAR techniques in his paper titled "Multiple byte processing with full-word instructions" in 1975. With the introduction of Intel's MMX multimedia instruction set extensions in 1996, desktop processors with SIMD parallel processing capabilities became common. Early on, these instructions could only be used via hand-written assembly code. In the fall of 1996, Professor Hank Dietz was the instructor for the undergraduate Compiler Construction course at Purdue University's School of Electrical and Computer Engineering. For this course, he assigned a series of projects in which the students would build a simple compiler targeting MMX. The input language was a subset dialect of
MasPar MasPar Computer Corporation was a minisupercomputer vendor that was founded in 1987 by Jeff Kalb. The company was based in Sunnyvale, California. History While Kalb was the vice-president of the division of Digital Equipment Corporation (DEC) t ...
's MPL called NEMPL (Not Exactly MPL). During the course of the semester, it became clear to the course teaching assistant, Randall (Randy) Fisher, that there were a number of issues with MMX that would make it difficult to build the back-end of the NEMPL compiler. For example, MMX has an instruction for multiplying 16-bit data but not multiplying 8-bit data. The NEMPL language did not account for this problem, allowing the programmer to write programs that required 8-bit multiplies. Intel's x86 architecture was not the only architecture to include SIMD-like parallel instructions. Sun's VIS, SGI's
MDMX The MDMX (MIPS Digital Media eXtension), also known as MaDMaX, is an extension to the MIPS architecture released in October 1996 at the Microprocessor Forum. History MDMX was developed to accelerate multimedia applications that were becoming m ...
, and other multimedia instruction sets had been added to other manufacturers' existing instruction set architectures to support so-called ''new media'' applications. These extensions had significant differences in the precision of data and types of instructions supported. Dietz and Fisher began developing the idea of a well-defined parallel programming model that would allow the programming to target the model without knowing the specifics of the target architecture. This model would become the basis of Fisher's dissertation. The acronym "SWAR" was coined by Dietz and Fisher one day in Hank's office in the MSEE building at Purdue University. It refers to this form of parallel processing, architectures that are designed to natively perform this type of processing, and the general-purpose programming model that is Fisher's dissertation. The problem of compiling for these widely varying architectures was discussed in a paper presented at LCPC98.


Some applications of SWAR

SWAR processing has been used in image processing, cryptographic pairings, raster processing, computational fluid dynamics, and communications.{{cite thesis , type=Ph.D. , first=Lawrence A. , last=Spracklen , title=SWAR Systems and Communications Applications , publisher=University of Aberdeen , year=2001 , url=http://www.spracklen.info/publications/thesis.pdf


See also

*SIMD engines:
vector processor In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called ...
,
array processor In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called ' ...
,
digital signal processor A digital signal processor (DSP) is a specialized microprocessor chip, with its architecture optimized for the operational needs of digital signal processing. DSPs are fabricated on MOS integrated circuit chips. They are widely used in audio si ...
, stream processor. *SWAR on
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
processors: MMX,
3DNow! 3DNow! is a deprecated extension to the x86 instruction set developed by Advanced Micro Devices (AMD). It adds single instruction multiple data (SIMD) instructions to the base x86 instruction set, enabling it to perform vector processing of float ...
, SSE,
SSE2 SSE2 (Streaming SIMD Extensions 2) is one of the Intel SIMD (Single Instruction, Multiple Data) processor supplementary instruction sets first introduced by Intel with the initial version of the Pentium 4 in 2000. It extends the earlier Streamin ...
, SSE3


References


External links


The Aggregate - SWAR: SIMD Within A Register
Parallel computing SIMD computing