HOME

TheInfoList



OR:

Gather/scatter is a type of
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
addressing that at once collects (gathers) from, or stores (scatters) data to, multiple, arbitrary indices. Examples of its use include sparse
linear algebra Linear algebra is the branch of mathematics concerning linear equations such as: :a_1x_1+\cdots +a_nx_n=b, linear maps such as: :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matrices. ...
operations, sorting algorithms,
fast Fourier transform A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in th ...
s, and some computational graph theory problems. It is the vector equivalent of register indirect addressing, with gather involving indexed reads, and scatter, indexed writes.
Vector processor In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called ...
s (and some
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
units in
CPU A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and ...
s) have hardware support for gather and scatter operations.


Definitions


Gather

A sparsely populated vector y holding N non-empty elements can be represented by two densely populated vectors of length N; x containing the non-empty elements of y, and idx giving the index in y where x's element is located. The gather of y into x, denoted x \leftarrow y, _x, assigns x(i)=y(idx(i)) with idx having already been calculated. Assuming no
pointer aliasing In computing, aliasing describes a situation in which a data location in memory can be accessed through different symbolic names in the program. Thus, modifying the data through one name implicitly modifies the values associated with all aliased ...
between x[], y[],idx[], a C (programming language), C implementation is for (i = 0; i < N; ++i) x[i] = y dx[i;


Scatter

The sparse scatter, denoted y, _x \leftarrow x is the reverse operation. It copies the values of x into the corresponding locations in the sparsely populated vector y, i.e. y(idx(i))=x(i). for (i = 0; i < N; ++i) y dx[i_=_x[i.html"_;"title=".html"_;"title="dx[i">dx[i_=_x[i">.html"_;"title="dx[i">dx[i_=_x[i


__Support_

x86-64.html" ;"title="">dx[i_=_x[i.html" ;"title=".html" ;"title="dx[i">dx[i = x[i">.html" ;"title="dx[i">dx[i = x[i


Support

x86-64">">dx[i_=_x[i.html" ;"title=".html" ;"title="dx[i">dx[i = x[i">.html" ;"title="dx[i">dx[i = x[i


Support

x86-64 CPUs which support the AVX2 instruction set can gather 32-bit and 64-bit elements with memory offsets from a base address. A second register determines whether the particular element is loaded, and faults occurring from invalid memory accesses by masked-out elements are suppressed. The
AVX-512 AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and implemented in Intel's Xeon Phi x200 (Knights Landing) and Skylake-X CPUs; thi ...
instruction set also contains (potentially masked) scatter operations. The
ARM In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
instruction set's
Scalable Vector Extension AArch64 or ARM64 is the 64-bit extension of the ARM architecture family. It was first introduced with the Armv8-A architecture. Arm releases a new extension every year. ARMv8.x and ARMv9.x extensions and features Announced in October 2011, AR ...
includes gather and scatter operations on 8-, 16-, 32- and 64-bit elements.
InfiniBand InfiniBand (IB) is a computer networking communications standard used in high-performance computing that features very high throughput and very low latency. It is used for data interconnect both among and within computers. InfiniBand is also used ...
has hardware support for gather/scatter. Without instruction-level gather/scatter, efficient implementations may need to be tuned for optimal performance, for example with prefetching; libraries such as OpenMPI may provide such primitives.


See also

*
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
*
Vectorization Vectorization may refer to: Computing * Array programming, a style of computer programming where operations are applied to whole arrays instead of individual elements * Automatic vectorization, a compiler optimization that transforms loops to vect ...
* Compute kernel *
Memory access pattern In computing, a memory access pattern or IO access pattern is the pattern with which a system or program reads and writes memory on secondary storage. These patterns differ in the level of locality of reference and drastically affect cache perform ...


References

{{reflist Parallel computing SIMD computing