HOME

TheInfoList



OR:

Gather/scatter is a type of
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
addressing that at once collects (gathers) from, or stores (scatters) data to, multiple, arbitrary indices. Examples of its use include sparse
linear algebra Linear algebra is the branch of mathematics concerning linear equations such as: :a_1x_1+\cdots +a_nx_n=b, linear maps such as: :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matrices. ...
operations, sorting algorithms,
fast Fourier transform A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in th ...
s, and some computational graph theory problems. It is the vector equivalent of register indirect addressing, with gather involving indexed reads, and scatter, indexed writes.
Vector processor In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called ...
s (and some SIMD units in
CPU A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and ...
s) have hardware support for gather and scatter operations.


Definitions


Gather

A sparsely populated vector y holding N non-empty elements can be represented by two densely populated vectors of length N; x containing the non-empty elements of y, and idx giving the index in y where x's element is located. The gather of y into x, denoted x \leftarrow y, _x, assigns x(i)=y(idx(i)) with idx having already been calculated. Assuming no
pointer aliasing In computing, aliasing describes a situation in which a data location in memory can be accessed through different symbolic names in the program. Thus, modifying the data through one name implicitly modifies the values associated with all aliased n ...
between x[], y[],idx[], a C (programming language), C implementation is for (i = 0; i < N; ++i) x[i] = y dx[i;


Scatter

The sparse scatter, denoted y, _x \leftarrow x is the reverse operation. It copies the values of x into the corresponding locations in the sparsely populated vector y, i.e. y(idx(i))=x(i). for (i = 0; i < N; ++i) y dx[i = x[i">.html" ;"title="dx[i">dx[i = x[i


Support

x86-64">">dx[i<_a>_=_x[i.html" ;"title=".html" ;"title="dx[i">dx[i = x[i">.html" ;"title="dx[i">dx[i = x[i


Support

x86-64 CPUs which support the AVX2 instruction set can gather 32-bit and 64-bit elements with memory offsets from a base address. A second register determines whether the particular element is loaded, and faults occurring from invalid memory accesses by masked-out elements are suppressed. The AVX-512 instruction set also contains (potentially masked) scatter operations. The ARM instruction set's
Scalable Vector Extension AArch64 or ARM64 is the 64-bit extension of the ARM architecture family. It was first introduced with the Armv8-A architecture. Arm releases a new extension every year. ARMv8.x and ARMv9.x extensions and features Announced in October 2011, AR ...
includes gather and scatter operations on 8-, 16-, 32- and 64-bit elements. InfiniBand has hardware support for gather/scatter. Without instruction-level gather/scatter, efficient implementations may need to be tuned for optimal performance, for example with
prefetch Prefetching in computer science is a technique for speeding up fetch operations by beginning a fetch operation whose result is expected to be needed soon. Usually this is before it is ''known'' to be needed, so there is a risk of wasting time by p ...
ing; libraries such as OpenMPI may provide such primitives.


See also

* SIMD * Vectorization *
Compute kernel In computing, a compute kernel is a routine compiled for high throughput accelerators (such as graphics processing units (GPUs), digital signal processors (DSPs) or field-programmable gate arrays (FPGAs)), separate from but used by a main progr ...
* Memory access pattern


References

{{reflist Parallel computing SIMD computing