Gather/scatter is a type of memory addressing that at once collects (gathers) from, or stores (scatters) data to, multiple, arbitrary memory indices. Examples of its use include sparse linear algebra operations, sorting algorithms, fast Fourier transforms, and some computational graph theory problems. It is the vector equivalent of register indirect addressing, with gather involving indexed reads, and scatter, indexed writes.
Vector processors (and some SIMD units in CPUs) have hardware support for gather and scatter operations, as do many input/output systems, allowing large data sets to be transferred to main memory more rapidly.
The concept is somewhat similar to vectored I/O, which is sometimes also referred to as scatter-gather I/O. Vectored I/O differs in that it is used to map multiple sources of data from contiguous structures into a single stream for reading or writing. A common example is writing out a series of strings, which in most programming languages would be stored in separate memory locations.
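The string-writing case can be sketched with the POSIX writev call, which takes an array of (pointer, length) descriptors and writes them as one stream. This is only an illustration under stated assumptions: the helper name write_strings and the MAX_PARTS cap are hypothetical, and the code assumes a POSIX system.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

enum { MAX_PARTS = 16 };  /* hypothetical cap chosen for this sketch */

/* Write `count` separately stored strings to fd with one writev call.
   Returns the number of bytes written, or -1 on error. */
ssize_t write_strings(int fd, const char *const *parts, int count)
{
    struct iovec iov[MAX_PARTS];

    assert(count >= 0 && count <= MAX_PARTS);
    for (int i = 0; i < count; ++i) {
        /* iov_base is not const-qualified, but writev only reads from it. */
        iov[i].iov_base = (void *)parts[i];
        iov[i].iov_len  = strlen(parts[i]);
    }
    return writev(fd, iov, count);
}
```

The strings are never copied into a contiguous staging buffer; the kernel collects them from their separate locations.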
Definitions
Gather
A sparsely populated vector $y$ (with dimension $n$) holding $N$ non-empty elements can be represented by two densely populated vectors of length $N$: $x$, containing the non-empty elements of $y$, and $\mathit{idx}$, giving the index in $y$ where $x$'s element is located. The gather of $y$ into $x$, denoted $x \leftarrow y|_x$, assigns $x(i) = y(\mathit{idx}(i))$, with $\mathit{idx}$ having already been calculated. Assuming no pointer aliasing between x[], y[], idx[], a C implementation is
for (i = 0; i < N; ++i)
    x[i] = y[idx[i]];
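A fuller sketch of the same operation is given here, including one possible way of computing the index vector beforehand. The function names build_index and gather, the use of double elements, and the treatment of zero as "empty" are assumptions made for illustration. The C99 restrict qualifier expresses the no-aliasing assumption to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Record the positions of the non-zero entries of y (length n) in idx.
   Returns N, the number of non-zero entries; idx needs room for n entries. */
size_t build_index(const double *y, size_t n, size_t *idx)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i)
        if (y[i] != 0.0)
            idx[count++] = i;
    return count;
}

/* Gather: x[i] = y[idx[i]] for 0 <= i < N.
   restrict promises that x, y and idx do not overlap. */
void gather(double *restrict x, const double *restrict y, size_t n,
            const size_t *restrict idx, size_t N)
{
    for (size_t i = 0; i < N; ++i) {
        assert(idx[i] < n);  /* every index must fall inside y */
        x[i] = y[idx[i]];
    }
}
```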
Scatter
The sparse scatter, denoted $y|_x \leftarrow x$, is the reverse operation. It copies the values of $x$ into the corresponding locations in the sparsely populated vector $y$, i.e. $y(\mathit{idx}(i)) = x(i)$.
for (i = 0; i < N; ++i)
    y[idx[i]] = x[i];
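Gather and scatter are often used as a pair: the stored entries are gathered into a dense buffer, processed there, and scattered back. The sketch below illustrates that pattern under assumptions of its own: scale_sparse is a hypothetical helper, the elements are doubles, and the entries of idx are assumed to be distinct (with duplicates, the last write to a location would win).

```c
#include <assert.h>
#include <stddef.h>

/* Scatter: y[idx[i]] = x[i] for 0 <= i < N.
   Entries of y that idx does not name are left untouched. */
void scatter(double *restrict y, size_t n, const double *restrict x,
             const size_t *restrict idx, size_t N)
{
    for (size_t i = 0; i < N; ++i) {
        assert(idx[i] < n);  /* every index must fall inside y */
        y[idx[i]] = x[i];
    }
}

/* Hypothetical helper: multiply only the stored entries of y by alpha.
   work is a caller-provided dense buffer with room for N values. */
void scale_sparse(double *y, size_t n, const size_t *idx, size_t N,
                  double alpha, double *work)
{
    for (size_t i = 0; i < N; ++i)   /* gather into the dense buffer */
        work[i] = y[idx[i]];
    for (size_t i = 0; i < N; ++i)   /* dense computation */
        work[i] *= alpha;
    scatter(y, n, work, idx, N);     /* scatter the results back */
}
```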
Support
Scatter/gather units were also a part of most vector computers, notably the Cray X-MP and its follow-ons. In this case, the purpose was to make efficient use of the limited resource of the vector registers. For instance, the Cray-1 had eight 64-word vector registers, so data that had no effect on the outcome, like zeros in an addition, occupied valuable space that could have been put to better use. By gathering non-zero values into the registers, and scattering the results back out, the registers could be used much more efficiently, leading to higher performance. However, the Cray-1 vector memory reference instructions could only access memory with a "constant stride", which allowed fast access to contiguous data (stride 1) or to data separated by some other constant increment. With the introduction of gather and scatter instructions in the X-MP, this restriction was eliminated. This basic layout was widely copied in later supercomputer designs, especially on the variety of models from Japan.
As microprocessor design improved during the 1990s, commodity CPUs began to add vector processing units. At first these tended to be simple, sometimes overlaying the CPU's general purpose registers, but over time these evolved into increasingly powerful systems that met and then surpassed the units in high-end supercomputers. By this time, scatter/gather instructions had been added to many of these designs.
x86-64 CPUs that support the AVX2 instruction set can gather 32-bit and 64-bit elements with memory offsets from a base address. A second (mask) register determines whether each particular element is loaded, and faults occurring from invalid memory accesses by masked-out elements are suppressed.
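As a rough illustration, a masked AVX2 gather of eight 32-bit elements can be written with the _mm256_mask_i32gather_epi32 intrinsic. The sketch assumes GCC or Clang on x86-64 (it relies on the target attribute and __builtin_cpu_supports) and falls back to a scalar loop elsewhere; the function names are hypothetical. A lane is loaded only when the top bit of its mask element is set.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>

/* Built for AVX2 regardless of the global compiler flags. */
__attribute__((target("avx2")))
static void masked_gather8_avx2(int32_t *dst, const int32_t *base,
                                const int32_t *idx, const int32_t *mask)
{
    __m256i vsrc  = _mm256_loadu_si256((const __m256i *)dst);
    __m256i vidx  = _mm256_loadu_si256((const __m256i *)idx);
    __m256i vmask = _mm256_loadu_si256((const __m256i *)mask);
    /* Scale 4: indices count 32-bit elements, offsets are in bytes.
       Lanes whose mask top bit is clear keep the value from vsrc. */
    __m256i v = _mm256_mask_i32gather_epi32(vsrc, (const int *)base,
                                            vidx, vmask, 4);
    _mm256_storeu_si256((__m256i *)dst, v);
}
#endif

/* dst[i] = base[idx[i]] where the top bit of mask[i] is set;
   all other dst[i] are left unchanged. Operates on exactly 8 lanes. */
void masked_gather8(int32_t *dst, const int32_t *base,
                    const int32_t *idx, const int32_t *mask)
{
    assert(dst && base && idx && mask);
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx2")) {
        masked_gather8_avx2(dst, base, idx, mask);
        return;
    }
#endif
    for (int i = 0; i < 8; ++i)   /* scalar fallback */
        if (mask[i] < 0)          /* negative means top bit set */
            dst[i] = base[idx[i]];
}
```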
The AVX-512 instruction set also contains (potentially masked) scatter operations.
The ARM instruction set's Scalable Vector Extension includes gather and scatter operations on 8-, 16-, 32- and 64-bit elements.
InfiniBand has hardware support for gather/scatter.
Without instruction-level gather/scatter, efficient implementations may need to be tuned for optimal performance, for example with prefetching; libraries such as OpenMPI may provide such primitives.
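One way such tuning might look is a scalar gather loop that issues a software prefetch a fixed number of iterations ahead. This is a sketch rather than a tuned implementation: the name gather_prefetch and the distance of 16 are assumptions, the best distance depends on the machine, and __builtin_prefetch is a GCC/Clang builtin (the loop degrades to a plain gather on other compilers).

```c
#include <assert.h>
#include <stddef.h>

enum { PREFETCH_DISTANCE = 16 };  /* assumed value; tune per machine */

/* Gather x[i] = y[idx[i]] while hinting at the element that will be
   needed PREFETCH_DISTANCE iterations from now. */
void gather_prefetch(double *restrict x, const double *restrict y, size_t n,
                     const size_t *restrict idx, size_t N)
{
    for (size_t i = 0; i < N; ++i) {
#if defined(__GNUC__)
        if (i + PREFETCH_DISTANCE < N)
            /* 0 = prefetch for reading, 1 = low temporal locality */
            __builtin_prefetch(&y[idx[i + PREFETCH_DISTANCE]], 0, 1);
#endif
        assert(idx[i] < n);  /* every index must fall inside y */
        x[i] = y[idx[i]];
    }
}
```

The prefetch is only a hint, so the result is identical to the plain loop; any benefit shows up solely in timing.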
See also
* SIMD
* Vectorization
* Compute kernel
* Memory access pattern