In computing, a memory access pattern or IO access pattern is the pattern with which a system or program reads and writes memory on secondary storage. These patterns differ in the level of
locality of reference
and drastically affect
cache performance,
and also have implications for the approach to
parallelism and distribution of workload in
shared memory systems. Further,
cache coherency issues can affect
multiprocessor performance, which means that certain memory access patterns place a ceiling on parallelism (which
manycore approaches seek to break).
Computer memory
is usually described as "
random access", but traversals by software will still exhibit patterns that can be exploited for efficiency. Various tools exist to help system designers and programmers understand, analyse and improve the memory access pattern, including
VTune and
Vectorization Advisor, including tools to address
GPU memory access patterns.
Memory access patterns also have implications for
security, which motivates some to try to disguise a program's activity for privacy reasons.
Examples
Sequential
The simplest extreme is the
sequential access pattern, where data is read, processed, and written out with straightforward incremented/decremented addressing. These access patterns are highly amenable to
prefetching.
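As a minimal illustrative sketch (not drawn from any particular codebase), a sequential pass touches consecutive, monotonically increasing addresses, which a hardware prefetcher can predict easily:
<syntaxhighlight lang="cpp">
#include <cstddef>

// Sequential access: read, process, and write elements at consecutive
// addresses. The next cache line needed is trivially predictable.
void scale(const float* in, float* out, std::size_t n, float k) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i] * k;  // address advances by sizeof(float) each step
    }
}
</syntaxhighlight>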
Strided
Strided or simple 2D, 3D access patterns (e.g., stepping through
multi-dimensional arrays) are similarly easy to predict, and are found in implementations of
linear algebra
algorithms and
image processing.
Loop tiling is an effective approach. Some systems with
DMA provided a strided mode for transferring data between subtiles of larger
2D arrays and
scratchpad memory.
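A hedged sketch of both ideas, using matrix transpose as the example (the tile size below is an assumption; the best value is hardware-dependent): the naive version produces strided writes, while the tiled version works on small blocks so their cache lines are reused before eviction.
<syntaxhighlight lang="cpp">
#include <cstddef>

// Naive transpose of a row-major n x n matrix: reads are sequential,
// but writes stride by n floats, so for large n most writes miss.
void transpose(const float* in, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            out[j * n + i] = in[i * n + j];   // strided write pattern
}

// Loop tiling: processing T x T blocks keeps both the source and the
// destination cache lines resident while they are being reused.
void transpose_tiled(const float* in, float* out, std::size_t n) {
    const std::size_t T = 32;   // illustrative tile size
    for (std::size_t ii = 0; ii < n; ii += T)
        for (std::size_t jj = 0; jj < n; jj += T)
            for (std::size_t i = ii; i < ii + T && i < n; ++i)
                for (std::size_t j = jj; j < jj + T && j < n; ++j)
                    out[j * n + i] = in[i * n + j];
}
</syntaxhighlight>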
Linear
A linear access pattern is closely related to "strided", where a
memory address may be computed from a linear combination of some index. Stepping through indices sequentially with a linear pattern yields
strided access. A linear access pattern for writes (with any access pattern for non-overlapping reads) may guarantee that an algorithm can be parallelised, which is exploited in systems supporting
compute kernels.
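A minimal sketch of the idea (the accessor name and parameters here are illustrative): the element address is a linear combination of the indices, and fixing all but one index while stepping the other yields a strided traversal.
<syntaxhighlight lang="cpp">
#include <cstddef>

// Linear access pattern: the address of element (i, j) is a linear
// combination of the indices. A row-major n x m matrix view uses
// stride_i = m and stride_j = 1; a column-major view swaps them.
inline float& at(float* base, std::ptrdiff_t i, std::ptrdiff_t j,
                 std::ptrdiff_t stride_i, std::ptrdiff_t stride_j) {
    return base[i * stride_i + j * stride_j];
}
</syntaxhighlight>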
Nearest neighbor
Nearest neighbor memory access patterns appear in simulation, and are related to sequential or strided patterns. An algorithm may traverse a data structure using information from the nearest neighbors of a data element (in one or more dimensions) to perform a calculation. These are common in physics simulations operating on grids.
Nearest neighbor can also refer to inter-node communication in a cluster; physics simulations which rely on such local access patterns can be parallelized with the data partitioned into cluster nodes, with purely nearest-neighbor communication between them, which may have advantages for latency and communication bandwidth. This use case maps well onto
a torus network topology.
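A minimal sketch of such a pattern (a one-dimensional three-point stencil, e.g. one step of explicit heat diffusion; the names and coefficient are illustrative): each output element depends only on a cell and its nearest neighbours, so the traversal stays sequential and partitions cleanly across nodes with only boundary (halo) exchange.
<syntaxhighlight lang="cpp">
#include <cstddef>
#include <vector>

// One stencil step: out must be the same size as u. Each element is
// computed from itself and its two nearest neighbours, so memory is
// read in an almost purely sequential order.
void diffuse(const std::vector<double>& u, std::vector<double>& out,
             double k) {
    for (std::size_t i = 1; i + 1 < u.size(); ++i)
        out[i] = u[i] + k * (u[i - 1] - 2.0 * u[i] + u[i + 1]);
}
</syntaxhighlight>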
2D spatially coherent
In
3D rendering, access patterns for
texture mapping and
rasterization of small primitives (with arbitrary distortions of complex surfaces) are far from linear, but can still exhibit spatial locality (e.g., in
screen space or
texture space). This can be turned into good ''memory'' locality via some combination of
Morton order and
tiling for
texture maps and
frame buffer data (mapping spatial regions onto cache lines), or by sorting primitives via
tile-based deferred rendering. It can also be advantageous to store matrices in Morton order in
linear algebra libraries.
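A sketch of the standard bit-interleaving trick behind Morton (Z-order) indexing: interleaving the bits of x and y maps 2D-adjacent texels to nearby addresses, so a cache line holds a small square region rather than a thin horizontal strip.
<syntaxhighlight lang="cpp">
#include <cstdint>

// Morton index for 16-bit x, y coordinates.
uint32_t morton2d(uint16_t x, uint16_t y) {
    auto part = [](uint32_t v) {          // spread bits: abcd -> 0a0b0c0d
        v = (v | (v << 8)) & 0x00FF00FFu;
        v = (v | (v << 4)) & 0x0F0F0F0Fu;
        v = (v | (v << 2)) & 0x33333333u;
        v = (v | (v << 1)) & 0x55555555u;
        return v;
    };
    return part(x) | (part(y) << 1);      // x bits in even, y bits in odd positions
}
</syntaxhighlight>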
Scatter
A
scatter memory access pattern combines sequential reads with indexed/random addressing for writes.
Compared to gather, it may place less load on a cache hierarchy since a
processing element may dispatch writes in a "fire and forget" manner (bypassing a cache altogether), whilst using predictable prefetching (or even DMA) for its source data.
However, it may be harder to parallelise since there is no guarantee the writes do not interact,
and many systems are still designed assuming that a hardware cache will coalesce many small writes into larger ones.
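In its simplest form (a minimal sketch; the function and parameter names are illustrative), a scatter reads its source sequentially and writes through an index array:
<syntaxhighlight lang="cpp">
#include <cstddef>

// Scatter: sequential (prefetch-friendly, or even DMA-fed) reads,
// indexed/random writes. Parallelising this safely requires knowing
// that idx never directs two writes to the same destination.
void scatter(const float* src, const std::size_t* idx,
             float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[idx[i]] = src[i];   // sequential read, indexed write
}
</syntaxhighlight>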
In the past,
forward texture mapping attempted to handle the randomness with "writes", whilst sequentially reading source texture information.
The
PlayStation 2
console used conventional inverse texture mapping, but handled any scatter/gather processing "on-chip" using EDRAM, whilst 3D models (and a lot of texture data) were fed sequentially from main memory by DMA. This is why it lacked support for indexed primitives, and sometimes needed to manage textures "up front" in the
display list.
Gather
In a
gather memory access pattern, reads are randomly addressed or indexed, whilst the writes are sequential (or linear).
An example is found in
inverse texture mapping, where data can be written out linearly across
scan lines, whilst random access texture addresses are calculated per
pixel.
Compared to scatter, the disadvantage is that caching (and bypassing latencies) is now essential for efficient reads of small elements; however, it is easier to parallelise since the writes are guaranteed not to overlap. As such, the gather approach is more common for GPGPU programming,
where the massive threading (enabled by parallelism) is used to hide read latencies.
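The mirror image of the scatter sketch above (again with illustrative names): reads go through an index array while writes land sequentially, so each output element is written exactly once and iterations are trivially independent.
<syntaxhighlight lang="cpp">
#include <cstddef>

// Gather: indexed/random reads, sequential writes. Since each i writes
// dst[i] exactly once, the loop parallelises without synchronisation.
void gather(const float* src, const std::size_t* idx,
            float* dst, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[idx[i]];   // indexed read, sequential write
}
</syntaxhighlight>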
Combined gather and scatter
An algorithm may gather data from one source, perform some computation in local or on chip memory, and scatter results elsewhere. This is essentially the full operation of a
GPU pipeline when performing
3D rendering: gathering indexed vertices and textures, and scattering shaded pixels in
screen space. Rasterization of opaque primitives using a depth buffer is "commutative", allowing reordering, which facilitates parallel execution. In the general case synchronisation primitives would be needed.
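A loose sketch of the gather-compute-scatter shape (the names are illustrative, not any real graphics API): indexed data is gathered, transformed in local storage, and the results scattered to computed destinations.
<syntaxhighlight lang="cpp">
#include <cstddef>

// Gather data via one index array, compute locally, then scatter the
// results via another. Safe reordering assumes targets do not collide
// (or, as in depth-buffered rasterization, that collisions commute).
void process(const float* vertices, const std::size_t* indices,
             const std::size_t* targets, float* framebuffer,
             std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float v = vertices[indices[i]];    // gather stage
        float shaded = v * 0.5f + 1.0f;    // compute in a local/register
        framebuffer[targets[i]] = shaded;  // scatter stage
    }
}
</syntaxhighlight>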
Random
At the opposite extreme is a truly random memory access pattern. A few multiprocessor systems are specialised to deal with these. The
PGAS approach may help by sorting operations by data on the fly (useful when the problem ''is'' figuring out the locality of unsorted data).
Data structures which rely heavily on
pointer chasing can often produce poor
locality of reference
, although sorting can sometimes help. Given a truly random memory access pattern, it may be possible to break it down (including scatter or gather stages, or other intermediate sorting) which may improve the locality overall; this is often a prerequisite for
parallelising.
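Pointer chasing is the canonical example of such a pattern (a minimal sketch): each load depends on the result of the previous one, the address sequence is effectively random, and neither caches nor prefetchers can hide the latency.
<syntaxhighlight lang="cpp">
// Linked-list traversal: the next address is unknown until the current
// load completes, serialising the memory accesses. Contrast with the
// sequential array traversal shown earlier.
struct Node {
    int value;
    Node* next;
};

int sum_list(const Node* head) {
    int s = 0;
    for (const Node* p = head; p != nullptr; p = p->next)
        s += p->value;   // dependent, effectively random load
    return s;
}
</syntaxhighlight>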
Approaches
Data-oriented design
Data-oriented design is an approach intended to maximise the locality of reference, by organising data according to how it is traversed in the various stages of a program (i.e., so that the data layout explicitly mirrors the access pattern), contrasting with the more common object-oriented approach.
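A common concrete form of this is preferring structure-of-arrays over array-of-structures layouts (a minimal sketch; the field names are illustrative):
<syntaxhighlight lang="cpp">
#include <cstddef>

// Array-of-structures (typical object-oriented layout): a pass that
// updates only positions still drags every whole Particle through the cache.
struct ParticleAoS { float x, y, z, vx, vy, vz, mass; };

// Structure-of-arrays (data-oriented layout): fields traversed together
// are stored contiguously, so the update touches only what it needs.
struct ParticlesSoA {
    float* x; float* y; float* z;
    float* vx; float* vy; float* vz;
};

void integrate(ParticlesSoA& p, std::size_t n, float dt) {
    for (std::size_t i = 0; i < n; ++i) {
        p.x[i] += p.vx[i] * dt;   // purely sequential streams
        p.y[i] += p.vy[i] * dt;
        p.z[i] += p.vz[i] * dt;
    }
}
</syntaxhighlight>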
Contrast with locality of reference
Locality of reference
refers to a property exhibited by memory access patterns. A programmer will change the memory access pattern (by reworking algorithms) to improve the locality of reference, and/or to increase potential for parallelism.
A programmer or system designer may create frameworks or abstractions (e.g.,
C++ templates or
higher-order functions) that
encapsulate a specific memory access pattern.
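A minimal sketch of the idea (illustrative only; this is not the library the text alludes to): the traversal order lives in one template, and callers supply only the per-element operation.
<syntaxhighlight lang="cpp">
#include <cstddef>

// Encapsulating a tiled 2D access pattern: every caller of this
// function inherits its cache-friendly visiting order for free.
template <typename F>
void for_each_tiled(std::size_t w, std::size_t h, std::size_t tile, F body) {
    for (std::size_t ty = 0; ty < h; ty += tile)
        for (std::size_t tx = 0; tx < w; tx += tile)
            for (std::size_t y = ty; y < ty + tile && y < h; ++y)
                for (std::size_t x = tx; x < tx + tile && x < w; ++x)
                    body(x, y);   // visited tile by tile
}

// Example use: brighten an image held in a row-major buffer.
// for_each_tiled(w, h, 32, [&](std::size_t x, std::size_t y) {
//     img[y * w + x] *= 2.0f;
// });
</syntaxhighlight>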
Different considerations for memory access patterns appear in parallelism beyond locality of reference, namely the separation of reads and writes. For example, even if the reads and writes are "perfectly" local, it can be impossible to parallelise due to dependencies; separating the reads and writes into separate areas yields a different memory access pattern, which may initially appear worse in pure locality terms, but is desirable for leveraging modern parallel hardware.
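A minimal sketch of such a separation (double buffering, with illustrative names): writing into a distinct output buffer removes the loop-carried dependency an in-place update would have, so iterations become independent, at the cost of touching twice the memory.
<syntaxhighlight lang="cpp">
#include <cstddef>
#include <vector>

// Each output element reads only from `in`, so no iteration depends on
// another's write; an in-place version of this smoothing would.
void smooth_step(const std::vector<float>& in, std::vector<float>& out) {
    for (std::size_t i = 1; i + 1 < in.size(); ++i)
        out[i] = (in[i - 1] + in[i] + in[i + 1]) / 3.0f;
}
</syntaxhighlight>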
Locality of reference may also refer to individual variables (e.g., the ability of a
compiler
to cache them in
registers), whilst the term memory access pattern only refers to data held in an indexable memory (especially
main memory).
See also
* Gather-scatter (vector addressing)
* Locality of reference
* Parallel computing
* Working set
References
{{Reflist|32em}}
Computer performance
Software optimization
Computer data storage