HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, e ...
, a memory access pattern or IO access pattern is the pattern with which a system or program reads and writes
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
on
secondary storage Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a computer ...
. These patterns differ in the level of
locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
and drastically affect
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache Count ...
performance, and also have implications for the approach to parallelism and distribution of workload in shared memory systems. Further,
cache coherency In computer architecture, cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, whi ...
issues can affect
multiprocessor Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them. There ar ...
performance, which means that certain memory access patterns place a ceiling on parallelism (which
manycore Manycore processors are special kinds of multi-core processors designed for a high degree of parallel processing, containing numerous simpler, independent processor cores (from a few tens of cores to thousands or more). Manycore processors are use ...
approaches seek to break).
Computer memory In computing, memory is a device or system that is used to store information for immediate use in a computer or related computer hardware and digital electronic devices. The term ''memory'' is often synonymous with the term ''primary storage ...
is usually described as "
random access Random access (more precisely and more generally called direct access) is the ability to access an arbitrary element of a sequence in equal time or any datum from a population of addressable elements roughly as easily and efficiently as any othe ...
", but traversals by software will still exhibit patterns that can be exploited for efficiency. Various tools exist to help system designers and programmers understand, analyse and improve the memory access pattern, including
VTune VTune Profiler (formerly VTune Amplifier) is a performance analysis tool for x86 based machines running Linux or Microsoft Windows operating systems. Many features work on both Intel and AMD hardware, but advanced hardware-based sampling requires ...
and Vectorization Advisor, including tools to address
GPU A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobil ...
memory access patterns Memory access patterns also have implications for
security Security is protection from, or resilience against, potential harm (or other unwanted coercive change) caused by others, by restraining the freedom of others to act. Beneficiaries (technically referents) of security may be of persons and social ...
, which motivates some to try and disguise a program's activity for
privacy Privacy (, ) is the ability of an individual or group to seclude themselves or information about themselves, and thereby express themselves selectively. The domain of privacy partially overlaps with security, which can include the concepts of a ...
reasons.


Examples


Sequential

The simplest extreme is the
sequential access Sequential access is a term describing a group of elements (such as data in a memory array or a disk file or on magnetic tape data storage) being accessed in a predetermined, ordered sequence. It is the opposite of random access, the ability to ac ...
pattern, where data is read, processed, and written out with straightforward incremented/decremented addressing. These access patterns are highly amenable to
prefetching Prefetching in computer science is a technique for speeding up fetch operations by beginning a fetch operation whose result is expected to be needed soon. Usually this is before it is ''known'' to be needed, so there is a risk of wasting time by p ...
.


Strided

Strided or simple 2D, 3D access patterns (e.g., stepping through
multi-dimensional array In computer science, array is a data type that represents a collection of ''elements'' (values or variables), each selected by one or more indices (identifying keys) that can be computed at run time during program execution. Such a collection i ...
s) are similarly easy to predict, and are found in implementations of
linear algebra Linear algebra is the branch of mathematics concerning linear equations such as: :a_1x_1+\cdots +a_nx_n=b, linear maps such as: :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matrices. ...
algorithms and
image processing An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensiona ...
.
Loop tiling In computer science and particularly in compiler design, loop nest optimization (LNO) is an optimization technique that applies a set of loop transformations for the purpose of locality optimization or parallelization or another loop overhead redu ...
is an effective approach. Some systems with DMA provided a strided mode for transferring data between subtile of larger
2D array D, or d, is the fourth letter in the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is ''dee'' (pronounced ), plural ''dees''. History The ...
s and
scratchpad memory Scratchpad memory (SPM), also known as scratchpad, scratchpad RAM or local store in computer terminology, is a high-speed internal memory used for temporary storage of calculations, data, and other work in progress. In reference to a microprocesso ...
.


Linear

A linear access pattern is closely related to "strided", where a
memory address In computing, a memory address is a reference to a specific memory location used at various levels by software and hardware. Memory addresses are fixed-length sequences of digits conventionally displayed and manipulated as unsigned integers. Su ...
may be computed from a linear combination of some index. Stepping through indices sequentially with a linear pattern yields strided access. A linear access pattern for writes (with any access pattern for non-overlapping reads) may guarantee that an algorithm can be parallelised, which is exploited in systems supporting
compute kernel In computing, a compute kernel is a routine compiled for high throughput accelerators (such as graphics processing units (GPUs), digital signal processors (DSPs) or field-programmable gate arrays (FPGAs)), separate from but used by a main progr ...
s.


Nearest neighbor

Nearest neighbor memory access patterns appear in simulation, and are related to sequential or strided patterns. An algorithm may traverse a data structure using information from the nearest neighbors of a data element (in one or more dimensions) to perform a calculation. These are common in physics simulations operating on grids. Nearest neighbor can also refer to inter-node communication in a cluster; physics simulations which rely on such local access patterns can be parallelized with the data partitioned into cluster nodes, with purely nearest-neighbor communication between them, which may have advantages for latency and communication bandwidth. This use case maps well onto torus network topology.


2D spatially coherent

In
3D rendering 3D rendering is the 3D computer graphics process of converting 3D modeling, 3D models into 2D computer graphics, 2D images on a computer. 3D renders may include photorealistic rendering, photorealistic effects or non-photorealistic rendering, no ...
, access patterns for
texture mapping Texture mapping is a method for mapping a texture on a computer-generated graphic. Texture here can be high frequency detail, surface texture, or color. History The original technique was pioneered by Edwin Catmull in 1974. Texture mapping ...
and
rasterization In computer graphics, rasterisation (British English) or rasterization (American English) is the task of taking an image described in a vector graphics format (shapes) and converting it into a raster image (a series of pixels, dots or lines, whic ...
of small primitives (with arbitrary distortions of complex surfaces) are far from linear, but can still exhibit spatial locality (e.g., in
screen space This is a glossary of terms relating to computer graphics. For more general computer hardware terms, see glossary of computer hardware terms. 0–9 A B ...
or
texture space Texture mapping is a method for mapping a texture on a computer-generated graphic. Texture here can be high frequency detail, surface texture, or color. History The original technique was pioneered by Edwin Catmull in 1974. Texture mapping ...
). This can be turned into good ''memory'' locality via some combination of
morton order In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space-filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It i ...
and
tiling Tiling may refer to: *The physical act of laying tiles *Tessellations Computing *The compiler optimization of loop tiling *Tiled rendering, the process of subdividing an image by regular grid *Tiling window manager People *Heinrich Sylvester The ...
for
texture map Texture mapping is a method for mapping a texture on a computer-generated graphic. Texture here can be high frequency detail, surface texture, or color. History The original technique was pioneered by Edwin Catmull in 1974. Texture mapping ...
s and
frame buffer A framebuffer (frame buffer, or sometimes framestore) is a portion of random-access memory (RAM) containing a bitmap that drives a video display. It is a memory buffer containing data representing all the pixels in a complete video frame. Modern ...
data (mapping spatial regions onto cache lines), or by sorting primitives via
tile based deferred rendering Tiled rendering is the process of subdividing a computer graphics image by a regular grid in optical space and rendering each section of the grid, or ''tile'', separately. The advantage to this design is that the amount of memory and bandwidth is re ...
. It can also be advantageous to store matrices in morton order in linear algebra libraries.


Scatter

A scatter memory access pattern combines sequential reads with indexed/random addressing for writes. Compared to gather, It may place less load on a cache hierarchy since a
processing element This glossary of computer hardware terms is a list of definitions of terms and concepts related to computer hardware, i.e. the physical and structural components of computers, architectural issues, and peripheral devices. A ...
may dispatch writes in a "fire and forget" manner (bypassing a cache altogether), whilst using predictable prefetching (or even DMA) for its source data. However, it may be harder to parallelise since there is no guarantee the writes do not interact, and many systems are still designed assuming that a hardware cache will coalesce many small writes into larger ones. In the past, forward texture mapping attempted to handle the randomness with "writes", whilst sequentially reading source texture information. The
PlayStation 2 The PlayStation 2 (PS2) is a home video game console developed and marketed by Sony Computer Entertainment. It was first released in Japan on 4 March 2000, in North America on 26 October 2000, in Europe on 24 November 2000, and in Australia on 3 ...
console used conventional inverse texture mapping, but handled any scatter/gather processing "on-chip" using EDRAM, whilst 3D model (and a lot of texture data) from main memory was fed sequentially by DMA. This is why it lacked support for indexed primitives, and sometimes needed to manage textures "up front" in the
display list A display list (or ''display file'') is a series of graphics commands that define an output image. The image is created ( ''rendered'') by executing the commands to combine various primitives. This activity is most often performed by specialized di ...
.


Gather

In a
gather Gather, gatherer, or gathering may refer to: Anthropology and sociology * Hunter-gatherer, a person or a society whose subsistence depends on hunting and gathering of wild foods *Intensive gathering, the practice of cultivating wild plants as a s ...
memory access pattern, reads are randomly addressed or indexed, whilst the writes are sequential (or linear). An example is found in
inverse texture mapping Texture mapping is a method for mapping a texture on a computer-generated graphic. Texture here can be high frequency detail, surface texture, or color. History The original technique was pioneered by Edwin Catmull in 1974. Texture mappi ...
, where data can be written out linearly across
scan line A scan line (also scanline) is one line, or row, in a raster scanning pattern, such as a line of video on a cathode ray tube (CRT) display of a television set or computer monitor. On CRT screens the horizontal scan lines are visually discernible ...
s, whilst random access texture addresses are calculated per
pixel In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a raster image, or the smallest point in an all points addressable display device. In most digital display devices, pixels are the smal ...
. Compared to scatter, the disadvantage is that caching (and bypassing latencies) is now essential for efficient reads of small elements, however it is easier to parallelise since the writes are guaranteed to not overlap. As such the gather approach is more common for
gpgpu General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditiona ...
programming, where the massive threading (enabled by parallelism) is used to hide read latencies.deals with "scatter memory access patterns" and "gather memory access patterns" in the text


Combined gather and scatter

An algorithm may gather data from one source, perform some computation in local or on chip memory, and scatter results elsewhere. This is essentially the full operation of a
GPU A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobil ...
pipeline when performing
3D rendering 3D rendering is the 3D computer graphics process of converting 3D modeling, 3D models into 2D computer graphics, 2D images on a computer. 3D renders may include photorealistic rendering, photorealistic effects or non-photorealistic rendering, no ...
- gathering indexed vertices and textures, and scattering shaded pixels in
screen space This is a glossary of terms relating to computer graphics. For more general computer hardware terms, see glossary of computer hardware terms. 0–9 A B ...
. Rasterization of opaque primitives using a depth buffer is "commutative", allowing reordering, which facilitates parallel execution. In the general case synchronisation primitives would be needed.


Random

At the opposite extreme is a truly random memory access pattern. A few multiprocessor systems are specialised to deal with these. The PGAS approach may help by sorting operations by data on the fly (useful when the problem *is* figuring out the locality of unsorted data).covers cases where PGAS is a win, where data may not be already sorted, e.g., dealing with complex graphs - see "science across the irregularity spectrum". Data structures which rely heavily on
pointer chasing Pointer may refer to: Places * Pointer, Kentucky * Pointers, New Jersey * Pointers Airport, Wasco County, Oregon, United States * The Pointers, a pair of rocks off Antarctica People with the name * Pointer (surname), a surname (including a ...
can often produce poor
locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
, although sorting can sometimes help. Given a truly random memory access pattern, it may be possible to break it down (including scatter or gather stages, or other intermediate sorting) which may improve the locality overall; this is often a prerequisite for parallelizing.


Approaches


Data-oriented design

Data-oriented design is an approach intended to maximise the locality of reference, by organising data according to how it is traversed in various stages of a program, contrasting with the more common
object oriented Object-oriented programming (OOP) is a programming paradigm based on the concept of "objects", which can contain data and code. The data is in the form of fields (often known as attributes or ''properties''), and the code is in the form of pro ...
approach (i.e., organising such that data layout explicitly mirrors the access pattern).


Contrast with locality of reference

Locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
refers to a property exhibited by memory access patterns. A programmer will change the memory access pattern (by reworking algorithms) to improve the locality of reference, and/or to increase potential for parallelism. A programmer or system designer may create frameworks or abstractions (e.g.,
C++ templates C, or c, is the third letter in the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is ''cee'' (pronounced ), plural ''cees''. History "C" ...
or
higher-order function In mathematics and computer science, a higher-order function (HOF) is a function that does at least one of the following: * takes one or more functions as arguments (i.e. a procedural parameter, which is a parameter of a procedure that is itself ...
s) that encapsulate a specific memory access pattern.a C++ template library for producing optimised memory access patterns Different considerations for memory access patterns appear in parallelism beyond locality of reference, namely the separation of reads and writes. E.g.: even if the reads and writes are "perfectly" local, it can be impossible to parallelise due to dependencies; separating the reads and writes into separate areas yields a different memory access pattern, maybe initially appear worse in pure locality terms, but desirable to leverage modern parallel hardware. Locality of reference may also refer to individual variables (e.g., the ability of a
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
to cache them in registers), whilst the term memory access pattern only refers to data held in an indexable memory (especially
main memory Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a computer ...
).


See also

*
Gather-scatter (vector addressing) Gather/scatter is a type of Computer memory, memory addressing that at once collects (gathers) from, or stores (scatters) data to, multiple, arbitrary indices. Examples of its use include Sparse matrix, sparse linear algebra operations, sorting a ...
*
Locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
*
Parallel computing Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different fo ...
*
Working set Working set is a concept in computer science which defines the amount of memory that a process requires in a given time interval. Definition Peter Denning (1968) defines "the working set of information W(t, \tau) of a process at time t to be the ...


References

{{Reflist, 32em Computer performance Software optimization Computer data storage