
In computer architecture, the memory hierarchy separates computer storage into a hierarchy based on response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and controlling technologies.
Memory hierarchy affects performance in computer architectural design, algorithm predictions, and lower level programming constructs involving locality of reference.
Designing for high performance requires considering the restrictions of the memory hierarchy, i.e. the size and capabilities of each component. Each of the various components can be viewed as part of a hierarchy of memories in which each member is typically smaller and faster than the next highest member of the hierarchy. To limit waiting by higher levels, a lower level responds by filling a buffer and then signaling that the transfer can be activated.
There are four major storage levels.
* ''Internal'': processor registers and cache.
* Main: the system RAM and controller cards.
* On-line mass storage: secondary storage.
* Off-line bulk storage: tertiary and off-line storage.
This is a general memory hierarchy structuring. Many other structures are useful. For example, a paging algorithm may be considered as a level for virtual memory when designing a computer architecture, and one can include a level of nearline storage between online and offline storage.
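The idea of a paging algorithm acting as a level of the hierarchy can be illustrated with a small simulation. The following is a minimal sketch, not a description of any particular operating system: it assumes least-recently-used (LRU) replacement, and the function name `count_page_faults` and the reference string are hypothetical choices made for illustration.

```python
from collections import OrderedDict


def count_page_faults(references, num_frames):
    """Count page faults for a page reference string under LRU replacement.

    LRU is an assumed policy here; real systems often use approximations.
    """
    frames = OrderedDict()  # resident pages, least recently used first
    faults = 0
    for page in references:
        if page in frames:
            frames.move_to_end(page)  # hit: mark as most recently used
        else:
            faults += 1  # miss: the page must come from the lower level
            if len(frames) >= num_frames:
                frames.popitem(last=False)  # evict the least recently used page
            frames[page] = None
    return faults


# Hypothetical reference string: pages 1 and 2 are reused often.
refs = [1, 2, 3, 1, 2, 4, 1, 2, 5, 1, 2, 3]
for n in (2, 3, 4):
    print(n, "frames:", count_page_faults(refs, n), "faults")
```

With too few frames for the reused pages, every reference in this string faults; once the frequently used pages fit, only the first touch of each new page does.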
Properties of the technologies in the memory hierarchy
* Adding complexity slows the ''memory hierarchy''.
* CMOx memory technology stretches the flash space in the memory hierarchy.
* One of the main ways to increase system performance is minimising how far down the memory hierarchy one has to go to manipulate data.
* Latency and bandwidth are two metrics associated with caches. Neither is uniform; each is specific to a particular component of the memory hierarchy.
* Predicting where in the memory hierarchy the data resides is difficult.
* The location in the memory hierarchy dictates the time required for the prefetch to occur.
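The point about minimising how far down the hierarchy an access has to go can be made concrete with a simple expected-latency calculation. This is a sketch under stated assumptions: each level is described by a hit fraction and a latency, an access pays the latency of every level it reaches, and the numbers used below are hypothetical rather than measurements of any real machine.

```python
def average_access_time(levels):
    """Expected access latency for a list of (hit_fraction, latency_ns) pairs.

    Levels are ordered fastest first. hit_fraction is the share of accesses
    reaching a level that the level satisfies; the last level should use 1.0.
    """
    expected = 0.0
    reach = 1.0  # probability that an access gets this far down
    for hit_fraction, latency_ns in levels:
        expected += reach * latency_ns  # every access reaching the level pays for it
        reach *= 1.0 - hit_fraction     # the remainder continues downwards
    return expected


# Hypothetical two-level system: a 1 ns cache in front of a 100 ns memory.
for hit in (0.90, 0.95, 0.99):
    t = average_access_time([(hit, 1.0), (1.0, 100.0)])
    print(f"cache hit fraction {hit:.2f}: about {t:.2f} ns per access")
```

Even a few percent of accesses falling through to the slower level dominates the average, which is why keeping data in the upper levels matters so much.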
Examples

The number of levels in the memory hierarchy and the performance at each level have increased over time. The types of memory or storage components have also changed historically. For example, the memory hierarchy of an Intel Haswell Mobile processor circa 2013 is:
* Processor registers: the fastest possible access (usually 1 CPU cycle). A few thousand bytes in size.
* Cache
** Level 0 (L0), micro-operations cache: 6,144 bytes (6 KiB) in size.
** Level 1 (L1) instruction cache: 128 KiB in size.
** Level 1 (L1) data cache: 128 KiB in size. Best access speed is around 700 GB/s.
** Level 2 (L2) instruction and data (shared): 1 MiB in size. Best access speed is around 200 GB/s.
** Level 3 (L3) shared cache: 6 MiB in size. Best access speed is around 100 GB/s.
** Level 4 (L4) shared cache: 128 MiB in size. Best access speed is around 40 GB/s.
* Main memory (primary storage): GiB in size. Best access speed is around 10 GB/s. In the case of a NUMA machine, access times may not be uniform.
* Mass storage (secondary storage): terabytes in size. Best access speed from a consumer solid state drive is about 2000 MB/s.
* Nearline storage (tertiary storage): up to exabytes in size. Best access speed is about 160 MB/s.
* Offline storage
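To get a feel for the spread between levels, the best-case access speeds quoted in the list above can be turned into transfer times. The sketch below uses only those bandwidth figures; it deliberately ignores latency and capacity (the amount of data used would not fit in a cache), and the names `BANDWIDTH_BYTES_PER_S` and `transfer_seconds` are hypothetical.

```python
# Best-case access speeds from the list above, in bytes per second.
# Decimal prefixes are assumed (1 GB = 10**9 bytes, 1 MB = 10**6 bytes).
BANDWIDTH_BYTES_PER_S = {
    "L1 data cache": 700e9,
    "L2 cache": 200e9,
    "L3 cache": 100e9,
    "L4 cache": 40e9,
    "main memory": 10e9,
    "mass storage (SSD)": 2000e6,
    "nearline storage": 160e6,
}


def transfer_seconds(num_bytes, level):
    """Bandwidth-only estimate of the time to move num_bytes at a given level."""
    return num_bytes / BANDWIDTH_BYTES_PER_S[level]


# Compare how long streaming one gigabyte would take at each level's best speed.
for level in BANDWIDTH_BYTES_PER_S:
    print(f"{level:20s} {transfer_seconds(1e9, level):.4f} s")
```

The ratio between the fastest and slowest entries spans more than three orders of magnitude, before latency is even considered.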
The lower levels of the hierarchy, from mass storage downwards, are also known as tiered storage. The formal distinction between online, nearline, and offline storage is:
* Online storage is immediately available for I/O.
* Nearline storage is not immediately available, but can be made online quickly without human intervention.
* Offline storage is not immediately available, and requires some human intervention to bring online.
For example, always-on spinning disks are online, while spinning disks that spin down, such as massive arrays of idle disks (MAID), are nearline. Removable media such as tape cartridges that can be automatically loaded, as in a tape library, are nearline, while cartridges that must be manually loaded are offline.
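The three-way distinction reduces to two yes/no questions, which can be written down directly. This is only an illustrative sketch; the `Tier` enum and the `classify` function are hypothetical names, not part of any storage management API.

```python
from enum import Enum


class Tier(Enum):
    ONLINE = "online"
    NEARLINE = "nearline"
    OFFLINE = "offline"


def classify(immediately_available: bool, needs_human: bool) -> Tier:
    """Apply the online / nearline / offline definitions given above."""
    if immediately_available:
        return Tier.ONLINE
    # Not immediately available: human intervention decides the tier.
    return Tier.OFFLINE if needs_human else Tier.NEARLINE


# The examples from the text, expressed as (available now, needs a human).
examples = {
    "always-on spinning disk": (True, False),
    "spun-down disk in a MAID": (False, False),
    "tape cartridge in a tape library": (False, False),
    "manually loaded tape cartridge": (False, True),
}
for name, (available, human) in examples.items():
    print(f"{name}: {classify(available, human).value}")
```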
Most modern CPUs are so fast that, for most program workloads, the bottleneck is the locality of reference of memory accesses and the efficiency of the caching and memory transfer between different levels of the hierarchy. As a result, the CPU spends much of its time idling, waiting for memory I/O to complete. This is sometimes called the ''space cost'', as a larger memory object is more likely to overflow a small and fast level and require use of a larger, slower level. The resulting load on memory use is known as ''pressure'' (respectively ''register pressure'', ''cache pressure'', and (main) ''memory pressure''). Terms for data being missing from a higher level and needing to be fetched from a lower level are, respectively: register spilling (due to register pressure: register to cache), cache miss (cache to main memory), and (hard) page fault (''real'' main memory to ''virtual'' memory, i.e. mass storage, commonly referred to as ''disk'' regardless of the actual mass storage technology used).
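How locality of reference translates into cache misses can be seen in a toy simulation. The sketch below assumes a direct-mapped cache with 64 lines of 64 bytes and a two-dimensional array of 8-byte elements stored row by row; these parameters and all function names are hypothetical, and real caches are usually set-associative and considerably larger.

```python
def count_misses(addresses, num_lines=64, line_size=64):
    """Simulate a direct-mapped cache and count misses for byte addresses."""
    tags = [None] * num_lines  # which memory block each cache line holds
    misses = 0
    for addr in addresses:
        block = addr // line_size
        index = block % num_lines
        if tags[index] != block:
            misses += 1          # block not present: fetch from the lower level
            tags[index] = block
    return misses


def row_major(rows, cols, elem_size=8):
    """Addresses touched when walking the array in storage order."""
    return [(r * cols + c) * elem_size for r in range(rows) for c in range(cols)]


def column_major(rows, cols, elem_size=8):
    """Addresses touched when walking the same array column by column."""
    return [(r * cols + c) * elem_size for c in range(cols) for r in range(rows)]


rows = cols = 64
print("row-major misses:   ", count_misses(row_major(rows, cols)))
print("column-major misses:", count_misses(column_major(rows, cols)))
```

Both traversals touch exactly the same elements, but the column-by-column order jumps a whole row ahead on every access and so reuses far less of each fetched cache line.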
Modern programming languages mainly assume two levels of memory, main (''working'') memory and mass storage, though in assembly language and inline assemblers in languages such as C, registers can be directly accessed. Taking optimal advantage of the memory hierarchy requires the cooperation of programmers, hardware, and compilers (as well as underlying support from the operating system):
*''Programmers'' are responsible for moving data between disk and memory through file I/O.
*''Hardware'' is responsible for moving data between memory and caches.
*''Optimizing compilers'' are responsible for generating code that, when executed, will cause the hardware to use caches and registers efficiently.
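The programmer's share of this division of labour, moving data between disk and memory through file I/O, might look like the following. It is a minimal sketch: the file is read in fixed-size chunks so that only a bounded amount of data is held in main memory at once. The function names, the chunk size, and the sample data are hypothetical.

```python
import os
import tempfile


def sum_bytes_chunked(path, chunk_size=64 * 1024):
    """Sum all byte values in a file, holding at most chunk_size bytes in memory."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # explicit transfer from disk to memory
            if not chunk:
                break  # end of file reached
            total += sum(chunk)
    return total


def demo():
    """Write a small sample file, process it in 4 KiB chunks, then clean up."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(bytes(range(256)) * 1000)
        path = tmp.name
    try:
        return sum_bytes_chunked(path, chunk_size=4096)
    finally:
        os.remove(path)


print(demo())
```

The chunk size is the tuning knob here: larger chunks mean fewer I/O calls, smaller chunks mean less main memory in use at any moment.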
Many programmers assume one level of memory. This works fine until the application hits a performance wall. Then the memory hierarchy will be assessed during code refactoring.
See also
* Cache hierarchy
* Use of spatial and temporal locality: hierarchical memory
* Buffer vs. cache
* Cache hierarchy in a modern processor
* Memory wall
* Computer memory
* Hierarchical storage management
* Cloud storage
* Memory access pattern
* Communication-avoiding algorithm