In
computing
Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, e ...
, a cache ( ) is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewhere. A ''cache hit'' occurs when the requested data can be found in a cache, while a ''cache miss'' occurs when it cannot. Cache hits are served by reading data from the cache, which is faster than recomputing a result or reading from a slower data store; thus, the more requests that can be served from the cache, the faster the system performs.
To be cost-effective and to enable efficient use of data, caches must be relatively small. Nevertheless, caches have proven themselves in many areas of computing, because typical
computer applications
A computer is a machine that can be programmed to carry out sequences of arithmetic or logical operations ( computation) automatically. Modern digital electronic computers can perform generic sets of operations known as programs. These pr ...
access data with a high degree of
locality of reference
In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
. Such access patterns exhibit temporal locality, where data is requested that has been recently requested already, and
spatial locality, where data is requested that is stored physically close to data that has already been requested.
Motivation
There is an inherent trade-off between size and speed (given that a larger resource implies greater physical distances) but also a tradeoff between expensive, premium technologies (such as
SRAM) vs cheaper, easily mass-produced commodities (such as
DRAM or
hard disk
A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnet ...
s).
The
buffering provided by a cache benefits one or both of
latency and
throughput
Network throughput (or just throughput, when in context) refers to the rate of message delivery over a communication channel, such as Ethernet or packet radio, in a communication network. The data that these messages contain may be delivered ov ...
(
bandwidth
Bandwidth commonly refers to:
* Bandwidth (signal processing) or ''analog bandwidth'', ''frequency bandwidth'', or ''radio bandwidth'', a measure of the width of a frequency range
* Bandwidth (computing), the rate of data transfer, bit rate or thr ...
):
Latency
A larger resource incurs a significant latency for access e.g. it can take hundreds of clock cycles for a modern 4 GHz processor to reach
DRAM. This is mitigated by reading in large chunks, in the hope that subsequent reads will be from nearby locations. Prediction or explicit
prefetching
Prefetching in computer science is a technique for speeding up fetch operations by beginning a fetch operation whose result is expected to be needed soon. Usually this is before it is ''known'' to be needed, so there is a risk of wasting time by p ...
might also guess where future reads will come from and make requests ahead of time; if done correctly the latency is bypassed altogether.
Throughput
The use of a cache also allows for higher throughput from the underlying resource, by assembling multiple fine grain transfers into larger, more efficient requests. In the case of
DRAM circuits, this might be served by having a wider data bus. For example, consider a program accessing bytes in a 32-bit
address space, but being served by a 128-bit off-chip data bus; individual uncached byte accesses would allow only 1/16th of the total bandwidth to be used, and 80% of the data movement would be memory addresses instead of data itself. Reading larger chunks reduces the fraction of bandwidth required for transmitting address information.
Operation
Hardware implements cache as a
block
Block or blocked may refer to:
Arts, entertainment and media Broadcasting
* Block programming, the result of a programming strategy in broadcasting
* W242BX, a radio station licensed to Greenville, South Carolina, United States known as ''96.3 ...
of memory for temporary storage of data likely to be used again.
Central processing unit
A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, an ...
s (CPUs),
solid-state drive
A solid-state drive (SSD) is a solid-state storage device that uses integrated circuit assemblies to store data persistently, typically using flash memory, and functioning as secondary storage in the hierarchy of computer storage. It is a ...
s (SSDs) and
hard disk drive
A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magnet ...
s (HDDs) frequently include hardware-based cache, while
web browser
A web browser is application software for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Browsers are used on ...
s and
web server
A web server is computer software and underlying hardware that accepts requests via HTTP (the network protocol created to distribute web content) or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiate ...
s commonly rely on software caching.
A cache is made up of a pool of entries. Each entry has associated ''data'', which is a copy of the same data in some ''backing store''. Each entry also has a ''tag'', which specifies the identity of the data in the backing store of which the entry is a copy. Tagging allows simultaneous cache-oriented algorithms to function in multilayered fashion without differential relay interference.
When the cache client (a CPU, web browser,
operating system
An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs.
Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...
) needs to access data presumed to exist in the backing store, it first checks the cache. If an entry can be found with a tag matching that of the desired data, the data in the entry is used instead. This situation is known as a cache hit. For example, a web browser program might check its local cache on disk to see if it has a local copy of the contents of a web page at a particular
URL. In this example, the URL is the tag, and the content of the web page is the data. The percentage of accesses that result in cache hits is known as the hit rate or hit ratio of the cache.
The alternative situation, when the cache is checked and found not to contain any entry with the desired tag, is known as a cache miss. This requires a more expensive access of data from the backing store. Once the requested data is retrieved, it is typically copied into the cache, ready for the next access.
During a cache miss, some other previously existing cache entry is removed in order to make room for the newly retrieved data. The
heuristic
A heuristic (; ), or heuristic technique, is any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate, ...
used to select the entry to replace is known as the
replacement policy. One popular replacement policy, "least recently used" (LRU), replaces the oldest entry, the entry that was accessed less recently than any other entry (see
cache algorithm). More efficient caching algorithms compute the use-hit frequency against the size of the stored contents, as well as the
latencies and throughputs for both the cache and the backing store. This works well for larger amounts of data, longer latencies, and slower throughputs, such as that experienced with hard drives and networks, but is not efficient for use within a CPU cache.
Writing policies
When a system writes data to cache, it must at some point write that data to the backing store as well. The timing of this write is controlled by what is known as the ''write policy''. There are two basic writing approaches:
* ''Write-through'': write is done synchronously both to the cache and to the backing store.
* ''Write-back'': initially, writing is done only to the cache. The write to the backing store is postponed until the modified content is about to be replaced by another cache block.
A write-back cache is more complex to implement, since it needs to track which of its locations have been written over, and mark them as ''dirty'' for later writing to the backing store. The data in these locations are written back to the backing store only when they are evicted from the cache, an effect referred to as a ''lazy write''. For this reason, a read miss in a write-back cache (which requires a block to be replaced by another) will often require two memory accesses to service: one to write the replaced data from the cache back to the store, and then one to retrieve the needed data.
Other policies may also trigger data write-back. The client may make many changes to data in the cache, and then explicitly notify the cache to write back the data.
Since no data is returned to the requester on write operations, a decision needs to be made on write misses, whether or not data would be loaded into the cache.
This is defined by these two approaches:
* ''Write allocate'' (also called ''fetch on write''): data at the missed-write location is loaded to cache, followed by a write-hit operation. In this approach, write misses are similar to read misses.
* ''No-write allocate'' (also called ''write-no-allocate'' or ''write around''): data at the missed-write location is not loaded to cache, and is written directly to the backing store. In this approach, data is loaded into the cache on read misses only.
Both write-through and write-back policies can use either of these write-miss policies, but usually they are paired in this way:
[
]
* A write-back cache uses write allocate, hoping for subsequent writes (or even reads) to the same location, which is now cached.
* A write-through cache uses no-write allocate. Here, subsequent writes have no advantage, since they still need to be written directly to the backing store.
Entities other than the cache may change the data in the backing store, in which case the copy in the cache may become out-of-date or ''stale''. Alternatively, when the client updates the data in the cache, copies of those data in other caches will become stale. Communication protocols between the cache managers which keep the data consistent are known as coherency protocols.
Prefetch
On a cache read miss,
caches with a ''
demand paging
In computer operating systems, demand paging (as opposed to anticipatory paging) is a method of virtual memory management. In a system that uses demand paging, the operating system copies a disk page into physical memory only if an attempt is mad ...
policy'' read the minimum amount from the backing store.
For example, demand-paging virtual memory reads one page of virtual memory (often 4 kBytes) from disk into the disk cache in RAM.
For example, a typical CPU reads a single L2 cache line of 128 bytes from DRAM into the L2 cache, and a single L1 cache line of 64 bytes from the L2 cache into the L1 cache.
Caches with a
prefetch input queue
Fetching the instruction opcodes from program memory well in advance is known as prefetching and it is served by using prefetch input queue (PIQ).The pre-fetched instructions are stored in data structure - namely a queue. The fetching of opcodes ...
or more general ''anticipatory paging policy''
go further—they not only read the chunk requested, but guess that the next chunk or two will soon be required, and so prefetch that data into the cache ahead of time.
Anticipatory paging is especially helpful
when the backing store has a long latency to read the first chunk and much shorter times to sequentially read the next few chunks, such as
disk storage and
DRAM.
A few operating systems go further with a
loader
Loader can refer to:
* Loader (equipment)
* Loader (computing)
** LOADER.EXE, an auto-start program loader optionally used in the startup process of Microsoft Windows ME
* Loader (surname)
* Fast loader
* Speedloader
* Boot loader
** LOADER.COM ...
that always pre-loads the entire executable into RAM.
A few caches go even further, not only pre-loading an entire file, but also starting to load other related files that may soon be requested, such as the
page cache
In computing, a page cache, sometimes also called disk cache, is a transparent cache for the pages originating from a secondary storage device such as a hard disk drive (HDD) or a solid-state drive (SSD). The operating system keeps a page cache ...
associated with a
prefetcher
The Prefetcher is a component of Microsoft Windows which was introduced in Windows XP. It is a component of the Memory Manager that can speed up the Windows boot process and shorten the amount of time it takes to start up programs. It accomplish ...
or the
web cache
A Web cache (or HTTP cache) is a system for optimizing the World Wide Web. It is implemented both client-side and server-side. The caching of multimedias and other files can result in less overall delay when browsing the Web.
Parts of the sys ...
associated with
link prefetching Link prefetching allows web browsers to pre-load resources. This speeds up both the loading and rendering of web pages. Prefetching was first introduced in HTML5.
Prefetching is accomplished through hints in web pages. These hints are used by the ...
.
Examples of hardware caches
CPU cache
Small memories on or close to the
CPU can operate faster than the much larger
main memory. Most CPUs since the 1980s have used one or more caches, sometimes
in cascaded levels; modern high-end
embedded,
desktop
A desktop traditionally refers to:
* The surface of a desk (often to distinguish office appliances that fit on a desk, such as photocopiers and printers, from larger equipment covering its own area on the floor)
Desktop may refer to various compu ...
and server
microprocessor
A microprocessor is a computer processor where the data processing logic and control is included on a single integrated circuit, or a small number of integrated circuits. The microprocessor contains the arithmetic, logic, and control circu ...
s may have as many as six types of cache (between levels and functions). Examples of caches with a specific function are the
D-cache and
I-cache
This glossary of computer hardware terms is a list of definitions of terms and concepts related to computer hardware, i.e. the physical and structural components of computers, architectural issues, and peripheral devices.
A
...
and the
translation lookaside buffer
A translation lookaside buffer (TLB) is a memory cache that stores the recent translations of virtual memory to physical memory. It is used to reduce the time taken to access a user memory location. It can be called an address-translation cache. ...
for the
MMU.
GPU cache
Earlier
graphics processing unit
A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobi ...
s (GPUs) often had limited read-only
texture cache
This is a glossary of terms relating to computer graphics.
For more general computer hardware terms, see glossary of computer hardware terms.
0–9
A
B
...
s, and introduced
Morton order
In mathematical analysis and computer science, functions which are Z-order, Lebesgue curve, Morton space-filling curve, Morton order or Morton code map multidimensional data to one dimension while preserving locality of the data points. It i ...
swizzled texture
This is a glossary of terms relating to computer graphics.
For more general computer hardware terms, see glossary of computer hardware terms.
0–9
A
B
...
s to improve 2D
cache coherency
In computer architecture, cache coherence is the uniformity of shared resource data that ends up stored in multiple local caches. When clients in a system maintain caches of a common memory resource, problems may arise with incoherent data, whi ...
.
Cache miss
In computing, a cache ( ) is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewher ...
es would drastically affect performance, e.g. if
mipmapping
In computer graphics, mipmaps (also MIP maps) or pyramids are pre-calculated, optimized sequences of images, each of which is a progressively lower resolution representation of the previous. The height and width of each image, or level, in the ...
was not used. Caching was important to leverage 32-bit (and wider) transfers for texture data that was often as little as 4 bits per pixel, indexed in complex patterns by arbitrary
UV coordinates
UV mapping is the 3D modeling process of projecting a 3D model's surface to a 2D image for texture mapping. The letters "U" and "V" denote the axes of the 2D texture because "X", "Y", and "Z" are already used to denote the axes of the 3D object i ...
and
perspective transformations in
inverse texture mapping
Texture mapping is a method for mapping a texture on a computer-generated graphic. Texture here can be high frequency detail, surface texture, or color.
History
The original technique was pioneered by Edwin Catmull in 1974.
Texture mappi ...
.
As GPUs advanced (especially with
GPGPU
General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditiona ...
compute shader
In computing, a compute kernel is a routine compiled for high throughput accelerators (such as graphics processing units (GPUs), digital signal processors (DSPs) or field-programmable gate arrays (FPGAs)), separate from but used by a main prog ...
s) they have developed progressively larger and increasingly general caches, including
instruction cache
A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which ...
s for
shader
In computer graphics, a shader is a computer program that calculates the appropriate levels of light, darkness, and color during the rendering of a 3D scene - a process known as ''shading''. Shaders have evolved to perform a variety of speci ...
s, exhibiting increasingly common functionality with CPU caches. For example,
GT200
The GT200 is a fraudulent " remote substance detector" that was claimed by its manufacturer, UK-based Global Technical Ltd, to be able to detect, from a distance, various substances including explosives and drugs. The GT200 was sold to a number ...
architecture GPUs did not feature an L2 cache, while the
Fermi
Enrico Fermi (; 29 September 1901 – 28 November 1954) was an Italian (later naturalized American) physicist and the creator of the world's first nuclear reactor, the Chicago Pile-1. He has been called the "architect of the nuclear age" and ...
GPU has 768 KB of last-level cache, the
Kepler
Johannes Kepler (; ; 27 December 1571 – 15 November 1630) was a German astronomer, mathematician, astrologer, natural philosopher and writer on music. He is a key figure in the 17th-century Scientific Revolution, best known for his laws o ...
GPU has 1536 KB of last-level cache, and the
Maxwell GPU has 2048 KB of last-level cache. These caches have grown to handle
synchronisation primitive
In computer science, synchronization refers to one of two distinct but related concepts: synchronization of processes, and synchronization of data. ''Process synchronization'' refers to the idea that multiple processes are to join up or handshak ...
s between threads and
atomic operation
In concurrent programming, an operation (or set of operations) is linearizable if it consists of an ordered list of invocation and response events (event), that may be extended by adding response events such that:
# The extended list can be re-e ...
s, and interface with a CPU-style
MMU.
DSPs
Digital signal processor
A digital signal processor (DSP) is a specialized microprocessor chip, with its architecture optimized for the operational needs of digital signal processing. DSPs are fabricated on MOS integrated circuit chips. They are widely used in audio si ...
s have similarly generalised over the years. Earlier designs used
scratchpad memory
Scratchpad memory (SPM), also known as scratchpad, scratchpad RAM or local store in computer terminology, is a high-speed internal memory used for temporary storage of calculations, data, and other work in progress. In reference to a microprocess ...
fed by
DMA, but modern DSPs such as
Qualcomm Hexagon
Hexagon is the brand name for a family of digital signal processor (DSP) products by Qualcomm. Hexagon is also known as QDSP6, standing for “sixth generation digital signal processor.” According to Qualcomm, the Hexagon architecture is desig ...
often include a very similar set of caches to a CPU (e.g.
Modified Harvard architecture
The modified Harvard architecture is a variation of the Harvard computer architecture that, unlike the pure Harvard architecture, allows the contents of the instruction memory to be accessed as data. Most modern computers that are documented as ...
with shared L2, split L1 I-cache and D-cache).
Translation lookaside buffer
A
memory management unit
A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a computer hardware unit having all memory references passed through itself, primarily performing the translation of virtual memory addresses to physical a ...
(MMU) that fetches page table entries from main memory has a specialized cache, used for recording the results of
virtual address
In computing, a virtual address space (VAS) or address space is the set of ranges of virtual addresses that an operating system makes available to a process. The range of virtual addresses usually starts at a low address and can extend to the hig ...
to
physical address
In computing, a physical address (also real address, or binary address), is a memory address that is represented in the form of a binary number on the address bus circuitry in order to enable the data bus to access a ''particular'' storage cell ...
translations. This specialized cache is called a
translation lookaside buffer
A translation lookaside buffer (TLB) is a memory cache that stores the recent translations of virtual memory to physical memory. It is used to reduce the time taken to access a user memory location. It can be called an address-translation cache. ...
(TLB).
In-network cache
Information-centric networking
Information-centric networking (ICN) is an approach to evolve the
Internet
The Internet (or internet) is the global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP) to communicate between networks and devices. It is a '' network of networks'' that consists of private, pub ...
infrastructure away from a host-centric paradigm, based on perpetual connectivity and the
end-to-end principle
The end-to-end principle is a design framework in computer networking. In networks designed according to this principle, guaranteeing certain application-specific features, such as reliability and security, requires that they reside in the commu ...
, to a network architecture in which the focal point is identified information (or content or data). Due to the inherent caching capability of the nodes in an ICN, it can be viewed as a loosely connected network of caches, which has unique requirements of caching policies. However, ubiquitous content caching introduces the challenge to content protection against unauthorized access, which requires extra care and solutions.
Unlike proxy servers, in ICN the cache is a network-level solution. Therefore, it has rapidly changing cache states and higher request arrival rates; moreover, smaller cache sizes further impose a different kind of requirements on the content eviction policies. In particular, eviction policies for ICN should be fast and lightweight. Various cache replication and eviction schemes for different ICN architectures and applications have been proposed.
Policies
=Time aware least recently used (TLRU)
=
The Time aware Least Recently Used (TLRU) is a variant of LRU designed for the situation where the stored contents in cache have a valid life time. The algorithm is suitable in network cache applications, such as Information-centric networking (ICN),
Content Delivery Network
A content delivery network, or content distribution network (CDN), is a geographically distributed network of proxy servers and their data centers. The goal is to provide high availability and performance by distributing the service spatially rel ...
s (CDNs) and distributed networks in general. TLRU introduces a new term: TTU (Time to Use). TTU is a time stamp of a content/page which stipulates the usability time for the content based on the locality of the content and the content publisher announcement. Owing to this locality based time stamp, TTU provides more control to the local administrator to regulate in network storage.
In the TLRU algorithm, when a piece of content arrives, a cache node calculates the local TTU value based on the TTU value assigned by the content publisher. The local TTU value is calculated by using a locally defined function. Once the local TTU value is calculated the replacement of content is performed on a subset of the total content stored in cache node. The TLRU ensures that less popular and small life content should be replaced with the incoming content.
=Least frequent recently used (LFRU)
=
The Least Frequent Recently Used (LFRU) cache replacement scheme combines the benefits of LFU and LRU schemes. LFRU is suitable for 'in network' cache applications, such as Information-centric networking (ICN), Content Delivery Networks (CDNs) and distributed networks in general. In LFRU, the cache is divided into two partitions called privileged and unprivileged partitions. The privileged partition can be defined as a protected partition. If content is highly popular, it is pushed into the privileged partition. Replacement of the privileged partition is done as follows: LFRU evicts content from the unprivileged partition, pushes content from privileged partition to unprivileged partition, and finally inserts new content into the privileged partition. In the above procedure the LRU is used for the privileged partition and an approximated LFU (ALFU) scheme is used for the unprivileged partition, hence the abbreviation LFRU.
The basic idea is to filter out the locally popular contents with ALFU scheme and push the popular contents to one of the privileged partition.
Weather forecast
Back in 2010 ''
The New York Times
''The New York Times'' (''the Times'', ''NYT'', or the Gray Lady) is a daily newspaper based in New York City with a worldwide readership reported in 2020 to comprise a declining 840,000 paid print subscribers, and a growing 6 million paid ...
'' suggested "Type 'weather' followed by your zip code." By 2011, the use of smartphones with weather forecasting options was overly taxing
AccuWeather
AccuWeather Inc. is an American media company that provides commercial weather forecasting services worldwide.
AccuWeather was founded in 1962 by Joel N. Myers, then a Pennsylvania State University graduate student working on a master's degree ...
servers; two requests within the same park would generate separate requests. An optimization by edge-servers to truncate the GPS coordinates to fewer decimal places meant that the cached results from the earlier query would be used. The number of to-the-server lookups per day dropped by half.
Software caches
Disk cache
While CPU caches are generally managed entirely by hardware, a variety of software manages other caches. The
page cache
In computing, a page cache, sometimes also called disk cache, is a transparent cache for the pages originating from a secondary storage device such as a hard disk drive (HDD) or a solid-state drive (SSD). The operating system keeps a page cache ...
in main memory, which is an example of disk cache, is managed by the operating system
kernel
Kernel may refer to:
Computing
* Kernel (operating system), the central component of most operating systems
* Kernel (image processing), a matrix used for image convolution
* Compute kernel, in GPGPU programming
* Kernel method, in machine learn ...
.
While the
disk buffer
In computer storage, disk buffer (often ambiguously called disk cache or cache buffer) is the embedded memory in a hard disk drive (HDD) or solid state drive (SSD) acting as a buffer between the rest of the computer and the physical hard di ...
, which is an integrated part of the hard disk drive or solid state drive, is sometimes misleadingly referred to as "disk cache", its main functions are write sequencing and read prefetching. Repeated cache hits are relatively rare, due to the small size of the buffer in comparison to the drive's capacity. However, high-end
disk controller
{{unreferenced, date=May 2010
The disk controller is the controller circuit which enables the CPU to communicate with a hard disk, floppy disk or other kind of disk drive. It also provides an interface between the disk drive and the bus conne ...
s often have their own on-board cache of the hard disk drive's
data blocks.
Finally, a fast local hard disk drive can also cache information held on even slower data storage devices, such as remote servers (
web cache
A Web cache (or HTTP cache) is a system for optimizing the World Wide Web. It is implemented both client-side and server-side. The caching of multimedias and other files can result in less overall delay when browsing the Web.
Parts of the sys ...
) or local
tape drive
A tape drive is a data storage device that reads and writes data on a magnetic tape. Magnetic tape data storage is typically used for offline, archival data storage. Tape media generally has a favorable unit cost and a long archival stability.
...
s or
optical jukebox
An optical jukebox is a robotic data storage device that can automatically load and unload optical discs, such as Compact Disc, DVD, Ultra Density Optical or Blu-ray and can provide terabytes (TB) or petabytes (PB) of tertiary storage. The de ...
es; such a scheme is the main concept of
hierarchical storage management
Hierarchical storage management (HSM), also known as Tiered storage, is a data storage and Data management technique that automatically moves data between high-cost and low-cost storage media. HSM systems exist because high-speed storage devices, ...
. Also, fast flash-based
solid-state drive
A solid-state drive (SSD) is a solid-state storage device that uses integrated circuit assemblies to store data persistently, typically using flash memory, and functioning as secondary storage in the hierarchy of computer storage. It is a ...
s (SSDs) can be used as caches for slower rotational-media hard disk drives, working together as
hybrid drive
In computing, a hybrid drive (solid state hybrid drive – SSHD) is a logical or physical storage device that combines a faster storage medium such as solid-state drive (SSD) with a higher-capacity hard disk drive (HDD). The intent is adding s ...
s or
solid-state hybrid drive
In computing, a hybrid drive (solid state hybrid drive – SSHD) is a logical or physical storage device that combines a faster storage medium such as solid-state drive (SSD) with a higher-capacity hard disk drive (HDD). The intent is adding s ...
s (SSHDs).
Web cache
Web browser
A web browser is application software for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Browsers are used on ...
s and
web proxy servers employ web caches to store previous responses from
web server
A web server is computer software and underlying hardware that accepts requests via HTTP (the network protocol created to distribute web content) or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiate ...
s, such as
web pages and
image
An image is a visual representation of something. It can be two-dimensional, three-dimensional, or somehow otherwise feed into the visual system to convey information. An image can be an artifact, such as a photograph or other two-dimensiona ...
s. Web caches reduce the amount of information that needs to be transmitted across the network, as information previously stored in the cache can often be re-used. This reduces bandwidth and processing requirements of the web server, and helps to improve
responsiveness
Responsiveness as a concept of computer science refers to the specific ability of a system or functional unit to complete assigned tasks within a given time. For example, it would refer to the ability of an artificial intelligence system to unde ...
for users of the web.
Web browsers employ a built-in web cache, but some
Internet service provider
An Internet service provider (ISP) is an organization that provides services for accessing, using, or participating in the Internet. ISPs can be organized in various forms, such as commercial, community-owned, non-profit, or otherwise private ...
s (ISPs) or organizations also use a caching proxy server, which is a web cache that is shared among all users of that network.
Another form of cache is
P2P caching
Peer-to-peer caching (P2P caching) is a computer network traffic management technology used by Internet Service Providers (ISPs) to accelerate content delivered over peer-to-peer (P2P) networks while reducing related bandwidth costs.
P2P caching ...
, where the files most sought for by
peer-to-peer
Peer-to-peer (P2P) computing or networking is a distributed application architecture that partitions tasks or workloads between peers. Peers are equally privileged, equipotent participants in the network. They are said to form a peer-to-peer n ...
applications are stored in an
ISP
An Internet service provider (ISP) is an organization that provides services for accessing, using, or participating in the Internet. ISPs can be organized in various forms, such as commercial, community-owned, non-profit, or otherwise private ...
cache to accelerate P2P transfers. Similarly, decentralised equivalents exist, which allow communities to perform the same task for P2P traffic, for example, Corelli.
Memoization
A cache can store data that is computed on demand rather than retrieved from a backing store.
Memoization is an
optimization
Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...
technique that stores the results of resource-consuming
function call
In computer programming, a function or subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed.
Functions may ...
s within a lookup table, allowing subsequent calls to reuse the stored results and avoid repeated computation. It is related to the
dynamic programming
Dynamic programming is both a mathematical optimization method and a computer programming method. The method was developed by Richard Bellman in the 1950s and has found applications in numerous fields, from aerospace engineering to economics. ...
algorithm design methodology, which can also be thought of as a means of caching.
Content delivery network
A content delivery network (CDN) is a network of distributed servers that deliver pages and other Web content to a user, based on the geographic locations of the user, the origin of the web page and the content delivery server.
CDNs began in the late 1990s as a way to speed up the delivery of static content, such as HTML pages, images and videos. By replicating content on multiple servers around the world and delivering it to users based on their location, CDNs can significantly improve the speed and availability of a website or application. When a user requests a piece of content, the CDN will check to see if it has a copy of the content in its cache. If it does, the CDN will deliver the content to the user from the cache.
Cloud storage gateway
A cloud storage gateway, also known as an edge filer, is a
hybrid cloud storage Hybrid cloud storage, in Computer data storage, data storage, is a term for a storage infrastructure that uses a combination of on-premises storage resources with a public cloud storage provider. The on-premises storage is usually managed by the org ...
device that connects a local network to one or more cloud storage service, typically an
object storage
Object storage (also known as object-based storage) is a computer data storage that manages data as objects, as opposed to other storage architectures like file systems which manages data as a file hierarchy, and block storage which manages data a ...
service such as
Amazon S3
Amazon S3 or Amazon Simple Storage Service is a service offered by Amazon Web Services (AWS) that provides object storage through a web service interface. Amazon S3 uses the same scalable storage infrastructure that Amazon.com uses to run its ...
. It provides a cache for frequently accessed data, providing high speed local access to frequently accessed data in the cloud storage service. Cloud storage gateways also provide additional benefits such as accessing cloud object storage through traditional file serving protocols as well as continued access to cached data during connectivity outages.
Other caches
The BIND
DNS
The Domain Name System (DNS) is a hierarchical and distributed naming system for computers, services, and other resources in the Internet or other Internet Protocol (IP) networks. It associates various information with domain names assigned to ...
daemon caches a mapping of domain names to
IP address
An Internet Protocol address (IP address) is a numerical label such as that is connected to a computer network that uses the Internet Protocol for communication.. Updated by . An IP address serves two main functions: network interface ident ...
es, as does a resolver library.
Write-through operation is common when operating over unreliable networks (like an Ethernet LAN), because of the enormous complexity of the
coherency protocol required between multiple write-back caches when communication is unreliable. For instance, web page caches and
client-side
Client-side refers to operations that are performed by the client in a client–server relationship in a computer network.
General concepts
Typically, a client is a computer application, such as a web browser, that runs on a user's local comput ...
network file system
Network File System (NFS) is a distributed file system protocol originally developed by Sun Microsystems (Sun) in 1984, allowing a user on a client computer to access files over a computer network much like local storage is accessed. NFS, lik ...
caches (like those in
NFS or
SMB) are typically read-only or write-through specifically to keep the network protocol simple and reliable.
Search engine
A search engine is a software system designed to carry out web searches. They search the World Wide Web in a systematic way for particular information specified in a textual web search query. The search results are generally presented in a ...
s also frequently make
web pages they have indexed available from their cache. For example,
Google
Google LLC () is an American multinational technology company focusing on search engine technology, online advertising, cloud computing, computer software, quantum computing, e-commerce, artificial intelligence, and consumer electronics. ...
provides a "Cached" link next to each search result. This can prove useful when web pages from a
web server
A web server is computer software and underlying hardware that accepts requests via HTTP (the network protocol created to distribute web content) or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiate ...
are temporarily or permanently inaccessible.
Database caching
Database caching is a process included in the design of computer applications which generate web pages on-demand (dynamically) by accessing backend databases.
When these applications are deployed on multi-tier environments that involve browser-ba ...
can substantially improve the throughput of
database
In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
applications, for example in the processing of
indexes
Index (or its plural form indices) may refer to:
Arts, entertainment, and media Fictional entities
* Index (''A Certain Magical Index''), a character in the light novel series ''A Certain Magical Index''
* The Index, an item on a Halo megastru ...
,
data dictionaries, and frequently used subsets of data.
A
distributed cache
In computing, a distributed cache is an extension of the traditional concept of cache used in a single locale. A distributed cache may span multiple servers so that it can grow in size and in transactional capacity. It is mainly used to store appl ...
uses networked hosts to provide scalability, reliability and performance to the application.
The hosts can be co-located or spread over different geographical regions.
Buffer vs. cache
The semantics of a "buffer" and a "cache" are not totally different; even so, there are fundamental differences in intent between the process of caching and the process of buffering.
Fundamentally, caching realizes a performance increase for transfers of data that is being repeatedly transferred. While a caching system may realize a performance increase upon the initial (typically write) transfer of a data item, this performance increase is due to buffering occurring within the caching system.
With read caches, a data item must have been fetched from its residing location at least once in order for subsequent reads of the data item to realize a performance increase by virtue of being able to be fetched from the cache's (faster) intermediate storage rather than the data's residing location. With write caches, a performance increase of writing a data item may be realized upon the first write of the data item by virtue of the data item immediately being stored in the cache's intermediate storage, deferring the transfer of the data item to its residing storage at a later stage or else occurring as a background process. Contrary to strict buffering, a caching process must adhere to a (potentially distributed) cache coherency protocol in order to maintain consistency between the cache's intermediate storage and the location where the data resides. Buffering, on the other hand,
* reduces the number of transfers for otherwise novel data amongst communicating processes, which amortizes overhead involved for several small transfers over fewer, larger transfers,
* provides an intermediary for communicating processes which are incapable of direct transfers amongst each other, or
* ensures a minimum data size or representation required by at least one of the communicating processes involved in a transfer.
With typical caching implementations, a data item that is read or written for the first time is effectively being buffered; and in the case of a write, mostly realizing a performance increase for the application from where the write originated. Additionally, the portion of a caching protocol where individual writes are deferred to a batch of writes is a form of buffering. The portion of a caching protocol where individual reads are deferred to a batch of reads is also a form of buffering, although this form may negatively impact the performance of at least the initial reads (even though it may positively impact the performance of the sum of the individual reads). In practice, caching almost always involves some form of buffering, while strict buffering does not involve caching.
A
buffer
Buffer may refer to:
Science
* Buffer gas, an inert or nonflammable gas
* Buffer solution, a solution used to prevent changes in pH
* Buffering agent, the weak acid or base in a buffer solution
* Lysis buffer, in cell biology
* Metal ion buffer
* ...
is a temporary memory location that is traditionally used because CPU
instructions cannot directly address data stored in peripheral devices. Thus, addressable memory is used as an intermediate stage. Additionally, such a buffer may be feasible when a large block of data is assembled or disassembled (as required by a storage device), or when data may be delivered in a different order than that in which it is produced. Also, a whole buffer of data is usually transferred sequentially (for example to hard disk), so buffering itself sometimes increases transfer performance or reduces the variation or jitter of the transfer's latency as opposed to caching where the intent is to reduce the latency. These benefits are present even if the buffered data are written to the
buffer
Buffer may refer to:
Science
* Buffer gas, an inert or nonflammable gas
* Buffer solution, a solution used to prevent changes in pH
* Buffering agent, the weak acid or base in a buffer solution
* Lysis buffer, in cell biology
* Metal ion buffer
* ...
once and read from the buffer once.
A cache also increases transfer performance. A part of the increase similarly comes from the possibility that multiple small transfers will combine into one large block. But the main performance-gain occurs because there is a good chance that the same data will be read from cache multiple times, or that written data will soon be read. A cache's sole purpose is to reduce accesses to the underlying slower storage. Cache is also usually an
abstraction layer that is designed to be invisible from the perspective of neighboring layers.
See also
*
Cache coloring In computer science, cache coloring (also known as page coloring) is the process of attempting to allocate free pages that are contiguous from the CPU cache's point of view, in order to maximize the total number of pages cached by the processor. Cac ...
*
Cache hierarchy
Cache hierarchy, or multi-level caches, refers to a memory architecture that uses a hierarchy of memory stores based on varying access speeds to cache data. Highly requested data is cached in high-speed access memory stores, allowing swifter access ...
*
Cache-oblivious algorithm
In computing, a cache-oblivious algorithm (or cache-transcendent algorithm) is an algorithm designed to take advantage of a processor cache without having the size of the cache (or the length of the cache lines, etc.) as an explicit parameter. An ...
*
Cache stampede
A cache stampede is a type of cascading failure that can occur when massively parallel computing systems with caching mechanisms come under very high load. This behaviour is sometimes also called dog-piling.
To understand how cache stampedes occ ...
*
Cache language model
*
Cache manifest in HTML5
*
Dirty bit
A dirty bit or modified bit is a bit that is associated with a block of computer memory and indicates whether the corresponding block of memory has been modified. The dirty bit is set when the Central processing unit, processor writes to (modifies ...
*
Five-minute rule
*
Materialized view
In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summary ...
*
Memory hierarchy
*
Pipeline burst cache
*
Temporary file
A temporary file is a file created to store information temporarily, either for a program's intermediate use or for transfer to a permanent file when complete. It may be created by computer programs for a variety of purposes, such as when a program ...
References
Further reading
"What Every Programmer Should Know About Memory""Caching in the Distributed Environment"
{{Authority control
Computer architecture