HOME

TheInfoList



OR:

Scratchpad memory (SPM), also known as scratchpad, scratchpad RAM or local store in computer terminology, is a high-speed internal memory used for temporary storage of calculations, data, and other work in progress. In reference to a
microprocessor A microprocessor is a computer processor where the data processing logic and control is included on a single integrated circuit, or a small number of integrated circuits. The microprocessor contains the arithmetic, logic, and control circ ...
(or CPU), scratchpad refers to a special high-speed memory used to hold small items of data for rapid retrieval. It is similar to the usage and size of a scratchpad in life: a pad of paper for preliminary notes or sketches or writings, etc. In some systems it can be considered similar to the L1 cache in that it is the next closest memory to the ALU after the
processor registers A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only. ...
, with explicit instructions to move data to and from main memory, often using DMA-based data transfer. In contrast to a system that uses caches, a system with scratchpads is a system with
non-uniform memory access Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non ...
(NUMA) latencies, because the memory access latencies to the different scratchpads and the main memory vary. Another difference from a system that employs caches is that a scratchpad commonly does not contain a copy of data that is also stored in the main memory. Scratchpads are employed for simplification of caching logic, and to guarantee a unit can work without main memory contention in a system employing multiple processors, especially in multiprocessor system-on-chip for
embedded systems An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' ...
. They are mostly suited for storing temporary results (as it would be found in the CPU stack) that typically wouldn't need to always be committing to the main memory; however when fed by DMA, they can also be used in place of a cache for mirroring the state of slower main memory. The same issues of
locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
apply in relation to efficiency of use; although some systems allow strided DMA to access rectangular data sets. Another difference is that scratchpads are explicitly manipulated by applications. They may be useful for realtime applications, where predictable timing is hindered by cache behavior. Scratchpads are not used in mainstream desktop processors where generality is required for
legacy software In computing, a legacy system is an old method, technology, computer system, or application program, "of, relating to, or being a previous or outdated computer system", yet still in use. Often referencing a system as "legacy" means that it paved ...
to run from generation to generation, in which the available on-chip memory size may change. They are better implemented in embedded systems, special-purpose processors and
game consoles A video game console is an electronic device that outputs a video signal or image to display a video game that can be played with a game controller. These may be home consoles, which are generally placed in a permanent location connected to a ...
, where chips are often manufactured as MPSoC, and where software is often tuned to one hardware configuration.


Examples of use

*
Fairchild F8 The Fairchild F8 is an 8-bit microprocessor system from Fairchild Semiconductor, announced in 1974 and shipped in 1975. The original processor family included four main 40-pin integrated circuits (ICs); the 3850 CPU which was the arithmetic log ...
of 1975 contained 64 bytes of scratchpad. *
Cyrix 6x86 The Cyrix 6x86 is a line of sixth-generation, 32-bit x86 microprocessors designed and released by Cyrix in 1995. Cyrix, being a fabless company, had the chips manufactured by IBM and SGS-Thomson. The 6x86 was made as a direct competitor to In ...
is the only
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
-compatible desktop processor to incorporate a dedicated scratchpad. *
SuperH SuperH (or SH) is a 32-bit reduced instruction set computing (RISC) instruction set architecture (ISA) developed by Hitachi and currently produced by Renesas. It is implemented by microcontrollers and microprocessors for embedded systems. At the ...
, used in Sega's consoles, could lock cachelines to an address outside of main memory for use as a scratchpad. * Sony's PS1's
R3000 The R3000 is a 32-bit RISC microprocessor chipset developed by MIPS Computer Systems that implemented the MIPS I instruction set architecture (ISA). Introduced in June 1988, it was the second MIPS implementation, succeeding the R2000 as the flag ...
had a scratchpad instead of an L1 cache. It was possible to place the CPU stack here, an example of the temporary workspace usage. * Adapteva's Epiphany parallel coprocessor features local-stores for each core, connected by a
network on a chip A network on a chip or network-on-chip (NoC or )This article uses the convention that "NoC" is pronounced . Therefore, it uses the convention "a" for the indefinite article corresponding to NoC ("a NoC"). Other sources may pronounce it as a ...
, with DMA possible between them and off-chip links (possibly to DRAM). The architecture is similar to Sony's Cell, except all cores can directly address each other's scratchpads, generating network messages from standard load/store instructions. * Sony's
PS2 The PlayStation 2 (PS2) is a home video game console developed and marketed by Sony Computer Entertainment. It was first released in Japan on 4 March 2000, in North America on 26 October 2000, in Europe on 24 November 2000, and in Australia o ...
Emotion Engine The Emotion Engine is a central processing unit developed and manufactured by Sony Computer Entertainment and Toshiba for use in the PlayStation 2 video game console. It was also used in early PlayStation 3 models sold in Japan and North Americ ...
includes a 16  KB scratchpad, to and from which DMA transfers could be issued to its GS, and main memory. *
Cell Cell most often refers to: * Cell (biology), the functional basic unit of life Cell may also refer to: Locations * Monastic cell, a small room, hut, or cave in which a religious recluse lives, alternatively the small precursor of a monastery ...
's SPEs are restricted purely to working in their "local-store", relying on DMA for transfers from/to main memory and between local stores, much like a scratchpad. In this regard, additional benefit is derived from the lack of hardware to check and update coherence between multiple caches: the design takes advantage of the assumption that each processor's workspace is separate and private. It is expected this benefit will become more noticeable as the number of processors scales into the "many-core" future. Yet because of the elimination of some hardware logics, the data and instructions of applications on SPEs must be managed through software if the whole task on SPE can not fit in local store. * Many other processors allow L1 cache lines to be locked. * Most digital signal processors use a scratchpad. Many past 3D accelerators and game consoles (including the PS2) have used DSPs for
vertex transformation In linear algebra, linear transformations can be represented by matrices. If T is a linear transformation mapping \mathbb^n to \mathbb^m and \mathbf x is a column vector with n entries, then T( \mathbf x ) = A \mathbf x for some m \times n matrix ...
s. This differs from the stream based approach of modern GPUs which have more in common with a CPU cache's functions. * NVIDIA's 8800
GPU A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobi ...
running under
CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach ...
provides 16 KB of scratchpad (NVIDIA calls it Shared Memory) per thread-bundle when being used for
GPGPU General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditiona ...
tasks. Scratchpad also was used in later Fermi GPU (
GeForce 400 Series Serving as the introduction of Fermi, the GeForce 400 series is a series of graphics processing units developed by Nvidia. Its release was originally slated in November 2009; however, after delays, it was released on March 26, 2010 with availa ...
). * Ageia's
PhysX PhysX is an open-source realtime physics engine middleware SDK developed by Nvidia as a part of Nvidia GameWorks software suite. Initially, video games supporting PhysX were meant to be accelerated by PhysX PPU (expansion cards designed by ...
chip includes a scratchpad RAM in a manner similar to the Cell; its theory states that a cache hierarchy is of less use than software managed physics and collision calculations. These memories are also banked and a switch manages transfers between them. * Intel's Knights Landing processor has a 16 GB MCDRAM that can be configured as either a cache, scratchpad memory, or divided into some cache and some scratchpad memory. * Movidius Myriad 2, a
vision processing unit A vision processing unit (VPU) is (as of 2018) an emerging class of microprocessor; it is a specific type of AI accelerator, designed to accelerate machine vision tasks. Overview Vision processing units are distinct from video processing uni ...
, organized as a multicore architecture with a large multiported shared scratchpad. *
Graphcore Graphcore is a British semiconductor company that develops accelerators for AI and machine learning. It aims to make a massively parallel Intelligence Processing Unit (IPU) that holds the complete machine learning model inside the processor. Hi ...
has designed an
AI accelerator An AI accelerator is a class of specialized hardware accelerator or computer system designed to accelerate artificial intelligence and machine learning applications, including artificial neural networks and machine vision. Typical applications i ...
based on scratchpad memories


Alternatives


Cache control vs scratchpads

Some architectures such as PowerPC attempt to avoid the need for cacheline locking or scratchpads through the use of cache control instructions. Marking an area of memory with "Data Cache Block: Zero" (allocating a line but setting its contents to zero instead of loading from main memory) and discarding it after use ('Data Cache Block: Invalidate', signaling that main memory didn't receive any updated data) the cache is made to behave as a scratchpad. Generality is maintained in that these are hints and the underlying hardware will function correctly regardless of actual cache size.


Shared L2 vs Cell local stores

Regarding interprocessor communication in a multicore setup, there are similarities between the Cell's inter-localstore DMA and a shared L2 cache setup as in the Intel Core 2 Duo or the Xbox 360's custom powerPC: the L2 cache allows processors to share results without those results having to be committed to main memory. This can be an advantage where the
working set Working set is a concept in computer science which defines the amount of memory that a process requires in a given time interval. Definition Peter Denning (1968) defines "the working set of information W(t, \tau) of a process at time t to be the ...
for an algorithm encompasses the entirety of the L2 cache. However, when a program is written to take advantage of inter-localstore DMA, the Cell has the benefit of each-other-Local-Store serving the purpose of BOTH the private workspace for a single processor AND the point of sharing between processors; i.e., the other Local Stores are on a similar footing viewed from one processor as the shared L2 cache in a conventional chip. The tradeoff is that of memory wasted in buffering and programming complexity for synchronization, though this would be similar to precached pages in a conventional chip. Domains where using this capability is effective include: * Pipeline processing (where one achieves the same effect as increasing the L1 cache's size by splitting one job into smaller chunks) * Extending the working set, e.g., a sweet spot for a merge sort where the data fits within 8×256 KB * Shared code uploading, like loading a piece of code to one SPU, then copy it from there to the others to avoid hitting the main memory again It would be possible for a conventional processor to gain similar advantages with cache-control instructions, for example, allowing the prefetching to the L1 bypassing the L2, or an eviction hint that signaled a transfer from L1 to L2 but not committing to main memory; however, at present no systems offer this capability in a usable form and such instructions in effect should mirror explicit transfer of data among cache areas used by each core.


See also

*
CPU cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, whic ...
*
NUMA Nuclear mitotic apparatus protein 1 is a protein that in humans is encoded by the ''NUMA1'' gene. Interactions Nuclear mitotic apparatus protein 1 has been shown to interact with PIM1, Band 4.1, GPSM2 G-protein-signaling modulator 2, also call ...
* MPSoC


Notes


References


External links

* Rajeshwari Banakar
Scratchpad Memory : A Design Alternative for Cache. On-chip memory in Embedded Systems
// CODES'02. May 6–8, 2002 {{DEFAULTSORT:Scratchpad Ram
Internet culture Internet culture is a culture based on the many way people have used computer networks and their use for communication, entertainment, business, and recreation. Some features of Internet culture include online communities, gaming, and social medi ...