Cache Pollution
   HOME

TheInfoList



OR:

Cache pollution describes situations where an executing
computer program A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components. A computer program ...
loads data into
CPU cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which ...
unnecessarily, thus causing other useful data to be evicted from the cache into lower levels of the
memory hierarchy In computer architecture, the memory hierarchy separates computer storage into a hierarchy based on response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and controlli ...
, degrading performance. For example, in a
multi-core processor A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions (such a ...
, one core may replace the blocks fetched by other cores into shared cache, or prefetched blocks may replace demand-fetched blocks from the cache.


Example

Consider the following illustration: T = T + 1; for i in 0.. sizeof(CACHE) C = C + 1; T = T + C izeof(CACHE)-1 (The assumptions here are that the cache is composed of only one level, it is unlocked, the replacement policy is pseudo-LRU, all data is cacheable, the set associativity of the cache is N (where N > 1), and at most one
processor register A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only. ...
is available to contain program values). Right before the loop starts, T will be fetched from memory into cache, its value updated. However, as the loop executes, because the number of data elements the loop references requires the whole cache to be filled to its capacity, the cache block containing T has to be evicted. Thus, the next time the program requests T to be updated, the cache misses, and the cache controller has to request the
data bus In computer architecture, a bus (shortened form of the Latin '' omnibus'', and historically also called data highway or databus) is a communication system that transfers data between components inside a computer, or between computers. This ex ...
to bring the corresponding cache block from
main memory Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a computer ...
again. In this case the cache is said to be "polluted". Changing the pattern of data accesses by positioning the first update of T between the loop and the second update can eliminate the inefficiency: for i in 0.. sizeof(CACHE) C = C + 1; T = T + 1; T = T + C izeof(CACHE)-1


Solutions

Other than code-restructuring mentioned above, the solution to cache pollution is ensure that only high-reuse data are stored in cache. This can be achieved by using special cache control instructions,
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...
support or hardware support. Examples of specialized hardware instructions include "lvxl" provided by
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple Inc., App ...
AltiVec. This instruction loads a 128 bit wide value into a register and marks the corresponding cache block as "least recently used" i.e. as the prime candidate for eviction upon a need to
evict Eviction is the removal of a tenant from rental property by the landlord. In some jurisdictions it may also involve the removal of persons from premises that were foreclosed by a mortgagee (often, the prior owners who defaulted on a mortgage ...
a block from its cache set. To appropriately use that instruction in the context of the above example, the data elements referenced by the loop would have to be loaded using this instruction. When implemented in this manner, cache pollution would not take place, since the execution of such loop would not cause premature eviction of T from cache. This would be avoided because, as the loop would progress, the addresses of the elements in C would map to the same cache way, leaving the actually older (but not marked as "least recently used") data intact on the other way(s). Only the oldest data (not pertinent for the example given) would be evicted from cache, which T is not a member of, since its update occurs right before the loop's start. Similarly, using operating system (OS) support, the pages in
main memory Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a computer ...
that correspond to the C data array can be marked as "caching inhibited" or, in other words, non-cacheable. Similarly, at hardware level, cache bypassing schemes can be used which identify low-reuse data based on program access pattern and bypass them from cache. Also, shared cache can be partitioned to avoid destructive interference between running applications. The tradeoff in these solutions is that OS-based schemes may have large latency which may nullify the gain achievable by cache pollution avoidance (unless the memory region has been non-cacheable to begin with), whereas hardware-based techniques may not have a global view of the program control flow and memory access pattern.


Increasing importance

Cache pollution control has been increasing in importance because the penalties caused by the so-called " memory wall" keep on growing. Chip manufacturers continue devising new tricks to overcome the ever increasing relative memory-to-CPU latency. They do that by increasing cache sizes and by providing useful ways for software engineers to control the way data arrives and stays at the CPU. Cache pollution control is one of the numerous devices available to the (mainly embedded) programmer. However, other methods, most of which are proprietary and highly hardware and application specific, are used as well.


References

{{DEFAULTSORT:Cache Pollution Computer architecture Cache (computing) Computer memory