Dm-cache

dm-cache is a component (more specifically, a target) of the Linux kernel's device mapper, which is a framework for mapping block devices onto higher-level virtual block devices. It allows one or more fast storage devices, such as flash-based solid-state drives (SSDs), to act as a cache for one or more slower storage devices such as hard disk drives (HDDs); this effectively creates hybrid volumes and provides secondary storage performance improvements.

The design of dm-cache requires three physical storage devices for the creation of a single hybrid volume; dm-cache uses those storage devices to separately store actual data, cache data, and required metadata. Configurable operating modes and cache policies, with the latter in the form of separate modules, determine the way data caching is actually performed.

dm-cache is licensed under the terms of the GNU General Public License (GPL), with Joe Thornber, Heinz Mauelshagen and Mike Snitzer as its primary developers.


Overview

dm-cache uses solid-state drives (SSDs) as an additional level of indirection while accessing hard disk drives (HDDs), improving the overall performance by using fast flash-based SSDs as caches for the slower mechanical HDDs based on rotational magnetic media. As a result, the costly speed of SSDs is combined with the storage capacity offered by slower but less expensive HDDs. Moreover, in the case of storage area networks (SANs) used in cloud environments as shared storage systems for virtual machines, dm-cache can also improve overall performance and reduce the load of SANs by providing data caching using client-side local storage.

dm-cache is implemented as a component of the Linux kernel's device mapper, which is a volume management framework that allows various mappings to be created between physical and virtual block devices. The way a mapping between devices is created determines how the virtual blocks are translated into underlying physical blocks, with the specific translation types referred to as ''targets''. Acting as a mapping target, dm-cache makes it possible for SSD-based caching to be part of the created virtual block device, while the configurable operating modes and cache policies determine how dm-cache works internally. The operating mode selects the way in which the data is kept in sync between an HDD and an SSD, while the cache policy, selectable from separate modules that implement each of the policies, provides the algorithm for determining which blocks are promoted (moved from an HDD to an SSD), demoted (moved from an SSD to an HDD), cleaned, and so on.

When configured to use the ''multiqueue'' (mq) or ''stochastic multiqueue'' (smq) cache policy, with the latter being the default, dm-cache uses SSDs to store the data associated with random reads and writes, capitalizing on the near-zero seek times of SSDs and avoiding such I/O operations as typical HDD performance bottlenecks. The data associated with sequential reads and writes is not cached on SSDs, avoiding undesirable cache invalidation during such operations; performance-wise, this is beneficial because sequential I/O operations are suitable for HDDs due to their mechanical nature. Not caching sequential I/O also helps in extending the lifetime of SSDs used as caches.
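
To make the notion of a mapping target more concrete, the following Python sketch assembles the one-line device-mapper table that describes a cache target. The field order (metadata device, cache device, origin device, block size in 512-byte sectors, feature arguments, policy, policy arguments) follows the kernel's documentation for the cache target; the helper names and the device paths are hypothetical and chosen only for illustration, and the size limits checked are the ones this article gives for caching extents.

```python
SECTOR_BYTES = 512  # device-mapper tables express sizes in 512-byte sectors
KIB = 1024


def extent_sectors(extent_kib):
    """Validate a caching extent size given in KB and convert it to sectors."""
    if not 32 <= extent_kib <= 1024 * 1024:
        raise ValueError("extent must be between 32 KB and 1 GB")
    if extent_kib % 32 != 0:
        raise ValueError("extent must be a multiple of 32 KB")
    return extent_kib * KIB // SECTOR_BYTES


def cache_table(origin_sectors, metadata_dev, cache_dev, origin_dev,
                extent_kib=256, mode="writeback", policy="default"):
    """Build a one-line device-mapper table for the cache target."""
    if mode not in ("writeback", "writethrough", "passthrough"):
        raise ValueError("unknown operating mode: " + mode)
    fields = [
        0, origin_sectors, "cache",           # start, length, target name
        metadata_dev, cache_dev, origin_dev,  # the three required devices
        extent_sectors(extent_kib),           # block size in sectors
        1, mode,                              # one feature argument: the mode
        policy, 0,                            # policy name, no policy arguments
    ]
    return " ".join(str(field) for field in fields)


if __name__ == "__main__":
    # Hypothetical device paths; 41943040 sectors correspond to a 20 GiB origin.
    print(cache_table(41943040, "/dev/mapper/metadata",
                      "/dev/mapper/ssd", "/dev/mapper/origin"))
```

A line built this way would typically be handed to `dmsetup create <name> --table '<line>'`; the sketch only constructs the string and does not touch any device.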


History

Another dm-cache project with similar goals was announced by Eric Van Hensbergen and Ming Zhao in 2006, as the result of internship work at IBM. Later, Joe Thornber, Heinz Mauelshagen and Mike Snitzer provided their own implementation of the concept, which resulted in the inclusion of dm-cache into the Linux kernel. dm-cache was merged into the Linux kernel mainline in kernel version 3.9, which was released on April 28, 2013.


Design

In dm-cache, creating a mapped virtual block device that acts as a hybrid volume requires three physical storage devices:

* ''Origin device'' provides slow primary storage (usually an HDD)
* ''Cache device'' provides a fast cache (usually an SSD)
* ''Metadata device'' records the placement of blocks and their dirty flags, as well as other internal data required by a cache policy, including per-block hit counts; a metadata device cannot be shared between multiple cache devices, and it is recommended to be mirrored

Internally, dm-cache references each of the origin devices through a number of fixed-size blocks; the size of these blocks, equal to the size of a caching extent, is configurable only during the creation of a hybrid volume. The size of a caching extent must range between 32 KB and 1 GB, and it must be a multiple of 32 KB; typically, the size of a caching extent is between 256 and 1024 KB. The choice of caching extents bigger than disk sectors acts as a compromise between the size of metadata and the possibility of wasting cache space. Caching extents that are too small increase the size of metadata, both on the metadata device and in kernel memory, while caching extents that are too large increase the amount of wasted cache space, because whole extents are cached even when only some of their parts have high hit rates.

Operating modes supported by dm-cache are ''write-back'', which is the default, ''write-through'', and ''pass-through''. In the write-back operating mode, writes to cached blocks go only to the cache device, while the blocks on the origin device are only marked as dirty in the metadata. In the write-through operating mode, write requests are not returned as completed until the data reaches both the origin and cache devices, with no clean blocks becoming marked as dirty. In the pass-through operating mode, all reads are performed directly from the origin device, avoiding the cache, while all writes go directly to the origin device; any cache write hits also cause invalidation of the cached blocks. The pass-through mode allows a hybrid volume to be activated when the state of a cache device is not known to be consistent with the origin device.

The rate of data migration that dm-cache performs in both directions (i.e., data promotions and demotions) can be throttled down to a configured speed so that regular I/O to the origin and cache devices is preserved.

Decommissioning a hybrid volume or shrinking a cache device requires use of the ''cleaner'' policy, which effectively flushes all blocks marked in metadata as dirty from the cache device to the origin device.
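
The differences between the three operating modes, and the effect of the ''cleaner'' policy, can be illustrated with a small in-memory model. This is a toy sketch under stated assumptions, not how the kernel implements dm-cache: the class `ToyCache` and its methods are hypothetical, both devices are plain dictionaries keyed by block number, and promotion is triggered by hand rather than by a cache policy.

```python
class ToyCache:
    """In-memory stand-in for a hybrid volume (illustrative only)."""

    def __init__(self, mode="writeback"):
        self.mode = mode
        self.origin = {}    # contents of the slow origin device
        self.cache = {}     # contents of the fast cache device
        self.dirty = set()  # cached blocks newer than their origin copy

    def promote(self, block):
        """Copy a block from the origin into the cache (a policy decision)."""
        self.cache[block] = self.origin.get(block)

    def write(self, block, data):
        cached = block in self.cache
        if self.mode == "passthrough":
            self.origin[block] = data
            if cached:
                # A write hit invalidates the cached copy.
                del self.cache[block]
                self.dirty.discard(block)
        elif self.mode == "writethrough" or not cached:
            self.origin[block] = data
            if cached:
                # Both copies are updated, so the block stays clean.
                self.cache[block] = data
        else:
            # Write-back to a cached block: cache only, marked dirty.
            self.cache[block] = data
            self.dirty.add(block)

    def read(self, block):
        if self.mode != "passthrough" and block in self.cache:
            return self.cache[block]
        return self.origin.get(block)

    def clean(self):
        """Mimic the cleaner policy: flush every dirty block to the origin."""
        for block in sorted(self.dirty):
            self.origin[block] = self.cache[block]
        self.dirty.clear()


if __name__ == "__main__":
    vol = ToyCache("writeback")
    vol.origin[7] = "old"
    vol.promote(7)
    vol.write(7, "new")
    print(vol.origin[7], vol.cache[7], sorted(vol.dirty))  # old new [7]
    vol.clean()
    print(vol.origin[7], sorted(vol.dirty))                # new []
```

In the write-back run above the origin keeps stale data until `clean()` is called, which is why a hybrid volume in this mode has to be flushed before it is decommissioned.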


Cache policies

As of version 4.2 of the Linux kernel, the following three cache policies are distributed with the Linux kernel mainline, out of which dm-cache by default uses the ''stochastic multiqueue'' policy:

* multiqueue (mq): The ''multiqueue'' (mq) policy has three sets of 16 queues, using the first set for entries waiting for the cache and the remaining two sets for entries already in the cache, with the latter separated so that clean entries and dirty entries each belong to their own set. The age of cache entries in the queues is based on their associated logical time. The selection of entries going into the cache (i.e., becoming promoted) is based on variable thresholds, and queue selection is based on the hit count of an entry. This policy aims to take different cache miss costs into account, and to make automatic adjustments to different load patterns. This policy internally tracks sequential I/O operations so they can be routed around the cache, with different configurable thresholds for the differentiation between random I/O and sequential I/O operations. As a result, large contiguous I/O operations are left to be performed by the origin device because such data access patterns are suitable for HDDs, and because they avoid undesirable cache invalidation.
* stochastic multiqueue (smq): The ''stochastic multiqueue'' (smq) policy performs in a similar way to the ''multiqueue'' policy, but requires fewer resources to operate; in particular, it uses substantially smaller amounts of main memory to track cached blocks. It also replaces the hit counting from the ''multiqueue'' policy with a "hotspot" queue, and decides on data promotion and demotion on a least-recently used (LRU) basis. As a result, this policy provides better performance compared to the ''multiqueue'' policy, adjusts better automatically to different load patterns, and eliminates the configuration of various thresholds.
* cleaner: The ''cleaner'' policy writes back to the origin device all blocks that are marked as dirty in the metadata. After the completion of this operation, a hybrid volume can be decommissioned or the size of a cache device can be reduced.
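
The sequential I/O tracking described for the ''multiqueue'' policy can be sketched roughly as follows. This is a simplified illustration, not the kernel's code: the class `IOTracker`, its parameter names and the threshold values are assumptions made for the example. A stream is treated as sequential once enough contiguous requests have been seen, and it is treated as random again only after enough non-contiguous requests.

```python
class IOTracker:
    """Classifies a request stream as sequential or random (illustrative)."""

    def __init__(self, sequential_threshold=4, random_threshold=2):
        # Hypothetical tunables; the values are not kernel defaults.
        self.sequential_threshold = sequential_threshold
        self.random_threshold = random_threshold
        self.next_sector = None   # where a contiguous request would start
        self.sequential_run = 0   # contiguous requests seen in a row
        self.random_run = 0       # non-contiguous requests seen in a row
        self.sequential = False   # current classification of the stream

    def should_bypass(self, sector, length):
        """Record one request; return True if it should skip the cache."""
        if sector == self.next_sector:
            self.sequential_run += 1
            self.random_run = 0
            if self.sequential_run >= self.sequential_threshold:
                self.sequential = True
        else:
            self.random_run += 1
            self.sequential_run = 0
            if self.random_run >= self.random_threshold:
                self.sequential = False
        self.next_sector = sector + length
        return self.sequential


if __name__ == "__main__":
    tracker = IOTracker(sequential_threshold=3, random_threshold=2)
    requests = [(0, 8), (8, 8), (16, 8), (24, 8), (32, 8), (1000, 8), (5000, 8)]
    print([tracker.should_bypass(s, n) for s, n in requests])
    # [False, False, False, True, True, True, False]
```

Using two separate thresholds gives the classification some hysteresis: in the run above, a single stray request at sector 1000 is still routed around the cache, and the stream is considered random again only after the second non-contiguous request.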


Use with LVM

The Logical Volume Manager (LVM) includes ''lvmcache'', which provides a wrapper for dm-cache integrated with LVM.


See also

* bcache: a Linux kernel block layer cache, developed by Kent Overstreet
* Flashcache: a disk cache component for the Linux kernel, initially developed by Facebook
* Hybrid drive: a storage device that combines flash-based and spinning magnetic media storage technologies
* ReadyBoost: a disk caching software component of Windows Vista and later Microsoft operating systems
* Smart Response Technology (SRT): a proprietary disk storage caching mechanism, developed by Intel for its chipsets
* ZFS: a cross-OS storage management system that has similar integrated caching device support (L2ARC)


External links


* Linux Block Caching Choices in Stable Upstream Kernel (PDF), Dell, December 2013
* Performance Comparison among EnhanceIO, bcache and dm-cache, LKML, June 11, 2013
* EnhanceIO, Bcache & DM-Cache Benchmarked, Phoronix, June 11, 2013, by Michael Larabel
* SSD Caching Using dm-cache Tutorial, July 2014, by Kyle Manna
* [PATCH 8/8] [dm-cache] cache target, December 14, 2012 (guidelines for metadata device sizing)