Copy On Write
   HOME

TheInfoList



OR:

Copy-on-write (COW), sometimes referred to as implicit sharing or shadowing, is a resource-management technique used in
computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming involves tasks such as ana ...
to efficiently implement a "duplicate" or "copy" operation on modifiable resources. If a resource is duplicated but not modified, it is not necessary to create a new resource; the resource can be shared between the copy and the original. Modifications must still create a copy, hence the technique: the copy operation is deferred until the first write. By sharing resources in this way, it is possible to significantly reduce the resource consumption of unmodified copies, while adding a small overhead to resource-modifying operations.


In virtual memory management

Copy-on-write finds its main use in sharing the
virtual memory In computing, virtual memory, or virtual storage is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very l ...
of
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...
processes, in the implementation of the fork system call. Typically, the process does not modify any memory and immediately executes a new process, replacing the address space entirely. Thus, it would be wasteful to copy all of the process's memory during a fork, and instead the copy-on-write technique is used. Copy-on-write can be implemented efficiently using the
page table A page table is the data structure used by a virtual memory system in a computer operating system to store the mapping between virtual addresses and physical addresses. Virtual addresses are used by the program executed by the accessing process, ...
by marking certain pages of
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
as read-only and keeping a count of the number of references to the page. When data is written to these pages, the operating-system
kernel Kernel may refer to: Computing * Kernel (operating system), the central component of most operating systems * Kernel (image processing), a matrix used for image convolution * Compute kernel, in GPGPU programming * Kernel method, in machine learnin ...
intercepts the write attempt and allocates a new physical page, initialized with the copy-on-write data, although the allocation can be skipped if there is only one reference. The kernel then updates the page table with the new (writable) page, decrements the number of references, and performs the write. The new allocation ensures that a change in the memory of one process is not visible in another's. The copy-on-write technique can be extended to support efficient
memory allocation Memory management is a form of Resource management (computing), resource management applied to computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their re ...
by having a page of
physical memory Computer data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers. The central processing unit (CPU) of a compute ...
filled with zeros. When the memory is allocated, all the pages returned refer to the page of zeros and are all marked copy-on-write. This way, physical memory is not allocated for the process until data is written, allowing processes to reserve more virtual memory than physical memory and use memory sparsely, at the risk of running out of virtual address space. The combined algorithm is similar to
demand paging In computer operating systems, demand paging (as opposed to anticipatory paging) is a method of virtual memory management. In a system that uses demand paging, the operating system copies a disk page into physical memory only if an attempt is mad ...
. Copy-on-write pages are also used in the
Linux kernel The Linux kernel is a free and open-source, monolithic, modular, multitasking, Unix-like operating system kernel. It was originally authored in 1991 by Linus Torvalds for his i386-based PC, and it was soon adopted as the kernel for the GNU ope ...
's same-page merging feature.


In software

COW is also used in
library A library is a collection of materials, books or media that are accessible for use and not just for display purposes. A library provides physical (hard copies) or digital access (soft copies) materials, and may be a physical location or a vir ...
, application and
system A system is a group of Interaction, interacting or interrelated elements that act according to a set of rules to form a unified whole. A system, surrounded and influenced by its environment (systems), environment, is described by its boundaries, ...
code.


Examples

The string class provided by the
C++ standard library The C standard library or libc is the standard library for the C programming language, as specified in the ISO C standard. ISO/IEC (2018). '' ISO/IEC 9899:2018(E): Programming Languages - C §7'' Starting from the original ANSI C standard, it was ...
was specifically designed to allow copy-on-write implementations in the initial C++98 standard, but not in the newer C++11 standard: std::string x("Hello"); std::string y = x; // x and y use the same buffer. y += ", World!"; // Now y uses a different buffer; x still uses the same old buffer. In the
PHP PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group. ...
programming language, all types except references are implemented as copy-on-write. For example, strings and arrays are passed by reference, but when modified, they are duplicated if they have non-zero reference counts. This allows them to act as value types without the performance problems of copying on assignment or making them immutable. In the Qt framework, many types are copy-on-write ("implicitly shared" in Qt's terms). Qt uses atomic
compare-and-swap In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of tha ...
operations to increment or decrement the internal reference counter. Since the copies are cheap, Qt types can often be safely used by multiple threads without the need of locking mechanisms such as mutexes. The benefits of COW are thus valid in both single- and multithreaded systems.


In computer storage

COW may also be used as the underlying mechanism for
snapshots Snapshot, snapshots or snap shot may refer to: * Snapshot (photography), a photograph taken without preparation Computing * Snapshot (computer storage), the state of a system at a particular point in time * Snapshot (file format) or SNP, a file ...
, such as those provided by logical volume management, file systems such as
Btrfs Btrfs (pronounced as "better F S", "butter F S", "b-tree F S", or simply by spelling it out) is a computer storage format that combines a file system based on the copy-on-write (COW) principle with a logical volume manager (not to be confused ...
and
ZFS ZFS (previously: Zettabyte File System) is a file system with volume management capabilities. It began as part of the Sun Microsystems Solaris operating system in 2001. Large parts of Solaris – including ZFS – were published under an ope ...
, and database servers such as
Microsoft SQL Server Microsoft SQL Server is a relational database management system developed by Microsoft. As a database server, it is a software product with the primary function of storing and retrieving data as requested by other software applications—which ma ...
. Typically, the snapshots store only the modified data, and are stored close to the original, so they are only a weak form of incremental backup and cannot substitute for a full backup.


See also

*
Allocate-on-flush Allocate-on-flush (also called delayed allocation) is a file system feature implemented in HFS+, XFS, Reiser4, ZFS, Btrfs, and ext4. The feature also closely resembles an older technique that Berkeley's Unix File System, UFS called "block reallocati ...
*
Demand paging In computer operating systems, demand paging (as opposed to anticipatory paging) is a method of virtual memory management. In a system that uses demand paging, the operating system copies a disk page into physical memory only if an attempt is mad ...
* Dirty COW – a computer security vulnerability for the Linux kernel *
Flyweight pattern In computer programming, the flyweight software design pattern refers to an object that minimizes memory usage by sharing some of its data with other similar objects. The flyweight pattern is one of twenty-three well-known '' GoF design patterns' ...
*
Memory management Memory management is a form of resource management applied to computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and free it for reuse when ...
*
Persistent data structure In computing, a persistent data structure or not ephemeral data structure is a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, as their operations do not (v ...
* ReFS *
Virtual memory In computing, virtual memory, or virtual storage is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very l ...
*
Wear leveling Wear leveling (also written as wear levelling) is a technique Wear leveling techniques for flash memory systems. for prolonging the service life of some kinds of erasable computer storage media, such as flash memory, which is used in solid-state d ...


References

{{File systems Articles with example C++ code Computer data storage Software optimization Virtual memory Virtualization