NOVA (filesystem)

The NOVA ("non-volatile memory accelerated") file system is an open-source, log-structured file system for Linux that targets byte-addressable persistent memory, for example non-volatile dual in-line memory modules (NVDIMMs) and 3D XPoint DIMMs. NOVA is designed specifically for byte-addressable persistent memories and aims to provide high performance, atomic file and metadata operations, and fault tolerance. To meet these goals, NOVA combines several techniques found in other file systems. It uses a log structure, copy-on-write (COW), journaling, and log-structured metadata updates to provide strong atomicity guarantees, and it uses a combination of replication, metadata checksums, and RAID 4 parity to protect data and metadata from media errors and software bugs. It also supports checkpoints to facilitate backups.


Filesystem

NOVA was developed at the University of California, San Diego, in the Non-Volatile Systems Laboratory of the Computer Science and Engineering Department. Patches were initially made available for version 4.12 of the Linux kernel. It is limited to x86-64 Linux and is not ready for merging with the upstream kernel.


Log structure

NOVA is primarily a log-structured file system, but it differs from other log-structured file systems in several respects. First, rather than using a single log for the entire file system, each inode has its own dedicated log that records the updates to that inode. This allows for increased concurrency in file operations, since different threads can operate on different inodes in parallel. Second, the logs do not contain file data, only metadata updates, resulting in smaller logs. Third, the logs are not stored in physically contiguous memory. Instead, NOVA stores each log as a linked list of 4 KB memory pages. NOVA uses the logs to provide atomicity for operations that affect a single file (e.g., writing to a file or modifying its metadata). To do this, NOVA writes a log entry to empty space past the end of the log and then atomically updates the inode's pointer to the log tail.
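
The two-step append described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not NOVA's kernel code: the names `LogPage`, `Inode`, `write_entry`, and `commit` are hypothetical, `ENTRIES_PER_PAGE` stands in for however many entries fit in a 4 KB page, and a Python attribute assignment stands in for the atomic pointer store.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ENTRIES_PER_PAGE = 4  # hypothetical: stands in for the entries that fit in one 4 KB page


@dataclass
class LogPage:
    """One log page. Pages are chained by reference, not physically contiguous."""
    entries: List[Optional[str]] = field(
        default_factory=lambda: [None] * ENTRIES_PER_PAGE
    )
    next: Optional["LogPage"] = None


class Inode:
    """Toy inode that owns its own log and a committed tail pointer."""

    def __init__(self) -> None:
        self.head = LogPage()
        # The tail names the first slot that is NOT yet part of the log.
        self.tail: Tuple[LogPage, int] = (self.head, 0)

    def write_entry(self, entry: str) -> Tuple[LogPage, int]:
        """Step 1: write past the tail. Nothing is visible to readers yet."""
        page, slot = self.tail
        if slot == ENTRIES_PER_PAGE:  # current page is full: link a fresh page
            page.next = LogPage()
            page, slot = page.next, 0
        page.entries[slot] = entry
        return (page, slot + 1)

    def commit(self, new_tail: Tuple[LogPage, int]) -> None:
        """Step 2: a single pointer update makes the entry part of the log."""
        self.tail = new_tail

    def append(self, entry: str) -> None:
        self.commit(self.write_entry(entry))

    def committed_entries(self) -> List[str]:
        """Walk the page list from the head and stop at the committed tail."""
        out: List[str] = []
        tail_page, tail_slot = self.tail
        page: Optional[LogPage] = self.head
        while page is not None:
            limit = tail_slot if page is tail_page else ENTRIES_PER_PAGE
            out.extend(e for e in page.entries[:limit] if e is not None)
            if page is tail_page:
                break
            page = page.next
        return out
```

In this model, an entry written with `write_entry` but never passed to `commit` (for example, because of a crash between the two steps) is simply ignored by `committed_entries`, which is the property the tail-pointer update is meant to provide.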


Copy-on-write

NOVA uses copy-on-write (COW) to update file data. When a program writes data to a file, NOVA allocates unused memory pages to hold the data and writes the data into them. Then it appends a log entry to the inode's log that points to the new pages and describes their logical location in the file. Since appending the log entry is atomic, the write is also atomic.
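
A minimal sketch of this write path, assuming whole-page writes and a dictionary standing in for persistent memory, might look as follows. The class `CowFile` and its methods are hypothetical names chosen for illustration; the `commit` flag exists only so that an interrupted write can be simulated.

```python
from typing import Dict, List, Tuple

PAGE_SIZE = 4096  # 4 KB pages, as in the text


class CowFile:
    """Toy file whose data pages are never overwritten in place."""

    def __init__(self) -> None:
        self.memory: Dict[int, bytes] = {}   # simulated persistent memory: page id -> bytes
        self.next_free = 0                   # trivial allocator (assumption: no reuse)
        self.log: List[Tuple[int, int]] = [] # entries: (page index in file, memory page id)
        self.tail = 0                        # number of committed log entries

    def write_page(self, file_page: int, data: bytes, commit: bool = True) -> int:
        if len(data) != PAGE_SIZE:
            raise ValueError("this sketch only handles whole-page writes")
        del self.log[self.tail:]             # space past the tail is free to reuse
        page_id = self.next_free             # 1. allocate an unused page
        self.next_free += 1
        self.memory[page_id] = data          # 2. write the new data into it
        self.log.append((file_page, page_id))  # 3. write the log entry past the tail
        if commit:
            self.tail = len(self.log)        # 4. one tail update publishes the write
        return page_id

    def read_page(self, file_page: int) -> bytes:
        # The newest committed entry for a page wins.
        for logged_page, page_id in reversed(self.log[: self.tail]):
            if logged_page == file_page:
                return self.memory[page_id]
        return bytes(PAGE_SIZE)              # unwritten pages read as zeros
```

Because the old page is left untouched until the tail moves, a reader sees either the complete old contents or the complete new contents, never a mixture.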


Journaling

Some file operations (e.g., moving a file from one directory to another) require modifying multiple inodes. To make these operations atomic, NOVA uses a simple journaling mechanism. First, it writes the new log entries to the ends of the logs of the inodes that the operation will affect. Then it uses the journal to record the necessary updates to the inodes' log tail pointers. Finally, it marks the journal as committed and applies the updates to the tail pointers.
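
The ordering of these steps can be sketched with a toy model of a move between two directory inodes. All names here (`Inode`, `Journal`, `move_file`, `crash_at`) are hypothetical, and the recovery rule (replay a committed journal, discard an uncommitted one) is an assumption inferred from the order of steps described above rather than a description of NOVA's actual recovery code.

```python
from typing import List, Optional, Tuple


class Inode:
    """Toy inode: a log plus a committed tail (count of valid entries)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.log: List[str] = []
        self.tail = 0

    def committed(self) -> List[str]:
        return self.log[: self.tail]


class Journal:
    """Records pending tail-pointer updates for one multi-inode operation."""

    def __init__(self) -> None:
        self.records: List[Tuple[Inode, int]] = []
        self.committed = False

    def recover(self) -> None:
        # Assumed rule: a committed journal is replayed, an uncommitted one is dropped.
        if self.committed:
            for inode, new_tail in self.records:
                inode.tail = new_tail
        self.records = []
        self.committed = False


def move_file(name: str, src: Inode, dst: Inode, journal: Journal,
              crash_at: Optional[str] = None) -> None:
    # 1. Write the new entries past each log's tail (not yet visible).
    updates: List[Tuple[Inode, int]] = []
    for inode, entry in ((src, f"remove {name}"), (dst, f"add {name}")):
        del inode.log[inode.tail:]          # reuse space past the committed tail
        inode.log.append(entry)
        updates.append((inode, len(inode.log)))
    # 2. Record the intended tail updates in the journal.
    journal.records = updates
    if crash_at == "before_commit":
        return                              # simulated crash
    # 3. Mark the journal as committed.
    journal.committed = True
    if crash_at == "after_commit":
        return                              # simulated crash
    # 4. Apply the tail updates and clear the journal.
    journal.recover()
```

With this model, a simulated crash before the commit leaves both directories unchanged after `recover()`, while a crash after the commit results in both tail updates being applied, so the file never appears in neither directory or in both.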


Metadata protection

NOVA uses replication and checksums to protect against metadata corruption caused by media errors and software bugs. Every metadata structure (e.g., inodes, superblocks, and log entries) contains a CRC32 checksum that allows NOVA to detect whether the structure's contents have changed without its knowledge. NOVA also stores two copies of each data structure, the "primary" and the "replica", and places them far from one another in memory. Whenever NOVA accesses a metadata structure, it first recomputes the checksum on both the primary and the replica. If one of the checks results in a mismatch, NOVA repairs the damaged copy using the other one. If neither checksum matches, the structure is lost and NOVA returns an error.
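
The check-and-repair logic can be sketched as follows. This is an illustrative model, not NOVA's implementation: `Copy` and `ProtectedMetadata` are hypothetical names, and Python's `zlib.crc32` stands in for the CRC32 mentioned in the text (the exact CRC variant NOVA uses is not stated here).

```python
import zlib
from dataclasses import dataclass


@dataclass
class Copy:
    """One stored copy of a metadata structure with its embedded checksum."""
    payload: bytes
    checksum: int

    def is_valid(self) -> bool:
        return zlib.crc32(self.payload) == self.checksum


class ProtectedMetadata:
    """Keeps a primary and a replica, and verifies both on every access."""

    def __init__(self, payload: bytes) -> None:
        crc = zlib.crc32(payload)
        self.primary = Copy(payload, crc)
        self.replica = Copy(payload, crc)

    def read(self) -> bytes:
        primary_ok = self.primary.is_valid()
        replica_ok = self.replica.is_valid()
        if primary_ok and not replica_ok:
            # Repair the replica from the intact primary.
            self.replica = Copy(self.primary.payload, self.primary.checksum)
        elif replica_ok and not primary_ok:
            # Repair the primary from the intact replica.
            self.primary = Copy(self.replica.payload, self.replica.checksum)
        elif not primary_ok and not replica_ok:
            raise OSError("metadata lost: both copies fail their checksum")
        return self.primary.payload
```

For example, overwriting `primary.payload` with different bytes and then calling `read()` returns the original contents and leaves both copies valid again; corrupting both copies makes `read()` raise `OSError`.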


Data protection

NOVA uses RAID 4-style parity to protect file data. It divides each 4 KB page into eight 512-byte strips and stores a parity strip in a dedicated region of persistent memory. It also computes (and stores a replica of) a CRC32 checksum for each of the eight data strips and for the parity strip. When NOVA reads a page, it verifies the checksum of each strip. If one of the strips is corrupt, NOVA tries to recover it using the parity strip. If no other strip has been corrupted, recovery succeeds. Otherwise, recovery fails, the contents of the page are lost, and NOVA returns an error.
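
The parity arithmetic is plain XOR: the parity strip is the XOR of the eight data strips, so any single missing strip equals the XOR of the other seven and the parity strip. The sketch below illustrates this under simplifying assumptions: `ProtectedPage` and `xor_strips` are hypothetical names, `zlib.crc32` stands in for the CRC32 in the text, and the replica of the checksums is omitted for brevity.

```python
import zlib
from typing import List

PAGE_SIZE = 4096   # 4 KB page
STRIP_SIZE = 512   # eight data strips per page


def xor_strips(strips: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized strips."""
    out = bytearray(STRIP_SIZE)
    for strip in strips:
        for i, byte in enumerate(strip):
            out[i] ^= byte
    return bytes(out)


class ProtectedPage:
    """A page split into strips, with one parity strip and per-strip checksums."""

    def __init__(self, data: bytes) -> None:
        if len(data) != PAGE_SIZE:
            raise ValueError("expected exactly one 4 KB page")
        self.strips = [data[i:i + STRIP_SIZE] for i in range(0, PAGE_SIZE, STRIP_SIZE)]
        self.parity = xor_strips(self.strips)
        self.checksums = [zlib.crc32(s) for s in self.strips]
        self.parity_checksum = zlib.crc32(self.parity)

    def read(self) -> bytes:
        bad = [i for i, s in enumerate(self.strips)
               if zlib.crc32(s) != self.checksums[i]]
        if bad:
            parity_ok = zlib.crc32(self.parity) == self.parity_checksum
            if len(bad) > 1 or not parity_ok:
                raise OSError("page lost: more than one strip is corrupt")
            i = bad[0]
            survivors = self.strips[:i] + self.strips[i + 1:]
            # XOR of the seven good strips and the parity yields the missing strip.
            self.strips[i] = xor_strips(survivors + [self.parity])
        return b"".join(self.strips)
```

Corrupting a single strip and calling `read()` returns the original page and repairs the strip in place, whereas corrupting two strips (or one strip plus the parity) raises `OSError`, matching the failure case described above.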


External links


NOVA: A Log-structured File System for Hybrid Volatile/Non-volatile Main Memories

Hardening the NOVA File System UCSD-CSE Techreport CS2017-1018

NOVA: The Fastest File System for NVDIMMs