Quantcast File System

Quantcast File System (QFS) is an open-source distributed file system software package for large-scale MapReduce or other batch-processing workloads. It was designed as an alternative to the Apache Hadoop Distributed File System (HDFS), intended to deliver better performance and cost-efficiency for large-scale processing clusters.


Design

QFS is software that runs on a cluster of hundreds or thousands of commodity Linux servers and allows other software layers to interact with them as if they were one giant hard drive. It has three components:
* A chunk server runs on each machine that will host data, manages I/O to its hard drives, and monitors its activity and capacity.
* A central process called the metaserver keeps the directory structure and maps of files to physical storage. It coordinates activities of all the chunk servers and monitors the overall health of the file system. For high performance it holds all its data in memory, writing checkpoints and transaction logs to disk for recovery.
* A client component is the interface point that presents a file system application programming interface (API) to other layers of the software. It makes requests of the metaserver to identify which chunk servers hold (or will hold) its data, then interacts with the chunk servers directly to read and write.

In a cluster of hundreds or thousands of machines, the odds are low that all will be running and reachable at any given moment, so fault tolerance is the central design challenge. QFS meets it with Reed–Solomon error correction. The form of Reed–Solomon encoding used in QFS stores each file as nine stripes and can reconstruct the file from any six of them. By default, when QFS writes a file, it stripes it across nine physically different machines: six holding the data and three holding parity information. Any three of those machines can become unavailable; as long as six stripes remain readable, QFS can reconstruct the original data. The result is fault tolerance at a cost of a 50% expansion of data.

QFS is written in C++, operates within a fixed memory footprint, and uses direct input and output (I/O).
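
The six-data, three-parity arithmetic described above can be checked with a short Python sketch. It does not implement Reed–Solomon coding and does not use any QFS API; the function names are hypothetical and chosen only for illustration. It models just the counting rule stated in the text: a file stays recoverable while at least six of its nine stripes remain readable.

```python
from itertools import combinations

DATA_STRIPES = 6     # stripes holding file data (figure taken from the text)
PARITY_STRIPES = 3   # stripes holding parity information (from the text)
TOTAL_STRIPES = DATA_STRIPES + PARITY_STRIPES


def storage_expansion(data, parity):
    """Extra storage needed, as a fraction of the original data size."""
    return parity / data


def is_recoverable(lost, data=DATA_STRIPES, total=TOTAL_STRIPES):
    """True while at least `data` of the `total` stripes are still readable."""
    return total - len(lost) >= data


def survivable_failure_sets(num_lost):
    """Count the distinct sets of `num_lost` unavailable machines that
    still leave the file recoverable under the counting rule above."""
    return sum(
        1
        for lost in combinations(range(TOTAL_STRIPES), num_lost)
        if is_recoverable(set(lost))
    )


if __name__ == "__main__":
    expansion = storage_expansion(DATA_STRIPES, PARITY_STRIPES)
    print(f"storage expansion: {expansion:.0%}")
    print("survivable 3-machine failures:", survivable_failure_sets(3))
    print("survivable 4-machine failures:", survivable_failure_sets(4))
```

Under this model, every one of the 84 possible three-machine failure combinations is survivable and no four-machine combination is, which matches the claim that any three of the nine machines can become unavailable.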


History

QFS evolved from the Kosmos File System (KFS), an open source project started by Kosmix in 2005. Quantcast adopted KFS in 2007, built its own improvements on it over the next several years, and released QFS 1.0 as an open source project in September 2012.


