CephFS

Ceph is an open-source software-defined storage platform that implements object storage on a single distributed computer cluster and provides 3-in-1 interfaces for object-, block- and file-level storage. Ceph aims primarily for completely distributed operation without a single point of failure, scalability to the exabyte level, and free availability. Since version 12, Ceph does not rely on other filesystems: it can directly manage HDDs and SSDs with its own storage backend, BlueStore, and can expose a POSIX filesystem on its own.

Ceph replicates data and makes it fault-tolerant, using commodity hardware and Ethernet IP and requiring no specific hardware support. It offers disaster recovery and data redundancy through techniques such as replication, erasure coding, snapshots and storage cloning. As a result of its design, the system is both self-healing and self-managing, aiming to minimize administration time and other costs. Administrators thus have a single, consolidated system that pools storage within a common management framework. Ceph consolidates several storage use cases and improves resource utilization. It also lets an organization deploy servers where needed.

Large production Ceph deployments include CERN, OVH and DigitalOcean.
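
The two redundancy techniques named above, replication and erasure coding, consume raw capacity differently. The following is a minimal arithmetic sketch of that trade-off, not Ceph code: the function names and the example figures (three copies, a 4+2 layout, 300 TB of raw disk) are assumptions chosen for illustration.

```python
from fractions import Fraction


def replication_overhead(copies: int) -> Fraction:
    """Raw bytes stored per byte of user data when keeping full copies."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return Fraction(copies)


def erasure_overhead(k: int, m: int) -> Fraction:
    """Raw bytes stored per user byte with k data chunks and m coding chunks."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 and m >= 0")
    return Fraction(k + m, k)


def usable_capacity(raw: float, overhead: Fraction) -> float:
    """User-visible capacity that a given amount of raw disk can hold."""
    return raw / float(overhead)
```

With these hypothetical figures, 300 TB of raw disk holds 100 TB of user data under three-way replication and 200 TB under a 4+2 erasure-coded layout; the price of the denser layout is that lost data must be recomputed from the surviving chunks rather than simply copied.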


Design

Ceph employs five distinct kinds of daemons:

* Cluster monitors, which keep track of active and failed cluster nodes, cluster configuration, and information about data placement and global cluster state.
* Object storage devices (OSDs), which use direct, journaled disk storage (named BlueStore, which since the v12.x release replaces FileStore, which relied on an underlying filesystem).
* Metadata servers (MDS), which cache and broker access to inodes and directories inside a CephFS filesystem.
* HTTP gateways, which expose the object storage layer through an interface compatible with the Amazon S3 or OpenStack Swift APIs.
* Managers, which perform cluster monitoring, bookkeeping, and maintenance tasks, and interface with external monitoring and management systems (e.g. balancer, dashboard, Prometheus, Zabbix plugin).

All of these are fully distributed, and may run on the same set of servers. Clients with different needs can directly interact with different subsets of them.

Ceph stripes individual files across multiple nodes to achieve higher throughput, similar to how RAID0 stripes partitions across multiple hard drives. Adaptive load balancing is supported, whereby frequently accessed objects are replicated over more nodes.

BlueStore is the default and recommended storage type for production environments. It is Ceph's own storage implementation, providing better latency and configurability than the FileStore backend and avoiding the shortcomings of filesystem-based storage, which involves additional processing and caching layers. The FileStore backend is still considered useful and very stable; XFS used to be the recommended underlying filesystem type for production environments, while Btrfs was recommended for non-production environments. ext4 filesystems were not recommended because of the resulting limits on the maximum RADOS object length. Even with BlueStore, XFS is used for a small metadata partition.

Since 2019 there has been an ongoing project, called Crimson, to reimplement the OSD in Ceph. The main goal of Crimson is to minimize CPU overhead and latency, because modern storage devices such as NVMe have become much faster than HDDs and even SSDs, while CPU performance has not kept pace. Moreover, the Crimson OSD is meant to be a backward-compatible drop-in replacement for the existing OSD daemon. While Crimson can work with BlueStore, a new ObjectStore implementation called SeaStore is also being developed.
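
The RAID0-style striping described in this section can be illustrated with a small sketch. This is a simplified round-robin model, not Ceph's actual layout algorithm; the names `locate` and `stripe` and the parameters `stripe_unit` and `node_count` are hypothetical.

```python
from typing import List, Tuple


def locate(offset: int, stripe_unit: int, node_count: int) -> Tuple[int, int]:
    """Map a logical byte offset to (node index, byte offset on that node)."""
    if offset < 0 or stripe_unit <= 0 or node_count <= 0:
        raise ValueError("offset must be >= 0; stripe_unit and node_count > 0")
    unit_index = offset // stripe_unit   # which stripe unit, counted globally
    node = unit_index % node_count       # units are dealt out round-robin
    row = unit_index // node_count       # full stripes that precede this unit
    return node, row * stripe_unit + offset % stripe_unit


def stripe(data: bytes, stripe_unit: int, node_count: int) -> List[bytes]:
    """Split data into one byte string per node, one stripe unit at a time."""
    if stripe_unit <= 0 or node_count <= 0:
        raise ValueError("stripe_unit and node_count must be > 0")
    nodes = [bytearray() for _ in range(node_count)]
    for start in range(0, len(data), stripe_unit):
        target = (start // stripe_unit) % node_count
        nodes[target] += data[start:start + stripe_unit]
    return [bytes(node) for node in nodes]
```

Because consecutive stripe units land on different nodes, a large sequential read touches every node at once, which is where the throughput gain in this model comes from.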


Object storage S3

Ceph implements distributed object storage on top of BlueStore. The RADOS Gateway exposes the object storage layer through an interface compatible with Amazon S3. Ceph's S3 object storage is often backed by high-capacity disks and serves use cases such as big data (data lakes), backup and archives, IoT, media, and video recording.

Ceph's software libraries provide client applications with direct access to the ''reliable autonomic distributed object store'' (RADOS) object-based storage system, and also provide a foundation for some of Ceph's features, including ''RADOS Block Device'' (RBD), ''RADOS Gateway'', and the ''Ceph File System''. In this way, administrators can maintain their storage devices as a unified system, which makes it easier to replicate and protect the data.

The "librados" software libraries provide access in C, C++, Java, PHP, and Python. The RADOS Gateway also exposes the object store as a RESTful interface which can present as both native Amazon S3 and OpenStack Swift APIs.


Block storage

Ceph's object storage system allows users to mount Ceph as a thin-provisioned block device. When an application writes data to Ceph using a block device, Ceph automatically stripes and replicates the data across the cluster. Ceph's ''RADOS Block Device'' (RBD) also integrates with Kernel-based Virtual Machines (KVMs). Ceph's block storage is often backed by fast disks (NVMe, SSD) and serves use cases including databases, virtual machines, data analytics, artificial intelligence, and machine learning.

"Ceph-RBD" interfaces with the same Ceph object storage system that provides the librados interface and the CephFS file system, and it stores block device images as objects. Since RBD is built on librados, RBD inherits librados's abilities, including read-only snapshots and revert to snapshot. By striping images across the cluster, Ceph improves read access performance for large block device images.

"Ceph-iSCSI" is a gateway which enables access to distributed, highly available block storage from any Microsoft Windows or VMware vSphere server or client capable of speaking the iSCSI protocol. By using ceph-iscsi on one or more iSCSI gateway hosts, Ceph RBD images become available as Logical Units (LUs) associated with iSCSI targets, which can be accessed in an optionally load-balanced, highly available fashion. Since all of the ceph-iscsi configuration is stored in the Ceph RADOS object store, ceph-iscsi gateway hosts hold no persistent state and can therefore be replaced, added, or removed at will. As a result, Ceph Storage enables customers to run a distributed, highly available, resilient, and self-healing enterprise storage technology on commodity hardware and an entirely open-source platform.

The block device can be virtualized, providing block storage to virtual machines, in virtualization platforms such as OpenShift, OpenStack, Kubernetes, OpenNebula, Ganeti, Apache CloudStack and Proxmox Virtual Environment.
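
The two ideas above, thin provisioning and storing a block device image as objects, can be sketched with a toy in-memory model. This is not RBD code: the class `ThinImage`, its methods, and the 4 MiB default object size are assumptions made for illustration only.

```python
from typing import Dict


class ThinImage:
    """Toy block image whose backing objects are created on first write."""

    def __init__(self, size: int, object_size: int = 4 * 1024 * 1024) -> None:
        if size <= 0 or object_size <= 0:
            raise ValueError("size and object_size must be positive")
        self.size = size
        self.object_size = object_size
        self.objects: Dict[int, bytearray] = {}  # object index -> contents

    def _check(self, offset: int, length: int) -> None:
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("access outside the image")

    def write(self, offset: int, data: bytes) -> None:
        self._check(offset, len(data))
        pos = 0
        while pos < len(data):
            index, inner = divmod(offset + pos, self.object_size)
            # Take only as much as fits in the current object.
            chunk = data[pos:pos + self.object_size - inner]
            obj = self.objects.setdefault(index, bytearray(self.object_size))
            obj[inner:inner + len(chunk)] = chunk
            pos += len(chunk)

    def read(self, offset: int, length: int) -> bytes:
        self._check(offset, length)
        out = bytearray()
        while len(out) < length:
            index, inner = divmod(offset + len(out), self.object_size)
            take = min(self.object_size - inner, length - len(out))
            obj = self.objects.get(index)
            # Regions that were never written read back as zeros.
            out += obj[inner:inner + take] if obj is not None else bytes(take)
        return bytes(out)

    def allocated(self) -> int:
        """Bytes of backing storage actually in use."""
        return len(self.objects) * self.object_size
```

In this model an image can advertise a large `size` while `allocated()` stays at zero until data is written, and a write that crosses an object boundary simply touches two objects.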


File system storage

Ceph's file system (CephFS) runs on top of the same object storage system that provides object storage and block device interfaces. The Ceph metadata server cluster provides a service that maps the directories and file names of the file system to objects stored within RADOS clusters. The metadata server cluster can expand or contract, and it can rebalance the file system dynamically to distribute data evenly among cluster hosts. This ensures high performance and prevents heavy loads on specific hosts within the cluster.

Clients mount the POSIX-compatible file system using a Linux kernel client. An older FUSE-based client is also available. The servers run as regular Unix daemons. Ceph's file storage is often associated with log collection, messaging, and file storage.
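
The mapping from directories and file names to stored objects can be pictured with a toy model. The sketch below is not the Ceph metadata server: the class `ToyMetadataServer`, the helper `data_objects`, and the object naming scheme (inode number in hex, a dot, then an eight-digit hex index) are assumptions for illustration.

```python
from typing import Dict, List

ROOT_INODE = 1


class ToyMetadataServer:
    """Resolves paths to inode numbers through per-directory entry tables."""

    def __init__(self) -> None:
        self._dirs: Dict[int, Dict[str, int]] = {ROOT_INODE: {}}
        self._next_inode = ROOT_INODE + 1

    @staticmethod
    def _split(path: str) -> List[str]:
        return [part for part in path.split("/") if part]

    def lookup(self, path: str) -> int:
        inode = ROOT_INODE
        for name in self._split(path):
            entries = self._dirs.get(inode)
            if entries is None or name not in entries:
                raise FileNotFoundError(path)
            inode = entries[name]
        return inode

    def _add(self, path: str, is_dir: bool) -> int:
        parts = self._split(path)
        if not parts:
            raise ValueError("cannot create the root directory")
        parent = self.lookup("/".join(parts[:-1]))
        entries = self._dirs.get(parent)
        if entries is None:
            raise NotADirectoryError(path)
        if parts[-1] in entries:
            raise FileExistsError(path)
        inode = self._next_inode
        self._next_inode += 1
        entries[parts[-1]] = inode
        if is_dir:
            self._dirs[inode] = {}
        return inode

    def mkdir(self, path: str) -> int:
        return self._add(path, is_dir=True)

    def create(self, path: str) -> int:
        return self._add(path, is_dir=False)


def data_objects(inode: int, size: int, object_size: int) -> List[str]:
    """Names of the objects that would hold a file of the given size."""
    count = -(-size // object_size)  # ceiling division
    return [f"{inode:x}.{index:08x}" for index in range(count)]
```

The point of the split is that only name resolution goes through the metadata service; once a client knows the inode, the names of the data objects can be computed and the file contents fetched from the object store directly.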


History

Ceph was initially created by Sage Weil for his doctoral dissertation, which was advised by Professor Scott A. Brandt at the Jack Baskin School of Engineering, University of California, Santa Cruz (UCSC), and sponsored by the Advanced Simulation and Computing Program (ASC), including Los Alamos National Laboratory (LANL), Sandia National Laboratories (SNL), and Lawrence Livermore National Laboratory (LLNL). The first line of code that ended up being part of Ceph was written by Sage Weil in 2004 during a summer internship at LLNL, working on scalable filesystem metadata management (known today as Ceph's MDS). In 2005, as part of a summer project initiated by Scott A. Brandt and led by Carlos Maltzahn, Sage Weil created a fully functional file system prototype which adopted the name Ceph. Ceph made its debut with Sage Weil giving two presentations in November 2006, one at USENIX OSDI 2006 and another at SC'06.

After his graduation in autumn 2007, Weil continued to work on Ceph full-time, and the core development team expanded to include Yehuda Sadeh Weinraub and Gregory Farnum. On March 19, 2010, Linus Torvalds merged the Ceph client into Linux kernel version 2.6.34, which was released on May 16, 2010. In 2012, Weil created Inktank Storage for professional services and support for Ceph.

In April 2014, Red Hat purchased Inktank, bringing the majority of Ceph development in-house to make it a production version for enterprises with support (hotline) and continuous maintenance (new versions).

In October 2015, the Ceph Community Advisory Board was formed to assist the community in driving the direction of open-source software-defined storage technology. The charter advisory board includes Ceph community members from global IT organizations that are committed to the Ceph project, including individuals from Red Hat, Intel, Canonical, CERN, Cisco, Fujitsu, SanDisk, and SUSE.

In November 2018, the Linux Foundation launched the Ceph Foundation as a successor to the Ceph Community Advisory Board. Founding members of the Ceph Foundation included Amihan, Canonical, China Mobile, DigitalOcean, Intel, OVH, ProphetStor Data Services, Red Hat, SoftIron, SUSE, Western Digital, XSKY Data Technology, and ZTE.

In March 2021, SUSE discontinued its Enterprise Storage product incorporating Ceph in favor of Longhorn, and the former Enterprise Storage website was updated to state: "SUSE has refocused the storage efforts around serving our strategic SUSE Enterprise Storage Customers and are no longer actively selling SUSE Enterprise Storage."


Etymology

The name "Ceph" is an abbreviation of "cephalopod", a class of molluscs that includes the octopus. The name (emphasized by the logo) suggests the highly parallel behavior of an octopus and was chosen to associate the file system with "Sammy", the banana slug mascot of UCSC. Both cephalopods and banana slugs are molluscs.


See also

* BeeGFS
* Distributed file system
* Distributed parallel fault-tolerant file systems
* Gfarm file system
* GlusterFS
* IBM General Parallel File System (GPFS)
* Kubernetes
* LizardFS
* Lustre
* MapR FS
* Moose File System
* OrangeFS
* Parallel Virtual File System
* Quantcast File System
* RozoFS
* Software-defined storage
* XtreemFS
* ZFS
* Comparison of distributed file systems

