LizardFS is an open source distributed file system that is POSIX-compliant and licensed under GPLv3. It was released in 2013 as a fork of MooseFS. LizardFS also offers paid technical support (Standard, Enterprise and Enterprise Plus), which can include configuring and setting up the cluster and active cluster monitoring.
LizardFS is a distributed, scalable and fault-tolerant file system. The file system is designed so that it is possible to add more disks and servers “on the fly”, without the need for any server reboots or shut-downs.
Description
LizardFS keeps files safe by storing all the data in multiple replicas spread over the available servers. This storage is presented to the end user as a single logical namespace. It can also be used to build space-efficient storage because it is designed to run on commodity hardware. It has applications in multiple fields and is used by institutions in finance, telecommunications, medicine, education, post-production, game development, cloud hosting services, and others.
Hardware
LizardFS is fully hardware agnostic. Commodity hardware can be used for cost efficiency. The minimum requirement is two dedicated nodes with a number of disks, but a highly available installation needs at least three nodes. Three nodes also enable the use of erasure coding.
Architecture
LizardFS keeps metadata (e.g. file names, modification timestamps, directory trees) separately from the data. Metadata is kept on metadata servers, while data is kept on chunkservers.
A typical installation consists of:
* At least two metadata servers, which work in master-slave mode for failure recovery. Their role is to manage the whole installation, so the active metadata server is often called the master server. The other metadata servers keep in sync with the active master server, so they are often called shadow master servers. Any shadow master server is ready to take over the role of the master server at any time. A suggested configuration of a metadata server is a machine with a fast CPU, at least 32 GB of RAM and at least one drive (preferably SSD) to store several GB of metadata.
* A set of chunkservers, which store the data. Each file is divided into blocks called chunks (each up to 64 MB), which are stored on the chunkservers. A suggested configuration of a chunkserver is a machine with large disk space available in either a JBOD or RAID configuration. CPU and RAM are not very important. An installation can have as few as 2 chunkservers or as many as hundreds of them.
* Clients, which use the data stored on LizardFS. These machines mount LizardFS to access files in the installation and process them just like files on their local hard drives. Files stored on LizardFS can be seen and accessed by as many clients as needed.
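The division of a file into chunks described above can be illustrated with a little arithmetic. The following is a minimal sketch, not LizardFS code: it assumes that "64 MB" means 64 MiB and that chunks are filled sequentially, with only the last chunk being smaller. The function name `chunk_sizes` is hypothetical.

```python
# Illustrative only: assumes 64 MB = 64 MiB and sequential filling of chunks.
CHUNK_SIZE = 64 * 1024 * 1024


def chunk_sizes(file_size: int, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Return the sizes (in bytes) of the chunks a file of file_size bytes would occupy."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_chunks, remainder = divmod(file_size, chunk_size)
    sizes = [chunk_size] * full_chunks
    if remainder:
        # The last chunk holds whatever does not fill a whole chunk.
        sizes.append(remainder)
    return sizes


MIB = 1024 * 1024
sizes = chunk_sizes(200 * MIB)
print(len(sizes), [s // MIB for s in sizes])  # prints: 4 [64, 64, 64, 8]
```

Under these assumptions, a 200 MiB file occupies three full chunks and one 8 MiB chunk, and each of those chunks can be stored on a different chunkserver.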
Features
* Snapshots - When creating a snapshot, only the metadata of the target file is copied, which speeds up the operation. Chunks of the original and the duplicated file are shared until one of them is modified.
* QoS - LizardFS offers mechanisms that allow administrators to set read/write bandwidth limits for all the traffic generated by a given mount point, as well as for a specific group of processes spread over multiple client machines and mount points.
* Data replication - Files stored in LizardFS are divided into blocks called chunks, each up to 64 MB in size. Each chunk is kept on chunkservers, and administrators can choose how many copies of each file are maintained. For example, when 3 copies are kept (configuration goal=3), all of the data survives a failure of any two disks or chunkservers, because LizardFS never keeps 2 copies of the same chunk on the same node.
* Geo-replication - With geo-replication, administrators can decide where the chunks are stored. The topology feature allows suggesting which copy a client should read when more than one copy is available. For example, when LizardFS is deployed across two data centers, e.g. one located in London and one in Paris, it is possible to assign the label “london” to each server in the London location and “paris” to each server in the Paris location.
* Metadata replication - Metadata is stored on metadata servers. At any time, one of the metadata servers also manages the whole installation and is called the master server. The other metadata servers remain in sync with it and are called shadow master servers.
* High availability - Shadow master servers provide LizardFS with high availability. If at least one shadow master server is running and the active master server is lost, one of the shadow master servers takes over.
* Quotas - LizardFS supports the disk quota mechanism known from other POSIX file systems. It offers an option to set soft and hard limits on the number of files and their total size for a specific user or group of users. A user whose hard limit is exceeded cannot write new data to LizardFS.
* Trash - LizardFS provides a transparent and fully automatic trash bin. Any removed file is moved to the trash bin, which is visible only to the administrator. Any file in the trash bin can be restored or deleted permanently.
* Native Windows™ client - The LizardFS Windows client can be installed on both workstations and servers. It provides access to files stored on LizardFS via a virtual drive. The Windows client is a licensed feature that can be obtained by contacting the creators of LizardFS, Distributed FS Sp. z o.o.
* Monitoring - LizardFS offers two monitoring interfaces. First, there is a command-line tool useful for systems such as Nagios, Zabbix and Icinga, which are typically used for proactive monitoring. Second, a graphical web-based monitoring interface is available for administrators and allows tracking almost all aspects of a system.
* Hadoop - This is a Java-based solution that allows Hadoop to use LizardFS storage by implementing an HDFS interface to LizardFS. It functions as a kind of file system abstraction layer and enables Hadoop jobs to directly access the data on a LizardFS cluster. The plugin translates the LizardFS protocol and makes the metadata readable for YARN and MapReduce.
* NFS and pNFS - LizardFS uses the NFS-Ganesha server to create NFS shares, so technically an NFS client connects not to the master server but to a Ganesha file server that talks directly with LizardFS components. From the user's point of view, it works just like an ordinary NFS server.
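The replication rule from the feature list (goal=3, never two copies of the same chunk on one node) can be checked with a small simulation. This is a sketch under stated assumptions, not LizardFS's placement algorithm: the round-robin policy, the server names and the functions `place_replicas` and `survives` are all hypothetical and serve only to illustrate why 3 copies on distinct nodes tolerate any two failures.

```python
from itertools import combinations


def place_replicas(chunk_ids, servers, goal):
    """Assign each chunk to `goal` distinct servers (hypothetical round-robin policy)."""
    if goal > len(servers):
        raise ValueError("goal cannot exceed the number of chunkservers")
    placement = {}
    for i, chunk in enumerate(chunk_ids):
        # Consecutive servers modulo the cluster size are always distinct
        # because goal <= len(servers).
        placement[chunk] = {servers[(i + k) % len(servers)] for k in range(goal)}
    return placement


def survives(placement, failed):
    """Return True if every chunk keeps at least one copy on a non-failed server."""
    failed = set(failed)
    return all(holders - failed for holders in placement.values())


servers = ["cs1", "cs2", "cs3", "cs4"]  # hypothetical chunkserver names
placement = place_replicas(range(8), servers, goal=3)

# With goal=3, any two simultaneous server failures leave a copy of every chunk.
two_ok = all(survives(placement, pair) for pair in combinations(servers, 2))
# Three failures can remove every copy of some chunk.
three_ok = all(survives(placement, trio) for trio in combinations(servers, 3))
print(two_ok, three_ok)  # prints: True False
```

The guarantee depends only on each chunk's copies being on distinct nodes, so any placement policy with that property gives the same result for two failures.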
See also
* Hyper-converged infrastructure
* Distributed file system
* List of file systems (Distributed parallel fault-tolerant file systems)
* MooseFS
* BeeGFS
External links
* LizardFS official website: https://lizardfs.com/
* LizardFS on GitHub
* LizardFS official documentation