Alluxio

Alluxio is an open-source virtual distributed file system (VDFS). It began as the research project "Tachyon" at the University of California, Berkeley's AMPLab, as Haoyuan Li's Ph.D. thesis, advised by Professor Scott Shenker and Professor Ion Stoica. Alluxio sits between computation and storage in the big data analytics stack. It provides a data abstraction layer for computation frameworks, enabling applications to connect to numerous storage systems through a common interface. The software is published under the Apache License. Data-driven applications, such as data analytics, machine learning, and AI, use the APIs provided by Alluxio (such as the Hadoop HDFS API, S3 API, and FUSE API) to interact with data from various storage systems at high speed. Popular frameworks running on top of Alluxio include Apache Spark, Presto, TensorFlow, Trino, Apache Hive, and PyTorch. Alluxio can be deployed on-premises, in the cloud (e.g. Microsoft Azure, AWS, Google Compute Engine), or in a hybrid cloud environment. It can run on bare metal or in containerized environments such as Kubernetes, Docker, and Apache Mesos.
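The idea of a common interface over several storage systems can be illustrated with a toy sketch. This is not Alluxio's API or implementation: the class names `InMemoryStore` and `VirtualFileSystem`, the mount paths, and the file names are all hypothetical, and the backing stores are plain in-memory dictionaries standing in for real systems. The sketch only shows how a single namespace might route each path to whichever store is mounted at its prefix.

```python
class InMemoryStore:
    """Hypothetical stand-in for an underlying storage system."""

    def __init__(self, name):
        self.name = name
        self._objects = {}  # key -> bytes

    def write(self, key, data):
        self._objects[key] = data

    def read(self, key):
        return self._objects[key]


class VirtualFileSystem:
    """Toy unified namespace that maps mount points to backing stores."""

    def __init__(self):
        self._mounts = {}  # mount point -> backing store

    def mount(self, mount_point, store):
        self._mounts[mount_point.rstrip("/")] = store

    def _resolve(self, path):
        # Choose the longest mount point that is a prefix of the path.
        matches = [m for m in self._mounts
                   if path == m or path.startswith(m + "/")]
        if not matches:
            raise FileNotFoundError(path)
        mount_point = max(matches, key=len)
        return self._mounts[mount_point], path[len(mount_point):].lstrip("/")

    def write(self, path, data):
        store, key = self._resolve(path)
        store.write(key, data)

    def read(self, path):
        store, key = self._resolve(path)
        return store.read(key)


# Two different (simulated) storage systems behind one namespace.
vfs = VirtualFileSystem()
hdfs_like = InMemoryStore("hdfs-like")
s3_like = InMemoryStore("s3-like")
vfs.mount("/warehouse", hdfs_like)
vfs.mount("/models", s3_like)

# The application uses the same read/write calls for both stores.
vfs.write("/warehouse/events.csv", b"id,value\n1,42\n")
vfs.write("/models/v1/weights.bin", b"\x00\x01")
```

In this sketch the application never names a backing store directly. It addresses everything through one path namespace, which is the property the paragraph above attributes to a data abstraction layer.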


History

Alluxio was started by Haoyuan Li at UC Berkeley's AMPLab in 2013 and open sourced in 2014. Alluxio had more than 1,000 contributors in 2018, making it one of the most active projects in the data ecosystem. In 2019, Alluxio was ranked among GitHub's top 100 most valuable repositories out of 96 million. In 2020, Alluxio was ranked among the top 10 most critical Java-based open-source projects in the world (source: "Google Comes Up With A Metric For Gauging Critical Open-Source Projects").


Enterprises that use Alluxio

The following is a list of notable enterprises that have used or are using Alluxio:


See also

* Clustered file system
* Comparison of distributed file systems
* Global Namespace
* List of file systems


External links

* https://www.alluxio.io