HOME

TheInfoList



OR:

Apache CarbonData is a
free and open-source Free and open-source software (FOSS) is a term used to refer to groups of software consisting of both free software and open-source software where anyone is freely licensed to use, copy, study, and change the software in any way, and the source ...
column-oriented data storage format of the Apache Hadoop ecosystem. It is similar to the other columnar-storage file formats available in
Hadoop Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage ...
namely
RCFile Within computing database management systems, the RCFile (Record Columnar File) is a data placement structure that determines how to store relational tables on computer clusters. It is designed for systems using the MapReduce framework. The RCFile ...
and
ORC An Orc (or Ork) is a fictional humanoid monster like a goblin. Orcs were brought into modern usage by the fantasy writings of J. R. R. Tolkien, especially ''The Lord of the Rings''. In Tolkien's works, Orcs are a brutish, aggressive, ugly, ...
. It is compatible with most of the data processing frameworks in the
Hadoop Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage ...
environment. It provides efficient
data compression In information theory, data compression, source coding, or bit-rate reduction is the process of encoding information using fewer bits than the original representation. Any particular compression is either lossy or lossless. Lossless compressi ...
and
encoding In communications and information processing, code is a system of rules to convert information—such as a letter (alphabet), letter, word, sound, image, or gesture—into another form, sometimes data compression, shortened or secrecy, secret ...
schemes with enhanced performance to handle complex data in bulk.


History

CarbonData was developed at Huawei in 2013. The project was donated to the Apache Community in 2015 submitted to the Apache Incubator in June 2016. The project won top honors in the BlackDuck 2016 Open Source Rookies of the Year's Big Data category. Apache CarbonData has been a top-level Apache Software Foundation (ASF)-sponsored project since May 1, 2017.


See also

*
Pig (programming tool) Apache Pig is a high-level platform for creating programs that run on Apache Hadoop. The language for this platform is called Pig Latin. Pig can execute its Hadoop jobs in MapReduce, Apache Tez, or Apache Spark. Pig Latin abstracts the program ...
*
Apache Hive Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like interface to query data stored in various databases and file systems that integrate with Hadoop. Traditi ...
*
Apache Impala Apache Impala is an open source massively parallel processing (MPP) SQL query engine for data stored in a computer cluster running Apache Hadoop. Impala has been described as the open-source equivalent of Google F1, which inspired its developmen ...
*
Apache Drill Apache Drill is an open-source software framework that supports data-intensive distributed applications for interactive analysis of large-scale datasets. Built chiefly by contributions from developers from MapR, Drill is inspired by Google's D ...
*
Apache Kudu Apache Kudu is a free and open source column-oriented data store of the Apache Hadoop ecosystem. It is compatible with most of the data processing frameworks in the Hadoop environment. It provides completeness to Hadoop's storage layer to enable ...
*
Apache Spark Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance. Originally developed at the University of Califor ...
*
Apache Thrift Thrift is an interface definition language and binary communication protocol used for defining and creating services for numerous programming languages. It was developed at Facebook for "scalable cross-language services development" and as of 2 ...
*
Apache Parquet Apache Parquet is a free and open-source column-oriented data storage format in the Apache Hadoop ecosystem. It is similar to RCFile and ORC, the other columnar-storage file formats in Hadoop, and is compatible with most of the data processin ...
* Trino (SQL query engine) *
Presto (SQL query engine) Presto (including PrestoDB, and PrestoSQL which was re-branded to Trino) is a distributed query engine for big data using the SQL query language. Its architecture allows users to query data sources such as Hadoop, Cassandra, Kafka, AWS S3, All ...


References


External links

* {{DEFAULTSORT:CarbonData 2015 software CarbonData Cloud computing Free system software Hadoop Software using the Apache license