Apache ORC (Optimized Row Columnar) is a free and open-source column-oriented data storage format. It is similar to the other columnar-storage file formats available in the Hadoop ecosystem, such as RCFile and Parquet. It is used by most of the data processing frameworks in that ecosystem, including Apache Spark, Apache Hive, Apache Flink, and Apache Hadoop. In February 2013, the Optimized Row Columnar (ORC) file format was announced by Hortonworks in collaboration with Facebook. A month later, the Apache Parquet format was announced, developed by Cloudera and Twitter. The Apache ORC format is widely supported, including by Amazon Web Services' Glue, Google Cloud Platform's BigQuery, and pandas.


History


See also

* Apache Arrow
* Apache Hive
* Apache NiFi
* Apache Parquet
* Apache Spark
* Pig (programming tool)
* Trino (SQL query engine)
* Presto (SQL query engine)


