HOME

TheInfoList



OR:

Sqoop is a
command-line interface A command-line interface (CLI) is a means of interacting with software via command (computing), commands each formatted as a line of text. Command-line interfaces emerged in the mid-1960s, on computer terminals, as an interactive and more user ...
application for transferring data between
relational database A relational database (RDB) is a database based on the relational model of data, as proposed by E. F. Codd in 1970. A Relational Database Management System (RDBMS) is a type of database management system that stores data in a structured for ...
s and
Hadoop Apache Hadoop () is a collection of Open-source software, open-source software utilities for reliable, scalable, distributed computing. It provides a software framework for Clustered file system, distributed storage and processing of big data usin ...
. The Apache Sqoop project was retired in June 2021 and moved to the Apache Attic.


Description

Sqoop supports incremental loads of a single table or a free form SQL query as well as saved jobs which can be run multiple times to import updates made to a database since the last import. Imports can also be used to populate tables in Hive or
HBase HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File Sy ...
. Exports can be used to put data from Hadoop into a relational database. Sqoop got the name from "SQL-to-Hadoop". Sqoop became a top-level
Apache The Apache ( ) are several Southern Athabaskan language-speaking peoples of the Southwestern United States, Southwest, the Southern Plains and Northern Mexico. They are linguistically related to the Navajo. They migrated from the Athabascan ho ...
project in March 2012.
Informatica Informatica Inc. is an American software development company founded in 1993. It is headquartered in Redwood City, California. Its core products include enterprise cloud data management and data integration. It was co-founded by Gaurav Dhillon a ...
provides a Sqoop-based connector from version 10.1.
Pentaho Pentaho is the brand name for several data management software products that make up the Pentaho+ Data Platform. These include Pentaho Data Integration, Pentaho Business Analytics,  Pentaho Data Catalog, and Pentaho Data Optimiser. Overview P ...
provides
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use and view the source code, design documents, or content of the product. The open source model is a decentrali ...
Sqoop based connector steps, ''Sqoop Import'' and ''Sqoop Export'', in their ETL suite Pentaho Data Integration since version 4.5 of the software.
Microsoft Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...
uses a Sqoop-based connector to help transfer data from
Microsoft SQL Server Microsoft SQL Server is a proprietary relational database management system developed by Microsoft using Structured Query Language (SQL, often pronounced "sequel"). As a database server, it is a software product with the primary function of ...
databases to Hadoop. Couchbase, Inc. also provides a Couchbase Server-Hadoop connector by means of Sqoop.


See also

*
Apache Hadoop Apache Hadoop () is a collection of open-source software utilities for reliable, scalable, distributed computing. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Hadoop wa ...
*
Apache Hive Apache Hive is a data warehouse software project. It is built on top of Apache Hadoop for providing data query and analysis. Hive gives an SQL-like Interface (computing), interface to query data stored in various databases and file systems that i ...
*
Apache Accumulo Apache Accumulo is a highly scalable sorted, distributed key-value store based on Google's Bigtable. It is a system built on top of Apache Hadoop, Apache ZooKeeper, and Apache Thrift. Written in Java, Accumulo has cell-level access labels and ...
*
Apache HBase HBase is an open-source non-relational distributed database modeled after Google's Bigtable and written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Fil ...


References


Bibliography

*


External links

*
Sqoop WikiSqoop Users Mailing List Archives
{{Apache Software Foundation
Sqoop Sqoop is a command-line interface application for transferring data between relational databases and Hadoop. The Apache Sqoop project was retired in June 2021 and moved to the Apache Attic. Description Sqoop supports incremental loads of a single ...
Cloud applications Hadoop