
Extract, load, transform (ELT) is an alternative to extract, transform, load (ETL) used with data lake implementations. In contrast to ETL, in ELT models the data is not transformed on entry to the data lake but stored in its original raw format. This enables faster loading times, because no transformation has to finish before the data is stored. However, ELT requires sufficient processing power within the data processing engine to carry out the transformation on demand and return the results in a timely manner. Since the data is not processed on entry to the data lake, the query and schema do not need to be defined a priori (although the schema is often available during load, since many data sources are extracts from databases or similar structured data systems and hence have an associated schema). ELT is a data pipeline model.
("Using Redshift Spectrum to load data pipelines", published by deductive.com on January 17, 2018, retrieved on April 3, 2019.)
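The practical difference between ELT and ETL is where the transform step runs. The following is a minimal sketch, assuming JSON records and using a Python list as a hypothetical stand-in for the data lake; `raw_zone`, `load` and `transform_on_read` are illustrative names, not part of any product. Loading only appends the raw text, and each query applies its own field selection and type conversions when it reads the data.

```python
import json
from typing import Any, Callable

# Hypothetical stand-in for a data lake's raw zone: records are kept exactly as received.
raw_zone: list[str] = []


def load(raw_record: str) -> None:
    """ELT 'load' step: store the record verbatim, without parsing or validating it."""
    raw_zone.append(raw_record)


def transform_on_read(select: dict[str, Callable[[Any], Any]]) -> list[dict[str, Any]]:
    """ELT 'transform' step, run only when the data is queried.

    `select` maps each output field to a conversion function, so the schema is
    chosen by the query (schema-on-read) instead of being fixed at load time.
    Fields missing from a record come back as None.
    """
    rows = []
    for raw in raw_zone:
        record = json.loads(raw)  # parsing is deferred until query time
        rows.append({
            field: None if record.get(field) is None else cast(record[field])
            for field, cast in select.items()
        })
    return rows


# Load: two sensor readings go in untouched, even though they have different fields.
load('{"sensor": "t1", "celsius": "21.5", "ts": "2019-04-03T10:00:00"}')
load('{"sensor": "t2", "celsius": "19.0"}')

# Transform: this query decides that "celsius" should be a number.
rows = transform_on_read({"sensor": str, "celsius": float})
print(rows)  # [{'sensor': 't1', 'celsius': 21.5}, {'sensor': 't2', 'celsius': 19.0}]
```

A later query could ask for `ts` with `transform_on_read({"ts": str})` without changing anything that was stored, which is the sense in which the schema need not be defined a priori. The trade-off also shows up here: every query pays the parsing cost again, so the engine needs enough processing power to do this quickly.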


Cloud data lake components


Common storage options

* AWS
** Simple Storage Service (S3)
** Amazon RDS
* Azure
** Azure Blob Storage
* GCP
** Google Cloud Storage (GCS)


Querying

* AWS
** Redshift Spectrum
** Athena
** EMR (Presto)
* Azure
** Azure Data Lake
* GCP
** BigQuery
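Query services such as Athena and Redshift Spectrum run SQL against files that stay in object storage, using a table definition declared separately from the data. The sketch below imitates that arrangement with the Python standard library only, under stated assumptions: a temporary local folder stands in for a storage bucket, an in-memory SQLite database stands in for the query engine, and `query_raw_csv` is a hypothetical helper. It illustrates applying a schema at query time; it is not how any of the listed services is implemented.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def query_raw_csv(folder: Path, columns: dict[str, str], sql: str) -> list[tuple]:
    """Apply a schema to raw CSV files at query time, then run SQL over them.

    `columns` maps column name -> SQLite type. It plays the role of an external
    table definition: it is supplied with the query, not stored with the files.
    Column names are interpolated into SQL, so only pass trusted identifiers.
    """
    conn = sqlite3.connect(":memory:")  # stand-in for the query engine
    try:
        col_defs = ", ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
        conn.execute(f"CREATE TABLE raw ({col_defs})")
        placeholders = ", ".join("?" for _ in columns)
        # Read every object in the "bucket" as-is; SQLite's type affinity
        # converts numeric-looking text to the declared column types on insert.
        for path in sorted(folder.glob("*.csv")):
            with path.open(newline="") as f:
                conn.executemany(f"INSERT INTO raw VALUES ({placeholders})", csv.reader(f))
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    bucket = Path(tmp)  # a local folder stands in for an S3/Blob/GCS bucket
    # Load: files are written exactly as produced, with no schema attached.
    (bucket / "part-0.csv").write_text("eu,3,12.5\nus,1,4.0\n")
    (bucket / "part-1.csv").write_text("eu,2,7.5\n")
    totals = query_raw_csv(
        bucket,
        {"region": "TEXT", "qty": "INTEGER", "amount": "REAL"},
        "SELECT region, SUM(qty), SUM(amount) FROM raw GROUP BY region ORDER BY region",
    )

print(totals)  # [('eu', 5, 20.0), ('us', 1, 4.0)]
```

Adding another CSV file to the folder changes the next query's result without any load-time processing, which mirrors how new data landing in the lake becomes queryable.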


References


External links

* Dull, Tamara. "The Data Lake Debate: Pro is Up First", smartdatacollective.com, March 20, 2015.