Data blending is a process whereby big data from multiple sources are merged into a single data warehouse or data set.
Data blending allows business analysts to cope with the growing volume of data they need in order to make critical business decisions based on good-quality business intelligence.
Data blending has been described as different from data integration because data analysts need to merge sources very quickly, too quickly for any practical intervention by data scientists.[What Is Data Blending, and Which Tools Make It Easier?] A 2015 study by Forrester Consulting found that 52 percent of companies blend 50 or more data sources and 12 percent blend more than 1,000 sources.
Extract, transform, load
Data blending is similar to extract, transform, load (ETL). Both ETL and data blending take data from various sources and combine them. However, ETL is used to merge and structure data into a target database, often a data warehouse. Data blending differs in that it joins data for a specific use case at a specific time. With some software, such as Google Data Studio (since renamed Looker Studio), the blended data is never written into a database, which is very different from ETL.
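The following Python sketch makes the contrast with ETL concrete. It uses two small hypothetical in-memory sources, `ORDERS` and `CUSTOMERS`. `etl_load` joins them and persists the result into a target table, with an in-memory SQLite database standing in for a data warehouse. `blend_revenue_by_region` answers one question by joining in memory and writes nothing. All table and function names are illustrative assumptions, not part of any product.

```python
import sqlite3

# Hypothetical source 1: an orders extract.
ORDERS = [
    {"order_id": 1, "customer_id": "C1", "amount": 120.0},
    {"order_id": 2, "customer_id": "C2", "amount": 80.0},
    {"order_id": 3, "customer_id": "C1", "amount": 45.5},
]
# Hypothetical source 2: a customer lookup from another system.
CUSTOMERS = [
    {"customer_id": "C1", "region": "EMEA"},
    {"customer_id": "C2", "region": "APAC"},
]


def etl_load(conn: sqlite3.Connection) -> None:
    """ETL: extract both sources, transform (join and select columns), load into a target table."""
    region_by_id = {c["customer_id"]: c["region"] for c in CUSTOMERS}
    rows = [(o["order_id"], region_by_id.get(o["customer_id"]), o["amount"]) for o in ORDERS]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def blend_revenue_by_region() -> dict:
    """Blending: join in memory to answer one question; nothing is written to a database."""
    region_by_id = {c["customer_id"]: c["region"] for c in CUSTOMERS}
    totals = {}
    for o in ORDERS:
        region = region_by_id.get(o["customer_id"], "unknown")
        totals[region] = totals.get(region, 0.0) + o["amount"]
    return totals


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")  # stands in for a data warehouse
    etl_load(warehouse)
    print(warehouse.execute("SELECT COUNT(*) FROM fact_orders").fetchone()[0])  # 3
    print(blend_revenue_by_region())  # {'EMEA': 165.5, 'APAC': 80.0}
```

The ETL path leaves a reusable table behind for later queries. The blend is recomputed whenever the question is asked and leaves no trace.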
Software products
Reflecting the increased demand from analysts to combine data sources, several software companies have grown rapidly and raised millions of dollars, and some early entrants into the market are now public companies. Examples include AWS, Alteryx, Microsoft Power Query, and Incorta, which can combine data from many different kinds of source, for example text files, databases, XML, JSON, and many other forms of structured and semi-structured data.
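The sketch below shows the basic idea of combining a delimited text source, a JSON source and an XML source into one data set. It assumes three small hypothetical inline payloads in place of real files or services, and that the sources share a key field named `sku`. It uses only the Python standard library and illustrates the concept only. It is not how any of the products above work internally.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical payloads standing in for a text file, a JSON API response and an XML export.
CSV_TEXT = "sku,units\nA100,5\nB200,2\n"
JSON_TEXT = '[{"sku": "A100", "price": 9.5}, {"sku": "B200", "price": 20.0}]'
XML_TEXT = "<catalog><item sku='A100' name='Widget'/><item sku='B200' name='Gadget'/></catalog>"


def read_csv(text: str) -> dict:
    """Parse delimited text into {sku: fields}."""
    return {row["sku"]: {"units": int(row["units"])} for row in csv.DictReader(io.StringIO(text))}


def read_json(text: str) -> dict:
    """Parse a JSON array of records into {sku: fields}."""
    return {rec["sku"]: {"price": float(rec["price"])} for rec in json.loads(text)}


def read_xml(text: str) -> dict:
    """Parse <item> elements' attributes into {sku: fields}."""
    root = ET.fromstring(text)
    return {el.get("sku"): {"name": el.get("name")} for el in root.iter("item")}


def blend(*sources: dict) -> list:
    """Merge fields for the same key from every source into one row per key."""
    merged = {}
    for source in sources:
        for key, fields in source.items():
            merged.setdefault(key, {"sku": key}).update(fields)
    return list(merged.values())


if __name__ == "__main__":
    for row in blend(read_csv(CSV_TEXT), read_json(JSON_TEXT), read_xml(XML_TEXT)):
        print(row)  # e.g. {'sku': 'A100', 'units': 5, 'price': 9.5, 'name': 'Widget'}
```

Each reader normalizes its format into the same keyed shape. This is what lets a single generic `blend` step combine them regardless of the original file type.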
Tableau
In Tableau, data blending is a technique for combining data from multiple data sources within a data visualization. A key differentiator is the granularity of the join. When data is merged into a single data set, the merge would typically use a SQL join, which usually joins at the most granular level, using an ID field where possible. A data blend in Tableau, by contrast, should happen at the least granular level.
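The granularity point can be shown with a small sketch built on hypothetical data. The primary source holds orders. The secondary source holds sales targets recorded per sales rep, which is finer-grained than the linking field `region`. Joining row by row duplicates each order once per matching rep and inflates totals. Aggregating both sources to the linking field first, then keeping every primary row (a left join), avoids this. The sketch models blending the way it is commonly described for Tableau. It is not Tableau's implementation.

```python
from collections import defaultdict

# Hypothetical primary source: one row per order, as (region, sales).
ORDERS = [("EMEA", 100.0), ("EMEA", 50.0), ("APAC", 70.0)]
# Hypothetical secondary source: one row per sales rep, as (region, rep, target).
TARGETS = [("EMEA", "ana", 80.0), ("EMEA", "ben", 60.0), ("AMER", "cy", 40.0)]


def row_level_join_sales(orders, targets) -> float:
    """Join each order to every rep row in its region, then sum sales (double counts)."""
    return sum(
        sales
        for region, sales in orders
        for target_region, _rep, _target in targets
        if target_region == region
    )


def sum_by_region(rows, value_index: int) -> dict:
    """Aggregate rows to the linking field (region, stored at index 0)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[0]] += row[value_index]
    return dict(totals)


def blend_by_region(orders, targets) -> dict:
    """Aggregate both sources to the linking field, then left-join secondary onto primary."""
    sales = sum_by_region(orders, 1)
    target = sum_by_region(targets, 2)
    # Every primary region is kept; regions absent from the secondary source get None.
    return {region: (total, target.get(region)) for region, total in sales.items()}


if __name__ == "__main__":
    print(row_level_join_sales(ORDERS, TARGETS))  # 300.0
    print(blend_by_region(ORDERS, TARGETS))  # {'EMEA': (150.0, 140.0), 'APAC': (70.0, None)}
```

The row-level join reports 300.0 in sales, double the 150.0 of EMEA orders it actually matched, because each order is counted once per rep. The blend reports EMEA at 150.0 against a target of 140.0 and keeps APAC with no target. It drops AMER, which has no rows in the primary source.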
Looker Studio
In Google's Looker Studio, data sources are combined by joining the records of one data source with the records of up to 4 other data sources.
As in Tableau, the blend happens only in the reporting layer; the blended data is never stored as a separate combined data source.
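Below is a rough sketch of a reporting-layer blend in the spirit described above. Each time a chart needs data, a primary source is combined on a shared key with at most four other sources, and the result is returned rather than stored. Several details are assumptions made for illustration: the left-outer behaviour, the requirement that keys be unique within each source, and the names `blend_for_chart` and `MAX_ADDITIONAL_SOURCES`. Looker Studio's actual join options and configuration are not modelled here.

```python
from typing import Any

Row = dict[str, Any]

# Per the text: one data source joined with up to 4 other data sources.
MAX_ADDITIONAL_SOURCES = 4


def blend_for_chart(primary: list[Row], others: list[list[Row]], key: str) -> list[Row]:
    """Compute a blended view on demand; the result is returned, never persisted."""
    if len(others) > MAX_ADDITIONAL_SOURCES:
        raise ValueError(f"a blend can join at most {MAX_ADDITIONAL_SOURCES} other sources")
    # Index each additional source by the join key (assumes keys are unique per source).
    indexes = [{row[key]: row for row in source} for source in others]
    blended = []
    for row in primary:
        out = dict(row)  # copy, so the primary source itself is never modified
        for index in indexes:
            match = index.get(row[key], {})
            # Left-outer style: unmatched primary rows simply keep only their own fields.
            out.update({k: v for k, v in match.items() if k != key})
        blended.append(out)
    return blended


if __name__ == "__main__":
    sessions = [{"date": "d1", "sessions": 10}, {"date": "d2", "sessions": 7}]
    revenue = [{"date": "d1", "revenue": 3.0}]
    print(blend_for_chart(sessions, [revenue], "date"))
```

Because nothing is cached or written back, changing a source changes the next chart render directly. The trade-off is that the join is recomputed on every request.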
Challenges with data blending
The most common custom metadata question is: "How can this dataset blend with (join or union to) my other datasets?"[Heer, Jeffrey; Hellerstein, Joseph; Kandel, Sean; Rattenbury, Tye (July 2017). Principles of Data Wrangling. O'Reilly Media.]
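The quoted question comes down to whether two datasets can be stacked or widened. Stacking is a union, which needs matching columns. Widening is a join, which needs a shared key. The sketch below is a minimal heuristic that assumes rows are plain dictionaries and treats shared column names as candidate join keys. That assumption often fails on real data, where the same name can carry different meanings. The helper `blend_options` is hypothetical.

```python
def union(a: list, b: list) -> list:
    """Union: stack the rows of two datasets that have the same columns."""
    if a and b and set(a[0]) != set(b[0]):
        raise ValueError("union requires matching columns")
    return a + b


def inner_join(left: list, right: list, key: str) -> list:
    """Join: widen each left row with every right row that shares its key value."""
    right_by_key = {}
    for right_row in right:
        right_by_key.setdefault(right_row[key], []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in right_by_key.get(left_row[key], [])
    ]


def blend_options(cols_a: set, cols_b: set) -> dict:
    """Answer 'how can these blend?' from column names alone (a simple heuristic)."""
    return {
        "union": cols_a == cols_b,  # identical schemas can be stacked
        "join_keys": sorted(cols_a & cols_b),  # shared columns are candidate join keys
    }


if __name__ == "__main__":
    q1 = [{"id": 1, "amount": 5}]
    q2 = [{"id": 2, "amount": 7}]
    customers = [{"id": 1, "name": "Ann"}]
    print(blend_options(set(q1[0]), set(q2[0])))  # {'union': True, 'join_keys': ['amount', 'id']}
    print(inner_join(union(q1, q2), customers, "id"))  # [{'id': 1, 'amount': 5, 'name': 'Ann'}]
```

In practice, a metadata catalogue would also compare data types and value overlap before recommending either operation.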
See also
* Data preparation
* Data fusion
* Data wrangling
* Data cleansing
* Data editing
* Data scraping
* Data curation
* Data preprocessing
References