Data blending is a process whereby big data from multiple sources are merged into a single data warehouse or data set. It concerns not merely the merging of different file formats or disparate sources of data but also different varieties of data. Data blending allows business analysts to cope with the expansion of data that they need to make critical business decisions based on good-quality business intelligence. Data blending has been described as different from data integration because data analysts need to merge sources very quickly, too quickly for any practical intervention by data scientists ("What Is Data Blending, and Which Tools Make It Easier?").

Reflecting the increased demand for analysts to combine data sources, multiple software companies have seen large growth and raised millions of dollars, with some early entrants into the market now public companies. Examples include AWS, Alteryx, Microsoft Power Query, and Incorta, which enable combining data from many different data sources, for example text files, databases, XML, JSON, and many other forms of structured and semi-structured data.

Data blending is similar to ETL (extract, transform, load) in many ways. Both take data from various sources and combine them. However, ETL is used to merge and structure data into a target database, often a data warehouse. Data blending differs slightly in that it joins data for a specific use case at a specific time. With some software, the data is not written into a database at all, which is very different from ETL. For example, with Google Data Studio and Tableau, the data blend occurs on the reporting layer; it is not written anywhere, only displayed.
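
As a rough illustration of blending on the reporting layer, the sketch below joins a CSV source with a JSON source in memory and only displays the result, writing nothing to a database. The sample data, the field names (region, revenue, target) and the function blend_for_report are hypothetical and are not taken from any of the tools named above.

```python
import csv
import io
import json

# Hypothetical sample data standing in for two differently formatted sources.
SALES_CSV = """region,revenue
North,1200
South,800
"""

TARGETS_JSON = '[{"region": "North", "target": 1000}, {"region": "South", "target": 900}]'


def blend_for_report(sales_csv, targets_json):
    """Join CSV sales rows with JSON targets on 'region', in memory only."""
    targets = {row["region"]: row["target"] for row in json.loads(targets_json)}
    blended = []
    for row in csv.DictReader(io.StringIO(sales_csv)):
        blended.append({
            "region": row["region"],
            "revenue": int(row["revenue"]),
            "target": targets.get(row["region"]),  # None when no target exists
        })
    return blended


# The blend exists only for this report; nothing is persisted.
report_rows = blend_for_report(SALES_CSV, TARGETS_JSON)
for row in report_rows:
    print(row)
```

An ETL job would typically end by loading the combined rows into a target table; here the combined rows are discarded once the report has been shown.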


Data blending in Tableau

In Tableau software, data blending is a technique to combine data from multiple data sources in the data visualization. The data sources are stored separately and only displayed together in a dashboard, on the reporting layer. This is one of the key concepts differentiating a Tableau data blend from other definitions of data blending. The other key differentiator is the granularity of the data join. Generally, blending data into a single data set would use a database join, which usually joins at the most granular level, using an ID field where possible. A data blend in Tableau should instead happen at the least granular level.
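
The difference in granularity can be sketched in a few lines of Python. This is not Tableau's implementation, only a minimal illustration under assumed data: a hypothetical order-level source is first aggregated up to a shared linking field (region), and only then combined with a hypothetical region-level source. All names here are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical primary source: one row per order (fine-grained).
orders = [
    {"order_id": 1, "region": "North", "amount": 100},
    {"order_id": 2, "region": "North", "amount": 250},
    {"order_id": 3, "region": "South", "amount": 80},
]

# Hypothetical secondary source: one row per region (coarse-grained).
targets = [
    {"region": "North", "target": 300},
    {"region": "South", "target": 100},
]


def blend_at_linking_field(primary, secondary, link, measure, secondary_measure):
    """Aggregate primary rows up to the linking field, then attach the secondary value."""
    totals = defaultdict(int)
    for row in primary:
        totals[row[link]] += row[measure]
    lookup = {row[link]: row[secondary_measure] for row in secondary}
    return [
        {link: key, measure: total, secondary_measure: lookup.get(key)}
        for key, total in sorted(totals.items())
    ]


blended = blend_at_linking_field(orders, targets, "region", "amount", "target")
for row in blended:
    print(row)
```

A row-level database join on region would instead repeat the North target on both North orders, so summing the target column afterwards would count it twice (600 rather than 300). Combining at the coarser level avoids that duplication.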


Data blending in Google Data Studio

In Google Data Studio, data sources are combined by joining the records of one data source with the records of up to four other data sources. As in Tableau, the data blend happens only on the reporting layer. The blended data is never stored as a separate combined data source.
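
A minimal sketch of this kind of blend follows, assuming a left outer join on a single shared key and assuming the key is unique within each of the other sources. The function blend_sources, the constant MAX_OTHER_SOURCES and the sample records are hypothetical; they illustrate the idea and do not reflect the product's actual interface.

```python
MAX_OTHER_SOURCES = 4  # limit stated in the text above


def blend_sources(primary, others, key):
    """Left-join each record in `primary` with matching records from `others`."""
    if len(others) > MAX_OTHER_SOURCES:
        raise ValueError("a blend supports at most 4 other data sources")
    # Assumption: `key` is unique within each of the other sources.
    lookups = [{row[key]: row for row in source} for source in others]
    blended = []
    for record in primary:
        merged = dict(record)
        for lookup in lookups:
            match = lookup.get(record[key], {})
            for field, value in match.items():
                merged.setdefault(field, value)  # keep primary values on name clashes
        blended.append(merged)
    return blended


# Hypothetical sample sources keyed by page path.
sessions = [{"page": "/home", "sessions": 120}, {"page": "/pricing", "sessions": 45}]
signups = [{"page": "/home", "signups": 6}]

blended = blend_sources(sessions, [signups], "page")
for row in blended:
    print(row)
```

Every record of the first source is kept; records without a match in another source simply lack that source's fields. The result is returned to the caller and never stored as a combined data source.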


Challenges with data blending

The most common custom metadata question is: "How can this dataset blend with (join or union to) my other datasets?" A 2015 Forrester Consulting study found that 52 percent of companies were blending 50 or more data sources and 12 percent were blending over 1,000 sources ("Data Mashups for Analytics", Pentaho, http://www.pentaho.com/data-mashups-for-analytics).
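
One simple way to approach the join-or-union question is to compare column names, as in the hedged sketch below. The function blend_options is hypothetical, and a name-based check is only a first approximation: real datasets also need matching types, units and key values before they can be blended reliably.

```python
def blend_options(columns_a, columns_b):
    """Report whether two datasets could be unioned or joined, based on column names alone."""
    a, b = set(columns_a), set(columns_b)
    return {
        "union": a == b,             # identical columns: rows can be stacked
        "join_keys": sorted(a & b),  # shared columns: candidate join keys
    }


# Hypothetical column lists for illustration.
customers_vs_orders = blend_options(
    ["customer_id", "name"], ["customer_id", "order_total"]
)
january_vs_february = blend_options(["date", "amount"], ["amount", "date"])

print(customers_vs_orders)
print(january_vs_february)
```

In this sketch the first pair shares only customer_id, which suggests a join, while the second pair has identical columns, which suggests a union.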


See also

* Data preparation
* Data fusion
* Data wrangling
* Data cleansing
* Data editing
* Data scraping
* Data curation
* Data pre-processing

