
In
computing
Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, ...
, a snowflake schema is a
logical arrangement of tables in a
multidimensional database
Online analytical processing, or OLAP (), is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, repo ...
such that the
entity relationship diagram resembles a
snowflake
A snowflake is a single ice crystal that has achieved a sufficient size, and may have amalgamated with others, which falls through the Earth's atmosphere as snow.Knight, C.; Knight, N. (1973). Snow crystals. Scientific American, vol. 228, no. ...
shape. The snowflake schema is represented by centralized
fact table
In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is located at the center of a star schema or a snowflake schema surrounded by dimension tables. Where multiple fact tables are used, the ...
s which are connected to multiple
dimensions
In physics and mathematics, the dimension of a mathematical space (or object) is informally defined as the minimum number of coordinates needed to specify any point within it. Thus, a line has a dimension of one (1D) because only one coord ...
. "Snowflaking" is a method of normalizing the dimension tables in a
star schema. When it is completely normalized along all the dimension tables, the resultant structure resembles a snowflake with the
fact table
In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is located at the center of a star schema or a snowflake schema surrounded by dimension tables. Where multiple fact tables are used, the ...
in the middle. The principle behind snowflaking is normalization of the dimension tables by removing low cardinality attributes and forming separate tables.
The snowflake schema is similar to the star schema. However, in the snowflake schema, dimensions are
normalized into multiple related tables, whereas the star schema's dimensions are denormalized with each dimension represented by a single table. A complex snowflake shape emerges when the dimensions of a snowflake schema are elaborate, having multiple levels of relationships, and the child tables have multiple parent tables ("forks in the road").
Common uses
Star and snowflake schemas are most commonly found in dimensional
data warehouse
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. DWs are central repositories of integra ...
s and
data mart
A data mart is a structure/access pattern specific to ''data warehouse'' environments, used to retrieve client-facing data. The data mart is a subset of the data warehouse and is usually oriented to a specific business line or team. Whereas data w ...
s where speed of data retrieval is more important than the efficiency of data manipulations. As such, the tables in these schemas are not normalized much, and are frequently designed at a level of normalization short of
third normal form.
Data normalization and storage
Normalization splits up data to avoid redundancy (duplication) by moving commonly repeating groups of data into new tables. Normalization therefore tends to increase the number of tables that need to be joined in order to perform a given query, but reduces the space required to hold the data and the number of places where it needs to be updated if the data changes.
From a space storage point of view, dimensional tables are typically small compared to fact tables. This often negates the potential storage-space benefits of the star schema as compared to the snowflake schema. Example: One million sales transactions in 300 shops in 220 countries would result in 1,000,300 records in a star schema (1,000,000 records in the fact table and 300 records in the dimensional table where each country would be listed explicitly for each shop in that country). A more normalized snowflake schema with country keys referring to a country table would consist of the same 1,000,000 record fact table, a 300 record shop table with references to a country table with 220 records. In this case, the star schema, although further denormalized, would only reduce the number or records by a (negligible) ~0.02% (=
,000,000+300
The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline ...
instead of
,000,000+300+220
The comma is a punctuation mark that appears in several variants in different languages. It has the same shape as an apostrophe or single closing quotation mark () in many typefaces, but it differs from them in being placed on the baseline ...
Some database developers compromise by creating an underlying snowflake schema with
views built on top of it that perform many of the necessary joins to simulate a star schema. This provides the storage benefits achieved through the normalization of dimensions with the ease of querying that the star schema provides. The tradeoff is that requiring the server to perform the underlying joins automatically can result in a performance hit when querying as well as extra joins to tables that may not be necessary to fulfill certain queries.
Benefits
The snowflake schema is in the same family as the
star schema logical model. In fact, the
star schema is considered a special case of the snowflake schema. The snowflake schema provides some advantages over the
star schema in certain situations, including:
* Some
OLAP
Online analytical processing, or OLAP (), is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, repo ...
multidimensional database modeling tools are optimized for snowflake schemas.
*
Normalizing attributes results in storage savings, the tradeoff being additional complexity in source query joins.
Disadvantages
The primary disadvantage of the snowflake schema is that the additional levels of attribute normalization adds complexity to source query joins, when compared to the
star schema.
Snowflake schemas, in contrast to flat single table dimensions, have been heavily criticised. Their goal is assumed to be an efficient and compact storage of normalised data but this is at the significant cost of poor performance when browsing the joins required in this dimension.
This disadvantage may have reduced in the years since it was first recognized, owing to better query performance within the browsing tools.
When compared to a highly
normalized transactional schema, the snowflake schema's
denormalization removes the data integrity assurances provided by normalized schemas. Data loads into the snowflake schema must be highly controlled and managed to avoid update and insert anomalies.
Examples
The example schema shown to the right is a snowflaked version of the star schema example provided in the
star schema article.
The following example query is the snowflake schema equivalent of the star schema example code which returns the total number of television units sold by brand and by country for 1997. Notice that the snowflake schema query requires many more joins than the star schema version in order to fulfill even a simple query. The benefit of using the snowflake schema in this example is that the storage requirements are lower since the snowflake schema eliminates many duplicate values from the dimensions themselves.
SELECT
B.Brand,
G.Country,
SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Geography G ON S.Geography_Id = G.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
INNER JOIN Dim_Brand B ON P.Brand_Id = B.Id
INNER JOIN Dim_Product_Category C ON P.Product_Category_Id = C.Id
WHERE
D.Year = 1997 AND
C.Product_Category = 'tv'
GROUP BY
B.Brand,
G.Country
See also
*
Star schema
*
Data warehouse
In computing, a data warehouse (DW or DWH), also known as an enterprise data warehouse (EDW), is a system used for reporting and data analysis and is considered a core component of business intelligence. DWs are central repositories of integra ...
*
Online analytical processing
Online analytical processing, or OLAP (), is an approach to answer multi-dimensional analytical (MDA) queries swiftly in computing. OLAP is part of the broader category of business intelligence, which also encompasses relational databases, re ...
(OLAP)
References
Bibliography
*
*
External links
Why is the Snowflake Schema a Good Data Warehouse Design? by
Mark Levene and
George LoizouReverse Snowflake Joins
{{DEFAULTSORT:Snowflake Schema
Data warehousing
Articles with example SQL code
Online analytical processing