
A dimension is a structure that categorizes
facts and
measures in order to enable users to answer business questions. Commonly used dimensions are people, products, place and time. (Note: People and time sometimes are not modeled as dimensions.)
In a
data warehouse, dimensions provide structured labeling information to otherwise unordered numeric measures. The dimension is a
data set composed of individual, non-overlapping
data element
In metadata, the term data element is an atomic unit of data that has precise meaning or precise semantics. A data element has:
# An identification such as a data element name
# A clear data element definition
# One or more representation terms
...
s. The primary functions of dimensions are threefold: to provide filtering, grouping and labelling.
These functions are often described as "
slice and dice". A common data warehouse example involves sales as the measure, with customer and product as dimensions. In each sale a customer buys a product. The data can be sliced by removing all customers except for a group under study, and then diced by grouping by product.
A dimensional
data element
In metadata, the term data element is an atomic unit of data that has precise meaning or precise semantics. A data element has:
# An identification such as a data element name
# A clear data element definition
# One or more representation terms
...
is similar to a
categorical variable in statistics.
Typically dimensions in a data warehouse are organized internally into one or more hierarchies. "Date" is a common dimension, with several possible hierarchies:
* "Days (are grouped into) Months (which are grouped into) Years",
* "Days (are grouped into) Weeks (which are grouped into) Years"
* "Days (are grouped into) Months (which are grouped into) Quarters (which are grouped into) Years"
* etc.
Types
Slowly changing dimensions
A
slowly changing dimension is a set of data attributes that change slowly over a period of time rather than changing regularly e.g. address or name. These attributes can change over a period of time and that will get combined as a slowly changing dimension. These dimension can be classified in types:
* Type 0 (Retain original): Attributes never change. No history.
* Type 1 (Overwrite): Old values are overwritten with new values for attribute. No history.
* Type 2 (Add new row): For a new value, a new row is created with either a start date / end date or version. This creates a history.
* Type 3 (Add new attribute): For a new value, a new columm is created. History is limited to the number of columns des