Fact table

In data warehousing, a fact table consists of the measurements, metrics or facts of a business process. It is located at the center of a star schema or a snowflake schema, surrounded by dimension tables. Where multiple fact tables are used, these are arranged as a fact constellation schema.

A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables. The primary key of a fact table is usually a composite key that is made up of all of its foreign keys. Fact tables contain the content of the data warehouse and store different types of measures: additive, non-additive, and semi-additive. Fact tables provide the (usually) additive values that act as independent variables by which dimensional attributes are analyzed.

Fact tables are often defined by their ''grain''. The grain of a fact table represents the most atomic level by which the facts may be defined. The grain of a sales fact table might be stated as "sales volume by day by product by store". Each record in this fact table is therefore uniquely defined by a day, product, and store. Other dimensions might be members of this fact table (such as location/region), but these add nothing to the uniqueness of the fact records. These "affiliate dimensions" allow for additional slices of the independent facts but generally provide insights at a higher level of aggregation (a region contains many stores).
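
The structure described above can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not a reference design: the table names (`fact_sales`, `dim_date`, `dim_product`, `dim_store`), the column names, and the sample values are all assumptions made for the example. It shows a fact table at the grain "day by product by store", a composite primary key built from the foreign keys, and a region attribute used only for slicing.

```python
import sqlite3

# Hypothetical star schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key    INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, store_name TEXT, region TEXT);

-- Grain: sales volume by day by product by store.
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    store_key    INTEGER NOT NULL REFERENCES dim_store(store_key),
    sales_amount REAL    NOT NULL,  -- additive fact
    PRIMARY KEY (date_key, product_key, store_key)
);
""")
conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                 [(1, "2005-01-15"), (2, "2005-01-16")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Widget")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?, ?)",
                 [(1, "New York", "East"), (2, "Los Angeles", "West")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 1, 1, 12000.0), (1, 1, 2, 34000.0), (2, 1, 1, 22000.0)])

# Slice the additive fact by a dimension attribute (region).
sales_by_region = dict(conn.execute("""
    SELECT s.region, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_store s ON s.store_key = f.store_key
    GROUP BY s.region
"""))

# A second row at the same grain violates the composite primary key.
try:
    conn.execute("INSERT INTO fact_sales VALUES (1, 1, 1, 999.0)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Here `region` lives on the store dimension, so it offers a higher-level slice without taking part in the uniqueness of a fact row.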


Example

If the business process is sales, then the corresponding fact table will typically contain columns representing both raw facts and aggregations, in rows such as:

* ''$12,000'', being "sales for New York store for 15-Jan-2005"
* ''$34,000'', being "sales for Los Angeles store for 15-Jan-2005"
* ''$22,000'', being "sales for New York store for 16-Jan-2005"
* ''$21,000'', being "average daily sales for Los Angeles Store for Jan-2005"
* ''$65,000'', being "average daily sales for Los Angeles Store for Feb-2005"
* ''$33,000'', being "average daily sales for Los Angeles Store for year 2005"

''"Average daily sales"'' is a measurement that is stored in the fact table. The fact table also contains foreign keys from the dimension tables, where time series (e.g. dates) and other dimensions (e.g. store location, salesperson, product) are stored.

All foreign keys between fact and dimension tables should be surrogate keys, not reused keys from operational data.
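
One way to picture the surrogate-key rule is the following sketch, in which the warehouse allocates its own integer keys instead of reusing operational identifiers. The store codes, field names, and the `surrogate_key_for` helper are hypothetical and exist only for illustration. A real load process would also persist the key mapping.

```python
from itertools import count

# Hypothetical operational rows; "store_code" is the source system's natural key.
source_rows = [
    {"store_code": "NYC-01", "sale_date": "2005-01-15", "amount": 12000},
    {"store_code": "LAX-07", "sale_date": "2005-01-15", "amount": 34000},
    {"store_code": "NYC-01", "sale_date": "2005-01-16", "amount": 22000},
]

next_key = count(1)  # warehouse-owned sequence, independent of source data
store_dim = {}       # natural key -> surrogate key

def surrogate_key_for(store_code):
    """Return the surrogate key for a store, allocating one on first sight."""
    if store_code not in store_dim:
        store_dim[store_code] = next(next_key)
    return store_dim[store_code]

# Fact rows carry only the surrogate key, never the operational code.
fact_rows = [
    {
        "store_key": surrogate_key_for(row["store_code"]),
        "sale_date": row["sale_date"],
        "amount": row["amount"],
    }
    for row in source_rows
]
```

Because the fact rows never hold the operational code, a renumbering in the source system only touches the mapping in the dimension.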


Measure types

* Additive - measures that can be added across any dimension.
* Non-additive - measures that cannot be added across any dimension.
* Semi-additive - measures that can be added across some dimensions.

A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often instead called summary tables).

Special care must be taken when handling ratios and percentages. One good design rule (Kimball & Ross, ''The Data Warehouse Toolkit'', 2nd ed., Wiley, 2002) is to never store percentages or ratios in fact tables but to calculate them only in the data access tool. Store only the numerator and denominator in the fact table; these can be aggregated, and the aggregated values can then be used to calculate the ratio or percentage in the data access tool.

In the real world, it is possible to have a fact table that contains no measures or facts. These tables are called "factless fact tables", or "junction tables". ''Factless fact tables'' may be used for modeling many-to-many relationships or for capturing timestamps of events.
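
The rule about ratios can be illustrated with a small calculation. The figures and field names below are invented for the example. The point is only that a ratio computed from the summed numerator and denominator differs from an average of stored row-level ratios.

```python
# Hypothetical fact rows: the ratio's components are stored, not the ratio itself.
rows = [
    {"store": "New York",    "profit": 10.0,  "revenue": 100.0},   # 10% margin
    {"store": "Los Angeles", "profit": 300.0, "revenue": 1000.0},  # 30% margin
]

total_profit = sum(r["profit"] for r in rows)
total_revenue = sum(r["revenue"] for r in rows)

# Aggregate numerator and denominator first, then divide.
overall_margin = total_profit / total_revenue

# Averaging row-level ratios ignores how much revenue each row represents.
naive_margin = sum(r["profit"] / r["revenue"] for r in rows) / len(rows)
```

With these numbers the overall margin is 310 / 1100 (about 28%), whereas the average of the two row-level margins is 20%, which is why the ratio itself is a non-additive measure.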


Types of fact tables

There are four fundamental measurement events, which characterize all fact tables.

; Transactional
: A transactional table is the most basic and fundamental. The grain associated with a transactional fact table is usually specified as "one row per line in a transaction", e.g., every line on a receipt. Typically a transactional fact table holds data of the most detailed level, causing it to have a great number of dimensions associated with it.
; Periodic snapshots
: The periodic snapshot, as the name implies, takes a "picture of the moment", where the moment could be any defined period of time, e.g. a performance summary of a salesman over the previous month. A periodic snapshot table is dependent on the transactional table, as it needs the detailed data held in the transactional fact table in order to deliver the chosen performance output.
; Accumulating snapshots
: This type of fact table is used to show the activity of a process that has a well-defined beginning and end, e.g., the processing of an order. An order moves through specific steps until it is fully processed. As steps towards fulfilling the order are completed, the associated row in the fact table is updated. An accumulating snapshot table often has multiple date columns, each representing a milestone in the process. Therefore, it is important to have an entry in the associated date dimension that represents an unknown date, as many of the milestone dates are unknown at the time the row is created.
; Temporal snapshots
: By applying temporal database theory and modeling techniques, the ''temporal snapshot fact table'' provides the equivalent of daily snapshots without actually storing daily snapshots. It introduces the concept of time intervals into a fact table, which saves a lot of space and improves performance while still giving the end user the logical equivalent of the "picture of the moment" they are interested in.
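
The accumulating snapshot described above can be sketched as follows, again using Python's built-in `sqlite3` module. The table `fact_order_fulfilment`, its columns, and the convention that date key `0` means "unknown" are assumptions made for this illustration. Real designs choose their own placeholder key and milestones.

```python
import sqlite3

UNKNOWN_DATE_KEY = 0  # assumed convention: key 0 marks a milestone not yet reached

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, calendar_date TEXT);
INSERT INTO dim_date VALUES
    (0, 'Unknown'), (20050115, '2005-01-15'), (20050118, '2005-01-18');

-- One row per order, with one date column per milestone.
CREATE TABLE fact_order_fulfilment (
    order_id         INTEGER PRIMARY KEY,
    ordered_date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
    shipped_date_key INTEGER NOT NULL REFERENCES dim_date(date_key)
);
""")

# The row is created when the order is placed; shipping has not happened yet.
conn.execute("INSERT INTO fact_order_fulfilment VALUES (?, ?, ?)",
             (1001, 20050115, UNKNOWN_DATE_KEY))

def shipped_label(order_id):
    """Look up the shipped milestone through the date dimension."""
    return conn.execute("""
        SELECT d.calendar_date
        FROM fact_order_fulfilment f
        JOIN dim_date d ON d.date_key = f.shipped_date_key
        WHERE f.order_id = ?
    """, (order_id,)).fetchone()[0]

before = shipped_label(1001)

# The same row is updated in place once the milestone is reached.
conn.execute(
    "UPDATE fact_order_fulfilment SET shipped_date_key = ? WHERE order_id = ?",
    (20050118, 1001),
)
after = shipped_label(1001)
```

Because the placeholder is a real row in the date dimension, the join still returns a result for orders that have not shipped, instead of silently dropping them.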


Steps in designing a fact table

* Identify a business process for analysis (like sales).
* Identify measures or facts (sales dollars), by asking questions like 'what number of X are relevant for the business process?', replacing the X with various options that make sense within the context of the business.
* Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension), by asking questions that make sense within the context of the business, like 'analyze by X', where X is replaced with the subject to test.
* List the columns that describe each dimension (region name, branch name, business unit name).
* Determine the lowest level (granularity) of summary in a fact table (e.g. sales dollars).

An alternative approach is the four-step design process described in Kimball: select the business process, declare the grain, identify the dimensions, and identify the facts.
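
Once a grain has been declared, a simple sanity check is that no two fact rows share the same combination of grain columns. The sketch below assumes a grain of day, product, and store. The `grain_violations` helper, the column names, and the sample rows are hypothetical.

```python
from collections import Counter

# Declared grain (an assumption for this sketch): one row per day, product, store.
GRAIN = ("day", "product", "store")

rows = [
    {"day": "2005-01-15", "product": "Widget", "store": "New York", "sales_dollars": 12000},
    {"day": "2005-01-15", "product": "Widget", "store": "Los Angeles", "sales_dollars": 34000},
    {"day": "2005-01-15", "product": "Widget", "store": "New York", "sales_dollars": 500},
]

def grain_violations(rows, grain):
    """Return the grain-key tuples that appear in more than one row."""
    counts = Counter(tuple(row[col] for col in grain) for row in rows)
    return [key for key, n in counts.items() if n > 1]

violations = grain_violations(rows, GRAIN)
```

A non-empty result suggests either that the rows need to be aggregated to the declared grain or that the grain is missing a dimension.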

