HOME

TheInfoList



OR:

Denormalization is a strategy used on a previously- normalized database to increase performance. In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computer, computing machinery. It includes the study and experimentation of algorithmic processes, and the development of both computer hardware, hardware and softw ...
, denormalization is the process of trying to improve the read performance of a
database In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and a ...
, at the expense of losing some write performance, by adding redundant copies of data or by grouping data.S. K. Shin and G. L. Sanders
Denormalization strategies for data retrieval from data warehouses
Decision Support Systems, 42(1):267-282, October 2006.
It is often motivated by
performance A performance is an act or process of staging or presenting a play, concert, or other form of entertainment. It is also defined as the action or process of carrying out or accomplishing an action, task, or function. Performance has evolved glo ...
or
scalability Scalability is the property of a system to handle a growing amount of work. One definition for software systems specifies that this may be done by adding resources to the system. In an economic context, a scalable business model implies that ...
in relational database software needing to carry out very large numbers of read operations. Denormalization differs from the
unnormalized form In database normalization, unnormalized form (UNF or 0NF), also known as an unnormalized relation or non-first normal form (N1NF or NF2), is a database data model (organization of data in a database) which does not meet any of the conditions of data ...
in that denormalization benefits can only be fully realized on a data model that is otherwise normalized.


Implementation

A normalized design will often "store" different but related pieces of information in separate logical tables (called relations). If these relations are stored physically as separate disk files, completing a database query that draws information from several relations (a '' join operation'') can be slow. If many relations are joined, it may be prohibitively slow. There are two strategies for dealing with this by denormalization: * "DBMS support": The database management system stores redundant copies in the background, which are kept consistent by the DBMS software * "DBA implementation": The database administrator (or designer) design around the problem by denormalizing the logical data design


DBMS support

With this approach, database administrators can keep the logical design normalized, but allow the
database management system In computing, a database is an organized collection of data or a type of data store based on the use of a database management system (DBMS), the software that interacts with end users, applications, and the database itself to capture and an ...
(DBMS) to store additional redundant information on disk to optimize query response. In this case it is the DBMS software's responsibility to ensure that any redundant copies are kept consistent. This method is often implemented in
SQL Structured Query Language (SQL) (pronounced ''S-Q-L''; or alternatively as "sequel") is a domain-specific language used to manage data, especially in a relational database management system (RDBMS). It is particularly useful in handling s ...
as indexed views (
Microsoft SQL Server Microsoft SQL Server is a proprietary relational database management system developed by Microsoft using Structured Query Language (SQL, often pronounced "sequel"). As a database server, it is a software product with the primary function of ...
) or
materialized view In computing, a materialized view is a database object that contains the results of a query. For example, it may be a local copy of data located remotely, or may be a subset of the rows and/or columns of a table or join result, or may be a summa ...
s (
Oracle An oracle is a person or thing considered to provide insight, wise counsel or prophetic predictions, most notably including precognition of the future, inspired by deities. If done through occultic means, it is a form of divination. Descript ...
,
PostgreSQL PostgreSQL ( ) also known as Postgres, is a free and open-source software, free and open-source relational database management system (RDBMS) emphasizing extensibility and SQL compliance. PostgreSQL features transaction processing, transactions ...
). A view may, among other factors, represent information in a format convenient for querying, and the index ensures that queries against the view are optimized physically.


DBA implementation

With this approach, a database administrator or designer has to denormalize the logical data design. With care this can achieve a similar improvement in query response, but at a cost — it is now the database designer's responsibility to ensure that the denormalized database does not become inconsistent. This is done by creating rules in the database called '' constraints'', that specify how the redundant copies of information must be kept synchronized, which may easily make the de-normalization procedure pointless. It is the increase in logical
complexity Complexity characterizes the behavior of a system or model whose components interact in multiple ways and follow local rules, leading to non-linearity, randomness, collective dynamics, hierarchy, and emergence. The term is generally used to c ...
of the database design and the added complexity of the additional constraints that make this approach hazardous. Moreover, constraints introduce a
trade-off A trade-off (or tradeoff) is a situational decision that involves diminishing or losing on quality, quantity, or property of a set or design in return for gains in other aspects. In simple terms, a tradeoff is where one thing increases, and anoth ...
, speeding up reads (SELECT in SQL) while slowing down writes (INSERT, UPDATE, and DELETE). This means a denormalized database under heavy write load may offer ''worse'' performance than its functionally equivalent normalized counterpart.


Denormalization versus not normalized data

A denormalized data model is not the same as a data model that has not been normalized, and denormalization should only take place after a satisfactory level of normalization has taken place and that any required constraints and/or rules have been created to deal with the inherent anomalies in the design. For example, all the relations are in
third normal form Third normal form (3NF) is a database schema design approach for relational databases which uses normalizing principles to reduce the duplication of data, avoid data anomalies, ensure referential integrity, and simplify data management. It was d ...
and any relations with join dependencies and multi-valued dependencies are handled appropriately. Examples of denormalization techniques include: * "Storing" the count of the "many" elements in a one-to-many relationship as an attribute of the "one" relation * Adding attributes to a relation from another relation with which it will be joined *
Star schema In computing, the star schema or star model is the simplest style of data mart Logical schema, schema and is the approach most widely used to develop data warehouses and dimensional data marts. The star schema consists of one or more fact tables ...
s, which are also known as fact-dimension models and have been extended to
snowflake schema In computing, a snowflake schema or snowflake model is a Logical schema, logical arrangement of tables in a multidimensional database such that the Entity-relationship model, entity relationship diagram resembles a snowflake shape. The snowfl ...
s * Prebuilt summarization or
OLAP cube An OLAP cube is a multi-dimensional array of data. Online analytical processing (OLAP) is a computer-based technique of analyzing data to look for insights. The term ''cube'' here refers to a multi-dimensional dataset, which is also sometimes cal ...
s With the continued dramatic increase in all three of storage, processing power and bandwidth, on all levels, denormalization in databases has moved from being an unusual or extension technique, to the commonplace, or even the norm. For example, one specific downside of denormalization was, simply, that it "uses more storage" (that is to say, literally more columns in a database). With the exception of truly enormous systems, increased storage requirements is considered a relatively small problem in the 2020s.


See also

*
Cache (computing) In computing, a cache ( ) is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsew ...
*
Normalization Normalization or normalisation refers to a process that makes something more normal or regular. Science * Normalization process theory, a sociological theory of the implementation of new technologies or innovations * Normalization model, used in ...
*
Scalability Scalability is the property of a system to handle a growing amount of work. One definition for software systems specifies that this may be done by adding resources to the system. In an economic context, a scalable business model implies that ...


References

{{Database normalization