Third Normal Form
   HOME

TheInfoList



OR:

Third normal form (3NF) is a
database schema The database schema is the structure of a database described in a formal language supported by the database management system (DBMS). The term "schema" refers to the organization of data as a blueprint of how the database is constructed (divide ...
design approach for
relational database A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...
s which uses normalizing principles to reduce the duplication of data, avoid data anomalies, ensure
referential integrity Referential integrity is a property of data stating that all its references are valid. In the context of relational databases, it requires that if a value of one attribute (column) of a relation (table) references a value of another attribute (ei ...
, and simplify data management. It was defined in 1971 by
Edgar F. Codd Edgar Frank "Ted" Codd (19 August 1923 – 18 April 2003) was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases and relational databa ...
, an English computer scientist who invented the
relational model The relational model (RM) is an approach to managing data using a Structure (mathematical logic), structure and language consistent with first-order logic, first-order predicate logic, first described in 1969 by English computer scientist Edgar F. ...
for
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases sp ...
management. A database relation (e.g. a
database table A table is a collection of related data held in a table format within a database. It consists of columns and rows. In relational databases, and flat file databases, a ''table'' is a set of data elements (values) using a model of vertical colum ...
) is said to meet third normal form standards if all the attributes (e.g. database columns) are functionally dependent on solely the
primary key In the relational model of databases, a primary key is a ''specific choice'' of a ''minimal'' set of attributes (Column (database), columns) that uniquely specify a tuple (Row (database), row) in a Relation (database), relation (Table (database), t ...
. Codd defined this as a relation in
second normal form Second normal form (2NF) is a normal form used in database normalization. 2NF was originally defined by E. F. Codd in 1971.Codd, E. F. "Further Normalization of the Data Base Relational Model". (Presented at Courant Computer Science Symposia Ser ...
where all non-prime attributes depend only on the
candidate key A candidate key, or simply a key, of a relational database is a minimal superkey. In other words, it is any set of columns that have a unique combination of values in each row (which makes it a superkey), with the additional constraint that removi ...
s and do not have a
transitive dependency A transitive dependency is a functional dependency which holds by virtue of transitivity among various software components. Computer programs In a computer program a direct dependency is functionality exported by a library, or API, or any software ...
on another key. A hypothetical example of a failure to meet third normal form would be a hospital database having a table of patients which included a column for the telephone number of their doctor. The phone number is dependent on the doctor, rather than the patient, thus would be better stored in a table of doctors. The negative outcome of such a design is that a doctor's number will be duplicated in the database if they have multiple patients, thus increasing both the chance of input error and the cost and risk of updating that number should it change (compared to a third normal form-compliant data model that only stores a doctor's number once on a doctor table). Codd later realized that 3NF did not eliminate all undesirable data anomalies and developed a stronger version to address this in 1974, known as
Boyce–Codd normal form Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain ty ...
.


Definition of third normal form

The third normal form (3NF) is a normal form used in
database normalization Database normalization or database normalisation (see spelling differences) is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. ...
. 3NF was originally defined by E. F. Codd in 1971.Codd, E. F. "Further Normalization of the Data Base Relational Model". (Presented at Courant Computer Science Symposia Series 6, "Data Base Systems", New York City, May 24–25, 1971.) IBM Research Report RJ909 (August 31, 1971). Republished in Randall J. Rustin (ed.), ''Data Base Systems: Courant Computer Science Symposia Series 6''. Prentice-Hall, 1972. Codd's definition states that a table is in 3NF
if and only if In logic and related fields such as mathematics and philosophy, "if and only if" (shortened as "iff") is a biconditional logical connective between statements, where either both statements are true or both are false. The connective is bicondi ...
both of the following conditions hold: * The
relation Relation or relations may refer to: General uses *International relations, the study of interconnection of politics, economics, and law on a global level *Interpersonal relationship, association or acquaintance between two or more people *Public ...
R (table) is in
second normal form Second normal form (2NF) is a normal form used in database normalization. 2NF was originally defined by E. F. Codd in 1971.Codd, E. F. "Further Normalization of the Data Base Relational Model". (Presented at Courant Computer Science Symposia Ser ...
(2NF). * Every non-prime attribute of R is non-transitively dependent on every key of R. A ''non-prime attribute'' of R is an attribute that does not belong to any
candidate key A candidate key, or simply a key, of a relational database is a minimal superkey. In other words, it is any set of columns that have a unique combination of values in each row (which makes it a superkey), with the additional constraint that removi ...
of R.Codd, p. 43. A
transitive dependency A transitive dependency is a functional dependency which holds by virtue of transitivity among various software components. Computer programs In a computer program a direct dependency is functionality exported by a library, or API, or any software ...
is a
functional dependency In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, a functional dependency is a constraint between two attributes in a relation. Given a relation ' ...
in which ''X'' → ''Z'' (''X'' determines ''Z'') indirectly, by virtue of ''X'' → ''Y'' and ''Y'' → ''Z'' (where it is not the case that ''Y'' → ''X''). A 3NF definition that is equivalent to Codd's, but expressed differently, was given by Carlo Zaniolo in 1982. This definition states that a table is in 3NF if and only if for each of its functional dependencies ''X'' → ''A'', at least one of the following conditions holds:Zaniolo, Carlo. "A New Normal Form for the Design of Relational Database Schemata". ''ACM Transactions on Database Systems'' 7(3), September 1982. * ''X'' contains ''A'' (that is, ''A'' is a subset of ''X'', meaning ''X'' → ''A'' is trivial functional dependency), * ''X'' is a
superkey In the relational data model a superkey is a set of attributes that uniquely identifies each tuple of a relation. Because superkey values are unique, tuples with the same superkey value must also have the same non-key attribute values. That is, ...
, * every element of ''A'' \ ''X'', the
set difference In set theory, the complement of a set , often denoted by (or ), is the set of elements not in . When all sets in the universe, i.e. all sets under consideration, are considered to be members of a given set , the absolute complement of is the ...
between A and X, is a ''prime attribute'' (i.e., each attribute in ''A'' \ ''X'' is contained in some
candidate key A candidate key, or simply a key, of a relational database is a minimal superkey. In other words, it is any set of columns that have a unique combination of values in each row (which makes it a superkey), with the additional constraint that removi ...
). Zaniolo's definition gives a clear sense of the difference between 3NF and the more stringent
Boyce–Codd normal form Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain ty ...
(BCNF). BCNF simply eliminates the third alternative ("Every element of ''A'' \ ''X'', the set difference between ''A'' and ''X'', is a prime attribute.").


"Nothing but the key"

An approximation of Codd's definition of 3NF, paralleling the traditional
oath Traditionally an oath (from Anglo-Saxon ', also called plight) is either a statement of fact or a promise taken by a sacrality as a sign of verity. A common legal substitute for those who conscientiously object to making sacred oaths is to ...
to give true evidence in a court of law, was given by Bill Kent: " verynon-key ttributemust provide a fact about the key, the whole key, and nothing but the key".Kent, William
"A Simple Guide to Five Normal Forms in Relational Database Theory"
''Communications of the ACM'' 26 (2), Feb. 1983, pp. 120–125.
A common variation supplements this definition with the oath "so help me Codd".The author of a 1989 book on database management credits one of his students with coming up with the "so help me Codd" addendum. Diehr, George. ''Database Management'' (Scott, Foresman, 1989), p. 331. Requiring existence of "the key" ensures that the table is in 1NF; requiring that non-key attributes be dependent on "the whole key" ensures 2NF; further requiring that non-key attributes be dependent on "nothing but the key" ensures 3NF. While this phrase is a useful mnemonic, the fact that it only mentions a single key means it defines some necessary but not sufficient conditions to satisfy the 2nd and 3rd normal forms. Both 2NF and 3NF are concerned equally with ''all'' candidate keys of a table and not just any one key.
Chris Date Chris Date (born 1941) is an independent author, lecturer, researcher, and consultant, specializing in relational database theory. Biography Chris Date attended High Wycombe Royal Grammar School (U.K.) from 1951 to 1958 and received his BA i ...
refers to Kent's summary as "an intuitively attractive characterization" of 3NF and notes that with slight adaptation it may serve as a definition of the slightly stronger
Boyce–Codd normal form Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain ty ...
: "Each attribute must represent a fact about the key, the whole key, and nothing but the key."Date, C. J. ''An Introduction to Database Systems'' (7th ed.) (Addison Wesley, 2000), p. 379. The 3NF version of the definition is weaker than Date's BCNF variation, as the former is concerned only with ensuring that ''non-key'' attributes are dependent on keys. Prime attributes (which are keys or parts of keys) must not be functionally dependent at all; they each represent a fact about the key in the sense of providing part or all of the key itself. (This rule applies only to functionally dependent attributes, as applying it to all attributes would implicitly prohibit composite candidate keys, since each part of any such key would violate the "whole key" clause.) An example of a table that fails to meet the requirements of 3NF is: Because each row in the table needs to tell us who won a particular Tournament in a particular Year, the
composite key {{Unreferenced, date=October 2020 In database design, a composite key is a candidate key that consists of two or more attributes (table columns) that together uniquely identify an entity occurrence (table row). A compound key is a composite key f ...
is a minimal set of attributes guaranteed to uniquely identify a row. That is, is a candidate key for the table. The breach of 3NF occurs because the non-prime attribute (Winner's date of birth) is transitively dependent on the candidate key through the non-prime attribute Winner. The fact that Winner's date of birth is functionally dependent on Winner makes the table vulnerable to logical inconsistencies, as there is nothing to stop the same person from being shown with different dates of birth on different records. In order to express the same facts without violating 3NF, it is necessary to split the table into two: Update anomalies cannot occur in these tables, because unlike before, Winner is now a candidate key in the second table, thus allowing only one value for Date of birth for each Winner.


Computation

A relation can always be decomposed in third normal form, that is, the relation R is rewritten to projections R1, ..., Rn whose
join Join may refer to: * Join (law), to include additional counts or additional defendants on an indictment *In mathematics: ** Join (mathematics), a least upper bound of sets orders in lattice theory ** Join (topology), an operation combining two top ...
is equal to the original relation. Further, this decomposition does not lose any
functional dependency In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, a functional dependency is a constraint between two attributes in a relation. Given a relation ' ...
, in the sense that every functional dependency on R can be derived from the functional dependencies that hold on the projections R1, ..., Rn. What is more, such a decomposition can be computed in
polynomial time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by ...
.


Derivation of Zaniolo conditions

The definition of 3NF offered by Carlo Zaniolo in 1982, and given above, is proven in the following way: Let X → A be a nontrivial FD (i.e. one where X does not contain A) and let A be a non-key attribute. Also let Y be a key of R. Then Y → X.


Normalization beyond 3NF

Most 3NF tables are free of update, insertion, and deletion anomalies. Certain types of 3NF tables, rarely met with in practice, are affected by such anomalies; these are tables which either fall short of
Boyce–Codd normal form Boyce–Codd normal form (or BCNF or 3.5NF) is a normal form used in database normalization. It is a slightly stronger version of the third normal form (3NF). BCNF was developed in 1974 by Raymond F. Boyce and Edgar F. Codd to address certain ty ...
(BCNF) or, if they meet BCNF, fall short of the higher normal forms 4NF or 5NF.


Considerations for use in reporting environments

While 3NF was ideal for machine processing, the segmented nature of the data model can be difficult to consume by a human user. Analytics via query, reporting, and dashboards were often facilitated by a different type of data model that provided pre-calculated analysis such as trend lines, period-to-date calculations (month-to-date, quarter-to-date, year-to-date), cumulative calculations, basic statistics (average, standard deviation, moving averages) and previous period comparisons (year ago, month ago, week ago) e.g. dimensional modeling and beyond dimensional modeling, flattening of stars via
Hadoop Apache Hadoop () is a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage an ...
and
data science Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured and unstructured data, and apply knowledge from data across a br ...
.


See also

* Attribute-value system


References


Further reading

*Date, C. J. (1999),
An Introduction to Database Systems
' (8th ed.). Addison-Wesley Longman. . *Kent, W. (1983)

', Communications of the ACM, vol. 26, pp. 120–126


External links



by Mike Chapple (About.com)
An Introduction to Database Normalization
by Mike Hillyer.
A tutorial on the first 3 normal forms
by Fred Coulson
Description of the database normalization basics
by Microsoft

by exploreDatabase {{DEFAULTSORT:Third Normal Form 3NF de:Normalisierung (Datenbank)#Dritte Normalform (3NF)