First normal form
   HOME

TheInfoList



OR:

First normal form (1NF) is a property of a
relation Relation or relations may refer to: General uses *International relations, the study of interconnection of politics, economics, and law on a global level *Interpersonal relationship, association or acquaintance between two or more people *Public ...
in a
relational database A relational database is a (most commonly digital) database based on the relational model of data, as proposed by E. F. Codd in 1970. A system used to maintain relational databases is a relational database management system (RDBMS). Many relatio ...
. A relation is in first normal form if and only if no attribute domain has relations as elements. Or more informally, that no table column can have tables as values (or no repeating groups).
Database normalization Database normalization or database normalisation (see spelling differences) is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. ...
is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement.
SQL-92 SQL-92 was the third revision of the SQL database query language. Unlike SQL-89, it was a major revision of the standard. Aside from a few minor incompatibilities, the SQL-89 standard is forward-compatible with SQL-92. The standard specificatio ...
does not support creating or using table-valued columns, which means that using only the "traditional relational database features" (excluding extensions even if they were later standardized) most relational databases will be in first normal form by necessity. Database systems which do not require first normal form are often called no sql systems. Newer SQL standards like SQL:1999 have started to allow so called non-atomic types, which include composite types. Even newer versions like SQL:2016 allow
json JSON (JavaScript Object Notation, pronounced ; also ) is an open standard file format and data interchange format that uses human-readable text to store and transmit data objects consisting of attribute–value pairs and arrays (or other ser ...
.


Overview

In a
hierarchical database A hierarchical database model is a data model in which the data are organized into a tree-like structure. The data are stored as records which are connected to one another through links. A record is a collection of fields, with each field containin ...
like the
IBM Information Management System The IBM Information Management System (IMS) is a joint hierarchical database and information management system that supports transaction processing. History IBM designed the IMS with Rockwell and Caterpillar starting in 1966 for the Apollo pr ...
, a record can contain sets of child records ― known as repeating groups or table-valued attributes. If such a data model is represented as relations, a repeating group would be an attribute where the value is itself a relation. First normal form eliminates nested relations by turning them into separate "top-level" relations associated with the parent row through foreign keys rather than through direct containment. The purpose of this normalization is to increase flexibility and
data independence Data independence is the type of data transparency that matters for a centralized DBMS. It refers to the immunity of user applications to changes made in the definition and organization of data. Application programs should not, ideally, be expo ...
, and to simplify the data language. It also opens the door to further normalization, which eliminates redundancy and anomalies. Most relational database management systems do not support nested records, so tables are in first normal form by default. In particular, SQL does not have any facilities for creating or exploiting nested tables. Normalization to first normal form would therefore be a necessary step when moving data from a hierarchical database to a relational database.


Rationale

The rationale for normalizing to 1NF: * Allows presenting, storing and interchanging relational data in the form of regular two-dimensional arrays. Supporting nested relations would require more complex data structures. * Simplifies the data language, since any data item can be identified just by relation name, attribute name and key. Supporting nested relations would require a more complex language with support for hierarchical data paths in order to address nested data items. * Representing relationships using foreign keys is more flexible, where a hierarchical model only can represent one-to many relationships. * Since locating data items is not directly coupled to the parent-child hierarchy, the database is more resilient to structural changes over time. * Makes further normalization levels possible which eliminate data redundancy and anomalies.


Drawbacks and criticism

* Performance for certain operations. In a hierarchical model, nested records are physically stored after the parent record, which means a whole sub-tree can be retrieved in a single read operation. In a 1NF form, it will require a join operation per record type, which can be costly, especially for complex trees. For this reason document databases eschew 1NF. *
Object-oriented languages Object-oriented programming (OOP) is a programming paradigm based on the concept of "Object (computer science), objects", which can contain data and Computer program, code. The data is in the form of Field (computer science), fields (often kno ...
represent runtime state as trees or
directed graphs Director may refer to: Literature * ''Director'' (magazine), a British magazine * ''The Director'' (novel), a 1971 novel by Henry Denker * ''The Director'' (play), a 2000 play by Nancy Hasty Music * Director (band), an Irish rock band * ''D ...
of objects connected by pointers or references. This does not map cleanly to a 1NF relational database, a problem sometimes called the Object-Relational Impedance Mismatch and which
ORM Orm (in Old Norse and in modern Danish, Swedish, Norwegian (bokmål and nynorsk) the word for "snake", "worm" or "dragon") became an Anglo-Saxon personal name during period of the Danelaw. Orm may also refer to: * Orm or Ormin, the author of ...
libraries try to bridge. * 1NF has been interpreted as not allowing complex data types for values. This is open to interpretation though, and C.J.Date has argued that values can be arbitrarily complex objects.


History

First normal form was introduced by E.F. Codd in the paper "A Relational Model of Data for Large Shared Data Banks", although it was initially just called "Normal Form". It was renamed to "First Normal Form" when additional normal forms were introduced in the paper Further Normalization of the Relational Model


Examples

The following scenarios first illustrate how a database design might violate first normal form, followed by examples that comply.


Designs that violate 1NF

This table over customers' credit card transactions does not conform to first normal form: To each customer corresponds a 'repeating group' of transactions. Such a design can be represented in a
Hierarchical database A hierarchical database model is a data model in which the data are organized into a tree-like structure. The data are stored as records which are connected to one another through links. A record is a collection of fields, with each field containin ...
but not a SQL database, since SQL does not support nested tables. The automated evaluation of any query relating to customers' transactions would broadly involve two stages: # Unpacking one or more customers' groups of transactions allowing the individual transactions in a group to be examined, and # Deriving a query result based on the results of the first stage For example, in order to find out the monetary sum of all transactions that occurred in October 2003 for all customers, the system would have to know that it must first unpack the ''Transactions'' group of each customer, then sum the ''Amounts'' of all transactions thus obtained where the ''Date'' of the transaction falls in October 2003. One of Codd's important insights was that structural complexity can be reduced. Reduced structural complexity gives users, applications, and DBMSs more power and flexibility to formulate and evaluate the queries. A more normalized equivalent of the structure above might look like this:


Designs that comply with 1NF

To bring the model into the first normal form, we can perform normalization. Normalization (to first normal form) is a process where attributes with non-simple domains are extracted to separate stand-alone relations. The extracted relations are amended with foreign keys referring to the primary key of the relation which contained it. The process can be applied recursively to non-simple domains nested in multiple levels. In this example, ''Customer ID'' is the primary key of the containing relations and will therefore be appended as foreign key to the new relation: In the modified structure, the
primary key In the relational model of databases, a primary key is a ''specific choice'' of a ''minimal'' set of attributes (Column (database), columns) that uniquely specify a tuple (Row (database), row) in a Relation (database), relation (Table (database), t ...
is in the first relation, in the second relation. Now each row represents an individual credit card transaction, and the DBMS can obtain the answer of interest, simply by finding all rows with a Date falling in October, and summing their Amounts. The data structure places all of the values on an equal footing, exposing each to the DBMS directly, so each can potentially participate directly in queries; whereas in the previous situation some values were embedded in lower-level structures that had to be handled specially. Accordingly, the normalized design lends itself to general-purpose query processing, whereas the unnormalized design does not. It is worth noting that this design meets the additional requirements for
second The second (symbol: s) is the unit of time in the International System of Units (SI), historically defined as of a day – this factor derived from the division of the day first into 24 hours, then to 60 minutes and finally to 60 seconds ...
and
third normal form Third normal form (3NF) is a database schema design approach for relational databases which uses normalizing principles to reduce the duplication of data, avoid data anomalies, ensure referential integrity, and simplify data management. It was de ...
.


Atomicity

Edgar F. Codd Edgar Frank "Ted" Codd (19 August 1923 – 18 April 2003) was an English computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases and relational databa ...
's definition of 1NF makes reference to the concept of 'atomicity'. Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the
DBMS In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases spa ...
."Codd, E. F. ''The Relational Model for Database Management Version 2'' (Addison-Wesley, 1990). Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS (excluding certain special functions)"Codd, E. F. ''The Relational Model for Database Management Version 2'' (Addison-Wesley, 1990), p. 6. meaning a column should not be divided into parts with more than one kind of data in it such that what one part means to the DBMS depends on another part of the same column.
Hugh Darwen Hugh Darwen is a computer scientist who was an employee of IBM United Kingdom from 1967. to 2004, and has been involved in the development of the relational model. Work From 1978 to 1982 he was a chief architect on Business System 12, a da ...
and
Chris Date Chris Date (born 1941) is an independent author, lecturer, researcher, and consultant, specializing in relational database theory. Biography Chris Date attended High Wycombe Royal Grammar School (U.K.) from 1951 to 1958 and received his BA i ...
have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood.Darwen, Hugh. "Relation-Valued Attributes; or, Will the Real First Normal Form Please Stand Up?", in C. J. Date and Hugh Darwen, ''Relational Database Writings 1989-1991'' (Addison-Wesley, 1992). In particular, the notion of a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic: *A character string would seem not to be atomic, as the RDBMS typically provides operators to decompose it into substrings. *A fixed-point number would seem not to be atomic, as the RDBMS typically provides operators to decompose it into integer and fractional components. * An
ISBN The International Standard Book Number (ISBN) is a numeric commercial book identifier that is intended to be unique. Publishers purchase ISBNs from an affiliate of the International ISBN Agency. An ISBN is assigned to each separate edition and ...
would seem not to be atomic, as it includes language and publisher identifier. Date suggests that "the notion of atomicity ''has no absolute meaning''": a value may be considered atomic for some purposes, but may be considered an assemblage of more basic elements for other purposes. If this position is accepted, 1NF cannot be defined with reference to atomicity. Columns of any conceivable data type (from string types and numeric types to
array An array is a systematic arrangement of similar objects, usually in rows and columns. Things called an array include: {{TOC right Music * In twelve-tone and serial composition, the presentation of simultaneous twelve-tone sets such that the ...
types and table types) are then acceptable in a 1NF table—although perhaps not always desirable; for example, it may be more desirable to separate a Customer Name column into two separate columns as First Name, Surname.


1NF tables as representations of relations

According to Date's definition, a table is in first normal form if and only if it is "
isomorphic In mathematics, an isomorphism is a structure-preserving mapping between two structures of the same type that can be reversed by an inverse mapping. Two mathematical structures are isomorphic if an isomorphism exists between them. The word is ...
to some relation", which means, specifically, that it satisfies the following five conditions: Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in first normal form. Examples of tables (or
views A view is a sight or prospect or the ability to see or be seen from a particular place. View, views or Views may also refer to: Common meanings * View (Buddhism), a charged interpretation of experience which intensely shapes and affects thou ...
) that would not meet this definition of first normal form are: *A table that lacks a unique key constraint. Such a table would be able to accommodate duplicate rows, in violation of condition 3. *A view whose definition mandates that results be returned in a particular order, so that the row-ordering is an intrinsic and meaningful aspect of the view. (Such views cannot be created using SQL that conforms to the SQL:2003 standard.) This violates condition 1. The
tuple In mathematics, a tuple is a finite ordered list (sequence) of elements. An -tuple is a sequence (or ordered list) of elements, where is a non-negative integer. There is only one 0-tuple, referred to as ''the empty tuple''. An -tuple is defi ...
s in true relations are not ordered with respect to each other. *A table with at least one nullable attribute. A nullable attribute would be in violation of condition 4, which requires every column to contain exactly one value from its column's domain. This aspect of condition 4 is controversial. It marks an important departure from Codd's later vision of the
relational model The relational model (RM) is an approach to managing data using a Structure (mathematical logic), structure and language consistent with first-order logic, first-order predicate logic, first described in 1969 by English computer scientist Edgar F. ...
, which made explicit provision for nulls. First normal form, as defined by Chris Date, permits relation-valued attributes (tables within tables). Date argues that relation-valued attributes, by means of which a column within a table can contain a table, are useful in rare cases.


References


Further reading

* Date, C. J., & Lorentzos, N., & Darwen, H. (2002).
Temporal Data & the Relational Model
' (1st ed.). Morgan Kaufmann. . * Date, C. J. (1999),

' (8th ed.). Addison-Wesley Longman. . * Kent, W. (1983)

', Communications of the ACM, vol. 26, pp. 120–125 * Codd, E.F. (1970). A Relational Model of Data for. Large Shared Data Banks. IBM Research Laboratory, San Jose, California. * Codd, E. F. (1971). Further Normalization of the Relational Model. Courant Computer Science Symposium 6 in Data Base Systems edited by Rustin, R. {{Database normalization 1NF de:Normalisierung (Datenbank)#Erste Normalform (1NF) pl:Postać normalna (bazy danych)