In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types.
Two common classes of algebraic types are product types (i.e., tuples and records) and sum types (i.e., tagged or disjoint unions, coproduct types or ''variant types'').
The values of a product type typically contain several values, called ''fields''. All values of that type have the same combination of field types. The set of all possible values of a product type is the set-theoretic product, i.e., the Cartesian product, of the sets of all possible values of its field types.
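As a minimal sketch of the Cartesian-product reading, the Haskell record below has two fields, and its values can be enumerated by pairing every value of one field type with every value of the other. The names `Switches`, `power`, `level` and `allSwitches` are hypothetical and chosen only for illustration.

```haskell
-- Hypothetical product type with two fields, written as a record.
data Switches = Switches
  { power :: Bool      -- 2 possible values
  , level :: Ordering  -- 3 possible values: LT, EQ, GT
  } deriving (Show, Eq)

-- Every value of the type: one per element of the Cartesian product
-- of the field types, so 2 * 3 = 6 values in total.
allSwitches :: [Switches]
allSwitches =
  [ Switches p l
  | p <- [minBound .. maxBound]
  , l <- [minBound .. maxBound]
  ]
```

The number of values is the product of the sizes of the field types, which is where the name "product type" comes from.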
The values of a sum type are typically grouped into several classes, called ''variants''. A value of a variant type is usually created with a quasi-functional entity called a ''constructor''. Each variant has its own constructor, which takes a specified number of arguments with specified types. The set of all possible values of a sum type is the set-theoretic sum, i.e., the disjoint union, of the sets of all possible values of its variants.
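The disjoint-union reading can be sketched in the same way. In the example below, the hypothetical type `Signal` has two variants, each with its own constructor, and its values are the values of the first variant together with the values of the second.

```haskell
-- Hypothetical sum type with two variants.
data Signal
  = Flag Bool         -- constructor taking one Bool argument
  | Compare Ordering  -- constructor taking one Ordering argument
  deriving (Show, Eq)

-- Every value of the type: the values of each variant, kept apart by
-- their constructors, so 2 + 3 = 5 values in total.
allSignals :: [Signal]
allSignals =
  map Flag [minBound .. maxBound] ++ map Compare [minBound .. maxBound]
```

Here the number of values is the sum of the sizes of the variants, and the constructor acts as the tag that keeps the two sets disjoint.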
Enumerated types are a special case of sum types in which the constructors take no arguments, as exactly one value is defined for each constructor.
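A short illustration of this special case, using a hypothetical `Color` type: every constructor is nullary, so each constructor is itself the single value of its variant.

```haskell
-- Hypothetical enumerated type: a sum type whose constructors take no arguments.
data Color = Red | Green | Blue
  deriving (Show, Eq, Enum, Bounded)

-- The derived Enum and Bounded instances list the constructors in declaration order.
allColors :: [Color]
allColors = [minBound .. maxBound]
```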
Values of algebraic types are analyzed with pattern matching, which identifies a value by its constructor or field names and extracts the data it contains.
Algebraic data types were introduced in Hope, a small functional programming language developed in the 1970s at the University of Edinburgh.
Examples
One of the most common examples of an algebraic data type is the singly linked list. A list type is a sum type with two variants, Nil for an empty list and Cons ''x'' ''xs'' for the combination of a new element ''x'' with a list ''xs'' to create a new list. Here is an example of how a singly linked list would be declared in Haskell:
data List a = Nil | Cons a (List a)
or
data [] a = [] | a : [a]
Cons is an abbreviation of ''cons''truct. Many languages have special syntax for lists defined in this way. For example, Haskell and ML use [] for Nil, : or :: for Cons, respectively, and square brackets for entire lists. So Cons 1 (Cons 2 (Cons 3 Nil)) would normally be written as 1:2:3:[] or [1,2,3] in Haskell, or as 1::2::3::[] or [1,2,3] in ML.
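The correspondence between the explicit declaration and Haskell's built-in list syntax can be sketched with a pair of conversion functions. The names `toBuiltin` and `fromBuiltin` are hypothetical helpers, not part of any library.

```haskell
-- The user-defined list type from the text.
data List a = Nil | Cons a (List a)
  deriving (Show, Eq)

-- Replace Nil with [] and Cons with (:).
toBuiltin :: List a -> [a]
toBuiltin Nil         = []
toBuiltin (Cons x xs) = x : toBuiltin xs

-- The inverse direction: replace (:) with Cons and [] with Nil.
fromBuiltin :: [a] -> List a
fromBuiltin = foldr Cons Nil
```

Applying `toBuiltin` to `Cons 1 (Cons 2 (Cons 3 Nil))` yields `[1,2,3]`, which is the same value as `1:2:3:[]`.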
For a slightly more complex example, binary trees may be implemented in Haskell as follows:
data Tree = Empty
          | Leaf Int
          | Node Tree Tree
or
data BinaryTree a = BTNil
                  | BTNode a (BinaryTree a) (BinaryTree a)
Here, Empty represents an empty tree, Leaf contains a piece of data, and Node organizes the data into branches.
In most languages that support algebraic data types, it is possible to define parametric types. Examples are given later in this article.
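The BinaryTree declaration above is already parametric in its element type. As a rough sketch of what that buys, the functions below work for any element type that can be ordered; `insert` and `toList` are hypothetical helpers written for this example, and treating the tree as a search tree is an assumption rather than something the declaration requires.

```haskell
data BinaryTree a
  = BTNil
  | BTNode a (BinaryTree a) (BinaryTree a)

-- Insert a value, keeping smaller elements on the left and larger on the right.
-- Values already present are left alone.
insert :: Ord a => a -> BinaryTree a -> BinaryTree a
insert x BTNil = BTNode x BTNil BTNil
insert x t@(BTNode y l r)
  | x < y     = BTNode y (insert x l) r
  | x > y     = BTNode y l (insert x r)
  | otherwise = t

-- In-order traversal: left subtree, node value, right subtree.
toList :: BinaryTree a -> [a]
toList BTNil          = []
toList (BTNode x l r) = toList l ++ [x] ++ toList r
```

The same two functions can be used at `BinaryTree Int`, `BinaryTree Char`, or any other ordered element type without change.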
Somewhat similar to a function, a data constructor is applied to arguments of an appropriate type, yielding an instance of the data type to which the constructor belongs. For example, the data constructor Leaf is logically a function Int -> Tree, meaning that giving an integer as an argument to Leaf produces a value of the type Tree. As Node takes two arguments of the type Tree itself, the datatype is recursive.
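Because constructors behave like functions, they can be passed to higher-order functions. The sketch below uses the Tree type from the text; `leaves` and `combined` are hypothetical names for illustration.

```haskell
data Tree = Empty | Leaf Int | Node Tree Tree
  deriving (Show, Eq)

-- Leaf has type Int -> Tree, so it can be mapped over a list of integers.
leaves :: [Tree]
leaves = map Leaf [1, 2, 3]

-- Node has type Tree -> Tree -> Tree, so it can be used as a folding function.
combined :: Tree
combined = foldr Node Empty leaves
```

Here `combined` is the same value as `Node (Leaf 1) (Node (Leaf 2) (Node (Leaf 3) Empty))`.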
Operations on algebraic data types can be defined by using pattern matching to retrieve the arguments. For example, consider a function to find the depth of a Tree, given here in Haskell:
depth :: Tree -> Int
depth Empty = 0
depth (Leaf n) = 1
depth (Node l r) = 1 + max (depth l) (depth r)
Thus, a Tree given to depth can be constructed using any of Empty, Leaf, or Node, so the function must match each of them to deal with all cases. In the case of Node, the pattern extracts the subtrees l and r for further processing.
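Other operations follow the same shape, with one equation per constructor. The sketch below repeats depth from the text and adds a second function, `sumLeaves`, together with a small `sample` tree; both of those names are hypothetical.

```haskell
data Tree = Empty | Leaf Int | Node Tree Tree

-- The depth function from the text.
depth :: Tree -> Int
depth Empty      = 0
depth (Leaf _)   = 1
depth (Node l r) = 1 + max (depth l) (depth r)

-- Add up the integers stored in the leaves; this time the Leaf pattern
-- binds n because the value is actually used.
sumLeaves :: Tree -> Int
sumLeaves Empty      = 0
sumLeaves (Leaf n)   = n
sumLeaves (Node l r) = sumLeaves l + sumLeaves r

-- A small tree to try the functions on.
sample :: Tree
sample = Node (Node (Leaf 1) Empty) (Leaf 5)
```

For this tree, `depth sample` is 3 and `sumLeaves sample` is 6.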
Algebraic data types are highly suited to implementing abstract syntax. For example, the following algebraic data type describes a simple language representing numerical expressions:
data Expression = Number Int
                | Add Expression Expression
                | Minus Expression Expression
                | Mult Expression Expression
                | Divide Expression Expression
An element of such a data type would have a form such as Mult (Add (Number 4) (Minus (Number 0) (Number 1))) (Number 2).
Writing an evaluation function for this language is a simple exercise; however, more complex transformations also become feasible. For example, an optimization pass in a compiler might be written as a function taking an abstract expression as input and returning an optimized form.
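One possible shape for such an evaluation function, and for a very small optimization pass, is sketched below. The names `evaluate` and `simplify` are hypothetical, and two design choices are assumptions made for this example: division is integer division via `div`, and division by zero is reported as `Nothing` rather than raising an error.

```haskell
data Expression
  = Number Int
  | Add Expression Expression
  | Minus Expression Expression
  | Mult Expression Expression
  | Divide Expression Expression
  deriving (Show, Eq)

-- Evaluate an expression; Nothing signals a division by zero somewhere inside.
evaluate :: Expression -> Maybe Int
evaluate (Number n)   = Just n
evaluate (Add a b)    = (+) <$> evaluate a <*> evaluate b
evaluate (Minus a b)  = (-) <$> evaluate a <*> evaluate b
evaluate (Mult a b)   = (*) <$> evaluate a <*> evaluate b
evaluate (Divide a b) = do
  x <- evaluate a
  y <- evaluate b
  if y == 0 then Nothing else Just (x `div` y)

-- A toy optimization pass: remove additions of 0 and multiplications by 1.
simplify :: Expression -> Expression
simplify (Add a b) = case (simplify a, simplify b) of
  (Number 0, e) -> e
  (e, Number 0) -> e
  (a', b')      -> Add a' b'
simplify (Mult a b) = case (simplify a, simplify b) of
  (Number 1, e) -> e
  (e, Number 1) -> e
  (a', b')      -> Mult a' b'
simplify (Minus a b)  = Minus (simplify a) (simplify b)
simplify (Divide a b) = Divide (simplify a) (simplify b)
simplify e            = e
```

With these definitions, evaluating the example expression from the text, `Mult (Add (Number 4) (Minus (Number 0) (Number 1))) (Number 2)`, gives `Just 6`, and `simplify (Add (Number 0) (Mult (Number 1) (Number 5)))` reduces to `Number 5`.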
Explanation
The datatype here can be ''one of several types of things''. Each ''type of thing'' is associated with an identifier called a ''constructor'', which can be viewed as a tag for that kind of data. Each constructor can carry a different type of data. A constructor could carry no data (e.g., "Empty" in the example above), one piece of data (e.g., "Leaf" has one Int value), or multiple pieces of data (e.g., "Node" has two Tree values).
To do something with a value of this Tree algebraic data type, it is ''deconstructed'' using a process termed ''pattern matching''. It involves ''matching'' the data with a series of ''patterns''. The example function "depth" above pattern-matches its argument with three patterns. When the function is called, it finds the first pattern that matches its argument, performs any variable bindings that are found in the pattern, and evaluates the expression corresponding to the pattern.
Each pattern above has a form that resembles the structure of some possible value of this datatype. The first pattern simply matches values of the constructor ''Empty''. The second pattern matches values of the constructor ''Leaf''. Patterns are recursive, so the data associated with that constructor is in turn matched against the pattern "n". A lowercase identifier is a pattern that matches any value and binds it to a variable of that name; here, the variable "n" is bound to the integer stored in the ''Leaf'' value and can be used in the expression to evaluate.
The recursion in the patterns of this example is trivial, but a more complex recursive pattern would be something like Node (Node (Leaf 4) x) (Node y (Node Empty z)). Recursive patterns several layers deep are used, for example, in balancing red–black trees, which involve cases that require looking at colors several layers deep.
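A small sketch of nested patterns on the Tree type from the text: `rotateRight` looks two constructors deep to re-associate a left-leaning node, and `startsWithFour` matches a literal inside a nested position. Both functions are hypothetical and are far simpler than real red–black tree balancing.

```haskell
data Tree = Empty | Leaf Int | Node Tree Tree
  deriving (Show, Eq)

-- Match a Node whose left child is itself a Node, and regroup the three subtrees.
-- Any other shape is returned unchanged.
rotateRight :: Tree -> Tree
rotateRight (Node (Node a b) c) = Node a (Node b c)
rotateRight t                   = t

-- Match only trees whose left child is exactly Leaf 4.
startsWithFour :: Tree -> Bool
startsWithFour (Node (Leaf 4) _) = True
startsWithFour _                 = False
```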
The example above is operationally equivalent to the following pseudocode:
switch on (data.constructor)
  case Empty:
    return 0
  case Leaf:
    let n = data.field1
    return 1
  case Node:
    let l = data.field1
    let r = data.field2
    return 1 + max (depth l) (depth r)
Comparing this with pattern matching points out some of the advantages of algebraic data types and pattern matching. The first advantage is type safety. The pseudocode above relies on the diligence of the programmer to not access field2 when the constructor is a Leaf, for example. Also, the type of field1 is different for Leaf and Node (for Leaf it is Int; for Node it is Tree), so the type system would have difficulties assigning a static type to it in a safe way in a traditional record data structure. However, in pattern matching, the type of each extracted value is checked based on the types declared by the relevant constructor, and how many values can be extracted is known based on the constructor, so it does not face these problems.
Second, in pattern matching, the compiler statically checks that all cases are handled. If one of the cases of the ''depth'' function above were missing, the compiler would issue a warning, indicating that a case is not handled. This task may seem easy for the simple patterns above, but with many complex recursive patterns, the task becomes difficult for the average human (or compiler, if it must check arbitrary nested if-else constructs) to handle. Similarly, there may be patterns which never match (i.e., are already covered by prior patterns), and the compiler can also check and issue warnings for these, as they may indicate an error in reasoning.
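As an illustration of the exhaustiveness check, the function below handles all three constructors of Tree. The comments describe what GHC is expected to report if the definition is altered; this assumes compilation with GHC and its `-Wincomplete-patterns` flag (included in `-Wall`), and `leafValues` is a hypothetical name.

```haskell
data Tree = Empty | Leaf Int | Node Tree Tree

-- Collect the integers stored in the leaves, left to right.
leafValues :: Tree -> [Int]
leafValues Empty      = []
leafValues (Leaf n)   = [n]
leafValues (Node l r) = leafValues l ++ leafValues r
-- If one of the three equations above were deleted, GHC with
-- -Wincomplete-patterns would warn that the patterns are non-exhaustive
-- and name the constructor that is not matched.
-- If a catch-all equation such as `leafValues _ = []` were placed first,
-- the later equations could never match, and GHC would flag them as redundant.
```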
Do not confuse these patterns with regular expression patterns used in string pattern matching. The purpose is similar: to check whether a piece of data matches certain constraints, and if so, extract relevant parts of it for processing. However, the mechanism is very different. This kind of pattern matching on algebraic data types matches on the structural properties of an object rather than on the character sequence of strings.
Theory
A general algebraic data type is a possibly recursive sum type of product types. Each constructor tags a product type to separate it from others, or if there is only one constructor, the data type is a product type. Further, the parameter types of a constructor are the factors of the product type. A parameterless constructor corresponds to the empty product. If a datatype is recursive, the entire sum of products is wrapped in a recursive type, and each constructor also rolls the datatype into the recursive type.
For example, the Haskell datatype:
data List a = Nil | Cons a (List a)
is represented in type theory as $\lambda \alpha. \mu \beta. 1 + \alpha \times \beta$ with constructors $\mathrm{nil}_\alpha = \mathrm{roll}\ (\mathrm{inl}\ \langle\rangle)$ and $\mathrm{cons}_\alpha\ x\ l = \mathrm{roll}\ (\mathrm{inr}\ \langle x, l\rangle)$.
The Haskell List datatype can also be represented in type theory in a slightly different form, thus: $\mu \phi. \lambda \alpha. 1 + \alpha \times \phi\ \alpha$.
(Note how the $\mu$ and $\lambda$ constructs are reversed relative to the original.) The original formation specified a type function whose body was a recursive type. The revised version specifies a recursive function on types. (The type variable $\phi$ is used to suggest a function rather than a ''base type'' like $\beta$, since $\phi$ is like a Greek ''f''.) The function $\phi$ must also now be applied to its argument type $\alpha$ in the body of the type.
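The idea of wrapping a non-recursive sum of products in a recursive type can be mimicked directly in Haskell. The sketch below is one common encoding rather than how Haskell itself represents lists: `Mu` plays the role of the recursive-type wrapper (usually written μ in type theory), its constructor `Roll` does the rolling, and `ListF` is the non-recursive body. All of these names, and the helper `toBuiltin`, are hypothetical.

```haskell
-- Recursive-type wrapper: ties the knot for a type-level function f.
newtype Mu f = Roll (f (Mu f))

-- Non-recursive sum of products: either the empty product (NilF),
-- or a pair of an element and a placeholder b for "the rest".
data ListF a b = NilF | ConsF a b

-- The list type is obtained by wrapping the body in the recursive type.
type List a = Mu (ListF a)

-- Each constructor builds a variant of the sum and rolls it into the recursive type.
nil :: List a
nil = Roll NilF

cons :: a -> List a -> List a
cons x l = Roll (ConsF x l)

-- Unrolling by pattern matching recovers an ordinary Haskell list.
toBuiltin :: List a -> [a]
toBuiltin (Roll NilF)        = []
toBuiltin (Roll (ConsF x l)) = x : toBuiltin l
```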
For the purposes of the List example, these two formulations are not significantly different; but the second form allows expressing so-called nested data types, i.e., those where the recursive type differs parametrically from the original. (For more information on nested data types, see the works of Richard Bird, Lambert Meertens, and Ross Paterson.)
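A minimal sketch of a nested data type in Haskell: in the hypothetical type `Nested` below, the recursive occurrence is at the type `(a, a)` rather than `a`, so each layer stores twice as many elements as the one before. The function `size` needs its explicit type signature, because the recursive call is made at a different type than the one it was called with.

```haskell
-- The recursive occurrence uses (a, a) instead of a.
data Nested a = Flat | Layer a (Nested (a, a))

-- Count the elements of type a. The rest of the structure holds pairs,
-- so each of its elements counts twice.
size :: Nested a -> Int
size Flat           = 0
size (Layer _ rest) = 1 + 2 * size rest

-- One element, then one pair, then one pair of pairs: 1 + 2 + 4 elements.
example :: Nested Int
example = Layer 1 (Layer (2, 3) (Layer ((4, 5), (6, 7)) Flat))
```

For this value, `size example` is 7.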
In set theory the equivalent of a sum type is a disjoint union, a set whose elements are pairs consisting of a tag (equivalent to a constructor) and an object of a type corresponding to the tag (equivalent to the constructor arguments).
Programming languages with algebraic data types
Many programming languages incorporate algebraic data types as a first-class notion.
See also
* Disjoint union
* Generalized algebraic data type
* Initial algebra
* Quotient type
* Tagged union
* Type theory
* Visitor pattern