Algebraic data type
   HOME

TheInfoList



OR:

In
computer programming Computer programming or coding is the composition of sequences of instructions, called computer program, programs, that computers can follow to perform tasks. It involves designing and implementing algorithms, step-by-step specifications of proc ...
, especially
functional programming In computer science, functional programming is a programming paradigm where programs are constructed by Function application, applying and Function composition (computer science), composing Function (computer science), functions. It is a declarat ...
and
type theory In mathematics and theoretical computer science, a type theory is the formal presentation of a specific type system. Type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a foundation of ...
, an algebraic data type (ADT) is a kind of composite data type, i.e., a
data type In computer science and computer programming, a data type (or simply type) is a collection or grouping of data values, usually specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these ...
formed by combining other types. Two common classes of algebraic types are product types (i.e.,
tuple In mathematics, a tuple is a finite sequence or ''ordered list'' of numbers or, more generally, mathematical objects, which are called the ''elements'' of the tuple. An -tuple is a tuple of elements, where is a non-negative integer. There is o ...
s, and records) and sum types (i.e., tagged or disjoint unions, coproduct types or ''variant types''). The
values In ethics and social sciences, value denotes the degree of importance of some thing or action, with the aim of determining which actions are best to do or what way is best to live ( normative ethics), or to describe the significance of different a ...
of a product type typically contain several values, called ''fields''. All values of that type have the same combination of field types. The set of all possible values of a product type is the set-theoretic product, i.e., the
Cartesian product In mathematics, specifically set theory, the Cartesian product of two sets and , denoted , is the set of all ordered pairs where is an element of and is an element of . In terms of set-builder notation, that is A\times B = \. A table c ...
, of the sets of all possible values of its field types. The values of a sum type are typically grouped into several classes, called ''variants''. A value of a variant type is usually created with a quasi-functional entity called a ''constructor''. Each variant has its own constructor, which takes a specified number of arguments with specified types. The set of all possible values of a sum type is the set-theoretic sum, i.e., the
disjoint union In mathematics, the disjoint union (or discriminated union) A \sqcup B of the sets and is the set formed from the elements of and labelled (indexed) with the name of the set from which they come. So, an element belonging to both and appe ...
, of the sets of all possible values of its variants.
Enumerated type In computer programming, an enumerated type (also called enumeration, enum, or factor in the R (programming language), R programming language, a status variable in the JOVIAL programming language, and a categorical variable in statistics) is a data ...
s are a special case of sum types in which the constructors take no arguments, as exactly one value is defined for each constructor. Values of algebraic types are analyzed with
pattern matching In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually must be exact: "either it will or will not be a ...
, which identifies a value by its constructor or field names and extracts the data it contains.


History

Algebraic data types were introduced in Hope, a small functional
programming language A programming language is a system of notation for writing computer programs. Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
developed in the 1970s at the
University of Edinburgh The University of Edinburgh (, ; abbreviated as ''Edin.'' in Post-nominal letters, post-nominals) is a Public university, public research university based in Edinburgh, Scotland. Founded by the City of Edinburgh Council, town council under th ...
.


Examples


Singly linked list

One of the most common examples of an algebraic data type is the singly linked list. A list type is a sum type with two variants, Nil for an empty list and Cons ''x'' ''xs'' for the combination of a new element ''x'' with a list ''xs'' to create a new list. Here is an example of how a singly linked list would be declared in
Haskell Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...
: data List a = Nil , Cons a (List a) or data [] a = [] , a : [a] Cons is an abbreviation of ''cons''truct. Many languages have special syntax for lists defined in this way. For example, Haskell and ML (programming language), ML use [] for Nil, : or :: for Cons, respectively, and square brackets for entire lists. So Cons 1 (Cons 2 (Cons 3 Nil)) would normally be written as 1:2:3:[] or [1,2,3] in Haskell, or as 1::2::3::[] or [1,2,3] in ML.


Binary tree

For a slightly more complex example, binary trees may be implemented in Haskell as follows: data Tree = Empty , Leaf Int , Node Int Tree Tree or data BinaryTree a = BTNil , BTNode a (BinaryTree a) (BinaryTree a) Here, Empty represents an empty tree, Leaf represents a leaf node, and Node organizes the data into branches. In most languages that support algebraic data types, it is possible to define parametric types. Examples are given later in this article. Somewhat similar to a function, a data constructor is applied to arguments of an appropriate type, yielding an instance of the data type to which the type constructor belongs. For example, the data constructor Leaf is logically a function Int -> Tree, meaning that giving an integer as an argument to Leaf produces a value of the type Tree. As Node takes two arguments of the type Tree itself, the datatype is recursive. Operations on algebraic data types can be defined by using
pattern matching In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually must be exact: "either it will or will not be a ...
to retrieve the arguments. For example, consider a function to find the depth of a Tree, given here in Haskell: depth :: Tree -> Int depth Empty = 0 depth (Leaf n) = 1 depth (Node n l r) = 1 + max (depth l) (depth r) Thus, a Tree given to depth can be constructed using any of Empty, Leaf, or Node and must be matched for any of them respectively to deal with all cases. In case of Node, the pattern extracts the subtrees l and r for further processing.


Abstract syntax

Algebraic data types are highly suited to implementing abstract syntax. For example, the following algebraic data type describes a simple language representing numerical expressions: data Expression = Number Int , Add Expression Expression , Minus Expression Expression , Mult Expression Expression , Divide Expression Expression An element of such a data type would have a form such as Mult (Add (Number 4) (Minus (Number 0) (Number 1))) (Number 2). Writing an evaluation function for this language is a simple exercise; however, more complex transformations also become feasible. For example, an optimization pass in a compiler might be written as a function taking an abstract expression as input and returning an optimized form.


Pattern matching

Algebraic data types are used to represent values that can be one of several ''types of things''. Each type of thing is associated with an identifier called a ''constructor'', which can be considered a tag for that kind of data. Each constructor can carry with it a different type of data. For example, considering the binary Tree example shown above, a constructor could carry no data (e.g., Empty), or one piece of data (e.g., Leaf has one Int value), or multiple pieces of data (e.g., Node has one Int value and two Tree values). To do something with a value of this Tree algebraic data type, it is ''deconstructed'' using a process called
pattern matching In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually must be exact: "either it will or will not be a ...
. This involves matching the data with a series of ''patterns''. The example function depth above pattern-matches its argument with three patterns. When the function is called, it finds the first pattern that matches its argument, performs any variable bindings that are found in the pattern, and evaluates the expression corresponding to the pattern. Each pattern above has a form that resembles the structure of some possible value of this datatype. The first pattern simply matches values of the constructor Empty. The second pattern matches values of the constructor Leaf. Patterns are recursive, so then the data that is associated with that constructor is matched with the pattern "n". In this case, a lowercase identifier represents a pattern that matches any value, which then is bound to a variable of that name — in this case, a variable "n" is bound to the integer value stored in the data type — to be used in the expression to evaluate. The recursion in patterns in this example are trivial, but a possible more complex recursive pattern would be something like: Node i (Node j (Leaf 4) x) (Node k y (Node Empty z)) Recursive patterns several layers deep are used for example in balancing red–black trees, which involve cases that require looking at colors several layers deep. The example above is operationally equivalent to the following pseudocode: switch on (data.constructor) case Empty: return 0 case Leaf: let n = data.field1 return 1 case Node: let l = data.field2 let r = data.field3 return 1 + max (depth l) (depth r) The advantages of algebraic data types can be highlighted by comparison of the above pseudocode with a pattern matching equivalent. Firstly, there is type safety. In the pseudocode example above, programmer diligence is required to not access when the constructor is a Leaf. The type system would have difficulties assigning a static type in a safe way for traditional record data structures. However, in pattern matching such problems are not faced. The type of each extracted value is based on the types declared by the relevant constructor. The number of values that can be extracted is known based on the constructor. Secondly, in pattern matching, the compiler performs exhaustiveness checking to ensure all cases are handled. If one of the cases of the ''depth'' function above were missing, the compiler would issue a warning. Exhaustiveness checking may seem easy for simple patterns, but with many complex recursive patterns, the task soon becomes difficult for the average human (or compiler, if it must check arbitrary nested if-else constructs). Similarly, there may be patterns which never match (i.e., are already covered by prior patterns). The compiler can also check and issue warnings for these, as they may indicate an error in reasoning. Algebraic data type pattern matching should not be confused with
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
string pattern matching. The purpose of both is similar (to extract parts from a piece of data matching certain constraints) however, the implementation is very different. Pattern matching on algebraic data types matches on the structural properties of an object rather than on the character sequence of strings.


Theory

A general algebraic data type is a possibly recursive sum type of product types. Each constructor tags a product type to separate it from others, or if there is only one constructor, the data type is a product type. Further, the parameter types of a constructor are the factors of the product type. A parameterless constructor corresponds to the
empty product In mathematics, an empty product, or nullary product or vacuous product, is the result of multiplication, multiplying no factors. It is by convention equal to the multiplicative identity (assuming there is an identity for the multiplication operat ...
. If a datatype is recursive, the entire sum of products is wrapped in a recursive type, and each constructor also rolls the datatype into the recursive type. For example, the Haskell datatype: data List a = Nil , Cons a (List a) is represented in
type theory In mathematics and theoretical computer science, a type theory is the formal presentation of a specific type system. Type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a foundation of ...
as \lambda \alpha. \mu \beta. 1 + \alpha \times \beta with constructors \mathrm_\alpha = \mathrm\ (\mathrm\ \langle\rangle) and \mathrm_\alpha\ x\ l = \mathrm\ (\mathrm\ \langle x, l\rangle). The Haskell List datatype can also be represented in type theory in a slightly different form, thus: \mu \phi. \lambda \alpha. 1 + \alpha \times \phi\ \alpha. (Note how the \mu and \lambda constructs are reversed relative to the original.) The original formation specified a type function whose body was a recursive type. The revised version specifies a recursive function on types. (The type variable \phi is used to suggest a function rather than a ''base type'' like \beta, since \phi is like a Greek ''f''.) The function must also now be applied \phi to its argument type \alpha in the body of the type. For the purposes of the List example, these two formulations are not significantly different; but the second form allows expressing so-called nested data types, i.e., those where the recursive type differs parametrically from the original. (For more information on nested data types, see the works of Richard Bird, Lambert Meertens, and Ross Paterson.) In
set theory Set theory is the branch of mathematical logic that studies Set (mathematics), sets, which can be informally described as collections of objects. Although objects of any kind can be collected into a set, set theory – as a branch of mathema ...
the equivalent of a sum type is a
disjoint union In mathematics, the disjoint union (or discriminated union) A \sqcup B of the sets and is the set formed from the elements of and labelled (indexed) with the name of the set from which they come. So, an element belonging to both and appe ...
, a set whose elements are pairs consisting of a tag (equivalent to a constructor) and an object of a type corresponding to the tag (equivalent to the constructor arguments).


Programming languages with algebraic data types

Many programming languages incorporate algebraic data types as a first class notion, including:


See also

*
Disjoint union In mathematics, the disjoint union (or discriminated union) A \sqcup B of the sets and is the set formed from the elements of and labelled (indexed) with the name of the set from which they come. So, an element belonging to both and appe ...
*
Generalized algebraic data type In functional programming, a generalized algebraic data type (GADT, also first-class phantom type, guarded recursive datatype, or equality-qualified type) is a generalization of a Parametric polymorphism, parametric algebraic data type (ADT). Ove ...
*
Initial algebra In mathematics, an initial algebra is an initial object in the Category (mathematics), category of F-algebra, -algebras for a given endofunctor . This initiality provides a general framework for mathematical induction, induction and recursion. ...
* Quotient type * Tagged union *
Type theory In mathematics and theoretical computer science, a type theory is the formal presentation of a specific type system. Type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a foundation of ...
* Visitor pattern


References

{{Data types Functional programming Type theory Data types Composite data types Articles with example Haskell code