In
functional programming
In computer science, functional programming is a programming paradigm where programs are constructed by Function application, applying and Function composition (computer science), composing Function (computer science), functions. It is a declar ...
, a generalized algebraic data type (GADT, also first-class phantom type, guarded recursive datatype, or equality-qualified type) is a generalization of
parametric algebraic data type
In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types.
Two common classes of algebraic types are product types (i.e., t ...
s.
Overview
In a GADT, the product constructors (called
data constructor
In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types.
Two common classes of algebraic types are product types (i.e., t ...
s in
Haskell
Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
) can provide an explicit instantiation of the ADT as the type instantiation of their return value. This allows defining functions with a more advanced type behaviour. For a data constructor of Haskell 2010, the return value has the type instantiation implied by the instantiation of the ADT parameters at the constructor's application.
-- A parametric ADT that is not a GADT
data List a = Nil , Cons a (List a)
integers = Cons 12 (Cons 107 Nil) -- the type of integers is List Int
strings = Cons "boat" (Cons "dock" Nil) -- the type of strings is List String
-- A GADT
data Expr a where
EBool :: Bool -> Expr Bool
EInt :: Int -> Expr Int
EEqual :: Expr Int -> Expr Int -> Expr Bool
eval :: Expr a -> a
eval e = case e of
EBool a -> a
EInt a -> a
EEqual a b -> (eval a) (eval b)
expr1 = EEqual (EInt 2) (EInt 3) -- the type of expr1 is Expr Bool
ret = eval expr1 -- ret is False
They are currently implemented in the
GHC compiler as a non-standard extension, used by, among others,
Pugs
The Pug is a breed of dog originally from China, with physically distinctive features of a wrinkly, short-muzzled face and curled tail. The breed has a fine, glossy coat that comes in a variety of colors, most often light brown (fawn) or bla ...
and
Darcs.
OCaml
OCaml ( , formerly Objective Caml) is a general-purpose programming language, general-purpose, multi-paradigm programming language which extends the Caml dialect of ML (programming language), ML with object-oriented programming, object-oriented ...
supports GADT natively since version 4.00.
The GHC implementation provides support for existentially quantified type parameters and for local constraints.
History
An early version of generalized algebraic data types were described by and based on
pattern matching
In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be ...
in
ALF.
Generalized algebraic data types were introduced independently by and prior by as extensions to
ML's and
Haskell
Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
's
algebraic data type
In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types.
Two common classes of algebraic types are product types (i.e., t ...
s. Both are essentially equivalent to each other. They are similar to the ''
inductive families of data types'' (or ''inductive datatypes'') found in
Coq
Coq is an interactive theorem prover first released in 1989. It allows for expressing mathematical assertions, mechanically checks proofs of these assertions, helps find formal proofs, and extracts a certified program from the constructive proof ...
's
Calculus of Inductive Constructions
In Uzbekistan, mathematical logic and computer science, the calculus of constructions (CoC) is a type theory created by Thierry Coquand. It can serve as both a typed programming language and as Constructivism (mathematics), constructive Foundations ...
and other
dependently typed language
In computer science and logic, a dependent type is a type whose definition depends on a value. It is an overlapping feature of type theory and type systems. In intuitionistic type theory, dependent types are used to encode logic's Generalized quan ...
s, modulo the dependent types and except that the latter have an additional
positivity restriction which is not enforced in GADTs.
introduced ''extended algebraic data types'' which combine GADTs together with the
existential data types and
type class
In computer science, a type class is a type system construct that supports ad hoc polymorphism. This is achieved by adding constraints to type variables in parametrically polymorphic types. Such a constraint typically involves a type class T and ...
constraints introduced by , and .
Type inference
Type inference refers to the automatic detection of the type of an expression in a formal language. These include programming languages and mathematical type systems, but also natural languages in some branches of computer science and linguistics ...
in the absence of any programmer supplied
type annotation
In computer science, a type signature or type annotation defines the inputs and outputs for a function, subroutine or method. A type signature includes the number, types, and order of the arguments contained by a function. A type signature is ty ...
s is
undecidable and functions defined over GADTs do not admit
principal type In type theory, a type system is said to have the principal type property if, given a term and an environment, there exists a principal type for this term in this environment, i.e. a type such that all other types for this term in this environment a ...
s in general.
Type reconstruction
Type may refer to:
Science and technology Computing
* Typing, producing text via a keyboard, typewriter, etc.
* Data type, collection of values used for computations.
* File type
* TYPE (DOS command), a command to display contents of a file.
* Ty ...
requires several design trade-offs and is an area of active research (; ; ; ; ; ; ; ; ; ).
In spring 2021, Scala 3.0 is released.
This major update of
Scala introduce the possibility to write GADTs with the same syntax as
ADTs, which is not the case in other
programming languages
A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language.
The description of a programming ...
according to
Martin Odersky
Martin Odersky (born 5 September 1958) is a German computer scientist and professor of programming methods at École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland. He specializes in code analysis and programming languages. He designed ...
.
Applications
Applications of GADTs include
generic programming
Generic programming is a style of computer programming in which algorithms are written in terms of types ''to-be-specified-later'' that are then ''instantiated'' when needed for specific types provided as parameters. This approach, pioneered b ...
, modelling programming languages (
higher-order abstract syntax
In computer science, higher-order abstract syntax (abbreviated HOAS) is a technique for the representation of abstract syntax trees for languages with variable binders.
Relation to first-order abstract syntax
An abstract syntax is ''abstract'' be ...
), maintaining
invariant
Invariant and invariance may refer to:
Computer science
* Invariant (computer science), an expression whose value doesn't change during program execution
** Loop invariant, a property of a program loop that is true before (and after) each iteratio ...
s in
data structure
In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
s, expressing constraints in
embedded domain-specific language
A domain-specific language (DSL) is a computer language specialized to a particular application domain. This is in contrast to a general-purpose language (GPL), which is broadly applicable across domains. There are a wide variety of DSLs, ranging ...
s, and modelling objects.
Higher-order abstract syntax
An important application of GADTs is to embed
higher-order abstract syntax
In computer science, higher-order abstract syntax (abbreviated HOAS) is a technique for the representation of abstract syntax trees for languages with variable binders.
Relation to first-order abstract syntax
An abstract syntax is ''abstract'' be ...
in a
type safe
In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is ...
fashion. Here is an embedding of the
simply typed lambda calculus
The simply typed lambda calculus (\lambda^\to), a form
of type theory, is a typed interpretation of the lambda calculus with only one type constructor (\to) that builds function types. It is the canonical and simplest example of a typed lambda cal ...
with an arbitrary collection of
base types,
tuple
In mathematics, a tuple is a finite ordered list (sequence) of elements. An -tuple is a sequence (or ordered list) of elements, where is a non-negative integer. There is only one 0-tuple, referred to as ''the empty tuple''. An -tuple is defi ...
s and a
fixed point combinator
In mathematics and computer science in general, a '' fixed point'' of a function is a value that is mapped to itself by the function.
In combinatory logic for computer science, a fixed-point combinator (or fixpoint combinator) is a higher-order f ...
:
data Lam :: * -> * where
Lift :: a -> Lam a -- ^ lifted value
Pair :: Lam a -> Lam b -> Lam (a, b) -- ^ product
Lam :: (Lam a -> Lam b) -> Lam (a -> b) -- ^ lambda abstraction
App :: Lam (a -> b) -> Lam a -> Lam b -- ^ function application
Fix :: Lam (a -> a) -> Lam a -- ^ fixed point
And a type safe evaluation function:
eval :: Lam t -> t
eval (Lift v) = v
eval (Pair l r) = (eval l, eval r)
eval (Lam f) = \x -> eval (f (Lift x))
eval (App f x) = (eval f) (eval x)
eval (Fix f) = (eval f) (eval (Fix f))
The factorial function can now be written as:
fact = Fix (Lam (\f -> Lam (\y -> Lift (if eval y 0 then 1 else eval y * (eval f) (eval y - 1)))))
eval(fact)(10)
We would have run into problems using regular algebraic data types. Dropping the type parameter would have made the lifted base types existentially quantified, making it impossible to write the evaluator. With a type parameter we would still be restricted to a single base type. Furthermore, ill-formed expressions such as
App (Lam (\x -> Lam (\y -> App x y))) (Lift True)
would have been possible to construct, while they are type incorrect using the GADT. A well-formed analogue is
App (Lam (\x -> Lam (\y -> App x y))) (Lift (\z -> True))
. This is because the type of
x
is
Lam (a -> b)
, inferred from the type of the
Lam
data constructor.
See also
*
Type variable In type theory and programming languages, a type variable is a mathematical variable ranging over types. Even in programming languages that allow mutable variables, a type variable remains an abstraction, in the sense that it does not correspond t ...
Notes
Further reading
; Applications
*
*
*
*
; Semantics
* Patricia Johann and Neil Ghani (2008).
Foundations for Structured Programming with GADTs.
* Arie Middelkoop, Atze Dijkstra and S. Doaitse Swierstra (2011).
A lean specification for GADTs: system F with first-class equality proofs. ''Higher-Order and Symbolic Computation''.
; Type reconstruction
*
*
*
*
*
*
; Other
* Andrew Kennedy and Claudio V. Russo.
Generalized algebraic data types and object-oriented programming. ''In Proceedings of the 20th annual ACM SIGPLAN conference on Object oriented programming, systems, languages, and applications''. ACM Press, 2005.
External links
Generalised Algebraic Datatype Pageon the Haskell
wiki
A wiki ( ) is an online hypertext publication collaboratively edited and managed by its own audience, using a web browser. A typical wiki contains multiple pages for the subjects or scope of the project, and could be either open to the pu ...
Generalised Algebraic Data Typesin the GHC Users' Guide
Generalized Algebraic Data Types and Object-Oriented ProgrammingGADTs – Haskell Prime – TracPapers about type inference for GADTs bibliography by
Simon Peyton Jones
Simon Peyton Jones (born 18 January 1958) is a British computer scientist who researches the implementation and applications of functional programming languages, particularly lazy functional programming.
Education
Peyton Jones graduated fr ...
Type inference with constraints bibliography by
Simon Peyton Jones
Simon Peyton Jones (born 18 January 1958) is a British computer scientist who researches the implementation and applications of functional programming languages, particularly lazy functional programming.
Education
Peyton Jones graduated fr ...
Emulating GADTs in Java via the Yoneda lemma
{{data types
Functional programming
Dependently typed programming
Type theory
Composite data types
Data types