HOME

TheInfoList



OR:

In
programming language theory Programming language theory (PLT) is a branch of computer science that deals with the design, implementation, analysis, characterization, and classification of formal languages known as programming languages. Programming language theory is clos ...
, subtyping (also subtype polymorphism or inclusion polymorphism) is a form of
type polymorphism In programming language theory and type theory, polymorphism is the provision of a single interface to entities of different types or the use of a single symbol to represent multiple different types.: "Polymorphic types are types whose operati ...
in which a subtype is a
datatype In computer science and computer programming, a data type (or simply type) is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most progra ...
that is related to another datatype (the supertype) by some notion of
substitutability The Liskov substitution principle (LSP) is a particular definition of a subtyping relation, called strong behavioral subtyping, that was initially introduced by Barbara Liskov in a 1988 conference keynote address titled ''Data abstraction and ...
, meaning that program elements, typically
subroutines In computer programming, a function or subroutine is a sequence of program instructions that performs a specific task, packaged as a unit. This unit can then be used in programs wherever that particular task should be performed. Functions may ...
or functions, written to operate on elements of the supertype can also operate on elements of the subtype. If S is a subtype of T, the subtyping relation (written as ,  , or   ) means that any term of type S can ''safely be used'' in ''any context'' where a term of type T is expected. The precise semantics of subtyping here crucially depends on the particulars of how ''"safely be used"'' and ''"any context"'' are defined by a given type formalism or
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
. The
type system In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progra ...
of a programming language essentially defines its own subtyping relation, which may well be
trivial Trivia is information and data that are considered to be of little value. It can be contrasted with general knowledge and common sense. Latin Etymology The ancient Romans used the word ''triviae'' to describe where one road split or fork ...
, should the language support no (or very little) conversion mechanisms. Due to the subtyping relation, a term may belong to more than one type. Subtyping is therefore a form of type polymorphism. In
object-oriented programming Object-oriented programming (OOP) is a programming paradigm based on the concept of "objects", which can contain data and code. The data is in the form of fields (often known as attributes or ''properties''), and the code is in the form of ...
the term 'polymorphism' is commonly used to refer solely to this ''subtype polymorphism'', while the techniques of parametric polymorphism would be considered '' generic programming''. Functional programming languages often allow the subtyping of records. Consequently,
simply typed lambda calculus The simply typed lambda calculus (\lambda^\to), a form of type theory, is a typed interpretation of the lambda calculus with only one type constructor (\to) that builds function types. It is the canonical and simplest example of a typed lambda c ...
extended with record types is perhaps the simplest theoretical setting in which a useful notion of subtyping may be defined and studied. Because the resulting calculus allows terms to have more than one type, it is no longer a "simple"
type theory In mathematics, logic, and computer science, a type theory is the formal presentation of a specific type system, and in general type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a founda ...
. Since functional programming languages, by definition, support function literals, which can also be stored in records, records types with subtyping provide some of the features of object-oriented programming. Typically, functional programming languages also provide some, usually restricted, form of parametric polymorphism. In a theoretical setting, it is desirable to study the interaction of the two features; a common theoretical setting is system F<:. Various calculi that attempt to capture the theoretical properties of object-oriented programming may be derived from system F<:. The concept of subtyping is related to the linguistic notions of
hyponymy In linguistics, semantics, general semantics, and ontologies, hyponymy () is a semantic relation between a hyponym denoting a subtype and a hypernym or hyperonym (sometimes called umbrella term or blanket term) denoting a supertype. In other ...
and holonymy. It is also related to the concept of
bounded quantification In type theory, bounded quantification (also bounded polymorphism or constrained genericity) refers to universal or existential quantifiers which are restricted ("bounded") to range only over the subtypes of a particular type. Bounded quantifica ...
in mathematical logic (see
Order-sorted logic Many-sorted logic can reflect formally our intention not to handle the universe as a homogeneous collection of objects, but to partition it in a way that is similar to types in typeful programming. Both functional and assertive "parts of speech" ...
). Subtyping should not be confused with the notion of (class or object)
inheritance Inheritance is the practice of receiving private property, titles, debts, entitlements, privileges, rights, and obligations upon the death of an individual. The rules of inheritance differ among societies and have changed over time. Of ...
from object-oriented languages; subtyping is a relation between types (interfaces in object-oriented parlance) whereas inheritance is a relation between implementations stemming from a language feature that allows new objects to be created from existing ones. In a number of object-oriented languages, subtyping is called interface inheritance, with inheritance referred to as ''implementation inheritance''.


Origins

The notion of subtyping in programming languages dates back to the 1960s; it was introduced in
Simula Simula is the name of two simulation programming languages, Simula I and Simula 67, developed in the 1960s at the Norwegian Computing Center in Oslo, by Ole-Johan Dahl and Kristen Nygaard. Syntactically, it is an approximate superset of ALGO ...
derivatives. The first formal treatments of subtyping were given by
John C. Reynolds John Charles Reynolds (June 1, 1935 – April 28, 2013) was an American computer scientist. Education and affiliations John Reynolds studied at Purdue University and then earned a Doctor of Philosophy (Ph.D.) in theoretical physics from Harvard U ...
in 1980 who used
category theory Category theory is a general theory of mathematical structures and their relations that was introduced by Samuel Eilenberg and Saunders Mac Lane in the middle of the 20th century in their foundational work on algebraic topology. Nowadays, ca ...
to formalize implicit conversions, and Luca Cardelli (1985). The concept of subtyping has gained visibility (and synonymy with polymorphism in some circles) with the mainstream adoption of object-oriented programming. In this context, the principle of safe substitution is often called the
Liskov substitution principle The Liskov substitution principle (LSP) is a particular definition of a subtyping relation, called strong behavioral subtyping, that was initially introduced by Barbara Liskov in a 1988 conference keynote address titled ''Data abstraction and ...
, after
Barbara Liskov Barbara Liskov (born November 7, 1939 as Barbara Jane Huberman) is an American computer scientist who has made pioneering contributions to programming languages and distributed computing. Her notable work includes the development of the Liskov su ...
who popularized it in a
keynote A keynote in public speaking is a talk that establishes a main underlying theme. In corporate or commercial settings, greater importance is attached to the delivery of a keynote speech or keynote address. The keynote establishes the framework fo ...
address at a conference on object-oriented programming in 1987. Because it must consider mutable objects, the ideal notion of subtyping defined by Liskov and Jeannette Wing, called
behavioral subtyping In object-oriented programming, behavioral subtyping is the principle that subclasses should satisfy the expectations of clients accessing subclass objects through references of superclass type, not just as regards syntactic safety (such as the abse ...
is considerably stronger than what can be implemented in a type checker. (See below for details.)


Examples

A simple practical example of subtypes is shown in the diagram. The type "bird" has three subtypes "duck", "cuckoo" and "ostrich". Conceptually, each of these is a variety of the basic type "bird" that inherits many "bird" characteristics but has some specific differences. The UML notation is used in this diagram, with open-headed arrows showing the direction and type of the relationship between the supertype and its subtypes. As a more practical example, a language might allow integer values to be used wherever floating point values are expected (Integer <: Float), or it might define a generic type Number as a common supertype of integers and the reals. In this second case, we only have Integer <: Number and Float <: Number, but Integer and Float are not subtypes of each other. Programmers may take advantage of subtyping to write code in a more abstract manner than would be possible without it. Consider the following example: function max (x as Number, y as Number) is if x < y then return y else return x end If integer and real are both subtypes of Number, and an operator of comparison with an arbitrary Number is defined for both types, then values of either type can be passed to this function. However, the very possibility of implementing such an operator highly constrains the Number type (for example, one can't compare an integer with a complex number), and actually only comparing integers with integers, and reals with reals, makes sense. Rewriting this function so that it would only accept 'x' and 'y' of the same type requires bounded polymorphism.


Subsumption

In type theory the concept of ''subsumption''Benjamin C. Pierce, ''Types and Programming Languages'', MIT Press, 2002, 15.1 "Subsumption", p. 181-182 is used to define or evaluate whether a type S is a subtype of type T. A type is a set of values. The set can be described ''extensionally'' by listing all the values, or it can be described ''intensionally'' by stating the membership of the set by a predicate over a domain of possible values. In common programming languages enumeration types are defined extensionally by listing values. User-defined types like records (structs, interfaces) or classes are defined intensionally by an explicit type declaration or by using an existing value, which encodes type information, as a prototype to be copied or extended. In discussing the concept of subsumption, the set of values of a type is indicated by writing its name in mathematical italics: . The type, viewed as a predicate over a domain, is indicated by writing its name in bold: T. The conventional symbol <: means "is a subtype of", and :> means "is a supertype of". * A type T ''subsumes'' S if the set of values which it defines, is a superset of the set , so that every member of is also a member of . * A type may be subsumed by more than one type: the supertypes of S intersect at . * If S <: T (and therefore ), then T, the predicate which circumscribes the set , must be part of the predicate S (over the same domain) which defines . * If S subsumes T, and T subsumes S, then the two types are equal (although they may not be the same type if the type system distinguishes types by name). In terms of information specificity, a subtype is considered more specific than any one of its supertypes, because it holds at least as much information as each of them. This may increase the applicability, or ''relevance'' of the subtype (the number of situations where it can be accepted or introduced), as compared to its "more general" supertypes. The disadvantage of having this more detailed information is that it represents incorporated choices which reduce the ''prevalance'' of the subtype (the number of situations which are able to generate or produce it). In the context of subsumption, the type definitions can be expressed using
Set-builder notation In set theory and its applications to logic, mathematics, and computer science, set-builder notation is a mathematical notation for describing a set by enumerating its elements, or stating the properties that its members must satisfy. Defining ...
, which uses a predicate to define a set. Predicates can be defined over a domain (set of possible values) . Predicates are partial functions that compare values to selection criteria. For example: "is an integer value greater than or equal to 100 and less than 200?". If a value matches the criteria then the function returns the value. If not, the value is not selected, and nothing is returned. (List comprehensions are a form of this pattern used in many programming languages.) If there are two predicates, P_T which applies selection criteria for the type T, and P_s which applies additional criteria for the type S, then sets for the two types can be defined: :T = \ :S =\ The predicate \mathbf T = P_T is applied alongside P_s as part of the compound predicate S defining . The two predicates are ''conjoined'', so both must be true for a value to be selected. The predicate \mathbf S = \mathbf T \land P_s = P_T \land P_s subsumes the predicate T, so For example: there is a subfamily of cat species called ''Felinae'', which is part of the family ''Felidae''. The genus ''Felis'', to which the domestic cat species ''Felis catus'' belongs, is part of that subfamily. :\mathit :\mathit The conjunction of predicates has been expressed here through application of the second predicate over the domain of values conforming to the first predicate. Viewed as types, . If T subsumes S (T :> S) then a procedure, function or expression given a value s \in S as an operand (parameter value or term) will therefore be able to operate over that value as one of type T, because s \in T. In the example above, we could expect the function ''ofSubfamily'' to be applicable to values of all three types Felidae, Felinae and Felis.


Subtyping schemes

Type theorists make a distinction between nominal subtyping, in which only types declared in a certain way may be subtypes of each other, and structural subtyping, in which the structure of two types determines whether or not one is a subtype of the other. The class-based object-oriented subtyping described above is nominal; a structural subtyping rule for an object-oriented language might say that if objects of type ''A'' can handle all of the messages that objects of type ''B'' can handle (that is, if they define all the same
method Method ( grc, μέθοδος, methodos) literally means a pursuit of knowledge, investigation, mode of prosecuting such inquiry, or system. In recent centuries it more often means a prescribed process for completing a task. It may refer to: *Scien ...
s), then ''A'' is a subtype of ''B'' regardless of whether either inherits from the other. This so-called ''
duck typing Duck typing in computer programming is an application of the duck test—"If it walks like a duck and it quacks like a duck, then it must be a duck"—to determine whether an object can be used for a particular purpose. With nominative ty ...
'' is common in dynamically typed object-oriented languages. Sound structural subtyping rules for types other than object types are also well known. Implementations of programming languages with subtyping fall into two general classes: ''inclusive'' implementations, in which the representation of any value of type ''A'' also represents the same value at type ''B'' if ''A'' <: ''B'', and ''coercive'' implementations, in which a value of type ''A'' can be ''automatically converted'' into one of type ''B''. The subtyping induced by subclassing in an object-oriented language is usually inclusive; subtyping relations that relate integers and floating-point numbers, which are represented differently, are usually coercive. In almost all type systems that define a subtyping relation, it is reflexive (meaning ''A'' <: ''A'' for any type ''A'') and transitive (meaning that if ''A'' <: ''B'' and ''B'' <: ''C'' then ''A'' <: ''C''). This makes it a
preorder In mathematics, especially in order theory, a preorder or quasiorder is a binary relation that is reflexive and transitive. Preorders are more general than equivalence relations and (non-strict) partial orders, both of which are special cas ...
on types.


Record types


Width and depth subtyping

Types of records give rise to the concepts of ''width'' and ''depth'' subtyping. These express two different ways of obtaining a new type of record that allows the same operations as the original record type. Recall that a record is a collection of (named) fields. Since a subtype is a type which allows all operations allowed on the original type, a record subtype should support the same operations on the fields as the original type supported. One kind of way to achieve such support, called ''width subtyping'', adds more fields to the record. More formally, every (named) field appearing in the width supertype will appear in the width subtype. Thus, any operation feasible on the supertype will be supported by the subtype. The second method, called ''depth subtyping'', replaces the various fields with their subtypes. That is, the fields of the subtype are subtypes of the fields of the supertype. Since any operation supported for a field in the supertype is supported for its subtype, any operation feasible on the record supertype is supported by the record subtype. Depth subtyping only makes sense for immutable records: for example, you can assign 1.5 to the 'x' field of a real point (a record with two real fields), but you can't do the same to the 'x' field of an integer point (which, however, is a deep subtype of the real point type) because 1.5 is not an integer (see
Variance In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its population mean or sample mean. Variance is a measure of dispersion, meaning it is a measure of how far a set of numbe ...
). Subtyping of records can be defined in System F<:, which combines parametric polymorphism with subtyping of record types and is a theoretical basis for many functional programming languages that support both features. Some systems also support subtyping of labeled
disjoint union In mathematics, a disjoint union (or discriminated union) of a family of sets (A_i : i\in I) is a set A, often denoted by \bigsqcup_ A_i, with an injection of each A_i into A, such that the images of these injections form a partition of A ( ...
types (such as
algebraic data type In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types. Two common classes of algebraic types are product types (i.e., ...
s). The rule for width subtyping is reversed: every tag appearing in the width subtype must appear in the width supertype.


Function types

If is a function type, then a subtype of it is any function type with the property that and This can be summarised using the following
typing rule In type theory, a typing rule is an inference rule that describes how a type system assigns a type to a Syntax (programming languages), syntactic construction. These rules may be applied by the type system to determine if a Computer program, progr ...
: \over The parameter type of is said to be contravariant because the subtyping relation is reversed for it, whereas the return type is covariant. Informally, this reversal occurs because the refined type is "more liberal" in the types it accepts and "more conservative" in the type it returns. This is what exactly works in Scala: a ''n''-ary function is internally a class that inherits the \mathtt trait (which can be seen as a general
interface Interface or interfacing may refer to: Academic journals * ''Interface'' (journal), by the Electrochemical Society * '' Interface, Journal of Applied Linguistics'', now merged with ''ITL International Journal of Applied Linguistics'' * '' Int ...
in
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
-like languages), where \mathtt are the parameter types, and \mathtt is its return type; "−" before the type means the type is contravariant while "+" means covariant. In languages that allow side effects, like most object-oriented languages, subtyping is generally not sufficient to guarantee that a function can be safely used in the context of another. Liskov's work in this area focused on
behavioral subtyping In object-oriented programming, behavioral subtyping is the principle that subclasses should satisfy the expectations of clients accessing subclass objects through references of superclass type, not just as regards syntactic safety (such as the abse ...
, which besides the type system safety discussed in this article also requires that subtypes preserve all invariants guaranteed by the supertypes in some
contract A contract is a legally enforceable agreement between two or more parties that creates, defines, and governs mutual rights and obligations between them. A contract typically involves the transfer of goods, services, money, or a promise to tr ...
.Barbara Liskov, Jeannette Wing,
A behavioral notion of subtyping
', ACM Transactions on Programming Languages and Systems, Volume 16, Issue 6 (November 1994), pp. 1811–1841. An updated version appeared as CMU technical report:
This definition of subtyping is generally undecidable, so it cannot be verified by a type checker. The subtyping of mutable references is similar to the treatment of parameter values and return values. Write-only references (or ''sinks'') are contravariant, like parameter values; read-only references (or ''sources'') are covariant, like return values. Mutable references which act as both sources and sinks are invariant.


Relationship with inheritance

Subtyping and inheritance are independent (orthogonal) relationships. They may coincide, but none is a special case of the other. In other words, between two types ''S'' and ''T'', all combinations of subtyping and inheritance are possible: # ''S'' is neither a subtype nor a derived type of ''T'' # ''S'' is a subtype but is not a derived type of ''T'' # ''S'' is not a subtype but is a derived type of ''T'' # ''S'' is both a subtype and a derived type of ''T'' The first case is illustrated by independent types, such as Boolean and Float. The second case can be illustrated by the relationship between Int32 and Int64. In most object oriented programming languages, Int64 are unrelated by inheritance to Int32. However Int32 can be considered a subtype of Int64 since any 32 bit integer value can be promoted into a 64 bit integer value. The third case is a consequence of function subtyping input contravariance. Assume a super class of type ''T'' having a method ''m'' returning an object of the same type (''i.e.'' the type of ''m'' is ''T'' → ''T'', also note that the first parameter of ''m'' is this/self) and a derived class type ''S'' from ''T''. By inheritance, the type of ''m'' in ''S'' is ''S'' → ''S''. In order for ''S'' to be a subtype of ''T'' the type of ''m'' in ''S'' must be a subtype of the type of ''m'' in ''T'' , in other words: ''S'' → ''S'' ≤: ''T'' → ''T''. By bottom-up application of the function subtyping rule, this means: ''S'' ≤: ''T'' and ''T'' ≤: ''S'', which is only possible if ''S'' and ''T'' are the same. Since inheritance is an irreflexive relation, ''S'' can't be a subtype of ''T''. Subtyping and inheritance are compatible when all inherited fields and methods of the derived type have types which are subtypes of the corresponding fields and methods from the inherited type .


Coercions

In coercive subtyping systems, subtypes are defined by implicit
type conversion In computer science, type conversion, type casting, type coercion, and type juggling are different ways of changing an expression from one data type to another. An example would be the conversion of an integer value into a floating point valu ...
functions from subtype to supertype. For each subtyping relationship (''S'' <: ''T''), a coercion function ''coerce'': ''S'' → ''T'' is provided, and any object ''s'' of type ''S'' is regarded as the object ''coerce''''S'' → ''T''(''s'') of type ''T''. A coercion function may be defined by composition: if ''S'' <: ''T'' and ''T'' <: ''U'' then ''s'' may be regarded as an object of type ''u'' under the compound coercion (''coerce''''T'' → ''U'' ∘ ''coerce''''S'' → ''T''). The type coercion from a type to itself ''coerce''''T'' → ''T'' is the
identity function Graph of the identity function on the real numbers In mathematics, an identity function, also called an identity relation, identity map or identity transformation, is a function that always returns the value that was used as its argument, un ...
''id''T. Coercion functions for records and
disjoint union In mathematics, a disjoint union (or discriminated union) of a family of sets (A_i : i\in I) is a set A, often denoted by \bigsqcup_ A_i, with an injection of each A_i into A, such that the images of these injections form a partition of A ( ...
subtypes may be defined componentwise; in the case of width-extended records, type coercion simply discards any components which are not defined in the supertype. The type coercion for function types may be given by ''f(''s'') = ''coerce''''S''2 → ''T''2(''f''(''coerce''''T''1 → ''S''1(''t''))), reflecting the contravariance of parameter values and covariance of return values. The coercion function is uniquely determined given the subtype and
supertype In programming language theory, subtyping (also subtype polymorphism or inclusion polymorphism) is a form of type polymorphism in which a subtype is a datatype that is related to another datatype (the supertype) by some notion of substitutability ...
. Thus, when multiple subtyping relationships are defined, one must be careful to guarantee that all type coercions are coherent. For instance, if an integer such as 2 : ''int'' can be coerced to a floating point number (say, 2.0 : ''float''), then it is not admissible to coerce 2.1 : ''float'' to 2 : ''int'', because the compound coercion ''coerce''''float'' → ''float'' given by ''coerce''''int'' → ''float'' ∘ ''coerce''''float'' → ''int'' would then be distinct from the identity coercion ''id''''float''.


See also

* Covariance and contravariance * The circle-ellipse problem (for the perils of subtyping variable-types on the same basis as value-types) *
Class-based programming Class-based programming, or more commonly class-orientation, is a style of object-oriented programming (OOP) in which inheritance occurs via defining ''classes'' of objects, instead of inheritance occurring via the objects alone (compare prototy ...
* Top type * Refinement type *
Behavioral subtyping In object-oriented programming, behavioral subtyping is the principle that subclasses should satisfy the expectations of clients accessing subclass objects through references of superclass type, not just as regards syntactic safety (such as the abse ...


Notes


References

Textbooks * Benjamin C. Pierce, ''Types and programming languages'', MIT Press, 2002, , chapter 15 (subtyping of record types), 19.3 (nominal vs. structural types and subtyping), and 23.2 (varieties of polymorphism) * C. Szyperski, D. Gruntz, S. Murer, ''Component software: beyond object-oriented programming'', 2nd ed., Pearson Education, 2002, , pp. 93–95 (a high-level presentation aimed at programming language users) Papers * Reynolds, John C. Using category theory to design implicit conversions and generic operators. In N. D. Jones, editor, Proceedings of the Aarhus Workshop on Semantics-Directed Compiler Generation, number 94 in Lecture Notes in Computer Science. Springer-Verlag, January 1980. Also in Carl A. Gunter and John C. Mitchell, editors, Theoretical Aspects of Object-Oriented Programming: Types, Semantics, and Language Design (MIT Press, 1994).


Further reading

*
John C. Reynolds John Charles Reynolds (June 1, 1935 – April 28, 2013) was an American computer scientist. Education and affiliations John Reynolds studied at Purdue University and then earned a Doctor of Philosophy (Ph.D.) in theoretical physics from Harvard U ...
, ''Theories of programming languages'', Cambridge University Press, 1998, , chapter 16. *
Martín Abadi Martín Abadi (born 1963) is an Argentine computer scientist, working at Google . He earned his Doctor of Philosophy (PhD) in computer science from Stanford University in 1987 as a student of Zohar Manna. He is well known for his work on ...
, Luca Cardelli, ''A theory of objects'', Springer, 1996, . Section 8.6 contrast the subtyping of records and objects. {{DEFAULTSORT:Subtype Polymorphism Data types Polymorphism (computer science) Type theory Object-oriented programming