HOME

TheInfoList



OR:

Type inference refers to the automatic detection of the type of an expression in a
formal language In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules. The alphabet of a formal language consists of sy ...
. These include
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s and mathematical
type system In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progr ...
s, but also
natural languages In neuropsychology, linguistics, and philosophy of language, a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages ...
in some branches of
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
and
linguistics Linguistics is the scientific study of human language. It is called a scientific study because it entails a comprehensive, systematic, objective, and precise analysis of all aspects of language, particularly its nature and structure. Linguis ...
.


Nontechnical explanation

Types in a most general view can be associated to a designated use suggesting and restricting the activities possible for an object of that type. Many nouns in language specify such uses. For instance, the word
leash A leash (also called a lead, lead line or tether) is a rope or similar material used to control an animal by attaching it to a collar, harness, or halter. In British English, a leash is generally for a larger (possibly dangerous or aggressive ...
indicates a different use than the word line. Calling something a
table Table may refer to: * Table (furniture), a piece of furniture with a flat surface and one or more legs * Table (landform), a flat area of land * Table (information), a data arrangement with rows and columns * Table (database), how the table data ...
indicates another designation than calling it firewood, though it might be materially the same thing. While their material properties make things usable for some purposes, they are also subject of particular designations. This is especially the case in abstract fields, namely mathematics and computer science, where the material is finally only bits or formulas. To exclude unwanted, but materially possible uses, the concept of types is defined and applied in many variations. In mathematics,
Russell's paradox In mathematical logic, Russell's paradox (also known as Russell's antinomy) is a set-theoretic paradox discovered by the British philosopher and mathematician Bertrand Russell in 1901. Russell's paradox shows that every set theory that contains ...
sparked early versions of type theory. In programming languages, typical examples are "type errors", e.g. ordering a computer to sum values that are not numbers. While materially possible, the result would no longer be meaningful and perhaps disastrous for the overall process. In a ''typing'', an expression is opposed to a type. For example, 4, 2+2, and 2\cdot 2 are all separate terms with the type \mathrm for natural numbers. Traditionally, the expression is followed by a colon and its type, such as 2 : \mathrm. This means that the value 2 is of type \mathrm. This form is also used to declare new names, e.g. n : \mathrm, much like introducing a new character to a scene by the words "detective Decker". Contrary to a story, where the designations slowly unfold, the objects in formal languages often have to be defined with their type from very beginning. Additionally, if the expressions are ambiguous, types may be needed to make the intended use explicit. For instance, the expression 2 might have a type \mathrm but could also be read as a rational or real number or even as a plain text. As a consequence, programs or proofs can become so encumbered with types, that it is desirable to deduce them from the context. This can be possible by collecting the uses of untyped expression (including undefined names). If, for instance, a yet undefined name ''n'' is used in an expression n + 2, one could conclude, that ''n'' is at least a number. The process of deducing the type from an expression and its context is ''type inference''. In general not only objects, but also activities have types and may be introduced simply by their use. For a Star Trek story, such a unknown activity could be "beaming", which for sake of the story's flow is just executed and never formally introduced. Nevertheless one can deduce its type (transport) following what happens. Additionally, both objects and activities can be constructed from their parts. In such a setting, type inference cannot only become more complex, but also more helpful, as it allows to collect a complete description of everything in a composed scene, while still being able to detect conflicting or unintended uses.


Type-checking vs. type-inference

In a typing, an expression E is opposed to a type T, formally written as E : T. Usually a typing only makes sense within some context, which is omitted here. In this setting, the following questions are of particular interest: # E : T? In this case, both an expression E and a type T are given. Now, is E really a T? This scenario is known as
type-checking In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progra ...
. # E : _? Here, only the expression is known. If there is a way to derive a type for E, then we have accomplished ''type inference''. # _ : T? The other way round. Given only a type, is there any expression for it or does the type have no values? Is there any example of a T? For the
simply typed lambda calculus The simply typed lambda calculus (\lambda^\to), a form of type theory, is a typed interpretation of the lambda calculus with only one type constructor (\to) that builds function types. It is the canonical and simplest example of a typed lambda c ...
, all three questions are decidable. The situation is not as comfortable when more expressive types are allowed.


Types in programming languages

Types are a feature present in some strongly statically typed languages. It is often characteristic of
functional programming language In computer science, functional programming is a programming paradigm where programs are constructed by applying and composing functions. It is a declarative programming paradigm in which function definitions are trees of expressions that ...
s in general. Some languages that include type inference include
C++11 C++11 is a version of the ISO/ IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions b ...
, C# (starting with version 3.0),
Chapel A chapel is a Christian place of prayer and worship that is usually relatively small. The term has several meanings. Firstly, smaller spaces inside a church that have their own altar are often called chapels; the Lady chapel is a common ty ...
, Clean,
Crystal A crystal or crystalline solid is a solid material whose constituents (such as atoms, molecules, or ions) are arranged in a highly ordered microscopic structure, forming a crystal lattice that extends in all directions. In addition, macro ...
, D, F#,
FreeBASIC FreeBASIC is a free and open source multiplatform compiler and programming language based on BASIC licensed under the GNU GPL for Microsoft Windows, protected-mode MS-DOS (DOS extender), Linux, FreeBSD and Xbox. The Xbox version is no longer m ...
, Go,
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
,
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
(starting with version 10),
Julia Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g ...
, Kotlin, ML, Nim,
OCaml OCaml ( , formerly Objective Caml) is a general-purpose programming language, general-purpose, multi-paradigm programming language which extends the Caml dialect of ML (programming language), ML with object-oriented programming, object-oriented ...
, Opa, Q#,
RPython PyPy () is an implementation of the Python programming language. PyPy often runs faster than the standard implementation CPython because PyPy uses a just-in-time compiler. Most Python code runs well on PyPy except for code that depends on CPyth ...
,
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH ...
, Scala,
Swift Swift or SWIFT most commonly refers to: * SWIFT, an international organization facilitating transactions between banks ** SWIFT code * Swift (programming language) * Swift (bird), a family of birds It may also refer to: Organizations * SWIFT, ...
,
TypeScript TypeScript is a free and open source programming language developed and maintained by Microsoft. It is a strict syntactical superset of JavaScript and adds optional static typing to the language. It is designed for the development of large app ...
,
Vala Vala or VALA may refer to: Religion and mythology * Vala (Vedic), a demon or a stone cavern in the Hindu scriptures * Völva, also spelled Vala, a priestess in Norse mythology and Norse paganism Fiction * Vala (Middle-earth), an angelic being in ...
,
Dart Dart or DART may refer to: * Dart, the equipment in the game of darts Arts, entertainment and media * Dart (comics), an Image Comics superhero * Dart, a character from ''G.I. Joe'' * Dart, a ''Thomas & Friends'' railway engine character * Dar ...
, and
Visual Basic Visual Basic is a name for a family of programming languages from Microsoft. It may refer to: * Visual Basic .NET (now simply referred to as "Visual Basic"), the current version of Visual Basic launched in 2002 which runs on .NET * Visual Basic ( ...
(starting with version 9.0). The majority of them use a simple form of type inference; the Hindley-Milner type system can provide more complete type inference. The ability to infer types automatically makes many programming tasks easier, leaving the programmer free to omit type annotations while still permitting type checking. In some programming languages, all values have a data type explicitly declared at
compile time In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled. The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concept ...
, limiting the values a particular expression can take on at run-time. Increasingly,
just-in-time compilation In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is a way of executing computer code that involves compilation during execution of a program (at run time) rather than before execution. This may cons ...
blurs the distinction between run time and compile time. However, historically, if the type of a value is known only at run-time, these languages are
dynamically typed In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progra ...
. In other languages, the type of an expression is known only at
compile time In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled. The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concept ...
; these languages are statically typed. In most statically typed languages, the input and output types of functions and
local variable In computer science, a local variable is a variable that is given ''local scope''. A local variable reference in the function or block in which it is declared overrides the same variable name in the larger scope. In programming languages with o ...
s ordinarily must be explicitly provided by type annotations. For example, in C: int add_one(int x) The
signature A signature (; from la, signare, "to sign") is a handwritten (and often stylized) depiction of someone's name, nickname, or even a simple "X" or other mark that a person writes on documents as a proof of identity and intent. The writer of a ...
of this function definition, int add_one(int x), declares that add_one is a function that takes one argument, an
integer An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
, and returns an integer. int result; declares that the local variable result is an integer. In a hypothetical language supporting type inference, the code might be written like this instead: add_one(x) This is identical to how code is written in the language
Dart Dart or DART may refer to: * Dart, the equipment in the game of darts Arts, entertainment and media * Dart (comics), an Image Comics superhero * Dart, a character from ''G.I. Joe'' * Dart, a ''Thomas & Friends'' railway engine character * Dar ...
, except that it is subject to some added constraints as described below. It would be possible to ''infer'' the types of all the variables at compile time. In the example above, the compiler would infer that result and x have type integer since the constant 1 is type integer, and hence that add_one is a function int -> int. The variable result2 isn't used in a legal manner, so it wouldn't have a type. In the imaginary language in which the last example is written, the compiler would assume that, in the absence of information to the contrary, + takes two integers and returns one integer. (This is how it works in, for example,
OCaml OCaml ( , formerly Objective Caml) is a general-purpose programming language, general-purpose, multi-paradigm programming language which extends the Caml dialect of ML (programming language), ML with object-oriented programming, object-oriented ...
.) From this, the type inferencer can infer that the type of x + 1 is an integer, which means result is an integer and thus the return value of add_one is an integer. Similarly, since + requires both of its arguments be of the same type, x must be an integer, and thus, add_one accepts one integer as an argument. However, in the subsequent line, ''result2'' is calculated by adding a decimal 1.0 with floating-point arithmetic, causing a conflict in the use of x for both integer and floating-point expressions. The correct type-inference algorithm for such a situation has been known since 1958 and has been known to be correct since 1982. It revisits the prior inferences and uses the most general type from the outset: in this case floating-point. This can however have detrimental implications, for instance using a floating-point from the outset can introduce precision issues that would have not been there with an integer type. Frequently, however, degenerate type-inference algorithms are used that cannot backtrack and instead generate an error message in such a situation. This behavior may be preferable as type inference may not always be neutral algorithmically, as illustrated by the prior floating-point precision issue. An algorithm of intermediate generality implicitly declares ''result2'' as a floating-point variable, and the addition implicitly converts x to a floating point. This can be correct if the calling contexts never supply a floating point argument. Such a situation shows the difference between ''type inference'', which does not involve
type conversion In computer science, type conversion, type casting, type coercion, and type juggling are different ways of changing an expression from one data type to another. An example would be the conversion of an integer value into a floating point valu ...
, and
implicit type conversion In computer science, type conversion, type casting, type coercion, and type juggling are different ways of changing an expression from one data type to another. An example would be the conversion of an integer value into a floating point value ...
, which forces data to a different data type, often without restrictions. Finally, a significant downside of complex type-inference algorithm is that the resulting type inference resolution is not going to be obvious to humans (notably because of the backtracking), which can be detrimental as code is primarily intended to be comprehensible to humans. The recent emergence of
just-in-time compilation In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is a way of executing computer code that involves compilation during execution of a program (at run time) rather than before execution. This may cons ...
allows for hybrid approaches where the type of arguments supplied by the various calling context is known at compile time, and can generate a large number of compiled versions of the same function. Each compiled version can then be optimized for a different set of types. For instance, JIT compilation allows there to be at least two compiled versions of ''add_one'': :A version that accepts an integer input and uses implicit type conversion. :A version that accepts a floating-point number as input and uses floating point instructions throughout.


Technical description

Type inference is the ability to automatically deduce, either partially or fully, the type of an expression at compile time. The compiler is often able to infer the type of a variable or the
type signature In computer science, a type signature or type annotation defines the inputs and outputs for a function, subroutine or method. A type signature includes the number, types, and order of the arguments contained by a function. A type signature is typ ...
of a function, without explicit type annotations having been given. In many cases, it is possible to omit type annotations from a program completely if the type inference system is robust enough, or the program or language is simple enough. To obtain the information required to infer the type of an expression, the compiler either gathers this information as an aggregate and subsequent reduction of the type annotations given for its subexpressions, or through an implicit understanding of the type of various atomic values (e.g. true : Bool; 42 : Integer; 3.14159 : Real; etc.). It is through recognition of the eventual reduction of expressions to implicitly typed atomic values that the compiler for a type inferring language is able to compile a program completely without type annotations. In complex forms of
higher-order programming Higher-order programming is a style of computer programming that uses software components, like functions, modules or objects, as values. It is usually instantiated with, or borrowed from, models of computation such as lambda calculus which make ...
and polymorphism, it is not always possible for the compiler to infer as much, and type annotations are occasionally necessary for disambiguation. For instance, type inference with
polymorphic recursion In computer science, polymorphic recursion (also referred to as Milner– Mycroft typability or the Milner–Mycroft calculus) refers to a recursive parametrically polymorphic function where the type parameter changes with each recursive i ...
is known to be undecidable. Furthermore, explicit type annotations can be used to optimize code by forcing the compiler to use a more specific (faster/smaller) type than it had inferred. Some methods for type inference are based on
constraint satisfaction In artificial intelligence and operations research, constraint satisfaction is the process of finding a solution through a set of constraints that impose conditions that the variables must satisfy. A solution is therefore a set of values for th ...
or
satisfiability modulo theories In computer science and mathematical logic, satisfiability modulo theories (SMT) is the problem of determining whether a mathematical formula is satisfiable. It generalizes the Boolean satisfiability problem (SAT) to more complex formulas involvi ...
.


Example

As an example, the
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
function map applies a function to each element of a list, and may be defined as: map f [] = [] map f (first:rest) = f first : map f rest Type inference on the map function proceeds as follows. map is a function of two arguments, so its type is constrained to be of the form a → b → c. In Haskell, the patterns [] and (first:rest) always match lists, so the second argument must be a list type: b = [d] for some type d. Its first argument f is function application, applied to the argument first, which must have type d, corresponding with the type in the list argument, so f :: d → e (:: means "is of type") for some type e. The return value of map f, finally, is a list of whatever f produces, so /code>. Putting the parts together leads to map :: (d → e) → /code>. Nothing is special about the type variables, so it can be relabeled as map :: (a → b) → It turns out that this is also the most general type, since no further constraints apply. As the inferred type of map is parametrically polymorphic, the type of the arguments and results of f are not inferred, but left as type variables, and so map can be applied to functions and lists of various types, as long as the actual types match in each invocation.


Hindley–Milner type inference algorithm

The algorithm first used to perform type inference is now informally termed the Hindley–Milner algorithm, although the algorithm should properly be attributed to Damas and Milner. The origin of this algorithm is the type inference algorithm for the
simply typed lambda calculus The simply typed lambda calculus (\lambda^\to), a form of type theory, is a typed interpretation of the lambda calculus with only one type constructor (\to) that builds function types. It is the canonical and simplest example of a typed lambda c ...
that was devised by
Haskell Curry Haskell Brooks Curry (; September 12, 1900 – September 1, 1982) was an American mathematician and logician. Curry is best known for his work in combinatory logic. While the initial concept of combinatory logic was based on a single paper by ...
and
Robert Feys Robert Feys (19 December 1889 – 13 April 1961) was a Belgian logician and philosopher, who worked at the University of Leuven (Belgium).De Raeymaeker, Louis.In memoriam le chanoine Robert Feys" ''Revue Philosophique de Louvain'' 59.62 (1961): 37 ...
in 1958. In 1969 J. Roger Hindley extended this work and proved that their algorithm always inferred the most general type. In 1978
Robin Milner Arthur John Robin Gorell Milner (13 January 1934 – 20 March 2010), known as Robin Milner or A. J. R. G. Milner, was a British computer scientist, and a Turing Award winner.
, independently of Hindley's work, provided an equivalent algorithm, Algorithm W. In 1982
Luis Damas Luis is a given name. It is the Spanish form of the originally Germanic name or . Other Iberian Romance languages have comparable forms: (with an accent mark on the i) in Portuguese and Galician, in Aragonese and Catalan, while is archai ...
finally proved that Milner's algorithm is complete and extended it to support systems with polymorphic references.


Side-effects of using the most general type

By design, type inference, especially correct (backtracking) type inference will introduce use of the most general type appropriate, however this can have implications as more general types may not always be algorithmically neutral, the typical cases being: * floating-point being considered as a general type of integer, while floating-point will introduce precision issues * variant/dynamic types being considered as a general type of other types, which will introduce casting rules and comparison that could be different, for instance such types use the '+' operator for both numeric additions and string concatenations, but what operation is performed is determined dynamically rather than statically


Type inference for natural languages

Type inference algorithms have been used to analyze
natural languages In neuropsychology, linguistics, and philosophy of language, a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages ...
as well as programming languages. Type inference algorithms are also used in some
grammar induction Grammar induction (or grammatical inference) is the process in machine learning of learning a formal grammar (usually as a collection of ''re-write rules'' or '' productions'' or alternatively as a finite state machine or automaton of some kind) fr ...
and
constraint-based grammar Model-theoretic grammars, also known as constraint-based grammars, contrast with generative grammars in the way they define sets of sentences: they state constraints on syntactic structure rather than providing operations for generating syntactic ...
systems for natural languages.


References


External links


Archived e-mail message
by Roger Hindley, explains history of type inference
Polymorphic Type Inference
by Michael Schwartzbach, gives an overview of Polymorphic type inference.
Basic Typechecking
paper by Luca Cardelli, describes algorithm, includes implementation in Modula-2
Implementation
of Hindley-Milner type inference in Scala, by Andrew Forrest (retrieved July 30, 2009) *
What is Hindley-Milner? (and why is it cool?)
Explains Hindley-Milner, examples in Scala {{DEFAULTSORT:Type Inference Type systems Type theory Inference Articles with example code