In
computer programming
Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming involves tasks such as ana ...
, a type system is a
logical system
A formal system is an abstract structure used for inferring theorems from axioms according to a set of rules. These rules, which are used for carrying out the inference of theorems from axioms, are the logical calculus of the formal system.
A form ...
comprising a set of rules that assigns a property called a
type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a
computer program
A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components.
A computer program ...
, such as
variables,
expressions,
functions, or
modules
Broadly speaking, modularity is the degree to which a system's components may be separated and recombined, often with the benefit of flexibility and variety in use. The concept of modularity is used primarily to reduce complexity by breaking a sy ...
. A type system dictates the operations that can be performed on a term. For variables, the type system determines the allowed values of that term. Type systems formalize and enforce the otherwise implicit categories the programmer uses for
algebraic data type
In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types.
Two common classes of algebraic types are product types (i.e., t ...
s,
data structure
In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
s, or other components (e.g. "string", "array of float", "function returning boolean").
Type systems are often specified as part of
programming language
A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language.
The description of a programming ...
s and built into
interpreters and
compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
s, although the type system of a language can be extended by
optional tools that perform added checks using the language's original type syntax and grammar. The main purpose of a type system in a programming language is to reduce possibilities for
bugs in computer programs due to
type error
In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type (computer science), type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constru ...
s. The given type system in question determines what constitutes a type error, but in general, the aim is to prevent operations expecting a certain kind of value from being used with values for which that operation does not make sense (validity errors). Type systems allow defining
interfaces
Interface or interfacing may refer to:
Academic journals
* Interface (journal), ''Interface'' (journal), by the Electrochemical Society
* ''Interface, Journal of Applied Linguistics'', now merged with ''ITL International Journal of Applied Lin ...
between different parts of a computer program, and then checking that the parts have been connected in a consistent way. This checking can happen statically (at
compile time
In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled.
The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concept ...
), dynamically (at
run time
Run(s) or RUN may refer to:
Places
* Run (island), one of the Banda Islands in Indonesia
* Run (stream), a stream in the Dutch province of North Brabant
People
* Run (rapper), Joseph Simmons, now known as "Reverend Run", from the hip-hop group ...
), or as a combination of both. Type systems have other purposes as well, such as expressing business rules, enabling certain
compiler optimization
In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power con ...
s, allowing for
multiple dispatch
Multiple dispatch or multimethods is a feature of some programming languages in which a function or method can be dynamically dispatched based on the run-time (dynamic) type or, in the more general case, some other attribute of more than one of ...
, and providing a form of
documentation
Documentation is any communicable material that is used to describe, explain or instruct regarding some attributes of an object, system or procedure, such as its parts, assembly, installation, maintenance and use. As a form of knowledge manageme ...
.
Usage overview
An example of a simple type system is that of the
C language
C (''pronounced like the letter c'') is a general-purpose computer programming language. It was created in the 1970s by Dennis Ritchie, and remains very widely used and influential. By design, C's features cleanly reflect the capabilities o ...
. The portions of a C program are the
function
Function or functionality may refer to:
Computing
* Function key, a type of key on computer keyboards
* Function model, a structured representation of processes in a system
* Function object or functor or functionoid, a concept of object-oriente ...
definitions. One function is invoked by another function. The interface of a function states the name of the function and a list of parameters that are passed to the function's code. The code of an invoking function states the name of the invoked, along with the names of variables that hold values to pass to it. During execution, the values are placed into temporary storage, then execution jumps to the code of the invoked function. The invoked function's code accesses the values and makes use of them. If the instructions inside the function are written with the assumption of receiving an
integer
An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
value, but the calling code passed a
floating-point value, then the wrong result will be computed by the invoked function. The C compiler checks the types of the arguments passed to a function when it is called against the types of the parameters declared in the function's definition. If the types do not match, the compiler throws a compile-time error.
A
compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
may also use the static type of a value to optimize the storage it needs and the choice of algorithms for operations on the value. In many
C compilers the ''float''
data type
In computer science and computer programming, a data type (or simply type) is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most progra ...
, for example, is represented in 32
bit
The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
s, in accord with the
IEEE specification for single-precision floating point numbers. They will thus use floating-point-specific
microprocessor operations on those values (floating-point addition, multiplication, etc.).
The depth of type constraints and the manner of their evaluation affect the ''typing'' of the language. A
programming language
A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language.
The description of a programming ...
may further associate an operation with various resolutions for each type, in the case of
type polymorphism
In programming language theory and type theory, polymorphism is the provision of a single interface to entities of different types or the use of a single symbol to represent multiple different types.: "Polymorphic types are types whose operatio ...
.
Type theory
In mathematics, logic, and computer science, a type theory is the formal presentation of a specific type system, and in general type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a foundat ...
is the study of type systems. The concrete types of some programming languages, such as integers and strings, depend on practical issues of computer architecture, compiler implementation, and language design.
Fundamentals
Formally,
type theory
In mathematics, logic, and computer science, a type theory is the formal presentation of a specific type system, and in general type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a foundat ...
studies type systems. A programming language must have the opportunity to type check using the ''type system'' whether at compile time or runtime, manually annotated or automatically inferred. As
Mark Manasse
Mark may refer to:
Currency
* Bosnia and Herzegovina convertible mark, the currency of Bosnia and Herzegovina
* East German mark, the currency of the German Democratic Republic
* Estonian mark, the currency of Estonia between 1918 and 1927
* Fi ...
concisely put it:
Assigning a data type, termed ''typing'', gives meaning to a sequence of
bit
The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
s such as a value in
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembered, ...
or some
object
Object may refer to:
General meanings
* Object (philosophy), a thing, being, or concept
** Object (abstract), an object which does not exist at any particular time or place
** Physical object, an identifiable collection of matter
* Goal, an ai ...
such as a
variable
Variable may refer to:
* Variable (computer science), a symbolic name associated with a value and whose associated value may be changed
* Variable (mathematics), a symbol that represents a quantity in a mathematical expression, as used in many ...
. The hardware of a
general purpose computer
A computer is a machine that can be programmed to carry out sequences of arithmetic or logical operations (computation) automatically. Modern digital electronic computers can perform generic sets of operations known as programs. These progra ...
is unable to discriminate between for example a
memory address
In computing, a memory address is a reference to a specific memory location used at various levels by software and hardware. Memory addresses are fixed-length sequences of digits conventionally displayed and manipulated as unsigned integers. Su ...
and an
instruction code, or between a
character
Character or Characters may refer to:
Arts, entertainment, and media Literature
* ''Character'' (novel), a 1936 Dutch novel by Ferdinand Bordewijk
* ''Characters'' (Theophrastus), a classical Greek set of character sketches attributed to The ...
, an
integer
An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
, or a
floating-point number
In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can be r ...
, because it makes no intrinsic distinction between any of the possible values that a sequence of bits might ''mean''.
[ Associating a sequence of bits with a type conveys that meaning to the programmable hardware to form a '']symbolic system
In logic, mathematics, computer science, and linguistics, a formal language consists of words whose letters are taken from an alphabet and are well-formed according to a specific set of rules.
The alphabet of a formal language consists of sym ...
'' composed of that hardware and some program.
A program associates each value with at least one specific type, but it also can occur that one value is associated with many subtype
Subtype may refer to:
* Viral subtypes, such as Subtypes of HIV
* Subtyping
In programming language theory, subtyping (also subtype polymorphism or inclusion polymorphism) is a form of type polymorphism in which a subtype is a datatype that is ...
s. Other entities, such as objects
Object may refer to:
General meanings
* Object (philosophy), a thing, being, or concept
** Object (abstract), an object which does not exist at any particular time or place
** Physical object, an identifiable collection of matter
* Goal, an ...
, modules
Broadly speaking, modularity is the degree to which a system's components may be separated and recombined, often with the benefit of flexibility and variety in use. The concept of modularity is used primarily to reduce complexity by breaking a sy ...
, communication channels, and dependencies can become associated with a type. Even a type can become associated with a type. An implementation of a ''type system'' could in theory associate identifications called ''data type
In computer science and computer programming, a data type (or simply type) is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most progra ...
'' (a type of a value), ''class
Class or The Class may refer to:
Common uses not otherwise categorized
* Class (biology), a taxonomic rank
* Class (knowledge representation), a collection of individuals or objects
* Class (philosophy), an analytical concept used differentl ...
'' (a type of an object), and '' kind'' (a ''type of a type'', or metatype). These are the abstractions that typing can go through, on a hierarchy of levels contained in a system.
When a programming language evolves a more elaborate type system, it gains a more finely grained rule set than basic type checking, but this comes at a price when the type inferences (and other properties) become undecidable, and when more attention must be paid by the programmer to annotate code or to consider computer-related operations and functioning. It is challenging to find a sufficiently expressive type system that satisfies all programming practices in a type safe
In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is ...
manner.
A programming language compiler can also implement a ''dependent type
In computer science and logic, a dependent type is a type whose definition depends on a value. It is an overlapping feature of type theory and type systems. In intuitionistic type theory, dependent types are used to encode logic's quantifiers lik ...
'' or an ''effect system
In computing, an effect system is a formal system that describes the computational effects of computer programs, such as side effects. An effect system can be used to provide a compile-time check of the possible effects of the program.
The effect ...
'', which enables even more program specifications to be verified by a type checker. Beyond simple value-type pairs, a virtual "region" of code is associated with an "effect" component describing ''what'' is being done ''with what'', and enabling for example to "throw" an error report. Thus the symbolic system may be a ''type and effect system'', which endows it with more safety checking than type checking alone.
Whether automated by the compiler or specified by a programmer, a type system makes program behavior illegal if outside the type-system rules. Advantages provided by programmer-specified type systems include:
* ''Abstraction'' (or ''modularity'') – Types enable programmers to think at a higher level than the bit or byte, not bothering with low-level implementation. For example, programmers can begin to think of a string as a set of character values instead of as a mere array of bytes. Higher still, types enable programmers to think about and express interface
Interface or interfacing may refer to:
Academic journals
* ''Interface'' (journal), by the Electrochemical Society
* ''Interface, Journal of Applied Linguistics'', now merged with ''ITL International Journal of Applied Linguistics''
* '' Inte ...
s between two of ''any''-sized subsystems. This enables more levels of localization so that the definitions required for interoperability of the subsystems remain consistent when those two subsystems communicate.
* ''Documentation'' – In more expressive type systems, types can serve as a form of documentation
Documentation is any communicable material that is used to describe, explain or instruct regarding some attributes of an object, system or procedure, such as its parts, assembly, installation, maintenance and use. As a form of knowledge manageme ...
clarifying the intent of the programmer. For example, if a programmer declares a function as returning a timestamp type, this documents the function when the timestamp type can be explicitly declared deeper in the code to be an integer type.
Advantages provided by compiler-specified type systems include:
* ''Optimization'' – Static type-checking may provide useful compile-time information. For example, if a type requires that a value must align in memory at a multiple of four bytes, the compiler may be able to use more efficient machine instructions.
* ''Safety'' – A type system enables the compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
to detect meaningless or invalid code. For example, we can identify an expression 3 / "Hello, World"
as invalid, when the rules do not specify how to divide an integer
An integer is the number zero (), a positive natural number (, , , etc.) or a negative integer with a minus sign (−1, −2, −3, etc.). The negative numbers are the additive inverses of the corresponding positive numbers. In the language ...
by a string
String or strings may refer to:
*String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects
Arts, entertainment, and media Films
* ''Strings'' (1991 film), a Canadian anim ...
. Strong typing offers more safety, but cannot guarantee complete ''type safety
In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is ...
''.
Type errors
A type error is an unintended condition which might manifest in multiple stages of a program's development. Thus a facility for detection of the error is needed in the type system. In some languages, such as Haskell, for which type inference
Type inference refers to the automatic detection of the type of an expression in a formal language. These include programming languages and mathematical type systems, but also natural languages in some branches of computer science and linguistics ...
is automated, lint might be available to its compiler to aid in the detection of error.
Type safety contributes to program correctness
In theoretical computer science, an algorithm is correct with respect to a specification if it behaves as specified. Best explored is ''functional'' correctness, which refers to the input-output behavior of the algorithm (i.e., for each input it p ...
, but might only guarantee correctness at the cost of making the type checking itself an undecidable problem
In computability theory and computational complexity theory, an undecidable problem is a decision problem for which it is proved to be impossible to construct an algorithm that always leads to a correct yes-or-no answer. The halting problem is an ...
(as in the Halting problem
In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a g ...
). In a ''type system'' with automated type checking a program may prove to run incorrectly yet produce no compiler errors. Division by zero
In mathematics, division by zero is division (mathematics), division where the divisor (denominator) is 0, zero. Such a division can be formally expression (mathematics), expressed as \tfrac, where is the dividend (numerator). In ordinary ari ...
is an unsafe and incorrect operation, but a type checker which only runs at compile time
In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled.
The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concept ...
does not scan for division by zero in most languages; that division would surface as a runtime error. To prove the absence of these defects, other kinds of formal method
In computer science, formal methods are mathematically rigorous techniques for the specification, development, and verification of software and hardware systems. The use of formal methods for software and hardware design is motivated by the expec ...
s, collectively known as program analyses, are in common use. Alternatively, a sufficiently expressive type system, such as in dependently-typed languages, can prevent these kinds of errors (for example, expressing ''the type of non-zero numbers''). In addition software testing
Software testing is the act of examining the artifacts and the behavior of the software under test by validation and verification. Software testing can also provide an objective, independent view of the software to allow the business to apprecia ...
is an empirical
Empirical evidence for a proposition is evidence, i.e. what supports or counters this proposition, that is constituted by or accessible to sense experience or experimental procedure. Empirical evidence is of central importance to the sciences and ...
method for finding errors that such a type checker would not detect.
Type checking
The process of verifying and enforcing the constraints of types—''type checking''—may occur at compile time
In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled.
The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concept ...
(a static check) or at run-time. If a language specification requires its typing rules strongly (i.e., more or less allowing only those automatic type conversion
In computer science, type conversion, type casting, type coercion, and type juggling are different ways of changing an expression from one data type to another. An example would be the conversion of an integer value into a floating point value ...
s that do not lose information), one can refer to the process as ''strongly typed'', if not, as ''weakly typed''. The terms are not usually used in a strict sense.
Static type checking
Static type checking is the process of verifying the type safety
In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is ...
of a program based on analysis of a program's text (source code
In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the wo ...
). If a program passes a static type checker, then the program is guaranteed to satisfy some set of type safety properties for all possible inputs.
Static type checking can be considered a limited form of program verification
In the context of hardware and software systems, formal verification is the act of proving or disproving the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal metho ...
(see type safety
In computer science, type safety and type soundness are the extent to which a programming language discourages or prevents type errors. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is ...
), and in a type-safe language, can be considered also an optimization. If a compiler can prove that a program is well-typed, then it does not need to emit dynamic safety checks, allowing the resulting compiled binary to run faster and to be smaller.
Static type checking for Turing-complete
In computability theory, a system of data-manipulation rules (such as a computer's instruction set, a programming language, or a cellular automaton) is said to be Turing-complete or computationally universal if it can be used to simulate any Tur ...
languages is inherently conservative. That is, if a type system is both ''sound'' (meaning that it rejects all incorrect programs) and ''decidable'' (meaning that it is possible to write an algorithm that determines whether a program is well-typed), then it must be ''incomplete'' (meaning there are correct programs, which are also rejected, even though they do not encounter runtime errors). For example, consider a program containing the code:
:if then else
Even if the expression
always evaluates to true
at run-time, most type checkers will reject the program as ill-typed, because it is difficult (if not impossible) for a static analyzer to determine that the else
branch will not be taken. Consequently, a static type checker will quickly detect type errors in rarely used code paths. Without static type checking, even code coverage
In computer science, test coverage is a percentage measure of the degree to which the source code of a program is executed when a particular test suite is run. A program with high test coverage has more of its source code executed during testing, ...
tests with 100% coverage may be unable to find such type errors. The tests may fail to detect such type errors, because the combination of all places where values are created and all places where a certain value is used must be taken into account.
A number of useful and common programming language features cannot be checked statically, such as downcasting
In class-based programming, downcasting or type refinement is the act of casting a reference of a base class to one of its derived classes.
In many programming languages, it is possible to check through type introspection to determine whether the ...
. Thus, many languages will have both static and dynamic type checking; the static type checker verifies what it can, and dynamic checks verify the rest.
Many languages with static type checking provide a way to bypass the type checker. Some languages allow programmers to choose between static and dynamic type safety. For example, C# distinguishes between ''statically-typed'' and ''dynamically-typed'' variables. Uses of the former are checked statically, whereas uses of the latter are checked dynamically. Other languages allow writing code that is not type-safe; for example, in C, programmers can freely cast a value between any two types that have the same size, effectively subverting the type concept.
For a list of languages with static type checking, see the category for statically-typed languages.
Dynamic type checking and runtime type information
Dynamic type checking is the process of verifying the type safety of a program at runtime. Implementations of dynamically type-checked languages generally associate each runtime object with a ''type tag'' (i.e., a reference to a type) containing its type information. This runtime type information (RTTI) can also be used to implement dynamic dispatch
In computer science, dynamic dispatch is the process of selecting which implementation of a polymorphic operation (method or function) to call at run time. It is commonly employed in, and considered a prime characteristic of, object-oriented ...
, late binding
In computing, late binding or dynamic linkage—though not an identical process to Dynamic linker, dynamically linking imported code Library (computing), libraries—is a computer programming mechanism in which the Method (computer programming), ...
, downcasting
In class-based programming, downcasting or type refinement is the act of casting a reference of a base class to one of its derived classes.
In many programming languages, it is possible to check through type introspection to determine whether the ...
, reflection Reflection or reflexion may refer to:
Science and technology
* Reflection (physics), a common wave phenomenon
** Specular reflection, reflection from a smooth surface
*** Mirror image, a reflection in a mirror or in water
** Signal reflection, in ...
, and similar features.
Most type-safe languages include some form of dynamic type checking, even if they also have a static type checker. The reason for this is that many useful features or properties are difficult or impossible to verify statically. For example, suppose that a program defines two types, A and B, where B is a subtype of A. If the program tries to convert a value of type A to type B, which is known as downcasting
In class-based programming, downcasting or type refinement is the act of casting a reference of a base class to one of its derived classes.
In many programming languages, it is possible to check through type introspection to determine whether the ...
, then the operation is legal only if the value being converted is actually a value of type B. Thus, a dynamic check is needed to verify that the operation is safe. This requirement is one of the criticisms of downcasting.
By definition, dynamic type checking may cause a program to fail at runtime. In some programming languages, it is possible to anticipate and recover from these failures. In others, type-checking errors are considered fatal.
Programming languages that include dynamic type checking but not static type checking are often called "dynamically-typed programming languages". For a list of such languages, see the category for dynamically-typed programming languages.
Combining static and dynamic type checking
Some languages allow both static and dynamic typing. For example, Java and some other ostensibly statically-typed languages support downcasting
In class-based programming, downcasting or type refinement is the act of casting a reference of a base class to one of its derived classes.
In many programming languages, it is possible to check through type introspection to determine whether the ...
types to their subtypes
In programming language theory, subtyping (also subtype polymorphism or inclusion polymorphism) is a form of type polymorphism in which a subtype is a datatype that is related to another datatype (the supertype) by some notion of substitutability, ...
, querying an object to discover its dynamic type and other type operations that depend on runtime type information. Another example is C++ RTTI. More generally, most programming languages include mechanisms for dispatching over different 'kinds' of data, such as disjoint union
In mathematics, a disjoint union (or discriminated union) of a family of sets (A_i : i\in I) is a set A, often denoted by \bigsqcup_ A_i, with an injection of each A_i into A, such that the images of these injections form a partition of A (th ...
s, runtime polymorphism
In computer science, dynamic dispatch is the process of selecting which implementation of a polymorphic operation (method or function) to call at run time. It is commonly employed in, and considered a prime characteristic of, object-oriented ...
, and variant type
In computer science, a tagged union, also called a variant, variant record, choice type, discriminated union, disjoint union, sum type or coproduct, is a data structure used to hold a value that could take on several different, but fixed, types. O ...
s. Even when not interacting with type annotations or type checking, such mechanisms are materially similar to dynamic typing implementations. See programming language
A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language.
The description of a programming ...
for more discussion of the interactions between static and dynamic typing.
Objects in object-oriented languages are usually accessed by a reference whose static target type (or manifest type) is equal to either the object's run-time type (its latent type) or a supertype thereof. This is conformant with the Liskov substitution principle
The Liskov substitution principle (LSP) is a particular definition of a subtyping relation, called strong behavioral subtyping, that was initially introduced by Barbara Liskov in a 1988 conference keynote address titled ''Data abstraction and h ...
, which states that all operations performed on an instance of a given type can also be performed on an instance of a subtype. This concept is also known as subsumption or subtype polymorphism
In programming language theory, subtyping (also subtype polymorphism or inclusion polymorphism) is a form of type polymorphism in which a subtype is a datatype that is related to another datatype (the supertype) by some notion of substitutability ...
. In some languages subtypes may also possess covariant or contravariant return types and argument types respectively.
Certain languages, for example Clojure
Clojure (, like ''closure'') is a dynamic and functional dialect of the Lisp programming language on the Java platform. Like other Lisp dialects, Clojure treats code as data and has a Lisp macro system. The current development process is comm ...
, Common Lisp
Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ''ANSI INCITS 226-1994 (S20018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived fro ...
, or Cython
Cython () is a programming language that aims to be a superset of the Python programming language, designed to give C-like performance with code that is written mostly in Python with optional additional C-inspired syntax.
Cython is a compiled ...
are dynamically type checked by default, but allow programs to opt into static type checking by providing optional annotations. One reason to use such hints would be to optimize the performance of critical sections of a program. This is formalized by gradual typing
Gradual typing is a type system in which some variables and expressions may be given types and the correctness of the typing is checked at compile time (which is static typing) and some expressions may be left untyped and eventual type errors a ...
. The programming environment ''DrRacket
Racket is a general-purpose, multi-paradigm programming language and a multi-platform distribution that includes the Racket language, compiler, large standard library, IDE, development tools, and a set of additional languages including Typed R ...
'', a pedagogic environment based on Lisp, and a precursor of the language Racket is also soft-typed.
Conversely, as of version 4.0, the C# language provides a way to indicate that a variable should not be statically type checked. A variable whose type is dynamic
will not be subject to static type checking. Instead, the program relies on runtime type information to determine how the variable may be used.
In Rust
Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH ...
, the type provides dynamic typing of types.
Static and dynamic type checking in practice
The choice between static and dynamic typing requires certain trade-off
A trade-off (or tradeoff) is a situational decision that involves diminishing or losing one quality, quantity, or property of a set or design in return for gains in other aspects. In simple terms, a tradeoff is where one thing increases, and anot ...
s.
Static typing can find type errors reliably at compile time, which increases the reliability of the delivered program. However, programmers disagree over how commonly type errors occur, resulting in further disagreements over the proportion of those bugs that are coded that would be caught by appropriately representing the designed types in code. Static typing advocates believe programs are more reliable when they have been well type-checked, whereas dynamic-typing advocates point to distributed code that has proven reliable and to small bug databases. The value of static typing increases as the strength of the type system is increased. Advocates of dependent typing
In computer science and logic, a dependent type is a type whose definition depends on a value. It is an overlapping feature of type theory and type systems. In intuitionistic type theory, dependent types are used to encode logic's quantifiers lik ...
, implemented in languages such as Dependent ML
Dependent ML is an experimental functional programming language proposed by Hongwei Xi and Frank Pfenning. Dependent ML extends ML by a restricted notion of dependent types: types may be dependent on static indices of type Nat (natural numbers). ...
and Epigram
An epigram is a brief, interesting, memorable, and sometimes surprising or satirical statement. The word is derived from the Greek "inscription" from "to write on, to inscribe", and the literary device has been employed for over two mille ...
, have suggested that almost all bugs can be considered type errors, if the types used in a program are properly declared by the programmer or correctly inferred by the compiler.
Static typing usually results in compiled code that executes faster. When the compiler knows the exact data types that are in use (which is necessary for static verification, either through declaration or inference) it can produce optimized machine code. Some dynamically-typed languages such as Common Lisp
Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ''ANSI INCITS 226-1994 (S20018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived fro ...
allow optional type declarations for optimization for this reason.
By contrast, dynamic typing may allow compilers to run faster and interpreters
Interpreting is a translational activity in which one produces a first and final target-language output on the basis of a one-time exposure to an expression in a source language.
The most common two modes of interpreting are simultaneous inter ...
to dynamically load new code, because changes to source code in dynamically-typed languages may result in less checking to perform and less code to revisit. This too may reduce the edit-compile-test-debug cycle.
Statically-typed languages that lack type inference
Type inference refers to the automatic detection of the type of an expression in a formal language. These include programming languages and mathematical type systems, but also natural languages in some branches of computer science and linguistics ...
(such as C and Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
prior to version 10) require that programmers declare the types that a method or function must use. This can serve as added program documentation, that is active and dynamic, instead of static. This allows a compiler to prevent it from drifting out of synchrony, and from being ignored by programmers. However, a language can be statically typed without requiring type declarations (examples include Haskell
Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lang ...
, Scala, OCaml
OCaml ( , formerly Objective Caml) is a general-purpose programming language, general-purpose, multi-paradigm programming language which extends the Caml dialect of ML (programming language), ML with object-oriented programming, object-oriented ...
, F#, and to a lesser extent C# and C++
C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
), so explicit type declaration is not a necessary requirement for static typing in all languages.
Dynamic typing allows constructs that some (simple) static type checking would reject as illegal. For example, ''eval
In some programming languages, eval , short for the English evaluate, is a function which evaluates a string as though it were an expression in the language, and returns a result; in others, it executes multiple lines of code as though they had b ...
'' functions, which execute arbitrary data as code, become possible. An ''eval'' function is possible with static typing, but requires advanced uses of algebraic data types
In computer programming, especially functional programming and type theory, an algebraic data type (ADT) is a kind of composite type, i.e., a type formed by combining other types.
Two common classes of algebraic types are product types (i.e., t ...
. Further, dynamic typing better accommodates transitional code and prototyping, such as allowing a placeholder data structure (mock object
In object-oriented programming, mock objects are simulated objects that mimic the behaviour of real objects in controlled ways, most often as part of a software testing initiative. A programmer typically creates a mock object to test the behaviou ...
) to be transparently used in place of a full data structure (usually for the purposes of experimentation and testing).
Dynamic typing typically allows duck typing
Duck typing in computer programming is an application of the duck test—"If it walks like a duck and it quacks like a duck, then it must be a duck"—to determine whether an object can be used for a particular purpose. With nominative t ...
(which enables easier code reuse). Many languages with static typing also feature duck typing
Duck typing in computer programming is an application of the duck test—"If it walks like a duck and it quacks like a duck, then it must be a duck"—to determine whether an object can be used for a particular purpose. With nominative t ...
or other mechanisms like generic programming
Generic programming is a style of computer programming in which algorithms are written in terms of types ''to-be-specified-later'' that are then ''instantiated'' when needed for specific types provided as parameters. This approach, pioneered b ...
that also enable easier code reuse.
Dynamic typing typically makes metaprogramming
Metaprogramming is a programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyze or transform other programs, and even modify itself ...
easier to use. For example, C++ templates
C, or c, is the third letter in the Latin alphabet, used in the modern English alphabet, the alphabets of other western European languages and others worldwide. Its name in English is ''cee'' (pronounced ), plural ''cees''.
History
"C" ...
are typically more cumbersome to write than the equivalent Ruby
A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sa ...
or Python
Python may refer to:
Snakes
* Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia
** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia
* Python (mythology), a mythical serpent
Computing
* Python (pro ...
code since C++
C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
has stronger rules regarding type definitions (for both functions and variables). This forces a developer to write more boilerplate code
In computer programming, boilerplate code, or simply boilerplate, are sections of code that are repeated in multiple places with little to no variation. When using languages that are considered ''verbose'', the programmer must write a lot of boile ...
for a template than a Python developer would need to. More advanced run-time constructs such as metaclass
In object-oriented programming, a metaclass is a class whose instances are classes. Just as an ordinary class defines the behavior of certain objects, a metaclass defines the behavior of certain classes and their instances. Not all object-oriente ...
es and introspection
Introspection is the examination of one's own conscious thoughts and feelings. In psychology, the process of introspection relies on the observation of one's mental state, while in a spiritual context it may refer to the examination of one's s ...
are often harder to use in statically-typed languages. In some languages, such features may also be used e.g. to generate new types and behaviors on the fly, based on run-time data. Such advanced constructs are often provided by dynamic programming language
In computer science, a dynamic programming language is a class of high-level programming languages, which at runtime execute many common programming behaviours that static programming languages perform during compilation. These behaviors cou ...
s; many of these are dynamically typed, although ''dynamic typing'' need not be related to ''dynamic programming languages''.
Strong and weak type systems
Languages are often colloquially referred to as ''strongly typed'' or ''weakly typed''. In fact, there is no universally accepted definition of what these terms mean. In general, there are more precise terms to represent the differences between type systems that lead people to call them "strong" or "weak".
Type safety and memory safety
A third way of categorizing the type system of a programming language is by the safety of typed operations and conversions. Computer scientists use the term ''type-safe language'' to describe languages that do not allow operations or conversions that violate the rules of the type system.
Computer scientists use the term ''memory-safe language'' (or just ''safe language'') to describe languages that do not allow programs to access memory that has not been assigned for their use. For example, a memory-safe language will check array bounds, or else statically guarantee (i.e., at compile time before execution) that array accesses out of the array boundaries will cause compile-time and perhaps runtime errors.
Consider the following program of a language that is both type-safe and memory-safe:
var x := 5;
var y := "37";
var z := x + y;
In this example, the variable will have the value 42. Although this may not be what the programmer anticipated, it is a well-defined result. If were a different string, one that could not be converted to a number (e.g. "Hello World"), the result would be well-defined as well. Note that a program can be type-safe or memory-safe and still crash on an invalid operation. This is for languages where the type system is not sufficiently advanced to precisely specify the validity of operations on all possible operands. But if a program encounters an operation that is not type-safe, terminating the program is often the only option.
Now consider a similar example in C:
int x = 5;
char y[] = "37";
char* z = x + y;
printf("%c\n", *z);
In this example will point to a memory address five characters beyond , equivalent to three characters after the terminating zero character of the string pointed to by . This is memory that the program is not expected to access. In C terms this is simply undefined behaviour
In computer programming, undefined behavior (UB) is the result of executing a program whose behavior is prescribed to be unpredictable, in the language specification to which the computer code adheres. This is different from unspecified behavior, ...
and the program may do anything; with a simple compiler it might actually print whatever byte is stored after the string "37". As this example shows, C is not memory-safe. As arbitrary data was assumed to be a character, it is also not a type-safe language.
In general, type-safety and memory-safety go hand in hand. For example, a language that supports pointer arithmetic and number-to-pointer conversions (like C) is neither memory-safe nor type-safe, because it allows arbitrary memory to be accessed as if it were valid memory of any type.
For more information, see memory safety
Memory safety is the state of being protected from various software bugs and Vulnerability (computing), security vulnerabilities when dealing with random-access memory, memory access, such as buffer overflows and dangling pointers. For example, J ...
.
Variable levels of type checking
Some languages allow different levels of checking to apply to different regions of code. Examples include:
* The use strict
directive in JavaScript
JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of Website, websites use JavaScript on the Client (computing), client side ...
and Perl
Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offici ...
applies stronger checking.
* The declare(strict_types=1)
in PHP
PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group ...
on a per-file basis allows only a variable of exact type of the type declaration will be accepted, or a TypeError
will be thrown.
* The Option Strict On
in VB.NET
Visual Basic, originally called Visual Basic .NET (VB.NET), is a multi-paradigm, object-oriented programming language, implemented on .NET, Mono, and the .NET Framework. Microsoft launched VB.NET in 2002 as the successor to its original Visua ...
allows the compiler to require a conversion between objects.
Additional tools such as lint and IBM Rational Purify
PurifyPlus is a memory debugger program used by software developers to detect memory access errors in programs, especially those written in C or C++. It was originally written by Reed Hastings of Pure Software.Gilad Bracha
Gilad Bracha is a software engineer at F5 Networks, and formerly at Google, where he was on the Dart programming language team. He is creator of the Newspeak language, and co-author of the second and third editions of the Java Language Specificati ...
, that the choice of type system be made independent of choice of language; that a type system should be a module that can be ''plugged'' into a language as needed. He believes this is advantageous, because what he calls mandatory type systems make languages less expressive and code more fragile. The requirement that the type system does not affect the semantics of the language is difficult to fulfill.
Optional typing is related to, but distinct from, gradual typing
Gradual typing is a type system in which some variables and expressions may be given types and the correctness of the typing is checked at compile time (which is static typing) and some expressions may be left untyped and eventual type errors a ...
. While both typing disciplines can be used to perform static analysis of code (static typing
In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progra ...
), optional type systems do not enforce type safety at runtime (dynamic typing
In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progra ...
). [
]
Polymorphism and types
The term ''polymorphism'' refers to the ability of code (especially, functions or classes) to act on values of multiple types, or to the ability of different instances of the same data structure to contain elements of different types. Type systems that allow polymorphism generally do so in order to improve the potential for code re-use: in a language with polymorphism, programmers need only implement a data structure such as a list or an associative array
In computer science, an associative array, map, symbol table, or dictionary is an abstract data type that stores a collection of (key, value) pairs, such that each possible key appears at most once in the collection. In mathematical terms an as ...
once, rather than once for each type of element with which they plan to use it. For this reason computer scientists sometimes call the use of certain forms of polymorphism ''generic programming
Generic programming is a style of computer programming in which algorithms are written in terms of types ''to-be-specified-later'' that are then ''instantiated'' when needed for specific types provided as parameters. This approach, pioneered b ...
''. The type-theoretic foundations of polymorphism are closely related to those of abstraction
Abstraction in its main sense is a conceptual process wherein general rules and concepts are derived from the usage and classification of specific examples, literal ("real" or "concrete") signifiers, first principles, or other methods.
"An abstr ...
, modularity
Broadly speaking, modularity is the degree to which a system's components may be separated and recombined, often with the benefit of flexibility and variety in use. The concept of modularity is used primarily to reduce complexity by breaking a sy ...
and (in some cases) subtyping
In programming language theory, subtyping (also subtype polymorphism or inclusion polymorphism) is a form of type polymorphism in which a subtype is a datatype that is related to another datatype (the supertype) by some notion of substitutabilit ...
.
Specialized type systems
Many type systems have been created that are specialized for use in certain environments with certain types of data, or for out-of-band static program analysis
In computer science, static program analysis (or static analysis) is the analysis of computer programs performed without executing them, in contrast with dynamic program analysis, which is performed on programs during their execution.
The term i ...
. Frequently, these are based on ideas from formal type theory
In mathematics, logic, and computer science, a type theory is the formal presentation of a specific type system, and in general type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a foundat ...
and are only available as part of prototype research systems.
The following table gives an overview over type theoretic concepts that are used in specialized type systems.
The names range over terms and the names range over types.
The following notation will be used:
* means that has type ;
* is that application of on ;
*