HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...
, type safety and type soundness are the extent to which a
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
discourages or prevents
type error In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progra ...
s. Type safety is sometimes alternatively considered to be a property of facilities of a computer language; that is, some facilities are type-safe and their usage will not result in type errors, while other facilities in the same language may be type-unsafe and a program using them may encounter type errors. The behaviors classified as type errors by a given programming language are usually those that result from attempts to perform operations on
values In ethics and social sciences, value denotes the degree of importance of something or action, with the aim of determining which actions are best to do or what way is best to live (normative ethics in ethics), or to describe the significance of di ...
that are not of the appropriate data type, e.g., adding a string to an integer when there's no definition on how to handle this case. This classification is partly based on opinion. Type enforcement can be static, catching potential errors at
compile time In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled. The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concep ...
, or dynamic, associating type information with values at run-time and consulting them as needed to detect imminent errors, or a combination of both. Dynamic type enforcement often allows programs to run that would be invalid under static enforcement. In the context of static (compile-time) type systems, type safety usually involves (among other things) a guarantee that the eventual value of any
expression Expression may refer to: Linguistics * Expression (linguistics), a word, phrase, or sentence * Fixed expression, a form of words with a specific meaning * Idiom, a type of fixed expression * Metaphorical expression, a particular word, phrase, o ...
will be a legitimate member of that expression's static type. The precise requirement is more subtle than this — see, for example,
subtyping In programming language theory, subtyping (also subtype polymorphism or inclusion polymorphism) is a form of type polymorphism in which a subtype is a datatype that is related to another datatype (the supertype) by some notion of substitutability, ...
and polymorphism for complications.


Definitions

Intuitively, type soundness is captured by
Robin Milner Arthur John Robin Gorell Milner (13 January 1934 – 20 March 2010), known as Robin Milner or A. J. R. G. Milner, was a British computer scientist, and a Turing Award winner.
's pithy statement that :Well-typed programs cannot "go wrong". In other words, if a type system is ''sound'', then expressions accepted by that type system must evaluate to a value of the appropriate type (rather than produce a value of some other, unrelated type or crash with a type error). Vijay Saraswat provides the following, related definition: :A language is type-safe if the only operations that can be performed on data in the language are those sanctioned by the type of the data. However, what precisely it means for a program to be "well typed" or to "go wrong" are properties of its
static Static may refer to: Places *Static Nunatak, a nunatak in Antarctica United States * Static, Kentucky and Tennessee *Static Peak, a mountain in Wyoming **Static Peak Divide, a mountain pass near the peak Science and technology Physics *Static el ...
and
dynamic semantics Dynamic semantics is a framework in logic and natural language semantics that treats the meaning of a sentence as its potential to update a context. In static semantics, knowing the meaning of a sentence amounts to knowing when it is true; in dynam ...
, which are specific to each programming language. Consequently, a precise, formal definition of type soundness depends upon the style of formal semantics used to specify a language. In 1994, Andrew Wright and
Matthias Felleisen Matthias Felleisen is a German-American computer science professor and author. He grew up in Germany and immigrated to the US when he was 21 years old. He received his PhD from Indiana University under the direction of Daniel P. Friedman. Afte ...
formulated what has become the standard definition and proof technique for type safety in languages defined by
operational semantics Operational semantics is a category of formal programming language semantics in which certain desired properties of a program, such as correctness, safety or security, are verified by constructing proofs from logical statements about its execut ...
, which is closest to the notion of type safety as understood by most programmers. Under this approach, the semantics of a language must have the following two properties to be considered type-sound: ;Progress: A well-typed program never gets "stuck": every
expression Expression may refer to: Linguistics * Expression (linguistics), a word, phrase, or sentence * Fixed expression, a form of words with a specific meaning * Idiom, a type of fixed expression * Metaphorical expression, a particular word, phrase, o ...
is either already a value or can be reduced towards a value in some well-defined way. In other words, the program never gets into an undefined state where no further transitions are possible. ;Preservation (or subject reduction): After each evaluation step, the type of each expression remains the same (that is, its type is ''preserved''). A number of other formal treatments of type soundness have also been published in terms of denotational semantics and
structural operational semantics Operational semantics is a category of formal programming language semantics in which certain desired properties of a program, such as correctness, safety or security, are verified by constructing proofs from logical statements about its execut ...
.


Relation to other forms of safety

In isolation, type soundness is a relatively weak property, as it essentially just states that the rules of a type system are internally consistent and cannot be subverted. However, in practice, programming languages are designed so that well-typedness also entails other, stronger properties, some of which include: * Prevention of illegal operations. For example, a type system can reject the expression 3 / "Hello, World" as invalid, because the division operator is not defined for a string
divisor In mathematics, a divisor of an integer n, also called a factor of n, is an integer m that may be multiplied by some integer to produce n. In this case, one also says that n is a multiple of m. An integer n is divisible or evenly divisible by ...
. *
Memory safety safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. For example, Java is said to be memory-safe because its runtime error ...
** Type systems can prevent
wild pointer Wild, wild, wilds or wild may refer to: Common meanings * Wild animal * Wilderness, a wild natural environment * Wildness, the quality of being wild or untamed Art, media and entertainment Film and television * ''Wild'' (2014 film), a 2014 A ...
s that could otherwise arise from a pointer to one type of object being treated as a pointer to another type. ** More sophisticated type systems, such as those supporting
dependent type In computer science and logic, a dependent type is a type whose definition depends on a value. It is an overlapping feature of type theory and type systems. In intuitionistic type theory, dependent types are used to encode logic's quantifiers lik ...
s, can detect and reject out-of-bound accesses, preventing potential
buffer overflow In information security and programming, a buffer overflow, or buffer overrun, is an anomaly whereby a program, while writing data to a buffer, overruns the buffer's boundary and overwrites adjacent memory locations. Buffers are areas of memo ...
s. *
Logic error In computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming i ...
s originating in the
semantics Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and comp ...
of different types. For instance, inches and millimeters may both be stored as integers, but should not be substituted for each other or added. A type system can enforce two different types of integer for them.


Type-safe and type-unsafe languages

Type safety is usually a requirement for any toy language (i.e. esoteric language) proposed in academic programming language research. Many languages, on the other hand, are too big for human-generated type safety proofs, as they often require checking thousands of cases. Nevertheless, some languages such as Standard ML, which has rigorously defined semantics, have been proved to meet one definition of type safety. Some other languages such as
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
are ''believed'' to meet some definition of type safety, provided certain "escape" features are not used (for example Haskell's , used to "escape" from the usual restricted environment in which I/O is possible, circumvents the type system and so can be used to break type safety.) Type punning is another example of such an "escape" feature. Regardless of the properties of the language definition, certain errors may occur at run-time due to bugs in the implementation, or in linked
libraries A library is a collection of Document, materials, books or media that are accessible for use and not just for display purposes. A library provides physical (hard copies) or electronic media, digital access (soft copies) materials, and may be a ...
written in other languages; such errors could render a given implementation type unsafe in certain circumstances. An early version of Sun's Java virtual machine was vulnerable to this sort of problem.


Strong and weak typing

Programming languages are often colloquially classified as strongly typed or weakly typed (also loosely typed) to refer to certain aspects of type safety. In 1974, Liskov and Zilles defined a strongly-typed language as one in which "whenever an object is passed from a calling function to a called function, its type must be compatible with the type declared in the called function." In 1977, Jackson wrote, "In a strongly typed language each data area will have a distinct type and each process will state its communication requirements in terms of these types." In contrast, a weakly typed language may produce unpredictable results or may perform implicit type conversion.


Memory management and type safety

Type safety is closely linked to
memory safety safety is the state of being protected from various software bugs and security vulnerabilities when dealing with memory access, such as buffer overflows and dangling pointers. For example, Java is said to be memory-safe because its runtime error ...
. For instance, in an implementation of a language that has some type t which allows some bit patterns but not others, a
dangling pointer Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references ar ...
memory error allows writing a bit pattern that does not represent a legitimate member of t into a dead
variable Variable may refer to: * Variable (computer science), a symbolic name associated with a value and whose associated value may be changed * Variable (mathematics), a symbol that represents a quantity in a mathematical expression, as used in many ...
of type t, causing a type error when the variable is read. Conversely, if the language is memory-safe, it cannot allow an arbitrary integer to be used as a pointer, hence there must be a separate pointer or reference type. As a minimal condition, a type-safe language must not allow
dangling pointer Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references ar ...
s across allocations of different types. But most languages enforce the proper use of
abstract data type In computer science, an abstract data type (ADT) is a mathematical model for data types. An abstract data type is defined by its behavior (semantics) from the point of view of a ''user'', of the data, specifically in terms of possible values, pos ...
s defined by programmers even when this is not strictly necessary for memory safety or for the prevention of any kind of catastrophic failure. Allocations are given a type describing its contents, and this type is fixed for the duration of the allocation. This allows type-based
alias analysis Alias may refer to: * Pseudonym * Pen name * Nickname Arts and entertainment Film and television * ''Alias'' (2013 film), a 2013 Canadian documentary film * ''Alias'' (TV series), an American action thriller series 2001–2006 * ''Alias the J ...
to infer that allocations of different types are distinct. Most type-safe languages use
garbage collection Waste collection is a part of the process of waste management. It is the transfer of solid waste from the point of use and disposal to the point of treatment or landfill. Waste collection also includes the curbside collection of recyclabl ...
. Pierce says, "it is extremely difficult to achieve type safety in the presence of an explicit deallocation operation", due to the dangling pointer problem. However
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO( ...
is generally considered type-safe and uses a borrow checker to achieve memory safety, instead of garbage collection.


Type safety in object oriented languages

In
object oriented Object-oriented programming (OOP) is a programming paradigm based on the concept of " objects", which can contain data and code. The data is in the form of fields (often known as attributes or ''properties''), and the code is in the form of p ...
languages type safety is usually intrinsic in the fact that a
type system In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progr ...
is in place. This is expressed in terms of class definitions. A class essentially defines the structure of the objects derived from it and an
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
as a ''contract'' for handling these objects. Each time a new object is created it will ''comply'' with that contract. Each function that exchanges objects derived from a specific class, or implementing a specific
interface Interface or interfacing may refer to: Academic journals * ''Interface'' (journal), by the Electrochemical Society * '' Interface, Journal of Applied Linguistics'', now merged with ''ITL International Journal of Applied Linguistics'' * '' Int ...
, will adhere to that contract: hence in that function the operations permitted on that object will be only those defined by the methods of the class the object implements. This will guarantee that the object integrity will be preserved. Exceptions to this are object oriented languages that allow dynamic modification of the object structure, or the use of
reflection Reflection or reflexion may refer to: Science and technology * Reflection (physics), a common wave phenomenon ** Specular reflection, reflection from a smooth surface *** Mirror image, a reflection in a mirror or in water ** Signal reflection, in ...
to modify the content of an object to overcome the constraints imposed by the class methods definitions.


Type safety issues in specific languages


Ada

Ada Ada may refer to: Places Africa * Ada Foah, a town in Ghana * Ada (Ghana parliament constituency) * Ada, Osun, a town in Nigeria Asia * Ada, Urmia, a village in West Azerbaijan Province, Iran * Ada, Karaman, a village in Karaman Province, ...
was designed to be suitable for
embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' ...
s, device drivers and other forms of
system programming Systems programming, or system programming, is the activity of programming computer system software. The primary distinguishing characteristic of systems programming when compared to application programming is that application programming aims to pr ...
, but also to encourage type-safe programming. To resolve these conflicting goals, Ada confines type-unsafety to a certain set of special constructs whose names usually begin with the string . Unchecked_Deallocation can be effectively banned from a unit of Ada text by applying to this unit. It is expected that programmers will use constructs very carefully and only when necessary; programs that do not use them are type-safe. The SPARK programming language is a subset of Ada eliminating all its potential ambiguities and insecurities while at the same time adding statically checked contracts to the language features available. SPARK avoids the issues with
dangling pointer Dangling pointers and wild pointers in computer programming are pointers that do not point to a valid object of the appropriate type. These are special cases of memory safety violations. More generally, dangling references and wild references ar ...
s by disallowing allocation at run time entirely. Ada2012 adds statically checked contracts to the language itself (in form of pre-, and post-conditions, as well as type invariants).


C

The C programming language is type-safe in limited contexts; for example, a compile-time error is generated when an attempt is made to convert a pointer to one type of structure to a pointer to another type of structure, unless an explicit cast is used. However, a number of very common operations are non-type-safe; for example, the usual way to print an integer is something like printf("%d", 12), where the %d tells printf at run-time to expect an integer argument. (Something like printf("%s", 12), which tells the function to expect a pointer to a character-string and yet supplies an integer argument, may be accepted by compilers, but will produce undefined results.) This is partially mitigated by some compilers (such as gcc) checking type correspondences between printf arguments and format strings. In addition, C, like Ada, provides unspecified or undefined explicit conversions; and unlike in Ada, idioms that use these conversions are very common, and have helped to give C a type-unsafe reputation. For example, the standard way to allocate memory on the heap is to invoke a memory allocation function, such as
malloc C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely , , , and . The C++ programming language includes t ...
, with an argument indicating how many bytes are required. The function returns an untyped pointer (type void *), which the calling code must explicitly or implicitly cast to the appropriate pointer type. Pre-standardized implementations of C required an explicit cast to do so, therefore the code (struct foo *) malloc(
sizeof sizeof is a unary operator in the programming languages C and C++. It generates the storage size of an expression or a data type, measured in the number of ''char''-sized units. Consequently, the construct ''sizeof (char)'' is guaranteed to be ' ...
(struct foo))
became the accepted practice.


C++

Some features of C++ that promote more type-safe code: * The
new New is an adjective referring to something recently made, discovered, or created. New or NEW may refer to: Music * New, singer of K-pop group The Boyz Albums and EPs * ''New'' (album), by Paul McCartney, 2013 * ''New'' (EP), by Regurgitator ...
operator returns a pointer of type based on operand, whereas
malloc C dynamic memory allocation refers to performing manual memory management for dynamic memory allocation in the C programming language via a group of functions in the C standard library, namely , , , and . The C++ programming language includes t ...
returns a void pointer. * C++ code can use virtual functions and
templates Template may refer to: Tools * Die (manufacturing), used to cut or shape material * Mold, in a molding process * Stencil, a pattern or overlay used in graphic arts (drawing, painting, etc.) and sewing to replicate letters, shapes or designs Co ...
to achieve polymorphism without void pointers. * Safer casting operators, such as
dynamic cast In computer programming, run-time type information or run-time type identification (RTTI) is a feature of some programming languages (such as C++, Object Pascal, and Ada) that exposes information about an object's data type at runtime. Run-time ty ...
that performs run-time type checking. * C++11 strongly-typed enumerations cannot be implicitly converted to or from integers or other enumeration types. * C++ explicit constructors and C++11 explicit conversion operators prevent implicit type conversions.


C#

C# is type-safe (but not statically type-safe). It has support for untyped pointers, but this must be accessed using the "unsafe" keyword which can be prohibited at the compiler level. It has inherent support for run-time cast validation. Casts can be validated by using the "as" keyword that will return a null reference if the cast is invalid, or by using a C-style cast that will throw an exception if the cast is invalid. See C Sharp conversion operators. Undue reliance on the
object Object may refer to: General meanings * Object (philosophy), a thing, being, or concept ** Object (abstract), an object which does not exist at any particular time or place ** Physical object, an identifiable collection of matter * Goal, an ...
type (from which all other types are derived) runs the risk of defeating the purpose of the C# type system. It is usually better practice to abandon object references in favour of generics, similar to templates in C++ and
generics in Java Generics are a facility of generic programming that were added to the Java programming language in 2004 within version J2SE 5.0. They were designed to extend Java's type system to allow "a type or method to operate on objects of various types whil ...
.


Java

The
Java language Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let programmers ''write once, run any ...
is designed to enforce type safety. Anything in Java ''happens'' inside an
object Object may refer to: General meanings * Object (philosophy), a thing, being, or concept ** Object (abstract), an object which does not exist at any particular time or place ** Physical object, an identifiable collection of matter * Goal, an ...
and each object is an instance of a class. To implement the ''type safety'' enforcement, each object, before usage, needs to be allocated. Java allows usage of
primitive types In computer science, primitive data types are a set of basic data types from which all other data types are constructed. Specifically it often refers to the limited set of data representations in use by a particular processor, which all compiled pr ...
but only inside properly allocated objects. Sometimes a part of the type safety is implemented indirectly: e.g. the class BigDecimal represents a floating point number of arbitrary precision, but handles only numbers that can be expressed with a finite representation. The operation BigDecimal.divide() calculates a new object as the division of two numbers expressed as BigDecimal. In this case if the division has no finite representation, as when one computes e.g. 1/3=0.33333..., the divide() method can raise an exception if no rounding mode is defined for the operation. Hence the library, rather than the language, guarantees that the object respects the contract implicit in the class definition.


Standard ML

Standard ML has rigorously defined semantics and is known to be type-safe. However, some implementations, including Standard ML of New Jersey (SML/NJ), its syntactic variant Mythryl and MLton, provide libraries that offer unsafe operations. These facilities are often used in conjunction with those implementations'
foreign function interface A foreign function interface (FFI) is a mechanism by which a program written in one programming language can call routines or make use of services written in another. Naming The term comes from the specification for Common Lisp, which explicit ...
s to interact with non-ML code (such as C libraries) that may require data laid out in specific ways. Another example is the SML/NJ interactive toplevel itself, which must use unsafe operations to execute ML code entered by the user.


Modula-2

Modula-2 is a strongly-typed language with a design philosophy to require any unsafe facilities to be explicitly marked as unsafe. This is achieved by "moving" such facilities into a built-in pseudo-library called SYSTEM from where they must be imported before they can be used. The import thus makes it visible when such facilities are used. Unfortunately, this was not consequently implemented in the original language report and its implementation. There still remained unsafe facilities such as the type cast syntax and variant records (inherited from Pascal) that could be used without prior import. The difficulty in moving these facilities into the SYSTEM pseudo-module was the lack of any identifier for the facility that could then be imported since only identifiers can be imported, but not syntax. IMPORT SYSTEM; (* allows the use of certain unsafe facilities: *) VAR word : SYSTEM.WORD; addr : SYSTEM.ADDRESS; addr := SYSTEM.ADR(word); (* but type cast syntax can be used without such import *) VAR i : INTEGER; n : CARDINAL; n := CARDINAL(i); (* or *) i := INTEGER(n); The ISO Modula-2 standard corrected this for the type cast facility by changing the type cast syntax into a function called CAST which has to be imported from pseudo-module SYSTEM. However, other unsafe facilities such as variant records remained available without any import from pseudo-module SYSTEM. IMPORT SYSTEM; VAR i : INTEGER; n : CARDINAL; i := SYSTEM.CAST(INTEGER, n); (* Type cast in ISO Modula-2 *) A recent revision of the language applied the original design philosophy rigorously. First, pseudo-module SYSTEM was renamed to UNSAFE to make the unsafe nature of facilities imported from there more explicit. Then all remaining unsafe facilities where either removed altogether (for example variant records) or moved to pseudo-module UNSAFE. For facilities where there is no identifier that could be imported, enabling identifiers were introduced. In order to enable such a facility, its corresponding enabling identifier must be imported from pseudo-module UNSAFE. No unsafe facilities remain in the language that do not require import from UNSAFE. IMPORT UNSAFE; VAR i : INTEGER; n : CARDINAL; i := UNSAFE.CAST(INTEGER, n); (* Type cast in Modula-2 Revision 2010 *) FROM UNSAFE IMPORT FFI; (* enabling identifier for foreign function interface facility *) <*FFI="C"*> (* pragma for foreign function interface to C *)


Pascal

Pascal has had a number of type safety requirements, some of which are kept in some compilers. Where a Pascal compiler dictates "strict typing", two variables cannot be assigned to each other unless they are either compatible (such as conversion of integer to real) or assigned to the identical subtype. For example, if you have the following code fragment: type TwoTypes = record I: Integer; Q: Real; end; DualTypes = record I: Integer; Q: Real; end; var T1, T2: TwoTypes; D1, D2: DualTypes; Under strict typing, a variable defined as is ''not compatible'' with (because they are not identical, even though the components of that user defined type are identical) and an assignment of T1 := D2; is illegal. An assignment of T1 := T2; would be legal because the subtypes they are defined to ''are'' identical. However, an assignment such as T1.Q := D1.Q; would be legal.


Common Lisp

In general, Common Lisp is a type-safe language. A Common Lisp compiler is responsible for inserting dynamic checks for operations whose type safety cannot be proven statically. However, a programmer may indicate that a program should be compiled with a lower level of dynamic type-checking. A program compiled in such a mode cannot be considered type-safe.


C++ examples

The following examples illustrates how C++ cast operators can break type safety when used incorrectly. The first example shows how basic data types can be incorrectly cast: #include using namespace std; int main () In this example, reinterpret_cast explicitly prevents the compiler from performing a safe conversion from integer to floating-point value. When the program runs it will output a garbage floating-point value. The problem could have been avoided by instead writing float fval = ival; The next example shows how object references can be incorrectly downcast: #include using namespace std; class Parent ; class Child1 : public Parent ; class Child2 : public Parent ; int main () The two child classes have members of different types. When downcasting a parent class pointer to a child class pointer, then the resulting pointer may not point to a valid object of correct type. In the example, this leads to garbage value being printed. The problem could have been avoided by replacing static_cast with dynamic_cast that throws an exception on invalid casts.


See also

*
Type theory In mathematics, logic, and computer science, a type theory is the formal presentation of a specific type system, and in general type theory is the academic study of type systems. Some type theories serve as alternatives to set theory as a fou ...


Notes


References

* * * * {{DEFAULTSORT:Type Safety Programming language topics Type theory Articles with example Pascal code