computer programming Computer programming or coding is the composition of sequences of instructions, called computer program, programs, that computers can follow to perform tasks. It involves designing and implementing algorithms, step-by-step specifications of proc ...

, a reference is a value that enables a program to indirectly access a particular

datum Data ( , ) are a collection of discrete or continuous value (semiotics), values that convey information, describing the quantity, qualitative property, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols t ...

, such as a variable's value or a record, in the

computer A computer is a machine that can be Computer programming, programmed to automatically Execution (computing), carry out sequences of arithmetic or logical operations (''computation''). Modern digital electronic computers can perform generic set ...

memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembe ...

or in some other storage device. The reference is said to refer to the datum, and accessing the datum is called dereferencing the reference. A reference is distinct from the datum itself. A reference is an abstract data type and may be implemented in many ways. Typically, a reference refers to data stored in memory on a given system, and its internal value is the memory address of the data, i.e. a reference is implemented as a pointer. For this reason a reference is often said to "point to" the data. Other implementations include an offset (difference) between the datum's address and some fixed "base" address, an

index Index (: indexes or indices) may refer to: Arts, entertainment, and media Fictional entities * Index (''A Certain Magical Index''), a character in the light novel series ''A Certain Magical Index'' * The Index, an item on the Halo Array in the ...

, or

identifier An identifier is a name that identifies (that is, labels the identity of) either a unique object or a unique ''class'' of objects, where the "object" or class may be an idea, person, physical countable object (or class thereof), or physical mass ...

used in a lookup operation into an array or table, an operating system

handle A handle is a part of, or an attachment to, an object that allows it to be grasped and object manipulation, manipulated by hand. The design of each type of handle involves substantial ergonomics, ergonomic issues, even where these are dealt wi ...

, a physical address on a storage device, or a network address such as a URL.

Formal representation

A reference ''R'' is a value that admits one operation, dereference(''R''), which yields a value. Usually the reference is typed so that it returns values of a specific type, e.g.: interface Reference Often the reference also admits an assignment operation store(''R'', ''x''), meaning it is an abstract variable.

Use

References are widely used in programming, especially to efficiently pass large or mutable data as

arguments An argument is a series of sentences, statements, or propositions some of which are called premises and one is the conclusion. The purpose of an argument is to give reasons for one's conclusion via justification, explanation, and/or persua ...

to procedures, or to share such data among various uses. In particular, a reference may point to a variable or record that contains references to other data. This idea is the basis of indirect addressing and of many linked data structures, such as

linked list In computer science, a linked list is a linear collection of data elements whose order is not given by their physical placement in memory. Instead, each element points to the next. It is a data structure consisting of a collection of nodes whi ...

s. References increase flexibility in where objects can be stored, how they are allocated, and how they are passed between areas of

code In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communicati ...

. As long as one can access a reference to the data, one can access the data through it, and the data itself need not be moved. They also make sharing of data between different code areas easier; each keeps a reference to it. References can cause significant complexity in a program, partially due to the possibility of dangling and wild references and partially because the

topology Topology (from the Greek language, Greek words , and ) is the branch of mathematics concerned with the properties of a Mathematical object, geometric object that are preserved under Continuous function, continuous Deformation theory, deformat ...

of data with references is a directed graph, whose analysis can be quite complicated. Nonetheless, references are still simpler to analyze than pointers due to the absence of pointer arithmetic. The mechanism of references, if varying in implementation, is a fundamental programming language feature common to nearly all modern programming languages. Even some languages that support no direct use of references have some internal or implicit use. For example, the

call by reference In a programming language, an evaluation strategy is a set of rules for evaluating expressions. The term is often used to refer to the more specific notion of a ''parameter-passing strategy'' that defines the kind of value that is passed to the ...

calling convention can be implemented with either explicit or implicit use of references.

Examples

Pointers are the most primitive type of reference. Due to their intimate relationship with the underlying hardware, they are one of the most powerful and efficient types of references. However, also due to this relationship, pointers require a strong understanding by the programmer of the details of memory architecture. Because pointers store a memory location's address, instead of a value directly, inappropriate use of pointers can lead to undefined behavior in a program, particularly due to dangling pointers or wild pointers. Smart pointers are opaque data structures that act like pointers but can only be accessed through particular methods. A

is an abstract reference, and may be represented in various ways. A common example are file handles (the FILE data structure in the C standard I/O library), used to abstract file content. It usually represents both the file itself, as when requesting a

lock Lock(s) or Locked may refer to: Common meanings *Lock and key, a mechanical device used to secure items of importance *Lock (water navigation), a device for boats to transit between different levels of water, as in a canal Arts and entertainme ...

on the file, and a specific position within the file's content, as when reading a file. In

distributed computing Distributed computing is a field of computer science that studies distributed systems, defined as computer systems whose inter-communicating components are located on different networked computers. The components of a distributed system commu ...

, the reference may contain more than an address or identifier; it may also include an embedded specification of the network protocols used to locate and access the referenced object, the way information is encoded or serialized. Thus, for example, a WSDL description of a remote web service can be viewed as a form of reference; it includes a complete specification of how to locate and bind to a particular

web service A web service (WS) is either: * a service offered by an electronic device to another electronic device, communicating with each other via the Internet, or * a server running on a computer device, listening for requests at a particular port over a n ...

. A reference to a live distributed object is another example: it is a complete specification for how to construct a small software component called a ''proxy'' that will subsequently engage in a peer-to-peer interaction, and through which the local machine may gain access to data that is replicated or exists only as a weakly consistent message stream. In all these cases, the reference includes the full set of instructions, or a recipe, for how to access the data; in this sense, it serves the same purpose as an identifier or address in memory. If we have a set of keys ''K'' and a set of data objects ''D'', any well-defined (single-valued) function from ''K'' to ''D'' ∪ defines a type of reference, where ''null'' is the image of a key not referring to anything meaningful. An alternative representation of such a function is a directed graph called a reachability graph. Here, each datum is represented by a vertex and there is an edge from ''u'' to ''v'' if the datum in ''u'' refers to the datum in ''v''. The maximum out-degree is one. These graphs are valuable in garbage collection, where they can be used to separate accessible from inaccessible objects.

External and internal storage

In many data structures, large, complex objects are composed of smaller objects. These objects are typically stored in one of two ways: # With internal storage, the contents of the smaller object are stored inside the larger object. # With external storage, the smaller objects are allocated in their own location, and the larger object only stores references to them. Internal storage is usually more efficient, because there is a space cost for the references and dynamic allocation metadata, and a time cost associated with dereferencing a reference and with allocating the memory for the smaller objects. Internal storage also enhances

locality of reference In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...

by keeping different parts of the same large object close together in memory. However, there are a variety of situations in which external storage is preferred: * If the data structure is recursive, meaning it may contain itself. This cannot be represented in the internal way. * If the larger object is being stored in an area with limited space, such as the stack, then we can prevent running out of storage by storing large component objects in another memory region and referring to them using references. * If the smaller objects may vary in size, it is often inconvenient or expensive to resize the larger object so that it can still contain them. * References are often easier to work with and adapt better to new requirements. Some languages, such as

Java Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...

Smalltalk Smalltalk is a purely object oriented programming language (OOP) that was originally created in the 1970s for educational use, specifically for constructionist learning, but later found use in business. It was created at Xerox PARC by Learni ...

, Python, and Scheme, do not support internal storage. In these languages, all objects are uniformly accessed through references.

Language support

Assembly

assembly language In computing, assembly language (alternatively assembler language or symbolic machine code), often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence bet ...

, it is typical to express references using either raw memory addresses or indexes into tables. These work, but are somewhat tricky to use, because an address tells you nothing about the value it points to, not even how large it is or how to interpret it; such information is encoded in the program logic. The result is that misinterpretations can occur in incorrect programs, causing bewildering errors.

Lisp

One of the earliest opaque references was that of the

Lisp Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...

language cons cell, which is simply a record containing two references to other Lisp objects, including possibly other cons cells. This simple structure is most commonly used to build singly

s, but can also be used to build simple

binary tree In computer science, a binary tree is a tree data structure in which each node has at most two children, referred to as the ''left child'' and the ''right child''. That is, it is a ''k''-ary tree with . A recursive definition using set theor ...

s and so-called "dotted lists", which terminate not with a null reference but a value.

C/C++

The pointer is still one of the most popular types of references today. It is similar to the assembly representation of a raw address, except that it carries a static

datatype In computer science and computer programming, a data type (or simply type) is a collection or grouping of data values, usually specified by a set of possible values, a set of allowed operations on these values, and/or a representation of these ...

which can be used at compile-time to ensure that the data it refers to is not misinterpreted. However, because C has a weak type system which can be violated using casts (explicit conversions between various pointer types and between pointer types and integers), misinterpretation is still possible, if more difficult. Its successor C++ tried to increase type safety of pointers with new cast operators, a reference type &, and smart pointers in its standard library, but still retained the ability to circumvent these safety mechanisms for compatibility.

Fortran

Fortran does not have an explicit representation of references, but does use them implicitly in its call-by-reference calling semantics. A Fortran reference is best thought of as an ''alias'' of another object, such as a scalar variable or a row or column of an array. There is no syntax to dereference the reference or manipulate the contents of the referent directly. Fortran references can be null. As in other languages, these references facilitate the processing of dynamic structures, such as linked lists, queues, and trees.

Object-oriented languages

A number of object-oriented languages such as Eiffel,

, C#, and

Visual Basic Visual Basic is a name for a family of programming languages from Microsoft. It may refer to: * Visual Basic (.NET), the current version of Visual Basic launched in 2002 which runs on .NET * Visual Basic (classic), the original Visual Basic suppo ...

have adopted a much more opaque type of reference, usually referred to as simply a ''reference''. These references have types like C pointers indicating how to interpret the data they reference, but they are typesafe in that they cannot be interpreted as a raw address and unsafe conversions are not permitted. References are extensively used to access and assign objects. References are also used in function/ method calls or message passing, and reference counts are frequently used to perform garbage collection of unused objects.

Functional languages

Standard ML Standard ML (SML) is a General-purpose programming language, general-purpose, High-level programming language, high-level, Modular programming, modular, Functional programming, functional programming language with compile-time type checking and t ...

OCaml OCaml ( , formerly Objective Caml) is a General-purpose programming language, general-purpose, High-level programming language, high-level, Comparison of multi-paradigm programming languages, multi-paradigm programming language which extends the ...

, and many other functional languages, most values are persistent: they cannot be modified by assignment. Assignable "reference cells" provide mutable variables, data that can be modified. Such reference cells can hold any value, and so are given the polymorphic type α ref, where α is to be replaced with the type of value pointed to. These mutable references can be pointed to different objects over their lifetime. For example, this permits building of circular data structures. The reference cell is functionally equivalent to a mutable array of length 1. To preserve safety and efficient implementations, references cannot be type-cast in ML, nor can pointer arithmetic be performed. In the functional paradigm, many structures that would be represented using pointers in a language like C are represented using other facilities, such as the powerful algebraic datatype mechanism. The programmer is then able to enjoy certain properties (such as the guarantee of immutability) while programming, even though the compiler often uses machine pointers "under the hood".

Perl/PHP

Perl Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language". Perl was developed ...

supports hard references, which function similarly to those in other languages, and symbolic references, which are just string values that contain the names of variables. When a value that is not a hard reference is dereferenced, Perl considers it to be a symbolic reference and gives the variable with the name given by the value. PHP has a similar feature in the form of its $$var syntax.

References

External links

Pointer Fun With Binky
Introduction to pointers in a 3-minute educational video – Stanford Computer Science Education Library {{Web syndication Data types Programming language concepts Primitive types