Tagged Pointer
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, a tagged pointer is a pointer (concretely a
memory address In computing, a memory address is a reference to a specific memory location used at various levels by software and hardware. Memory addresses are fixed-length sequences of digits conventionally displayed and manipulated as unsigned integers. Su ...
) with additional data associated with it, such as an
indirection bit Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions ...
or
reference count In computer science, reference counting is a programming technique of storing the number of references, pointers, or handles to a resource, such as an object, a block of memory, disk space, and others. In garbage collection algorithms, referen ...
. This additional data is often "folded" into the pointer, meaning stored inline in the data representing the address, taking advantage of certain properties of memory addressing. The name comes from "
tagged architecture In computer science, a tagged architecture is a particular type of computer architecture where every word of memory constitutes a tagged union, being divided into a number of bits of data, and a ''tag'' section that describes the type of the data: ...
" systems, which reserved bits at the hardware level to indicate the significance of each word; the additional data is called a "tag" or "tags", though strictly speaking "tag" refers to data specifying a ''type,'' not other data; however, the usage "tagged pointer" is ubiquitous.


Folding tags into the pointer

There are various techniques for folding tags into a pointer. Most architectures are
byte-addressable Byte addressing in hardware architectures supports accessing individual bytes. Computers with byte addressing are sometimes called ''byte machines,'' in contrast to ''word-addressable'' architectures, ''word machines'', that access data by word. ...
(the smallest addressable unit is a byte), but certain types of data will often be '' aligned'' to the size of the data, often a
word A word is a basic element of language that carries an semantics, objective or pragmatics, practical semantics, meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of w ...
or multiple thereof. This discrepancy leaves a few of the least significant bits of the pointer unused, which can be used for tags – most often as a
bit field A bit field is a data structure that consists of one or more adjacent bits which have been allocated for specific purposes, so that any single bit or group of bits within the structure can be set or inspected. A bit field is most commonly used to r ...
(each bit a separate tag) – as long as code that uses the pointer masks out these bits before accessing memory. E.g., on a
32-bit In computer architecture, 32-bit computing refers to computer systems with a processor, memory, and other major system components that operate on data in 32-bit units. Compared to smaller bit widths, 32-bit computers can perform large calculation ...
architecture (for both addresses and word size), a word is 32 bits = 4 bytes, so word-aligned addresses are always a multiple of 4, hence end in 00, leaving the last 2 bits available; while on a
64-bit In computer architecture, 64-bit Integer (computer science), integers, memory addresses, or other Data (computing), data units are those that are 64 bits wide. Also, 64-bit central processing unit, CPUs and arithmetic logic unit, ALUs are those ...
architecture, a word is 64 bits = 8 bytes, so word-aligned addresses end in 000, leaving the last 3 bits available. In cases where data is aligned at a multiple of word size, further bits are available. In case of
word-addressable In computer architecture, ''word addressing'' means that addresses of memory on a computer uniquely identify Word (computer architecture), words of memory. It is usually used in contrast with byte addressing, where addresses uniquely identify byt ...
architectures, word-aligned data does not leave any bits available, as there is no discrepancy between alignment and addressing, but data aligned at a multiple of word size does. Conversely, in some operating systems,
virtual address In computing, a virtual address space (VAS) or address space is the set of ranges of virtual addresses that an operating system makes available to a process. The range of virtual addresses usually starts at a low address and can extend to the hig ...
es are narrower than the overall architecture width, which leaves the most significant bits available for tags; this can be combined with the previous technique in case of aligned addresses. This is particularly the case on 64-bit architectures, as 64 bits of address space are far above the data requirements of all but the largest applications, and thus many practical 64-bit processors have narrower addresses. Note that the virtual address width may be narrower than the
physical address In computing, a physical address (also real address, or binary address), is a memory address that is represented in the form of a binary number on the address bus circuitry in order to enable the data bus to access a ''particular'' storage cell o ...
width, which in turn may be narrower than the architecture width; for tagging of pointers in
user space A modern computer operating system usually segregates virtual memory into user space and kernel space. Primarily, this separation serves to provide memory protection and hardware protection from malicious or errant software behaviour. Kernel ...
, the virtual address space provided by the operating system (in turn provided by the
memory management unit A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a computer hardware unit having all memory references passed through itself, primarily performing the translation of virtual memory addresses to physical ad ...
) is the relevant width. In fact, some processors specifically forbid use of such tagged pointers at the processor level, notably
x86-64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mod ...
, which requires the use of canonical form addresses by the operating system, with most significant bits all 0s or all 1s. Lastly, the
virtual memory In computing, virtual memory, or virtual storage is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a very l ...
system in most modern
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...
s reserves a block of logical memory around address 0 as unusable. This means that, for example, a pointer to 0 is never a valid pointer and can be used as a special
null pointer In computing, a null pointer or null reference is a value saved for indicating that the pointer or reference does not refer to a valid object. Programs routinely use null pointers to represent conditions such as the end of a list of unknown lengt ...
value. Unlike the previously mentioned techniques, this only allows a single special pointer value, not extra data for pointers generally.


Examples

One of the earliest examples of hardware support for tagged pointers in a commercial platform was the
IBM System/38 The System/38 is a discontinued minicomputer and midrange computer manufactured and sold by IBM. The system was announced in 1978. The System/38 has 48-bit addressing, which was unique for the time, and a novel integrated database system. It w ...
. IBM later added tagged pointer support to the
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple Inc., App ...
architecture to support the
IBM i IBM i (the ''i'' standing for ''integrated'') is an operating system developed by IBM for IBM Power Systems. It was originally released in 1988 as OS/400, as the sole operating system of the IBM AS/400 line of systems. It was renamed to i5/OS in ...
operating system, which is an evolution of the System/38 platform. A significant example of the use of tagged pointers is the
Objective-C Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTS ...
runtime on
iOS 7 iOS 7 is the seventh major release of the iOS mobile operating system developed by Apple Inc., being the successor to iOS 6. It was announced at the company's Worldwide Developers Conference on June 10, 2013, and was released on September 18 o ...
on
ARM64 AArch64 or ARM64 is the 64-bit extension of the ARM architecture family. It was first introduced with the Armv8-A architecture. Arm releases a new extension every year. ARMv8.x and ARMv9.x extensions and features Announced in October 2011, AR ...
, notably used on the
iPhone 5S The iPhone 5S (stylized and marketed as iPhone 5s) is a smartphone that was designed and marketed by Apple Inc. It is the seventh generation of the iPhone, succeeding the iPhone 5, and unveiled in September 2013, alongside the iPhone 5C. Th ...
. In iOS 7, virtual addresses only contain 33 bits of address information but are 64-bits long leaving 31 bits for tags. Objective-C class pointers are 8-byte aligned freeing up an additional 3 bits of address space, and the tag fields are used for many purposes, such as storing a reference count and whether the object has a destructor.Friday Q&A 2013-09-27: ARM64 and You
by Mike Ash

Non-pointer isa]
Early versions of MacOS used tagged addresses called Handles to store references to data objects. The high bits of the address indicated whether the data object was locked, purgeable, and/or originated from a resource file, respectively. This caused compatibility problems when MacOS addressing advanced from 24 bits to 32 bits in System 7.


Null versus aligned pointer

Use of zero to represent a null pointer is extremely common, with many programming languages (such as Ada programming language, Ada) explicitly relying on this behavior. In theory, other values in an operating system-reserved block of logical memory could be used to tag conditions other than a null pointer, but these uses appear to be rare, perhaps because they are at best non-portable. It is generally accepted practice in software design that if a special pointer value distinct from null (such as a
sentinel Sentinel may refer to: Places Mountains * Mount Sentinel, a mountain next to the University of Montana in Missoula, Montana * Sentinel Buttress, a volcanic crag on James Ross Island, Antarctica * Sentinel Dome, a naturally occurring grani ...
in certain
data structure In computer science, a data structure is a data organization, management, and storage format that is usually chosen for efficient access to data. More precisely, a data structure is a collection of data values, the relationships among them, a ...
s) is needed, the programmer should explicitly provide for it. Taking advantage of the alignment of pointers provides more flexibility than null pointers/sentinels because it allows pointers to be tagged with information about the type of data pointed to, conditions under which it may be accessed, or other similar information about the pointer's use. This information can be provided along with every valid pointer. In contrast, null pointers/sentinels provide only a finite number of tagged values distinct from valid pointers. In a
tagged architecture In computer science, a tagged architecture is a particular type of computer architecture where every word of memory constitutes a tagged union, being divided into a number of bits of data, and a ''tag'' section that describes the type of the data: ...
, a number of bits in every word of memory are reserved to act as a tag. Tagged architectures, such as the
Lisp machine Lisp machines are general-purpose computers designed to efficiently run Lisp as their main software and programming language, usually via hardware support. They are an example of a high-level language computer architecture, and in a sense, the ...
s, often have hardware support for interpreting and processing tagged pointers.
GNU libc The GNU C Library, commonly known as glibc, is the GNU Project's implementation of the C standard library. Despite its name, it now also directly supports C++ (and, indirectly, other programming languages). It was started in the 1980s by t ...
malloc() provides 8-byte aligned memory addresses for 32-bit platforms, and 16-byte alignment for 64-bit platforms. Larger alignment values can be obtained with posix_memalign().


Examples


Example 1

In the following C code, the value of zero is used to indicate a null pointer: void optionally_return_a_value (int* optional_return_value_pointer)


Example 2

Here, the programmer has provided a global variable, whose address is then used as a sentinel: #define SENTINEL &sentinel_s node_t sentinel_s; void do_something_to_a_node (node_t * p)


Example 3

Assume we have a data structure table_entry that is always aligned to a 16 byte boundary. In other words, the least significant 4 bits of a table entry's address are always 0 We could use these 4 bits to mark the table entry with extra information. For example, bit 0 might mean read only, bit 1 might mean dirty (the table entry needs to be updated), and so on. If pointers are 16-bit values, then: * 0x3421 is a read-only pointer to the table_entry at address 0x3420 * 0xf472 is a pointer to a dirty table_entry at address 0xf470


Advantages

The major advantage of tagged pointers is that they take up less space than a pointer along with a separate tag field. This can be especially important when a pointer is a return value from a
function Function or functionality may refer to: Computing * Function key, a type of key on computer keyboards * Function model, a structured representation of processes in a system * Function object or functor or functionoid, a concept of object-oriente ...
. It can also be important in large tables of pointers. A more subtle advantage is that by storing a tag in the same place as the pointer, it is often possible to guarantee the atomicity of an operation that updates both the pointer and its tag without external
synchronization Synchronization is the coordination of events to operate a system in unison. For example, the conductor of an orchestra keeps the orchestra synchronized or ''in time''. Systems that operate with all parts in synchrony are said to be synchronou ...
mechanisms. This can be an extremely large performance gain, especially in operating systems.


Disadvantages

Tagged pointers have some of the same difficulties as xor linked lists, although to a lesser extent. For example, not all
debugger A debugger or debugging tool is a computer program used to software testing, test and debugging, debug other programs (the "target" program). The main use of a debugger is to run the target program under controlled conditions that permit the pr ...
s will be able to properly follow tagged pointers; however, this is not an issue for a debugger that is designed with tagged pointers in mind. The use of zero to represent a null pointer does not suffer from these disadvantages: it is pervasive, most programming languages treat zero as a special null value, and it has thoroughly proven its robustness. An exception is the way that zero participates in
overload resolution In some programming languages, function overloading or method overloading is the ability to create multiple functions of the same name with different implementations. Calls to an overloaded function will run a specific implementation of that f ...
in C++, where zero is treated as an integer rather than a pointer; for this reason the special value
nullptr C++11 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++11 replaced the prior version of the C++ standard, called C++03, and was later replaced by C++14. The name follows the tradition of naming language versions b ...
is preferred over the integer zero. However, with tagged pointers zeros are usually not used to represent null pointers.


References

{{DEFAULTSORT:Tagged Pointer Programming constructs