In computer programming, a thunk is a subroutine used to inject a calculation into another subroutine. Thunks are primarily used to delay a calculation until its result is needed, or to insert operations at the beginning or end of the other subroutine. They have many other applications in compiler code generation and modular programming.
The term originated as a whimsical irregular form of the verb ''think''. It refers to the original use of thunks in ALGOL 60 compilers, which required special analysis (thought) to determine what type of routine to generate.
Background
The early years of compiler research saw broad experimentation with different evaluation strategies. A key question was how to compile a subroutine call if the arguments can be arbitrary mathematical expressions rather than constants. One approach, known as "call by value", calculates all of the arguments before the call and then passes the resulting values to the subroutine. In the rival "call by name" approach, the subroutine receives the unevaluated argument expression and must evaluate it.
A simple implementation of "call by name" might substitute the code of an argument expression for each appearance of the corresponding parameter in the subroutine, but this can produce multiple versions of the subroutine and multiple copies of the expression code. As an improvement, the compiler can generate a helper subroutine, called a ''thunk'', that calculates the value of the argument. The address and environment of this helper subroutine are then passed to the original subroutine in place of the original argument, where it can be called as many times as needed. Peter Ingerman first described thunks in reference to the ALGOL 60 programming language, which supports call-by-name evaluation.
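The thunk mechanism described above can be imitated in JavaScript, assuming a zero-argument arrow function stands in for the compiler-generated thunk and a small object stands in for a variable passed by name. The sketch follows the style of Jensen's device, a well-known ALGOL 60 idiom; `sumByName` and the index object are hypothetical names, and an ALGOL compiler would generate the thunks itself rather than have the programmer write them.

```javascript
// Call by name imitated with thunks (hypothetical names). 'term' is a
// zero-argument function that re-evaluates the caller's expression each time
// it is called; 'index' is an object standing in for a by-name variable.
function sumByName(index, lo, hi, term) {
  let sum = 0;
  for (index.value = lo; index.value <= hi; index.value++) {
    sum += term(); // the argument expression is evaluated afresh here
  }
  return sum;
}

const i = { value: 0 };
const a = [10, 20, 30, 40];

// Passing the expression a[i] "by name": the thunk reads i at every call.
const total = sumByName(i, 0, a.length - 1, () => a[i.value]);
// Passing a[i] * a[i] instead sums the squares with the same routine.
const squares = sumByName(i, 0, a.length - 1, () => a[i.value] * a[i.value]);

console.log(total);   // 100
console.log(squares); // 3000
```

Under call by value, `a[i]` would be computed once before the call, so the routine could not step through the array this way.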
Applications
Functional programming
Although the software industry largely standardized on call-by-value and call-by-reference evaluation, active study of call-by-name continued in the functional programming community. This research produced a series of lazy evaluation programming languages in which some variant of call-by-name is the standard evaluation strategy. Compilers for these languages, such as the Glasgow Haskell Compiler, have relied heavily on thunks, with the added feature that the thunks save their initial result so that they can avoid recalculating it; this is known as memoization or call-by-need.
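The memoizing behaviour described above can be sketched in JavaScript by wrapping a thunk so that its computation runs at most once and later calls return the saved value. `makeLazy` is a hypothetical helper; compilers such as GHC implement call-by-need internally with their own representation, which this sketch does not attempt to reproduce.

```javascript
// A minimal call-by-need thunk (hypothetical helper): the wrapped computation
// runs at most once, and every later call returns the saved result.
function makeLazy(compute) {
  let evaluated = false;
  let value;
  return () => {
    if (!evaluated) {
      value = compute();
      evaluated = true;
      compute = null; // drop the reference so captured data can be collected
    }
    return value;
  };
}

let calls = 0;
const lazyHypot = makeLazy(() => {
  calls++;
  return Math.sqrt(3 * 3 + 4 * 4);
});

console.log(calls);       // 0, nothing computed yet
console.log(lazyHypot()); // 5, first call evaluates
console.log(lazyHypot()); // 5, served from the saved result
console.log(calls);       // 1
```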
Functional programming languages have also allowed programmers to explicitly generate thunks. This is done in source code by wrapping an argument expression in an anonymous function that has no parameters of its own. This prevents the expression from being evaluated until a receiving function calls the anonymous function, thereby achieving the same effect as call-by-name. The adoption of anonymous functions into other programming languages has made this capability widely available.
The following is a simple demonstration in JavaScript (ES6):
// 'hypot' is a binary function
const hypot = (x, y) => Math.sqrt(x * x + y * y);
// 'thunk' is a function that takes no arguments and, when invoked, performs a potentially expensive
// operation (computing a square root, in this example) and/or causes some side-effect to occur
const thunk = () => hypot(3, 4);
// 'doSomethingWithThunk' is a placeholder for any code that receives the thunk;
// here it simply returns it, so nothing is computed yet
const doSomethingWithThunk = (f) => f;
// the thunk can then be passed around without being evaluated...
doSomethingWithThunk(thunk);
// ...or evaluated
thunk(); // 5
Object-oriented programming
Thunks are useful in object-oriented programming platforms that allow a class to inherit multiple interfaces, leading to situations where the same method might be called via any of several interfaces. The following code illustrates such a situation in C++.
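class A { public: virtual int Access() const { return value_; } private: int value_ = 0; };
class B { public: virtual int Access() const { return value_; } private: int value_ = 0; };
// C inherits both interfaces; its Access overrides both A::Access and B::Access
class C : public A, public B { public: int Access() const override { return better_value_; } private: int better_value_ = 1; };
// the same call must work whether b points to a plain B or to the B part of a C
int use(B *b) { return b->Access(); }
int main() {
  B someB; use(&someB);
  C someC; use(&someC);
}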
In this example, the code generated for each of the classes A, B and C will include a dispatch table that can be used to call the virtual method on an object of that type, via a reference that has the same type. Class C will have an additional dispatch table, used to call the method on an object of type C via a reference of type B. The call made through b in the function use will select either B's own dispatch table or the additional C table, depending on the type of object b refers to. If b refers to an object of type C, the compiler must ensure that C's implementation receives an instance address for the entire C object, rather than the inherited B part of that object.
As a direct approach to this pointer adjustment problem, the compiler can include an integer offset in each dispatch table entry. This offset is the difference between the reference's address and the address required by the method implementation. The code generated for each call through these dispatch tables must then retrieve the offset and use it to adjust the instance address before calling the method.
The solution just described has problems similar to the naïve implementation of call-by-name described earlier: the compiler generates several copies of code to calculate an argument (the instance address), while also increasing the dispatch table sizes to hold the offsets. As an alternative, the compiler can generate an ''adjustor thunk'' along with C's implementation of the method. The thunk adjusts the instance address by the required amount and then calls C's implementation. The thunk can appear in C's dispatch table for B, thereby eliminating the need for callers to adjust the address themselves.
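The adjustor-thunk idea can be modelled in JavaScript under heavily simplified assumptions: memory is an array of slots, an object reference is a slot index, and each subobject slot records its dispatch table. The layout (A part, B part, then C's own field), the names and the values are all hypothetical; a real C++ compiler emits machine code and follows its platform's ABI.

```javascript
// Toy model of an adjustor thunk (all names, the layout and the values are
// hypothetical). Memory is an array of slots and a reference is a slot index.
// A C object occupies three slots: [A subobject, B subobject, C's own field].
const memory = [];
const B_IN_C = 1; // offset of the B subobject from the start of a C object

// Method bodies expect the address of the object they were written for.
const B_Access = (self) => memory[self].value;     // self is the address of a B
const C_Access = (self) => memory[self + 2].value; // self is the start of a C

// Adjustor thunk placed in C's dispatch table for B: convert the address of
// the B part into the address of the whole C object, then call C's method.
const C_Access_thunk = (bAddr) => C_Access(bAddr - B_IN_C);

const vtableB = { Access: B_Access };            // B's own dispatch table
const vtableC_forB = { Access: C_Access_thunk }; // C's table for B references

// The caller sees only a B reference and never adjusts the address itself.
function use(bAddr) {
  return memory[bAddr].vtable.Access(bAddr);
}

memory.push({ vtable: vtableB, value: 7 });      // address 0: a plain B
memory.push({ vtable: null, value: 0 });         // address 1: C's A part (table unused here)
memory.push({ vtable: vtableC_forB, value: 0 }); // address 2: C's B part
memory.push({ vtable: null, value: 42 });        // address 3: C's own field

console.log(use(0));          // 7, B's implementation
console.log(use(1 + B_IN_C)); // 42, C's implementation reached via the thunk
```

Because the adjustment lives in the thunk, `use` contains no offset arithmetic, and B's dispatch table holds no offsets.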
Numerical calculations requiring evaluations at multiple points
Routines for computations such as integration need to calculate an expression at multiple points. Call by name was used for this purpose in languages that did not support closures or procedure parameters.
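A sketch of integration in the call-by-name style, assuming JavaScript: the routine repeatedly assigns a shared variable and re-evaluates a thunk for the integrand, much as a routine with a by-name parameter would re-evaluate the argument expression. `integrateByName`, the object standing in for the variable, and the midpoint rule are illustrative choices, not a specific historical routine.

```javascript
// Integration in the call-by-name style (hypothetical names). 'x' stands in
// for a by-name variable; 'integrand' is a thunk that reads x when called.
function integrateByName(x, a, b, n, integrand) {
  const h = (b - a) / n;
  let sum = 0;
  for (let k = 0; k < n; k++) {
    x.value = a + (k + 0.5) * h; // move the variable to the next midpoint
    sum += integrand();          // re-evaluate the expression at that point
  }
  return sum * h;
}

const x = { value: 0 };
// Integrate x^2 over [0, 1]; the exact answer is 1/3.
const area = integrateByName(x, 0, 1, 1000, () => x.value * x.value);
console.log(area.toFixed(4)); // "0.3333"
```

With closures or procedure parameters, the routine would instead take a one-argument function such as `(t) => t * t`, which removes the need for the shared variable.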
Interoperability
Thunks have been widely used to provide interoperability between software modules whose routines cannot call each other directly. This may occur because the routines have different calling conventions, run in different CPU modes or address spaces, or at least one runs in a virtual machine. A compiler (or other tool) can solve this problem by generating a thunk that automates the additional steps needed to call the target routine, whether that is transforming arguments, copying them to another location, or switching the CPU mode. A successful thunk minimizes the extra work the caller must do compared to a normal call.
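A toy model of an interoperability thunk, assuming JavaScript functions stand in for routines with different calling conventions. The "legacy" convention here (one array of arguments in reverse order, with the result written into an output record) is invented for illustration, as are `legacyScale` and `makeThunk`; real thunks of this kind work at the machine level.

```javascript
// Hypothetical legacy convention: arguments arrive as one array in reverse
// order, and the result is written into an output record, not returned.
function legacyScale(argsReversed, out) {
  const [factor, value] = argsReversed; // last argument comes first
  out.result = value * factor;
}

// Generated thunk for ordinary callers: repack the arguments, call the
// target, then translate the result back into a normal return value.
function makeThunk(target) {
  return (...args) => {
    const out = {};
    target(args.slice().reverse(), out);
    return out.result;
  };
}

// Ordinary callers pass (value, factor) and get a return value back.
const scale = makeThunk(legacyScale);
console.log(scale(21, 2)); // 42
```

The caller's side of the call looks like any other function call; all of the conversion work is concentrated in the thunk.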
Much of the literature on interoperability thunks relates to various Wintel platforms, including MS-DOS, OS/2, Windows and .NET, and to the transition from 16-bit to 32-bit memory addressing. As customers have migrated from one platform to another, thunks have been essential to support legacy software written for the older platforms.
The transition from 32-bit to 64-bit code on x86 also uses a form of thunking (WoW64). However, because the x86-64 address space is larger than the one available to 32-bit code, the old "generic thunk" mechanism could not be used to call 64-bit code from 32-bit code. The only case of 32-bit code calling 64-bit code is WoW64's own thunking of Windows API calls made by 32-bit programs.
Overlays and dynamic linking
On systems that lack automatic virtual memory hardware, thunks can implement a limited form of virtual memory known as overlays. With overlays, a developer divides a program's code into segments that can be loaded and unloaded independently, and identifies the entry points into each segment. A segment that calls into another segment must do so indirectly via a branch table. When a segment is in memory, its branch table entries jump into the segment. When a segment is unloaded, its entries are replaced with "reload thunks" that can reload it on demand.
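The branch-table scheme described above can be sketched in JavaScript, assuming an object of functions stands in for the branch table and a function returning entry points stands in for loading a segment from disk. The segment name, `loadSegment` and `unloadSegment` are hypothetical; a real overlay manager patches jump instructions in memory.

```javascript
// Toy overlay manager (hypothetical names). Calls between segments always go
// through branchTable; an unloaded segment's entries hold reload thunks.
const loadCount = { math: 0 };

// Stand-in for a segment image on disk: "loading" it yields its entry points.
const segmentImages = {
  math: () => ({ square: (n) => n * n }),
};

const branchTable = {};

function loadSegment(name) {
  loadCount[name]++;
  const entries = segmentImages[name]();
  for (const [entry, fn] of Object.entries(entries)) {
    branchTable[entry] = fn; // the entry now jumps straight into the segment
  }
}

function unloadSegment(name, entryNames) {
  for (const entry of entryNames) {
    // Reload thunk: bring the segment back, then forward the original call.
    branchTable[entry] = (...args) => {
      loadSegment(name);
      return branchTable[entry](...args);
    };
  }
}

unloadSegment("math", ["square"]);
console.log(branchTable.square(7)); // 49, this call triggers a reload
console.log(branchTable.square(8)); // 64, the segment is already resident
console.log(loadCount.math);        // 1
```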
Similarly, systems that dynamically link modules of a program together at run time can use thunks to connect the modules. Each module can call the others through a table of thunks that the linker fills in when it loads the module. This way the modules can interact without prior knowledge of where they are located in memory.
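A rough model of linking through a table of thunks, assuming JavaScript objects stand in for modules: the application calls only through its import table, and a hypothetical `link` function fills that table in at load time by looking up each symbol in the providing module's exports. The module format is invented for illustration; real dynamic linkers patch machine-level tables instead.

```javascript
// Exported routines of each loaded module (hypothetical module format).
const exportsByModule = {
  mathlib: { add: (a, b) => a + b, mul: (a, b) => a * b },
};

// The application names what it needs and calls only through 'imports'.
const appModule = {
  needs: { add: "mathlib", mul: "mathlib" },
  main: (imports) => imports.mul(imports.add(2, 3), 4),
};

// Stand-in for the linker: build the table of thunks when the module loads.
function link(module) {
  const imports = {};
  for (const [symbol, provider] of Object.entries(module.needs)) {
    const target = exportsByModule[provider]?.[symbol];
    if (typeof target !== "function") {
      throw new Error(`unresolved symbol ${provider}.${symbol}`);
    }
    imports[symbol] = (...args) => target(...args); // thunk forwarding the call
  }
  return imports;
}

console.log(appModule.main(link(appModule))); // 20
```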
See also
Thunk technologies
* DOS Protected Mode Interface (DPMI)
* DOS Protected Mode Services (DPMS)
* J/Direct
* Microsoft Layer for Unicode
* Platform Invocation Services
* Win32s
* Windows on Windows
* WoW64
* libffi
Related concepts
* Anonymous function
* Futures and promises
* Remote procedure call
* Shim (computing)
* Trampoline (computing)
* Reducible expression