In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...
, an interpreter is a
computer program
A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components.
A computer program ...
that directly
executes instructions written in a
programming or
scripting language
A scripting language or script language is a programming language that is used to manipulate, customize, and automate the facilities of an existing system. Scripting languages are usually interpreted at runtime rather than compiled.
A scripting ...
, without requiring them previously to have been
compiled
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
into a
machine language
In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a very ...
program. An interpreter generally uses one of the following strategies for program execution:
#
Parse
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lat ...
the
source code
In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the w ...
and perform its behavior directly;
#
Translate
Translation is the communication of the meaning of a source-language text by means of an equivalent target-language text. The English language draws a terminological distinction (which does not exist in every language) between ''transl ...
source code into some efficient
intermediate representation
An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
or
object code
In computing, object code or object module is the product of a compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ...
and immediately execute that;
# Explicitly execute stored precompiled
bytecode
Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
made by a
compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
and matched with the interpreter
Virtual Machine
In computing, a virtual machine (VM) is the virtualization/ emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized h ...
.
Early versions of
Lisp programming language
Lisp (historically LISP) is a family of programming languages with a long history and a distinctive, fully parenthesized prefix notation.
Originally specified in 1960, Lisp is the second-oldest high-level programming language still in common u ...
and
minicomputer and microcomputer BASIC dialects would be examples of the first type.
Perl
Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offic ...
,
Raku,
Python
Python may refer to:
Snakes
* Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia
** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia
* Python (mythology), a mythical serpent
Computing
* Python (pro ...
,
MATLAB
MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementa ...
, and
Ruby
A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called ...
are examples of the second, while
UCSD Pascal
UCSD Pascal is a Pascal programming language system that runs on the UCSD p-System, a portable, highly machine-independent operating system. UCSD Pascal was first released in 1977. It was developed at the University of California, San Diego (U ...
is an example of the third type. Source programs are compiled ahead of time and stored as machine independent code, which is then
linked at run-time and executed by an interpreter and/or compiler (for
JIT
Jit (also known as jiti, jit-jive and the Harare beat) is a style of popular Zimbabwean dance music. It features a swift rhythm played on drums and accompanied by a guitar. Jit evolved out many diverse influences, including domestic chimurenga, ...
systems). Some systems, such as
Smalltalk and contemporary versions of
BASIC and
Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mos ...
may also combine two and three. Interpreters of various types have also been constructed for many languages traditionally associated with compilation, such as
Algol
ALGOL (; short for "Algorithmic Language") is a family of imperative computer programming languages originally developed in 1958. ALGOL heavily influenced many other languages and was the standard method for algorithm description used by the ...
,
Fortran,
Cobol,
C and
C++
C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
.
While interpretation and compilation are the two main means by which programming languages are implemented, they are not mutually exclusive, as most interpreting systems also perform some translation work, just like compilers. The terms "
interpreted language
In computer science, an interpreter is a computer program that directly executes instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. An interprete ...
" or "
compiled language" signify that the canonical implementation of that language is an interpreter or a compiler, respectively. A
high-level language
In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language ''elements'', be easier to use, ...
is ideally an
abstraction independent of particular implementations.
History
Interpreters were used as early as 1952 to ease programming within the limitations of computers at the time (e.g. a shortage of program storage space, or no native support for floating point numbers). Interpreters were also used to translate between low-level machine languages, allowing code to be written for machines that were still under construction and tested on computers that already existed. The first interpreted high-level language was
Lisp. Lisp was first implemented in 1958 by
Steve Russell on an
IBM 704 computer. Russell had read
John McCarthy's paper, and realized (to McCarthy's surprise) that the Lisp ''eval'' function could be implemented in machine code. The result was a working Lisp interpreter which could be used to run Lisp programs, or more properly, "evaluate Lisp expressions".
General operation
An interpreter usually consists of a set of known
commands it can
execute
Execute, in capital punishment, is to put someone to death.
Execute may also refer to:
* Execution (computing), the running of a computer program
* ''Execute'' (album), a 2001 Garage hip-hop album by Oxide & Neutrino
* USS ''Execute'' (AM-232), a ...
, and a list of these commands in the order a programmer wishes to execute them. Each command (also known as an
Instruction) contains the data the programmer wants to mutate, and information on how to mutate the data. For example, an interpreter might read
ADD Wikipedia_Users, 5
and ''interpret'' it as a request to add five to the
Wikipedia_Users
variable
Variable may refer to:
* Variable (computer science), a symbolic name associated with a value and whose associated value may be changed
* Variable (mathematics), a symbol that represents a quantity in a mathematical expression, as used in many ...
.
Interpreters have a wide variety of instructions which are specialized to perform different tasks, but you will commonly find interpreter instructions for basic
mathematical operations
In mathematics, an operation is a function which takes zero or more input values (also called "''operands''" or "arguments") to a well-defined output value. The number of operands is the arity of the operation.
The most commonly studied operat ...
,
branching, and
memory management
Memory management is a form of resource management applied to computer memory. The essential requirement of memory management is to provide ways to dynamically allocate portions of memory to programs at their request, and free it for reuse when ...
, making most interpreters
Turing complete
Alan Mathison Turing (; 23 June 1912 – 7 June 1954) was an English mathematician, computer scientist, logician, cryptanalyst, philosopher, and theoretical biologist. Turing was highly influential in the development of theoretical co ...
. Many interpreters are also closely integrated with a
garbage collector
A waste collector, also known as a garbageman, garbage collector, trashman (in the US), binman or (rarely) dustman (in the UK), is a person employed by a public or private enterprise to collect and dispose of municipal solid waste (refuse) and r ...
and
debugger
A debugger or debugging tool is a computer program used to test and debug other programs (the "target" program). The main use of a debugger is to run the target program under controlled conditions that permit the programmer to track its executi ...
.
Compilers versus interpreters
Programs written in a
high-level language
In computer science, a high-level programming language is a programming language with strong abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language ''elements'', be easier to use, ...
are either directly executed by some kind of interpreter or converted into
machine code
In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
by a compiler (and
assembler
Assembler may refer to:
Arts and media
* Nobukazu Takemura, avant-garde electronic musician, stage name Assembler
* Assemblers, a fictional race in the ''Star Wars'' universe
* Assemblers, an alternative name of the superhero group Champions of ...
and
linker
Linker or linkers may refer to:
Computing
* Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...
) for the
CPU to execute.
While compilers (and assemblers) generally produce machine code directly executable by computer hardware, they can often (optionally) produce an intermediate form called
object code
In computing, object code or object module is the product of a compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ...
. This is basically the same machine specific code but augmented with a
symbol table
In computer science, a symbol table is a data structure used by a language translator such as a compiler or interpreter, where each identifier (or symbols), constants, procedures and functions in a program's source code is associated with info ...
with names and tags to make executable blocks (or modules) identifiable and relocatable. Compiled programs will typically use building blocks (functions) kept in a library of such object code modules. A
linker
Linker or linkers may refer to:
Computing
* Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...
is used to combine (pre-made) library files with the object file(s) of the application to form a single executable file. The object files that are used to generate an executable file are thus often produced at different times, and sometimes even by different languages (capable of generating the same object format).
A simple interpreter written in a low-level language (e.g.
assembly
Assembly may refer to:
Organisations and meetings
* Deliberative assembly, a gathering of members who use parliamentary procedure for making decisions
* General assembly, an official meeting of the members of an organization or of their representa ...
) may have similar machine code blocks implementing functions of the high-level language stored, and executed when a function's entry in a look up table points to that code. However, an interpreter written in a high-level language typically uses another approach, such as generating and then walking a
parse tree, or by generating and executing intermediate software-defined instructions, or both.
Thus, both compilers and interpreters generally turn source code (text files) into tokens, both may (or may not) generate a parse tree, and both may generate immediate instructions (for a
stack machine,
quadruple code, or by other means). The basic difference is that a compiler system, including a (built in or separate) linker, generates a stand-alone ''machine code'' program, while an interpreter system instead ''performs'' the actions described by the high-level program.
A compiler can thus make almost all the conversions from source code semantics to the machine level once and for all (i.e. until the program has to be changed) while an interpreter has to do ''some'' of this conversion work every time a statement or function is executed. However, in an efficient interpreter, much of the translation work (including analysis of types, and similar) is factored out and done only the first time a program, module, function, or even statement, is run, thus quite akin to how a compiler works. However, a compiled program still runs much faster, under most circumstances, in part because compilers are designed to optimize code, and may be given ample time for this. This is especially true for simpler high-level languages without (many) dynamic data structures, checks, or
type checking
In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a type to every "term" (a word, phrase, or other set of symbols). Usually the terms are various constructs of a computer progra ...
.
In traditional compilation, the executable output of the
linkers (.exe files or .dll files or a library, see picture) is typically relocatable when run under a general operating system, much like the object code modules are but with the difference that this relocation is done dynamically at run time, i.e. when the program is loaded for execution. On the other hand, compiled and linked programs for small
embedded systems
An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' ...
are typically statically allocated, often hard coded in a
NOR flash memory, as there is often no secondary storage and no operating system in this sense.
Historically, most interpreter systems have had a self-contained editor built in. This is becoming more common also for compilers (then often called an
IDE), although some programmers prefer to use an editor of their choice and run the compiler, linker and other tools manually. Historically, compilers predate interpreters because hardware at that time could not support both the interpreter and interpreted code and the typical batch environment of the time limited the advantages of interpretation.
Development cycle
During the
software development cycle, programmers make frequent changes to source code. When using a compiler, each time a change is made to the source code, they must wait for the compiler to translate the altered source files and
link all of the binary code files together before the program can be executed. The larger the program, the longer the wait. By contrast, a programmer using an interpreter does a lot less waiting, as the interpreter usually just needs to translate the code being worked on to an intermediate representation (or not translate it at all), thus requiring much less time before the changes can be tested. Effects are evident upon saving the source code and reloading the program. Compiled code is generally less readily debugged as editing, compiling, and linking are sequential processes that have to be conducted in the proper sequence with a proper set of commands. For this reason, many compilers also have an executive aid, known as a
Makefile
In software development, Make is a build automation tool that automatically builds executable programs and libraries from source code by reading files called ''Makefiles'' which specify how to derive the target program. Though integrated devel ...
and program. The Makefile lists compiler and linker command lines and program source code files, but might take a simple command line menu input (e.g. "Make 3") which selects the third group (set) of instructions then issues the commands to the compiler, and linker feeding the specified source code files.
Distribution
A
compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
converts source code into binary instruction for a specific processor's architecture, thus making it less
portable
Portable may refer to:
General
* Portable building, a manufactured structure that is built off site and moved in upon completion of site and utility work
* Portable classroom, a temporary building installed on the grounds of a school to provide ...
. This conversion is made just once, on the developer's environment, and after that the same binary can be distributed to the user's machines where it can be executed without further translation. A
cross compiler
A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running. For example, a compiler that runs on a PC but generates code that runs on an Android smartphone is a cross c ...
can generate binary code for the user machine even if it has a different processor than the machine where the code is compiled.
An interpreted program can be distributed as source code. It needs to be translated in each final machine, which takes more time but makes the program distribution independent of the machine's architecture. However, the portability of interpreted source code is dependent on the target machine actually having a suitable interpreter. If the interpreter needs to be supplied along with the source, the overall installation process is more complex than delivery of a monolithic executable since the interpreter itself is part of what need to be installed.
The fact that interpreted code can easily be read and copied by humans can be of concern from the point of view of
copyright
A copyright is a type of intellectual property that gives its owner the exclusive right to copy, distribute, adapt, display, and perform a creative work, usually for a limited time. The creative work may be in a literary, artistic, educatio ...
. However, various systems of
encryption
In cryptography, encryption is the process of encoding information. This process converts the original representation of the information, known as plaintext, into an alternative form known as ciphertext. Ideally, only authorized parties can de ...
and
obfuscation exist. Delivery of intermediate code, such as bytecode, has a similar effect to obfuscation, but bytecode could be decoded with a
decompiler
A decompiler is a computer program that translates an executable file to a high-level source file which can be recompiled successfully. It does therefore the opposite of a typical compiler, which translates a high-level language to a low-level l ...
or
disassembler
A disassembler is a computer program that translates machine language into assembly language—the inverse operation to that of an assembler. A disassembler differs from a decompiler, which targets a high-level language rather than an assembly l ...
.
Efficiency
The main disadvantage of interpreters is that an interpreted program typically runs slower than if it had been
compiled
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
. The difference in speeds could be tiny or great; often an order of magnitude and sometimes more. It generally takes longer to run a program under an interpreter than to run the compiled code but it can take less time to interpret it than the total time required to compile and run it. This is especially important when prototyping and testing code when an edit-interpret-debug cycle can often be much shorter than an edit-compile-run-debug cycle.
Interpreting code is slower than running the compiled code because the interpreter must analyze each
statement in the program each time it is executed and then perform the desired action, whereas the compiled code just performs the action within a fixed context determined by the compilation. This
run-time analysis is known as "interpretive overhead". Access to variables is also slower in an interpreter because the mapping of identifiers to storage locations must be done repeatedly at run-time rather than at
compile time
In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled.
The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concep ...
.
There are various compromises between the
development speed
Development or developing may refer to:
Arts
*Development hell, when a project is stuck in development
*Filmmaking#Development, Filmmaking, development phase, including finance and budgeting
*Development (music), the process thematic material ...
when using an interpreter and the execution speed when using a compiler. Some systems (such as some
Lisps) allow interpreted and compiled code to call each other and to share variables. This means that once a routine has been tested and debugged under the interpreter it can be compiled and thus benefit from faster execution while other routines are being developed. Many interpreters do not execute the source code as it stands but convert it into some more compact internal form. Many
BASIC interpreters replace
keyword
Keyword may refer to:
Computing
* Keyword (Internet search), a word or phrase typically used by bloggers or online content creator to rank a web page on a particular topic
* Index term, a term used as a keyword to documents in an information syst ...
s with single
byte
The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
tokens which can be used to find the instruction in a
jump table
In computer programming, a branch table or jump table is a method of transferring program control ( branching) to another part of a program (or a different program that may have been dynamically loaded) using a table of branch or jump instruction ...
. A few interpreters, such as the
PBASIC
PBASIC is a microcontroller-based version of BASIC created by Parallax, Inc. in 1992.
PBASIC was created to bring ease of use to the microcontroller and embedded processor world. It is used for writing code for the BASIC Stamp microcontrollers ...
interpreter, achieve even higher levels of program compaction by using a bit-oriented rather than a byte-oriented program memory structure, where commands tokens occupy perhaps 5 bits, nominally "16-bit" constants are stored in a
variable-length code
In coding theory a variable-length code is a code which maps source symbols to a ''variable'' number of bits.
Variable-length codes can allow sources to be compressed and decompressed with ''zero'' error (lossless data compression) and still b ...
requiring 3, 6, 10, or 18 bits, and address operands include a "bit offset". Many BASIC interpreters can store and read back their own tokenized internal representation.
An interpreter might well use the same
lexical analyzer
In computer science, lexical analysis, lexing or tokenization is the process of converting a sequence of characters (such as in a computer program or web page) into a sequence of ''lexical tokens'' ( strings with an assigned and thus identified ...
and
parser
Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lat ...
as the compiler and then interpret the resulting
abstract syntax tree
In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurr ...
. Example data type definitions for the latter, and a toy interpreter for syntax trees obtained from
C expressions are shown in the box.
Regression
Interpretation cannot be used as the sole method of execution: even though an interpreter can itself be interpreted and so on, a directly executed program is needed somewhere at the bottom of the stack because the code being interpreted is not, by definition, the same as the machine code that the CPU can execute.
Variations
Bytecode interpreters
There is a spectrum of possibilities between interpreting and compiling, depending on the amount of analysis performed before the program is executed. For example,
Emacs Lisp
Emacs Lisp is a dialect of the Lisp programming language used as a scripting language by Emacs (a text editor family most commonly associated with GNU Emacs and XEmacs). It is used for implementing most of the editing functionality built into Em ...
is compiled to
bytecode
Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
, which is a highly compressed and optimized representation of the Lisp source, but is not machine code (and therefore not tied to any particular hardware). This "compiled" code is then interpreted by a bytecode interpreter (itself written in
C). The compiled code in this case is machine code for a
virtual machine
In computing, a virtual machine (VM) is the virtualization/ emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized h ...
, which is implemented not in hardware, but in the bytecode interpreter. Such compiling interpreters are sometimes also called ''compreters''.
In a bytecode interpreter each instruction starts with a byte, and therefore bytecode interpreters have up to 256 instructions, although not all may be used. Some bytecodes may take multiple bytes, and may be arbitrarily complicated.
Control table
Control tables are tables that control the control flow or play a major part in program control. There are no rigid rules about the structure or content of a control table—its qualifying attribute is its ability to direct control flow in some w ...
s - that do not necessarily ever need to pass through a compiling phase - dictate appropriate algorithmic
control flow
In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an ''im ...
via customized interpreters in similar fashion to bytecode interpreters.
Threaded code interpreters
Threaded code interpreters are similar to bytecode interpreters but instead of bytes they use pointers. Each "instruction" is a word that points to a function or an instruction sequence, possibly followed by a parameter. The threaded code interpreter either loops fetching instructions and calling the functions they point to, or fetches the first instruction and jumps to it, and every instruction sequence ends with a fetch and jump to the next instruction. Unlike bytecode there is no effective limit on the number of different instructions other than available memory and address space. The classic example of threaded code is the
Forth
Forth or FORTH may refer to:
Arts and entertainment
* ''forth'' magazine, an Internet magazine
* ''Forth'' (album), by The Verve, 2008
* ''Forth'', a 2011 album by Proto-Kaw
* Radio Forth, a group of independent local radio stations in Scotla ...
code used in
Open Firmware
Open Firmware is a standard defining the interfaces of a computer firmware system, formerly endorsed by the Institute of Electrical and Electronics Engineers (IEEE). It originated at Sun Microsystems, where it was known as OpenBoot, and has bee ...
systems: the source language is compiled into "F code" (a bytecode), which is then interpreted by a
virtual machine
In computing, a virtual machine (VM) is the virtualization/ emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized h ...
.
Abstract syntax tree interpreters
In the spectrum between interpreting and compiling, another approach is to transform the source code into an optimized abstract syntax tree (AST), then execute the program following this tree structure, or use it to generate native code
just-in-time. In this approach, each sentence needs to be parsed just once. As an advantage over bytecode, the AST keeps the global program structure and relations between statements (which is lost in a bytecode representation), and when compressed provides a more compact representation.
Thus, using AST has been proposed as a better intermediate format for just-in-time compilers than bytecode. Also, it allows the system to perform better analysis during runtime.
However, for interpreters, an AST causes more overhead than a bytecode interpreter, because of nodes related to syntax performing no useful work, of a less sequential representation (requiring traversal of more pointers) and of overhead visiting the tree.
Just-in-time compilation
Further blurring the distinction between interpreters, bytecode interpreters and compilation is just-in-time (JIT) compilation, a technique in which the intermediate representation is compiled to native
machine code
In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
at runtime. This confers the efficiency of running native code, at the cost of startup time and increased memory use when the bytecode or AST is first compiled. The earliest published JIT compiler is generally attributed to work on
LISP by
John McCarthy in 1960.
Adaptive optimization
Adaptive optimization is a technique in computer science that performs dynamic recompilation of portions of a program based on the current execution profile. With a simple implementation, an adaptive optimizer may simply make a trade-off between ...
is a complementary technique in which the interpreter profiles the running program and compiles its most frequently executed parts into native code. The latter technique is a few decades old, appearing in languages such as
Smalltalk in the 1980s.
Just-in-time compilation has gained mainstream attention amongst language implementers in recent years, with
Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mos ...
, the
.NET Framework
The .NET Framework (pronounced as "''dot net"'') is a proprietary software framework developed by Microsoft that runs primarily on Microsoft Windows. It was the predominant implementation of the Common Language Infrastructure (CLI) until bein ...
, most modern
JavaScript
JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of websites use JavaScript on the client side for webpage behavior, of ...
implementations, and
Matlab
MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementa ...
now including JIT compilers.
Template Interpreter
Making the distinction between compilers and interpreters yet again even more vague is a special interpreter design known as a template interpreter. Rather than implement the execution of code by virtue of a large switch statement containing every possible bytecode, while operating on a software stack or a tree walk, a template interpreter maintains a large array of bytecode (or any efficient intermediate representation) mapped directly to corresponding native machine instructions that can be executed on the host hardware as key value pairs (or in more efficient designs, direct addresses to the native instructions),
known as a "Template". When the particular code segment is executed the interpreter simply loads or jumps to the opcode mapping in the template and directly runs it on the hardware. Due to its design, the template interpreter very strongly resembles a just-in-time compiler rather than a traditional interpreter, however it is technically not a JIT due to the fact that it merely translates code from the language into native calls one opcode at a time rather than creating optimized sequences of CPU executable instructions from the entire code segment. Due to the interpreter's simple design of simply passing calls directly to the hardware rather than implementing them directly, it is much faster than every other type, even bytecode interpreters, and to an extent less prone to bugs, but as a tradeoff is more difficult to maintain due to the interpreter having to support translation to multiple different architectures instead of a platform independent virtual machine/stack. To date, the only template interpreter implementations of widely known languages to exist are the interpreter within Java's official reference implementation, the Sun HotSpot Java Virtual Machine,
and the Ignition Interpreter in the Google V8 javascript execution engine.
Self-interpreter
A self-interpreter is a
programming language
A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language.
The description of a programming ...
interpreter written in a programming language which can interpret itself; an example is a
BASIC interpreter written in BASIC. Self-interpreters are related to
self-hosting compilers.
If no
compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
exists for the language to be interpreted, creating a self-interpreter requires the implementation of the language in a host language (which may be another programming language or
assembler
Assembler may refer to:
Arts and media
* Nobukazu Takemura, avant-garde electronic musician, stage name Assembler
* Assemblers, a fictional race in the ''Star Wars'' universe
* Assemblers, an alternative name of the superhero group Champions of ...
). By having a first interpreter such as this, the system is
bootstrapped and new versions of the interpreter can be developed in the language itself. It was in this way that
Donald Knuth
Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
developed the TANGLE interpreter for the language
WEB of the industrial standard
TeX
Tex may refer to:
People and fictional characters
* Tex (nickname), a list of people and fictional characters with the nickname
* Joe Tex (1933–1982), stage name of American soul singer Joseph Arrington Jr.
Entertainment
* ''Tex'', the Italian ...
typesetting system
Typesetting is the composition of text by means of arranging physical ''type'' (or ''sort'') in mechanical systems or ''glyphs'' in digital systems representing ''characters'' (letters and other symbols).Dictionary.com Unabridged. Random Ho ...
.
Defining a computer language is usually done in relation to an abstract machine (so-called
operational semantics
Operational semantics is a category of formal programming language semantics in which certain desired properties of a program, such as correctness, safety or security, are verified by constructing proofs from logical statements about its execut ...
) or as a mathematical function (
denotational semantics). A language may also be defined by an interpreter in which the semantics of the host language is given. The definition of a language by a self-interpreter is not well-founded (it cannot define a language), but a self-interpreter tells a reader about the expressiveness and elegance of a language. It also enables the interpreter to interpret its source code, the first step towards reflective interpreting.
An important design dimension in the implementation of a self-interpreter is whether a feature of the interpreted language is implemented with the same feature in the interpreter's host language. An example is whether a
closure in a
Lisp-like language is implemented using closures in the interpreter language or implemented "manually" with a data structure explicitly storing the environment. The more features implemented by the same feature in the host language, the less control the programmer of the interpreter has; a different behavior for dealing with number overflows cannot be realized if the arithmetic operations are delegated to corresponding operations in the host language.
Some languages such as
Lisp and
Prolog
Prolog is a logic programming language associated with artificial intelligence and computational linguistics.
Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is intended primarily ...
have elegant self-interpreters. Much research on self-interpreters (particularly reflective interpreters) has been conducted in the
Scheme programming language
Scheme is a dialect of the Lisp family of programming languages. Scheme was created during the 1970s at the MIT AI Lab and released by its developers, Guy L. Steele and Gerald Jay Sussman, via a series of memos now known as the Lambda Papers. It ...
, a dialect of Lisp. In general, however, any
Turing-complete
In computability theory, a system of data-manipulation rules (such as a computer's instruction set, a programming language, or a cellular automaton) is said to be Turing-complete or computationally universal if it can be used to simulate any ...
language allows writing of its own interpreter. Lisp is such a language, because Lisp programs are lists of symbols and other lists. XSLT is such a language, because XSLT programs are written in XML. A sub-domain of
metaprogramming
Metaprogramming is a programming technique in which computer programs have the ability to treat other programs as their data. It means that a program can be designed to read, generate, analyze or transform other programs, and even modify itself ...
is the writing of
domain-specific languages (DSLs).
Clive Gifford introduced
a measure quality of self-interpreter (the eigenratio), the limit of the ratio between computer time spent running a stack of ''N'' self-interpreters and time spent to run a stack of self-interpreters as ''N'' goes to infinity. This value does not depend on the program being run.
The book ''
Structure and Interpretation of Computer Programs
''Structure and Interpretation of Computer Programs'' (''SICP'') is a computer science textbook by Massachusetts Institute of Technology professors Harold Abelson and Gerald Jay Sussman with Julie Sussman. It is known as the "Wizard Book" in ha ...
'' presents examples of
meta-circular interpretation for Scheme and its dialects. Other examples of languages with a self-interpreter are
Forth
Forth or FORTH may refer to:
Arts and entertainment
* ''forth'' magazine, an Internet magazine
* ''Forth'' (album), by The Verve, 2008
* ''Forth'', a 2011 album by Proto-Kaw
* Radio Forth, a group of independent local radio stations in Scotla ...
and
Pascal.
Microcode
Microcode is a very commonly used technique "that imposes an interpreter between the hardware and the architectural level of a computer".
As such, the microcode is a layer of hardware-level instructions that implement higher-level
machine code
In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
instructions or internal
state machine sequencing in many
digital processing
Digital data, in information theory and information systems, is information represented as a string of discrete symbols each of which can take on one of only a finite number of values from some alphabet, such as letters or digits. An example is ...
elements. Microcode is used in general-purpose
central processing unit
A central processing unit (CPU), also called a central processor, main processor or just Processor (computing), processor, is the electronic circuitry that executes Instruction (computing), instructions comprising a computer program. The CPU per ...
s, as well as in more specialized processors such as
microcontrollers,
digital signal processors,
channel controllers,
disk controller
{{unreferenced, date=May 2010
The disk controller is the controller circuit which enables the CPU to communicate with a hard disk, floppy disk or other kind of disk drive. It also provides an interface between the disk drive and the bus conne ...
s,
network interface controller
A network interface controller (NIC, also known as a network interface card, network adapter, LAN adapter or physical network interface, and by similar terms) is a computer hardware component that connects a computer to a computer network.
Ear ...
s,
network processors,
graphics processing unit
A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, m ...
s, and in other hardware.
Microcode typically resides in special high-speed memory and translates machine instructions,
state machine data or other input into sequences of detailed circuit-level operations. It separates the machine instructions from the underlying
electronics
The field of electronics is a branch of physics and electrical engineering that deals with the emission, behaviour and effects of electrons using electronic devices. Electronics uses active devices to control electron flow by amplification ...
so that instructions can be designed and altered more freely. It also facilitates the building of complex multi-step instructions, while reducing the complexity of computer circuits. Writing microcode is often called microprogramming and the microcode in a particular processor implementation is sometimes called a microprogram.
More extensive microcoding allows small and simple
microarchitectures to
emulate
Emulate, Inc. (Emulate) is a biotechnology company that commercialized Organs-on-Chips technology—a human cell-based technology that recreates organ-level function to model organs in healthy and diseased states. The technology has applications ...
more powerful architectures with wider
word length
In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized datum handled as a unit by the instruction set or the hardware of the processor. The number of bits or digits in a word (the ''word s ...
, more
execution unit
In computer engineering, an execution unit (E-unit or EU) is a part of the central processing unit (CPU) that performs the operations and calculations as instructed by the computer program. It may have its own internal control sequence unit (not ...
s and so on, which is a relatively simple way to achieve software compatibility between different products in a processor family.
Computer processor
Even a non microcoding computer processor itself can be considered to be a parsing immediate execution interpreter that is written in a general purpose hardware description language such as
VHDL
The VHSIC Hardware Description Language (VHDL) is a hardware description language (HDL) that can model the behavior and structure of digital systems at multiple levels of abstraction, ranging from the system level down to that of logic gate ...
to create a system that parses the machine code instructions and immediately executes them.
Applications
* Interpreters are frequently used to execute
command language
A command language is a language for job control in computing. It is a domain-specific and interpreted language; common examples of a command language are shell or batch programming languages.
These languages can be used directly at the comman ...
s, and
glue language
A scripting language or script language is a programming language that is used to manipulate, customize, and automate the facilities of an existing system. Scripting languages are usually interpreted at runtime rather than compiled.
A scripting ...
s since each operator executed in command language is usually an invocation of a complex routine such as an editor or compiler.
*
Self-modifying code
In computer science, self-modifying code (SMC) is code that alters its own instructions while it is executing – usually to reduce the instruction path length and improve performance or simply to reduce otherwise repetitively similar code, ...
can easily be implemented in an interpreted language. This relates to the origins of interpretation in Lisp and
artificial intelligence
Artificial intelligence (AI) is intelligence—perceiving, synthesizing, and inferring information—demonstrated by machines, as opposed to intelligence displayed by animals and humans. Example tasks in which this is done include speech r ...
research.
*
Virtualization
In computing, virtualization or virtualisation (sometimes abbreviated v12n, a numeronym) is the act of creating a virtual (rather than actual) version of something at the same abstraction level, including virtual computer hardware platforms, stor ...
. Machine code intended for a hardware architecture can be run using a
virtual machine
In computing, a virtual machine (VM) is the virtualization/ emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized h ...
. This is often used when the intended architecture is unavailable, or among other uses, for running multiple copies.
*
Sandboxing: While some types of sandboxes rely on operating system protections, an interpreter or virtual machine is often used. The actual hardware architecture and the originally intended hardware architecture may or may not be the same. This may seem pointless, except that sandboxes are not compelled to actually execute all the instructions the source code it is processing. In particular, it can refuse to execute code that violates any
security" \n\n\nsecurity.txt is a proposed standard for websites' security information that is meant to allow security researchers to easily report security vulnerabilities. The standard prescribes a text file called \"security.txt\" in the well known locat ...
constraints it is operating under.
*
Emulator
In computing, an emulator is hardware or software that enables one computer system (called the ''host'') to behave like another computer system (called the ''guest''). An emulator typically enables the host system to run software or use pe ...
s for running computer software written for obsolete and unavailable hardware on more modern equipment.
See also
*
BASIC interpreter
A BASIC interpreter is an interpreter that enables users to enter and run programs in the BASIC language and was, for the first part of the microcomputer era, the default application that computers would launch. Users were expected to use the BAS ...
*
Command-line interpreter
A command-line interpreter or command-line processor uses a command-line interface (CLI) to receive commands from a user in the form of lines of text. This provides a means of setting parameters for the environment, invoking executables and pro ...
*
Compiled language
*
Dynamic compilation
Dynamic compilation is a process used by some programming language implementations to gain performance during program execution. Although the technique originated in Smalltalk,Peter L. Deutsch and Alan Schiffman. "Efficient Implementation of the ...
*
Homoiconicity
In computer programming, homoiconicity (from the Greek words ''homo-'' meaning "the same" and ''icon'' meaning "representation") is a property of some programming languages. A language is homoiconic if a program written in it can be manipulated as ...
*
Meta-circular evaluator In computing, a meta-circular evaluator (MCE) or meta-circular interpreter (MCI) is an interpreter which defines each feature of the interpreted language using a similar facility of the interpreter's host language. For example, interpreting a lambd ...
*
Partial evaluation
References
External links
IBM Card Interpreterspage at Columbia University
Theoretical Foundations For Practical 'Totally Functional Programming'(Chapter 7 especially) Doctoral dissertation tackling the problem of formalising what is an interpreter
Short animationexplaining the key conceptual difference between interpreters and compilers. Archived a
ghostarchive.orgon May 9, 2022.
{{Authority control
*
Programming language implementation