intermediate language
   HOME

TheInfoList



OR:

An intermediate representation (IR) is the
data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
or code used internally by a
compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
or
virtual machine In computing, a virtual machine (VM) is the virtualization or emulator, emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve ...
to represent
source code In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer. Since a computer, at base, only ...
. An IR is designed to be conducive to further processing, such as
optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criteria, from some set of available alternatives. It is generally divided into two subfiel ...
and
translation Translation is the communication of the semantics, meaning of a #Source and target languages, source-language text by means of an Dynamic and formal equivalence, equivalent #Source and target languages, target-language text. The English la ...
. A "good" IR must be ''accurate'' – capable of representing the source code without loss of information – and ''independent'' of any particular source or target language. An IR may take one of several forms: an in-memory
data structure In computer science, a data structure is a data organization and storage format that is usually chosen for Efficiency, efficient Data access, access to data. More precisely, a data structure is a collection of data values, the relationships amo ...
, or a special
tuple In mathematics, a tuple is a finite sequence or ''ordered list'' of numbers or, more generally, mathematical objects, which are called the ''elements'' of the tuple. An -tuple is a tuple of elements, where is a non-negative integer. There is o ...
- or stack-based
code In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communicati ...
readable by the program. In the latter case it is also called an ''intermediate language''. A canonical example is found in most modern compilers. For example, the CPython interpreter transforms the linear human-readable text representing a program into an intermediate graph structure that allows flow analysis and re-arrangement before execution. Use of an intermediate representation such as this allows compiler systems like the
GNU Compiler Collection The GNU Compiler Collection (GCC) is a collection of compilers from the GNU Project that support various programming languages, Computer architecture, hardware architectures, and operating systems. The Free Software Foundation (FSF) distributes ...
and
LLVM LLVM, also called LLVM Core, is a target-independent optimizer and code generator. It can be used to develop a Compiler#Front end, frontend for any programming language and a Compiler#Back end, backend for any instruction set architecture. LLVM i ...
to be used by many different source languages to generate code for many different target architectures.


Intermediate language

An intermediate language is the language of an abstract machine designed to aid in the analysis of
computer program A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. It is one component of software, which also includes software documentation, documentation and other intangibl ...
s. The term comes from their use in
compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
s, where the source code of a program is translated into a form more suitable for code-improving transformations before being used to generate object or
machine A machine is a physical system that uses power to apply forces and control movement to perform an action. The term is commonly applied to artificial devices, such as those employing engines or motors, but also to natural biological macromol ...
code for a target machine. The design of an intermediate language typically differs from that of a practical machine language in three fundamental ways: * Each instruction represents exactly one fundamental operation; e.g. "shift-add"
addressing mode Addressing modes are an aspect of the instruction set architecture in most central processing unit (CPU) designs. The various addressing modes that are defined in a given instruction set architecture define how the machine language instructions ...
s common in
microprocessors A microprocessor is a computer processor for which the data processing logic and control is included on a single integrated circuit (IC), or a small number of ICs. The microprocessor contains the arithmetic, logic, and control circuitry r ...
are not present. *
Control flow In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an '' ...
information may not be included in the instruction set. * The number of
processor register A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-onl ...
s available may be large, even limitless. A popular format for intermediate languages is
three-address code In computer science, three-address code (often abbreviated to TAC or 3AC) is an intermediate language, intermediate code used by optimizing compilers to aid in the implementation of code-improving transformations. Each TAC instruction has at most t ...
. The term is also used to refer to languages used as intermediates by some
high-level programming language A high-level programming language is a programming language with strong Abstraction (computer science), abstraction from the details of the computer. In contrast to low-level programming languages, it may use natural language ''elements'', be ea ...
s which do not output object or machine code themselves, but output the intermediate language only. This intermediate language is submitted to a compiler for such language, which then outputs finished object or machine code. This is usually done to ease the process of
optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criteria, from some set of available alternatives. It is generally divided into two subfiel ...
or to increase portability by using an intermediate language that has compilers for many processors and
operating systems An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ...
, such as C. Languages used for this fall in complexity between high-level languages and low-level languages, such as
assembly language In computing, assembly language (alternatively assembler language or symbolic machine code), often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence bet ...
s.


Languages

Though not explicitly designed as an intermediate language, C's nature as an abstraction of assembly and its ubiquity as the ''de facto'' system language in
Unix-like A Unix-like (sometimes referred to as UN*X, *nix or *NIX) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Uni ...
and other operating systems has made it a popular intermediate language: Eiffel, Sather, Esterel, some
dialect A dialect is a Variety (linguistics), variety of language spoken by a particular group of people. This may include dominant and standard language, standardized varieties as well as Vernacular language, vernacular, unwritten, or non-standardize ...
s of
Lisp Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation. Originally specified in the late 1950s, ...
( Lush, Gambit),
Squeak Squeak is an object-oriented, class-based, and reflective programming language. It was derived from Smalltalk-80 by a group that included some of Smalltalk-80's original developers, initially at Apple Computer, then at Walt Disney Imaginee ...
's Smalltalk-subset Slang, Nim,
Cython Cython () is a superset of the programming language Python, which allows developers to write Python code (with optional, C-inspired syntax extensions) that yields performance comparable to that of C. Cython is a compiled language that is ty ...
, Seed7,
SystemTap In computing, SystemTap () is a scripting language and tool for dynamically instrumenting running production Linux-based operating systems. System administrators can use SystemTap to extract, filter and summarize data in order to enable diagnosi ...
, Vala, V, and others make use of C as an intermediate language. Variants of C have been designed to provide C's features as a portable
assembly language In computing, assembly language (alternatively assembler language or symbolic machine code), often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence bet ...
, including C-- and the C Intermediate Language. Any language targeting a
virtual machine In computing, a virtual machine (VM) is the virtualization or emulator, emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve ...
or p-code machine can be considered an intermediate language: *
Java bytecode Java bytecode is the instruction set of the Java virtual machine (JVM), the language to which Java and other JVM-compatible source code is compiled. Each instruction is represented by a single byte, hence the name bytecode, making it a compact ...
* Microsoft's Common Intermediate Language is an intermediate language designed to be shared by all compilers for the .NET Framework, before static or dynamic compilation to machine code. * While most intermediate languages are designed to support statically typed languages, the Parrot intermediate representation is designed to support dynamically typed languages—initially Perl and Python. * TIMI is used by compilers on the
IBM i IBM i (the ''i'' standing for ''integrated'') is an operating system developed by IBM for IBM Power Systems. It was originally released in 1988 as OS/400, as the sole operating system of the IBM AS/400 line of systems. It was renamed to i5/OS in 2 ...
platform. * O-code for BCPL *
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementat ...
precompiled code * Microsoft P-Code * Pascal p-code The
GNU Compiler Collection The GNU Compiler Collection (GCC) is a collection of compilers from the GNU Project that support various programming languages, Computer architecture, hardware architectures, and operating systems. The Free Software Foundation (FSF) distributes ...
(GCC) uses several intermediate languages internally to simplify portability and cross-compilation. Among these languages are * the historical Register Transfer Language (RTL) * the tree language GENERIC * the SSA-based
GIMPLE The GNU Compiler Collection (GCC) is a collection of compilers from the GNU Project that support various programming languages, hardware architectures, and operating systems. The Free Software Foundation (FSF) distributes GCC as free software ...
. (Lower-level than GENERIC; input for most optimizers; has a compact "bytecode" notation.) GCC supports generating these IRs, as a final target: * HSA Intermediate Layer * LLVM Intermediate Representation (converted from GIMPLE in the now-defunct llvm-gcc which uses LLVM optimizers and codegen) The
LLVM LLVM, also called LLVM Core, is a target-independent optimizer and code generator. It can be used to develop a Compiler#Front end, frontend for any programming language and a Compiler#Back end, backend for any instruction set architecture. LLVM i ...
compiler framework is based on the LLVM IR intermediate language, of which the compact, binary serialized representation is also referred to as "bitcode" and has been productized by Apple. Like GIMPLE Bytecode, LLVM Bitcode is useful in link-time optimization. Like GCC, LLVM also targets some IRs meant for direct distribution, including Google's PNaCl IR and SPIR. A further development within LLVM is the use of ''Multi-Level Intermediate Representation'' ( MLIR) with the potential to generate code for different heterogeneous targets, and to combine the outputs of different compilers. The ILOC intermediate language is used in classes on compiler design as a simple target language."CISC 471 Compiler Design"
by Uli Kremer


Other

Static analysis tools often use an intermediate representation. For instance, Radare2 is a toolbox for binary files analysis and reverse-engineering. It uses the intermediate languages ESIL and REIL to analyze binary files.


See also

* BURS * Interlingual machine translation * Pivot language * Abstract syntax tree *
Bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normal ...
(Intermediate code) * Symbol table *
Source-to-source compiler A source-to-source translator, source-to-source compiler (S2S compiler), transcompiler, or transpiler is a type of translator that takes the source code of a program written in a programming language as its input and produces an equivalent so ...
* Graph rewriting and term rewriting * UNCOL


References


External links

* The Stanford SUIF Group {{DEFAULTSORT:Intermediate language Compiler construction Programming language classification