In
computing
Computing is any goal-oriented activity requiring, benefiting from, or creating computer, computing machinery. It includes the study and experimentation of algorithmic processes, and the development of both computer hardware, hardware and softw ...
, binary translation is a form of
binary recompilation where sequences of
instructions are translated from a source
instruction set
In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, s ...
(ISA) to the target instruction set with respect to the
operating system
An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ...
for which the binary was compiled for. In some cases such as
instruction set simulation, the target instruction set may be the same as the source instruction set, providing testing and debugging features such as instruction trace, conditional breakpoints and
hot spot detection.
The two main types are static and dynamic binary translation. Translation can be done in hardware (for example, by circuits in a
CPU) or in software (e.g. run-time engines, static recompiler, emulators, all are typically slow).
Motivation
Binary translation is motivated by a lack of a binary for a target platform, the lack of source code to compile for the target platform, or otherwise difficulty in compiling the source for the target platform.
Statically recompiled binaries run potentially faster than their respective emulated binaries, as the emulation overhead is removed. This is similar to the difference in performance between interpreted and compiled programs in general.
Static binary translation
A translator using static binary translation aims to convert all of the code of an
executable file
In computer science, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instructions", as opposed to a da ...
into code that runs on the target architecture and platform without having to run the code first, as is done in dynamic binary translation. This is very difficult to do correctly, since not all the code can be discovered by the translator. For example, some parts of the executable may be reachable only through
indirect branches, whose value is known only at run-time.
One such static binary translator uses universal
superoptimizer peephole technology (developed by
Sorav Bansal and Alex Aiken from
Stanford University
Leland Stanford Junior University, commonly referred to as Stanford University, is a Private university, private research university in Stanford, California, United States. It was founded in 1885 by railroad magnate Leland Stanford (the eighth ...
) to perform efficient translation between possibly many source and target pairs, with considerably low development costs and high performance of the target binary. In experiments of PowerPC-to-x86 translations, some binaries even outperformed native versions, but on average they ran at two-thirds of native speed.
Examples for static binary translations
In 1960s
Honeywell
Honeywell International Inc. is an American publicly traded, multinational conglomerate corporation headquartered in Charlotte, North Carolina. It primarily operates in four areas of business: aerospace, building automation, industrial automa ...
provided a program called the
Liberator for their
Honeywell 200 series of computers; it could translate programs for the
IBM 1400 series
The IBM 1400 series are second-generation (transistor) mid-range business decimal computers that IBM marketed in the early 1960s. The computers were offered to replace tabulating machines like the IBM 407. The 1400-series machines stored infor ...
of computers into programs for the Honeywell 200 series.
In 1995 Norman Ramsey at
Bell Communications Research and Mary F. Fernandez at Department of Computer Science,
Princeton University
Princeton University is a private university, private Ivy League research university in Princeton, New Jersey, United States. Founded in 1746 in Elizabeth, New Jersey, Elizabeth as the College of New Jersey, Princeton is the List of Colonial ...
developed ''The New Jersey Machine-Code Toolkit'' that had the basic tools for static assembly translation.
In 2004 Scott Elliott and Phillip R. Hutchinson at
Nintendo
is a Japanese Multinational corporation, multinational video game company headquartered in Kyoto. It develops, publishes, and releases both video games and video game consoles.
The history of Nintendo began when craftsman Fusajiro Yamauchi ...
developed a tool to generate "C" code from
Game Boy
The is a handheld game console developed by Nintendo, launched in the Japanese home market on April 21, 1989, followed by North America later that year and other territories from 1990 onwards. Following the success of the Game & Watch single-ga ...
binary that could then be compiled for a new platform and linked against a hardware library for use in airline entertainment systems.
In 2014, an
ARM architecture
ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of reduced instruction set computer, RISC instruction set architectures (ISAs) for central processing unit, com ...
version of the 1998
video game
A video game or computer game is an electronic game that involves interaction with a user interface or input device (such as a joystick, game controller, controller, computer keyboard, keyboard, or motion sensing device) to generate visual fe ...
''
StarCraft'' was generated by static recompilation and additional
reverse engineering
Reverse engineering (also known as backwards engineering or back engineering) is a process or method through which one attempts to understand through deductive reasoning how a previously made device, process, system, or piece of software accompl ...
of the original
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
version.
The
Pandora
In Greek mythology, Pandora was the first human woman created by Hephaestus on the instructions of Zeus. As Hesiod related it, each god cooperated by giving her unique gifts. Her other name—inscribed against her figure on a white-ground '' ky ...
handheld community was capable of developing the required tools on their own and achieving such translations successfully several times.
Another example is the
NES-to-
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
statically recompiled version of the videogame ''
Super Mario Bros.'' which was generated under usage of
LLVM
LLVM, also called LLVM Core, is a target-independent optimizer and code generator. It can be used to develop a Compiler#Front end, frontend for any programming language and a Compiler#Back end, backend for any instruction set architecture. LLVM i ...
in 2013.
For instance, a successful x86-to-
x64 static recompilation was generated for the
procedural terrain generator of the video game ''
Cube World'' in 2014.
Dynamic binary translation
Dynamic binary translation (DBT) looks at a short sequence of code—typically on the order of a single
basic block
In compiler construction, a basic block is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit. This restricted form makes a basic block highly amenable to analysis. Compilers usually decom ...
—then translates it and caches the resulting sequence. Code is only translated as it is discovered and when possible, and branch instructions are made to point to already translated and saved code (
memoization
In computing, memoization or memoisation is an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls to pure functions and returning the cached result when the same inputs occur ag ...
).
Dynamic binary translation differs from simple emulation (eliminating the emulator's main read-decode-execute loop—a major performance bottleneck), paying for this by large overhead during translation time. This overhead is hopefully amortized as translated code sequences are executed multiple times.
More advanced dynamic translators employ
dynamic recompilation
In computer science, dynamic recompilation is a feature of some emulators and virtual machines, where the system may recompile some part of a program during execution. By compiling during execution, the system can tailor the generated code to ...
where the translated code is instrumented to find out what portions are executed a large number of times, and these portions are
optimized aggressively. This technique is reminiscent of a
JIT compiler
In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is compiler, compilation (of Source code, computer code) during execution of a program (at run time (program lifecycle phase), run time) rather than b ...
, and in fact such compilers (e.g.
Sun
The Sun is the star at the centre of the Solar System. It is a massive, nearly perfect sphere of hot plasma, heated to incandescence by nuclear fusion reactions in its core, radiating the energy from its surface mainly as visible light a ...
's
HotSpot technology) can be viewed as dynamic translators from a virtual instruction set (the
bytecode
Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normal ...
) to a real one.
Examples for dynamic binary translations in software
*
Apple Computer
Apple Inc. is an American multinational corporation and technology company headquartered in Cupertino, California, in Silicon Valley. It is best known for its consumer electronics, software, and services. Founded in 1976 as Apple Computer Co ...
implemented a dynamic translating
emulator
In computing, an emulator is Computer hardware, hardware or software that enables one computer system (called the ''host'') to behave like another computer system (called the ''guest''). An emulator typically enables the host system to run sof ...
for
M68K
The Motorola 68000 series (also known as 680x0, m68000, m68k, or 68k) is a family of 32-bit complex instruction set computer (CISC) microprocessors. During the 1980s and early 1990s, they were popular in personal computers and workstations and w ...
code in their
PowerPC
PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple Inc., App ...
line of
Macintoshes,
which achieved a very high level of reliability, performance and compatibility (see
Mac 68K emulator). This allowed Apple to bring the machines to market with only a partially native
operating system
An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ...
, and end users could adopt the new, faster architecture without risking their investment in software. Partly because the emulator was so successful, many parts of the operating system remained emulated. A full transition to a PowerPC native
operating system
An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ...
(OS) was not made until the release of
Mac OS X
macOS, previously OS X and originally Mac OS X, is a Unix, Unix-based operating system developed and marketed by Apple Inc., Apple since 2001. It is the current operating system for Apple's Mac (computer), Mac computers. With ...
(10.0) in 2001. (The Mac OS X "
Classic
A classic is an outstanding example of a particular style; something of Masterpiece, lasting worth or with a timeless quality; of the first or Literary merit, highest quality, class, or rank – something that Exemplification, exemplifies its ...
" runtime environment continued to offer this emulation capability on PowerPC Macs until
Mac OS X 10.5.)
*
Mac OS X 10.4.4 for Intel-based Macs introduced the
Rosetta dynamic translation layer to ease Apple's transition from PPC-based hardware to x86. Developed for Apple by
Transitive Corporation, the Rosetta software is an implementation of Transitive's
QuickTransit solution.
* QuickTransit during its product lifespan also provided
SPARC→
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
, x86→
PowerPC
PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple Inc., App ...
and
MIPS→
Itanium 2
Itanium (; ) is a discontinued family of 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). The Itanium architecture originated at Hewlett-Packard (HP), and was later jointly developed by HP and I ...
translation support.
*
DEC achieved similar success with its translation tools to help users migrate from the
CISC VAX architecture to the
Alpha
Alpha (uppercase , lowercase ) is the first letter of the Greek alphabet. In the system of Greek numerals, it has a value of one. Alpha is derived from the Phoenician letter ''aleph'' , whose name comes from the West Semitic word for ' ...
RISC
In electronics and computer science, a reduced instruction set computer (RISC) is a computer architecture designed to simplify the individual instructions given to the computer to accomplish tasks. Compared to the instructions given to a comp ...
architecture.
*
HP ARIES (Automatic Re-translation and Integrated Environment Simulation) is a software dynamic binary translation system that combines fast code interpretation with two phase dynamic translation to transparently and accurately execute
HP 9000 HP-UX
HP-UX (from "Hewlett Packard Unix") is a proprietary software, proprietary implementation of the Unix operating system developed by Hewlett Packard Enterprise; current versions support HPE Integrity Servers, based on Intel's Itanium architect ...
applications on
HP-UX
HP-UX (from "Hewlett Packard Unix") is a proprietary software, proprietary implementation of the Unix operating system developed by Hewlett Packard Enterprise; current versions support HPE Integrity Servers, based on Intel's Itanium architect ...
11i for
HPE Integrity Servers
HPE Integrity Servers is a series of server (computing), server computers produced by Hewlett Packard Enterprise (formerly Hewlett-Packard) since 2003, based on the Itanium processor. The Integrity brand name was inherited by HP from Tandem Com ...
. The ARIES fast interpreter emulates a complete set of non-privileged
PA-RISC
Precision Architecture reduced instruction set computer, RISC (PA-RISC) or Hewlett Packard Precision Architecture (HP/PA or simply HPPA), is a computer, general purpose computer instruction set architecture (ISA) developed by Hewlett-Packard f ...
instructions with no user intervention. During interpretation, it monitors the application's execution pattern and translates only the frequently executed code into native
Itanium
Itanium (; ) is a discontinued family of 64-bit computing, 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). The Itanium architecture originated at Hewlett-Packard (HP), and was later jointly dev ...
code at runtime. ARIES implements two phase dynamic translation, a technique in which translated code in first phase collects runtime profile information which is used during second phase translation to further optimize the translated code. ARIES stores the dynamically translated code in memory buffer called code cache. Further references to translated basic blocks execute directly in the code cache and do not require additional interpretation or translation. The targets of translated code blocks are back-patched to ensure execution takes place in code cache most of the time. At the end of the emulation, ARIES discards all the translated code without modifying the original application. The ARIES emulation engine also implements Environment Emulation which emulates an
HP 9000 HP-UX
HP-UX (from "Hewlett Packard Unix") is a proprietary software, proprietary implementation of the Unix operating system developed by Hewlett Packard Enterprise; current versions support HPE Integrity Servers, based on Intel's Itanium architect ...
application's system calls, signal delivery, exception management, threads management, emulation of
HP GDB for debugging, and core file creation for the application.
* DEC created the
FX!32 binary translator for converting
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
applications to Alpha applications.
*
Sun Microsystems
Sun Microsystems, Inc., often known as Sun for short, was an American technology company that existed from 1982 to 2010 which developed and sold computers, computer components, software, and information technology services. Sun contributed sig ...
'
Wabi software included dynamic translation from x86 to SPARC instructions.
* In January 2000,
Transmeta
Transmeta Corporation was an American fabless semiconductor company based in Santa Clara, California. It developed low power x86 compatible microprocessors based on a VLIW core and a software layer called Code Morphing Software.
Code Morphing ...
Corporation announced a novel processor design named
Crusoe. From the
FAQ on their web site,
*
Intel
Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer compo ...
Corporation developed and implemented an
IA-32 Execution Layer - a dynamic binary translator designed to support IA-32 applications on
Itanium
Itanium (; ) is a discontinued family of 64-bit computing, 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). The Itanium architecture originated at Hewlett-Packard (HP), and was later jointly dev ...
-based systems, which was included in
Microsoft
Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...
Windows Server
Windows Server (formerly Windows NT Server) is a brand name for Server (computing), server-oriented releases of the Windows NT operating system (OS) that have been developed by Microsoft since 1993. The first release under this brand name i ...
for
Itanium
Itanium (; ) is a discontinued family of 64-bit computing, 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). The Itanium architecture originated at Hewlett-Packard (HP), and was later jointly dev ...
architecture, as well as in several flavors of
Linux
Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...
, including
Red Hat
Red Hat, Inc. (formerly Red Hat Software, Inc.) is an American software company that provides open source software products to enterprises and is a subsidiary of IBM. Founded in 1993, Red Hat has its corporate headquarters in Raleigh, North ...
and
Suse. It allowed IA-32 applications to run faster than they would using the native IA-32 mode on Itanium processors.
*
Dolphin
A dolphin is an aquatic mammal in the cetacean clade Odontoceti (toothed whale). Dolphins belong to the families Delphinidae (the oceanic dolphins), Platanistidae (the Indian river dolphins), Iniidae (the New World river dolphins), Pontopori ...
(an emulator for the
GameCube
The is a PowerPC-based home video game console developed and marketed by Nintendo. It was released in Japan on September 14, 2001, in North America on November 18, 2001, in Europe on May 3, 2002, and in Australia on May 17, 2002. It is the suc ...
/
Wii) performs JIT recompilation of PowerPC code to x86 and AArch64.
*
Microsoft Virtual PC supports binary translation for 32-bit guest operating systems.
*
VMware Workstation
VMware Workstation Pro (known as VMware Workstation until release of VMware Workstation 12 in 2015) is a hosted (Type 2) hypervisor that runs on x64 versions of Windows and Linux operating systems. It enables users to set up virtual machines (VM ...
12 or earlier are known to support binary translation for 32-bit guest operating systems.
Examples for dynamic binary translations in hardware
*
Nvidia
Nvidia Corporation ( ) is an American multinational corporation and technology company headquartered in Santa Clara, California, and incorporated in Delaware. Founded in 1993 by Jensen Huang (president and CEO), Chris Malachowsky, and Curti ...
Tegra
Tegra is a system on a chip (SoC) series developed by Nvidia for mobile devices such as smartphones, personal digital assistants, and mobile Internet devices. The Tegra integrates an ARM architecture central processing unit (CPU), graphics pr ...
K1 Denver translates
ARM instructions over a slow hardware decoder to its native microcode instructions and uses a software binary translator for hot code.
See also
*
Binary optimization
*
Binary recompilation
*
Dynamic recompilation
In computer science, dynamic recompilation is a feature of some emulators and virtual machines, where the system may recompile some part of a program during execution. By compiling during execution, the system can tailor the generated code to ...
*
Just-in-time compilation
In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is compilation (of computer code) during execution of a program (at run time) rather than before execution. This may consist of source code transl ...
*
Instruction set simulator
An instruction set simulator (ISS) is a simulation model (abstract), model, usually coded in a high-level programming language, which mimics the behavior of a mainframe or microprocessor by "reading" instructions and maintaining internal variables ...
*
Emulator
In computing, an emulator is Computer hardware, hardware or software that enables one computer system (called the ''host'') to behave like another computer system (called the ''guest''). An emulator typically enables the host system to run sof ...
*
Virtual machine
In computing, a virtual machine (VM) is the virtualization or emulator, emulation of a computer system. Virtual machines are based on computer architectures and provide the functionality of a physical computer. Their implementations may involve ...
*
Comparison of platform virtualization software
Platform virtualization software, specifically emulators and hypervisors, are software packages that emulate the whole physical computer machine, often providing multiple virtual machines on one physical platform. The table below compares basic ...
*
Shadow memory
In computing, shadow memory is a technique used to track and store information on computer memory used by a computer program, program during its execution. Shadow memory consists of shadow bytes that map to individual bits or one or more bytes in ...
References
Further reading
*
*
*
*
*
*
*
{{Refend
Emulation software
Interpreters (computing)
Virtualization