A source-to-source translator, source-to-source compiler (S2S compiler), transcompiler, or transpiler is a type of translator that takes the source code of a program written in a programming language as its input and produces equivalent source code in the same or a different programming language, usually as an intermediate representation. A source-to-source translator converts between programming languages that operate at approximately the same level of abstraction, while a traditional compiler translates from a higher level language to a lower level language. For example, a source-to-source translator may translate a program from Python to JavaScript, while a traditional compiler translates from a language like C to assembly or from Java to bytecode.

An automatic parallelizing compiler will frequently take in a high level language program as an input and then transform the code and annotate it with parallel code annotations (e.g., OpenMP) or language constructs (e.g., Fortran's forall statements).

Another purpose of source-to-source compiling is translating legacy code to use the next version of the underlying programming language or an application programming interface (API) that breaks backward compatibility. Such a translator performs automatic code refactoring, which is useful when the programs to refactor are outside the control of the original implementer (for example, converting programs from Python 2 to Python 3, or converting programs from an old API to the new API) or when the size of the program makes it impractical or time-consuming to refactor it by hand.

Transcompilers may either keep the translated code structure as close to the source code as possible to ease development and debugging of the original source code, or may change the structure of the original code so much that the translated code does not look like the source code. There are also debugging utilities that map the transcompiled source code back to the original code; for example, the JavaScript Source Map standard allows mapping of the JavaScript code executed by a web browser back to the original source when the JavaScript code was, for example, minified or produced by a transcompiled-to-JavaScript language. Examples include Closure Compiler, CoffeeScript, Dart, Haxe, Opal, TypeScript and Emscripten.
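The API-migration use described above can be illustrated with a minimal sketch in Python, which parses source code, rewrites calls to a renamed function, and emits source code again. The names `old_fetch` and `fetch_v2` and the rename table are hypothetical, chosen only for illustration; the sketch relies on the standard `ast` module and is not how any particular migration tool is implemented.

```python
import ast

# Hypothetical rename table for illustration: old API name -> new API name.
RENAMES = {"old_fetch": "fetch_v2"}


class ApiRenamer(ast.NodeTransformer):
    """Rewrite calls to renamed functions, leaving everything else intact."""

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # handle nested calls first
        if isinstance(node.func, ast.Name) and node.func.id in RENAMES:
            new_name = ast.Name(id=RENAMES[node.func.id], ctx=ast.Load())
            node.func = ast.copy_location(new_name, node.func)
        return node


def migrate(source: str) -> str:
    """Parse Python source, apply the renames, and emit Python source again."""
    tree = ApiRenamer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # available in Python 3.9 and newer


legacy = "result = old_fetch(url, retries=3)\nprint(old_fetch(other))"
print(migrate(legacy))
```

Because `ast.unparse` regenerates the text from the syntax tree, comments and the original formatting are lost; a tool that needs to keep the output close to the input would have to work on a representation that preserves them.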


Assembly language translators

So-called assembly language translators are a class of source-to-source translators converting code from one assembly language into another, including (but not limited to) across different processor families and system platforms.


Intel CONV86

Intel marketed its 16-bit 8086 processor as source compatible with the 8080, an 8-bit processor. To support this, Intel had an ISIS-II-based translator from 8080 to 8086 source code named CONV86 (also referred to as CONV-86 and CONVERT 86) available to OEM customers since 1978, possibly the earliest program of this kind. It supported multiple levels of translation and ran at 2 MHz on an Intel Microprocessor Development System MDS-800 with 8-inch floppy drives. According to user reports, it did not work very reliably.


SCP TRANS86

Seattle Computer Products (SCP) offered TRANS86.COM, written by Tim Paterson in 1980 while developing 86-DOS. The utility could translate Intel 8080 and Zilog Z80 assembly source code (with Zilog/Mostek mnemonics) into source code for the Intel 8086 (in a format only compatible with SCP's cross-assembler ASM86 for CP/M-80), but supported only a subset of opcodes, registers and modes, and often still required significant manual correction and rework afterwards. Performing a mere transliteration, the brute-force single-pass translator did not carry out any register or jump optimizations. It took about 24 KB of RAM. The SCP version 1 of TRANS86.COM ran on Z80-based systems. Once 86-DOS was running, Paterson, in a self-hosting-inspired approach, used TRANS86 to convert itself into a program running under 86-DOS. Numbered version 2, this was named TRANS.COM instead. Later in 1982, the translator was apparently also available from Microsoft.
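A single-pass transliteration of this kind can be sketched in Python as a table lookup per instruction. The sketch below is not a reconstruction of TRANS86; the rule table covers only a small, hand-picked subset of 8080 mnemonics, the output syntax is a generic 8086 assembler notation, and the names `REG8`, `REG16`, `RULES` and `translate_line` are hypothetical. It uses the register correspondence commonly documented for 8080-to-8086 conversion (A to AL, HL to BX, BC to CX, DE to DX).

```python
# 8080 register -> 8086 register (8-bit and register-pair forms).
REG8 = {"A": "AL", "B": "CH", "C": "CL", "D": "DH", "E": "DL",
        "H": "BH", "L": "BL", "M": "BYTE PTR [BX]"}
REG16 = {"B": "CX", "D": "DX", "H": "BX", "SP": "SP"}


def _r8(name: str) -> str:
    return REG8[name.upper()]


def _r16(name: str) -> str:
    return REG16[name.upper()]


# One rule per supported mnemonic; anything else is left for manual rework.
RULES = {
    "MOV": lambda a: f"MOV {_r8(a[0])}, {_r8(a[1])}",
    "MVI": lambda a: f"MOV {_r8(a[0])}, {a[1]}",
    "LXI": lambda a: f"MOV {_r16(a[0])}, {a[1]}",
    "ADD": lambda a: f"ADD AL, {_r8(a[0])}",
    "SUB": lambda a: f"SUB AL, {_r8(a[0])}",
    "INR": lambda a: f"INC {_r8(a[0])}",
    "DCR": lambda a: f"DEC {_r8(a[0])}",
    "INX": lambda a: f"INC {_r16(a[0])}",
    "DCX": lambda a: f"DEC {_r16(a[0])}",
    "RET": lambda a: "RET",
}
for same in ("JMP", "JZ", "JNZ", "JC", "JNC", "CALL"):
    RULES[same] = lambda a, m=same: f"{m} {a[0]}"


def translate_line(line: str) -> str:
    """Translate one line of 8080 assembly; comments are dropped."""
    code = line.split(";", 1)[0]
    label, colon, rest = code.rpartition(":")
    prefix = f"{label.strip()}: " if colon else ""
    fields = rest.split(None, 1)
    if not fields:
        return prefix.rstrip()
    op = fields[0].upper()
    args = [a.strip() for a in fields[1].split(",")] if len(fields) > 1 else []
    try:
        return prefix + RULES[op](args)
    except (KeyError, IndexError):
        return f"{prefix}; UNTRANSLATED: {rest.strip()}"


program = ["LXI H, 1234H", "MVI B, 10", "LOOP: ADD M", "INX H", "DCR B", "JNZ LOOP", "RST 7"]
print("\n".join(translate_line(line) for line in program))
```

The sketch also shows why such output may need rework: the 8080 `INX` leaves the flags untouched, whereas the 16-bit 8086 `INC` modifies them, so a one-to-one substitution is only safe when no later instruction depends on the old flag values.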


Sorcim TRANS86

Sorcim offered an 8080 to 8086 translator, also named TRANS86, since December 1980. Like SCP's program, it was designed to port CP/M-80 application code (in ASM, MAC, RMAC or ACT80 assembly format) to MS-DOS (in a format compatible with ACT86). In ACT80 format it also supported a few Z80 mnemonics. The translation occurred on an instruction-by-instruction basis with some optimization applied to conditional jumps. The program ran under CP/M-80, MP/M-80 and Cromemco DOS with a minimum of 24 KB of RAM, and had no restrictions on the source file size.


Digital Research XLT86

Digital Research's XLT86 1.0, which appeared in September 1981, was much more sophisticated and the first to introduce optimizing compiler technologies into the source translation process. XLT86 1.1 was available by April 1982. The program was written by Gary Kildall and translated source code for the Intel 8080 processor (in a format compatible with ASM, MAC or RMAC assemblers) into source code for the 8086 (compatible with ASM86). Using global data flow analysis on 8080 register usage, the five-phase multi-pass translator would also optimize the output for code size and take care of calling conventions (CP/M-80 BDOS calls were mapped into BDOS calls for CP/M-86), so that CP/M-80 and MP/M-80 programs could be ported to the CP/M-86 and MP/M-86 platforms automatically. XLT86.COM itself was written in PL/I-80 for CP/M-80 platforms. The program occupied 30 KB of RAM for itself plus additional memory for the program graph. On a 64 KB memory system, the maximum source file size supported was about 6 KB, so larger files had to be broken down accordingly before translation. Alternatively, XLT86 was also available for DEC VAX/VMS. Although XLT86's input and output worked at the source-code level, the translator's in-memory representation of the program and the applied code optimizing technologies laid the foundation for binary recompilation.
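To give an idea of what global data flow analysis on register usage involves, the following Python sketch computes which registers are live at the entry and exit of each basic block of a small program graph. It is a generic textbook-style liveness analysis written for illustration, not a description of XLT86's actual phases; the block names, the `BLOCKS` example and the treatment of the flags as a pseudo-register are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Each block: (registers read before being written, registers written, successors).
Block = Tuple[Set[str], Set[str], List[str]]


def liveness(blocks: Dict[str, Block]) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Iterate the backward data flow equations until nothing changes."""
    live_in: Dict[str, Set[str]] = {name: set() for name in blocks}
    live_out: Dict[str, Set[str]] = {name: set() for name in blocks}
    changed = True
    while changed:
        changed = False
        for name, (uses, defs, succs) in blocks.items():
            # A register is live on exit if any successor needs it on entry.
            out = set().union(*(live_in[s] for s in succs))
            # It is live on entry if read here, or needed later and not overwritten.
            inn = uses | (out - defs)
            if out != live_out[name] or inn != live_in[name]:
                live_out[name], live_in[name] = out, inn
                changed = True
    return live_in, live_out


# Hypothetical program graph: initialise A and B, loop adding B to A, then use A.
BLOCKS: Dict[str, Block] = {
    "entry": (set(), {"A", "B"}, ["loop"]),
    "loop": ({"A", "B"}, {"A", "B", "flags"}, ["loop", "exit"]),
    "exit": ({"A"}, set(), []),
}

live_in, live_out = liveness(BLOCKS)
for name in BLOCKS:
    print(name, sorted(live_in[name]), sorted(live_out[name]))
```

In this example the flags are not live when the loop block exits, so a translator with this information could emit a shorter instruction sequence there instead of one that carefully reproduces the 8080 flag behaviour.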


Others

2500 AD Software offered an 8080 to 8086 source-code translator as part of their XASM suite for CP/M-80 machines with Z80 as well as for Zilog ZEUS and Olivetti PCOS systems. Since 1979, Zilog offered a Z80 to Z8000 translator as part of their PDS 8000 development system. Advanced Micro Computers (AMC) and 2500 AD Software offered Z80 to Z8000 translators as well. The latter was named TRANS and was available for Z80 CP/M, CP/M-86, MS-DOS and PCOS. The Z88DK development kit provides a Z80 to i486 source code translator targeting nasm named "to86.awk", written in 2008 by Stefano Bodrato. It is in turn based on an 8080 to Z80 converter written in 2003 by Douglas Beattie, Jr., named "toz80.awk". In 2021, Brian Callahan wrote an 8080 CP/M 2.2 to MS-DOS source code translator targeting nasm named 8088ify.


Programming language implementations

The first implementations of some programming languages started as transcompilers, and the default implementations of some of those languages are still transcompilers. In addition to the table below, a CoffeeScript maintainer provides a list of languages that compile to JavaScript.


Porting a codebase

When developers want to switch to a different language while retaining most of an existing codebase, using a transcompiler may be preferable to rewriting the whole software by hand. Depending on the quality of the transcompiler, the code may or may not need manual intervention in order to work properly. This is different from "transcompiled languages", where the specifications demand that the output source code always works without modification. Any transcompiler used to port a codebase will require manual adjustment of the output source code if maximum code quality in terms of readability and platform conventions is needed.


Transcompiler pipelines

A transcompiler pipeline is what results from recursive transcompiling. By stringing together multiple layers of technology, with a transcompile step between each layer, technology can be repeatedly transformed, effectively creating a distributed, language-independent specification. XSLT is a general-purpose transform tool that can be used between many different technologies to create such a derivative code pipeline (World Wide Web Consortium (W3C), "XSL Transformations (XSLT) Version 2.0", https://www.w3.org/TR/xslt-20/).
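As a rough illustration of chaining transcompile steps, the Python sketch below composes two toy stages into one pipeline: a one-line field specification is turned into JSON, and the JSON is turned into a TypeScript interface declaration. The specification syntax, the stage functions and the `TS_TYPES` table are all hypothetical and exist only to show the composition; real pipelines would use proper parsers and generators at each layer.

```python
import json
from functools import reduce
from typing import Callable

Stage = Callable[[str], str]

# Hypothetical type correspondence between the toy spec and TypeScript.
TS_TYPES = {"int": "number", "str": "string"}


def spec_to_json(spec: str) -> str:
    """Stage 1: 'user: id int, name str' -> JSON text describing the fields."""
    name, _, body = spec.partition(":")
    fields = dict(item.split() for item in body.split(","))
    return json.dumps({"title": name.strip(), "fields": fields})


def json_to_typescript(doc: str) -> str:
    """Stage 2: JSON description -> TypeScript interface source text."""
    model = json.loads(doc)
    members = " ".join(f"{k}: {TS_TYPES[v]};" for k, v in model["fields"].items())
    return f"interface {model['title'].capitalize()} {{ {members} }}"


def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right: the output of one is the input of the next."""
    return lambda text: reduce(lambda acc, stage: stage(acc), stages, text)


to_typescript = pipeline(spec_to_json, json_to_typescript)
print(to_typescript("user: id int, name str"))
```

Each stage takes text and returns text, so stages can be added, removed or reordered without changing the others, which is the property the pipeline idea depends on.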


Recursive transcompiling

Recursive transcompilation (or recursive transpiling) is the process of applying the notion of transcompiling recursively, to create a pipeline of transformations (often starting from a single source of truth) which repeatedly turn one technology into another. By repeating this process, one can turn A → B → C → D → E → F and then back into A. Some information will be preserved through this pipeline, from A back to A, and that information (at an abstract level) demonstrates what each of the components A–F agree on. In each of the different versions that the transcompiler pipeline produces, that information is preserved. It might take on many different shapes and sizes, but by the time it comes back to A, having been transcompiled six times in the pipeline above, the information returns to its original state. This information, which survives the transform through each format from A through F and back to A, is (by definition) derivative content or derivative code.

Recursive transcompilation takes advantage of the fact that transcompilers may either keep translated code as close to the source code as possible to ease development and debugging of the original source code, or may change the structure of the original code so much that the translated code does not look like the source code. There are also debugging utilities that map the transcompiled source code back to the original code; for example, JavaScript source maps allow mapping of the JavaScript code executed by a web browser back to the original source in a transcompiled-to-JavaScript language.
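The idea behind mapping generated code back to its source can be sketched in Python with a toy transformation that records, for every output line, the input line it came from. This is only an illustration of the principle: the real Source Map format is a JSON document with compactly encoded mappings, whereas the sketch uses a plain dictionary, and the function name `strip_comments` is hypothetical.

```python
from typing import Dict, Tuple


def strip_comments(source: str) -> Tuple[str, Dict[int, int]]:
    """Toy 'minifier': drop blank and comment-only lines.

    Returns the generated text and a map from generated line number
    to original line number (both 1-based).
    """
    output = []
    line_map: Dict[int, int] = {}
    for original_no, line in enumerate(source.splitlines(), start=1):
        if line.strip() and not line.lstrip().startswith("#"):
            output.append(line)
            line_map[len(output)] = original_no
    return "\n".join(output), line_map


source = "# setup\n\nx = 1\n# compute\ny = x / 0\n"
generated, line_map = strip_comments(source)

try:
    exec(compile(generated, "<generated>", "exec"))
except ZeroDivisionError as error:
    generated_line = error.__traceback__.tb_next.tb_lineno
    print(f"error at generated line {generated_line}, "
          f"original line {line_map[generated_line]}")
```

The failing statement sits on line 2 of the generated text but on line 5 of the original, and the recorded map is what lets a debugger report the latter.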


See also

* a source-to-source compiler framework using explicit pattern-directed rewrite rules
* a source-to-source compiler from Fortran 77 to C
* (running IBM 1401 programs on Honeywell H200)
* a source-to-source compiler framework


Further reading

* 1984-11-11 version 1.05 (NB. The DOS executable XLT86.COM (2 KB) translates Intel 8080 assembly language source code to Intel 8086 assembly language source code. Despite its name this implementation in 8086 assembly is not related to Digital Research's earlier and much more sophisticated XLT86.)
* (9 pages) (NB. This software translator was developed by ST and translates Motorola 6805/HC05 assembly source code in 2500AD Software format into ST7 source code. The MIGR2ST7.EXE executable for Windows is available from "MCU ON CD".)

External links

* "Our Methodology – The Source to Source Conversion Process". Micro-Processor Services, Inc. (MPS). http://www.mpsinc.com/process.html (archived 2019-05-12 at https://web.archive.org/web/20190512171423/http://www.mpsinc.com/process.html; accessed 2020-02-01)