In
computer programming
Computer programming or coding is the composition of sequences of instructions, called computer program, programs, that computers can follow to perform tasks. It involves designing and implementing algorithms, step-by-step specifications of proc ...
, self-hosting is the use of a
program as part of the
toolchain
A toolchain is a set of software development tools used to build and otherwise develop software. Often, the tools are executed sequentially and form a pipeline such that the output of one tool is the input for the next. Sometimes the term is us ...
or
operating system
An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ...
that produces new versions of that same program—for example, a
compiler
In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
that can compile its own
source code
In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer.
Since a computer, at base, only ...
. Self-hosting
software
Software consists of computer programs that instruct the Execution (computing), execution of a computer. Software also includes design documents and specifications.
The history of software is closely tied to the development of digital comput ...
is commonplace on
personal computer
A personal computer, commonly referred to as PC or computer, is a computer designed for individual use. It is typically used for tasks such as Word processor, word processing, web browser, internet browsing, email, multimedia playback, and PC ...
s and larger systems. Other programs that are typically self-hosting include
kernels,
assemblers
Assembler may refer to:
Arts and media
* Nobukazu Takemura, avant-garde electronic musician, stage name Assembler
* Assemblers, a fictional race in the ''Star Wars'' universe
* Assemblers, an alternative name of the superhero group Champions of ...
,
command-line interpreter
A command-line interface (CLI) is a means of interacting with software via commands each formatted as a line of text. Command-line interfaces emerged in the mid-1960s, on computer terminals, as an interactive and more user-friendly alternativ ...
s and
revision control software.
Operating systems
An operating system is self-hosted when the toolchain to build the operating system runs on that same operating system. For example, Windows can be built on a computer running Windows.
Before a system can become self-hosted, another system is needed to develop it until it reaches a stage where self-hosting is possible. When developing for a new computer or operating system, a system to run the development software is needed, but development software used to write and build the operating system is also necessary. This is called a
bootstrapping
In general, bootstrapping usually refers to a self-starting process that is supposed to continue or grow without external input. Many analytical techniques are often called bootstrap methods in reference to their self-starting or self-supporting ...
problem or, more generically, a
chicken or the egg dilemma.
A solution to this problem is the
cross compiler (or cross assembler when working with assembly language). A cross compiler allows
source code
In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer.
Since a computer, at base, only ...
on one platform to be compiled for a different machine or operating system, making it possible to create an operating system for a machine for which a self-hosting compiler does not yet exist. Once written, software can be deployed to the target system using means such as an
EPROM,
floppy disk
A floppy disk or floppy diskette (casually referred to as a floppy, a diskette, or a disk) is a type of disk storage composed of a thin and flexible disk of a magnetic storage medium in a square or nearly square plastic enclosure lined with a ...
ette,
flash memory
Flash memory is an Integrated circuit, electronic Non-volatile memory, non-volatile computer memory storage medium that can be electrically erased and reprogrammed. The two main types of flash memory, NOR flash and NAND flash, are named for t ...
(such as a USB thumb drive), or
JTAG device. This is similar to the method used to write software for gaming consoles or for handheld devices like cellular phones or tablets, which do not host their own development tools.
Once the system is mature enough to compile its own code, the cross-development dependency ends. At this point, an operating system is said to be self-hosted.
Compilers
Software development using compiler or interpreters can also be self hosted when the compiler is capable of compiling itself.
Since self-hosted compilers suffer from the same bootstrap problems as operating systems, a compiler for a new programming language needs to be written in an existing language. So the developer may use something like assembly language,
C/
C++, or even a scripting language like
Python or
Lua to build the first version of the compiler. Once the language is mature enough, development of the compiler can shift to the compiler's native language, allowing the compiler to build itself.
History
The first self-hosting compiler (excluding assemblers) was written for
Lisp
Lisp (historically LISP, an abbreviation of "list processing") is a family of programming languages with a long history and a distinctive, fully parenthesized Polish notation#Explanation, prefix notation.
Originally specified in the late 1950s, ...
by Hart and Levin at MIT in 1962. They wrote a Lisp compiler in Lisp, testing it inside an existing Lisp
Interpreter. Once they had improved the compiler to the point where it could compile its own source code, it was self-hosting.
This technique is usually only practicable when an interpreter already exists for the very same language that is to be compiled; though possible, it is extremely uncommon to humanly compile a compiler with itself.
The concept borrows directly from and is an example of the broader notion of running a program on itself as input, used also in various proofs in
theoretical computer science
Theoretical computer science is a subfield of computer science and mathematics that focuses on the Abstraction, abstract and mathematical foundations of computation.
It is difficult to circumscribe the theoretical areas precisely. The Associati ...
, such as the proof that the
halting problem
In computability theory (computer science), computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run for ...
is undecidable.
Examples
Ken Thompson
Kenneth Lane Thompson (born February 4, 1943) is an American pioneer of computer science. Thompson worked at Bell Labs for most of his career where he designed and implemented the original Unix operating system. He also invented the B (programmi ...
started development on
Unix
Unix (, ; trademarked as UNIX) is a family of multitasking, multi-user computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, a ...
in 1968 by writing and compiling programs on the
GE-635
The GE-600 series is a family of 36-bit mainframe computers originating in the 1960s, built by General Electric (GE). When GE left the mainframe business, the line was sold to Honeywell, which built similar systems into the 1990s as the divisio ...
and carrying them over to the
PDP-7
The PDP-7 is an 18-bit computing, 18-bit minicomputer produced by Digital Equipment Corporation as part of the Programmed Data Processor, PDP series. Introduced in 1964, shipped since 1965, it was the first to use their Flip-Chip module, Flip- ...
for testing. After the initial Unix kernel, a
command interpreter, an editor, an assembler, and a few utilities were completed, the Unix operating system was self-hosting – programs could be written and tested on the PDP-7 itself.
[Dennis M. Ritchie]
"The Development of the C Language"
1993.
Douglas McIlroy
Malcolm Douglas McIlroy (born 1932) is an American mathematician, engineer, and programmer. As of 2019 he is an Adjunct Professor of Computer Science at Dartmouth College.
McIlroy is best known for having originally proposed Unix pipelines and de ...
wrote
TMG (a
compiler-compiler
In computer science, a compiler-compiler or compiler generator is a programming tool that creates a Parsing#Computer_languages, parser, interpreter (computer software), interpreter, or compiler from some form of formal description of a programm ...
) in TMG on a piece of paper and "decided to give his piece of paper to his piece of paper", doing the computation himself, thus compiling a TMG compiler into
assembly, which he typed up and assembled on Ken Thompson's PDP-7.
Development of the
GNU system relies largely on
GCC (the GNU Compiler Collection) and GNU
Emacs
Emacs (), originally named EMACS (an acronym for "Editor Macros"), is a family of text editors that are characterized by their extensibility. The manual for the most widely used variant, GNU Emacs, describes it as "the extensible, customizable, s ...
(a popular editor), making possible the self contained, maintained and sustained development of
free software
Free software, libre software, libreware sometimes known as freedom-respecting software is computer software distributed open-source license, under terms that allow users to run the software for any purpose as well as to study, change, distribut ...
for the
GNU Project
The GNU Project ( ) is a free software, mass collaboration project announced by Richard Stallman on September 27, 1983. Its goal is to give computer users freedom and control in their use of their computers and Computer hardware, computing dev ...
.
Many
programming language
A programming language is a system of notation for writing computer programs.
Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
s have self-hosted implementations: compilers that are both in and for the same language.
An approach is
bootstrapping
In general, bootstrapping usually refers to a self-starting process that is supposed to continue or grow without external input. Many analytical techniques are often called bootstrap methods in reference to their self-starting or self-supporting ...
, where a core version of the language is initially implemented using another high-level language, assembler, or even
machine language
In computer programming, machine code is computer code consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). For conventional binary computers, machine code is the binaryOn nonb ...
; the resulting compiler is then used to start building successive expanded versions of itself.
List of languages having self-hosting compilers
The following programming languages have self-hosting compilers:
See also
*
Cross-compiler
A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running. For example, a compiler that runs on a PC but generates code that runs on Android devices is a cross compil ...
*
Dogfooding
*
Futamura projection
*
Self-interpreter
*
Self-reference
Self-reference is a concept that involves referring to oneself or one's own attributes, characteristics, or actions. It can occur in language, logic, mathematics, philosophy, and other fields.
In natural or formal languages, self-reference ...
*
Indirect self-modification
References
{{Reflist
Computer programming
Self-hosting software