LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate representation (IR) that serves as a portable, high-level assembly language that can be optimized with a variety of transformations over multiple passes.
LLVM is written in C++ and is designed for compile-time, link-time, run-time, and "idle-time" optimization. Originally implemented for C and C++, the language-agnostic design of LLVM has since spawned a wide variety of front ends: languages with compilers that use LLVM (or which do not directly use LLVM but can generate compiled programs as LLVM IR) include ActionScript, Ada, C#, Common Lisp, PicoLisp, Crystal, CUDA, D, Delphi, Dylan, Forth, Fortran, Free Basic, Free Pascal, Graphical G, Halide, Haskell, Java bytecode, Julia, Kotlin, Lua, Objective-C, OpenCL, PostgreSQL's SQL and PL/pgSQL, Ruby, Rust, Scala, Swift, XC, Xojo and Zig.
History
The LLVM project started in 2000 at the University of Illinois at Urbana–Champaign, under the direction of Vikram Adve and Chris Lattner. LLVM was originally developed as a research infrastructure to investigate dynamic compilation techniques for static and dynamic programming languages. LLVM was released under the University of Illinois/NCSA Open Source License, a permissive free software license. In 2005, Apple Inc. hired Lattner and formed a team to work on the LLVM system for various uses within Apple's development systems. LLVM has been an integral part of Apple's Xcode development tools for macOS and iOS since Xcode 4.
In 2006, Lattner started working on a new project called Clang. The combination of Clang front-end and LLVM back-end is called Clang/LLVM or simply Clang.
The name ''LLVM'' was originally an initialism for ''Low Level Virtual Machine''. However, the LLVM project evolved into an umbrella project that has little relationship to what most current developers think of as a virtual machine. This made the initialism "confusing" and "inappropriate", and since 2011 LLVM is "officially no longer an acronym", but a brand that applies to the LLVM umbrella project. The project encompasses the LLVM intermediate representation (IR), the LLVM debugger, the LLVM implementation of the C++ Standard Library (with full support of C++11 and C++14), etc. LLVM is administered by the LLVM Foundation. Compiler engineer Tanya Lattner became its president in 2014.
''"For designing and implementing LLVM"'', the Association for Computing Machinery presented Vikram Adve, Chris Lattner, and Evan Cheng with the 2012 ACM Software System Award.
The project was originally available under the UIUC license. After v9.0.0 was released in 2019, LLVM was relicensed under the Apache License 2.0 with LLVM Exceptions. About 400 contributions had not been relicensed.
Features
LLVM can provide the middle layers of a complete compiler system, taking intermediate representation (IR) code from a compiler and emitting an optimized IR. This new IR can then be converted and linked into machine-dependent assembly language code for a target platform. LLVM can accept the IR from the GNU Compiler Collection (GCC) toolchain, allowing it to be used with a wide array of existing compiler front-ends written for that project.
LLVM can also generate relocatable machine code at compile-time or link-time or even binary machine code at run-time.
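As a rough illustration of the "IR in, optimized IR out" role described above, the following sketch runs two classic transformations in sequence over a toy three-address IR. The tuple format and the function names (`constant_fold`, `dead_code_elimination`, `run_passes`) are assumptions made for this example; they are not LLVM's IR or its pass API.

```python
# Toy IR: each instruction is (dest, op, arg1, arg2), and each dest is
# assigned exactly once. This is an illustrative model, not LLVM's IR.

def constant_fold(instrs):
    """Evaluate 'add'/'mul' at compile time when both operands are known."""
    ops = {"add": lambda a, b: a + b, "mul": lambda a, b: a * b}
    known = {}  # dest -> integer value discovered so far
    out = []
    for dest, op, a, b in instrs:
        a = known.get(a, a)  # substitute operands already known to be constant
        b = known.get(b, b)
        if op in ops and isinstance(a, int) and isinstance(b, int):
            known[dest] = ops[op](a, b)
            out.append((dest, "const", known[dest], None))
        else:
            if op == "const":
                known[dest] = a
            out.append((dest, op, a, b))
    return out


def dead_code_elimination(instrs, live_out):
    """Drop instructions whose result is never used, scanning backwards."""
    live = set(live_out)
    kept = []
    for dest, op, a, b in reversed(instrs):
        if dest in live:
            kept.append((dest, op, a, b))
            live.update(x for x in (a, b) if isinstance(x, str))
    return list(reversed(kept))


def run_passes(instrs, live_out):
    """Run the passes one after another, each consuming the previous output."""
    instrs = constant_fold(instrs)
    return dead_code_elimination(instrs, live_out)


program = [
    ("a", "const", 2, None),
    ("b", "const", 3, None),
    ("c", "add", "a", "b"),
    ("d", "mul", "c", "x"),   # x is an input that is unknown at compile time
    ("e", "add", "a", "a"),   # result is never used
]
optimized = run_passes(program, live_out=["d"])
```

The second pass benefits from the first: once `c` has been folded into the literal 5, nothing refers to `a`, `b` or `c` any more, so dead-code elimination can remove them along with the unused `e`.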
LLVM supports a language-independent instruction set and type system. Each instruction is in static single assignment form (SSA), meaning that each variable (called a typed register) is assigned once and then frozen. This helps simplify the analysis of dependencies among variables. LLVM allows code to be compiled statically, as it is under the traditional GCC system, or left for late-compiling from the IR to machine code via just-in-time compilation (JIT), similar to Java. The type system consists of basic types such as integer or floating-point numbers and five derived types: pointers, arrays, vectors, structures, and functions. A type construct in a concrete language can be represented by combining these basic types in LLVM. For example, a class in C++ can be represented by a mix of structures, functions and arrays of function pointers.
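To make the "assigned once and then frozen" property concrete, here is a minimal sketch that renames straight-line assignments into SSA form. The statement format and the `name.N` versioning scheme are assumptions of this example, and it deliberately ignores branches, which in real SSA construction require phi nodes.

```python
def to_ssa(statements):
    """Rename straight-line assignments so every name is assigned once.

    statements: list of (target, op, operands); operands are variable names
    (str) or integer literals. There is no control flow, so no phi nodes.
    """
    version = {}  # source variable -> latest version number
    out = []
    for target, op, operands in statements:
        # Each use refers to the most recent version of the variable.
        renamed = tuple(
            f"{x}.{version[x]}" if isinstance(x, str) and x in version else x
            for x in operands
        )
        # Each definition creates a fresh, never-reassigned name.
        version[target] = version.get(target, -1) + 1
        out.append((f"{target}.{version[target]}", op, renamed))
    return out


code = [
    ("x", "const", (1,)),
    ("x", "add", ("x", 2)),    # x = x + 2 reassigns x in the source program
    ("y", "mul", ("x", "x")),
]
ssa = to_ssa(code)
```

After renaming, the two assignments to `x` become `x.0` and `x.1`, so every use names exactly one definition. That one-to-one link is what simplifies dependency analysis.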
The LLVM JIT compiler can optimize unneeded static branches out of a program at runtime, and thus is useful for partial evaluation in cases where a program has many options, most of which can easily be determined unneeded in a specific environment. This feature is used in the OpenGL pipeline of Mac OS X Leopard (v10.5) to provide support for missing hardware features.
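The idea of removing static branches once the environment is known can be sketched as a small specializer over a toy program tree. The node shapes and the flag names (`has_hw_shader`, `debug`) are hypothetical and chosen only for illustration; this is not how the LLVM JIT or Apple's OpenGL stack is implemented.

```python
def specialize(node, known):
    """Fold away 'if' nodes whose condition is known in this environment.

    node is a string (a primitive step), a list (a sequence of nodes), or a
    tuple ("if", flag, then_node, else_node). known maps flag names to
    booleans; flags missing from it remain as run-time tests.
    Returns a flat list of the steps that remain.
    """
    if isinstance(node, str):
        return [node]
    if isinstance(node, list):
        out = []
        for child in node:
            out.extend(specialize(child, known))
        return out
    _, flag, then_node, else_node = node
    if flag in known:
        # The branch is static here: keep only the side that will run.
        return specialize(then_node if known[flag] else else_node, known)
    return [("if", flag, specialize(then_node, known), specialize(else_node, known))]


pipeline = [
    "load",
    ("if", "has_hw_shader", "gpu_shade", "cpu_emulate_shade"),
    ("if", "debug", ["log", "check"], []),
    "store",
]
low_end = specialize(pipeline, {"has_hw_shader": False, "debug": False})
```

With both flags known, the specialized pipeline contains no conditionals at all. With only `debug` known, the hardware test survives as a run-time branch.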
Graphics code within the OpenGL stack can be left in intermediate representation and then compiled when run on the target machine. On systems with high-end graphics processing units (GPUs), the resulting code remains quite thin, passing the instructions on to the GPU with minimal changes. On systems with low-end GPUs, LLVM will compile optional procedures that run on the local central processing unit (CPU) that emulate instructions that the GPU cannot run internally. LLVM improved performance on low-end machines using Intel GMA chipsets. A similar system was developed under the Gallium3D LLVMpipe, and incorporated into the GNOME shell to allow it to run without a proper 3D hardware driver loaded.
In 2011, programs compiled with GCC ran about 10% faster on average than those compiled with LLVM. Results from 2013 indicated that LLVM had caught up with GCC in this area and was producing binaries of approximately equal performance.
Components
LLVM has become an umbrella project containing multiple components.
Front ends
LLVM was originally written to be a replacement for the existing code generator in the GCC stack, and many of the GCC front ends have been modified to work with it, resulting in the now-defunct LLVM-GCC suite. The modifications generally involve a GIMPLE-to-LLVM IR step so that LLVM optimizers and codegen can be used instead of GCC's GIMPLE system. Apple was a significant user of LLVM-GCC through Xcode 4.x (2013). This use of the GCC front end was considered mostly a temporary measure, and with the advent of Clang and the advantages of LLVM and Clang's modern and modular codebase (as well as compilation speed), it is now mostly obsolete.
LLVM currently supports compiling of Ada, C, C++, D, Delphi, Fortran, Haskell, Julia, Objective-C, Rust, and Swift using various front ends.
Widespread interest in LLVM has led to several efforts to develop new front ends for a variety of languages. The one that has received the most attention is Clang, a new compiler supporting C, C++, and Objective-C. Primarily supported by Apple, Clang is aimed at replacing the C/Objective-C compiler in the GCC system with a system that is more easily integrated with integrated development environments (IDEs) and has wider support for multithreading. Support for OpenMP directives has been included in Clang since release 3.8.
The Utrecht Haskell compiler can generate code for LLVM. Though the generator is in the early stages of development, in many cases it has been more efficient than the C code generator. There is a Glasgow Haskell Compiler (GHC) backend using LLVM that achieves a 30% speed-up of the compiled code relative to native code compiling via GHC or C code generation followed by compiling, missing only one of the many optimizing techniques implemented by the GHC.
Many other components are in various stages of development, including, but not limited to, the Rust compiler, a Java bytecode front end, a Common Intermediate Language (CIL) front end, the MacRuby implementation of Ruby 1.9, various front ends for Standard ML, and a new graph coloring register allocator.
Intermediate representation
The core of LLVM is the intermediate representation (IR), a low-level programming language similar to assembly. IR is a strongly typed reduced instruction set computing (RISC) instruction set which abstracts away most details of the target. For example, the calling convention is abstracted through ''call'' and ''ret'' instructions with explicit arguments. Also, instead of a fixed set of registers, IR uses an infinite set of temporaries of the form %0, %1, etc. LLVM supports three equivalent forms of IR: a human-readable assembly format, an in-memory format suitable for frontends, and a dense bitcode format for serializing. A simple "Hello, world!" program in the IR format:
@.str = internal constant [14 x i8] c"hello, world\0A\00"

declare i32 @printf(ptr, ...)

define i32 @main(i32 %argc, ptr %argv) nounwind {
entry:
    %tmp1 = call i32 (ptr, ...) @printf(ptr @.str) nounwind
    ret i32 0
}
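The use of numbered temporaries in place of a fixed register set can be illustrated with a small generator that lowers an arithmetic expression to text resembling the human-readable IR format. The helper `emit_function` and its expression encoding are assumptions of this sketch; it handles only `i32` values and the `add`, `sub` and `mul` instructions.

```python
import itertools


def emit_function(name, params, expr):
    """Emit IR-like text for an i32 function that computes `expr`.

    expr is an int literal, a parameter name, or a tuple (op, lhs, rhs)
    with op one of "add", "sub", "mul".
    """
    counter = itertools.count()  # supplies fresh temporaries %0, %1, ...
    body = []

    def lower(e):
        if isinstance(e, int):
            return str(e)
        if isinstance(e, str):
            return f"%{e}"
        op, lhs, rhs = e
        left, right = lower(lhs), lower(rhs)
        tmp = f"%{next(counter)}"  # each result gets a new temporary
        body.append(f"  {tmp} = {op} i32 {left}, {right}")
        return tmp

    result = lower(expr)
    args = ", ".join(f"i32 %{p}" for p in params)
    lines = [f"define i32 @{name}({args}) {{", "entry:"]
    lines += body
    lines += [f"  ret i32 {result}", "}"]
    return "\n".join(lines)


# (x * y) + 1
ir_text = emit_function("f", ["x", "y"], ("add", ("mul", "x", "y"), 1))
```

Because every intermediate result receives a fresh temporary, the generator never has to decide which physical register holds a value. That decision is left to the back end's register allocator.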
The many different conventions used and features provided by different targets mean that LLVM cannot truly produce a target-independent IR and retarget it without breaking some established rules. Examples of target dependence beyond what is explicitly mentioned in the documentation can be found in a 2011 proposal for "wordcode", a fully target-independent variant of LLVM IR intended for online distribution. A more practical example is PNaCl.
The LLVM project also introduces another type of intermediate representation called MLIR which helps build reusable and extensible compiler infrastructure by employing a plugin architecture named Dialect. It enables the use of higher-level information on the program structure in the process of optimization including polyhedral compilation.
Back ends
As of version 13, LLVM supports many instruction sets, including IA-32, x86-64, ARM, Qualcomm Hexagon, MIPS, Nvidia Parallel Thread Execution (PTX; called ''NVPTX'' in LLVM documentation), PowerPC, AMD TeraScale, most recent AMD GPUs (called ''AMDGPU'' in LLVM documentation), SPARC, z/Architecture (called ''SystemZ'' in LLVM documentation), and XCore.
Some features are not available on some platforms. Most features are present for IA-32, x86-64, z/Architecture, ARM, and PowerPC. RISC-V is supported as of version 7.
In the past, LLVM also supported other backends, fully or partially, including a C backend, Cell SPU, mblaze (MicroBlaze), AMD R600, DEC/Compaq Alpha (Alpha AXP) and Nios2, but that hardware is mostly obsolete, and LLVM developers decided the support and maintenance costs were no longer justified.
LLVM also supports WebAssembly as a target, enabling compiled programs to execute in WebAssembly-enabled environments such as Google Chrome / Chromium, Firefox, Microsoft Edge, Apple Safari or WAVM. LLVM-compliant WebAssembly compilers typically support mostly unmodified source code written in C, C++, D, Rust, Nim, Kotlin and several other languages.
The LLVM machine code (MC) subproject is LLVM's framework for translating machine instructions between textual forms and machine code. Formerly, LLVM relied on the system assembler, or one provided by a toolchain, to translate assembly into machine code. LLVM MC's integrated assembler supports most LLVM targets, including IA-32, x86-64, ARM, and ARM64. For some targets, including the various MIPS instruction sets, integrated assembly support is usable but still in the beta stage.
Linker
The lld subproject is an attempt to develop a built-in, platform-independent linker for LLVM. lld aims to remove dependence on a third-party linker. lld supports ELF, PE/COFF, Mach-O, and WebAssembly in descending order of completeness. lld is faster than both flavors of GNU ld.
Unlike the GNU linkers, lld has built-in support for link-time optimization (LTO). This allows for faster code generation as it bypasses the use of a linker plugin, but on the other hand prohibits interoperability with other flavors of LTO.
C++ Standard Library
The LLVM project includes an implementation of the C++ Standard Library called libc++, dual-licensed under the MIT License and the UIUC license. Since v9.0.0, it has been licensed under the Apache License 2.0 with LLVM Exceptions.
Polly
Polly implements a suite of cache-locality optimizations as well as auto-parallelism and vectorization using a polyhedral model.
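Loop tiling is one well-known cache-locality transformation of the kind a polyhedral optimizer can derive automatically. The sketch below applies it by hand to a matrix transpose; the function names and the `tile` parameter are assumptions of this example, and it is not output produced by Polly. In Python the speed benefit is not observable, so the example demonstrates only that the reordered loop nest computes the same result.

```python
def transpose_naive(a):
    """Row-by-row traversal: writes to `out` stride across whole columns."""
    n, m = len(a), len(a[0])
    out = [[0] * n for _ in range(m)]
    for i in range(n):
        for j in range(m):
            out[j][i] = a[i][j]
    return out


def transpose_tiled(a, tile=2):
    """Same computation, visited in tile x tile blocks.

    Each block touches a small region of both matrices before moving on,
    which is what improves cache reuse in a compiled language.
    """
    n, m = len(a), len(a[0])
    out = [[0] * n for _ in range(m)]
    for ii in range(0, n, tile):
        for jj in range(0, m, tile):
            for i in range(ii, min(ii + tile, n)):
                for j in range(jj, min(jj + tile, m)):
                    out[j][i] = a[i][j]
    return out


matrix = [[1, 2, 3], [4, 5, 6]]
tiled = transpose_tiled(matrix)
```

The transformation is legal here because every iteration writes a distinct element and reads nothing that another iteration writes, so the iterations may run in any order.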
Debugger
C Standard Library
llvm-libc is an incomplete, upcoming, ABI-independent C standard library designed by and for the LLVM project.
Derivatives
Due to its permissive license, many vendors release their own tuned forks of LLVM. This is officially recognized by LLVM's documentation, which suggests against using version numbers in feature checks for this reason. Some of the vendors include:
* AMD's AMD Optimizing C/C++ Compiler is based on LLVM, Clang, and Flang.
* Apple maintains an open-source fork for Xcode.
* ARM maintains a fork of LLVM 9 as the "Arm Compiler".
* Flang, a Fortran project in development.
* IBM is adopting LLVM in its C/C++ and Fortran compilers.
* Intel has adopted LLVM for their next generation Intel C++ Compiler.
* The Los Alamos National Laboratory has a parallel-computing fork of LLVM 8 called "Kitsune".
* Nvidia uses LLVM in the implementation of its NVVM CUDA Compiler. The NVVM compiler is distinct from the "NVPTX" backend mentioned in the Back ends section, although both generate PTX code for Nvidia GPUs.
* Since 2013, Sony has been using LLVM's primary front-end Clang compiler in the software development kit (SDK) of its PlayStation 4 console.
See also
* Common Intermediate Language
* HHVM
* C--
* Amsterdam Compiler Kit (ACK)
* LLDB (debugger)
* GNU lightning
* GNU Compiler Collection (GCC)
* Pure
* OpenCL
* ROCm
* Emscripten
* TenDRA Distribution Format
* Architecture Neutral Distribution Format (ANDF)
* Comparison of application virtualization software
* SPIR-V
* University of Illinois at Urbana Champaign discoveries & innovations
Literature
* Chris Lattner, ''The Architecture of Open Source Applications'', Chapter 11: "LLVM", released 2012 under CC BY 3.0 (Open Access).
* ''LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation'', a published paper by Chris Lattner and Vikram Adve.