HOME

TheInfoList



OR:

The GNU Compiler Collection (GCC) is an optimizing compiler produced by the
GNU Project The GNU Project () is a free software, mass collaboration project announced by Richard Stallman on September 27, 1983. Its goal is to give computer users freedom and control in their use of their computers and computing devices by collaborati ...
supporting various
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s, hardware architectures and
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also ...
s. The
Free Software Foundation The Free Software Foundation (FSF) is a 501(c)(3) non-profit organization founded by Richard Stallman on October 4, 1985, to support the free software movement, with the organization's preference for software being distributed under copyleft ( ...
(FSF) distributes GCC as
free software Free software or libre software is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any adapted versions. Free software is a matter of liberty, no ...
under the
GNU General Public License The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the four freedoms to run, study, share, and modify the software. The license was the first copyleft for general ...
(GNU GPL). GCC is a key component of the
GNU toolchain The GNU toolchain is a broad collection of programming tools produced by the GNU Project. These tools form a toolchain (a suite of tools used in a serial manner) used for developing software applications and operating systems. The GNU toolchain pl ...
and the standard compiler for most projects related to GNU and the
Linux kernel The Linux kernel is a free and open-source, monolithic, modular, multitasking, Unix-like operating system kernel. It was originally authored in 1991 by Linus Torvalds for his i386-based PC, and it was soon adopted as the kernel for the GNU ...
. With roughly 15 million lines of code in 2019, GCC is one of the biggest free programs in existence. It has played an important role in the growth of
free software Free software or libre software is computer software distributed under terms that allow users to run the software for any purpose as well as to study, change, and distribute it and any adapted versions. Free software is a matter of liberty, no ...
, as both a tool and an example. When it was first released in 1987 by
Richard Stallman Richard Matthew Stallman (; born March 16, 1953), also known by his initials, rms, is an American free software movement activist and programmer. He campaigns for software to be distributed in such a manner that its users have the freedom to ...
, GCC 1.0 was named the GNU C Compiler since it only handled the
C programming language ''The C Programming Language'' (sometimes termed ''K&R'', after its authors' initials) is a computer programming book written by Brian Kernighan and Dennis Ritchie, the latter of whom originally designed and implemented the language, as well a ...
. It was extended to compile C++ in December of that year. Front ends were later developed for
Objective-C Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTS ...
,
Objective-C++ Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTS ...
, Fortran, Ada, D and Go, among others. The
OpenMP OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating sy ...
and
OpenACC OpenACC (for ''open accelerators'') is a programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI. The standard is designed to simplify parallel programming of heterogeneous CPU/GPU systems. As in OpenMP, the programme ...
specifications are also supported in the C and C++ compilers. GCC has been ported to more platforms and
instruction set architecture In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
s than any other compiler, and is widely deployed as a tool in the development of both free and
proprietary software Proprietary software is software that is deemed within the free and open-source software to be non-free because its creator, publisher, or other rightsholder or rightsholder partner exercises a legal monopoly afforded by modern copyright and ...
. GCC is also available for many
embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' ...
s, including ARM-based and
Power ISA Power ISA is a reduced instruction set computer (RISC) instruction set architecture (ISA) currently developed by the OpenPOWER Foundation, led by IBM. It was originally developed by IBM and the now-defunct Power.org industry group. Power ISA ...
-based chips. As well as being the official compiler of the
GNU operating system GNU () is an extensive collection of free software (383 packages as of January 2022), which can be used as an operating system or can be used in parts with other operating systems. The use of the completed GNU tools led to the family of operat ...
, GCC has been adopted as the standard compiler by many other modern Unix-like computer
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also ...
s, including most
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, whic ...
distributions. Most BSD family operating systems also switched to GCC shortly after its release, although since then,
FreeBSD FreeBSD is a free and open-source Unix-like operating system descended from the Berkeley Software Distribution (BSD), which was based on Research Unix. The first version of FreeBSD was released in 1993. In 2005, FreeBSD was the most popular o ...
,
OpenBSD OpenBSD is a security-focused, free and open-source, Unix-like operating system based on the Berkeley Software Distribution (BSD). Theo de Raadt created OpenBSD in 1995 by forking NetBSD 1.0. According to the website, the OpenBSD project empha ...
and Apple macOS have moved to the Clang compiler, largely due to licensing reasons. GCC can also compile code for
Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for se ...
, Android, iOS, Solaris,
HP-UX HP-UX (from "Hewlett Packard Unix") is Hewlett Packard Enterprise's proprietary implementation of the Unix operating system, based on Unix System V (initially System III) and first released in 1984. Current versions support HPE Integrity ...
,
AIX Aix or AIX may refer to: Computing * AIX, a line of IBM computer operating systems *An Alternate Index, for a Virtual Storage Access Method Key Sequenced Data Set * Athens Internet Exchange, a European Internet exchange point Places Belgiu ...
and
DOS DOS is shorthand for the MS-DOS and IBM PC DOS family of operating systems. DOS may also refer to: Computing * Data over signalling (DoS), multiplexing data onto a signalling channel * Denial-of-service attack (DoS), an attack on a communicatio ...
.


History

In late 1983, in an effort to bootstrap the GNU operating system,
Richard Stallman Richard Matthew Stallman (; born March 16, 1953), also known by his initials, rms, is an American free software movement activist and programmer. He campaigns for software to be distributed in such a manner that its users have the freedom to ...
asked Andrew S. Tanenbaum, the author of the Amsterdam Compiler Kit (also known as the '' Free University'' ''Compiler Kit'') for permission to use that software for GNU. When Tanenbaum advised him that the compiler was not free, and that only the university was free, Stallman decided to work on a different compiler. His initial plan was to rewrite an existing compiler from
Lawrence Livermore National Laboratory Lawrence Livermore National Laboratory (LLNL) is a federal research facility in Livermore, California, United States. The lab was originally established as the University of California Radiation Laboratory, Livermore Branch in 1952 in respons ...
from
Pastel A pastel () is an art medium in a variety of forms including a stick, a square a pebble or a pan of color; though other forms are possible; they consist of powdered pigment and a binder. The pigments used in pastels are similar to those us ...
to C with some help from Len Tower and others. Stallman wrote a new C front end for the Livermore compiler, but then realized that it required megabytes of stack space, an impossibility on a
68000 The Motorola 68000 (sometimes shortened to Motorola 68k or m68k and usually pronounced "sixty-eight-thousand") is a 16/32-bit complex instruction set computer (CISC) microprocessor, introduced in 1979 by Motorola Semiconductor Products Secto ...
Unix system with only 64 KB, and concluded he would have to write a new compiler from scratch. None of the Pastel compiler code ended up in GCC, though Stallman did use the C front end he had written. GCC was first released March 22, 1987, available by
FTP The File Transfer Protocol (FTP) is a standard communication protocol used for the transfer of computer files from a server to a client on a computer network. FTP is built on a client–server model architecture using separate control and data ...
from MIT. Stallman was listed as the author but cited others for their contributions, including Tower for "parts of the parser, RTL generator, RTL definitions, and of the Vax machine description", Jack Davidson and Christopher W. Fraser for the idea of using RTL as an intermediate language, and Paul Rubin for writing most of the preprocessor. Described as the "first free software hit" by Peter H. Salus, the GNU compiler arrived just at the time when
Sun Microsystems Sun Microsystems, Inc. (Sun for short) was an American technology company that sold computers, computer components, software, and information technology services and created the Java programming language, the Solaris operating system, ZFS, t ...
was unbundling its development tools from its operating system, selling them separately at a higher combined price than the previous bundle, which led many of Sun's users to buy or download GCC instead of the vendor's tools. While Stallman considered
GNU Emacs GNU Emacs is a free software text editor. It was created by GNU Project founder Richard Stallman, based on the Emacs editor developed for Unix operating systems. GNU Emacs has been a central component of the GNU project and a flagship project o ...
as his main project, by 1990, GCC supported thirteen computer architectures, was outperforming several vendor compilers, and was used commercially by several companies.


EGCS fork

As GCC was licensed under the GPL, programmers wanting to work in other directions—particularly those writing interfaces for languages other than C—were free to develop their own
fork In cutlery or kitchenware, a fork (from la, furca 'pitchfork') is a utensil, now usually made of metal, whose long handle terminates in a head that branches into several narrow and often slightly curved tines with which one can spear foods ei ...
of the compiler, provided they meet the GPL's terms, including its requirements to distribute
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the ...
. Multiple forks proved inefficient and unwieldy, however, and the difficulty in getting work accepted by the official GCC project was greatly frustrating for many, as the project favored stability over new features. The FSF kept such close control on what was added to the official version of GCC 2.x (developed since 1992) that GCC was used as one example of the "cathedral" development model in
Eric S. Raymond Eric Steven Raymond (born December 4, 1957), often referred to as ESR, is an American software developer, open-source software advocate, and author of the 1997 essay and 1999 book ''The Cathedral and the Bazaar''. He wrote a guidebook for the ...
's essay ''
The Cathedral and the Bazaar ''The Cathedral and the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary'' (abbreviated ''CatB'') is an essay, and later a book, by Eric S. Raymond on software engineering methods, based on his observations of the Linux k ...
''. In 1997, a group of developers formed the ''Experimental/Enhanced GNU Compiler System (EGCS)'' to merge several experimental forks into a single project. The basis of the merger was a development snapshot of GCC (taken around the 2.7.2 and later followed up to 2.8.1 release). Mergers included g77 (Fortran), PGCC ( P5 Pentium-optimized GCC), many C++ improvements, and many new architectures and
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also ...
variants. While both projects followed each other's changes closely, EGCS development proved considerably more vigorous, so much so that the FSF officially halted development on their GCC 2.x compiler, blessed EGCS as the official version of GCC, and appointed the EGCS project as the GCC maintainers in April 1999. With the release of GCC 2.95 in July 1999 the two projects were once again united. GCC has since been maintained by a varied group of programmers from around the world under the direction of a steering committee. GCC 3 (2002) removed a front-end for
CHILL In computing, CHILL (an acronym for CCITT High Level Language) is a procedural programming language designed for use in telecommunication switches (the hardware used inside telephone exchanges). The language is still used for legacy systems ...
due to a lack of maintenance. Before version 4.0 the Fortran front end was g77, which only supported FORTRAN 77, but later was dropped in favor of the new GNU Fortran front end that supports Fortran 95 and large parts of Fortran 2003 and Fortran 2008 as well. As of version 4.8, GCC is implemented in C++. Support for
Cilk Plus Cilk, Cilk++, Cilk Plus and OpenCilk are general-purpose programming languages designed for multithreaded parallel computing. They are based on the C and C++ programming languages, which they extend with constructs to express parallel loop ...
existed from GCC 5 to GCC 7. GCC has been ported to a wide variety of
instruction set architecture In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
s, and is widely deployed as a tool in the development of both free and
proprietary software Proprietary software is software that is deemed within the free and open-source software to be non-free because its creator, publisher, or other rightsholder or rightsholder partner exercises a legal monopoly afforded by modern copyright and ...
. GCC is also available for many
embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' ...
s, including
Symbian Symbian is a discontinued mobile operating system (OS) and computing platform designed for smartphones. It was originally developed as a proprietary software OS for personal digital assistants in 1998 by the Symbian Ltd. consortium. Symbian ...
(called ''gcce''), ARM-based, and
Power ISA Power ISA is a reduced instruction set computer (RISC) instruction set architecture (ISA) currently developed by the OpenPOWER Foundation, led by IBM. It was originally developed by IBM and the now-defunct Power.org industry group. Power ISA ...
-based chips. The compiler can target a wide variety of platforms, including
video game console A video game console is an electronic device that outputs a video signal or image to display a video game that can be played with a game controller. These may be home consoles, which are generally placed in a permanent location connected to a ...
s such as the
PlayStation 2 The PlayStation 2 (PS2) is a home video game console developed and marketed by Sony Computer Entertainment. It was first released in Japan on 4 March 2000, in North America on 26 October 2000, in Europe on 24 November 2000, and in Australia o ...
, Cell SPE of PlayStation 3, and
Dreamcast The is a home video game console released by Sega on November 27, 1998, in Japan; September 9, 1999, in North America; and October 14, 1999, in Europe. It was the first sixth-generation video game console, preceding Sony's PlayStation 2, Nin ...
. It has been ported to more kinds of
processors A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and ...
and operating systems than any other compiler.


Supported languages

, the recent 11.1 release of GCC includes front ends for C (gcc), C++ (g++),
Objective-C Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXTS ...
, Fortran ( gfortran), Ada (
GNAT A gnat () is any of many species of tiny flying insects in the dipterid suborder Nematocera, especially those in the families Mycetophilidae, Anisopodidae and Sciaridae. They can be both biting and non-biting. Most often they fly in large ...
), Go (gccgo) and D (gdc, since 9.1) programming languages, with the
OpenMP OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating sy ...
and
OpenACC OpenACC (for ''open accelerators'') is a programming standard for parallel computing developed by Cray, CAPS, Nvidia and PGI. The standard is designed to simplify parallel programming of heterogeneous CPU/GPU systems. As in OpenMP, the programme ...
parallel language extensions being supported since GCC 5.1. Versions prior to GCC 7 also supported
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
( gcj), allowing compilation of Java to native machine code.
Modula-2 Modula-2 is a structured, procedural programming language developed between 1977 and 1985/8 by Niklaus Wirth at ETH Zurich. It was created as the language for the operating system and application software of the Lilith personal workstation. It ...
support, previously offered by third parties, will be merged into GCC 13. Regarding language version support for C++ and C, since GCC 11.1 the default target is ''gnu++17'', a superset of
C++17 C++17 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++17 replaced the prior version of the C++ standard, called C++14, and was later replaced by C++20. History Before the C++ Standards Committee fixed a 3-year ...
, and ''gnu11'', a superset of C11, with strict standard support also available. GCC also provides experimental support for
C++20 C++20 is a version of the ISO/IEC 14882 standard for the C++ programming language. C++20 replaced the prior version of the C++ standard, called C++17. The standard was technically finalized by WG21 at the meeting in Prague in February 2020, ...
and upcoming C++23. Third-party front ends exist for many languages, such as Pascal ( gpc), Modula-3, and
VHDL The VHSIC Hardware Description Language (VHDL) is a hardware description language (HDL) that can model the behavior and structure of digital systems at multiple levels of abstraction, ranging from the system level down to that of logic gat ...
(GHDL). A few experimental branches exist to support additional languages, such as the GCC UPC compiler for Unified Parallel C or
Rust Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH) ...
.


Design

GCC's external interface follows
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, an ...
conventions. Users invoke a language-specific driver program (gcc for C, g++ for C++, etc.), which interprets command arguments, calls the actual compiler, runs the
assembler Assembler may refer to: Arts and media * Nobukazu Takemura, avant-garde electronic musician, stage name Assembler * Assemblers, a fictional race in the ''Star Wars'' universe * Assemblers, an alternative name of the superhero group Champions of ...
on the output, and then optionally runs the
linker Linker or linkers may refer to: Computing * Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...
to produce a complete executable binary. Each of the language compilers is a separate program that reads source code and outputs machine code. All have a common internal structure. A per-language front end parses the source code in that language and produces an
abstract syntax tree In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring ...
("tree" for short). These are, if necessary, converted to the middle end's input representation, called ''GENERIC'' form; the middle end then gradually transforms the program towards its final form.
Compiler optimization In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power con ...
s and
static code analysis In computer science, static program analysis (or static analysis) is the analysis of computer programs performed without executing them, in contrast with dynamic program analysis, which is performed on programs during their execution. The term ...
techniques (such as FORTIFY_SOURCE, a compiler directive that attempts to discover some buffer overflows) are applied to the code. These work on multiple representations, mostly the architecture-independent GIMPLE representation and the architecture-dependent RTL representation. Finally, machine code is produced using architecture-specific
pattern matching In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be ...
originally based on an algorithm of Jack Davidson and Chris Fraser. GCC was written primarily in C except for parts of the Ada front end. The distribution includes the standard libraries for Ada and C++ whose code is mostly written in those languages. On some platforms, the distribution also includes a low-level runtime library, libgcc, written in a combination of machine-independent C and processor-specific machine code, designed primarily to handle arithmetic operations that the target processor cannot perform directly. GCC uses many additional tools in its build, many of which are installed by default by many Unix and Linux distributions (but which, normally, aren't present in Windows installations), including
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offi ...
, Flex,
Bison Bison are large bovines in the genus ''Bison'' (Greek: "wild ox" (bison)) within the tribe Bovini. Two extant taxon, extant and numerous extinction, extinct species are recognised. Of the two surviving species, the American bison, ''B. bison'' ...
, and other common tools. In addition, it currently requires three additional libraries to be present in order to build: GMP, MPC, and
MPFR The GNU Multiple Precision Floating-Point Reliable Library (GNU MPFR) is a GNU portable C library for arbitrary-precision binary floating-point computation with correct rounding, based on GNU Multi-Precision Library. Library MPFR's computatio ...
. In May 2010, the GCC steering committee decided to allow use of a C++ compiler to compile GCC. The compiler was intended to be written mostly in C plus a subset of features from C++. In particular, this was decided so that GCC's developers could use the destructors and generics features of C++. In August 2012, the GCC steering committee announced that GCC now uses C++ as its implementation language. This means that to build GCC from sources, a C++ compiler is required that understands ISO/IEC C++03 standard. On May 18, 2020, GCC moved away from ISO/IEC C++03 standard to ISO/IEC C++11 standard (i.e. needed to compile, bootstrap, the compiler itself; by default it however compiles later versions of C++).


Front ends

Each front end uses a parser to produce the
abstract syntax tree In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring ...
of a given
source file In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the w ...
. Due to the syntax tree abstraction, source files of any of the different supported languages can be processed by the same back end. GCC started out using LALR parsers generated with
Bison Bison are large bovines in the genus ''Bison'' (Greek: "wild ox" (bison)) within the tribe Bovini. Two extant taxon, extant and numerous extinction, extinct species are recognised. Of the two surviving species, the American bison, ''B. bison'' ...
, but gradually switched to hand-written recursive-descent parsers for C++ in 2004, and for C and Objective-C in 2006. As of 2021 all front ends use hand-written recursive-descent parsers. Until GCC 4.0 the tree representation of the program was not fully independent of the processor being targeted. The meaning of a tree was somewhat different for different language front ends, and front ends could provide their own tree codes. This was simplified with the introduction of GENERIC and GIMPLE, two new forms of language-independent trees that were introduced with the advent of GCC 4.0. GENERIC is more complex, based on the GCC 3.x Java front end's intermediate representation. GIMPLE is a simplified GENERIC, in which various constructs are '' lowered'' to multiple GIMPLE instructions. The C, C++, and
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mo ...
front ends produce GENERIC directly in the front end. Other front ends instead have different intermediate representations after parsing and convert these to GENERIC. In either case, the so-called "gimplifier" then converts this more complex form into the simpler SSA-based GIMPLE form that is the common language for a large number of powerful language- and architecture-independent global (function scope) optimizations.


GENERIC and GIMPLE

''GENERIC'' is an
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
language used as a "middle end" while compiling source code into executable binaries. A subset, called ''GIMPLE'', is targeted by all the front ends of GCC. The middle stage of GCC does all of the code analysis and
optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...
, working independently of both the compiled language and the target architecture, starting from the GENERIC representation and expanding it to
register transfer language In computer science, register transfer language (RTL) is a kind of intermediate representation (IR) that is very close to assembly language, such as that which is used in a compiler. It is used to describe data flow at the register-transfer le ...
(RTL). The GENERIC representation contains only the subset of the imperative programming constructs optimized by the middle end. In transforming the source code to GIMPLE, complex expressions are split into a
three-address code In computer science, three-address code (often abbreviated to TAC or 3AC) is an intermediate code used by optimizing compilers to aid in the implementation of code-improving transformations. Each TAC instruction has at most three operands and is ty ...
using
temporary variable In computer programming, a temporary variable is a variable with short lifetime, usually to hold data that will soon be discarded, or before it can be placed at a more permanent memory location. Because it is short-lived, it is usually declared a ...
s. This representation was inspired by the SIMPLE representation proposed in the McCAT compiler by Laurie J. Hendren for simplifying the analysis and
optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...
of imperative programs.


Optimization

Optimization can occur during any phase of compilation; however, the bulk of optimizations are performed after the syntax and semantic analysis of the front end and before the code generation of the back end; thus a common, even though somewhat contradictory, name for this part of the compiler is the "middle end." The exact set of GCC optimizations varies from release to release as it develops, but includes the standard algorithms, such as
loop optimization In compiler theory, loop optimization is the process of increasing execution speed and reducing the overheads associated with loops. It plays an important role in improving cache performance and making effective use of parallel processing capab ...
,
jump threading In computing, jump threading is a compiler optimization of one jump directly to a second jump. If the second condition is a subset or inverse of the first, it can be eliminated, or threaded through the first jump. This is easily done in a single ...
,
common subexpression elimination In compiler theory, common subexpression elimination (CSE) is a compiler optimization that searches for instances of identical expressions (i.e., they all evaluate to the same value), and analyzes whether it is worthwhile replacing them with a si ...
,
instruction scheduling In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to do the following without changi ...
, and so forth. The RTL optimizations are of less importance with the addition of global SSA-based optimizations on
GIMPLE The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software ...
trees, as RTL optimizations have a much more limited scope, and have less high-level information. Some of these optimizations performed at this level include dead-code elimination,
partial-redundancy elimination In compiler theory, partial redundancy elimination (PRE) is a compiler optimization that eliminates expressions that are redundant on some but not necessarily all paths through a program. PRE is a form of common subexpression elimination. An e ...
, global value numbering, sparse conditional constant propagation, and scalar replacement of aggregates. Array dependence based optimizations such as automatic vectorization and automatic parallelization are also performed.
Profile-guided optimization Profile-guided optimization (PGO, sometimes pronounced as ''pogo''), also known as profile-directed feedback (PDF), and feedback-directed optimization (FDO) is a compiler optimization technique in computer programming that uses profiling to imp ...
is also possible.


Back end

The GCC's back end is partly specified by preprocessor macros and functions specific to a target architecture, for instance to define its endianness, word size, and calling conventions. The front part of the back end uses these to help decide RTL generation, so although GCC's RTL is nominally processor-independent, the initial sequence of abstract instructions is already adapted to the target. At any moment, the actual RTL instructions forming the program representation have to comply with the machine description of the target architecture. The machine description file contains RTL patterns, along with operand constraints, and code snippets to output the final assembly. The constraints indicate that a particular RTL pattern might only apply (for example) to certain hardware registers, or (for example) allow immediate operand offsets of only a limited size (e.g. 12, 16, 24, ... bit offsets, etc.). During RTL generation, the constraints for the given target architecture are checked. In order to issue a given snippet of RTL, it must match one (or more) of the RTL patterns in the machine description file, and satisfy the constraints for that pattern; otherwise, it would be impossible to convert the final RTL into machine code. Towards the end of compilation, valid RTL is reduced to a ''strict'' form in which each instruction refers to real machine registers and a pattern from the target's machine description file. Forming strict RTL is a complicated task; an important step is register allocation, where real hardware registers are chosen to replace the initially assigned pseudo-registers. This is followed by a "reloading" phase; any pseudo-registers that were not assigned a real hardware register are 'spilled' to the stack, and RTL to perform this spilling is generated. Likewise, offsets that are too large to fit into an actual instruction must be broken up and replaced by RTL sequences that will obey the offset constraints. In the final phase, the machine code is built by calling a small snippet of code, associated with each pattern, to generate the real instructions from the target's instruction set, using the final registers, offsets, and addresses chosen during the reload phase. The assembly-generation snippet may be just a string, in which case a simple string substitution of the registers, offsets, and/or addresses into the string is performed. The assembly-generation snippet may also be a short block of C code, performing some additional work, but ultimately returning a string containing the valid assembly code.


C++ Standard Library (libstdc++)

The GCC project includes an implementation of the C++ Standard Library called libstdc++, licensed under the GPLv3 License with an exception to link closed source application when sources are built with GCC. The current version is 11.


Other features

Some features of GCC include: ; Link-time optimization : Link-time optimization optimizes across object file boundaries to directly improve the linked binary. Link-time optimization relies on an intermediate file containing the serialization of some ''Gimple'' representation included in the object file. The file is generated alongside the object file during source compilation. Each source compilation generates a separate object file and link-time helper file. When the object files are linked, the compiler is executed again and uses the helper files to optimize code across the separately compiled object files. ; Plugins : Plug-in (computing), Plugins extend the GCC compiler directly. Plugins allow a stock compiler to be tailored to specific needs by external code loaded as plugins. For example, plugins can add, replace, or even remove middle-end passes operating on ''Gimple'' representations. Several GCC plugins have already been published, notably: :* The Python plugin, which links against libpython, and allows one to invoke arbitrary Python scripts from inside the compiler. The aim is to allow GCC plugins to be written in Python. :* The MELT plugin provides a high-level Lisp (programming language), Lisp-like language to extend GCC. : The support of plugins was once a contentious issue in 2007. ; C++ Software transactional memory, transactional memory : The C++ language has an active proposal for transactional memory. It can be enabled in GCC 6 and newer when compiling with -fgnu-tm. ; Unicode identifiers : Although the C++ language requires support for non-ASCII Unicode characters in Identifier (computer languages), identifiers, the feature has only been supported since GCC 10. As with the existing handling of string literals, the source file is assumed to be encoded in UTF-8. The feature is optional in C, but has been made available too since this change. ; C extensions : GNU C extends the C programming language with several non-standard-features, including nested functions and typeof, typeof expressions.


Architectures

GCC target processor families as of version 11.1 include: * AArch64 * DEC Alpha, Alpha * ARM * Atmel AVR, AVR * Blackfin * eBPF * Adapteva#Products, Epiphany (GCC 4.8) * Hitachi H8, H8/300 * HC12 * IA-32 (x86) * IA-64 (Intel Itanium) * MIPS architecture, MIPS * Motorola 68000 * MSP430 * Nvidia GPU * Nvidia PTX * PA-RISC * PDP-11 * PowerPC * R8C / M16C / M32C * RISC-V * SPARC * SuperH * System/390 / zSeries * VAX * x86-64 Lesser-known target processors supported in the standard release have included: * 68HC11 * A29K * C6x * CR16 * D30V * DSP16xx * ETRAX CRIS * Fujitsu FR, FR-30 * FR-V * IBM ROMP * Intel i960 * IP2000 * M32R * MCORE * MIL-STD-1750A * MMIX * MN10200 * MN10300 * Motorola 88000 * NS320xx, NS32K * RL78 * Stormy16 * V850 * Xtensa Additional processors have been supported by GCC versions maintained separately from the FSF version: * Cortus APS3 * ARC (processor), ARC * AVR32 * C166 and C167 * D10V * EISC * eSi-RISC * Hexagon (processor), Hexagon * LatticeMico32 * LatticeMico8 * MeP * MicroBlaze * Motorola 6809 * MSP430 * NEC SX architecture * Nios II and Nios embedded processor, Nios * OpenRISC * PDP-10 * PIC30#PIC24 and dsPIC 16-bit microcontrollers, PIC24/dsPIC * PIC30#PIC32 32-bit microcontrollers, PIC32 * Parallax Propeller, Propeller * HP Saturn, Saturn (HP48XGCC) * System/370 * TIGCC (m68k variant) * TMS9900 * TriCore * Z8000 * ZPU (microprocessor), ZPU The GNU Compiler for Java, GCJ Java compiler can target either a native machine language architecture or the Java virtual machine's Java bytecode. When retargetable compiler, retargeting GCC to a new platform, bootstrapping (compilers), bootstrapping is often used. Motorola 68000, Zilog Z80, and other processors are also targeted in the GCC versions developed for various Texas Instruments, Hewlett Packard, Sharp, and Casio programmable graphing calculators.


License

GCC is licensed under the
GNU General Public License The GNU General Public License (GNU GPL or simply GPL) is a series of widely used free software licenses that guarantee end users the four freedoms to run, study, share, and modify the software. The license was the first copyleft for general ...
version 3. The ''GCC runtime exception'' permits compilation of proprietary programs (in addition to free software) with GCC. This does not impact the license terms of GCC source code.


See also

* List of compilers * MinGW * LLVM/ Clang


References


Further reading

*
Using the GNU Compiler Collection (GCC)
', Free Software Foundation, 2008. *
GNU Compiler Collection (GCC) Internals
', Free Software Foundation, 2008. *
An Introduction to GCC
', Network Theory Ltd., 2004 (Revised August 2005). . * Arthur Griffith, ''GCC: The Complete Reference''. McGrawHill / Osborne, 2002. .


External links


Official

*




Other


Collection of GCC 4.0.2 architecture and internals documents
at I.I.T. Bombay * *
From Source to Binary: The Inner Workings of GCC
by Diego Novillo, ''Red Hat#Red Hat Magazine, Red Hat Magazine'', December 2004
A 2003 paper on GENERIC and GIMPLE


an essay covering GCC development for the 1990s, with 30 monthly reports for in the "Inside Cygnus Engineering" section near the end





an essay by Rick Moen recording seven well-known forks, including the GCC/EGCS one {{Authority control 1987 software C (programming language) compilers C++ compilers Compilers Cross-platform free software Fortran compilers Free compilers and interpreters GNU Project software, Compiler Collection Java development tools Pascal (programming language) compilers Software that was rewritten in C++ Free software programmed in C++ Software using the GPL license Unix programming tools