Enhanced GNU Compiler System

The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software under the GNU General Public License (GNU GPL). GCC is a key component of the GNU toolchain and the standard compiler for most projects related to GNU and the Linux kernel. With roughly 15 million lines of code in 2019, GCC is one of the biggest free programs in existence. It has played an important role in the growth of free software, as both a tool and an example.

When it was first released in 1987 by Richard Stallman, GCC 1.0 was named the GNU C Compiler since it only handled the C programming language. It was extended to compile C++ in December of that year. Front ends were later developed for Objective-C, Objective-C++, Fortran, Ada, D and Go, among others. The OpenMP and OpenACC specifications are also supported in the C and C++ compilers.

GCC has been ported to more platforms and instruction set architectures than any other compiler, and is widely deployed as a tool in the development of both free and proprietary software. GCC is also available for many embedded systems, including ARM-based and Power ISA-based chips.

As well as being the official compiler of the GNU operating system, GCC has been adopted as the standard compiler by many other modern Unix-like computer operating systems, including most Linux distributions. Most BSD family operating systems also switched to GCC shortly after its release, although since then, FreeBSD, OpenBSD and Apple macOS have moved to the Clang compiler, largely due to licensing reasons. GCC can also compile code for Windows, Android, iOS, Solaris, HP-UX, AIX and DOS.


History

In late 1983, in an effort to bootstrap the GNU operating system, Richard Stallman asked Andrew S. Tanenbaum, the author of the Amsterdam Compiler Kit (also known as the ''Free University Compiler Kit''), for permission to use that software for GNU. When Tanenbaum advised him that the compiler was not free, and that only the university was free, Stallman decided to work on a different compiler. His initial plan was to rewrite an existing compiler from Lawrence Livermore National Laboratory from Pastel (a Pascal dialect) to C with some help from Len Tower and others. Stallman wrote a new C front end for the Livermore compiler, but then realized that it required megabytes of stack space, an impossibility on a 68000 Unix system with only 64 KB, and concluded he would have to write a new compiler from scratch. None of the Pastel compiler code ended up in GCC, though Stallman did use the C front end he had written.

GCC was first released on March 22, 1987, available by FTP from MIT. Stallman was listed as the author but cited others for their contributions, including Tower for "parts of the parser, RTL generator, RTL definitions, and of the Vax machine description", Jack Davidson and Christopher W. Fraser for the idea of using RTL as an intermediate language, and Paul Rubin for writing most of the preprocessor. Described as the "first free software hit" by Peter H. Salus, the GNU compiler arrived just at the time when Sun Microsystems was unbundling its development tools from its operating system, selling them separately at a higher combined price than the previous bundle, which led many of Sun's users to buy or download GCC instead of the vendor's tools. While Stallman considered GNU Emacs his main project, by 1990 GCC supported thirteen computer architectures, was outperforming several vendor compilers, and was used commercially by several companies.


EGCS fork

As GCC was licensed under the GPL, programmers wanting to work in other directions, particularly those writing interfaces for languages other than C, were free to develop their own fork of the compiler, provided they met the GPL's terms, including its requirement to distribute source code. Multiple forks proved inefficient and unwieldy, however, and the difficulty in getting work accepted by the official GCC project was greatly frustrating for many, as the project favored stability over new features. The FSF kept such close control on what was added to the official version of GCC 2.x (developed since 1992) that GCC was used as one example of the "cathedral" development model in Eric S. Raymond's essay ''The Cathedral and the Bazaar''.

In 1997, a group of developers formed the ''Experimental/Enhanced GNU Compiler System (EGCS)'' to merge several experimental forks into a single project. The basis of the merger was a development snapshot of GCC (taken around the 2.7.2 release and later followed up to the 2.8.1 release). Mergers included g77 (Fortran), PGCC (P5 Pentium-optimized GCC), many C++ improvements, and many new architectures and operating system variants.

While both projects followed each other's changes closely, EGCS development proved considerably more vigorous, so much so that the FSF officially halted development on their GCC 2.x compiler, blessed EGCS as the official version of GCC, and appointed the EGCS project as the GCC maintainers in April 1999. With the release of GCC 2.95 in July 1999 the two projects were once again united. GCC has since been maintained by a varied group of programmers from around the world under the direction of a steering committee.

GCC 3 (2002) removed a front end for CHILL due to a lack of maintenance. Before version 4.0 the Fortran front end was g77, which only supported FORTRAN 77; it was later dropped in favor of the new GNU Fortran front end, which supports Fortran 95 and large parts of Fortran 2003 and Fortran 2008 as well. As of version 4.8, GCC is implemented in C++. Support for Cilk Plus existed from GCC 5 to GCC 7.

GCC is also available for many embedded systems, including Symbian (called ''gcce'') and ARM-based and Power ISA-based chips. The compiler can target a wide variety of platforms, including video game consoles such as the PlayStation 2, the Cell SPE of the PlayStation 3, and the Dreamcast.


Supported languages

The 11.1 release of GCC includes front ends for C (gcc), C++ (g++), Objective-C, Fortran (gfortran), Ada (GNAT), Go (gccgo) and D (gdc, since 9.1) programming languages, with the OpenMP and OpenACC parallel language extensions being supported since GCC 5.1. Versions prior to GCC 7 also supported Java (gcj), allowing compilation of Java to native machine code. Modula-2 support, previously offered by third parties, will be merged into GCC 13.

Regarding language version support for C++ and C, since GCC 11.1 the default target is ''gnu++17'', a superset of C++17, and ''gnu11'', a superset of C11, with strict standard support also available. GCC also provides experimental support for C++20 and the upcoming C++23.

Third-party front ends exist for many languages, such as Pascal (gpc), Modula-3, and VHDL (GHDL). A few experimental branches exist to support additional languages, such as the GCC UPC compiler for Unified Parallel C, or Rust.
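As a small illustration of the OpenMP support mentioned above, the hypothetical function below parallelizes a summation with a single pragma. It assumes GCC is invoked with -fopenmp, which enables OpenMP processing; without that flag GCC ignores the pragma (possibly emitting an unknown-pragma warning) and the loop simply runs serially with the same result.

```c
#include <assert.h>

/* Hypothetical example: sum of i*i for i = 1..n.
 * With -fopenmp, the loop iterations are divided among threads; the
 * reduction clause gives each thread a private partial sum and adds the
 * partial sums together at the end. Without -fopenmp the pragma is
 * ignored and the loop runs on one thread. */
long sum_of_squares(long n)
{
    assert(n >= 0);
    long sum = 0;
#pragma omp parallel for reduction(+ : sum)
    for (long i = 1; i <= n; i++)
        sum += i * i;
    return sum;
}
```

The reduction clause is what makes the parallel version correct: a plain shared `sum` updated from several threads would be a data race.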


Design

GCC's external interface follows Unix conventions. Users invoke a language-specific driver program (gcc for C, g++ for C++, etc.), which interprets command arguments, calls the actual compiler, runs the assembler on the output, and then optionally runs the linker to produce a complete executable binary.

Each of the language compilers is a separate program that reads source code and outputs assembly code. All have a common internal structure. A per-language front end parses the source code in that language and produces an abstract syntax tree ("tree" for short). These are, if necessary, converted to the middle end's input representation, called ''GENERIC'' form; the middle end then gradually transforms the program towards its final form.
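The driver's dispatch step can be sketched in C. This is a simplified illustration, not GCC's implementation: the real driver is controlled by built-in spec strings, and the helper `pick_compiler_proper` and its suffix table are hypothetical, listing only a few common defaults (cc1 for C, cc1plus for C++, cc1obj for Objective-C, f951 for Fortran, gnat1 for Ada, go1 for Go). Running `gcc -v` on a real compilation prints the actual commands the driver issues.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table mapping a source-file suffix to the "compiler proper"
 * that the gcc driver runs for it. Only a few common defaults are listed. */
struct suffix_rule {
    const char *suffix;
    const char *compiler_proper;
};

static const struct suffix_rule rules[] = {
    {".c",   "cc1"},      /* C */
    {".cc",  "cc1plus"},  /* C++ */
    {".cpp", "cc1plus"},  /* C++ */
    {".m",   "cc1obj"},   /* Objective-C */
    {".f90", "f951"},     /* Fortran */
    {".adb", "gnat1"},    /* Ada */
    {".go",  "go1"},      /* Go */
};

/* Return the compiler proper for FILENAME, or NULL when the driver would
 * instead hand the file straight to the linker (".o", ".a", or any
 * unrecognized suffix).
 *
 * For a recognized source file the driver then runs, in order:
 *   1. the compiler proper: source   -> assembly  (-S stops here)
 *   2. as:                  assembly -> object    (-c stops here)
 *   3. collect2 (wrapping ld): objects -> executable */
const char *pick_compiler_proper(const char *filename)
{
    assert(filename != NULL);
    const char *dot = strrchr(filename, '.');
    if (dot == NULL)
        return NULL;
    for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++) {
        if (strcmp(dot, rules[i].suffix) == 0)
            return rules[i].compiler_proper;
    }
    return NULL;
}
```

Dispatching on the file suffix rather than on the driver's own name mirrors how `gcc` can compile a `.cpp` file with cc1plus; `g++` mainly differs in which runtime libraries it links.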
Compiler optimizations and static code analysis techniques (such as FORTIFY_SOURCE, which attempts to detect some buffer overflows) are applied to the code. These work on multiple representations, mostly the architecture-independent GIMPLE representation and the architecture-dependent RTL representation. Finally, machine code is produced using architecture-specific pattern matching originally based on an algorithm of Jack Davidson and Chris Fraser.

GCC was written primarily in C except for parts of the Ada front end. The distribution includes the standard libraries for Ada and C++, whose code is mostly written in those languages. On some platforms, the distribution also includes a low-level runtime library, libgcc, written in a combination of machine-independent C and processor-specific machine code, designed primarily to handle arithmetic operations that the target processor cannot perform directly.

GCC uses many additional tools in its build, many of which are installed by default by many Unix and Linux distributions (but which normally are not present in Windows installations), including Perl, Flex, Bison, and other common tools. In addition, it currently requires three additional libraries to be present in order to build: GMP, MPC, and MPFR.

In May 2010, the GCC steering committee decided to allow use of a C++ compiler to compile GCC. The compiler was intended to be written mostly in C plus a subset of features from C++. In particular, this was decided so that GCC's developers could use the destructors and generics (templates) features of C++. In August 2012, the GCC steering committee announced that GCC now uses C++ as its implementation language. This means that to build GCC from sources, a C++ compiler is required that understands the ISO/IEC C++03 standard. On May 18, 2020, GCC raised this requirement from ISO/IEC C++03 to ISO/IEC C++11: a C++11 compiler is now needed to compile (bootstrap) GCC itself, although GCC by default compiles later versions of C++.
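The libgcc runtime library described above supplies helper routines for operations a target cannot perform in hardware; for example, on many 32-bit targets GCC compiles unsigned 64-bit division into a call to the libgcc routine `__udivdi3`. The sketch below is a hypothetical illustration of the kind of work such a routine does, using restoring binary long division. The name `soft_udiv64` is invented here, and this is not libgcc's actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical libgcc-style helper: unsigned 64-bit division built only from
 * shifts, comparisons and subtractions (restoring binary long division).
 * Returns the quotient and, if REM is non-NULL, stores the remainder. */
uint64_t soft_udiv64(uint64_t n, uint64_t d, uint64_t *rem)
{
    assert(d != 0); /* like the '/' operator, division by zero is undefined */
    uint64_t q = 0;
    uint64_t r = 0; /* partial remainder; always < d between iterations */
    for (int bit = 63; bit >= 0; bit--) {
        /* Bring down the next dividend bit. r never exceeds the dividend
         * bits consumed so far, so this shift cannot overflow. */
        r = (r << 1) | ((n >> bit) & 1);
        if (r >= d) {          /* divisor fits: subtract it, set quotient bit */
            r -= d;
            q |= UINT64_C(1) << bit;
        }
    }
    if (rem != NULL)
        *rem = r;
    return q;
}
```

One quotient bit is produced per iteration, which is why software division is much slower than a hardware divide instruction.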


Front ends

Each front end uses a parser to produce the abstract syntax tree of a given source file. Due to the syntax tree abstraction, source files of any of the different supported languages can be processed by the same back end. GCC started out using LALR parsers generated with Bison, but gradually switched to hand-written recursive-descent parsers for C++ in 2004, and for C and Objective-C in 2006. As of 2021 all front ends use hand-written recursive-descent parsers.

Until GCC 4.0 the tree representation of the program was not fully independent of the processor being targeted. The meaning of a tree was somewhat different for different language front ends, and front ends could provide their own tree codes. This was simplified with the introduction of GENERIC and GIMPLE, two new forms of language-independent trees that were introduced with the advent of GCC 4.0. GENERIC is more complex, based on the GCC 3.x Java front end's intermediate representation. GIMPLE is a simplified GENERIC, in which various constructs are ''lowered'' to multiple GIMPLE instructions. The C, C++, and Java front ends produce GENERIC directly in the front end. Other front ends instead have different intermediate representations after parsing and convert these to GENERIC. In either case, the so-called "gimplifier" then converts this more complex form into the simpler SSA-based GIMPLE form that is the common language for a large number of powerful language- and architecture-independent global (function scope) optimizations.
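In a hand-written recursive-descent parser, of the kind all GCC front ends now use, each grammar rule gets its own function that consumes input and returns a subtree. The following is a minimal sketch for a toy grammar of integer arithmetic, not GCC code: the grammar, the `node` type and the functions `parse`, `eval` and `free_tree` are all hypothetical, and syntax errors are simply caught with `assert` for brevity.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Toy abstract syntax tree: an integer literal or a binary operation. */
typedef struct node {
    char op;                /* '+', '-', '*', '/', or 0 for a literal */
    long value;             /* literal value when op == 0 */
    struct node *lhs, *rhs; /* operands when op != 0 */
} node;

static const char *cur; /* current position in the input string */

static node *make_node(char op, long value, node *lhs, node *rhs)
{
    node *n = malloc(sizeof *n);
    assert(n != NULL);
    *n = (node){op, value, lhs, rhs};
    return n;
}

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cur))
        cur++;
}

static node *parse_expr(void);
/* factor := NUMBER | '(' expr ')' */
static node *parse_factor(void)
{
    skip_spaces();
    if (*cur == '(') {
        cur++;
        node *inner = parse_expr();
        skip_spaces();
        assert(*cur == ')' && "expected ')'");
        if (*cur == ')')
            cur++;
        return inner;
    }
    assert(isdigit((unsigned char)*cur) && "expected a number");
    char *end;
    long v = strtol(cur, &end, 10);
    cur = end;
    return make_node(0, v, NULL, NULL);
}

/* term := factor (('*' | '/') factor)*, built left-associatively */
static node *parse_term(void)
{
    node *lhs = parse_factor();
    for (;;) {
        skip_spaces();
        if (*cur != '*' && *cur != '/')
            return lhs;
        char op = *cur++;
        lhs = make_node(op, 0, lhs, parse_factor());
    }
}

/* expr := term (('+' | '-') term)*, built left-associatively */
static node *parse_expr(void)
{
    node *lhs = parse_term();
    for (;;) {
        skip_spaces();
        if (*cur != '+' && *cur != '-')
            return lhs;
        char op = *cur++;
        lhs = make_node(op, 0, lhs, parse_term());
    }
}

/* Parse a whole expression; the entire input must be consumed. */
node *parse(const char *src)
{
    cur = src;
    node *tree = parse_expr();
    skip_spaces();
    assert(*cur == '\0' && "trailing input");
    return tree;
}

/* Evaluate by walking the tree (division by zero is not checked). */
long eval(const node *n)
{
    if (n->op == 0)
        return n->value;
    long a = eval(n->lhs), b = eval(n->rhs);
    return n->op == '+' ? a + b : n->op == '-' ? a - b
         : n->op == '*' ? a * b : a / b;
}

void free_tree(node *n)
{
    if (n == NULL)
        return;
    free_tree(n->lhs);
    free_tree(n->rhs);
    free(n);
}
```

Operator precedence falls out of the call structure: `parse_term` binds `*` and `/` more tightly than `parse_expr` binds `+` and `-`, so `1 + 2 * 3` yields a `+` node whose right operand is a `*` node.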


GENERIC and GIMPLE

''GENERIC'' is an intermediate representation language used as a "middle end" while compiling source code into executable binaries. A subset, called ''GIMPLE'', is targeted by all the front ends of GCC. The middle stage of GCC does all of the code analysis and optimization, working independently of both the compiled language and the target architecture, starting from the GENERIC representation and expanding it to register transfer language (RTL). The GENERIC representation contains only the subset of the imperative programming constructs optimized by the middle end. In transforming the source code to GIMPLE, complex expressions are split into a three-address code using temporary variables. This representation was inspired by the SIMPLE representation proposed in the McCAT compiler by Laurie J. Hendren for simplifying the analysis and optimization of imperative programs.
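To illustrate the three-address form described above, here is a C function and a hand-lowered equivalent in which each statement has at most one operator and two operands, and each temporary is assigned exactly once, as in GIMPLE. This is written by hand, not produced by GCC; the temporary names `t1` to `t5` are hypothetical, and GCC's own lowered form can be inspected with the `-fdump-tree-gimple` option.

```c
#include <assert.h>

/* Original source: one nested expression. */
int poly(int a, int b, int c)
{
    return a * b + (a - c) * (b + c);
}

/* The same computation split into three-address statements with
 * single-assignment temporaries (illustrative names). */
int poly_lowered(int a, int b, int c)
{
    int t1 = a * b;
    int t2 = a - c;
    int t3 = b + c;
    int t4 = t2 * t3;
    int t5 = t1 + t4;
    return t5;
}
```

Because every intermediate value now has a name, later passes can reason about each operation individually, for example noticing that two temporaries compute the same value.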


Optimization

Optimization can occur during any phase of compilation; however, the bulk of optimizations are performed after the syntax and semantic analysis of the front end and before the code generation of the back end; thus a common, even though somewhat contradictory, name for this part of the compiler is the "middle end." The exact set of GCC optimizations varies from release to release as it develops, but includes the standard algorithms, such as loop optimization, jump threading, common subexpression elimination, instruction scheduling, and so forth. The RTL optimizations are of less importance with the addition of global SSA-based optimizations on GIMPLE trees, as RTL optimizations have a much more limited scope and have less high-level information. Some of the optimizations performed at this level include dead-code elimination, partial-redundancy elimination, global value numbering, sparse conditional constant propagation, and scalar replacement of aggregates. Array-dependence-based optimizations such as automatic vectorization and automatic parallelization are also performed.
Profile-guided optimization Profile-guided optimization (PGO, sometimes pronounced as ''pogo''), also known as profile-directed feedback (PDF), and feedback-directed optimization (FDO) is a compiler optimization technique in computer programming that uses profiling to impr ...
is also possible.
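The effect of a few of these passes can be imitated by hand at the source level. In the sketch below, before_opt computes x + y twice, keeps a known constant in a local variable, and computes a value that never affects the result. after_opt shows roughly what common subexpression elimination, constant propagation and dead-code elimination would leave. This is an illustration under those assumptions, not actual GCC output. GCC applies these transformations to its internal representations rather than to C source, and both function names are hypothetical.

```c
/* Before optimization: (x + y) is computed twice, `scale` always holds
   the constant 4, and `unused` is computed but never affects the result. */
static int before_opt(int x, int y)
{
    int scale = 4;
    int unused = x * 100;    /* dead code: never reaches the return value */
    (void)unused;            /* suppresses the unused-variable warning only */
    return (x + y) * scale + (x + y);
}

/* Hand-written approximation of the result after common subexpression
   elimination, constant propagation, and dead-code elimination. */
static int after_opt(int x, int y)
{
    int t = x + y;           /* common subexpression computed once */
    return t * 4 + t;        /* constant 4 propagated into its use */
}
```

Both functions return the same value for every input that does not overflow. For example, both return 25 for x = 2 and y = 3.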


Back end

GCC's back end is partly specified by preprocessor macros and functions specific to a target architecture, for instance to define its endianness, word size, and calling conventions. The front part of the back end uses these to help decide RTL generation. As a result, although GCC's RTL is nominally processor-independent, the initial sequence of abstract instructions is already adapted to the target. At any moment, the actual RTL instructions forming the program representation have to comply with the machine description of the target architecture.

The machine description file contains RTL patterns, along with operand constraints and code snippets that output the final assembly. For example, the constraints may indicate that a particular RTL pattern applies only to certain hardware registers. They may also indicate that the pattern allows immediate operand offsets of only a limited size (e.g. 12-, 16-, or 24-bit offsets). During RTL generation, the constraints for the given target architecture are checked. To be emitted, a given piece of RTL must match one (or more) of the RTL patterns in the machine description file and satisfy the constraints for that pattern. Otherwise, it would be impossible to convert the final RTL into machine code.

Towards the end of compilation, valid RTL is reduced to a strict form in which each instruction refers to real machine registers and a pattern from the target's machine description file. Forming strict RTL is a complicated task. An important step is register allocation, where real hardware registers are chosen to replace the initially assigned pseudo-registers. This is followed by a "reloading" phase: any pseudo-registers that were not assigned a real hardware register are "spilled" to the stack, and RTL to perform this spilling is generated. Likewise, offsets that are too large to fit into an actual instruction must be broken up and replaced by RTL sequences that obey the offset constraints.

In the final phase, the machine code is built by calling a small snippet of code associated with each pattern. This snippet generates the real instructions from the target's instruction set, using the final registers, offsets, and addresses chosen during the reload phase. The assembly-generation snippet may be just a string, in which case the registers, offsets, and/or addresses are simply substituted into the string. The snippet may also be a short block of C code that performs some additional work but ultimately returns a string containing valid assembly code.
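To make the string-substitution case concrete, here is a toy C sketch of operand substitution into an output template. The helper expand_template and its single-digit "%N" convention are hypothetical. They are loosely modelled on the way GCC output templates refer to operands by number. Real GCC templates support many more directives, and this is not GCC's implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: expand a toy output template such as
   "add %0, %1, %2" by replacing each "%N" (N a single digit) with
   operands[N]. Writes a NUL-terminated result into out and returns its
   length, or -1 if an operand is missing or out is too small. */
static int expand_template(const char *tmpl, const char *const operands[],
                           int n_operands, char *out, size_t out_size)
{
    size_t len = 0;

    if (out_size == 0)
        return -1;

    for (const char *p = tmpl; *p != '\0'; p++) {
        const char *piece;
        size_t piece_len;

        if (p[0] == '%' && p[1] >= '0' && p[1] <= '9') {
            int idx = p[1] - '0';
            if (idx >= n_operands)
                return -1;          /* template names a missing operand */
            piece = operands[idx];
            piece_len = strlen(piece);
            p++;                    /* also consume the digit */
        } else {
            piece = p;              /* ordinary character: copy as-is */
            piece_len = 1;
        }

        if (len + piece_len + 1 > out_size)
            return -1;              /* no room for piece plus NUL */
        memcpy(out + len, piece, piece_len);
        len += piece_len;
    }

    out[len] = '\0';
    return (int)len;
}
```

For example, expanding "add %0, %1, %2" with the operands r3, r4 and r5 produces the line add r3, r4, r5. In GCC, the choice of those registers would come from register allocation and reload.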


C++ Standard Library (libstdc++)

The GCC project includes an implementation of the C++ Standard Library called libstdc++. It is licensed under the GPLv3 with an exception that permits linking closed-source applications when the sources are built with GCC. At the time of writing, the current version was 11.


Other features

Some features of GCC include:

* Link-time optimization: GCC can optimize across object file boundaries to directly improve the linked binary. Link-time optimization relies on a serialized form of the program's GIMPLE representation, which the compiler writes into each object file during source compilation. When the object files are linked, the compiler is executed again and uses this serialized GIMPLE to optimize code across the separately compiled object files.
* Plugins: Plugins extend the GCC compiler directly. They allow a stock compiler to be tailored to specific needs by external code loaded as plugins. For example, plugins can add, replace, or even remove middle-end passes operating on GIMPLE representations. Several GCC plugins have already been published, notably:
  * The Python plugin, which links against libpython and allows arbitrary Python scripts to be invoked from inside the compiler. The aim is to allow GCC plugins to be written in Python.
  * The MELT plugin, which provides a high-level Lisp-like language to extend GCC.
  Plugin support was a contentious issue in 2007.
* C++ transactional memory: The C++ language has an active proposal for transactional memory. It can be enabled in GCC 6 and newer when compiling with -fgnu-tm.
* Unicode identifiers: Although the C++ language requires support for non-ASCII Unicode characters in identifiers, GCC has supported the feature only since GCC 10. As with the existing handling of string literals, the source file is assumed to be encoded in UTF-8. The feature is optional in C, but since this change GCC supports it in C as well.
* C extensions: GNU C extends the C programming language with several non-standard features, including nested functions and typeof expressions.
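The two GNU C extensions named above can be shown in a minimal sketch. The macro SWAP and the function sum_scaled are hypothetical names chosen for illustration. The sketch assumes compilation with GCC, because Clang does not accept nested functions. It uses the __typeof__ spelling, which GCC accepts even under strict ISO modes. Plain typeof is available in GCC's default GNU modes.

```c
/* typeof: the temporary gets the same type as the macro argument, so the
   one macro swaps ints, doubles, pointers, and so on. */
#define SWAP(a, b)                      \
    do {                                \
        __typeof__(a) swap_tmp_ = (a);  \
        (a) = (b);                      \
        (b) = swap_tmp_;                \
    } while (0)

/* Nested function: scale() is defined inside sum_scaled() and reads the
   enclosing function's parameter k. This is a GCC extension that Clang
   does not accept. */
static int sum_scaled(const int *xs, int n, int k)
{
    int scale(int x) { return x * k; }

    int total = 0;
    for (int i = 0; i < n; i++)
        total += scale(xs[i]);
    return total;
}
```

Because scale is only called directly and its address is never taken, this use of a nested function does not require GCC to build a trampoline on the stack.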


Architectures

GCC target processor families as of version 11.1 include:

* AArch64
* Alpha
* ARM
* AVR
* Blackfin
* eBPF
* Epiphany (GCC 4.8)
* H8/300
* HC12
* IA-32 (x86)
* IA-64 (Intel Itanium)
* MIPS
* Motorola 68000
* MSP430
* Nvidia GPU
* Nvidia PTX
* PA-RISC
* PDP-11
* PowerPC
* R8C/M16C/M32C
* RISC-V
* SPARC
* SuperH
* System/390/zSeries
* VAX
* x86-64
Lesser-known target processors supported in the standard release have included:

* 68HC11
* A29K
* C6x
* CR16
* D30V
* DSP16xx
* ETRAX CRIS
* FR-30
* FR-V
* IBM ROMP
* Intel i960
* IP2000
* M32R
* MCORE
* MIL-STD-1750A
* MMIX
* MN10200
* MN10300
* Motorola 88000
* NS32K
* RL78
* Stormy16
* V850
* Xtensa

Additional processors have been supported by GCC versions maintained separately from the FSF version:

* Cortus APS3
* ARC
* AVR32
* C166 and C167
* D10V
* EISC
* eSi-RISC
* Hexagon
* LatticeMico32
* LatticeMico8
* MeP
* MicroBlaze
* Motorola 6809
* MSP430
* NEC SX architecture
* Nios II and Nios
* OpenRISC
* PDP-10
* PIC24/dsPIC
* PIC32
* Propeller
* Saturn (HP48XGCC)
* System/370
* TIGCC (m68k variant)
* TMS9900
* TriCore
* Z8000
* ZPU
The GCJ Java compiler can target either a native machine language architecture or the Java virtual machine's Java bytecode. When retargeting GCC to a new platform, bootstrapping is often used. Motorola 68000, Zilog Z80, and other processors are also targeted in the GCC versions developed for various Texas Instruments, Hewlett Packard, Sharp, and Casio programmable graphing calculators.


License

GCC is licensed under the GNU General Public License version 3. The GCC runtime exception permits compilation of proprietary programs (in addition to free software) with GCC. This does not affect the license terms of the GCC source code.


See also

* List of compilers
* MinGW
* LLVM/Clang



Further reading

* Using the GNU Compiler Collection (GCC), Free Software Foundation, 2008.
* GNU Compiler Collection (GCC) Internals, Free Software Foundation, 2008.
* An Introduction to GCC, Network Theory Ltd., 2004 (revised August 2005).
* Arthur Griffith, GCC: The Complete Reference. McGraw-Hill/Osborne, 2002.


External links






Other


* Collection of GCC 4.0.2 architecture and internals documents at I.I.T. Bombay
* From Source to Binary: The Inner Workings of GCC, by Diego Novillo, Red Hat Magazine, December 2004
* A 2003 paper on GENERIC and GIMPLE


* An essay covering GCC development in the 1990s, with 30 monthly reports in the "Inside Cygnus Engineering" section near the end
* An essay by Rick Moen recording seven well-known forks, including the GCC/EGCS one