The GNU Compiler Collection (GCC) is a collection of compilers from the GNU Project that support various programming languages, hardware architectures, and operating systems. The Free Software Foundation (FSF) distributes GCC as free software under the GNU General Public License (GNU GPL). GCC is a key component of the GNU toolchain, which is used for most projects related to GNU and the Linux kernel. With roughly 15 million lines of code in 2019, GCC is one of the largest free programs in existence. It has played an important role in the growth of free software, as both a tool and an example.
When it was first released in 1987 by Richard Stallman, GCC 1.0 was named the GNU C Compiler since it only handled the C programming language. It was extended to compile C++ in December of that year. Front ends were later developed for Objective-C, Objective-C++, Fortran, Ada, Go, D, Modula-2, Rust and COBOL, among others. The OpenMP and OpenACC specifications are also supported in the C and C++ compilers.
As well as being the official compiler of the GNU operating system, GCC has been adopted as the standard compiler by many other modern Unix-like computer operating systems, including most Linux distributions. Most BSD family operating systems also switched to GCC shortly after its release, although since then, FreeBSD and Apple macOS have moved to the Clang compiler, largely due to licensing reasons. GCC can also compile code for Windows, Android, iOS, Solaris, HP-UX, AIX and DOS.
GCC has been ported to more platforms and instruction set architectures than any other compiler, and is widely deployed as a tool in the development of both free and proprietary software. GCC is also available for many embedded systems, including ARM-based and Power ISA-based chips.
History
In late 1983, in an effort to bootstrap the GNU operating system, Richard Stallman asked Andrew S. Tanenbaum, the author of the Amsterdam Compiler Kit (also known as the ''Free University Compiler Kit''), for permission to use that software for GNU. When Tanenbaum advised him that the compiler was not free, and that only the university was free, Stallman decided to work on a different compiler. His initial plan was to rewrite an existing compiler from Lawrence Livermore National Laboratory from Pastel to C with some help from Len Tower and others.
Stallman wrote a new C front end for the Livermore compiler, but then realized that it required megabytes of stack space, an impossibility on a 68000 Unix system with only 64 KB, and concluded he would have to write a new compiler from scratch.
None of the Pastel compiler code ended up in GCC, though Stallman did use the C front end he had written.
GCC was first released on March 22, 1987, available by FTP from MIT. Stallman was listed as the author but cited others for their contributions, including Tower for "parts of the parser, RTL generator, RTL definitions, and of the Vax machine description", Jack Davidson and Christopher W. Fraser for the idea of using RTL as an intermediate language, and Paul Rubin for writing most of the preprocessor.
Described as the "first free software hit" by Peter H. Salus, the GNU compiler arrived just at the time when Sun Microsystems was unbundling its development tools from its operating system, selling them separately at a higher combined price than the previous bundle, which led many of Sun's users to buy or download GCC instead of the vendor's tools.
While Stallman considered GNU Emacs his main project, by 1990 GCC supported thirteen computer architectures, was outperforming several vendor compilers, and was used commercially by several companies.
EGCS fork
As GCC was licensed under the GPL, programmers wanting to work in other directions (particularly those writing interfaces for languages other than C) were free to develop their own fork of the compiler, provided they met the GPL's terms, including its requirements to distribute source code. Multiple forks proved inefficient and unwieldy, however, and the difficulty in getting work accepted by the official GCC project was greatly frustrating for many, as the project favored stability over new features.
The FSF kept such close control on what was added to the official version of GCC 2.x (developed since 1992) that GCC was used as one example of the "cathedral" development model in Eric S. Raymond's essay ''The Cathedral and the Bazaar''.
In 1997, a group of developers formed the ''Experimental/Enhanced GNU Compiler System (EGCS)'' to merge several experimental forks into a single project.
The basis of the merger was a development snapshot of GCC (taken around the 2.7.2 release and later followed up to the 2.8.1 release). The merged projects included g77 (Fortran), PGCC (P5 Pentium-optimized GCC), many C++ improvements, and many new architectures and operating system variants.
While both projects followed each other's changes closely, EGCS development proved considerably more vigorous, so much so that the FSF officially halted development on their GCC 2.x compiler, blessed EGCS as the official version of GCC, and appointed the EGCS project as the GCC maintainers in April 1999. With the release of GCC 2.95 in July 1999 the two projects were once again united.
GCC has since been maintained by a varied group of programmers from around the world under the direction of a steering committee.
GCC 3 (2002) removed a front end for CHILL due to a lack of maintenance.
Before version 4.0 the Fortran front end was g77, which only supported FORTRAN 77. It was later dropped in favor of the new GNU Fortran front end, which supports Fortran 95 and large parts of Fortran 2003 and Fortran 2008 as well.
As of version 4.8, GCC is implemented in C++.
Support for Cilk Plus existed from GCC 5 to GCC 7.
GCC has been ported to a wide variety of instruction set architectures, and is widely deployed as a tool in the development of both free and proprietary software. GCC is also available for many embedded systems, including Symbian (called ''gcce''), ARM-based, and Power ISA-based chips.
The compiler can target a wide variety of platforms, including video game consoles such as the PlayStation 2, Cell SPE of PlayStation 3, and Dreamcast. It has been ported to more kinds of processors and operating systems than any other compiler.
Supported languages
GCC includes front ends for the C (gcc), C++ (g++), Objective-C, Objective-C++, Fortran (gfortran), Ada (GNAT), Go (gccgo), D (gdc, since 9.1), Modula-2 (gm2, since 13.1), Rust (gccrs, since 15.1) and COBOL (gcobol, since 15.1) programming languages, with the OpenMP and OpenACC parallel language extensions being supported since GCC 5.1. Versions prior to GCC 7 also supported Java (gcj), allowing compilation of Java to native machine code.
Third-party front ends exist for many languages, such as ALGOL 68, Pascal (gpc), Mercury, Modula-3, VHDL (GHDL) and PL/I.
A few experimental branches exist to support additional languages, such as the GCC UPC compiler for Unified Parallel C.
Regarding language version support for C++ and C, since GCC 11.1 the default targets are ''gnu++17'', a superset of C++17, and ''gnu11'', a superset of C11, with strict standard support also available. GCC also provides experimental support for C++20 and C++23.
Design

GCC's external interface follows Unix conventions. Users invoke a language-specific driver program (gcc for C, g++ for C++, etc.), which interprets command arguments, calls the actual compiler, runs the assembler on the output, and then optionally runs the linker to produce a complete executable binary.
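The stages the driver runs can be made visible by stopping it after each one. The following is a minimal sketch, not part of GCC: the helper name `stage_commands` and the file name `hello.c` are hypothetical, while `-S` (stop after compilation, leaving assembly), `-c` (stop after assembling, leaving an object file) and `-o` (name the output file) are standard driver options. The sketch only builds the command lines; it does not run them.

```python
from pathlib import Path


def stage_commands(source, driver="gcc"):
    """Return one driver command line per stage: compile, assemble, link.

    Hypothetical helper for illustration; only the -S, -c and -o options
    are real driver options.
    """
    stem = Path(source).stem          # "hello.c" -> "hello"
    assembly = f"{stem}.s"            # output of the compiler proper
    obj = f"{stem}.o"                 # output of the assembler
    return [
        [driver, "-S", source, "-o", assembly],   # compile only
        [driver, "-c", assembly, "-o", obj],      # assemble only
        [driver, obj, "-o", stem],                # link into an executable
    ]


if __name__ == "__main__":
    for command in stage_commands("hello.c"):
        print(" ".join(command))
```

On a system where the driver is installed, each list could be passed to `subprocess.run` to execute the stages one at a time; invoking the driver once without `-S` or `-c` performs all of them in sequence.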
Each of the language compilers is a separate program that reads source code and outputs machine code. All have a common internal structure. A per-language front end parses the source code in that language and produces an abstract syntax tree ("tree" for short).
These are, if necessary, converted to the middle end's input representation, called ''GENERIC'' form; the middle end then gradually transforms the program towards its final form.
Compiler optimizations and static code analysis techniques (such as FORTIFY_SOURCE, a compiler directive that attempts to discover some buffer overflows) are applied to the code. These work on multiple representations, mostly the architecture-independent GIMPLE representation and the architecture-dependent RTL representation. Finally, machine code is produced using architecture-specific pattern matching originally based on an algorithm of Jack Davidson and Chris Fraser.
GCC was written primarily in C except for parts of the Ada front end. The distribution includes the standard libraries for Ada and C++, whose code is mostly written in those languages. On some platforms, the distribution also includes a low-level runtime library, libgcc, written in a combination of machine-independent C and processor-specific machine code, designed primarily to handle arithmetic operations that the target processor cannot perform directly.
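As an illustration of the kind of helper such a runtime library supplies, a target without a hardware divide instruction needs integer division done in software. The sketch below is a textbook shift-and-subtract algorithm written for readability, not libgcc's actual code; the name `soft_udiv32` is hypothetical (libgcc's own documented routine for unsigned 32-bit division is `__udivsi3`).

```python
MASK32 = 0xFFFFFFFF  # keep operands within 32 unsigned bits


def soft_udiv32(dividend, divisor):
    """Unsigned 32-bit division using only shifts, compares and subtracts.

    Returns (quotient, remainder). Illustrative only.
    """
    dividend &= MASK32
    divisor &= MASK32
    if divisor == 0:
        raise ZeroDivisionError("division by zero")

    quotient = 0
    remainder = 0
    # Process the dividend from its most significant bit downwards.
    for bit in range(31, -1, -1):
        # Bring the next dividend bit into the running remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        # If the divisor fits, subtract it and set this quotient bit.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder


if __name__ == "__main__":
    print(soft_udiv32(100, 7))
```

The loop performs one compare and at most one subtract per bit, which is why a compiler can fall back to a call like this wherever the instruction set offers no divide.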
GCC uses many additional tools in its build, many of which are installed by default by many Unix and Linux distributions (but which, normally, aren't present in Windows installations), including Perl, Flex, Bison, and other common tools. In addition, it currently requires three additional libraries to be present in order to build: GMP, MPC, and MPFR.
In May 2010, the GCC steering committee decided to allow use of a C++ compiler to compile GCC. The compiler was intended to be written mostly in C plus a subset of features from C++. In particular, this was decided so that GCC's developers could use the destructors and generics features of C++.
In August 2012, the GCC steering committee announced that GCC now uses C++ as its implementation language. This means that to build GCC from sources, a C++ compiler is required that understands the ISO/IEC C++03 standard.
On May 18, 2020, GCC moved from the ISO/IEC C++03 standard to the ISO/IEC C++11 standard as the requirement for compiling (bootstrapping) the compiler itself; by default, however, it compiles later versions of C++.
Front ends

Each front end uses a parser to produce the abstract syntax tree of a given source file. Due to the syntax tree abstraction, source files of any of the different supported languages can be processed by the same back end. GCC started out using LALR parsers generated with Bison, but gradually switched to hand-written recursive-descent parsers for C++ in 2004, and for C and Objective-C in 2006. As of 2021 all front ends use hand-written recursive-descent parsers.
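To show what a hand-written recursive-descent parser looks like, here is a minimal sketch for a toy arithmetic grammar, unrelated to GCC's actual parsers. Each grammar rule becomes one method, and operator precedence follows from which method calls which. The grammar, the `Parser` class, and the tuple-based tree are assumptions made for illustration.

```python
import re

# One token: a number, an identifier, or a single operator/parenthesis.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def parse_expr(self):
        # expr := term (('+' | '-') term)*
        node = self.parse_term()
        while self.peek() in ("+", "-"):
            op = self.take()
            node = (op, node, self.parse_term())
        return node

    def parse_term(self):
        # term := factor (('*' | '/') factor)*
        node = self.parse_factor()
        while self.peek() in ("*", "/"):
            op = self.take()
            node = (op, node, self.parse_factor())
        return node

    def parse_factor(self):
        # factor := NUMBER | IDENTIFIER | '(' expr ')'
        token = self.take()
        if token == "(":
            node = self.parse_expr()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if token in ("+", "-", "*", "/", ")"):
            raise SyntaxError(f"unexpected token {token!r}")
        return token  # leaf: number or identifier


def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.parse_expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


if __name__ == "__main__":
    print(parse("a + b * 2"))
```

Because the parser is ordinary code rather than generated tables, error messages and language-specific special cases can be handled at exactly the point in the grammar where they occur.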
Until GCC 4.0, the tree representation of the program was not fully independent of the processor being targeted. The meaning of a tree was somewhat different for different language front ends, and front ends could provide their own tree codes. GCC 4.0 simplified this by introducing GENERIC and GIMPLE, two new forms of language-independent trees. GENERIC is the more complex of the two and is based on the GCC 3.x Java front end's intermediate representation. GIMPLE is a simplified GENERIC in which various constructs are ''lowered'' to multiple GIMPLE instructions. The C, C++, and Java front ends produce GENERIC directly. Other front ends instead use different intermediate representations after parsing and convert these to GENERIC.
In either case, the so-called "gimplifier" then converts this more complex form into the simpler SSA-based GIMPLE form that is the common language for a large number of language- and architecture-independent global (function-scope) optimizations.
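The "SSA" in SSA-based GIMPLE stands for static single assignment: every name is assigned exactly once. The following hand-written C sketch illustrates the idea only; it is not compiler output. The function names and the `x_1` ... `x_4` numbering are assumptions chosen for illustration. Where two control-flow paths meet, SSA uses a "phi" node to select the value from whichever path was taken, which is imitated here with a conditional expression.

```c
/* Original form: the variable x is assigned up to three times. */
int clamp_sum(int a, int b, int limit) {
    int x = a + b;
    if (x > limit)
        x = limit;
    x = x * 2;
    return x;
}

/* Hand-written SSA-style form: each name is assigned exactly once. */
int clamp_sum_ssa(int a, int b, int limit) {
    const int x_1 = a + b;
    const int x_2 = limit;
    /* Stand-in for a phi node: picks x_2 or x_1 depending on the branch. */
    const int x_3 = (x_1 > limit) ? x_2 : x_1;
    const int x_4 = x_3 * 2;
    return x_4;
}
```

Because each name has a single definition, an optimizer can tell where a value comes from without tracking reassignments, which simplifies many of the function-scope optimizations mentioned above.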
GENERIC and GIMPLE
''GENERIC'' is an intermediate representation language used as a "middle end" while compiling source code into executable binaries. A subset, called ''GIMPLE'', is targeted by all the front ends of GCC.
The middle stage of GCC does all of the code analysis and optimization, working independently of both the compiled language and the target architecture, starting from the GENERIC representation and expanding it to register transfer language (RTL). The GENERIC representation contains only the subset of the imperative programming constructs optimized by the middle end.
In transforming the source code to GIMPLE, complex expressions are split into a three-address code using temporary variables. This representation was inspired by the SIMPLE representation proposed in the McCAT compiler by Laurie J. Hendren for simplifying the analysis and optimization of imperative programs.
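The splitting of complex expressions into three-address code can be sketched by hand in C. In the second function below, each statement applies at most one operator and stores the result in a temporary. The temporaries `t1` to `t4` and both function names are illustrative assumptions; the names GCC actually generates will differ, and its own lowering can be inspected with the `-fdump-tree-gimple` option.

```c
/* Source-level form: one nested expression. */
int poly(int a, int b, int c) {
    return a * b + c * (a - b);
}

/* Hand-lowered form: one operator per statement, results in temporaries. */
int poly_three_address(int a, int b, int c) {
    int t1 = a * b;
    int t2 = a - b;
    int t3 = c * t2;
    int t4 = t1 + t3;
    return t4;
}
```

Both functions compute the same value. The lowered form is easier to analyze because every intermediate result has a name that later passes can refer to.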
Optimization
Optimization can occur during any phase of compilation; however, the bulk of optimizations are performed after the syntax and semantic analysis of the front end and before the code generation of the back end; thus a common, though somewhat self-contradictory, name for this part of the compiler is the "middle end."
The exact set of GCC optimizations varies from release to release as it develops, but includes the standard algorithms, such as loop optimization, jump threading, common subexpression elimination, instruction scheduling, and so forth. The RTL optimizations are of less importance with the addition of global SSA-based optimizations on GIMPLE trees, as RTL optimizations have a much more limited scope, and have less high-level information.
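Two of the standard transformations named above, common subexpression elimination and a simple loop optimization (hoisting a loop-invariant computation), can be shown by applying them by hand. This is a sketch of what such passes aim for, not a dump of what GCC produces. The function names are hypothetical.

```c
#include <stddef.h>

/* As written: scale * offset is recomputed on every iteration, and
   data[i] * data[i] appears twice in the same expression. */
long sum_naive(const int *data, size_t n, int scale, int offset) {
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i] * data[i] + scale * offset
               + (data[i] * data[i]) / 2;
    return total;
}

/* The same computation after applying both transformations by hand. */
long sum_optimized(const int *data, size_t n, int scale, int offset) {
    const int invariant = scale * offset;      /* hoisted out of the loop */
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        const int square = data[i] * data[i];  /* common subexpression */
        total += square + invariant + square / 2;
    }
    return total;
}
```

An optimizing compiler is generally expected to turn the first form into something like the second on its own, so the source can stay in whichever form is easier to read.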
Optimizations performed at the GIMPLE level include dead-code elimination, partial-redundancy elimination, global value numbering, sparse conditional constant propagation, and scalar replacement of aggregates. Array-dependence-based optimizations such as automatic vectorization and automatic parallelization are also performed. Profile-guided optimization is also possible.
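As an illustration of a loop that is a typical candidate for automatic vectorization, consider the following sketch. The function name `saxpy` is a conventional name for this operation, not something defined by GCC. The `restrict` qualifiers promise the compiler that `x` and `y` do not overlap, which removes one obstacle to proving the iterations independent. Whether a given GCC version actually vectorizes the loop depends on the target and options. Vectorization is enabled at `-O3` (or explicitly with `-ftree-vectorize`), and `-fopt-info-vec` asks GCC to report the loops it vectorized.

```c
#include <stddef.h>

/* y[i] = a * x[i] + y[i] for every element; iterations are independent. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y) {
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

Profile-guided optimization is driven from the command line rather than from the source: a program is first built with `-fprofile-generate`, run on representative input, and then rebuilt with `-fprofile-use`.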
C++ Standard Library (libstdc++)
The GCC project includes an implementation of the C++ Standard Library called libstdc++, licensed under the GPLv3 with an exception that permits linking non-GPL applications when sources are built with GCC.
Other features
Some features of GCC include:
; Link-time optimization
: Link-time optimization optimizes across object file boundaries to directly improve the linked binary. Link-time optimization relies on an intermediate file containing the serialization of some ''GIMPLE'' representation included in the object file. The file is generated alongside the object file during source compilation. Each source compilation generates a separate object file and link-time helper file. When the object files are linked, the compiler is executed again and uses the helper files to optimize code across the separately compiled object files.
; Plugins
: Plugins extend the GCC compiler directly. They allow a stock compiler to be tailored to specific needs by loading external code. For example, plugins can add, replace, or even remove middle-end passes operating on ''GIMPLE'' representations. Several GCC plugins have been published, notably:
:* The Python plugin, which links against libpython and allows arbitrary Python scripts to be invoked from inside the compiler. The aim is to allow GCC plugins to be written in Python.
:* The MELT plugin, which provides a high-level Lisp-like language to extend GCC.
: Plugin support was a contentious issue in 2007.
; C++ transactional memory
: The C++ language has an active proposal for transactional memory. It can be enabled in GCC 6 and newer when compiling with -fgnu-tm.
; Unicode identifiers
: Although the C++ language requires support for non-ASCII Unicode characters in identifiers, the feature has only been supported since GCC 10. As with the existing handling of string literals, the source file is assumed to be encoded in UTF-8. The feature is optional in C, but has been available there too since this change.
; C extensions
: GNU C extends the C programming language with several non-standard features, including nested functions.
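The nested-function extension mentioned in the last item can be sketched as follows. This is GNU C, not ISO C: it is expected to compile with GCC in its default GNU dialect, and other compilers may reject it. The function names are hypothetical. The nested function `is_above` reads `threshold` directly from the enclosing function's scope, so it does not need to be passed as an extra argument.

```c
/* GNU C extension: a function defined inside another function. */
int count_above(const int *values, int n, int threshold) {
    /* Nested function: may use 'threshold' from the enclosing scope. */
    int is_above(int v) { return v > threshold; }

    int count = 0;
    for (int i = 0; i < n; i++)
        if (is_above(values[i]))
            count++;
    return count;
}
```

The sketch only calls the nested function directly. Taking its address and passing it elsewhere also works in GNU C, but only while the enclosing function is still active.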
Architectures

The primary supported (and best tested) processor families are 64- and 32-bit ARM, 64- and 32-bit x86 (x86-64 and x86), and 64-bit PowerPC and SPARC.
GCC target processor families as of version 11.1 include:
* AArch64
* Alpha
* ARM
* AVR
* Blackfin
* eBPF
* Epiphany (GCC 4.8)
* H8/300
* HC12
* IA-32 (32-bit x86)
* IA-64 (Intel Itanium)
* MIPS
* Motorola 68000 series
* MSP430
* Nvidia GPU
* Nvidia PTX
* PA-RISC
* PDP-11
* PowerPC
* R8C / M16C / M32C
* RISC-V
* SPARC
* SuperH
* System/390 / z/Architecture
* VAX
* x86-64
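Source code that must adapt to some of the processor families listed above can test the macros GCC predefines for the target. The sketch below covers only a handful of families, and the function name `target_family` is a hypothetical helper. The macros predefined for a given target can be listed with `gcc -dM -E - < /dev/null`.

```c
/* Report the target family using macros predefined by the compiler. */
const char *target_family(void) {
#if defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "IA-32";
#elif defined(__aarch64__)
    return "AArch64";
#elif defined(__arm__)
    return "ARM";
#elif defined(__riscv)
    return "RISC-V";
#elif defined(__powerpc64__) || defined(__powerpc__)
    return "PowerPC";
#elif defined(__sparc__)
    return "SPARC";
#else
    return "other";   /* family not covered by this sketch */
#endif
}
```

Because the selection happens in the preprocessor, only one `return` statement is compiled for any given target.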
Lesser-known target processors supported in the standard release have included:
* 68HC11
* A29K
* C6x
* CR16
* D30V
* DSP16xx
* ETRAX CRIS
* FR-30
* FR-V
* IBM ROMP
* Intel i960
* IP2000
* M32R
* MCORE
* MIL-STD-1750A
* MMIX
* MN10200
* MN10300
* Motorola 88000
* NS32K
* RL78
* Stormy16
* V850
* Xtensa
Additional processors have been supported by GCC versions maintained separately from the FSF version:
* Cortus APS3
* ARC
* AVR32
* C166 and C167
* D10V
* EISC
* eSi-RISC
* Hexagon
* LatticeMico32
* LatticeMico8
* MeP
* MicroBlaze
* Motorola 6809
* MSP430
* NEC SX architecture
* Nios II and Nios
* OpenRISC
* PDP-10
* PIC24/dsPIC
* PIC32
* Propeller
* Saturn (HP48XGCC)
* System/370
* TIGCC (m68k variant)
* TMS9900
* TriCore
* Z8000
* ZPU
The GCJ Java compiler can target either a native machine language architecture or the Java virtual machine's Java bytecode. When retargeting GCC to a new platform, bootstrapping is often used. Motorola 68000, Zilog Z80, and other processors are also targeted in the GCC versions developed for various Texas Instruments, Hewlett Packard, Sharp, and Casio programmable graphing calculators.
License
GCC is licensed under the GNU General Public License version 3. The ''GCC runtime exception'' permits compilation of proprietary programs (in addition to free software) with GCC headers and runtime libraries. This does not impact the license terms of GCC source code.
See also
* List of compilers
* MinGW
* LLVM / Clang
Further reading
* ''Using the GNU Compiler Collection (GCC)'', Free Software Foundation, 2008.
* ''GNU Compiler Collection (GCC) Internals'', Free Software Foundation, 2008.
* ''An Introduction to GCC'', Network Theory Ltd., 2004 (Revised August 2005).
* Arthur Griffith, ''GCC: The Complete Reference''. McGraw Hill / Osborne, 2002.
External links
* Collection of GCC 4.0.2 architecture and internals documents at I.I.T. Bombay
* From Source to Binary: The Inner Workings of GCC, by Diego Novillo, ''Red Hat Magazine'', December 2004
* A 2003 paper on GENERIC and GIMPLE
* An essay covering GCC development for the 1990s, with 30 monthly reports in the "Inside Cygnus Engineering" section near the end
* An essay by Rick Moen recording seven well-known forks, including the GCC/EGCS one