The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free software under the GNU General Public License (GNU GPL). GCC is a key component of the GNU toolchain and the standard compiler for most projects related to GNU and the Linux kernel. With roughly 15 million lines of code in 2019, GCC is one of the largest free programs in existence. It has played an important role in the growth of free software, as both a tool and an example.

When it was first released in 1987 by Richard Stallman, GCC 1.0 was named the GNU C Compiler since it only handled the C programming language. It was extended to compile C++ in December of that year. Front ends were later developed for Objective-C, Objective-C++, Fortran, Ada, D and Go, among others. The OpenMP and OpenACC specifications are also supported in the C and C++ compilers.

GCC has been ported to more platforms and instruction set architectures than any other compiler, and is widely deployed as a tool in the development of both free and proprietary software. GCC is also available for many embedded systems, including ARM-based and Power ISA-based chips.

As well as being the official compiler of the GNU operating system, GCC has been adopted as the standard compiler by many other modern Unix-like operating systems, including most Linux distributions. Most BSD family operating systems also switched to GCC shortly after its release, although FreeBSD, OpenBSD and Apple macOS have since moved to the Clang compiler, largely for licensing reasons. GCC can also compile code for
Windows.

On some platforms, the distribution also includes a low-level runtime library, libgcc, written in a combination of machine-independent C and processor-specific machine code, designed primarily to handle arithmetic operations that the target processor cannot perform directly.

GCC uses many standard tools in its build, including Perl, Flex, Bison, and other common tools. In addition, it currently requires three additional libraries to be present in order to build: GMP, MPC, and MPFR.

In May 2010, the GCC steering committee decided to allow use of a C++ compiler to compile GCC. The compiler was intended to be written mostly in C plus a subset of features from C++. In particular, this was decided so that GCC's developers could use the destructors and generics features of C++. In August 2012, the GCC steering committee announced that GCC now uses C++ as its implementation language. This means that to build GCC from sources, a C++ compiler is required that understands the ISO/IEC C++03 standard. On May 18, 2020, GCC moved from the ISO/IEC C++03 standard to the ISO/IEC C++11 standard. This requirement applies only to compiling (bootstrapping) the compiler itself; by default, GCC compiles user code against later versions of C++.
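
To illustrate the libgcc runtime library described above, here is a minimal sketch. On many 32-bit targets that lack a native 64-bit divide instruction, GCC typically compiles a division like the one below into a call to a libgcc helper routine (such as `__udivdi3`) instead of inline instructions. The function name `div_u64` is hypothetical, and whether a helper call is actually emitted depends on the target and the options used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example function: plain 64-bit unsigned division.
 * On a target whose hardware cannot divide 64-bit values directly,
 * the compiler may lower the `/` below to a runtime-library call. */
uint64_t div_u64(uint64_t dividend, uint64_t divisor)
{
    assert(divisor != 0); /* division by zero is undefined in C */
    return dividend / divisor;
}
```

The source code is the same on every target; only the generated code differs, which is why the runtime library can stay invisible to the programmer.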


Front ends

Each front end uses a parser to produce the abstract syntax tree of a given source file. Due to the syntax tree abstraction, source files of any of the different supported languages can be processed by the same back end. GCC started out using LALR parsers generated with Bison, but gradually switched to hand-written recursive-descent parsers for C++ in 2004, and for C and Objective-C in 2006. As of 2021 all front ends use hand-written recursive-descent parsers.

Until GCC 4.0 the tree representation of the program was not fully independent of the processor being targeted. The meaning of a tree was somewhat different for different language front ends, and front ends could provide their own tree codes. This was simplified with the introduction of GENERIC and GIMPLE, two new forms of language-independent trees that arrived with GCC 4.0. GENERIC is more complex, based on the GCC 3.x Java front end's intermediate representation. GIMPLE is a simplified GENERIC, in which various constructs are ''lowered'' to multiple GIMPLE instructions. The C, C++, and Java front ends produce GENERIC directly in the front end. Other front ends instead have different intermediate representations after parsing and convert these to GENERIC.

In either case, the so-called "gimplifier" then converts this more complex form into the simpler SSA-based GIMPLE form that is the common language for a large number of powerful language- and architecture-independent global (function scope) optimizations.
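
The hand-written recursive-descent style mentioned above can be sketched for a toy arithmetic grammar. This is an illustration of the technique only, not GCC's parser. The grammar and all function names are hypothetical, and to stay short the sketch evaluates the expression directly instead of building a syntax tree.

```c
#include <ctype.h>

/* Grammar (hypothetical, for illustration only):
 *   expr   := term   (('+' | '-') term)*
 *   term   := factor (('*' | '/') factor)*
 *   factor := NUMBER | '(' expr ')'
 * Each nonterminal becomes one C function; the call stack mirrors
 * the nesting of the input. */

static const char *cursor; /* current position in the input */

static void skip_spaces(void)
{
    while (isspace((unsigned char)*cursor))
        cursor++;
}

static long parse_expr(void); /* forward declaration: factor recurses into expr */

static long parse_factor(void)
{
    skip_spaces();
    if (*cursor == '(') {
        cursor++;                  /* consume '(' */
        long inner = parse_expr();
        skip_spaces();
        if (*cursor == ')')
            cursor++;              /* consume ')' */
        return inner;
    }
    long value = 0;
    while (isdigit((unsigned char)*cursor))
        value = value * 10 + (*cursor++ - '0');
    return value;
}

static long parse_term(void)
{
    long value = parse_factor();
    for (;;) {
        skip_spaces();
        if (*cursor == '*') {
            cursor++;
            value *= parse_factor();
        } else if (*cursor == '/') {
            cursor++;
            long divisor = parse_factor();
            /* Sketch only: no error reporting, division by zero yields 0. */
            value = divisor != 0 ? value / divisor : 0;
        } else {
            return value;
        }
    }
}

static long parse_expr(void)
{
    long value = parse_term();
    for (;;) {
        skip_spaces();
        if (*cursor == '+') {
            cursor++;
            value += parse_term();
        } else if (*cursor == '-') {
            cursor++;
            value -= parse_term();
        } else {
            return value;
        }
    }
}

/* Entry point: evaluate an arithmetic expression held in a string. */
long evaluate(const char *text)
{
    cursor = text;
    return parse_expr();
}
```

Operator precedence falls out of the call structure: `parse_expr` only ever sees complete terms, so multiplication binds tighter than addition without any precedence table.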


GENERIC and GIMPLE

''GENERIC'' is an intermediate representation language used as a "middle end" while compiling source code into executable binaries. A subset, called ''GIMPLE'', is targeted by all the front ends of GCC. The middle stage of GCC does all of the code analysis and optimization, working independently of both the compiled language and the target architecture, starting from the GENERIC representation and expanding it to register transfer language (RTL). The GENERIC representation contains only the subset of the imperative programming constructs optimized by the middle end.

In transforming the source code to GIMPLE, complex expressions are split into a three-address code using temporary variables. This representation was inspired by the SIMPLE representation proposed in the McCAT compiler by Laurie J. Hendren for simplifying the analysis and optimization of imperative programs.
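
The splitting of a complex expression into three-address code can be mimicked by hand in C. In the sketch below, the function names and the temporaries `t1` to `t4` are hypothetical stand-ins for what a compiler would generate. The real GIMPLE for a function can be inspected with GCC's `-fdump-tree-gimple` option, and its exact form will differ from this hand-written version.

```c
#include <assert.h>

/* A compound expression as a programmer might write it. */
int original(int a, int b, int c, int d)
{
    return (a + b) * (c - d) + a;
}

/* The same computation written by hand in three-address style:
 * every statement has at most one operator and two operands,
 * with temporaries holding the intermediate results. */
int lowered(int a, int b, int c, int d)
{
    int t1 = a + b;
    int t2 = c - d;
    int t3 = t1 * t2;
    int t4 = t3 + a;
    return t4;
}

/* Sanity check: both forms must agree for the given inputs. */
void check_equivalent(int a, int b, int c, int d)
{
    assert(original(a, b, c, d) == lowered(a, b, c, d));
}
```

Because each statement has a single operator, later analyses can reason about one operation at a time, which is the simplification the representation is meant to provide.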


Optimization

Optimization can occur during any phase of compilation; however, the bulk of optimizations are performed after the syntax and semantic analysis of the front end and before the code generation of the back end; thus a common, even though somewhat contradictory, name for this part of the compiler is the "middle end".

The exact set of GCC optimizations varies from release to release as it develops, but includes the standard algorithms, such as loop optimization, jump threading, common subexpression elimination, instruction scheduling, and so forth. The RTL optimizations are of less importance with the addition of global SSA-based optimizations on GIMPLE trees, as RTL optimizations have a much more limited scope and less high-level information.

Some of the optimizations performed at the GIMPLE level include dead code elimination, partial redundancy elimination, global value numbering, sparse conditional constant propagation, and scalar replacement of aggregates. Array dependence based optimizations such as automatic vectorization and automatic parallelization are also performed. Profile-guided optimization is also possible.
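
Two of the optimizations named above, common subexpression elimination and the removal of dead code, can be shown as a hand-written before/after pair. This is a sketch of the transformations themselves. Whether GCC performs exactly these rewrites on a given input depends on the release and the options used, and the function names are hypothetical.

```c
#include <assert.h>

/* Before: `x * y` is computed twice, and the first value stored
 * in `scratch` is overwritten before anything reads it. */
int before(int x, int y)
{
    int scratch = x - y;     /* dead store: never read */
    int first   = x * y + 1;
    int second  = x * y + 2; /* repeats the subexpression x * y */
    scratch = first + second;
    return scratch;
}

/* After: the dead store is gone and the common subexpression
 * is computed once and reused. */
int after(int x, int y)
{
    int product = x * y;
    return (product + 1) + (product + 2);
}

/* Sanity check: the rewrite must not change the result. */
void check_same(int x, int y)
{
    assert(before(x, y) == after(x, y));
}
```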


Back end

GCC's back end is partly specified by preprocessor macros and functions specific to a target architecture, for instance to define its endianness, word size, and calling conventions. The front part of the back end uses these to help decide RTL generation, so although GCC's RTL is nominally processor-independent, the initial sequence of abstract instructions is already adapted to the target. At any moment, the actual RTL instructions forming the program representation have to comply with the machine description of the target architecture.

The machine description file contains RTL patterns, along with operand constraints, and code snippets to output the final assembly. The constraints indicate that a particular RTL pattern might only apply, for example, to certain hardware registers, or might allow immediate operand offsets of only a limited size (e.g. 12-, 16-, or 24-bit offsets). During RTL generation, the constraints for the given target architecture are checked. In order to issue a given snippet of RTL, it must match one (or more) of the RTL patterns in the machine description file, and satisfy the constraints for that pattern; otherwise, it would be impossible to convert the final RTL into machine code.

Towards the end of compilation, valid RTL is reduced to a ''strict'' form in which each instruction refers to real machine registers and a pattern from the target's machine description file. Forming strict RTL is a complicated task; an important step is register allocation, where real hardware registers are chosen to replace the initially assigned pseudo-registers. This is followed by a "reloading" phase; any pseudo-registers that were not assigned a real hardware register are "spilled" to the stack, and RTL to perform this spilling is generated. Likewise, offsets that are too large to fit into an actual instruction must be broken up and replaced by RTL sequences that will obey the offset constraints.

In the final phase, the machine code is built by calling a small snippet of code, associated with each pattern, to generate the real instructions from the target's instruction set, using the final registers, offsets, and addresses chosen during the reload phase. The assembly-generation snippet may be just a string, in which case a simple string substitution of the registers, offsets, and/or addresses into the string is performed. The assembly-generation snippet may also be a short block of C code, performing some additional work, but ultimately returning a string containing the valid assembly code.
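
The string-substitution case described above can be sketched as follows. Machine description output templates refer to operands as `%0`, `%1`, and so on; the helper below expands such a template against an operand list. It is a hypothetical illustration, not code from GCC: the function name, the register names in the usage example, and the error convention are all assumptions.

```c
#include <assert.h>
#include <string.h>

/* Expand a template such as "add %0, %1, %2" by replacing each
 * "%<digit>" with the matching entry of `operands`.
 * Hypothetical helper, not part of GCC. Returns 0 on success, or -1
 * if the output buffer is too small or an operand index is out of
 * range. */
int expand_template(const char *template_text,
                    const char *const operands[], int operand_count,
                    char *out, size_t out_size)
{
    assert(template_text != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    size_t used = 0;
    for (const char *p = template_text; *p != '\0'; p++) {
        if (p[0] == '%' && p[1] >= '0' && p[1] <= '9') {
            int index = p[1] - '0';
            if (index >= operand_count)
                return -1;
            size_t len = strlen(operands[index]);
            if (used + len >= out_size)
                return -1;
            memcpy(out + used, operands[index], len);
            used += len;
            p++; /* skip the digit as well */
        } else {
            if (used + 1 >= out_size)
                return -1;
            out[used++] = *p;
        }
    }
    out[used] = '\0';
    return 0;
}
```

With operands `{"r0", "r1", "#12"}`, the template `"add %0, %1, %2"` expands to `add r0, r1, #12`. The operands stand for the registers and offsets chosen earlier, during register allocation and reload.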


C++ Standard Library (libstdc++)

The GCC project includes an implementation of the C++ Standard Library called libstdc++, licensed under the GPLv3 with an exception that permits linking closed-source applications when the sources are built with GCC. The current version is 11.


Other features

Some features of GCC include:

* Link-time optimization: Link-time optimization optimizes across object file boundaries to directly improve the linked binary. It relies on an intermediate file containing the serialization of some GIMPLE representation included in the object file. The file is generated alongside the object file during source compilation. Each source compilation generates a separate object file and link-time helper file. When the object files are linked, the compiler is executed again and uses the helper files to optimize code across the separately compiled object files.
* Plugins: Plugins extend the GCC compiler directly. They allow a stock compiler to be tailored to specific needs by external code loaded as plugins. For example, plugins can add, replace, or even remove middle-end passes operating on GIMPLE representations. Plugin support was a contentious issue in 2007. Several GCC plugins have already been published, notably:
** The Python plugin, which links against libpython and allows one to invoke arbitrary Python scripts from inside the compiler. The aim is to allow GCC plugins to be written in Python.
** The MELT plugin, which provides a high-level Lisp-like language to extend GCC.
* C++ transactional memory: The C++ language has an active proposal for transactional memory. It can be enabled in GCC 6 and newer when compiling with -fgnu-tm.
* Unicode identifiers: Although the C++ language requires support for non-ASCII Unicode characters in identifiers, the feature has only been supported since GCC 10. As with the existing handling of string literals, the source file is assumed to be encoded in UTF-8. The feature is optional in C, but has been made available there too since this change.


Architectures

GCC target processor families as of version 11.1 include:

* Alpha
* ARM
* AVR
* Blackfin
* Epiphany (GCC 4.8)
* H8/300
* HC12
* IA-32 (x86)
* IA-64 (Intel Itanium)
* MIPS
* Motorola 68000
* PA-RISC
* PDP-11
* PowerPC
* R8C / M16C / M32C
* SPARC
* SuperH
* System/390 / zSeries
* VAX
* x86-64
* Nvidia GPU
* Nvidia PTX
* AArch64
* RISC-V
* MSP430
* eBPF

Lesser-known target processors supported in the standard release have included:

* 68HC11
* A29K
* CR16
* C6x
* D30V
* DSP16xx
* ETRAX CRIS
* FR-30
* FR-V
* Intel i960
* IP2000
* M32R
* MCORE
* MIL-STD-1750A
* MMIX
* MN10200
* MN10300
* Motorola 88000
* NS32K
* IBM ROMP
* RL78
* Stormy16
* V850
* Xtensa

Additional processors have been supported by GCC versions maintained separately from the FSF version:

* Cortus APS3
* ARC
* AVR32
* C166 and C167
* D10V
* EISC
* eSi-RISC
* Hexagon
* LatticeMico32
* LatticeMico8
* MeP
* MicroBlaze
* Motorola 6809
* MRISC32
* MSP430
* NEC SX architecture
* Nios II and Nios
* OpenRISC
* PDP-10
* PIC24/dsPIC
* PIC32
* Propeller
* Saturn (HP48XGCC)
* System/370
* TIGCC (m68k variant)
* TMS9900
* TriCore
* Z8000
* ZPU

The GCJ Java compiler can target either a native machine language architecture or the Java virtual machine's Java bytecode. When retargeting GCC to a new platform, bootstrapping is often used. Motorola 68000, Zilog Z80, and other processors are also targeted in the GCC versions developed for various Texas Instruments, Hewlett Packard, Sharp, and Casio programmable graphing calculators.


License

GCC is licensed under the GNU General Public License version 3. The ''GCC runtime exception'' permits compilation of proprietary programs (in addition to free software) with GCC. This does not impact the license terms of GCC source code.


See also

* List of compilers
* MinGW
* LLVM/Clang


Further reading

* ''Using the GNU Compiler Collection (GCC)'', Free Software Foundation, 2008.
* ''GNU Compiler Collection (GCC) Internals'', Free Software Foundation, 2008.
* ''An Introduction to GCC'', Network Theory Ltd., 2004 (Revised August 2005).
* Arthur Griffith, ''GCC: The Complete Reference''. McGraw-Hill / Osborne, 2002.


External links


* Collection of GCC 4.0.2 architecture and internals documents at I.I.T. Bombay
* ''From Source to Binary: The Inner Workings of GCC'', by Diego Novillo, ''Red Hat Magazine'', December 2004
* A 2003 paper on GENERIC and GIMPLE
* An essay covering GCC development for the 1990s, with 30 monthly reports in the "Inside Cygnus Engineering" section near the end
* An essay by Rick Moen recording seven well-known forks, including the GCC/EGCS one