In compiler construction, name mangling (also called name decoration) is a technique used to solve various problems caused by the need to resolve unique names for programming entities in many modern programming languages.
It provides a way of encoding additional information in the name of a function, structure, class or another datatype in order to pass more semantic information from the compiler to the linker.
The need for name mangling arises where the language allows different entities to be named with the same identifier as long as they occupy a different namespace (typically defined by a module, class, or explicit ''namespace'' directive) or have different signatures (such as in function overloading). It is required in these use cases because each signature might require a different, specialized calling convention in the machine code.
Any object code produced by compilers is usually linked with other pieces of object code (produced by the same or another compiler) by a type of program called a linker. The linker needs a great deal of information on each program entity. For example, to correctly link a function it needs its name, the number of arguments and their types, and so on.
The simple programming languages of the 1970s, like C, only distinguished subroutines by their name, ignoring other information including parameter and return types.
Later programming languages, like C++, defined stricter requirements for routines to be considered "equal", such as the parameter types, return type, and calling convention of a function. These requirements enable method overloading and detection of some bugs (such as using different definitions of a function when compiling different source files).
These stricter requirements needed to work with existing tools and conventions; therefore, additional requirements were encoded in the name of the symbol, since that was the only information the traditional linker had about a symbol.
Another use of name mangling is for detecting additional non-signature-related changes, such as function purity, or whether a function can potentially throw an exception or trigger garbage collection. An example of a language doing this is D. These checks amount to a simplified form of error checking. For example, two functions could be compiled into one object file, after which their signatures are changed and the changed declarations are used to compile other source files that call them. At link time the linker finds no function with the new mangled name and returns an error. Similarly, where the return type is encoded in the name, the linker detects that the return type has changed and returns an error. Otherwise, incompatible calling conventions would be used, most likely producing a wrong result or crashing the program. Mangling does not usually capture every detail of the calling process. For example, it does not fully prevent errors such as changes to the data members of a struct or class: a function with a struct parameter could be compiled into one object file, then the definition of the struct changed and the new definition used in the compilation of a call to that function. In such cases, the compiler will usually use a different calling convention, but the function mangles to the same name in both cases, so the linker will not detect the problem, and the result will usually be a crash or data or memory corruption at runtime.
Examples
C
Although name mangling is not generally required or used by languages that do not support function overloading, like C and classic Pascal, they use it in some cases to provide additional information about a function.
For example, compilers targeted at Microsoft Windows platforms support a variety of calling conventions, which determine the manner in which parameters are sent to subroutines and results are returned. Because the different calling conventions are incompatible with one another, compilers mangle symbols with codes detailing which convention should be used to call the specific routine.
The mangling scheme was established by Microsoft and has been informally followed by other compilers including Digital Mars, Borland, and GNU GCC when compiling code for the Windows platforms. The scheme even applies to other languages, such as Pascal, D, Delphi, Fortran, and C#. This allows subroutines written in those languages to call, or be called by, existing Windows libraries using a calling convention different from their default.
When compiling the following C examples:
int _cdecl f (int x) { return 0; }
int _stdcall g (int y) { return 0; }
int _fastcall h (int z) { return 0; }
32-bit compilers emit, respectively:
_f
_g@4
@h@4
In the stdcall and fastcall mangling schemes, the function is encoded as
_name@n
and
@name@n
respectively, where n is the number of bytes, in decimal, of the argument(s) in the parameter list (including those passed in registers, for fastcall). In the case of cdecl, the function name is merely prefixed by an underscore.
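The three 32-bit decorations listed above can be reproduced with a few lines of string formatting. The following is only an illustrative sketch: the function name `decorate` and its parameters are hypothetical, and a real compiler computes the byte count from the parameter types rather than taking it as an argument.

```python
def decorate(name, convention, arg_bytes=0):
    """Return the 32-bit Windows-style decorated symbol for a C function.

    Illustrative only: `arg_bytes` is supplied by the caller here, whereas a
    compiler derives it from the declared parameter list.
    """
    if convention == "cdecl":
        return "_" + name                  # underscore prefix, no size suffix
    if convention == "stdcall":
        return f"_{name}@{arg_bytes}"      # underscore, name, '@', byte count
    if convention == "fastcall":
        return f"@{name}@{arg_bytes}"      # '@', name, '@', byte count
    raise ValueError(f"unknown calling convention: {convention}")


print(decorate("f", "cdecl"))          # _f
print(decorate("g", "stdcall", 4))     # _g@4
print(decorate("h", "fastcall", 4))    # @h@4
```

Each call corresponds to one of the C functions in the example, each of which takes a single 4-byte int.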
The 64-bit convention on Windows (Microsoft C) has no leading underscore. This difference may in some rare cases lead to unresolved externals when porting such code to 64 bits. For example, Fortran code can use 'alias' to link against a C method by name as follows:
SUBROUTINE f()
!DEC$ ATTRIBUTES C, ALIAS:'_f' :: f
END SUBROUTINE
This will compile and link fine under 32 bits, but generate an unresolved external _f under 64 bits. One workaround for this is not to use 'alias' at all (in which case the method names typically need to be capitalized in C and Fortran). Another is to use the BIND option:
SUBROUTINE f() BIND(C,NAME="f")
END SUBROUTINE
In C, most compilers also mangle static functions and variables (and in C++, functions and variables declared static or placed in an anonymous namespace) using the same mangling rules as for their non-static versions. If functions with the same name (and, for C++, the same parameters) are defined and used in different translation units, they mangle to the same name, potentially leading to a clash, even though each call refers to the definition in its own translation unit. Compilers are in principle free to emit arbitrary mangling for these functions, because it is illegal to access them from other translation units directly, so they never need to be linked across object files. To prevent linking conflicts, compilers use the standard mangling but emit so-called 'local' symbols. When many such translation units are linked, there may be multiple definitions of a function with the same name, but the resulting code calls only the one from its own translation unit. This is usually done using the relocation mechanism.
C++
C++ compilers are the most widespread users of name mangling. The first C++ compilers were implemented as translators to C source code, which would then be compiled by a C compiler to object code; because of this, symbol names had to conform to C identifier rules. Even later, with the emergence of compilers that produced machine code or assembly directly, the system's linker generally did not support C++ symbols, and mangling was still required.
The C++ language does not define a standard decoration scheme, so each compiler uses its own. C++ also has complex language features, such as classes, templates, namespaces, and operator overloading, that alter the meaning of specific symbols based on context or usage. Meta-data about these features can be disambiguated by mangling (decorating) the name of a symbol. Because the name-mangling systems for such features are not standardized across compilers, few linkers can link object code that was produced by different compilers.
Simple example
A single C++ translation unit might define two functions named f():
int f () { return 1; }
int f (int) { return 0; }
void g () { int i = f(), j = f(0); }
These are distinct functions, with no relation to each other apart from the name. The C++ compiler will therefore encode the type information in the symbol name, the result being something resembling:
int __f_v () { return 1; }
int __f_i (int) { return 0; }
void __g_v () { int i = __f_v(), j = __f_i(0); }
Even though its name is unique, g() is still mangled: name mangling applies to ''all'' C++ symbols (that is, those not in an extern "C" block).
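The idea of folding parameter types into the symbol can be sketched as follows. This is a toy scheme that merely reproduces the illustrative names above; the function `toy_mangle` and the `TYPE_CODES` table are hypothetical and do not correspond to any real compiler's encoding.

```python
# Hypothetical one-letter type codes; real compilers use their own tables.
TYPE_CODES = {"void": "v", "int": "i", "char": "c", "double": "d"}


def toy_mangle(name, param_types=()):
    """Encode a function name and its parameter types into a single symbol."""
    params = param_types or ("void",)     # an empty list is encoded as void
    return "__" + name + "_" + "".join(TYPE_CODES[t] for t in params)


print(toy_mangle("f"))             # __f_v
print(toy_mangle("f", ["int"]))    # __f_i
print(toy_mangle("g"))             # __g_v
```

Because the parameter list is part of the symbol, the two overloads of f no longer collide at link time.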
Complex example
The mangled symbols in this example, in the comments below the respective identifier name, are those produced by the GNU GCC 3.x compilers, according to the IA-64 (Itanium) ABI:
namespace wikipedia
All mangled symbols begin with _Z (note that an identifier beginning with an underscore followed by a capital letter is a reserved identifier in C, so conflict with user identifiers is avoided); for nested names (including both namespaces and classes), this is followed by N, then a series of <length, id> pairs (the length being the length of the next identifier), and finally E. For example, wikipedia::article::format becomes:
_ZN9wikipedia7article6formatE
For functions, this is then followed by the type information; as format() takes no parameters (a void parameter list), this is simply v; hence:
_ZN9wikipedia7article6formatEv
For print_to, the standard type std::ostream (which is a typedef for std::basic_ostream<char, std::char_traits<char> >) is used, which has the special alias So; a reference to this type is therefore RSo, with the complete name for the function being:
_ZN9wikipedia7article8print_toERSo
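The nested-name rule described above (a fixed prefix, length-prefixed identifiers, a terminator, then the parameter encoding) is mechanical enough to sketch in a few lines. The helper names `mangle_nested` and `demangle_nested` are hypothetical, and the sketch covers only plain nested names; it ignores substitutions, templates, constructors and every other part of the real ABI grammar.

```python
import re

_LENGTH = re.compile(r"\d+")


def mangle_nested(parts, params="v"):
    """Mangle a nested function name given as a list of identifiers.

    `params` is the already-encoded parameter list ("v" for no parameters).
    """
    body = "".join(f"{len(p)}{p}" for p in parts)   # <length><identifier> pairs
    return "_ZN" + body + "E" + params


def demangle_nested(symbol):
    """Recover the qualified name from a symbol built by mangle_nested."""
    if not symbol.startswith("_ZN"):
        raise ValueError("not a nested name")
    pos, parts = 3, []
    while pos < len(symbol) and symbol[pos] != "E":
        match = _LENGTH.match(symbol, pos)          # decimal length prefix
        if match is None:
            raise ValueError("expected a length prefix")
        length = int(match.group())
        start = match.end()
        parts.append(symbol[start:start + length])
        pos = start + length
    return "::".join(parts)


print(mangle_nested(["wikipedia", "article", "format"]))
# _ZN9wikipedia7article6formatEv
print(mangle_nested(["wikipedia", "article", "print_to"], "RSo"))
# _ZN9wikipedia7article8print_toERSo
print(demangle_nested("_ZN9wikipedia7article8print_toERSo"))
# wikipedia::article::print_to
```

The length prefixes are what make the encoding unambiguous: the decoder never has to guess where one identifier ends and the next begins.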
How different compilers mangle the same functions
There isn't a standardized scheme by which even trivial C++ identifiers are mangled, and consequently different compilers (or even different versions of the same compiler, or the same compiler on different platforms) mangle public symbols in radically different (and thus totally incompatible) ways. Consider how different C++ compilers mangle the same functions:
Notes:
*The Compaq C++ compiler on OpenVMS VAX and Alpha (but not IA-64) and Tru64 has two name mangling schemes. The original, pre-standard scheme is known as the ARM model, and is based on the name mangling described in the C++ Annotated Reference Manual (ARM). With the advent of new features in standard C++, particularly templates, the ARM scheme became more and more unsuitable: it could not encode certain function types, or produced identically mangled names for different functions. It was therefore replaced by the newer "ANSI" model, which supported all ANSI template features, but was not backward compatible.
*On IA-64, a standard Application Binary Interface (ABI) exists (see external links), which defines (among other things) a standard name-mangling scheme, and which is used by all the IA-64 compilers. GNU GCC 3.''x'', in addition, has adopted the name mangling scheme defined in this standard for use on other, non-Intel platforms.
*Visual Studio and the Windows SDK include a program that prints the C-style function prototype for a given mangled name.
*On Microsoft Windows, the Intel compiler and Clang use the Visual C++ name mangling for compatibility.
Handling of C symbols when linking from C++
The job of the common C++ idiom:
#ifdef __cplusplus
extern "C" {    /* the C declarations go between the two guards */
#endif
#ifdef __cplusplus
}
#endif
is to ensure that the symbols within are "unmangled" – that the compiler emits a binary file with their names undecorated, as a C compiler would do. As C language definitions are unmangled, the C++ compiler needs to avoid mangling references to these identifiers.
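For example, the standard strings library, <string.h>, usually contains something resembling:
#ifdef __cplusplus
extern "C" {    /* wraps the declarations of strcmp, strcpy, memset, ... */
#endif
#ifdef __cplusplus
}
#endif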
Thus, code such as:
if (strcmp(argv[1], "-x") == 0)
    strcpy(a, argv[2]);
else
    memset(a, 0, sizeof(a));
uses the correct, unmangled strcmp, strcpy, and memset. If the extern "C" had not been used, the (SunPro) C++ compiler would produce code equivalent to:
if (__1cGstrcmp6Fpkc1_i_(argv[1], "-x") == 0)
    __1cGstrcpy6Fpcpkc_0_(a, argv[2]);
else
    __1cGmemset6FpviI_0_(a, 0, sizeof(a));
Since those symbols do not exist in the C runtime library (''e.g.'' libc), link errors would result.
Standardized name mangling in C++
It would seem that standardized name mangling in the C++ language would lead to greater interoperability between compiler implementations. However, such a standardization by itself would not suffice to guarantee C++ compiler interoperability and it might even create a false impression that interoperability is possible and safe when it isn't. Name mangling is only one of several application binary interface (ABI) details that need to be decided and observed by a C++ implementation. Other ABI aspects like exception handling, virtual table layout, structure, and stack frame padding also cause differing C++ implementations to be incompatible. Further, requiring a particular form of mangling would cause issues for systems where implementation limits (e.g., length of symbols) dictate a particular mangling scheme. A standardized ''requirement'' for name mangling would also prevent an implementation where mangling was not required at all (for example, a linker that understood the C++ language).
The C++ standard therefore does not attempt to standardize name mangling. On the contrary, the ''Annotated C++ Reference Manual'' (also known as ''ARM'', section 7.2.1c) actively encourages the use of different mangling schemes to prevent linking when other aspects of the ABI are incompatible.
Nevertheless, as detailed in the section above, on some platforms the full C++ ABI has been standardized, including name mangling.
Real-world effects of C++ name mangling
Because C++ symbols are routinely exported from DLL and shared object files, the name mangling scheme is not merely a compiler-internal matter. Different compilers (or different versions of the same compiler, in many cases) produce such binaries under different name decoration schemes, meaning that symbols are frequently unresolved if the compilers used to create the library and the program using it employed different schemes. For example, if a system with multiple C++ compilers installed (e.g., GNU GCC and the OS vendor's compiler) wished to install the Boost C++ Libraries, it would have to be compiled multiple times (once for GCC and once for the vendor compiler).
It is good for safety purposes that compilers producing incompatible object codes (codes based on different ABIs, regarding e.g., classes and exceptions) use different name mangling schemes. This guarantees that these incompatibilities are detected at the linking phase, not when executing the software (which could lead to obscure bugs and serious stability issues).
For this reason, name decoration is an important aspect of any C++-related ABI.
There are instances, particularly in large, complex code bases, where it can be difficult or impractical to map the mangled name emitted within a linker error message back to the particular corresponding token/variable-name in the source. This problem can make identifying the relevant source file(s) very difficult for build or test engineers even if only one compiler and linker are in use. Demanglers (including those within the linker error reporting mechanisms) sometimes help but the mangling mechanism itself may discard critical disambiguating information.
Demangle via c++filt
$ c++filt -n _ZNK3MapI10StringName3RefI8GDScriptE10ComparatorIS0_E16DefaultAllocatorE3hasERKS0_
Map<StringName, Ref<GDScript>, Comparator<StringName>, DefaultAllocator>::has(StringName const&) const
Demangle via builtin GCC ABI
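#include <stdio.h>
#include <stdlib.h>
#include <cxxabi.h>   /* declares abi::__cxa_demangle (GCC and Clang) */
int main() {
    int status = -1;
    char *name = abi::__cxa_demangle("_ZNK3MapI10StringName3RefI8GDScriptE10ComparatorIS0_E16DefaultAllocatorE3hasERKS0_", NULL, NULL, &status);
    if (status == 0) { printf("%s\n", name); free(name); }
}
Output:
Map<StringName, Ref<GDScript>, Comparator<StringName>, DefaultAllocator>::has(StringName const&) const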
Java
In Java, the signature of a method or a class contains its name and the types of its method arguments and return value, where applicable. The format of signatures is documented, as the language, compiler, and .class file format were all designed together (and had object-orientation and universal interoperability in mind from the start).
Creating unique names for inner and anonymous classes
The scope of anonymous classes is confined to their parent class, so the compiler must produce a "qualified" public name for the inner class, to avoid conflict where other classes with the same name (inner or not) exist in the same namespace. Similarly, anonymous classes must have "fake" public names generated for them (as the concept of anonymous classes only exists in the compiler, not the runtime). So, compiling the following Java program:
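public class foo { class bar {}                            // named inner class
    public void zark() { Object f = new Object() {}; } }   // anonymous inner class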
will produce three .class files:
* foo.class, containing the main (outer) class ''foo''
* foo$bar.class, containing the named inner class ''foo.bar''
* foo$1.class, containing the anonymous inner class (local to method ''foo.zark'')
All of these class names are valid (as $ symbols are permitted in the JVM specification) and these names are "safe" for the compiler to generate, as the Java language definition advises not to use $ symbols in normal Java class definitions.
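The naming pattern in the list above can be written down as a small helper. This is a sketch of the pattern only, under the assumption that anonymous classes are numbered from 1 in order of appearance; the function `class_file_names` is hypothetical and is not part of any Java tool.

```python
def class_file_names(outer, named_inner=(), anonymous_count=0):
    """List the .class files expected for one outer class.

    Assumes anonymous classes are numbered 1, 2, ... in order of appearance.
    """
    names = [f"{outer}.class"]
    names += [f"{outer}${inner}.class" for inner in named_inner]
    names += [f"{outer}${i}.class" for i in range(1, anonymous_count + 1)]
    return names


print(class_file_names("foo", ["bar"], 1))
# ['foo.class', 'foo$bar.class', 'foo$1.class']
```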
Name resolution in Java is further complicated at runtime, as fully qualified class names are unique only inside a specific classloader instance. Classloaders are ordered hierarchically and each Thread in the JVM has a so-called context class loader, so in cases where two different classloader instances contain classes with the same name, the system first tries to load the class using the root (or system) classloader and then goes down the hierarchy to the context class loader.
Java Native Interface
Java's native method support allows Java language programs to call out to programs written in another language (generally either C or C++). There are two name-resolution concerns here, neither of which is implemented in a particularly standard manner:
* JVM to native name translation - this seems to be more stable, since Oracle makes its scheme public.
* Normal C++ name mangling - see above.
Python
In Python, mangling is used for class attributes that one does not want subclasses to use, which are designated as such by giving them a name with two or more leading underscores and no more than one trailing underscore. For example, a name with two or three leading underscores and at most one trailing underscore is mangled, whereas a name that also ends in two or more underscores is not. Python's runtime does not restrict access to such attributes; the mangling only prevents name collisions if a derived class defines an attribute with the same name.
On encountering name mangled attributes, Python transforms these names by prepending a single underscore and the name of the enclosing class, for example:
>>> class Test:
...     def __mangled_name(self):
...         pass
...     def normal_name(self):
...         pass
>>> t = Test()
>>> [attr for attr in dir(t) if "name" in attr]
['_Test__mangled_name', 'normal_name']
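The collision-avoidance purpose described above is easiest to see with a base class and a derived class that both use the same double-underscore attribute name. The class and attribute names below are hypothetical and chosen only for illustration.

```python
class Base:
    def __init__(self):
        self.__token = "base"        # stored on the instance as _Base__token

    def base_token(self):
        return self.__token          # compiled as self._Base__token


class Derived(Base):
    def __init__(self):
        super().__init__()
        self.__token = "derived"     # stored as _Derived__token, no clash

    def derived_token(self):
        return self.__token          # compiled as self._Derived__token


d = Derived()
print(d.base_token(), d.derived_token())   # base derived
print(sorted(vars(d)))                     # ['_Base__token', '_Derived__token']
```

Both attributes coexist on the same instance because each is rewritten with the name of the class in whose body it appears. Nothing prevents outside code from reading `d._Base__token` directly.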
Pascal
Borland's Turbo Pascal / Delphi range
To avoid name mangling in Pascal, use:
exports
myFunc name 'myFunc',
myProc name 'myProc';
Free Pascal
Free Pascal supports function and operator overloading, thus it also uses name mangling to support these features. On the other hand, Free Pascal is capable of calling symbols defined in external modules created with another language and exporting its own symbols to be called by another language. For further information, consult Chapter 6.2 of the Free Pascal Programmer's Guide.
Fortran
Name mangling is also necessary in Fortran compilers, originally because the language is case insensitive. Further mangling requirements were imposed later in the evolution of the language because of the addition of modules and other features in the Fortran 90 standard. The case mangling, especially, is a common issue that must be dealt with in order to call Fortran libraries, such as LAPACK, from other languages, such as C.
Because of the case insensitivity, the name of a subroutine or function must be converted to a standardized case and format by the compiler so that it will be linked in the same way regardless of case. Different compilers have implemented this in various ways, and no standardization has occurred. The AIX and HP-UX Fortran compilers convert all identifiers to lower case, while the Cray and Unicos Fortran compilers converted identifiers to all upper case. The GNU g77 compiler converts identifiers to lower case plus an underscore, except that identifiers already containing an underscore have two underscores appended, following a convention established by f2c. Many other compilers, including SGI's IRIX compilers, GNU Fortran, and Intel's Fortran compiler (except on Microsoft Windows), convert all identifiers to lower case plus a single underscore, whether or not the identifier already contains one. On Microsoft Windows, the Intel Fortran compiler defaults to uppercase without an underscore.
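The conventions described above can be summarized in a short Python sketch. The function name, the convention labels and the sample identifiers are hypothetical and chosen only for illustration; real compilers usually offer switches that change these defaults, so this is a simplified model and not a description of any particular toolchain.

```python
def external_name(identifier: str, convention: str) -> str:
    """Return the linker-level name a Fortran external procedure might receive.

    The convention labels are illustrative, not compiler options.
    """
    lower = identifier.lower()
    if convention == "lower":             # e.g. AIX and HP-UX compilers
        return lower
    if convention == "upper":             # e.g. Cray/Unicos, Intel on Windows
        return identifier.upper()
    if convention == "lower_underscore":  # e.g. IRIX, GNU Fortran, Intel elsewhere
        return lower + "_"
    if convention == "f2c":               # g77, following f2c
        # Names that already contain an underscore get two appended.
        return lower + ("__" if "_" in lower else "_")
    raise ValueError(f"unknown convention: {convention}")
```

A C caller would have to declare the routine under whichever of these names the Fortran compiler in use actually emits.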
Identifiers in Fortran 90 modules must be further mangled, because the same procedure name may occur in different modules. Since the Fortran 2003 Standard requires that module procedure names not conflict with other external symbols, compilers tend to use the module name and the procedure name, with a distinct marker in between. For example:
module m
contains
   integer function five()
      five = 5
   end function five
end module m
In this module, the name of the function is mangled differently by different compilers (e.g., GNU Fortran, Intel's ifort, and Oracle's sun95). Since Fortran does not allow overloading the name of a procedure, but uses generic interface blocks and generic type-bound procedures instead, the mangled names do not need to incorporate clues about the arguments.
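As an illustration of the "module name, marker, procedure name" pattern, the following Python sketch builds a symbol in the form that GNU Fortran is commonly observed to emit. The exact form is an assumption here and is not taken from the text above; other compilers use different markers and case rules, and the helper name is hypothetical.

```python
def gfortran_module_symbol(module: str, procedure: str) -> str:
    """Sketch of a GNU Fortran style module-procedure symbol.

    Assumes the form __<module>_MOD_<procedure> with both names lower-cased;
    verify against the output of nm for the compiler actually in use.
    """
    return f"__{module.lower()}_MOD_{procedure.lower()}"
```

Note that the argument list plays no part in the result, in line with the absence of procedure overloading in Fortran.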
The Fortran 2003 BIND option overrides any name mangling done by the compiler, as shown above.
Rust
Function names are mangled by default in Rust. However, mangling can be disabled with a function attribute. This attribute can be used to export functions to C, C++, or Objective-C. Additionally, in combination with a further function attribute or a crate attribute, it allows the user to define a C-style entry point for the program.
Rust has used several symbol mangling schemes, which can be selected at compile time with a compiler option. The following manglers are defined:
* The legacy scheme: a C++-style mangling based on the Itanium IA-64 C++ ABI. Symbols begin with a fixed prefix, and filename hashes are used for disambiguation. Used since Rust 1.9.
* An improved version of the legacy scheme, with changes for Rust. Symbols begin with a different fixed prefix. Polymorphism can be encoded. Functions do not have return types encoded (Rust does not have overloading). Unicode names use modified punycode. Compression (backreferences) uses byte-based addressing. Used since Rust 1.37.
Examples are provided in the Rust tests.
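To give a feel for the legacy, Itanium-style form, the following Python sketch splits such a symbol into its path components and trailing hash. It assumes that the symbol starts with `_ZN`, ends with `E`, consists of length-prefixed components, and carries the hash as a final component of the form `h` followed by 16 hexadecimal digits. These details are assumptions not stated above, the sample symbol in the usage note is invented, and the function is not a full demangler (escape sequences inside components are left untouched).

```python
import re
from typing import List, Optional, Tuple


def split_legacy_symbol(symbol: str) -> Tuple[List[str], Optional[str]]:
    """Split an Itanium-style nested name into (path components, hash)."""
    if not (symbol.startswith("_ZN") and symbol.endswith("E")):
        raise ValueError("not a legacy-style nested name")
    body = symbol[3:-1]
    parts: List[str] = []
    pos = 0
    while pos < len(body):
        match = re.match(r"[0-9]+", body[pos:])
        if match is None:
            raise ValueError("expected a length prefix")
        length = int(match.group())
        start = pos + match.end()
        component = body[start:start + length]
        if len(component) != length:
            raise ValueError("component is shorter than its length prefix")
        parts.append(component)
        pos = start + length
    # Treat a final h<16 hex digits> component as the disambiguating hash.
    hash_part = None
    if parts and re.fullmatch(r"h[0-9a-f]{16}", parts[-1]):
        hash_part = parts.pop()
    return parts, hash_part
```

For the made-up symbol `_ZN7example3add17h0123456789abcdefE`, the function returns the path `["example", "add"]` and the hash `h0123456789abcdef`.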
Objective-C
Essentially two forms of method exist in Objective-C: the class ("static") method and the instance method. A method declaration in Objective-C is of the following form:
+ (''return-type'') ''name0'':''parameter0'' ''name1'':''parameter1'' ...
- (''return-type'') ''name0'':''parameter0'' ''name1'':''parameter1'' ...
Class methods are signified by +, instance methods use -. A typical class method declaration may then look like:
+ (id) initWithX: (int) number andY: (int) number;
+ (id) new;
With instance methods looking like this:
- (id) value;
- (id) setValue: (id) new_value;
Each of these method declarations has a specific internal representation. When compiled, each method is named according to the following scheme for class methods:
_c_''Class''_''name0''_''name1''_ ...
and this for instance methods:
_i_''Class''_''name0''_''name1''_ ...
The colons in the Objective-C syntax are translated to underscores. So, the class method initWithX:andY: declared above, if belonging to a class ''Class'', would translate as _c_''Class''_initWithX_andY_, and the instance method value (belonging to the same class) would translate to _i_''Class''_value.
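The naming scheme above is mechanical enough to be written as a one-line transformation. The following Python sketch is only an illustration of that scheme; the function name and the class name `Point` used in the usage note are hypothetical.

```python
def objc_method_symbol(class_name: str, selector: str, is_class_method: bool) -> str:
    """Build the internal name of a method following the scheme above.

    `selector` is the method name with its colons, e.g. "initWithX:andY:".
    """
    kind = "c" if is_class_method else "i"
    # Each colon of the Objective-C syntax becomes an underscore.
    return f"_{kind}_{class_name}_{selector.replace(':', '_')}"
```

With a hypothetical class `Point`, the class method `initWithX:andY:` yields `_c_Point_initWithX_andY_` and the instance method `value` yields `_i_Point_value`.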
Each of the methods of a class is labeled in this way. However, looking up a method that a class may respond to would be tedious if all methods were represented in this fashion. Each of the methods is therefore assigned a unique symbol (such as an integer), known as a ''selector''. In Objective-C, one can manage selectors directly; they have a specific type of their own.
During compilation, a table is built that maps the textual representation of each method name to a selector. Managing selectors is more efficient than manipulating the textual representation of a method. Note that a selector only matches a method's name, not the class it belongs to: different classes can have different implementations of a method with the same name. Because of this, implementations of a method are given a specific identifier too; these are known as implementation pointers, and they also have a type of their own.
Message sends are encoded by the compiler as calls to a runtime dispatch function, or one of its cousins, which takes the receiver of the message and the selector that determines the method to call. Each class has its own table that maps selectors to their implementations; the implementation pointer specifies where in memory the actual implementation of the method resides. There are separate tables for class and instance methods. Apart from being stored in the selector-to-implementation lookup tables, the functions are essentially anonymous.
The value for a selector does not vary between classes. This enables polymorphism.
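The relationship between selectors, per-class tables and implementations can be modelled in a few lines of Python. This is a conceptual sketch only: all names are hypothetical, selectors are modelled as integers (one of the possibilities mentioned above), only instance methods are covered, and the real runtime's data structures and caching are not represented.

```python
from typing import Callable, Dict


class SelectorTable:
    """Maps method names to small integers (the selectors)."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def intern(self, name: str) -> int:
        # The same name always yields the same selector, whatever the class.
        if name not in self._ids:
            self._ids[name] = len(self._ids)
        return self._ids[name]


class ClassRecord:
    """Per-class table from selector to implementation."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.instance_methods: Dict[int, Callable] = {}


def send(cls: ClassRecord, receiver, selector: int, *args):
    """Find the implementation for `selector` in the class table and call it."""
    implementation = cls.instance_methods[selector]
    return implementation(receiver, selector, *args)


selectors = SelectorTable()
describe = selectors.intern("describe")

circle = ClassRecord("Circle")
square = ClassRecord("Square")
circle.instance_methods[describe] = lambda self, _cmd: "round"
square.instance_methods[describe] = lambda self, _cmd: "four-sided"
```

Sending the same selector to the two classes reaches different implementations, which is the polymorphism referred to above.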
The Objective-C runtime maintains information about the argument and return types of methods. However, this information is not part of the name of the method, and can vary from class to class.
Since Objective-C does not support namespaces, there is no need for the mangling of class names (which do appear as symbols in generated binaries).
Swift
Swift keeps metadata about functions (and more) in the mangled symbols referring to them. This metadata includes the function's name, attributes, module name, parameter types, return type, and more. For example:
The mangled name for a method of a class in a module, as produced by Swift in 2014, consists of the following components, in order:
* The prefix for all Swift symbols; every symbol starts with it.
* A marker for a non-curried function.
* A marker for a function of a class, i.e. a method.
* The module name, prefixed with its length.
* The name of the class the function belongs to, prefixed with its length.
* The function name, prefixed with its length.
* The function attribute; in this case 'f', which means a normal function.
* A designation of the type of the first parameter (namely the class instance) as the first in the type stack (the class is not nested and thus has index 0).
* The start of the type list for the parameter tuple of the function.
* The external name of the first parameter of the function.
* The builtin Swift type Swift.Int for the first parameter.
* The return type: again Swift.Int.
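The length-prefixed encoding used for the module, class and function names can be sketched in Python as follows. Only the length prefixing is modelled; the single-letter markers of the scheme are deliberately left out, and the identifiers in the usage note are hypothetical examples.

```python
from typing import Tuple


def length_prefixed(identifier: str) -> str:
    """Encode an identifier as its decimal length followed by the identifier."""
    return f"{len(identifier)}{identifier}"


def read_length_prefixed(text: str, pos: int = 0) -> Tuple[str, int]:
    """Decode one length-prefixed identifier starting at `pos`.

    Returns the identifier and the position just after it.
    """
    end = pos
    while end < len(text) and text[end] in "0123456789":
        end += 1
    if end == pos:
        raise ValueError("expected a length prefix")
    length = int(text[pos:end])
    name = text[end:end + length]
    if len(name) != length:
        raise ValueError("identifier is shorter than its length prefix")
    return name, end + length
```

Encoding the hypothetical names `test`, `MyClass` and `calculate` one after another gives `4test7MyClass9calculate`. Because each name announces its own length, a demangler can read the names back without needing a separator character.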
Mangling for versions since Swift 4.0 is documented officially. It retains some similarity to Itanium.
See also
* Application programming interface (API)
* Application binary interface (ABI)
* Calling convention
* Comparison of application virtualization software (i.e. VMs)
* Foreign function interface (FFI)
* Java Native Interface (JNI)
* Language binding
* Stropping
* SWIG
References
External links
Linux Itanium ABI for C++ including name mangling scheme.
Macintosh C/C++ ABI Standard Specification
Filter to demangle encoded C++ symbols for GNU/Intel compilers
undname: MSVC tool to demangle names.
demangler.com: an online tool for demangling GCC and MSVC C++ symbols
— From Apple's
'
Calling conventions for different C++ compilers, by Agner Fog: contains a detailed description of name mangling schemes for various x86 and x64 C++ compilers (pp. 24–42 in 2011-06-08 version)
C++ Name Mangling/Demangling: quite detailed explanation of the Visual C++ compiler name mangling scheme
PHP UnDecorateSymbolName: a PHP script that demangles Microsoft Visual C's function names.
Name mangling demystified, by Fivos Kefallonitis