Position-independent code

In computing, position-independent code (PIC) or position-independent executable (PIE) is a body of machine code that, when placed anywhere in primary memory, executes properly regardless of its absolute address. PIC is commonly used for shared libraries, so that the same library code can be loaded at a location in each program's address space where it does not overlap with other memory in use (for example, other shared libraries). PIC was also used on older computer systems that lacked an MMU, so that the operating system could keep applications away from each other even within the single address space of an MMU-less system.

Position-independent code can be executed at any memory address without modification. This differs from absolute code, which must be loaded at a specific location to function correctly, and load-time locatable (LTL) code, in which a linker or program loader modifies a program before execution so it can be run only from a particular memory location. Generating position-independent code is often the default behavior for compilers, but they may place restrictions on the use of some language features, such as disallowing use of absolute addresses (position-independent code has to use relative addressing). Instructions that refer directly to specific memory addresses sometimes execute faster, and replacing them with equivalent relative-addressing instructions may result in slightly slower execution, although modern processors make the difference practically negligible.
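The difference between absolute, load-time locatable, and position-independent code can be sketched with a toy machine. This is only an illustrative model, not a real instruction set: the opcodes JMP_ABS, JMP_REL, TRAP and HALT and the helper names load, relocate and run are all hypothetical.

```python
# Toy model (all names hypothetical): memory is a list of (opcode, operand)
# pairs. JMP_ABS jumps to an absolute address; JMP_REL jumps relative to the
# address of the jump instruction itself.

ABSOLUTE_CODE = [("JMP_ABS", 2), ("TRAP", 0), ("HALT", 0)]  # built for address 0
PIC_CODE = [("JMP_REL", 2), ("TRAP", 0), ("HALT", 0)]       # runs anywhere


def load(program, base, size=32):
    """Copy the program into otherwise empty (TRAP-filled) memory at `base`."""
    memory = [("TRAP", 0)] * size
    memory[base:base + len(program)] = program
    return memory


def relocate(program, base):
    """Load-time relocation: patch every absolute operand for one fixed base."""
    return [(op, arg + base) if op == "JMP_ABS" else (op, arg)
            for op, arg in program]


def run(memory, start):
    """Execute from `start`; return the address of the HALT that was reached."""
    pc = start
    while True:
        op, arg = memory[pc]
        if op == "HALT":
            return pc
        if op == "TRAP":
            raise RuntimeError(f"trap at address {pc}")
        pc = arg if op == "JMP_ABS" else pc + arg
```

In this model, PIC_CODE reaches its HALT wherever it is loaded. ABSOLUTE_CODE works only at address 0 unless relocate has first patched it for the chosen base, after which it works only at that base.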


History

In early computers such as the IBM 701 (29 April 1952) or the UNIVAC I (31 March 1951), code was position-dependent: each program was built to load into and run from a particular address. Those early computers did not have an operating system and were not multitasking-capable. Programs were loaded into main storage (or even stored on magnetic drum for execution directly from there) and run one at a time. In such an operational context, position-independent code was not necessary.

The IBM System/360 (7 April 1964) was designed with truncated addressing similar to that of the UNIVAC III, with code position independence in mind. In truncated addressing, memory addresses are calculated from a base register and an offset. At the beginning of a program, the programmer must establish addressability by loading a base register; normally the programmer also informs the assembler with a USING pseudo-op. The programmer can load the base register from a register known to contain the entry point address, typically R15, or can use the BALR (Branch And Link, Register form) instruction (with an R2 value of 0) to store the next sequential instruction's address into the base register, which is then coded explicitly or implicitly in each instruction that refers to a storage location within the program. Multiple base registers can be used, for code or for data. Such instructions require less memory because they do not have to hold a full 24-, 31-, 32-, or 64-bit address (4 or 8 bytes), but instead a base register number (encoded in 4 bits) and a 12-bit address offset, requiring only two bytes. This programming technique is standard on IBM S/360 type systems and has remained in use through to today's IBM System/z. When coding in assembly language, the programmer has to establish addressability for the program as described above and also use other base registers for dynamically allocated storage. Compilers automatically take care of this kind of addressing.

IBM's early operating system DOS/360 (1966) did not use virtual storage (since the early models of System/360 did not support it), but it could place programs at an arbitrary (or automatically chosen) storage location during loading via the PHASE name,* JCL (Job Control Language) statement. So, on S/360 systems without virtual storage, a program could be loaded at any storage location, but this required a contiguous memory area large enough to hold that program. Sometimes memory fragmentation would occur from loading and unloading differently sized modules. Virtual storage, by design, does not have that limitation. While DOS/360 and OS/360 did not support PIC, transient SVC routines in OS/360 could not contain relocatable address constants and could run in any of the transient areas without relocation.

Virtual storage was first introduced on the IBM System/360 Model 67 (1965) to support IBM's first multitasking, time-sharing operating system, TSS/360. Later versions of DOS/360 (DOS/VS etc.) and later IBM operating systems all used virtual storage. Truncated addressing remained part of the base architecture, and is still advantageous when multiple modules must be loaded into the same virtual address space.

By way of comparison, on early segmented systems such as Burroughs MCP on the Burroughs B5000 (1961) and Multics (1964), paging systems such as IBM TSS/360 (1967), or base and bounds systems such as GECOS on the GE 625 and EXEC on the UNIVAC 1107, code was also inherently position-independent, since addresses in a program were relative to the current segment rather than absolute.

The invention of dynamic address translation (the function provided by an MMU) originally reduced the need for position-independent code because every process could have its own independent address space (range of addresses). However, multiple simultaneous jobs using the same code wasted physical memory. If two jobs run entirely identical programs, dynamic address translation provides a solution by allowing the system simply to map two different jobs' address 32K to the same bytes of real memory, containing the single copy of the program. Different programs may also share common code. For example, the payroll program and the accounts receivable program may both contain an identical sort subroutine. A shared module (a shared library is a form of shared module) gets loaded once and mapped into the two address spaces.
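The base-register addressing described above for System/360 can be sketched as follows. This is a minimal model, assuming the original 24-bit addressing mode and the convention that base register number 0 means "no base"; the function names encode_operand and effective_address are hypothetical, not IBM terminology.

```python
# Sketch of a System/360-style two-byte operand: 4-bit base register number
# plus 12-bit displacement. Function names are hypothetical.

ADDRESS_MASK = 0xFFFFFF  # assumption: 24-bit addressing mode


def encode_operand(base_reg, displacement):
    """Pack a base register number and displacement into 16 bits."""
    if not 0 <= base_reg <= 15:
        raise ValueError("base register must fit in 4 bits")
    if not 0 <= displacement <= 0xFFF:
        raise ValueError("displacement must fit in 12 bits")
    return (base_reg << 12) | displacement


def effective_address(registers, operand):
    """Compute base register contents plus displacement."""
    base_reg = operand >> 12
    displacement = operand & 0xFFF
    # Register number 0 in the base field contributes zero, not R0's contents.
    base = registers[base_reg] if base_reg != 0 else 0
    return (base + displacement) & ADDRESS_MASK


registers = [0] * 16
registers[12] = 0x5000                # base register loaded at program entry
operand = encode_operand(12, 0x02C)   # same two bytes wherever the code runs
```

If the program is loaded elsewhere, only the value placed in register 12 changes. The encoded operand stays the same, which is what makes the instruction stream independent of the load address.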


Technical details

Procedure calls inside a shared library are typically made through small procedure linkage table stubs, which then call the definitive function. This notably allows a shared library to inherit certain function calls from previously loaded libraries rather than using its own versions.

Data references from position-independent code are usually made indirectly, through Global Offset Tables (GOTs), which store the addresses of all accessed global variables. There is one GOT per compilation unit or object module, and it is located at a fixed offset from the code (although this offset is not known until the library is linked). When a linker links modules to create a shared library, it merges the GOTs and sets the final offsets in code. It is not necessary to adjust the offsets when loading the shared library later.

Position-independent functions accessing global data start by determining the absolute address of the GOT given their own current program counter value. This often takes the form of a fake function call in order to obtain the return address on the stack (x86), in a specific standard register (SPARC, MIPS), or in a special register (POWER/PowerPC/Power ISA), which can then be moved to a predefined standard register, or to obtain it directly in that standard register (PA-RISC, Alpha, ESA/390 and z/Architecture). Some processor architectures, such as the Motorola 68000, Motorola 6809, WDC 65C816, Knuth's MMIX, ARM and x86-64, allow referencing data by offset from the program counter. This is specifically targeted at making position-independent code smaller and less register-demanding, and hence more efficient.


Windows DLLs

Dynamic-link libraries (DLLs) in Microsoft Windows use variant E8 of the CALL instruction (call near, relative, displacement relative to next instruction). These instructions need not be fixed up when a DLL is loaded.

Some global variables (e.g. arrays of string literals, virtual function tables) are expected to contain the address of an object in the data section or the code section of the dynamic library, respectively; therefore, the address stored in the global variable must be updated to reflect the address at which the DLL was loaded. The dynamic loader calculates the address referred to by a global variable and stores the value in that global variable; this triggers copy-on-write of the memory page containing the variable. Pages with code and pages with global variables that do not contain pointers to code or global data remain shared between processes. This operation must be done in any OS that can load a dynamic library at an arbitrary address.

In Windows Vista and later versions of Windows, the relocation of DLLs and executables is done by the kernel memory manager, which shares the relocated binaries across multiple processes. Images are always relocated from their preferred base addresses, achieving address space layout randomization (ASLR).

Versions of Windows prior to Vista require that system DLLs be prelinked at non-conflicting fixed addresses at link time in order to avoid runtime relocation of images. Runtime relocation in these older versions of Windows is performed by the DLL loader within the context of each process, and the resulting relocated portions of each image can no longer be shared between processes.

The handling of DLLs in Windows differs from the earlier OS/2 procedure it derives from. OS/2 presents a third alternative: it attempts to load DLLs that are not position-independent into a dedicated "shared arena" in memory, and maps them once they are loaded. All users of the DLL are able to use the same in-memory copy.
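The two cases described above (relative calls that need no fix-up, and stored pointers that do) can be illustrated with a short sketch. It assumes 32-bit little-endian pointers and a flat list of offsets to patch. Real relocation tables are more involved, and the function names encode_call_rel32 and apply_base_relocations are hypothetical.

```python
import struct


def encode_call_rel32(call_address, target_address):
    """Bytes of an x86 near relative CALL: opcode E8 plus a signed 32-bit
    displacement measured from the next instruction (5 bytes further on)."""
    displacement = target_address - (call_address + 5)
    return b"\xE8" + struct.pack("<i", displacement)


def apply_base_relocations(image, reloc_offsets, preferred_base, actual_base):
    """Add the load delta to each 32-bit pointer stored at the given offsets."""
    delta = actual_base - preferred_base
    patched = bytearray(image)
    for offset in reloc_offsets:
        (pointer,) = struct.unpack_from("<I", patched, offset)
        struct.pack_into("<I", patched, offset, (pointer + delta) & 0xFFFFFFFF)
    return bytes(patched)
```

Because caller and callee move together, encode_call_rel32(0x10001000, 0x10002000) and encode_call_rel32(0x20001000, 0x20002000) produce identical bytes. A stored pointer such as 0x10003000, by contrast, has to be rewritten to 0x20003000 when the image moves from base 0x10000000 to 0x20000000, and it is this write that triggers copy-on-write of the page.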


Multics

In Multics each procedure conceptually has a code segment and a linkage segment. The code segment contains only code, and the linkage section serves as a template for a new linkage segment. Pointer register 4 (PR4) points to the linkage segment of the procedure. A call to a procedure saves PR4 in the stack before loading it with a pointer to the callee's linkage segment. The procedure call uses an indirect pointer pair with a flag to cause a trap on the first call, so that the dynamic linkage mechanism can add the new procedure and its linkage segment to the Known Segment Table (KST), construct a new linkage segment, put their segment numbers in the caller's linkage section, and reset the flag in the indirect pointer pair.
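The trap-on-first-call idea can be sketched in a very reduced form. This model keeps only the flag and the one-time resolution; it leaves out segments, PR4 and the Known Segment Table, and the class and attribute names (Link, DynamicLinker, fault, faults) are hypothetical.

```python
# Reduced model of linkage with a fault flag that traps on the first call.
# Segments, PR4 and the Known Segment Table are deliberately left out.


class Link:
    """One indirect pointer in a caller's linkage section."""

    def __init__(self, name):
        self.name = name
        self.fault = True    # flag set: the first use must trap to the linker
        self.target = None   # filled in when the link is resolved


class DynamicLinker:
    def __init__(self, procedures):
        self.procedures = procedures  # symbolic name -> callable
        self.faults = 0               # number of traps taken so far

    def call(self, link, *args):
        if link.fault:
            # First call: resolve the name, store the target, reset the flag.
            self.faults += 1
            link.target = self.procedures[link.name]
            link.fault = False
        return link.target(*args)
```

Calling through the same Link twice takes the trap only once; later calls go straight to the stored target.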


TSS

In the IBM S/360 Time Sharing System (TSS/360 and TSS/370) each procedure may have a read-only public CSECT and a writable private Prototype Section (PSECT). A caller loads a V-constant for the routine into General Register 15 (GR15) and copies an R-constant for the routine's PSECT into the 19th word of the save area pointed to by GR13. The Dynamic Loader does not load program pages or resolve address constants until the first page fault.


Position-independent executables

Position-independent executables (PIE) are executable binaries made entirely from position-independent code. While some systems only run PIC executables, there are other reasons they are used. PIE binaries are used in some security-focused Linux distributions to allow PaX or Exec Shield to use address space layout randomization, which prevents attackers from knowing where existing executable code is located during attacks that rely on knowing the offset of the executable code in the binary, such as return-to-libc attacks.

Apple's macOS and iOS fully support PIE executables as of versions 10.7 and 4.3, respectively; a warning is issued when non-PIE iOS executables are submitted for approval to Apple's App Store, but there is no hard requirement yet and non-PIE applications are not rejected.

OpenBSD has PIE enabled by default on most architectures since OpenBSD 5.3, released on 1 May 2013. Support for PIE in statically linked binaries, such as the executables in the /bin and /sbin directories, was added near the end of 2014. openSUSE added PIE as a default in February 2015. Beginning with Fedora 23, Fedora maintainers decided to build packages with PIE enabled as the default. Ubuntu 17.10 has PIE enabled by default across all architectures. Gentoo's new profiles now support PIE by default. Around July 2017, Debian enabled PIE by default. Android enabled support for PIEs in Jelly Bean and removed non-PIE linker support in Lollipop.
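On ELF-based systems such as the Linux distributions listed above, one rough way to tell how an executable was linked is to look at the e_type field of the ELF header. The sketch below is only a heuristic offered as an assumption, not something the text above describes: PIE executables are recorded as ET_DYN, but so are shared libraries, so e_type alone cannot tell the two apart. The function name elf_type is hypothetical.

```python
import struct

ET_EXEC = 2  # executable linked for a fixed address
ET_DYN = 3   # shared object; PIE executables also use this type


def elf_type(header: bytes) -> str:
    """Classify an ELF file from the first 18 bytes of its header."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[5] (EI_DATA): 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is the 16-bit field that follows the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    if e_type == ET_EXEC:
        return "ET_EXEC (fixed load address)"
    if e_type == ET_DYN:
        return "ET_DYN (PIE or shared object)"
    return f"other ({e_type})"
```

A caller would typically pass in the first bytes of a file, for example the result of open(path, "rb").read(18), where path is whatever binary is being inspected.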


See also

* Dynamic linker
* Object file
* Code segment


External links


* Introduction to Position Independent Code
* Position Independent Code internals
* The Curious Case of Position Independent Executables