HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, ...
, position-independent code (PIC) or position-independent executable (PIE) is a body of
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ver ...
that, being placed somewhere in the primary memory, executes properly regardless of its
absolute address In computing, a memory address is a reference to a specific memory location used at various levels by software and hardware. Memory addresses are fixed-length sequences of digits conventionally displayed and manipulated as unsigned integers. Su ...
. PIC is commonly used for
shared libraries In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and su ...
, so that the same library code can be loaded in a location in each program address space where it does not overlap with other memory in use (for example, other shared libraries). PIC was also used on older computer systems that lacked an MMU, so that the
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ef ...
could keep applications away from each other even within the single
address space In computing, an address space defines a range of discrete addresses, each of which may correspond to a network host, peripheral device, disk sector, a memory cell or other logical or physical entity. For software programs to save and retrieve ...
of an MMU-less system. Position-independent code can be executed at any memory address without modification. This differs from absolute code, which must be loaded at a specific location to function correctly, and
load-time locatable Relocation is the process of assigning load addresses for position-dependent code and data of a program and adjusting the code and data to reflect the assigned addresses. Prior to the advent of multiprocess systems, and still in many embedded ...
(LTL) code, in which a
linker Linker or linkers may refer to: Computing * Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...
or program loader modifies a program before execution so it can be run only from a particular memory location. Generating position-independent code is often the default behavior for
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
s, but they may place restrictions on the use of some language features, such as disallowing use of absolute addresses (position-independent code has to use relative addressing). Instructions that refer directly to specific memory addresses sometimes execute faster, and replacing them with equivalent relative-addressing instructions may result in slightly slower execution, although modern processors make the difference practically negligible.


History

In early computers such as the
IBM 701 The IBM 701 Electronic Data Processing Machine, known as the Defense Calculator while in development, was IBM’s first commercial scientific computer and its first series production mainframe computer, which was announced to the public on May ...
(29 April 1952) or the
UNIVAC I The UNIVAC I (Universal Automatic Computer I) was the first general-purpose electronic digital computer design for business application produced in the United States. It was designed principally by J. Presper Eckert and John Mauchly, the invent ...
(31 March 1951) code was position-dependent: each program was built to load into and run from a particular address. Those early computers did not have an operating system and were not multitasking-capable. Programs were loaded into main storage (or even stored on magnetic drum for execution directly from there) and run one at a time. In such an operational context, position-independent code was not necessary. The
IBM System/360 The IBM System/360 (S/360) is a family of mainframe computer systems that was announced by IBM on April 7, 1964, and delivered between 1965 and 1978. It was the first family of computers designed to cover both commercial and scientific applic ...
(7 April 1964) was designed with truncated addressing similar to that of the UNIVAC III, with code position independence in mind. In truncated addressing, memory addresses are calculated from a ''base register'' and an offset. At the beginning of a program, the programmer must establish ''addressability'' by loading a base register; normally the programmer also informs the assembler with a ''USING'' pseudo-op. The programmer can load the base register from a register known to contain the entry point address, typically R15, or can use th
BALR (Branch And Link, Register form)
instruction (with a R2 Value of 0) to store the next sequential instruction's address into the base register, which was then coded explicitly or implicitly in each instruction that referred to a storage location within the program. Multiple base registeres could be used, for code or for data. Such instructions require less memory because they do not have to hold a full 24, 31, 32, or 64 bit address (4 or 8 bytes), but instead a base register number (encoded in 4 bits) and a 12–bit address offset (encoded in 12 bits), requiring only two bytes. This programming technique is standard on IBM S/360 type systems. It has been in use through to today's IBM System/z. When coding in assembly language, the programmer has to establish addressability for the program as described above and also use other base registers for dynamically allocated storage. Compilers automatically take care of this kind of addressing. IBM's early operating system
DOS/360 Disk Operating System/360, also DOS/360, or simply DOS, is the discontinued first member of a sequence of operating systems for IBM System/360, System/370 and later mainframes. It was announced by IBM on the last day of 1964, and it was first d ...
(1966) was not using virtual storage (since the early models of System S/360 did not support it), but it did have the ability to place programs to an arbitrary (or automatically chosen) storage location during loading via the PHASE name,* JCL (Job Control Language) statement. So, on S/360 systems without virtual storage, a program could be loaded at any storage location, but this required a contiguous memory area large enough to hold that program. Sometimes memory fragmentation would occur from loading and unloading differently sized modules. Virtual storage - by design - does not have that limitation. While DOS/360 and
OS/360 OS/360, officially known as IBM System/360 Operating System, is a discontinued batch processing operating system developed by IBM for their then-new System/360 mainframe computer, announced in 1964; it was influenced by the earlier IBSYS/IBJOB ...
did not support PIC, transient SVC routines in OS/360 could not contain relocatable address constants and could run in any of the transient areas without relocation. Virtual storage was first introduced on
IBM System/360 model 67 IBM mainframes are large computer systems produced by IBM since 1952. During the 1960s and 1970s, IBM dominated the large computer market. Current mainframe computers in IBM's line of business computers are developments of the basic design of th ...
in (1965) to support IBM's first multi-tasking operating and time-sharing operating system TSS/360. Later versions of DOS/360 (DOS/VS etc.) and later IBM operating systems all utilized virtual storage. Truncated addressing remained as part of the base architecture, and still advantageous when multiple modules must be loaded into the same virtual address space. By way of comparison, on early segmented systems such as
Burroughs MCP The MCP (Master Control Program) is the operating system of the Burroughs small, medium and large systems, including the Unisys Clearpath/MCP systems. MCP was originally written in 1961 in ESPOL (Executive Systems Problem Oriented Language). I ...
on the
Burroughs B5000 The Burroughs Large Systems Group produced a family of large 48-bit mainframes using stack machine instruction sets with dense syllables.E.g., 12-bit syllables for B5000, 8-bit syllables for B6500 The first machine in the family was the B5000 in 1 ...
(1961) and
Multics Multics ("Multiplexed Information and Computing Service") is an influential early time-sharing operating system based on the concept of a single-level memory.Dennis M. Ritchie, "The Evolution of the Unix Time-sharing System", Communications of ...
(1964), paging systems such as IBM TSS/360 (1967) or base and bounds systems such as
GECOS General Comprehensive Operating System (GCOS, ; originally GECOS, General Electric Comprehensive Operating Supervisor) is a family of operating systems oriented toward the 36-bit GE/Honeywell mainframe computers. The original version of GCOS wa ...
on the GE 625 and EXEC on the
UNIVAC 1107 The UNIVAC 1100/2200 series is a series of compatible 36-bit computer systems, beginning with the UNIVAC 1107 in 1962, initially made by Sperry Rand. The series continues to be supported today by Unisys Corporation as the ClearPath Dorado Series. ...
, code was also inherently position-independent, since addresses in a program were relative to the current segment rather than absolute. The invention of dynamic address translation (the function provided by an MMU) originally reduced the need for position-independent code because every process could have its own independent
address space In computing, an address space defines a range of discrete addresses, each of which may correspond to a network host, peripheral device, disk sector, a memory cell or other logical or physical entity. For software programs to save and retrieve ...
(range of addresses). However, multiple simultaneous jobs using the same code created a waste of physical memory. If two jobs run entirely identical programs, dynamic address translation provides a solution by allowing the system simply to map two different jobs' address 32K to the same bytes of real memory, containing the single copy of the program. Different programs may share common code. For example, the payroll program and the accounts receivable program may both contain an identical sort subroutine. A shared module (a shared library is a form of shared module) gets loaded once and mapped into the two address spaces.


Technical details

Procedure calls inside a shared library are typically made through small procedure linkage table stubs, which then call the definitive function. This notably allows a shared library to inherit certain function calls from previously loaded libraries rather than using its own versions. Data references from position-independent code are usually made indirectly, through Global Offset Tables (GOTs), which store the addresses of all accessed
global variable In computer programming, a global variable is a variable with global scope, meaning that it is visible (hence accessible) throughout the program, unless shadowed. The set of all global variables is known as the ''global environment'' or ''global ...
s. There is one GOT per compilation unit or object module, and it is located at a fixed offset from the code (although this offset is not known until the library is linked). When a
linker Linker or linkers may refer to: Computing * Linker (computing), a computer program that takes one or more object files generated by a compiler or generated by an assembler and links them with libraries, generating an executable program or shar ...
links modules to create a shared library, it merges the GOTs and sets the final offsets in code. It is not necessary to adjust the offsets when loading the shared library later. Position independent functions accessing global data start by determining the absolute address of the GOT given their own current program counter value. This often takes the form of a fake function call in order to obtain the return value on stack ( x86), in a specific standard register (
SPARC SPARC (Scalable Processor Architecture) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system developed ...
, MIPS), or a special register ( POWER/
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple– IBM ...
/
Power ISA Power ISA is a reduced instruction set computer (RISC) instruction set architecture (ISA) currently developed by the OpenPOWER Foundation, led by IBM. It was originally developed by IBM and the now-defunct Power.org industry group. Power ISA ...
), which can then be moved to a predefined standard register, or to obtain it into that standard register (
PA-RISC PA-RISC is an instruction set architecture (ISA) developed by Hewlett-Packard. As the name implies, it is a reduced instruction set computer (RISC) architecture, where the PA stands for Precision Architecture. The design is also referred to a ...
, Alpha, ESA/390 and z/Architecture). Some processor architectures, such as the
Motorola 68000 The Motorola 68000 (sometimes shortened to Motorola 68k or m68k and usually pronounced "sixty-eight-thousand") is a 16/32-bit complex instruction set computer (CISC) microprocessor, introduced in 1979 by Motorola Semiconductor Products Sect ...
,
Motorola 6809 The Motorola 6809 ("''sixty-eight-oh-nine''") is an 8-bit microprocessor with some 16-bit features. It was designed by Motorola's Terry Ritter and Joel Boney and introduced in 1978. Although source compatible with the earlier Motorola 6800, the 6 ...
, WDC 65C816, Knuth's
MMIX MMIX (pronounced ''em-mix'') is a 64-bit reduced instruction set computing (RISC) architecture designed by Donald Knuth, with significant contributions by John L. Hennessy (who contributed to the design of the MIPS architecture) and Richard L. ...
, ARM and
x86-64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging ...
allow referencing data by offset from the
program counter The program counter (PC), commonly called the instruction pointer (IP) in Intel x86 and Itanium microprocessors, and sometimes called the instruction address register (IAR), the instruction counter, or just part of the instruction sequencer, is ...
. This is specifically targeted at making position-independent code smaller, less register demanding and hence more efficient.


Windows DLLs

Dynamic-link libraries (DLLs) in Microsoft Windows use variant E8 of the CALL instruction (Call near, relative, displacement relative to next instruction). These instructions need not be fixed up when a DLL is loaded. Some global variables (e.g. arrays of string literals, virtual function tables) are expected to contain an address of an object in data section respectively in code section of the dynamic library; therefore, the stored address in the global variable must be updated to reflect the address where the DLL was loaded to. The dynamic loader calculates the address referred to by a global variable and stores the value in such global variable; this triggers copy-on-write of a memory page containing such global variable. Pages with code and pages with global variables that do not contain pointers to code or global data remain shared between processes. This operation must be done in any OS that can load a dynamic library at arbitrary address. In Windows Vista and later versions of Windows, the relocation of DLLs and executables is done by the kernel memory manager, which shares the relocated binaries across multiple processes. Images are always relocated from their preferred base addresses, achieving
address space layout randomization Address space layout randomization (ASLR) is a computer security technique involved in preventing exploitation of memory corruption vulnerabilities. In order to prevent an attacker from reliably jumping to, for example, a particular exploited ...
(ASLR). Versions of Windows prior to Vista require that system DLLs be prelinked at non-conflicting fixed addresses at the link time in order to avoid runtime relocation of images. Runtime relocation in these older versions of Windows is performed by the DLL loader within the context of each process, and the resulting relocated portions of each image can no longer be shared between processes. The handling of DLLs in Windows differs from the earlier
OS/2 OS/2 (Operating System/2) is a series of computer operating systems, initially created by Microsoft and IBM under the leadership of IBM software designer Ed Iacobucci. As a result of a feud between the two companies over how to position OS/2 ...
procedure it derives from. OS/2 presents a third alternative and attempts to load DLLs that are not position-independent into a dedicated "shared arena" in memory, and maps them once they are loaded. All users of the DLL are able to use the same in-memory copy.


Multics

In
Multics Multics ("Multiplexed Information and Computing Service") is an influential early time-sharing operating system based on the concept of a single-level memory.Dennis M. Ritchie, "The Evolution of the Unix Time-sharing System", Communications of ...
each procedure conceptually has a code segment and a linkage segment. The code segment contains only code and the linkage section serves as a template for a new linkage segment. Pointer register 4 (PR4) points to the linkage segment of the procedure. A call to a procedure saves PR4 in the stack before loading it with a pointer to the callee's linkage segment. The procedure call uses an indirect pointer pair with a flag to cause a trap on the first call so that the dynamic linkage mechanism can add the new procedure and its linkage segment to the Known Segment Table (KST), construct a new linkage segment, put their segment numbers in the caller's linkage section and reset the flag in the indirect pointer pair.


TSS

In IBM S/360 Time Sharing System (TSS/360 and TSS/370) each procedure may have a read-only public CSECT and a writable private Prototype Section (PSECT). A caller loads a V-constant for the routine into General Register 15 (GR15) and copies an R-constant for the routine's PSECT into the 19th word of the save area pointed to be GR13. The Dynamic Loader does not load program pages or resolve address constants until the first page fault.


Position-independent executables

''Position-independent executables'' (PIE) are executable binaries made entirely from position-independent code. While some systems only run PIC executables, there are other reasons they are used. PIE binaries are used in some security-focused
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which i ...
distributions to allow PaX or Exec Shield to use
address space layout randomization Address space layout randomization (ASLR) is a computer security technique involved in preventing exploitation of memory corruption vulnerabilities. In order to prevent an attacker from reliably jumping to, for example, a particular exploited ...
to prevent attackers from knowing where existing executable code is during a security attack using exploits that rely on knowing the offset of the executable code in the binary, such as return-to-libc attacks. Apple's
macOS macOS (; previously OS X and originally Mac OS X) is a Unix operating system developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac (computer), Mac computers. Within the market of ...
and iOS fully support PIE executables as of versions 10.7 and 4.3, respectively; a warning is issued when non-PIE iOS executables are submitted for approval to Apple's App Store but there's no hard requirement yet and non-PIE applications are not rejected.
OpenBSD OpenBSD is a security-focused operating system, security-focused, free and open-source, Unix-like operating system based on the Berkeley Software Distribution (BSD). Theo de Raadt created OpenBSD in 1995 by fork (software development), forking N ...
has PIE enabled by default on most architectures since OpenBSD 5.3, released on 1 May 2013. Support for PIE in
statically linked A stand-alone program, also known as a freestanding program, is a computer program that does not load any external module, library function or program and that is designed to boot with the bootstrap procedure of the target processor – it runs o ...
binaries, such as the executables in /bin and /sbin directories, was added near the end of 2014. openSUSE added PIE as a default in 2015-02. Beginning with
Fedora A fedora () is a hat with a soft brim and indented crown.Kilgour, Ruth Edwards (1958). ''A Pageant of Hats Ancient and Modern''. R. M. McBride Company. It is typically creased lengthwise down the crown and "pinched" near the front on both sides ...
 23, Fedora maintainers decided to build packages with PIE enabled as the default.
Ubuntu Ubuntu ( ) is a Linux distribution based on Debian and composed mostly of free and open-source software. Ubuntu is officially released in three editions: '' Desktop'', '' Server'', and ''Core'' for Internet of things devices and robots. All th ...
17.10 has PIE enabled by default across all architectures. Gentoo's new profiles now support PIE by default. Around July 2017,
Debian Debian (), also known as Debian GNU/Linux, is a Linux distribution composed of free and open-source software, developed by the community-supported Debian Project, which was established by Ian Murdock on August 16, 1993. The first version of De ...
enabled PIE by default.
Android Android may refer to: Science and technology * Android (robot), a humanoid robot or synthetic organism designed to imitate a human * Android (operating system), Google's mobile operating system ** Bugdroid, a Google mascot sometimes referred to ...
enabled support for PIEs in Jelly Bean and removed non-PIE linker support in
Lollipop A lollipop is a type of sugar candy usually consisting of hard candy mounted on a stick and intended for sucking or licking. Different informal terms are used in different places, including lolly, sucker, sticky-pop, etc. Lollipops are av ...
.


See also

*
Dynamic linker In computing, a dynamic linker is the part of an operating system that loads and links the shared libraries needed by an executable when it is executed (at " run time"), by copying the content of libraries from persistent storage to RAM, filli ...
*
Object file An object file is a computer file containing object code, that is, machine code output of an assembler or compiler. The object code is usually relocatable, and not usually directly executable. There are various formats for object files, and the ...
*
Code segment In computing, a code segment, also known as a text segment or simply as text, is a portion of an object file or the corresponding section of the program's virtual address space that contains executable instructions. Segment The term "segment ...


Notes


References


External links


Introduction to Position Independent Code

Position Independent Code internals



The Curious Case of Position Independent Executables
{{application binary interface Operating system technology Computer libraries Computer file formats