HOME

TheInfoList



OR:

A fat binary (or multiarchitecture binary) is a computer
executable program In computing, executable code, an executable file, or an executable program, sometimes simply referred to as an executable or binary, causes a computer "to perform indicated tasks according to encoded instructions", as opposed to a data file ...
or
library A library is a collection of materials, books or media that are accessible for use and not just for display purposes. A library provides physical (hard copies) or digital access (soft copies) materials, and may be a physical location or a vir ...
which has been expanded (or "fattened") with code native to multiple
instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called ...
s which can consequently be run on multiple processor types. This results in a file larger than a normal one-architecture binary file, thus the name. The usual method of implementation is to include a version of the
machine code In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a ve ...
for each instruction set, preceded by a single
entry point In computer programming, an entry point is the place in a program where the execution of a program begins, and where the program has access to command line arguments. To start a program's execution, the loader or operating system passes contr ...
with code compatible with all operating systems, which executes a jump to the appropriate section. Alternative implementations store different executables in different
forks In cutlery or kitchenware, a fork (from la, furca 'pitchfork') is a utensil, now usually made of metal, whose long handle terminates in a head that branches into several narrow and often slightly curved tines with which one can spear foods eit ...
, each with its own entry point that is directly used by the operating system. The use of fat binaries is not common in
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ef ...
software; there are several alternatives to solve the same problem, such as the use of an
installer Installation (or setup) of a computer program (including device drivers and plugins), is the act of making the program ready for execution. Installation refers to the particular configuration of a software or hardware with a view to making it u ...
program to choose an architecture-specific binary at install time (such as with Android multiple APKs), selecting an architecture-specific binary at runtime (such as with Plan 9's union directories and
GNUstep GNUstep is a free software implementation of the Cocoa (formerly OpenStep) Objective-C frameworks, widget toolkit, and application development tools for Unix-like operating systems and Microsoft Windows. It is part of the GNU Project. GNUste ...
's fat bundles), distributing software in
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the w ...
form and
compiling In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
it in-place, or the use of a
virtual machine In computing, a virtual machine (VM) is the virtualization/emulation of a computer system. Virtual machines are based on computer architectures and provide functionality of a physical computer. Their implementations may involve specialized hardw ...
(such as with
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's most ...
) and
just-in-time compilation In computing, just-in-time (JIT) compilation (also dynamic translation or run-time compilations) is a way of executing computer code that involves compilation during execution of a program (at run time) rather than before execution. This may co ...
.


Apollo


Apollo's compound executables

In 1988,
Apollo Computer Apollo Computer Inc., founded in 1980 in Chelmsford, Massachusetts, by William Poduska (a founder of Prime Computer) and others, developed and produced Apollo/Domain workstations in the 1980s. Along with Symbolics and Sun Microsystems, Apollo was ...
's
Domain/OS Domain/OS is the discontinued operating system used by the Apollo/Domain line of workstations manufactured by Apollo Computer. It was originally launched in 1981 as AEGIS, and was rebranded to Domain/OS in 1988 when Unix environments were added t ...
SR10.1 introduced a new file type, "cmpexe" (compound executable), that bundled binaries for
Motorola 680x0 The Motorola 68000 series (also known as 680x0, m68000, m68k, or 68k) is a family of 32-bit complex instruction set computer (CISC) microprocessors. During the 1980s and early 1990s, they were popular in personal computers and workstations and ...
and
Apollo PRISM PRISM (Parallel Reduced Instruction Set Multiprocessor) was Apollo Computer's high-performance CPU used in their DN10000 series workstations. It was for some time the fastest microprocessor available, a high fraction of a Cray-1 in a workstatio ...
executables.


Apple


Apple's fat binary

A fat-binary scheme smoothed the
Apple Macintosh The Mac (known as Macintosh until 1999) is a family of personal computers designed and marketed by Apple Inc. Macs are known for their ease of use and minimalist designs, and are popular among students, creative professionals, and software en ...
's transition, beginning in 1994, from
68k The Motorola 68000 series (also known as 680x0, m68000, m68k, or 68k) is a family of 32-bit complex instruction set computer (CISC) microprocessors. During the 1980s and early 1990s, they were popular in personal computers and workstations and w ...
microprocessors to
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple– IBM� ...
microprocessors. Many applications for the old platform ran transparently on the new platform under an evolving emulation scheme, but emulated code generally runs slower than native code. Applications released as "fat binaries" took up more storage space, but they ran at full speed on either platform. This was achieved by packaging both a
68000 The Motorola 68000 (sometimes shortened to Motorola 68k or m68k and usually pronounced "sixty-eight-thousand") is a 16/32-bit complex instruction set computer (CISC) microprocessor, introduced in 1979 by Motorola Semiconductor Products Sector ...
-compiled version and a PowerPC-compiled version of the same program into their executable files. The older 68K code (CFM-68K or classic 68K) continued to be stored in the
resource fork The resource fork is a fork or section of a file on Apple's classic Mac OS operating system, which was also carried over to the modern macOS for compatibility, used to store structured data along with the unstructured data stored within the data f ...
, while the newer PowerPC code was contained in the
data fork In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted. ...
, in PEF format. Fat binaries were larger than programs supporting only the PowerPC or 68k, which led to the creation of a number of utilities that would strip out the unneeded version. In the era of small
hard drive A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating platters coated with magneti ...
s, when 80 MB hard drives were a common size, these utilities were sometimes useful, as program code was generally a large percentage of overall drive usage, and stripping the unneeded members of a fat binary would free up a significant amount of space on a hard drive.


NeXT's/Apple's multi-architecture binaries


NeXTSTEP Multi-Architecture Binaries

Fat binaries were a feature of
NeXT Next may refer to: Arts and entertainment Film * ''Next'' (1990 film), an animated short about William Shakespeare * ''Next'' (2007 film), a sci-fi film starring Nicolas Cage * '' Next: A Primer on Urban Painting'', a 2005 documentary film Lit ...
's
NeXTSTEP NeXTSTEP is a discontinued object-oriented, multitasking operating system based on the Mach kernel and the UNIX-derived BSD. It was developed by NeXT Computer in the late 1980s and early 1990s and was initially used for its range of proprieta ...
/OPENSTEP operating system, starting with NeXTSTEP 3.1. In NeXTSTEP, they were called "Multi-Architecture Binaries". Multi-Architecture Binaries were originally intended to allow software to be compiled to run both on NeXT's Motorola 68k-based hardware and on Intel
IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnation of ...
-based PCs running NeXTSTEP, with a single binary file for both platforms. It was later used to allow OPENSTEP applications to run on PCs and the various
RISC In computer engineering, a reduced instruction set computer (RISC) is a computer designed to simplify the individual instructions given to the computer to accomplish tasks. Compared to the instructions given to a complex instruction set comput ...
platforms OPENSTEP supported. Multi-Architecture Binary files are in a special archive format, in which a single file stores one or more
Mach-O Mach-O, short for Mach object file format, is a file format for executables, object code, shared libraries, dynamically-loaded code, and core dumps. It was developed to replace the a.out format. Mach-O is used by some systems based on the Mac ...
subfiles for each architecture supported by the Multi-Architecture Binary. Every Multi-Architecture Binary starts with a structure () containing two unsigned integers. The first integer ("magic") is used as a magic number to identify this file as a Fat Binary. The second integer () defines how many Mach-O Files the archive contains (how many instances of the same program for different architectures). After this header, there are ' number of fat_arch structures (). This structure defines the offset (from the start of the file) at which to find the file, the alignment, the size and the CPU type and subtype which the Mach-O binary (within the archive) is targeted at. The version of the
GNU Compiler Collection The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free softwa ...
shipped with the Developer Tools was able to
cross-compile A cross compiler is a compiler capable of creating executable code for a platform other than the one on which the compiler is running. For example, a compiler that runs on a PC but generates code that runs on an Android smartphone is a cross ...
source code for the different architectures on which
NeXTStep NeXTSTEP is a discontinued object-oriented, multitasking operating system based on the Mach kernel and the UNIX-derived BSD. It was developed by NeXT Computer in the late 1980s and early 1990s and was initially used for its range of proprieta ...
was able to run. For example, it was possible to choose the target architectures with multiple '-arch' options (with the architecture as argument). This was a convenient way to distribute a program for NeXTStep running on different architectures. It was also possible to create libraries (e.g. using NeXTStep's ) with different targeted object files.


Mach-O and Mac OS X

Apple Computer acquired NeXT in 1996 and continued to work with the OPENSTEP code. Mach-O became the native object file format in Apple's free Darwin operating system (2000) and Apple's
Mac OS X macOS (; previously OS X and originally Mac OS X) is a Unix operating system developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac (computer), Mac computers. Within the market of ...
(2001), and NeXT's Multi-Architecture Binaries continued to be supported by the operating system. Under Mac OS X, Multi-Architecture Binaries can be used to support multiple variants of an architecture, for instance to have different versions of
32-bit In computer architecture, 32-bit computing refers to computer systems with a processor, memory, and other major system components that operate on data in 32-bit units. Compared to smaller bit widths, 32-bit computers can perform large calculati ...
code optimized for the
PowerPC G3 The PowerPC 7xx is a family of third generation 32-bit PowerPC microprocessors designed and manufactured by IBM and Motorola (spun off as Freescale Semiconductor bought by NXP Semiconductors). This family is called the PowerPC G3 by its well-kno ...
,
PowerPC G4 PowerPC G4 is a designation formerly used by Apple and Eyetech to describe a ''fourth generation'' of 32-bit PowerPC microprocessors. Apple has applied this name to various (though closely related) processor models from Freescale, a former part o ...
, and
PowerPC 970 The PowerPC 970, PowerPC 970FX, and PowerPC 970MP are 64-bit PowerPC processors from IBM introduced in 2002. When used in PowerPC-based Macintosh computers, Apple referred to them as the PowerPC G5. The 970 family was created through a colla ...
generations of processors. It can also be used to support multiple architectures, such as 32-bit and
64-bit In computer architecture, 64-bit integers, memory addresses, or other data units are those that are 64 bits wide. Also, 64-bit CPUs and ALUs are those that are based on processor registers, address buses, or data buses of that size. A comput ...
PowerPC, or PowerPC and
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
, or
x86-64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mod ...
and
ARM64 AArch64 or ARM64 is the 64-bit extension of the ARM architecture family. It was first introduced with the Armv8-A architecture. Arm releases a new extension every year. ARMv8.x and ARMv9.x extensions and features Announced in October 2011, AR ...
.


Apple's Universal binary

In 2005, Apple announced another transition, from PowerPC processors to Intel x86 processors. Apple promoted the distribution of new applications that support both PowerPC and x86 natively by using executable files in Multi-Architecture Binary format. Apple calls such programs " Universal applications" and calls the file format "
Universal binary The universal binary format is, in Apple parlance, a format for executable files that run natively on either PowerPC or Intel-manufactured IA-32 or Intel 64 or ARM64-based Macintosh computers. The format originated on NeXTStep as " Multi-Archit ...
" as perhaps a way to distinguish this new transition from the previous transition, or other uses of Multi-Architecture Binary format. Universal binary format was not necessary for forward migration of pre-existing native PowerPC applications; from 2006 to 2011, Apple supplied
Rosetta Rosetta or Rashid (; ar, رشيد ' ; french: Rosette  ; cop, ϯⲣⲁϣⲓⲧ ''ti-Rashit'', Ancient Greek: Βολβιτίνη ''Bolbitinē'') is a port city of the Nile Delta, east of Alexandria, in Egypt's Beheira governorate. The ...
, a PowerPC (PPC)-to-x86 dynamic binary translator, to play this role. However, Rosetta had a fairly steep performance overhead, so developers were encouraged to offer both PPC and Intel binaries, using Universal binaries. The obvious cost of Universal binary is that every installed executable file is larger, but in the years since the release of the PPC, hard-drive space has greatly outstripped executable size; while a Universal binary might be double the size of a single-platform version of the same application, free-space resources generally dwarf the code size, which becomes a minor issue. In fact, often a Universal-binary application will be smaller than two single-architecture applications because program resources can be shared rather than duplicated. If not all of the architectures are required, the and command-line applications can be used to remove versions from the Multi-Architecture Binary image, thereby creating what is sometimes called a ''thin binary''. In addition, Multi-Architecture Binary executables can contain code for both 32-bit and 64-bit versions of PowerPC and x86, allowing applications to be shipped in a form that supports 32-bit processors but that makes use of the larger address space and wider data paths when run on 64-bit processors. In versions of the
Xcode Xcode is Apple's integrated development environment (IDE) for macOS, used to develop software for macOS, iOS, iPadOS, watchOS, and tvOS. It was initially released in late 2003; the latest stable release is version 14.2, released on December 13, ...
development environment from 2.1 through 3.2 (running on Mac OS X 10.4 through
Mac OS X 10.6 Mac OS X Snow Leopard (version 10.6) is the seventh major release of macOS, Apple's desktop and server operating system for Macintosh computers. Snow Leopard was publicly unveiled on June 8, 2009 at Apple’s Worldwide Developers Conference. ...
), Apple included utilities which allowed applications to be targeted for both Intel and PowerPC architecture; universal binaries could eventually contain up to four versions of the executable code (32-bit PowerPC, 32-bit x86, 64-bit PowerPC, and 64-bit x86). However, PowerPC support was removed from Xcode 4.0 and is therefore not available to developers running Mac OS X 10.7 or greater. In 2020, Apple announced another transition, this time from Intel x86 processors to
Apple silicon Apple silicon is a series of system on a chip (SoC) and system in a package (SiP) processors designed by Apple Inc., mainly using the ARM architecture. It is the basis of most new Mac computers as well as iPhone, iPad, iPod Touch, Apple TV, a ...
(ARM64 architecture). To smooth the transition Apple added support for the Universal 2 binary format; Universal 2 binary files are Multi-Architecture Binary files containing both x86-64 and ARM64 executable code, allowing the binary to run natively on both 64-bit Intel and 64-bit Apple silicon. Additionally, Apple introduced
Rosetta 2 Rosetta is a dynamic binary translator developed by Apple Inc. for macOS, an application compatibility layer between different instruction set architectures. It enables a transition to newer hardware, by automatically translating software. The ...
dynamic binary translation for x86 to Arm64 instruction set to allow users to run applications that do not have Universal binary variants.


Apple Fat EFI binary

In 2006, Apple switched from
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple– IBM� ...
to
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
CPUs, and replaced
Open Firmware Open Firmware is a standard defining the interfaces of a computer firmware system, formerly endorsed by the Institute of Electrical and Electronics Engineers (IEEE). It originated at Sun Microsystems, where it was known as OpenBoot, and has bee ...
with EFI. However, by 2008, some of their Macs used 32-bit EFI and some used 64-bit EFI. For this reason, Apple extended the EFI specification with "fat" binaries that contained both 32-bit and 64-bit EFI binaries.


CP/M and DOS


Combined COM-style binaries for CP/M-80 and DOS

CP/M-80, MP/M-80,
Concurrent CP/M MP/M (Multi-Programming Monitor Control Program) is a discontinued multi-user version of the CP/M operating system, created by Digital Research developer Tom Rolander in 1979. It allowed multiple users to connect to a single computer, each us ...
, CP/M Plus, Personal CP/M-80, SCP and
MSX-DOS MSX-DOS is a discontinued disk operating system developed by Microsoft for the 8-bit home computer standard MSX, and is a cross between MS-DOS 1.25 and CP/M-80 2. MSX-DOS MSX-DOS and the extended BASIC with 3½-inch floppy disk support ...
executables for the
Intel 8080 The Intel 8080 (''"eighty-eighty"'') is the second 8-bit microprocessor designed and manufactured by Intel. It first appeared in April 1974 and is an extended and enhanced variant of the earlier 8008 design, although without binary compatibil ...
(and
Z80 The Z80 is an 8-bit microprocessor introduced by Zilog as the startup company's first product. The Z80 was conceived by Federico Faggin in late 1974 and developed by him and his 11 employees starting in early 1975. The first working samples we ...
) processor families use the same
.COM The domain name .com is a top-level domain (TLD) in the Domain Name System (DNS) of the Internet. Added at the beginning of 1985, its name is derived from the word ''commercial'', indicating its original intended purpose for domains registere ...
file extension A filename extension, file name extension or file extension is a suffix to the name of a computer file (e.g., .txt, .docx, .md). The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically d ...
as
DOS DOS is shorthand for the MS-DOS and IBM PC DOS family of operating systems. DOS may also refer to: Computing * Data over signalling (DoS), multiplexing data onto a signalling channel * Denial-of-service attack (DoS), an attack on a communicatio ...
-compatible operating systems for
Intel 8086 The 8086 (also called iAPX 86) is a 16-bit microprocessor chip designed by Intel between early 1976 and June 8, 1978, when it was released. The Intel 8088, released July 1, 1979, is a slightly modified chip with an external 8-bit data bus (allow ...
binaries. In both cases programs are loaded at offset +100h and executed by jumping to the first byte in the file. As the
opcode In computing, an opcode (abbreviated from operation code, also known as instruction machine code, instruction code, instruction syllable, instruction parcel or opstring) is the portion of a machine language instruction that specifies the opera ...
s of the two processor families are not compatible, attempting to start a program under the wrong operating system leads to incorrect and unpredictable behaviour. In order to avoid this, some methods have been devised to build fat binaries which contain both a CP/M-80 and a DOS program, preceded by initial code which is interpreted correctly on both platforms. The methods either combine two fully functional programs each built for their corresponding environment, or add
stub Stub or Stubb may refer to: Shortened objects and entities * Stub (stock), the portion of a corporation left over after most but not all of it has been bought out or spun out * Stub, a tree cut and allowed to regrow from the trunk; see Pollard ...
s which cause the program to exit gracefully if started on the wrong processor. For this to work, the first few instructions (sometimes also called '' gadget headers'') in the .COM file have to be valid code for both 8086 and 8080 processors, which would cause the processors to branch into different locations within the code. For example, the utilities in Simeon Cran's emulator MyZ80 start with the opcode sequence . An 8086 sees this as a jump and reads its next instruction from offset +154h whereas an 8080 or compatible processor goes straight through and reads its next instruction from +103h. A similar sequence used for this purpose is . John C. Elliott's FATBIN is a utility to combine a CP/M-80 and a DOS .COM file into one executable. His derivative of the original PMsfx modifies archives created by Yoshihiko Mino's
PMarc Executable compression is any means of compressing an executable file and combining the compressed data with decompression code into a single executable. When this compressed executable is executed, the decompression code recreates the original ...
to be self-extractable under ''both'', CP/M-80 and DOS, starting with to also include the "-pms-" signature for self-extracting PMA archives, thereby also representing a form of executable ASCII code. Another method to keep a DOS-compatible operating system from erroneously executing .COM programs for CP/M-80 and MSX-DOS machines is to start the 8080 code with , which is decoded as a "RET" instruction by x86 processors, thereby gracefully exiting the program, while it will be decoded as "JP 103h" instruction by 8080 processors and simply jump to the next instruction in the program. Similar, the CP/M assembler Z80ASM+ by SLR Systems would display an error message when erroneously run on DOS. Some CP/M-80 3.0 .COM files may have one or more RSX overlays attached to them by GENCOM. If so, they start with an extra 256-byte header (one
page Page most commonly refers to: * Page (paper), one side of a leaf of paper, as in a book Page, PAGE, pages, or paging may also refer to: Roles * Page (assistance occupation), a professional occupation * Page (servant), traditionally a young ma ...
). In order to indicate this, the first byte in the header is set to magic byte , which works both as a signature identifying this type of COM file to the CP/M 3.0 executable loader, as well as a "RET" instruction for 8080-compatible processors which leads to a graceful exit if the file is executed under older versions of CP/M-80. is never appropriate as the first byte of a program for any x86 processor (it has different meanings for different generations, but is never a meaningful first byte); the executable loader in some versions of DOS rejects COM files that start with , avoiding incorrect operation. Similar overlapping code sequences have also been devised for combined Z80/
6502 The MOS Technology 6502 (typically pronounced "sixty-five-oh-two" or "six-five-oh-two") William Mensch and the moderator both pronounce the 6502 microprocessor as ''"sixty-five-oh-two"''. is an 8-bit microprocessor that was designed by a small te ...
, 8086/
68000 The Motorola 68000 (sometimes shortened to Motorola 68k or m68k and usually pronounced "sixty-eight-thousand") is a 16/32-bit complex instruction set computer (CISC) microprocessor, introduced in 1979 by Motorola Semiconductor Products Sector ...
or x86/ MIPS/
ARM In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
binaries.


Combined binaries for CP/M-86 and DOS

CP/M-86 CP/M-86 was a version of the CP/M operating system that Digital Research (DR) made for the Intel 8086 and Intel 8088. The system commands are the same as in CP/M-80. Executable files used the relocatable .CMD file format. Digital Research also ...
and DOS do not share a common file extension for executables. Thus, it is not normally possible to confuse executables. However, early versions of DOS had so much in common with CP/M in terms of its architecture that some early DOS programs were developed to share binaries containing executable code. One program known to do this was WordStar 3.2x, which used identical overlay files in their ports for CP/M-86 and
MS-DOS MS-DOS ( ; acronym for Microsoft Disk Operating System, also known as Microsoft DOS) is an operating system for x86-based personal computers mostly developed by Microsoft. Collectively, MS-DOS, its rebranding as IBM PC DOS, and a few oper ...
, and used dynamically fixed-up code to adapt to the differing calling conventions of these operating systems at runtime.
Digital Research Digital Research, Inc. (DR or DRI) was a company created by Gary Kildall to market and develop his CP/M operating system and related 8-bit, 16-bit and 32-bit systems like MP/M, Concurrent DOS, FlexOS, Multiuser DOS, DOS Plus, DR DOS and G ...
's GSX for CP/M-86 and DOS also shares binary identical 16-bit drivers.


Combined COM and SYS files

DOS
device driver In computing, a device driver is a computer program that operates or controls a particular type of device that is attached to a computer or automaton. A driver provides a software interface to hardware devices, enabling operating systems and ...
s (typically with file extension
.SYS .sys is a filename extension used in MS-DOS applications and Microsoft Windows operating systems. They are system files that contain device drivers or hardware configurations for the system. Most DOS files are real mode device drivers. Certain ...
) start with a file header whose first four bytes are FFFFFFFFh by convention, although this is not a requirement. This is fixed up dynamically by the operating system when the driver loads (typically in the
DOS BIOS DOS is shorthand for the MS-DOS and IBM PC DOS family of operating systems. DOS may also refer to: Computing * Data over signalling (DoS), multiplexing data onto a signalling channel * Denial-of-service attack (DoS), an attack on a communicatio ...
when it executes DEVICE statements in
CONFIG.SYS CONFIG.SYS is the primary configuration file for the DOS and OS/2 operating systems. It is a special ASCII text file that contains user-accessible setup or configuration directives evaluated by the operating system's DOS BIOS (typically residing ...
). Since DOS does not reject files with a .COM extension to be loaded per DEVICE and does not test for FFFFFFFFh, it is possible to combine a COM program and a device driver into the same file by placing a jump instruction to the entry point of the embedded COM program within the first four bytes of the file (three bytes are usually sufficient). If the embedded program and the device driver sections share a common portion of code, or data, it is necessary for the code to deal with being loaded at offset +0100h as a .COM style program, and at +0000h as a device driver. For shared code loaded at the "wrong" offset but not designed to be position-independent, this requires an internal address fix-up similar to what would otherwise already have been carried out by a
relocating loader Relocation is the process of assigning load addresses for position-dependent code and data of a program and adjusting the code and data to reflect the assigned addresses. Prior to the advent of multiprocess systems, and still in many embedded ...
, except for that in this case it has to be done by the loaded program itself; this is similar to the situation with self-relocating drivers but with the program already loaded at the target location by the operating system's loader.


Crash-protected system files

Under DOS, some files, by convention, have file extensions which do not reflect their actual file type. For example, COUNTRY.SYS is not a DOS device driver, but a binary NLS database file for use with the CONFIG.SYS COUNTRY directive and the NLSFUNC driver. The PC DOS and
DR-DOS DR-DOS (written as DR DOS, without a hyphen, in versions up to and including 6.0) is a disk operating system for IBM PC compatibles. Upon its introduction in 1988, it was the first DOS attempting to be compatible with IBM PC DOS and MS-DO ...
system files
IBMBIO.COM IBMBIO.COM is a system file in many DOS operating systems. It contains the system initialization code and all built-in device drivers. It also loads the DOS kernel ( IBMDOS.COM) and optional pre-loadable system components (like for disk compre ...
and IBMDOS.COM are special binary images loaded by
bootstrap loader In computing, booting is the process of starting a computer as initiated via hardware such as a button or by a software command. After it is switched on, a computer's central processing unit (CPU) has no software in its main memory, so so ...
s, not COM-style programs. Trying to load COUNTRY.SYS with a DEVICE statement or executing IBMBIO.COM or IBMDOS.COM at the command prompt will cause unpredictable results. It is sometimes possible to avoid this by utilizing techniques similar to those described above. For example, DR-DOS 7.02 and higher incorporate a safety feature developed by Matthias R. Paul: If these files are called inappropriately, tiny embedded stubs will just display some file version information and exit gracefully. Additionally, the message is specifically crafted to follow certain "magic" patterns recognized by the external
NetWare NetWare is a discontinued computer network operating system developed by Novell, Inc. It initially used cooperative multitasking to run various services on a personal computer, using the IPX network protocol. The original NetWare product in ...
& DR-DOS
VERSION Version may refer to: Computing * Software version, a set of numbers that identify a unique evolution of a computer program * VERSION (CONFIG.SYS directive), a configuration directive in FreeDOS Music * Cover version * Dub version * Remix * '' ...
file identification utility. A similar protection feature was the 8080 instruction ("RST 0") at the very start of Jay Sage's and Joe Wright's Z-System type-3 and type-4 "Z3ENV" programs as well as "Z3TXT" language overlay files, which would result in a
warm boot In computing, rebooting is the process by which a running computer system is restarted, either intentionally or unintentionally. Reboots can be either a cold reboot (alternatively known as a hard reboot) in which the power to the system is phys ...
(instead of a crash) under CP/M-80 if loaded inappropriately. In a distantly similar fashion, many (binary)
file format A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free. Some file formats ...
s by convention include a byte (
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of ...
^Z) near the beginning of the file. This
control character In computing and telecommunication, a control character or non-printing character (NPC) is a code point (a number) in a character set, that does not represent a written symbol. They are used as in-band signaling to cause effects other than t ...
will be interpreted as "soft"
end-of-file In computing, end-of-file (EOF) is a condition in a computer operating system where no more data can be read from a data source. The data source is usually called a file or stream. Details In the C standard library, the character reading functio ...
(EOF) marker when a file is opened in non-binary mode, and thus, under many operating systems (including the
PDP-6 The PDP-6, short for Programmed Data Processor model 6, is a computer developed by Digital Equipment Corporation (DEC) during 1963 and first delivered in the summer of 1964. It was an expansion of DEC's existing 18-bit systems to use a 36-bit d ...
monitor and
RT-11 RT-11 (Real-time 11) is a discontinued small, low-end, single-user real-time operating system for the full line of Digital Equipment Corporation PDP-11 16-bit computers. RT-11 was first implemented in 1970. It was widely used for real-time computi ...
, VMS,
TOPS-10 TOPS-10 System (''Timesharing / Total Operating System-10'') is a discontinued operating system from Digital Equipment Corporation (DEC) for the PDP-10 (or DECsystem-10) mainframe computer family. Launched in 1967, TOPS-10 evolved from the earlie ...
, CP/M, DOS, and Windows), it prevents "binary garbage" from being displayed when a file is accidentally printed at the console.


Linux


FatELF: Universal binaries for Linux

FatELF was a fat binary implementation for
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which in ...
and other Unix-like operating systems. Technically, a FatELF binary was a concatenation of
ELF An elf () is a type of humanoid supernatural being in Germanic mythology and folklore. Elves appear especially in North Germanic mythology. They are subsequently mentioned in Snorri Sturluson's Icelandic Prose Edda. He distinguishes ...
binaries with some meta data indicating which binary to use on what architecture. Additionally to the CPU architecture abstraction (
byte order In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most sig ...
,
word size In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized datum handled as a unit by the instruction set or the hardware of the processor. The number of bits or digits in a word (the ''word ...
,
CPU A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and ...
instruction set, etc.), there is the advantage of binaries with support for multiple kernel ABIs and versions. FatELF has several use-cases, according to developers: * Distributions no longer need to have separate downloads for various platforms. * Separated ''/lib'', ''/lib32'' and ''/lib64'' trees are not required anymore in OS directory structure. * The correct binary and libraries are centrally chosen by the system instead of
shell script A shell script is a computer program designed to be run by a Unix shell, a command-line interpreter. The various dialects of shell scripts are considered to be scripting languages. Typical operations performed by shell scripts include file mani ...
s. * If the ELF ABI changes someday, legacy users can be still supported. * Distribution of web browser plug ins that work out of the box with multiple platforms. * Distribution of one application file that works across Linux and BSD OS variants, without a platform compatibility layer on them. * One hard drive partition can be booted on different machines with different CPU architectures, for development and experimentation. Same root file system, different kernel and CPU architecture. * Applications provided by network share or USB sticks, will work on multiple systems. This is also helpful for creating
portable application A portable application (portable app), sometimes also called standalone, is a program designed to read and write its configuration settings into an accessible folder in the computer, usually in the folder where the portable application can be ...
s and also cloud computing images for heterogeneous systems. A proof-of-concept Ubuntu 9.04 image is available. , FatELF has not been integrated into the mainline Linux kernel.


Windows


Fatpack

Although the
Portable Executable The Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems. The PE format is a data structure that encapsulates the information necessary fo ...
format used by Windows does not allow assigning code to platforms, it is still possible to make a loader program that dispatches based on architecture. This is because desktop versions of Windows on ARM have support for 32-bit
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...
emulation, making it a useful "universal" machine code target. Fatpack is a loader that demonstrates the concept: it includes a 32-bit x86 program that tries to run the executables packed into its resource sections one by one.


Similar concepts

The following approaches are similar to fat binaries in that multiple versions of machine code of the same purpose are provided in the same file.


Heterogeneous computing

Since 2007, some specialized compilers for heterogeneous platforms produce code files for parallel execution on multiple types of processors, i.e. the CHI ( C for Heterogeneous Integration) compiler from the
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
EXOCHI (Exoskeleton Sequencer) development suite extends the
OpenMP OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran, on many platforms, instruction-set architectures and operating sy ...
pragma concept for multithreading to produce fat binaries containing code sections for different
instruction set architecture In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
s (ISAs) from which the runtime loader can dynamically initiate the parallel execution on multiple available CPU and
GPU A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobil ...
cores in a heterogeneous system environment. Introduced in February 2007,
Nvidia Nvidia CorporationOfficially written as NVIDIA and stylized in its logo as VIDIA with the lowercase "n" the same height as the uppercase "VIDIA"; formerly stylized as VIDIA with a large italicized lowercase "n" on products from the mid 1990s to ...
's parallel computing platform
CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach c ...
(Compute Unified Device Architecture) is a software to enable general-purpose computing on GPUs (
GPGPU General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications tradition ...
). Its
LLVM LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate represe ...
-based compiler NVCC can create
ELF An elf () is a type of humanoid supernatural being in Germanic mythology and folklore. Elves appear especially in North Germanic mythology. They are subsequently mentioned in Snorri Sturluson's Icelandic Prose Edda. He distinguishes ...
-based fat binaries containing so called PTX virtual
assembly Assembly may refer to: Organisations and meetings * Deliberative assembly, a gathering of members who use parliamentary procedure for making decisions * General assembly, an official meeting of the members of an organization or of their representa ...
(as text) which the CUDA runtime driver can later just-in-time compile into some SASS (Streaming Assembler) binary executable code for the actually present target GPU. The executables can also include so called ''CUDA binaries'' (aka ''cubin'' files) containing dedicated executable code sections for one or more specific GPU architectures from which the CUDA runtime can choose from at load-time. Fat binaries are also supported by , a GPU
simulator A simulation is the imitation of the operation of a real-world process or system over time. Simulations require the use of models; the model represents the key characteristics or behaviors of the selected system or process, whereas the ...
introduced in 2007 as well. Multi2Sim (M2S), an
OpenCL OpenCL (Open Computing Language) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), field-prog ...
heterogeneous system simulator framework (originally only for either MIPS or x86 CPUs, but later extended to also support
ARM In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
CPUs and GPUs like the
AMD Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets. While it initially manufact ...
/
ATI Ati or ATI may refer to: * Ati people, a Negrito ethnic group in the Philippines **Ati language (Philippines), the language spoken by this people group ** Ati-Atihan festival, an annual celebration held in the Philippines *Ati language (China), a ...
Evergreen In botany, an evergreen is a plant which has foliage that remains green and functional through more than one growing season. This also pertains to plants that retain their foliage only in warm climates, and contrasts with deciduous plants, which ...
&
Southern Islands The Southern Islands is a planning area consisting of a collection of islets located within the Central Region of Singapore, once home to the native Malay islanders and sea nomads before they were relocated to the mainland for urban redevelop ...
as well as Nvidia Fermi &
Kepler Johannes Kepler (; ; 27 December 1571 – 15 November 1630) was a German astronomer, mathematician, astrologer, natural philosopher and writer on music. He is a key figure in the 17th-century Scientific Revolution, best known for his laws ...
families) supports ELF-based fat binaries as well.


Fat objects

GNU Compiler Collection The GNU Compiler Collection (GCC) is an optimizing compiler produced by the GNU Project supporting various programming languages, hardware architectures and operating systems. The Free Software Foundation (FSF) distributes GCC as free softwa ...
(GCC) and LLVM do not have a fat binary format, but they do have fat ''object files'' for
link-time optimization Interprocedural optimization (IPO) is a collection of compiler techniques used in computer programming to improve performance in programs containing many frequently used functions of small or medium length. IPO differs from other compiler optimiz ...
(LTO). Since LTO involves delaying the compilation to link-time, the
object file An object file is a computer file containing object code, that is, machine code output of an assembler or compiler. The object code is usually relocatable, and not usually directly executable. There are various formats for object files, and the ...
s must store the
intermediate representation An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation. A "good" ...
(IR), but on the other hand machine code may need to be stored too (for speed or compatibility). An LTO object containing both IR and machine code is known as a ''fat object''.


Function multi-versioning

Even in a program or
library A library is a collection of materials, books or media that are accessible for use and not just for display purposes. A library provides physical (hard copies) or digital access (soft copies) materials, and may be a physical location or a vir ...
intended for the same
instruction set architecture In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...
, a programmer may wish to make use of some newer instruction set extensions while keeping compatibility with an older CPU. This can be achieved with function multi-versioning (FMV): versions of the same function are written into the program, and a piece of code decides which one to use by detecting the CPU's capabilities (such as through
CPUID In the x86 architecture, the CPUID instruction (identified by a CPUID opcode) is a processor supplementary instruction (its name derived from CPU IDentification) allowing software to discover details of the processor. It was introduced by Intel i ...
).
Intel C++ Compiler Intel oneAPI DPC++/C++ Compiler and Intel C++ Compiler Classic are Intel’s C, C++, SYCL, and Data Parallel C++ (DPC++) compilers for Intel processor-based systems, available for Windows, Linux, and macOS operating systems. Overview Intel ...
, GCC, and LLVM all have the ability to automatically generate multi-versioned functions. This is a form of
dynamic dispatch In computer science, dynamic dispatch is the process of selecting which implementation of a polymorphic operation (method or function) to call at run time. It is commonly employed in, and considered a prime characteristic of, object-oriente ...
without any semantic effects. Many math libraries feature hand-written assembly routines that are automatically chosen according to CPU capability. Examples include
glibc The GNU C Library, commonly known as glibc, is the GNU Project's implementation of the C standard library. Despite its name, it now also directly supports C++ (and, indirectly, other programming languages). It was started in the 1980s by t ...
,
Intel MKL Intel oneAPI Math Kernel Library (Intel oneMKL; formerly Intel Math Kernel Library or Intel MKL) is a library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, ...
, and
OpenBLAS In scientific computing, OpenBLAS is an open-source implementation of the BLAS (Basic Linear Algebra Subprograms) and LAPACK APIs with many hand-crafted optimizations for specific processor types. It is developed at the Lab of Parallel Software a ...
. In addition, the library loader in glibc supports loading from alternative paths for specific CPU features. A similar, but byte-level granular approach originally devised by Matthias R. Paul and Axel C. Frinke is to let a small self-discarding,
relaxing Leisure has often been defined as a quality of experience or as free time. Free time is time spent away from business, work, job hunting, domestic chores, and education, as well as necessary activities such as eating and sleeping. Leisure ...
and
relocating loader Relocation is the process of assigning load addresses for position-dependent code and data of a program and adjusting the code and data to reflect the assigned addresses. Prior to the advent of multiprocess systems, and still in many embedded ...
embedded into the executable file alongside any number of alternative binary code snippets conditionally build a size- or speed-optimized runtime image of a program or driver necessary to perform (or not perform) a particular function in a particular target environment at
load-time In computer systems a loader is the part of an operating system that is responsible for loading programs and libraries. It is one of the essential stages in the process of starting a program, as it places programs into memory and prepares them ...
through a form of
dynamic dead code elimination In compiler theory, dead-code elimination (also known as DCE, dead-code removal, dead-code stripping, or dead-code strip) is a compiler optimization to remove code which does not affect the program results. Removing such code has several benefit ...
(DDCE).


See also

*
Cross-platform software In computing, cross-platform software (also called multi-platform software, platform-agnostic software, or platform-independent software) is computer software that is designed to work in several computing platforms. Some cross-platform software ...
*
DOS stub The New Executable (abbreviated NE or NewEXE) is a 16-bit .exe file format, a successor to the DOS MZ executable format. It was used in Windows 1.0–3.x, Windows 9x, multitasking MS-DOS 4.0, OS/2 1.x, and the OS/2 subset of Windows NT up to ve ...
*
Fat pointer In computer science, dynamic dispatch is the process of selecting which implementation of a polymorphic operation (method or function) to call at run time. It is commonly employed in, and considered a prime characteristic of, object-oriente ...
*
Linear Executable .exe is a common filename extension denoting an executable file (the main execution point of a computer program) for Microsoft Windows, OS/2, and DOS. File formats There are numerous file formats which may be used by a file with a extension ...
(LX) *
New Executable The New Executable (abbreviated NE or NewEXE) is a 16-bit .exe file format, a successor to the DOS MZ executable format. It was used in Windows 1.0–3.x, Windows 9x, multitasking MS-DOS 4.0, OS/2 1.x, and the OS/2 subset of Windows NT up to v ...
(NE) *
Portable Executable The Portable Executable (PE) format is a file format for executables, object code, DLLs and others used in 32-bit and 64-bit versions of Windows operating systems. The PE format is a data structure that encapsulates the information necessary fo ...
(PE) *
Position-independent code In computing, position-independent code (PIC) or position-independent executable (PIE) is a body of machine code that, being placed somewhere in the primary memory, executes properly regardless of its absolute address. PIC is commonly used for ...
(PIC) *
Side effect In medicine, a side effect is an effect, whether therapeutic or adverse, that is secondary to the one intended; although the term is predominantly employed to describe adverse effects, it can also apply to beneficial, but unintended, consequenc ...
* Universal hex format, a "fat" hex file format targeting multiple platforms * Alphanumeric executable, executable code camouflaged as (sometimes even readable) text * Multi-architecture shellcode, shellcode targeting multiple platforms (and sometimes even camouflaged as alphanumeric text)


Notes


References


Further reading

* ; * * * * * * {{cite web , title=AmigaOS 3.9 - Features , work=AmigaOS: multimedia, multi-threaded, multi-tasking , author-first=Matthias , author-last=Münch , date=2006 , orig-date=2005 , url=https://os.amigaworld.de/index.php?lang=en&page=7 , access-date=2021-09-29 , url-status=live , archive-url=https://web.archive.org/web/20210929081348/https://os.amigaworld.de/index.php?lang=en&page=7 , archive-date=2019-09-29 Executable file formats