computer programming Computer programming or coding is the composition of sequences of instructions, called computer program, programs, that computers can follow to perform tasks. It involves designing and implementing algorithms, step-by-step specifications of proc ...

, machine code is computer code consisting of machine language instructions, which are used to control a computer's

central processing unit A central processing unit (CPU), also called a central processor, main processor, or just processor, is the primary Processor (computing), processor in a given computer. Its electronic circuitry executes Instruction (computing), instructions ...

(CPU). For conventional binary computers, machine code is the binaryOn nonbinary machines it is, e.g., a decimal representation. representation of a computer program that is actually read and interpreted by the computer. A program in machine code consists of a sequence of machine instructions (possibly interspersed with data). Each machine code instruction causes the CPU to perform a specific task. Examples of such tasks include: # Load a

word A word is a basic element of language that carries semantics, meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no consensus among linguist ...

from

memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembe ...

to a CPU register # Execute an

arithmetic logic unit In computing, an arithmetic logic unit (ALU) is a Combinational logic, combinational digital circuit that performs arithmetic and bitwise operations on integer binary numbers. This is in contrast to a floating-point unit (FPU), which operates on ...

(ALU) operation on one or more registers or memory locations # Jump or

skip Skip or Skips may refer to: Acronyms * SKIP (Skeletal muscle and kidney enriched inositol phosphatase), a human gene * Simple Key-Management for Internet Protocol * SKIP of New York (Sick Kids need Involved People), a non-profit agency aiding ...

to an instruction that is not the next one In general, each architecture family (e.g.,

x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...

, ARM) has its own

instruction set architecture In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, ...

(ISA), and hence its own specific machine code language. There are exceptions, such as the VAX architecture, which includes optional support of the PDP-11 instruction set; the

IA-64 IA-64 (Intel Itanium architecture) is the instruction set architecture (ISA) of the discontinued Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by ...

architecture, which includes optional support of the

IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called ''i386'') is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the i386, 80386 microprocessor in 1985. IA-32 is the first incarn ...

instruction set; and the PowerPC 615 microprocessor, which can natively process both

PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple Inc., App ...

and x86 instruction sets. Machine code is a strictly numerical language, and it is the lowest-level interface to the CPU intended for a programmer.

Assembly language In computing, assembly language (alternatively assembler language or symbolic machine code), often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence bet ...

provides a direct map between the numerical machine code and a human-readable mnemonic. In assembly, numerical

opcode In computing, an opcode (abbreviated from operation code) is an enumerated value that specifies the operation to be performed. Opcodes are employed in hardware devices such as arithmetic logic units (ALUs), central processing units (CPUs), and ...

s and operands are replaced with mnemonics and labels. For example, the

architecture has available the 0x90 opcode; it is represented as NOP in the assembly

source code In computing, source code, or simply code or source, is a plain text computer program written in a programming language. A programmer writes the human readable source code to control the behavior of a computer. Since a computer, at base, only ...

. While it is possible to write programs directly in machine code, managing individual bits and calculating numerical addresses is tedious and error-prone. Therefore, programs are rarely written directly in machine code. However, an existing machine code program may be edited if the assembly source code is not available. The majority of programs today are written in a high-level language. A high-level program may be translated into machine code by a

compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...

Instruction set

Every processor or processor family has its own

instruction set In computer science, an instruction set architecture (ISA) is an abstract model that generally defines how software controls the CPU in a computer or a family of computers. A device or program that executes instructions described by that ISA, s ...

. Machine instructions are patterns of bitsOn early decimal machines, patterns of characters, digits and digit sign that specify some particular action. An instruction set is described by its instruction format. Some ways in which instruction formats may differ: * all instructions may have the same length or instructions may have different lengths; * the number of instructions may be small or large; * instructions may or may not align with the architecture's word length. A processor's instruction set needs to execute the circuits of a computer's digital logic level. At the digital level, the program needs to control the computer's registers, bus, memory, ALU, and other hardware components. To control a computer's architectural features, machine instructions are created. Examples of features that are controlled using machine instructions: * segment registers * protected address mode * binary-coded decimal (BCD) arithmetic The criteria for instruction formats include: * Instructions most commonly used should be shorter than instructions rarely used. * The memory transfer rate of the underlying hardware determines the flexibility of the memory fetch instructions. * The number of bits in the address field requires special consideration. Determining the size of the address field is a choice between space and speed. On some computers, the number of bits in the address field may be too small to access all of the physical memory. Also,

virtual address space In computing, a virtual address space (VAS) or address space is the set of ranges of virtual addresses that an operating system makes available to a process. The range of virtual addresses usually starts at a low address and can extend to the h ...

needs to be considered. Another constraint may be a limitation on the size of registers used to construct the address. Whereas a shorter address field allows the instructions to execute more quickly, other physical properties need to be considered when designing the instruction format. Instructions can be separated into two types: general-purpose and special-purpose. Special-purpose instructions exploit architectural features that are unique to a computer. General-purpose instructions control architectural features common to all computers. General-purpose instructions control: * Data movement from one place to another * Monadic operations that have one

operand In mathematics, an operand is the object of a mathematical operation, i.e., it is the object or quantity that is operated on. Unknown operands in equalities of expressions can be found by equation solving. Example The following arithmetic expres ...

to produce a result * Dyadic operations that have two operands to produce a result * Comparisons and conditional jumps * Procedure calls * Loop control * Input/output

Assembly languages

A much more human-friendly rendition of machine language, named

assembly language In computing, assembly language (alternatively assembler language or symbolic machine code), often referred to simply as assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence bet ...

, uses mnemonic codes to refer to machine code instructions, rather than using the instructions' numeric values directly, and uses symbolic names to refer to storage locations and sometimes registers. For example, on the

Zilog Z80 The Zilog Z80 is an 8-bit computing, 8-bit microprocessor designed by Zilog that played an important role in the evolution of early personal computing. Launched in 1976, it was designed to be Backward compatibility, software-compatible with the ...

processor, the machine code 00000101, which causes the CPU to decrement the B

general-purpose register A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-onl ...

, would be represented in assembly language as DEC B.

Examples

IBM 709x

The IBM 704, 709, 704x and 709x store one instruction in each instruction word; IBM numbers the bit from the left as S, 1, ..., 35. Most instructions have one of two formats: ;Generic :S,1-11 :12-13 Flag, ignored in some instructions :14-17 unused :18-20 Tag :21-35 Y ;Index register control, other than TSX :S,1-2 Opcode :3-17 Decrement :18-20 Tag :21-35 Y For all but the IBM 7094 and 7094 II, there are three index registers designated A, B and C; indexing with multiple 1 bits in the tag subtracts the

logical or In logic, disjunction (also known as logical disjunction, logical or, logical addition, or inclusive disjunction) is a logical connective typically notated as \lor and read aloud as "or". For instance, the English language, English language ...

of the selected index registers and loading with multiple 1 bits in the tag loads all of the selected index registers. The 7094 and 7094 II have seven index registers, but when they are powered on they are in ''multiple tag mode'', in which they use only the three of the index registers in a fashion compatible with earlier machines, and require a Leave Multiple Tag Mode (LMTM) instruction in order to access the other four index registers. The effective address is normally Y-C(T), where C(T) is either 0 for a tag of 0, the logical or of the selected index registers in multiple tag mode or the selected index register if not in multiple tag mode. However, the effective address for index register control instructions is just Y. A flag with both bits 1 selects indirect addressing; the indirect address word has both a tag and a Y field. In addition to ''transfer'' (branch) instructions, these machines have skip instruction that conditionally skip one or two words, e.g., Compare Accumulator with Storage (CAS) does a three way compare and conditionally skips to NSI, NSI+1 or NSI+2, depending on the result.

MIPS

The

MIPS architecture MIPS (Microprocessor without Interlocked Pipelined Stages) is a family of reduced instruction set computer (RISC) instruction set architectures (ISA)Price, Charles (September 1995). ''MIPS IV Instruction Set'' (Revision 3.2), MIPS Technologies ...

provides a specific example for a machine code whose instructions are always 32 bits long. The general type of instruction is given by the ''op'' (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by ''op''. R-type (register) instructions include an additional field ''funct'' to determine the exact operation. The fields used in these types are: 6 5 5 5 5 6 bits rs , rt , rd , shamt, funct R-type rs , rt , address/immediate I-type target address J-type ''rs'', ''rt'', and ''rd'' indicate register operands; ''shamt'' gives a shift amount; and the ''address'' or ''immediate'' fields contain an operand directly. For example, adding the registers 1 and 2 and placing the result in register 6 is encoded: rs , rt , rd , shamt, funct 0 1 2 6 0 32 decimal 000000 00001 00010 00110 00000 100000 binary Load a value into register 8, taken from the memory cell 68 cells after the location listed in register 3: rs , rt , address/immediate 35 3 8 68 decimal 100011 00011 01000 00000 00001 000100 binary Jumping to the address 1024: target address 2 1024 decimal 000010 00000 00000 00000 10000 000000 binary

Overlapping instructions

On processor architectures with variable-length instruction sets (such as

Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer compo ...

processor family) it is, within the limits of the control-flow resynchronizing phenomenon known as the Kruskal count, sometimes possible through opcode-level programming to deliberately arrange the resulting code so that two code paths share a common fragment of opcode sequences. These are called ''overlapping instructions'', ''overlapping opcodes'', ''overlapping code'', ''overlapped code'', ''instruction scission'', or ''jump into the middle of an instruction''. In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example were in the implementation of error tables in

Microsoft Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...

's Altair BASIC, where ''interleaved instructions'' mutually shared their instruction bytes. The technique is rarely used today, but might still be necessary to resort to in areas where extreme optimization for size is necessary on byte-level such as in the implementation of

boot loader A bootloader, also spelled as boot loader or called bootstrap loader, is a computer program that is responsible for booting a computer and booting an operating system. If it also provides an interactive menu with multiple boot choices then it's o ...

s which have to fit into boot sectors. It is also sometimes used as a code obfuscation technique as a measure against disassembly and tampering. The principle is also used in shared code sequences of fat binaries which must run on multiple instruction-set-incompatible processor platforms. This property is also used to find unintended instructions called

gadget A gadget is a machine, mechanical device or any ingenious article. Gadgets are sometimes referred to as ''wikt:gizmo, gizmos''. History The etymology of the word is disputed. The word first appears as reference to an 18th-century tool in Glass ...

s in existing code repositories and is used in return-oriented programming as alternative to code injection for exploits such as return-to-libc attacks.

Relationship to microcode

In some computers, the machine code of the

architecture Architecture is the art and technique of designing and building, as distinguished from the skills associated with construction. It is both the process and the product of sketching, conceiving, planning, designing, and construction, constructi ...

is implemented by an even more fundamental underlying layer called

microcode In processor design, microcode serves as an intermediary layer situated between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer. It consists of a set of hardware-level instructions ...

, providing a common machine language interface across a line or family of different models of computer with widely different underlying

dataflow In computing, dataflow is a broad concept, which has various meanings depending on the application and context. In the context of software architecture, data flow relates to stream processing or reactive programming. Software architecture Dat ...

s. This is done to facilitate

porting In software engineering, porting is the process of adapting software for the purpose of achieving some form of execution in a computing environment that is different from the one that a given program (meant for such execution) was originally desig ...

of machine language programs between different models. An example of this use is the IBM

System/360 The IBM System/360 (S/360) is a family of mainframe computer systems announced by IBM on April 7, 1964, and delivered between 1965 and 1978. System/360 was the first family of computers designed to cover both commercial and scientific applicati ...

family of computers and their successors.

Relationship to bytecode

Machine code is generally different from

bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (normal ...

(also known as p-code), which is either executed by an interpreter or itself compiled into machine code for faster (direct) execution. An exception is when a processor is designed to use a particular bytecode directly as its machine code, such as is the case with Java processors. Machine code and assembly code are sometimes called '' native code'' when referring to platform-dependent parts of language features or libraries.

Storing in memory

From the point of view of the CPU, machine code is stored in RAM, but is typically also kept in a set of caches for performance reasons. There may be different caches for instructions and data, depending on the architecture. The CPU knows what machine code to execute, based on its internal program counter. The program counter points to a memory address and is changed based on special instructions which may cause programmatic branches. The program counter is typically set to a hard coded value when the CPU is first powered on, and will hence execute whatever machine code happens to be at this address. Similarly, the program counter can be set to execute whatever machine code is at some arbitrary address, even if this is not valid machine code. This will typically trigger an architecture specific protection fault. The CPU is oftentimes told, by page permissions in a paging based system, if the current page actually holds machine code by an execute bit — pages have multiple such permission bits (readable, writable, etc.) for various housekeeping functionality. E.g. on

Unix-like A Unix-like (sometimes referred to as UN*X, *nix or *NIX) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Uni ...

systems memory pages can be toggled to be executable with the system call, and on Windows, can be used to achieve a similar result. If an attempt is made to execute machine code on a non-executable page, an architecture specific fault will typically occur. Treating data as machine code, or finding new ways to use existing machine code, by various techniques, is the basis of some security vulnerabilities. Similarly, in a segment based system, segment descriptors can indicate whether a segment can contain executable code and in what rings that code can run. From the point of view of a

process A process is a series or set of activities that interact to produce a result; it may occur once-only or be recurrent or periodic. Things called a process include: Business and management * Business process, activities that produce a specific s ...

, the ''code space'' is the part of its

address space In computing, an address space defines a range of discrete addresses, each of which may correspond to a network host, peripheral device, disk sector, a memory cell or other logical or physical entity. For software programs to save and retrieve ...

where the code in execution is stored. In multitasking systems this comprises the program's code segment and usually shared libraries. In multi-threading environment, different threads of one process share code space along with data space, which reduces the overhead of context switching considerably as compared to process switching.

Readability by humans

Machine code can be seen as a set of electrical pulses that make the instructions readable to the computer; it is not readable by humans, with

Douglas Hofstadter Douglas Richard Hofstadter (born 15 February 1945) is an American cognitive and computer scientist whose research includes concepts such as the sense of self in relation to the external world, consciousness, analogy-making, Strange loop, strange ...

comparing it to examining the atoms of a

DNA Deoxyribonucleic acid (; DNA) is a polymer composed of two polynucleotide chains that coil around each other to form a double helix. The polymer carries genetic instructions for the development, functioning, growth and reproduction of al ...

molecule. However, various tools and methods exist to decode machine code to human-readable

. One such method is disassembly, which easily decodes it back to its corresponding assembly language

because assembly language forms a one-to-one mapping to machine code. Machine code may also be decoded to high-level language under two conditions. The first condition is to accept an obfuscated reading of the source code. An obfuscated version of source code is displayed if the machine code is sent to a

decompiler A decompiler is a computer program that translates an executable file back into high-level source code. Unlike a compiler, which converts high-level code into machine code, a decompiler performs the reverse process. While disassemblers translate e ...

of the source language. The second condition requires the machine code to have information about the source code encoded within. The information includes a symbol table that contains debug symbols. The symbol table may be stored within the executable, or it may exist in separate files. A

debugger A debugger is a computer program used to test and debug other programs (the "target" programs). Common features of debuggers include the ability to run or halt the target program using breakpoints, step through code line by line, and display ...

can then read the symbol table to help the programmer interactively

debug In engineering, debugging is the process of finding the root cause, workarounds, and possible fixes for bugs. For software, debugging tactics can involve interactive debugging, control flow analysis, log file analysis, monitoring at the ap ...

the machine code in

execution Capital punishment, also known as the death penalty and formerly called judicial homicide, is the state-sanctioned killing of a person as punishment for actual or supposed misconduct. The sentence ordering that an offender be punished in ...

. * The SHARE Operating System (1959) for the IBM 709,

IBM 7090 The IBM 7090 is a second-generation Transistor computer, transistorized version of the earlier IBM 709 vacuum tube mainframe computer that was designed for "large-scale scientific and technological applications". The 7090 is the fourth member o ...

, and IBM 7094 computers allowed for an loadable code format named SQUOZE. SQUOZE was a compressed binary form of

code and included a symbol table. * Modern IBM mainframe

operating system An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ...

s, such as

z/OS z/OS is a 64-bit operating system for IBM z/Architecture mainframes, introduced by IBM in October 2000. It derives from and is the successor to OS/390, which in turn was preceded by a string of MVS versions.Starting with the earliest: ...

, have available a symbol table named ''Associated data'' (ADATA). The table is stored in a file that can be produced by the IBM High-Level Assembler (HLASM), IBM's

COBOL COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural, and, since 2002, object-oriented language. COBOL is primarily ...

compiler, and IBM's

PL/I PL/I (Programming Language One, pronounced and sometimes written PL/1) is a procedural, imperative computer programming language initially developed by IBM. It is designed for scientific, engineering, business and system programming. It has b ...

compiler, either as a separate SYSADATA file or as ADATA records in a Generalized object output file (GOFF). *

Microsoft Windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...

has available a symbol table that is stored in a

program database Program database (PDB) is a file format (developed by Microsoft) for storing debugging information about a program (or, commonly, program modules such as a DLL or EXE). PDB files commonly have a .pdb extension. A PDB file is typically creat ...

(.pdb) file. * Most

operating systems have available symbol table formats named stabs and

DWARF Dwarf, dwarfs or dwarves may refer to: Common uses *Dwarf (folklore), a supernatural being from Germanic folklore * Dwarf, a human or animal with dwarfism Arts, entertainment, and media Fictional entities * Dwarf (''Dungeons & Dragons''), a sh ...

. In

macOS macOS, previously OS X and originally Mac OS X, is a Unix, Unix-based operating system developed and marketed by Apple Inc., Apple since 2001. It is the current operating system for Apple's Mac (computer), Mac computers. With ...

and other Darwin-based operating systems, the debug symbols are stored in DWARF format in a separate .dSYM file.

Notes

References

Sources

* * *