In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a very specific task, such as a load, a store, a jump, or an arithmetic logic unit (ALU) operation on one or more units of data in the CPU's registers or memory.
Early CPUs had specific machine code that might break backwards compatibility with each new CPU released. An instruction set architecture (ISA) defines and specifies the behavior and encoding in memory of the instruction set of the system, without specifying its exact implementation. This acts as an abstraction layer, enabling compatibility within the same family of CPUs, so that machine code written or generated according to the ISA for the family will run on all CPUs in the family, including future CPUs.
In general, each architecture family (e.g. x86, ARM) has its own ISA, and hence its own specific machine code language. There are exceptions; for example, IA-64 (Itanium) processors can also run x86 code by emulating it.
Machine code is a strictly numerical language, and is the lowest-level interface to the CPU intended for a programmer. There is, on some CPUs, a lower-level interface in the form of (modifiable) microcode that implements the machine code. However, microcode is not intended to be changed by the end user on normal commercial CPUs.
Assembly language provides a direct mapping between the numerical machine code and a human-readable version where numerical opcodes and operands are replaced by readable strings (e.g. 0x90 is the NOP instruction on x86). While it is possible to write programs directly in machine code, managing individual bits and calculating numerical addresses and constants manually is tedious and error-prone. For this reason, programs are very rarely written directly in machine code in modern contexts, although this may still be done for low-level debugging, program patching (especially when assembler source is not available) and assembly language disassembly.
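The mapping between numeric opcodes and mnemonics can be sketched with a small lookup table. The following is a minimal illustration in Python, assuming a hand-picked subset of single-byte x86 opcodes; the function names `disassemble` and `assemble` are hypothetical, and the sketch is not a real assembler or disassembler.

```python
# A few single-byte x86 opcodes and their assembly mnemonics.
# This table is a tiny illustrative subset, not a full decoder.
ONE_BYTE_OPCODES = {
    0x90: "NOP",   # no operation
    0xC3: "RET",   # near return
    0xCC: "INT3",  # breakpoint trap
    0xF4: "HLT",   # halt
}

# Reverse table: the direction an assembler works in for these mnemonics.
MNEMONIC_TO_OPCODE = {name: code for code, name in ONE_BYTE_OPCODES.items()}


def disassemble(code: bytes) -> list[str]:
    """Translate each byte to its mnemonic, or to a raw 'DB' data byte."""
    return [ONE_BYTE_OPCODES.get(b, f"DB 0x{b:02X}") for b in code]


def assemble(mnemonics: list[str]) -> bytes:
    """Translate mnemonics from the table back to machine code bytes."""
    return bytes(MNEMONIC_TO_OPCODE[m] for m in mnemonics)


print(disassemble(bytes([0x90, 0x90, 0xC3])))
print(assemble(["NOP", "RET"]).hex())
```

Treating every byte as its own instruction only works here because all four opcodes are one byte long; real x86 instructions vary in length and carry operand bytes.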
The majority of practical programs today are written in higher-level languages or assembly language. The source code is then translated to executable machine code by utilities such as compilers, assemblers, and linkers, with the important exception of interpreted programs, which are not translated into machine code. However, the ''interpreter'' itself, which may be seen as an executor or processor performing the instructions of the source code, typically consists of directly executable machine code (generated from assembly or high-level language source code).
Machine code is by definition the lowest level of programming detail visible to the programmer, but internally many processors use microcode or optimise and transform machine code instructions into sequences of micro-ops. Neither of these is generally considered to be machine code.
Instruction set
Every processor or processor family has its own instruction set. Instructions are patterns of bits, digits, or characters that correspond to machine commands. Thus, the instruction set is specific to a class of processors using (mostly) the same architecture. Successor or derivative processor designs often include the instructions of a predecessor and may add new instructions. Occasionally, a successor design will discontinue or alter the meaning of some instruction code (typically because it is needed for new purposes), affecting code compatibility to some extent; even compatible processors may show slightly different behavior for some instructions, but this is rarely a problem. Systems may also differ in other details, such as memory arrangement, operating systems, or peripheral devices. Because a program normally relies on such factors, different systems will typically not run the same machine code, even when the same type of processor is used.
A processor's instruction set may have fixed-length or variable-length instructions. How the patterns are organized varies with the particular architecture and type of instruction. Most instructions have one or more opcode fields that specify the basic instruction type (such as arithmetic, logical, jump, etc.), the operation (such as add or compare), and other fields that may give the type of the operand(s), the addressing mode(s), the addressing offset(s) or index, or the operand value itself (such constant operands contained in an instruction are called ''immediate'').
Not all machines or individual instructions have explicit operands. On a machine with a single accumulator, the accumulator is implicitly both the left operand and result of most arithmetic instructions. Some other architectures, such as the x86 architecture, have accumulator versions of common instructions, with the accumulator regarded as one of the general registers by longer instructions. A stack machine has most or all of its operands on an implicit stack. Special purpose instructions also often lack explicit operands; for example, CPUID in the x86 architecture writes values into four implicit destination registers. This distinction between explicit and implicit operands is important in code generators, especially in the register allocation and live range tracking parts. A good code optimizer can track implicit as well as explicit operands, which may allow more frequent constant propagation, constant folding of registers (a register assigned the result of a constant expression freed up by replacing it by that constant) and other code enhancements.
Programs
A computer program is a list of instructions that can be executed by a central processing unit (CPU). The CPU executes a program in order to solve a problem and thus accomplish a result. While simple processors are able to execute instructions one after another, superscalar processors are capable, under certain circumstances (when the pipeline is full), of executing two or more instructions simultaneously. As an example, the original Intel Pentium from 1993 can execute at most two instructions per clock cycle when its pipeline is full.
Program flow may be influenced by special 'jump' instructions that transfer execution to an address (and hence instruction) other than the next numerically sequential address. Whether a conditional jump is taken depends on a condition such as a value being greater than, less than, or equal to another value.
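Sequential execution and conditional jumps can be illustrated with a toy interpreter. This is a minimal sketch in Python of a hypothetical machine: the opcodes `LOAD`, `ADD`, `INC`, `JNZ` and `HALT`, the two registers, and the `run` function are all invented for illustration and do not belong to any real instruction set.

```python
# Hypothetical toy machine: one accumulator, one counter register, and a
# program counter (pc). Each instruction is an (opcode, operand) pair stored
# at a numeric address. None of these opcodes belong to a real ISA.
LOAD, ADD, INC, JNZ, HALT = range(5)


def run(program):
    acc = 0      # accumulator
    count = 0    # number of INC instructions executed
    pc = 0       # address of the next instruction
    while True:
        opcode, operand = program[pc]
        pc += 1                      # default: next sequential address
        if opcode == LOAD:
            acc = operand
        elif opcode == ADD:
            acc += operand
        elif opcode == INC:
            count += 1
        elif opcode == JNZ:          # conditional jump: taken only if acc != 0
            if acc != 0:
                pc = operand
        elif opcode == HALT:
            return acc, count


# Count down from 3; the loop body at addresses 1-3 runs three times.
PROGRAM = [
    (LOAD, 3),    # 0: acc = 3
    (INC, 0),     # 1: count += 1
    (ADD, -1),    # 2: acc -= 1
    (JNZ, 1),     # 3: if acc != 0, jump back to address 1
    (HALT, 0),    # 4: stop
]

print(run(PROGRAM))  # (0, 3)
```

The jump at address 3 is taken twice and falls through once, which is why the loop body executes three times before the machine halts.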
Assembly languages
A much more human-friendly rendition of machine language, called assembly language, uses mnemonic codes to refer to machine code instructions, rather than using the instructions' numeric values directly, and uses symbolic names to refer to storage locations and sometimes registers. For example, on the Zilog Z80 processor, the machine code 00000101, which causes the CPU to decrement the B processor register, would be represented in assembly language as DEC B.
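The Z80 example can be turned into a small decoder. The sketch below, in Python, assumes the Z80 encoding of 8-bit decrements as the bit pattern 00 rrr 101, where rrr selects the register; that register table comes from the Z80 instruction set rather than from the text above, so it is worth checking against the CPU manual. The function name `decode_dec` is hypothetical.

```python
# Z80 8-bit register codes as they appear in the 3-bit register field.
REGISTERS = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]


def decode_dec(opcode: int):
    """Return the mnemonic if opcode matches 00 rrr 101 (DEC r), else None."""
    if opcode & 0b11000111 == 0b00000101:
        reg = (opcode >> 3) & 0b111   # extract the rrr field (bits 5..3)
        return f"DEC {REGISTERS[reg]}"
    return None


print(decode_dec(0b00000101))  # DEC B
print(decode_dec(0b00111101))  # DEC A
```

An assembler performs the reverse step, replacing the text DEC B with the single byte 00000101.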
Example
The MIPS architecture provides a specific example for a machine code whose instructions are always 32 bits long. The general type of instruction is given by the ''op'' (operation) field, the highest 6 bits. J-type (jump) and I-type (immediate) instructions are fully specified by ''op''. R-type (register) instructions include an additional field ''funct'' to determine the exact operation. The fields used in these types are:
    6      5     5     5     5      6    bits
 [  op  |  rs |  rt |  rd |shamt| funct]  R-type
 [  op  |  rs |  rt | address/immediate]  I-type
 [  op  |        target address        ]  J-type
''rs'', ''rt'', and ''rd'' indicate register operands; ''shamt'' gives a shift amount; and the ''address'' or ''immediate'' fields contain an operand directly.
For example, adding the registers 1 and 2 and placing the result in register 6 is encoded:
 [  op  |  rs |  rt |  rd |shamt| funct]
     0     1     2     6     0     32     decimal
  000000 00001 00010 00110 00000 100000   binary
Load a value into register 8, taken from the memory cell 68 cells after the location listed in register 3:
 [  op  |  rs |  rt | address/immediate]
    35     3     8           68           decimal
  100011 00011 01000 00000 00001 000100   binary
Jumping to the address 1024:
 [  op  |        target address        ]
     2                 1024               decimal
  000010 00000 00000 00000 10000 000000   binary
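The three encodings above can be reproduced by shifting each field into position. The following is a minimal sketch in Python; the helper names `encode_r`, `encode_i`, `encode_j` and `bits` are hypothetical. It uses the field values exactly as given in the text and does not model how an assembler derives the J-type target field from a byte address.

```python
def encode_r(rs, rt, rd, shamt, funct, op=0):
    """Pack an R-type instruction: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


def encode_i(op, rs, rt, immediate):
    """Pack an I-type instruction: op(6) rs(5) rt(5) immediate(16)."""
    return (op << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


def encode_j(op, target):
    """Pack a J-type instruction: op(6) target(26)."""
    return (op << 26) | (target & 0x3FFFFFF)


def bits(word):
    """Render a 32-bit word as a binary string."""
    return format(word, "032b")


add_word = encode_r(rs=1, rt=2, rd=6, shamt=0, funct=32)   # add registers 1 and 2 into 6
load_word = encode_i(op=35, rs=3, rt=8, immediate=68)      # load into register 8
jump_word = encode_j(op=2, target=1024)                    # jump, target field 1024

for word in (add_word, load_word, jump_word):
    print(bits(word), hex(word))
```

Each printed bit string matches the binary row of the corresponding example once the spaces between fields are removed.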
Overlapping instructions
On processor architectures with variable-length instruction sets (such as Intel's x86 processor family) it is, within the limits of the control-flow resynchronizing phenomenon known as the Kruskal Count, sometimes possible through opcode-level programming to deliberately arrange the resulting code so that two code paths share a common fragment of opcode sequences. These are called ''overlapping instructions'', ''overlapping opcodes'', ''overlapping code'', ''overlapped code'', ''instruction scission'', or ''jump into the middle of an instruction'', and represent a form of superposition.
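The effect can be seen by decoding one byte sequence from two different starting offsets. This is a minimal sketch in Python, assuming 32-bit x86 and a decoder that knows only three opcodes (MOV EAX, imm32; NOP; RET); the function name `decode` is hypothetical and the sketch is far from a full x86 decoder.

```python
import struct


def decode(code: bytes, start: int) -> list[str]:
    """Decode a tiny subset of 32-bit x86, beginning at byte offset start."""
    out = []
    pc = start
    while pc < len(code):
        op = code[pc]
        if op == 0xB8 and pc + 5 <= len(code):   # MOV EAX, imm32 (5 bytes)
            (imm,) = struct.unpack_from("<I", code, pc + 1)
            out.append(f"MOV EAX, 0x{imm:08X}")
            pc += 5
        elif op == 0x90:                          # NOP (1 byte)
            out.append("NOP")
            pc += 1
        elif op == 0xC3:                          # RET (1 byte)
            out.append("RET")
            pc += 1
        else:                                     # outside this sketch's subset
            out.append(f"DB 0x{op:02X}")
            pc += 1
    return out


CODE = bytes([0xB8, 0x90, 0x90, 0x90, 0xC3])

print(decode(CODE, 0))  # one 5-byte instruction
print(decode(CODE, 1))  # four 1-byte instructions hidden in its operand
```

Entering at offset 0 yields a single MOV whose immediate operand is 0xC3909090, while entering at offset 1 reads the same operand bytes as NOP, NOP, NOP, RET.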
In the 1970s and 1980s, overlapping instructions were sometimes used to preserve memory space. One example was the implementation of error tables in Microsoft's Altair BASIC, where ''interleaved instructions'' mutually shared their instruction bytes.
The technique is rarely used today, but may still be needed where extreme size optimization at the byte level is required, such as in the implementation of boot loaders which have to fit into boot sectors.
It is also sometimes used as a code obfuscation technique as a measure against disassembly and tampering.
The principle is also utilized in shared code sequences of fat binaries which must run on multiple instruction-set-incompatible processor platforms.
This property is also used to find unintended instructions called gadgets in existing code repositories and is utilized in return-oriented programming as an alternative to code injection for exploits such as return-to-libc attacks.
Relationship to microcode
In some computers, the machine code of the architecture is implemented by an even more fundamental underlying layer called microcode, providing a common machine language interface across a line or family of different models of computer with widely different underlying dataflows. This is done to facilitate porting of machine language programs between different models. An example of this use is the IBM System/360 family of computers and their successors. With dataflow path widths of 8 bits to 64 bits and beyond, they nevertheless present a common architecture at the machine language level across the entire line.
Using microcode to implement an emulator enables the computer to present the architecture of an entirely different computer. The System/360 line used this to allow porting programs from earlier IBM machines to the new family of computers, e.g. an IBM 1401/1440/1460 emulator on the IBM S/360 model 40.
Relationship to bytecode
Machine code is generally different from bytecode (also known as p-code), which is either executed by an interpreter or itself compiled into machine code for faster (direct) execution. An exception is when a processor is designed to use a particular bytecode directly as its machine code, as is the case with Java processors.
Machine code and assembly code are sometimes called ''native code'' when referring to platform-dependent parts of language features or libraries.
Storing in memory
From the point of view of the CPU, machine code is stored in RAM, but is typically also kept in a set of caches for performance reasons. There may be different caches for instructions and data, depending on the architecture.
The CPU knows what machine code to execute, based on its internal program counter. The program counter points to a memory address and is changed based on special instructions which may cause programmatic branches. The program counter is typically set to a hard coded value when the CPU is first powered on, and will hence execute whatever machine code happens to be at this address.
Similarly, the program counter can be set to execute whatever machine code is at some arbitrary address, even if this isn't valid machine code. This will typically trigger an architecture specific protection fault.
In a paging-based system, page permissions often tell the CPU whether the current page actually holds machine code, by means of an execute bit; pages have several such permission bits (readable, writable, etc.) for various housekeeping functions. For example, on Unix-like systems memory pages can be made executable with a system call, and on Windows a corresponding API call can be used to achieve a similar result. If an attempt is made to execute machine code on a non-executable page, an architecture-specific fault will typically occur. Treating data as machine code, or finding new ways to use existing machine code, by various techniques, is the basis of some security vulnerabilities.
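The execute-bit check can be modelled in a few lines. The following is a toy sketch in Python, not an interface to any real MMU or operating system: the permission constants, the 4 KiB page size, the `ProtectionFault` exception and the `fetch` function are all assumptions made for illustration.

```python
# Toy model of per-page permission bits; the names and the 4 KiB page size
# are assumptions for illustration, not any real MMU or OS interface.
READ, WRITE, EXEC = 1, 2, 4
PAGE_SIZE = 4096


class ProtectionFault(Exception):
    """Raised when an instruction fetch hits a non-executable page."""


def fetch(memory: bytes, page_perms: dict, pc: int) -> int:
    """Return the byte at pc if its page has the execute bit set."""
    page = pc // PAGE_SIZE
    if not page_perms.get(page, 0) & EXEC:
        raise ProtectionFault(f"page {page} is not executable")
    return memory[pc]


memory = bytes([0x90]) * (2 * PAGE_SIZE)
perms = {0: READ | EXEC, 1: READ | WRITE}   # page 0 holds code, page 1 data

print(hex(fetch(memory, perms, 10)))        # allowed: page 0 is executable
try:
    fetch(memory, perms, PAGE_SIZE)         # page 1 lacks the execute bit
except ProtectionFault as fault:
    print("fault:", fault)
```

Setting the `EXEC` bit on page 1 in this model plays the role of the permission-changing call described above: the same fetch then succeeds.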
From the point of view of a process, the ''code space'' is the part of its address space where the code in execution is stored. In multitasking systems this comprises the program's code segment and usually shared libraries. In a multi-threading environment, different threads of one process share code space along with data space, which reduces the overhead of context switching considerably as compared to process switching.
Readability by humans
Pamela Samuelson wrote that machine code is so unreadable that the United States Copyright Office cannot identify whether a particular encoded program is an original work of authorship; however, the US Copyright Office ''does'' allow for copyright registration of computer programs, and a program's machine code can sometimes be decompiled in order to make its functioning more easily understandable to humans.
However, the output of a decompiler or disassembler will be missing the comments and symbolic references, so while the output may be easier to read than the object code, it will still be more difficult to read than the original source code. This problem does not exist for object-code formats like SQUOZE, where the source code is included in the file.
Cognitive science professor Douglas Hofstadter has compared machine code to genetic code, saying that "Looking at a program written in machine language is vaguely comparable to looking at a DNA molecule atom by atom."
See also
* Assembly language
* Endianness
* List of machine languages
* Machine code monitor
* Overhead code
* P-code machine
* Reduced instruction set computing (RISC)
* Very long instruction word
* Teaching Machine Code: Micro-Professor MPF-I