computer A computer is a machine that can be programmed to carry out sequences of arithmetic or logical operations ( computation) automatically. Modern digital electronic computers can perform generic sets of operations known as programs. These prog ...

central processing unit A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, a ...

s, micro-operations (also known as micro-ops or μops, historically also as micro-actions) are detailed low-level instructions used in some designs to implement complex machine instructions (sometimes termed macro-instructions in this context). Usually, micro-operations perform basic operations on data stored in one or more registers, including transferring data between registers or between registers and external

buses A bus (contracted from omnibus, with variants multibus, motorbus, autobus, etc.) is a road vehicle that carries significantly more passengers than an average car or van. It is most commonly used in public transport, but is also in use for char ...

of the

(CPU), and performing arithmetic or logical operations on registers. In a typical

fetch-decode-execute cycle The instruction cycle (also known as the fetch–decode–execute cycle, or simply the fetch-execute cycle) is the cycle that the central processing unit (CPU) follows from boot-up until the computer has shut down in order to process instruction ...

, each step of a macro-instruction is decomposed during its execution so the CPU determines and steps through a series of micro-operations. The execution of micro-operations is performed under control of the CPU's

control unit The control unit (CU) is a component of a computer's central processing unit (CPU) that directs the operation of the processor. A CU typically uses a binary decoder to convert coded instructions into timing and control signals that direct the op ...

, which decides on their execution while performing various optimizations such as reordering, fusion and caching.

Optimizations

Various forms of μops have long been the basis for traditional

microcode In processor design, microcode (μcode) is a technique that interposes a layer of computer organization between the central processing unit (CPU) hardware and the programmer-visible instruction set architecture of a computer. Microcode is a la ...

routines used to simplify the implementation of a particular

CPU design Processor design is a subfield of computer engineering and electronics engineering (fabrication) that deals with creating a processor, a key component of computer hardware. The design process involves choosing an instruction set and a certain exe ...

or perhaps just the sequencing of certain multi-step operations or addressing modes. More recently, μops have also been employed in a different way in order to let modern CISC processors more easily handle asynchronous parallel and speculative execution: As with traditional microcode, one or more table lookups (or equivalent) is done to locate the appropriate μop-sequence based on the encoding and semantics of the machine instruction (the decoding or translation step), however, instead of having rigid μop-sequences controlling the CPU directly from a microcode- ROM, μops are here dynamically buffered for rescheduling before being executed. This buffering means that the fetch and decode stages can be more detached from the execution units than is feasible in a more traditional microcoded (or hard-wired) design. As this allows a degree of freedom regarding execution order, it makes some extraction of

instruction-level parallelism Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically ILP refers to the average number of instructions run per step of this parallel execution. Disc ...

out of a normal single-threaded program possible (provided that dependencies are checked etc.). It opens up for more analysis and therefore also for reordering of code sequences in order to dynamically optimize mapping and scheduling of μops onto machine resources (such as ALUs, load/store units etc.). As this happens on the μop-level, sub-operations of different machine (macro) instructions may often intermix in a particular μop-sequence, forming partially reordered machine instructions as a direct consequence of the out-of-order dispatching of microinstructions from several macro instructions. However, this is not the same as the ''micro-op fusion'', which aims at the fact that a more complex microinstruction may replace a few simpler microinstructions in certain cases, typically in order to minimize state changes and usage of the queue and

re-order buffer A re-order buffer (ROB) is a hardware unit used in an extension to the Tomasulo algorithm to support out-of-order and speculative instruction execution. The extension forces instructions to be committed in-order. The buffer is a circular buffe ...

space, therefore reducing power consumption. Micro-op fusion is used in some modern CPU designs. Execution optimization has gone even further; processors not only translate many machine instructions into a series of μops, but also do the opposite when appropriate; they combine certain machine instruction sequences (such as a compare followed by a conditional jump) into a more complex μop which fits the execution model better and thus can be executed faster or with less machine resources involved. This is also known as ''macro-op fusion''. Another way to try to improve performance is to cache the decoded micro-operations in a micro-operation cache, so that if the same macroinstruction is executed again, the processor can directly access the decoded micro-operations from the cache, instead of decoding them again. The

execution trace cache The NetBurst microarchitecture, called P68 inside Intel, was the successor to the P6 microarchitecture in the x86 family of central processing units (CPUs) made by Intel. The first CPU to use this architecture was the Willamette-core Pentium ...

found in

Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 ser ...

NetBurst microarchitecture (

Pentium 4 Pentium 4 is a series of single-core CPUs for desktops, laptops and entry-level servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008. The production of Netburst processors was active from 2000 ...

) is a widespread example of this technique. The size of this cache may be stated in terms of how many thousands (or strictly multiple of 1024) of micro-operations it can store: ''Kμops''.

References

{{CPU technologies Instruction processing Central processing unit