Multicycle control units
The simplest computers use a multicycle microarchitecture. These were the earliest designs. They are still popular in the very smallest computers, such as the embedded systems that operate machinery.
In such a computer, the control unit often steps through the instruction cycle one step after another. The cycle consists of fetching the instruction, fetching the operands, decoding the instruction, executing the instruction, and then writing the results back to memory. When the next instruction is placed in the control unit, it changes the behavior of the control unit so that the instruction completes correctly. So, the bits of the instruction directly control the control unit, which in turn controls the computer. The control unit may include a binary counter that tells the control unit's logic which step to perform.
Multicycle control units typically use both the rising and falling edges of their square-wave timing clock. They perform one step of their operation on each edge of the timing clock, so that a four-step operation completes in two clock cycles. This doubles the speed of the computer, given the same logic family.
Many computers have two different types of unexpected events.
Pipelined control units
Many medium-complexity computers pipeline instructions. This design is popular because of its economy and speed.
In a pipelined computer, instructions flow through the computer. The design has several stages. For example, it might have one stage for each step of the Von Neumann cycle. A pipelined computer usually has "pipeline registers" after each stage. These store the bits calculated by a stage so that the logic gates of the next stage can use the bits to do the next step. It is common for even-numbered stages to operate on one edge of the square-wave clock, while odd-numbered stages operate on the other edge. This speeds the computer by a factor of two compared to single-edge designs.
In a pipelined computer, the control unit arranges for the flow to start, continue, and stop as a program commands. The instruction data is usually passed in pipeline registers from one stage to the next, with a somewhat separated piece of control logic for each stage. The control unit also ensures that the instruction in each stage does not harm the operation of instructions in other stages. For example, if two stages must use the same piece of data, the control logic ensures that the uses are done in the correct sequence.
When operating efficiently, a pipelined computer has an instruction in each stage. It is then working on all of those instructions at the same time, and it can finish about one instruction for each cycle of its clock. When a program makes a decision and switches to a different sequence of instructions, the pipeline sometimes must discard the data in process and restart. This is called a "stall." When two instructions could interfere, the control unit sometimes must stop processing a later instruction until an earlier instruction completes. This is called a "pipeline bubble" because a part of the pipeline is not processing instructions. Pipeline bubbles can occur when two instructions operate on the same register.
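The way a register dependency creates bubbles can be illustrated with a small timing model. The sketch below is a toy under stated assumptions, not a description of any particular processor: it assumes an in-order pipeline of `depth` stages with no result forwarding, operands read in stage `read_stage` (counted from zero), results written in the last stage, and a read that must happen at least one cycle after the write. The names `Instr` and `schedule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    dest: str     # register written by the instruction
    srcs: tuple   # registers read by the instruction


def schedule(program, depth=5, read_stage=1):
    """Return (start_cycles, bubbles, total_cycles) for an in-order pipeline.

    Assumptions (illustrative only): no forwarding, operands are read in
    stage `read_stage`, results are written in the last stage, and only
    read-after-write dependencies are modeled.
    """
    ready = {}     # register -> cycle in which its newest value is written
    starts = []    # cycle in which each instruction enters the first stage
    bubbles = 0    # empty pipeline slots inserted to wait for operands
    for instr in program:
        # Without hazards, one instruction enters the pipeline per cycle.
        earliest = starts[-1] + 1 if starts else 0
        start = earliest
        for reg in instr.srcs:
            if reg in ready:
                # The operand read must come after the producer's write.
                start = max(start, ready[reg] + 1 - read_stage)
        bubbles += start - earliest
        starts.append(start)
        ready[instr.dest] = start + depth - 1
    total = starts[-1] + depth if starts else 0
    return starts, bubbles, total


# The second instruction reads r1, which the first one writes.
dependent = [Instr("r1", ("r2", "r3")), Instr("r4", ("r1", "r5"))]
print(schedule(dependent))    # ([0, 4], 3, 9): three bubbles

# Three instructions that share no registers flow without bubbles.
independent = [
    Instr("r1", ("r2", "r3")),
    Instr("r4", ("r5", "r6")),
    Instr("r7", ("r8", "r9")),
]
print(schedule(independent))  # ([0, 1, 2], 0, 7)
```

In this model the independent sequence approaches one instruction per cycle once the pipeline is full, while the dependent pair loses three cycles waiting for `r1`.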
Interrupts and unexpected exceptions also stall the pipeline. If a pipelined computer abandons work for an interrupt, more work is lost than in a multicycle computer. Predictable exceptions do not need to stall. For example, if an exception instruction is used to enter the operating system, it does not cause a stall.
For the same speed of electronic logic, a pipelined computer can execute more instructions per second than a multicycle computer. Also, even though the electronic logic has a fixed maximum speed, a pipelined computer can be made faster or slower by varying the number of stages in the pipeline. With more stages, each stage does less work, and so the stage has fewer delays.
Preventing stalls
Control units use many methods to keep a pipeline full and avoid stalls. For example, even simple control units can assume that a backwards branch, to a lower-numbered, earlier instruction, is a loop, and will be repeated. So, a control unit with this design will always fill the pipeline with the backwards branch path.
Out of order control units
A control unit can be designed to finish what it can. If several instructions can be completed at the same time, the control unit will arrange it. So, the fastest computers can process instructions in a sequence that can vary somewhat, depending on when the operands or instruction destinations become available. Most supercomputers and many PC CPUs use this method. The exact organization of this type of control unit depends on the slowest part of the computer.
When the execution of calculations is the slowest, instructions flow from memory into pieces of electronics called "issue units." An issue unit holds an instruction until both its operands and an execution unit are available. Then, the instruction and its operands are "issued" to an execution unit. The execution unit does the instruction. Then the resulting data is moved into a queue of data to be written back to memory or registers. If the computer has multiple execution units, it can usually do several instructions per clock cycle.
It is common to have specialized execution units. For example, a modestly priced computer might have only one floating-point execution unit, because floating point units are expensive. The same computer might have several integer units, because these are relatively inexpensive, and can do the bulk of instructions.
One kind of control unit for issuing uses an array of electronic logic, a "scoreboard" that detects when an instruction can be issued. The "height" of the array is the number of execution units, and the "length" and "width" are each the number of sources of operands. When all the items come together, the signals from the operands and execution unit will cross. The logic at this intersection detects that the instruction can work, so the instruction is "issued" to the free execution unit. An alternative style of issuing control unit implements the Tomasulo algorithm, which reorders a hardware queue of instructions. In some sense, both styles utilize a queue.
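The issue rule described above (hold an instruction until its operands and a suitable execution unit are both available) can be sketched as a cycle-by-cycle simulation. This is a simplified illustration, not an implementation of a scoreboard or of the Tomasulo algorithm. It assumes that only true (read-after-write) dependencies matter, as if registers were renamed, and that a result becomes usable in the cycle its execution finishes. The names `Instr`, `simulate`, the unit kinds and the latencies are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    kind: str      # which type of execution unit it needs, e.g. "int" or "fp"
    dest: str      # register written
    srcs: tuple    # registers read
    latency: int   # cycles spent in the execution unit


def simulate(program, units):
    """Return {instruction index: cycle in which it was issued}."""
    for ins in program:
        if units.get(ins.kind, 0) < 1:
            raise ValueError(f"no execution unit of kind {ins.kind!r}")

    # For each instruction, find the earlier instructions that produce its operands.
    producers, last_writer = [], {}
    for i, ins in enumerate(program):
        producers.append({last_writer[r] for r in ins.srcs if r in last_writer})
        last_writer[ins.dest] = i

    free = dict(units)                   # free execution units per kind
    waiting = list(range(len(program)))  # held in issue units, program order
    running = {}                         # index -> cycle in which it finishes
    done = set()
    issued_at = {}
    cycle = 0
    while waiting or running:
        # Finished instructions release their unit and make results available.
        for i in [i for i, finish in running.items() if finish <= cycle]:
            del running[i]
            done.add(i)
            free[program[i].kind] += 1
        # Issue every waiting instruction whose operands and unit are ready.
        for i in list(waiting):
            ins = program[i]
            if producers[i] <= done and free[ins.kind] > 0:
                free[ins.kind] -= 1
                running[i] = cycle + ins.latency
                issued_at[i] = cycle
                waiting.remove(i)
        cycle += 1
    return issued_at


program = [
    Instr("fmul", "fp", "f1", ("f2", "f3"), 4),
    Instr("add", "int", "r1", ("r2", "r3"), 1),
    Instr("fadd", "fp", "f4", ("f1", "f5"), 4),   # needs f1 from fmul
    Instr("sub", "int", "r4", ("r1", "r5"), 1),   # needs r1 from add
]
print(simulate(program, {"int": 2, "fp": 1}))
# {0: 0, 1: 0, 3: 1, 2: 4}: sub issues before the earlier fadd
```

With one floating-point unit and two integer units, the integer `sub` issues in cycle 1 while the earlier `fadd` waits until cycle 4 for its operand and the floating-point unit, which is the kind of reordering the text describes.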
The scoreboard is an alternative way to encode and reorder a queue of instructions, and some designers call it a queue table. With some additional logic, a scoreboard can compactly combine execution reordering, register renaming and precise exceptions and interrupts. Further it can do this without the power-hungry, complex content-addressable memory used by the Tomasulo algorithm.
If the execution is slower than writing the results, the memory write-back queue always has free entries. But what if the memory writes slowly? Or what if the destination register will be used by an "earlier" instruction that has not yet issued? Then the write-back step of the instruction might need to be scheduled. This is sometimes called "retiring" an instruction. In this case, there must be scheduling logic on the back end of execution units. It schedules access to the registers or memory that will get the results. Retiring logic can also be designed into an issuing scoreboard or a Tomasulo queue, by including memory or register access in the issuing logic.
Out of order controllers require special design features to handle interrupts. When there are several instructions in progress, it is not clear where in the instruction stream an interrupt occurs. For input and output interrupts, almost any solution works. However, when a computer has virtual memory, an interrupt occurs to indicate that a memory access failed. This memory access must be associated with an exact instruction and an exact processor state, so that the processor's state can be saved and restored by the interrupt. A usual solution preserves copies of registers until a memory access completes.
Also, out of order CPUs have even more problems with stalls from branching, because they can complete several instructions per clock cycle, and usually have many instructions in various stages of progress. So, these control units might use all of the solutions used by pipelined processors.
Translating control units
Some computers translate each single instruction into a sequence of simpler instructions. The advantage is that an out of order computer can be simpler in the bulk of its logic, while still handling complex multi-step instructions. Intel x86 CPUs since the Pentium Pro translate complex CISC x86 instructions into more RISC-like internal micro-operations. In these, the "front" of the control unit manages the translation of instructions. Operands are not translated. The "back" of the CU is an out-of-order CPU that issues the micro-operations and operands to the execution units and data paths.
Control units for low-powered computers
Many modern computers have controls that minimize power usage. In battery-powered computers, such as those in cell phones, the advantage is longer battery life. In computers with utility power, the justification is to reduce the cost of power, cooling or noise.
Most modern computers use CMOS logic. CMOS wastes power in two common ways: by changing state, i.e. "active power", and by unintended leakage. The active power of a computer can be reduced by turning off control signals. Leakage current can be reduced by lowering the electrical pressure (the voltage), by making the transistors with larger depletion regions, or by turning off the logic completely.
Active power is easier to reduce because data stored in the logic is not affected. The usual method reduces the CPU's clock rate. Most computer systems use this method. It is common for a CPU to idle during the transition to avoid side-effects from the changing clock.
Most computers also have a "halt" instruction. This was invented to stop non-interrupt code so that interrupt code has reliable timing. However, designers soon noticed that a halt instruction was also a good time to turn off a CPU's clock completely, reducing the CPU's active power to zero. The interrupt controller might continue to need a clock, but that usually uses much less power than the CPU.
These methods are relatively easy to design, and became so common that others were invented for commercial advantage. Many modern low-power CMOS CPUs stop and start specialized execution units and bus interfaces depending on the needed instruction. Some computers even arrange the CPU's microarchitecture to use transfer-triggered multiplexers so that each instruction only uses the exact pieces of logic needed.
One common method is to spread the load to many CPUs, and turn off unused CPUs as the load reduces. The operating system's task switching logic saves the CPUs' data to memory.
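The effect of these methods on active power can be estimated with the standard first-order CMOS switching-power relation, P = a · C · V² · f, where a is the fraction of the logic that switches per cycle, C the switched capacitance, V the supply voltage and f the clock rate. The sketch below is a rough illustration only: the function names and all numeric values are hypothetical, and `system_power` assumes that a CPU that has been turned off draws no power at all.

```python
def active_power(activity, capacitance_f, voltage_v, clock_hz):
    """First-order CMOS switching power in watts: P = a * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * clock_hz


def system_power(cpus_on, active_w_per_cpu, leak_w_per_cpu):
    """Total power of the CPUs left on (assumes off CPUs draw nothing)."""
    return cpus_on * (active_w_per_cpu + leak_w_per_cpu)


# Hypothetical CPU: 20% activity, 1 nF switched capacitance, 1.0 V supply.
full_speed = active_power(0.2, 1e-9, 1.0, 2e9)   # roughly 0.4 W at 2 GHz
half_speed = active_power(0.2, 1e-9, 1.0, 1e9)   # halving the clock halves it
halted = active_power(0.2, 1e-9, 1.0, 0.0)       # stopped clock: zero active power

# Turning off three of four CPUs as the load falls (0.05 W leakage each).
busy = system_power(4, full_speed, 0.05)
quiet = system_power(1, full_speed, 0.05)
print(full_speed, half_speed, halted, busy, quiet)
```

Because voltage enters the relation squared, lowering the supply voltage together with the clock rate saves more active power than lowering the clock rate alone, although the model says nothing about the reliability cost of doing so.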
In some cases, one of the CPUs can be simpler and smaller, literally with fewer logic gates. So, it has low leakage, and it is the last to be turned off, and the first to be turned on. Also, it is then the only CPU that requires special low-power features. A similar method is used in most PCs, which usually have an auxiliary embedded CPU that manages the power system. However, in PCs, the software is usually in the BIOS, not the operating system.
Theoretically, computers at lower clock speeds could also reduce leakage by reducing the voltage of the power supply. This affects the reliability of the computer in many ways, so the engineering is expensive, and it is uncommon except in relatively expensive computers such as PCs or cell phones.
Some designs can use very low leakage transistors, but these usually add cost. The depletion barriers of the transistors can be made larger to have less leakage, but this makes the transistor larger and thus both slower and more expensive. Some vendors use this technique in selected portions of an IC by constructing low leakage logic from large transistors that some processes provide for analog circuits. Some processes place the transistors above the surface of the silicon, in "FinFETs", but these processes have more steps, so are more expensive. Special transistor materials (e.g. hafnium-based gate dielectrics) can also reduce leakage, but this adds steps to the processing, making it more expensive. Some semiconductors have a larger band-gap than silicon. However, these materials and processes are currently (2020) more expensive than silicon.
Managing leakage is more difficult, because before the logic can be turned off, the data in it must be moved to some type of low-leakage storage. Some CPUs make use of a special type of flip-flop (to store a bit) that couples a fast, high-leakage storage cell to a slow, large (expensive) low-leakage cell. These two cells have separated power supplies. When the CPU enters a power saving mode (e.g. because of a halt that waits for an interrupt), data is transferred to the low-leakage cells, and the others are turned off. When the CPU leaves a low-leakage mode (e.g. because of an interrupt), the process is reversed. Older designs would copy the CPU state to memory, or even disk, sometimes with specialized software. Very simple embedded systems sometimes just restart.
Integrating with the Computer
All modern CPUs have control logic to attach the CPU to the rest of the computer. In modern computers, this is usually a bus controller. When an instruction reads or writes memory, the control unit either controls the bus directly, or controls a bus controller. Many modern computers use the same bus interface for memory, input and output. This is called "memory-mapped I/O". To a programmer, the registers of the I/O devices appear as numbers at specific memory addresses. x86 PCs use an older method, a separate I/O bus accessed by I/O instructions.
Functions of the control unit
Thus a program of instructions in memory will cause the CU to configure a CPU's data flows to manipulate the data correctly between instructions. This results in a computer that can run a complete program and requires no human intervention to make hardware changes between instructions (as had to be done when using only punch cards for computations, before stored-program computers with CUs were invented).
Hardwired control unit
Microprogram control unit
The idea of microprogramming was introduced by Maurice Wilkes in 1951 as an intermediate level to execute computer program instructions.
Combination methods of design
A popular variation on microcode is to debug the microcode using a software simulator. Then, the microcode is a table of bits.
See also
* Processor design
* Computer architecture