HOME

TheInfoList



OR:

In
computer architecture In computer engineering, computer architecture is a description of the structure of a computer system made from component parts. It can sometimes be a high-level description that ignores details of the implementation. At a more detailed level, the ...
, a delay slot is an instruction slot being executed without the effects of a preceding instruction. The most common form is a single arbitrary instruction located immediately after a
branch A branch, sometimes called a ramus in botany, is a woody structural member connected to the central trunk of a tree (or sometimes a shrub). Large branches are known as boughs and small branches are known as twigs. The term ''twig'' usually ...
instruction on a RISC or DSP architecture; this instruction will execute even if the preceding branch is taken. Thus, by design, the instructions appear to execute in an illogical or incorrect order. It is typical for
assembler Assembler may refer to: Arts and media * Nobukazu Takemura, avant-garde electronic musician, stage name Assembler * Assemblers, a fictional race in the ''Star Wars'' universe * Assemblers, an alternative name of the superhero group Champions of ...
s to automatically reorder instructions by default, hiding the awkwardness from assembly developers and compilers.


Branch delay slots

When a branch instruction is involved, the location of the following delay slot instruction in the pipeline may be called a branch delay slot. Branch delay slots are found mainly in DSP architectures and older RISC architectures. MIPS,
PA-RISC PA-RISC is an instruction set architecture (ISA) developed by Hewlett-Packard. As the name implies, it is a reduced instruction set computer (RISC) architecture, where the PA stands for Precision Architecture. The design is also referred to as ...
,
ETRAX CRIS The ETRAX CRIS is a RISC ISA and series of CPUs designed and manufactured by Axis Communications for use in embedded systems since 1993. The name is an acronym of the chip's features: ''Ethernet, Token Ring, AXis - Code Reduced Instruction Set''. To ...
,
SuperH SuperH (or SH) is a 32-bit reduced instruction set computing (RISC) instruction set architecture (ISA) developed by Hitachi and currently produced by Renesas. It is implemented by microcontrollers and microprocessors for embedded systems. At the ...
, and
SPARC SPARC (Scalable Processor Architecture) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system develope ...
are RISC architectures that each have a single branch delay slot; PowerPC,
ARM In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
, Alpha, and
RISC-V RISC-V (pronounced "risk-five" where five refers to the number of generations of RISC architecture that were developed at the University of California, Berkeley since 1981) is an open standard instruction set architecture (ISA) based on estab ...
do not have any. DSP architectures that each have a single branch delay slot include the VS DSP, μPD77230 and TMS320C3x. The SHARC DSP and
MIPS-X MIPS-X is a reduced instruction set computer (RISC) microprocessor and instruction set architecture (ISA) developed as a follow-on project to the Stanford MIPS, MIPS project at Stanford University by the same team that developed MIPS. The project, ...
use a double branch delay slot; such a processor will execute a pair of instructions following a branch instruction before the branch takes effect. The
TMS320C4x Texas Instruments TMS320 is a blanket name for a series of digital signal processors (DSPs) from Texas Instruments. It was introduced on April 8, 1983 through the TMS32010 processor, which was then the fastest DSP on the market. The processor is ...
uses a triple branch delay slot. The following example shows delayed branches in assembly language for the SHARC DSP including a pair after the RTS instruction. Registers R0 through R9 are cleared to zero in order by number (the register cleared after R6 is R7, not R9). No instruction executes more than once.
     R0 = 0;
     CALL fn (DB);      /* call a function, below at label "fn" */
     R1 = 0;            /* first delay slot */
     R2 = 0;            /* second delay slot */
     /***** discontinuity here (the CALL takes effect) *****/

     R6 = 0;            /* the CALL/RTS comes back here, not at "R1 = 0" */
     JUMP end (DB);
     R7 = 0;            /* first delay slot */
     R8 = 0;            /* second delay slot */
     /***** discontinuity here (the JUMP takes effect) *****/

     /* next 4 instructions are called from above, as function "fn" */
fn:  R3 = 0;
     RTS (DB);          /* return to caller, past the caller's delay slots */
     R4 = 0;            /* first delay slot */
     R5 = 0;            /* second delay slot */
     /***** discontinuity here (the RTS takes effect) *****/

end: R9 = 0;
The goal of a pipelined architecture is to complete an instruction every clock cycle. To maintain this rate, the pipeline must be full of instructions at all times. The branch delay slot is a side effect of pipelined architectures due to the branch hazard, i.e. the fact that the branch would not be resolved until the instruction has worked its way through the pipeline. A simple design would insert stalls into the pipeline after a branch instruction until the new branch target address is computed and loaded into the program counter. Each cycle where a stall is inserted is considered one branch delay slot. A more sophisticated design would execute program instructions that are not dependent on the result of the branch instruction. This optimization can be performed in
software Software is a set of computer programs and associated software documentation, documentation and data (computing), data. This is in contrast to Computer hardware, hardware, from which the system is built and which actually performs the work. ...
at
compile time In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled. The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concep ...
by moving instructions into branch delay slots in the in-memory instruction stream, if the hardware supports this. Another side effect is that special handling is needed when managing breakpoints on instructions as well as stepping while debugging within branch delay slot. The ideal number of branch delay slots in a particular pipeline implementation is dictated by the number of pipeline stages, the presence of register forwarding, what stage of the pipeline the branch conditions are computed, whether or not a branch target buffer (BTB) is used and many other factors. Software compatibility requirements dictate that an architecture may not change the number of delay slots from one generation to the next. This inevitably requires that newer hardware implementations contain extra hardware to ensure that the architectural behavior is followed despite no longer being relevant.


Load delay slot

A load delay slot is an instruction which executes immediately after a load (of a register from memory) but does not see, and need not wait for, the result of the load. Load delay slots are very uncommon because load delays are highly unpredictable on modern hardware. A load may be satisfied from RAM or from a cache, and may be slowed by resource contention. Load delays were seen on very early RISC processor designs. The
MIPS I MIPS (Microprocessor without Interlocked Pipelined Stages) is a family of reduced instruction set computer (RISC) instruction set architectures (ISA)Price, Charles (September 1995). ''MIPS IV Instruction Set'' (Revision 3.2), MIPS Technologies, ...
ISA (implemented in the R2000 and R3000 microprocessors) suffers from this problem. The following example is MIPS I assembly code, showing both a load delay slot and a branch delay slot. lw v0,4(v1) # load word from address v1+4 into v0 nop # wasted load delay slot jr v0 # jump to the address specified by v0 nop # wasted branch delay slot


See also

*
Control flow In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an ''im ...
* Bubble (computing) *
Branch predication In computer science, predication is an architectural feature that provides an alternative to conditional transfer of control, as implemented by conditional branch machine instructions. Predication works by having conditional (''predicated'') n ...


External links

* * {{refend Instruction processing