In computer architecture, a delay slot is an instruction slot that is executed without the effects of a preceding instruction. The most common form is a single arbitrary instruction located immediately after a branch instruction on a RISC or DSP architecture; this instruction executes even if the preceding branch is taken. Thus, by design, the instructions appear to execute in an illogical or incorrect order. Assemblers typically reorder instructions automatically by default, hiding the awkwardness from assembly developers and compilers.
Branch delay slots
When a branch instruction is involved, the location of the following delay slot instruction in the pipeline may be called a branch delay slot. Branch delay slots are found mainly in DSP architectures and older RISC architectures.
MIPS, PA-RISC, ETRAX CRIS, SuperH, and SPARC are RISC architectures that each have a single branch delay slot; PowerPC, ARM, Alpha, and RISC-V do not have any.
DSP architectures that each have a single branch delay slot include the VS DSP, μPD77230 and TMS320C3x. The SHARC DSP and MIPS-X use a double branch delay slot; such a processor will execute a pair of instructions following a branch instruction before the branch takes effect. The TMS320C4x uses a triple branch delay slot.
The following example shows delayed branches in assembly language for the SHARC DSP, including a pair of delay slots after the RTS instruction. Registers R0 through R9 are cleared to zero in order by number (the register cleared after R6 is R7, not R9). No instruction executes more than once.
     R0 = 0;
     CALL fn (DB);      /* call a function, below at label "fn" */
     R1 = 0;            /* first delay slot */
     R2 = 0;            /* second delay slot */
     /***** discontinuity here (the CALL takes effect) *****/

     R6 = 0;            /* the CALL/RTS comes back here, not at "R1 = 0" */
     JUMP end (DB);
     R7 = 0;            /* first delay slot */
     R8 = 0;            /* second delay slot */
     /***** discontinuity here (the JUMP takes effect) *****/

     /* next 4 instructions are called from above, as function "fn" */
fn:  R3 = 0;
     RTS (DB);          /* return to caller, past the caller's delay slots */
     R4 = 0;            /* first delay slot */
     R5 = 0;            /* second delay slot */
     /***** discontinuity here (the RTS takes effect) *****/

end: R9 = 0;
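The execution order claimed for the example above can be checked with a small simulation. The following Python sketch is a toy model, not real SHARC semantics: the tuple encoding, the `run` function and the `delay_slots` parameter are all hypothetical names introduced here, and the model assumes that no branch ever sits inside another branch's delay slot.

```python
# Toy model of delayed branches. The mini instruction encoding below is an
# assumption made for illustration; it only mirrors the shape of the example.
PROGRAM = [
    ("set", "R0"),     # 0
    ("call", "fn"),    # 1
    ("set", "R1"),     # 2  first delay slot of CALL
    ("set", "R2"),     # 3  second delay slot of CALL
    ("set", "R6"),     # 4  CALL/RTS comes back here
    ("jump", "end"),   # 5
    ("set", "R7"),     # 6  first delay slot of JUMP
    ("set", "R8"),     # 7  second delay slot of JUMP
    ("set", "R3"),     # 8  fn:
    ("rts", None),     # 9
    ("set", "R4"),     # 10 first delay slot of RTS
    ("set", "R5"),     # 11 second delay slot of RTS
    ("set", "R9"),     # 12 end:
]
LABELS = {"fn": 8, "end": 12}


def run(program, labels, delay_slots=2):
    """Return the registers in the order in which they are cleared."""
    trace = []
    return_stack = []
    pc = 0
    slots_left = None  # delay-slot instructions still to run before the branch takes effect
    target = None      # destination of the pending branch
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "set":
            trace.append(arg)
            if slots_left is not None:
                slots_left -= 1
        else:
            if op == "call":
                # The return address lies past the caller's delay slots.
                return_stack.append(pc + delay_slots)
                target = labels[arg]
            elif op == "jump":
                target = labels[arg]
            elif op == "rts":
                target = return_stack.pop()
            else:
                raise ValueError(f"unknown operation: {op}")
            slots_left = delay_slots
        if slots_left == 0:
            # All delay slots have executed; the branch now takes effect.
            pc, slots_left = target, None
    return trace


print(run(PROGRAM, LABELS))
```

With two delay slots the trace lists R0 through R9 in numerical order. Calling `run(PROGRAM, LABELS, delay_slots=0)` on the same instruction stream models a machine without delay slots, where the instructions placed after the JUMP and RTS are never reached.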
The goal of a pipelined architecture is to complete an instruction every clock cycle. To maintain this rate, the pipeline must be full of instructions at all times. The branch delay slot is a side effect of pipelined architectures due to the branch hazard, i.e. the fact that the branch is not resolved until the instruction has worked its way through the pipeline. A simple design would insert stalls into the pipeline after a branch instruction until the new branch target address is computed and loaded into the program counter. Each cycle where a stall is inserted is considered one branch delay slot. A more sophisticated design would execute program instructions that are not dependent on the result of the branch instruction. This optimization can be performed in software at compile time by moving instructions into branch delay slots in the in-memory instruction stream, if the hardware supports this. Another side effect is that special handling is needed when setting breakpoints on instructions in a branch delay slot, and when single-stepping through one while debugging.
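The compile-time optimization described above can be sketched as a small scheduling pass. This Python example is only an illustration under stated assumptions: it handles a single basic block with one delay slot per branch, considers only the instruction directly before the branch as a candidate, and the names `Instr`, `independent` and `fill_delay_slots` are hypothetical. Real compilers and assemblers use more elaborate strategies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    writes: tuple = ()      # registers written by the instruction
    reads: tuple = ()       # registers read by the instruction
    is_branch: bool = False


NOP = Instr("nop")


def independent(candidate, branch):
    """True if moving `candidate` after `branch` cannot change either result."""
    c_w, c_r = set(candidate.writes), set(candidate.reads)
    b_w, b_r = set(branch.writes), set(branch.reads)
    return not (c_w & b_r or b_w & c_r or b_w & c_w)


def fill_delay_slots(block):
    """Give every branch one delay slot, filled with useful work where possible."""
    out = []
    prev_is_slot = False  # True when out[-1] already fills an earlier branch's slot
    for ins in block:
        if not ins.is_branch:
            out.append(ins)
            prev_is_slot = False
            continue
        candidate = out[-1] if out and not prev_is_slot else None
        if candidate is not None and independent(candidate, ins):
            out.pop()
            out.extend([ins, candidate])  # useful instruction fills the slot
        else:
            out.extend([ins, NOP])        # nothing safe to move: waste the slot
        prev_is_slot = True
    return out


def texts(block):
    return [ins.text for ins in block]


add = Instr("add t0,t1,t2", writes=("t0",), reads=("t1", "t2"))
beq = Instr("beq a0,a1,L", reads=("a0", "a1"), is_branch=True)
sub = Instr("sub a0,a0,t3", writes=("a0",), reads=("a0", "t3"))
beqz = Instr("beq a0,zero,L", reads=("a0", "zero"), is_branch=True)

print(texts(fill_delay_slots([add, beq])))   # add does not feed the branch
print(texts(fill_delay_slots([sub, beqz])))  # sub computes the branch condition
```

In the first case the `add` is moved into the delay slot because the branch does not depend on it. In the second case the `sub` produces the value the branch tests, so it must stay in front of the branch and the slot is filled with a `nop`.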
The ideal number of branch delay slots in a particular pipeline implementation is dictated by the number of pipeline stages, the presence of register forwarding, the pipeline stage in which the branch conditions are computed, whether or not a branch target buffer (BTB) is used, and many other factors. Software compatibility requirements dictate that an architecture may not change the number of delay slots from one generation to the next. This inevitably requires that newer hardware implementations contain extra hardware to ensure that the architectural behavior is followed, even though it no longer reflects the underlying pipeline.
Load delay slot
A load delay slot is an instruction which executes immediately after a load (of a register from memory) but does not see, and need not wait for, the result of the load. Load delay slots are very uncommon because load delays are highly unpredictable on modern hardware. A load may be satisfied from RAM or from a cache, and may be slowed by resource contention. Load delays were seen on very early RISC processor designs. The MIPS I ISA (implemented in the R2000 and R3000 microprocessors) has a load delay slot.
The following example is MIPS I assembly code, showing both a load delay slot and a branch delay slot.
    lw   v0,4(v1)   # load word from address v1+4 into v0
    nop             # wasted load delay slot
    jr   v0         # jump to the address specified by v0
    nop             # wasted branch delay slot
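The reason for the `nop` after the load can be illustrated with a toy register-file model. The Python sketch below is an assumption-laden simplification, not a description of the R2000 or R3000: the `run` function, the tuple encoding and the `load_delay` flag are hypothetical, and the model simply assumes that the instruction in the load delay slot reads the old register value, which real hardware may not guarantee.

```python
def run(program, regs, memory, load_delay=True):
    """Execute a tiny instruction list; return the final register values."""
    regs = dict(regs)
    pending = None  # (register, value) loaded by the previous instruction
    for op, dst, a, b in program:
        # Operands are read before the previous load's result is written back.
        if op == "lw":
            result = (dst, memory[regs[a] + b])  # a = base register, b = offset
        elif op == "add":
            result = (dst, regs[a] + regs[b])
        elif op == "nop":
            result = None
        else:
            raise ValueError(f"unknown operation: {op}")
        # The delayed load becomes visible only now, after the slot's reads.
        if pending is not None:
            regs[pending[0]] = pending[1]
            pending = None
        if result is not None:
            if op == "lw" and load_delay:
                pending = result
            else:
                regs[result[0]] = result[1]
    if pending is not None:
        regs[pending[0]] = pending[1]
    return regs


MEMORY = {104: 7}
REGS = {"v0": 1, "v1": 100, "t0": 0}

# The add sits in the load delay slot and reads the stale v0.
unsafe = [("lw", "v0", "v1", 4), ("add", "t0", "v0", "v0")]
# A nop fills the slot, so the add sees the loaded value.
safe = [("lw", "v0", "v1", 4), ("nop", None, None, None), ("add", "t0", "v0", "v0")]

print(run(unsafe, REGS, MEMORY)["t0"])
print(run(safe, REGS, MEMORY)["t0"])
```

In this model the first program computes `t0` from the old value of `v0`, while the second, padded with a `nop`, computes it from the value just loaded from memory.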
See also
* Control flow
* Bubble (computing)
* Branch predication