HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...
, predication is an architectural feature that provides an alternative to conditional transfer of control, as implemented by conditional
branch A branch, sometimes called a ramus in botany, is a woody structural member connected to the central trunk of a tree (or sometimes a shrub). Large branches are known as boughs and small branches are known as twigs. The term ''twig'' usually ...
machine instructions. Predication works by having conditional (''predicated'') non-branch instructions associated with a ''predicate'', a Boolean value used by the instruction to control whether the instruction is allowed to modify the architectural state or not. If the predicate specified in the instruction is true, the instruction modifies the architectural state; otherwise, the architectural state is unchanged. For example, a predicated move instruction (a conditional move) will only modify the destination if the predicate is true. Thus, instead of using a conditional branch to select an instruction or a sequence of instructions to execute based on the predicate that controls whether the branch occurs, the instructions to be executed are associated with that predicate, so that they will be executed, or not executed, based on whether that predicate is true or false.
Vector processors In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called ...
, some
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it shoul ...
ISAs (such as AVX2 and
AVX-512 AVX-512 are 512-bit extensions to the 256-bit Advanced Vector Extensions SIMD instructions for x86 instruction set architecture (ISA) proposed by Intel in July 2013, and implemented in Intel's Xeon Phi x200 (Knights Landing) and Skylake-X CPUs; t ...
) and
GPU A graphics processing unit (GPU) is a specialized electronic circuit designed to manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display device. GPUs are used in embedded systems, mobi ...
s in general make heavy use of predication, applying one bit of a conditional ''mask Vector'' to the corresponding elements in the Vector registers being processed, whereas scalar predication in scalar instruction sets only need the one predicate bit. Where Predicate Masks become particularly powerful in Vector processing is if an ''array'' of Condition Codes, one per Vector element, may feed back into Predicate Masks that are then applied to subsequent Vector instructions.


Overview

Most
computer program A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components. A computer program ...
s contain conditional code, which will be executed only under specific conditions depending on factors that cannot be determined beforehand, for example depending on user input. As the majority of processors simply execute the next instruction in a sequence, the traditional solution is to insert ''branch'' instructions that allow a program to conditionally branch to a different section of code, thus changing the next step in the sequence. This was sufficient until designers began improving performance by implementing
instruction pipelining In computer engineering, instruction pipelining or ILP is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incom ...
, a method which is slowed down by branches. For a more thorough description of the problems which arose, and a popular solution, see
branch predictor In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow ...
. Luckily, one of the more common patterns of code that normally relies on branching has a more elegant solution. Consider the following pseudocode: if condition else On a system that uses conditional branching, this might translate to machine instructions looking similar to: branch-if-condition to label1 dosomethingelse branch-to label2 label1: dosomething label2: ... With predication, all possible branch paths are coded inline, but some instructions execute while others do not. The basic idea is that each instruction is associated with a predicate (the word here used similarly to its usage in
predicate logic First-order logic—also known as predicate logic, quantificational logic, and first-order predicate calculus—is a collection of formal systems used in mathematics, philosophy, linguistics, and computer science. First-order logic uses quantifie ...
) and that the instruction will only be executed if the predicate is true. The machine code for the above example using predication might look something like this: (condition) dosomething (not condition) dosomethingelse Besides eliminating branches, less code is needed in total, provided the architecture provides predicated instructions. While this does not guarantee faster execution in general, it will if the dosomething and dosomethingelse blocks of code are short enough. Predication's simplest form is ''partial predication'', where the architecture has ''conditional move'' or ''conditional select'' instructions. Conditional move instructions write the contents of one register over another only if the predicate's value is true, whereas conditional select instructions choose which of two registers has its contents written to a third based on the predicate's value. A more generalized and capable form is ''full predication''. Full predication has a set of predicate registers for storing predicates (which allows multiple nested or sequential branches to be simultaneously eliminated) and most instructions in the architecture have a register specifier field to specify which predicate register supplies the predicate.


Advantages

The main purpose of predication is to avoid jumps over very small sections of program code, increasing the effectiveness of pipelined execution and avoiding problems with the
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache County ...
. It also has a number of more subtle benefits: *Functions that are traditionally computed using simple arithmetic and bitwise operations may be quicker to compute using predicated instructions. *Predicated instructions with different predicates can be mixed with each other and with unconditional code, allowing better
instruction scheduling In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to do the following without changing ...
and so even better performance. *Elimination of unnecessary branch instructions can make the execution of necessary branches, such as those that make up loops, faster by lessening the load on
branch prediction In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow ...
mechanisms. *Elimination of the cost of a branch misprediction which can be high on deeply pipelined architectures. * Instruction sets that have comprehensive Condition Codes generated by instructions may reduce code size further by directly using the Condition Registers in or as predication.


Disadvantages

Predication's primary drawback is in increased encoding space. In typical implementations, every instruction reserves a bitfield for the predicate specifying under what conditions that instruction should have an effect. When available memory is limited, as on
embedded devices An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' as ...
, this space cost can be prohibitive. However, some architectures such as
Thumb-2 ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of reduced instruction set computer (RISC) instruction set architectures for computer processors, configure ...
are able to avoid this issue (see below). Other detriments are the following: *Predication complicates the hardware by adding levels of
logic Logic is the study of correct reasoning. It includes both formal and informal logic. Formal logic is the science of deductively valid inferences or of logical truths. It is a formal science investigating how conclusions follow from premise ...
to critical paths and potentially degrades clock speed. *A predicated block includes cycles for all operations, so shorter paths may take longer and be penalized. *Predication is not usually speculated and causes a longer dependency chain. For ordered data this translates to a performance loss compared to a predictable branch. Predication is most effective when paths are balanced or when the longest path is the most frequently executed, but determining such a path is very difficult at compile time, even in the presence of profiling information.


History

Predicated instructions were popular in European computer designs of the 1950s, including the
Mailüfterl Mailüfterl is a nickname for the Austrian ''Binär dezimaler Volltransistor-Rechenautomat'' (binary-decimal fully transistorized computing automaton), an early transistorized computer. Other early transistorized computers included TRADIC, Harwel ...
(1955), the Zuse Z22 (1955), the
ZEBRA Zebras (, ) (subgenus ''Hippotigris'') are African equines with distinctive black-and-white striped coats. There are three living species: the Grévy's zebra (''Equus grevyi''), plains zebra (''E. quagga''), and the mountain zebra (''E. zebr ...
(1958), and the
Electrologica X1 The Electrologica X1 was a digital computer designed and manufactured in the Netherlands from 1958 to 1965. About thirty were produced and sold in the Netherlands and abroad. The X1 was designed by the Mathematical Centre in Amsterdam, an acade ...
(1958). The IBM ACS-1 design of 1967 allocated a "skip" bit in its instruction formats, and the CDC Flexible Processor in 1976 allocated three conditional execution bits in its microinstruction formats. Hewlett-Packard's
PA-RISC PA-RISC is an instruction set architecture (ISA) developed by Hewlett-Packard. As the name implies, it is a reduced instruction set computer (RISC) architecture, where the PA stands for Precision Architecture. The design is also referred to as ...
architecture (1986) had a feature called ''nullification'', which allowed most instructions to be predicated by the previous instruction. IBM's POWER architecture (1990) featured conditional move instructions. POWER's successor, PowerPC (1993), dropped these instructions.
Digital Equipment Corporation Digital Equipment Corporation (DEC ), using the trademark Digital, was a major American company in the computer industry from the 1960s to the 1990s. The company was co-founded by Ken Olsen and Harlan Anderson in 1957. Olsen was president un ...
's Alpha architecture (1992) also featured conditional move instructions. MIPS gained conditional move instructions in 1994 with the MIPS IV version; and
SPARC SPARC (Scalable Processor Architecture) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system develope ...
was extended in Version 9 (1994) with conditional move instructions for both integer and floating-point registers. In the Hewlett-Packard/
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
IA-64 IA-64 (Intel Itanium architecture) is the instruction set architecture (ISA) of the Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by Intel in col ...
architecture, most instructions are predicated. The predicates are stored in 64 special-purpose predicate registers; and one of the predicate registers is always true so that ''unpredicated'' instructions are simply instructions predicated with the value true. The use of predication is essential in IA-64's implementation of
software pipelining In computer science, software pipelining is a technique used to optimize loops, in a manner that parallels hardware pipelining. Software pipelining is a type of out-of-order execution, except that the reordering is done by a compiler (or in the ...
because it avoids the need for writing separated code for prologs and epilogs. In the
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
architecture, a family of conditional move instructions (CMOV and FCMOV) were added to the architecture by the
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
Pentium Pro The Pentium Pro is a sixth-generation x86 microprocessor developed and manufactured by Intel and introduced on November 1, 1995. It introduced the P6 microarchitecture (sometimes termed i686) and was originally intended to replace the original ...
(1995) processor. The CMOV instructions copied the contents of the source register to the destination register depending on a predicate supplied by the value of the flag register. In the ARM architecture, the original 32-bit instruction set provides a feature called ''conditional execution'' that allows most instructions to be predicated by one of 13 predicates that are based on some combination of the four condition codes set by the previous instruction. ARM's Thumb instruction set (1994) dropped conditional execution to reduce the size of instructions so they could fit in 16 bits, but its successor,
Thumb-2 ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of reduced instruction set computer (RISC) instruction set architectures for computer processors, configure ...
(2003) overcame this problem by using a special instruction which has no effect other than to supply predicates for the following four instructions. The 64-bit instruction set introduced in ARMv8-A (2011) replaced conditional execution with conditional selection instructions.


SIMD, SIMT and vector predication

Some
SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it shoul ...
instruction sets, like AVX2, have the ability to use a logical mask to conditionally load/store values to memory, a parallel form of the conditional move, and may also apply individual mask bits to individual arithmetic units executing a parallel operation. The technique is known in Flynn's taxonomy as "associative processing". This form of predication is also used in
vector processors In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data called ...
and
single instruction, multiple threads Single instruction, multiple threads (SIMT) is an execution model used in parallel computing where single instruction, multiple data (SIMD) is combined with multithreading. It is different from SPMD in that all instructions in all "threads" are exe ...
GPU computing. All the techniques, advantages and disadvantages of single scalar predication apply just as well to the parallel processing case.


See also

*
Branch predictor In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow ...
*
Control flow In computer science, control flow (or flow of control) is the order in which individual statements, instructions or function calls of an imperative program are executed or evaluated. The emphasis on explicit control flow distinguishes an ''im ...
*
Delay slot In computer architecture, a delay slot is an instruction slot being executed without the effects of a preceding instruction. The most common form is a single arbitrary instruction located immediately after a branch instruction on a RISC or DSP ...
*
Instruction-level parallelism Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically ILP refers to the average number of instructions run per step of this parallel execution. Disc ...
*
Optimizing compiler In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power cons ...
*
Pipeline stall In the design of pipelined computer processors, a pipeline stall is a delay in execution of an instruction in order to resolve a hazard. Details In a standard five-stage pipeline, during the decoding stage, the control unit will determine whe ...
*
Software pipelining In computer science, software pipelining is a technique used to optimize loops, in a manner that parallels hardware pipelining. Software pipelining is a type of out-of-order execution, except that the reordering is done by a compiler (or in the ...
*
Speculative execution Speculative execution is an optimization technique where a computer system performs some task that may not be needed. Work is done before it is known whether it is actually needed, so as to prevent a delay that would have to be incurred by doing ...
*
Vector processor In computing, a vector processor or array processor is a central processing unit (CPU) that implements an instruction set where its instructions are designed to operate efficiently and effectively on large one-dimensional arrays of data calle ...
*
Very long instruction word Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units (CPU, processor) mostly allow programs to specify instructions to exe ...


References


Further reading

* {{DEFAULTSORT:Predication Conditional constructs Instruction processing