The XOP (''eXtended Operations'') instruction set, announced by AMD on May 1, 2009, is an extension to the 128-bit SSE core instructions in the x86 and AMD64 instruction set for the Bulldozer processor core, which was released on October 12, 2011. However, AMD removed support for XOP from the Zen microarchitecture onward. The XOP instruction set contains several different types of vector instructions, since it was originally intended as a major upgrade to SSE. Most of the instructions are integer instructions, but it also contains floating-point permutation and floating-point fraction extraction instructions.

History

XOP is a revised subset of what was originally intended as SSE5. It was changed to be similar to, but not overlapping with, AVX: parts that overlapped with AVX were removed or moved to separate standards such as FMA4 (floating-point vector multiply–accumulate) and CVT16 (half-precision floating-point conversion, implemented as F16C by Intel). All SSE5 instructions that were equivalent or similar to instructions in the AVX and FMA4 instruction sets announced by Intel have been changed to use the coding proposed by Intel.

Integer instructions ''without'' equivalents in AVX were classified as the XOP extension. The XOP instructions have an opcode byte 8F (hexadecimal), but otherwise use almost the same coding scheme as AVX with the 3-byte VEX prefix.

Commentators have seen this as evidence that Intel has not allowed AMD to use any part of the large VEX coding space. On this reading, AMD has been forced to use different codes in order to avoid any code combination that Intel might be using in its development pipeline for something else, and the XOP coding scheme is as close to the VEX scheme as technically possible without risking that the AMD codes overlap with future Intel codes. This inference is speculative, since no public information is available about negotiations between the two companies on this issue.

The use of the 8F byte requires that the m-bits (see VEX coding scheme) have a value larger than or equal to 8 in order to avoid overlap with existing instructions. Byte value 0x8F is an existing opcode for a POP instruction. This instruction uses the ModR/M byte, which follows the opcode, but it does not make use of the "reg" (register) field, which is bits 3-5. Some opcodes that do not use "reg" multiplex instructions by using these bits to signify eight different instructions (0x80-0x83 and 0xD0-0xDF, among others); 0x8F does not. For a standard POP instruction, bits 3-5 should therefore always be zero. Since the m-bits are bits 0-4, requiring a value of 8 or higher guarantees that bit 3 or bit 4 of the byte following 0x8F is set, so that byte cannot be the ModR/M byte of a standard POP.

The C4 byte used in the VEX scheme has no such restriction. This may prevent the use of the m-bits for other purposes in the future in the XOP scheme, but not in the VEX scheme. Another possible problem is that the pp bits have the value 00 in the XOP scheme, while they have the value 01 in the VEX scheme for instructions that have no legacy equivalent. This may complicate the use of the pp bits for other purposes in the future.

A similar compatibility issue is the difference between the FMA3 and FMA4 instruction sets. Intel initially proposed FMA4 in AVX/FMA specification version 3 to supersede the 3-operand FMA proposed by AMD in SSE5. After AMD adopted FMA4, Intel canceled FMA4 support and reverted to FMA3 in the AVX/FMA specification version 5 (see FMA history).

In March 2015, AMD explicitly revealed in the description of the patch for the GNU Binutils package that Zen, its third-generation x86-64 architecture, in its first iteration (znver1 – Zen, version 1) will not support the TBM, FMA4, XOP and LWP instructions developed specifically for the "Bulldozer" family of micro-architectures.
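The rule that separates an XOP prefix from a standard POP, described earlier in this section, can be illustrated with a small Python sketch. The function name `classify_after_8f` and the label strings are hypothetical; the sketch models only the bit test stated in the text, and the `"other"` case (m-bits below 8 with a non-zero "reg" field) is outside what the text covers.

```python
def classify_after_8f(next_byte: int) -> str:
    """Classify the byte that follows an 0x8F opcode byte (illustrative only)."""
    if not 0 <= next_byte <= 0xFF:
        raise ValueError("expected a single byte value")
    m_bits = next_byte & 0x1F            # bits 0-4: the m-bits
    reg_field = (next_byte >> 3) & 0x07  # bits 3-5: ModR/M "reg" field
    if m_bits >= 8:
        # Bit 3 or bit 4 is set, so "reg" is non-zero: not a standard POP.
        return "xop"
    if reg_field == 0:
        # "reg" is zero, as expected for a standard POP ModR/M byte.
        return "pop"
    return "other"  # not covered by the rule described in the text


print(classify_after_8f(0xE8))  # m-bits = 8
print(classify_after_8f(0xC0))  # m-bits = 0, reg = 0
```

Because bits 3 and 4 belong both to the m-bits and to the "reg" field, a single comparison on the m-bits is enough to rule out the POP interpretation.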

Integer vector multiply–accumulate instructions

These are integer versions of the FMA instruction set. They are all four-operand instructions similar to FMA4, and they all operate on signed integers.
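As a rough scalar model of a four-operand signed multiply–accumulate (destination, two multiplicands, one accumulator), the following Python sketch computes `a * b + acc` per element. The function names and the `saturate` flag are hypothetical, and the sketch assumes the result has the same width as the inputs; it does not model the widening or operand-selection variants that actual instructions may provide.

```python
def wrap_signed(value: int, bits: int) -> int:
    """Reduce value to a signed two's-complement integer of the given width."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saturate_signed(value: int, bits: int) -> int:
    """Clamp value to the signed range of the given width."""
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(low, min(high, value))


def multiply_accumulate(a, b, acc, bits=32, saturate=False):
    """Element-wise a*b + acc on signed integers (scalar reference model)."""
    if not (len(a) == len(b) == len(acc)):
        raise ValueError("all vectors must have the same number of elements")
    fix = saturate_signed if saturate else wrap_signed
    return [fix(x * y + z, bits) for x, y, z in zip(a, b, acc)]


print(multiply_accumulate([2, -3], [5, 7], [1, 1]))
print(multiply_accumulate([300], [300], [0], bits=16, saturate=True))
```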

Integer vector horizontal addition

Horizontal addition instructions add adjacent values in the input vector to each other. The output size of each instruction describes how wide the horizontal addition is. For instance, horizontal byte to word adds two bytes at a time and returns the result as a vector of words, but byte to quadword adds eight bytes together at a time and returns the result as a vector of quadwords. Six additional horizontal addition and subtraction instructions can be found in SSSE3, but they operate on two input vectors and only perform pairwise (two-and-two) operations.
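The difference between byte-to-word and byte-to-quadword horizontal addition can be sketched in Python as summing groups of adjacent elements. The function name `horizontal_add` and its `group` parameter are hypothetical; the model ignores signedness and overflow, which cannot occur here because each output element is wider than the sum of its inputs.

```python
def horizontal_add(values, group):
    """Sum each run of `group` adjacent elements into one wider element."""
    if group <= 0 or len(values) % group != 0:
        raise ValueError("vector length must be a multiple of the group size")
    return [sum(values[i:i + group]) for i in range(0, len(values), group)]


sixteen_bytes = list(range(1, 17))       # a 128-bit vector of 16 bytes
print(horizontal_add(sixteen_bytes, 2))  # byte to word: 8 results
print(horizontal_add(sixteen_bytes, 8))  # byte to quadword: 2 results
```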

Integer vector compare

This set of vector compare instructions all take an immediate as an extra argument. The immediate controls what kind of comparison is performed. There are eight comparisons possible for each instruction. The vectors are compared element by element; each comparison that evaluates to true sets all corresponding bits in the destination to 1, and each false comparison sets the same bits to 0. This result can be used directly in the VPCMOV instruction for a vectorized conditional move.
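A minimal Python sketch of an immediate-controlled compare that produces all-ones or all-zeros masks is shown here. The name `vector_compare` is hypothetical, and the order of the eight predicates in `PREDICATES` (less, less-or-equal, greater, greater-or-equal, equal, not-equal, always false, always true) is an assumption that should be checked against AMD's documentation before relying on it.

```python
import operator

# Assumed mapping from immediate value 0..7 to comparison (not from the text).
PREDICATES = (
    operator.lt, operator.le, operator.gt, operator.ge,
    operator.eq, operator.ne,
    lambda a, b: False,
    lambda a, b: True,
)


def vector_compare(a, b, imm, bits=8):
    """Compare element-wise; true gives all ones, false gives all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of elements")
    predicate = PREDICATES[imm & 7]
    all_ones = (1 << bits) - 1
    return [all_ones if predicate(x, y) else 0 for x, y in zip(a, b)]


print(vector_compare([1, 5, 3], [2, 5, 1], imm=0))
```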

Vector conditional move

VPCMOV works as a bitwise variant of the blend instructions in SSE4. For each bit in the selector, 1 selects the same bit in the first source and 0 selects the same bit in the second source. When used together with the XOP vector comparison instructions above, this can be used to implement a vectorized ternary move or, if the second input is the same as the destination, a conditional move (CMOV).
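The bitwise selection described above reduces to a single Boolean expression. The Python sketch below treats each register as one integer; the function name `bitwise_select` and the `bits` parameter are hypothetical.

```python
def bitwise_select(src1: int, src2: int, selector: int, bits: int = 128) -> int:
    """For each bit: take src1 where selector is 1, src2 where it is 0."""
    mask = (1 << bits) - 1
    return ((src1 & selector) | (src2 & ~selector)) & mask


# Upper byte comes from the first source, lower byte from the second.
print(hex(bitwise_select(0xAAAA, 0x5555, 0xFF00, bits=16)))
```

With a selector made of all-ones and all-zeros elements, as produced by a vector compare, the same expression picks whole elements rather than individual bits.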

Integer vector shift and rotate instructions

The shift instructions here differ from those in SSE2 in that they can shift each unit by a different amount, using a vector register interpreted as packed signed integers. The sign indicates the direction of the shift or rotate, with positive values causing a left shift and negative values a right shift. Intel has specified a different, incompatible set of variable vector shift instructions in AVX2.
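The signed per-element count can be modeled in Python as follows. The function names are hypothetical, and the counts are passed as ordinary Python integers; how the hardware extracts the count from each element of the count register is not modeled, and counts are assumed to be smaller in magnitude than the element width for the shift.

```python
def variable_shift_logical(values, counts, bits=32):
    """Shift each element left (count >= 0) or logically right (count < 0)."""
    mask = (1 << bits) - 1
    result = []
    for value, count in zip(values, counts):
        value &= mask
        if count >= 0:
            result.append((value << count) & mask)
        else:
            result.append(value >> -count)
    return result


def variable_rotate(values, counts, bits=32):
    """Rotate each element left (count >= 0) or right (count < 0)."""
    mask = (1 << bits) - 1
    result = []
    for value, count in zip(values, counts):
        value &= mask
        count %= bits  # a right rotate by n equals a left rotate by bits - n
        result.append(((value << count) | (value >> (bits - count))) & mask)
    return result


print(variable_shift_logical([1, 0x80, 0xF0], [4, -3, 0], bits=8))
print(variable_rotate([0x81, 0x81], [1, -1], bits=8))
```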

Vector permute

VPPERM is a single instruction that combines the SSSE3 instructions PALIGNR and PSHUFB and adds more to both. Some compare it to the AltiVec instruction VPERM. It takes three registers as input: the first two are source registers and the third is the selector register. Each byte in the selector selects one of the bytes in one of the two input registers for the output. The selector can also apply effects to the selected byte, such as setting it to 0, reversing the bit order, or repeating the most-significant bit. All of the effects or the input can in addition be inverted.

The VPERMIL2PD and VPERMIL2PS instructions are two-source versions of the VPERMILPD and VPERMILPS instructions in AVX, which means that, like VPPERM, they can select output from any of the fields in the two inputs.
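The byte selection and the effects listed for VPPERM can be sketched in Python. This is a model under stated assumptions, not the instruction's encoding: the selector is given as explicit `(index, effect)` pairs instead of packed selector bytes, the `OP_*` constants and their numbering are hypothetical, and the order in which the two sources are concatenated is an assumption.

```python
# Hypothetical effect codes: each effect and its inverted counterpart.
(OP_COPY, OP_INVERT, OP_REVERSE, OP_REVERSE_INVERT,
 OP_ZERO, OP_ONES, OP_MSB, OP_MSB_INVERT) = range(8)

INVERTING = {OP_INVERT, OP_REVERSE_INVERT, OP_ONES, OP_MSB_INVERT}


def reverse_bits(byte: int) -> int:
    """Reverse the bit order of one byte."""
    return int(f"{byte:08b}"[::-1], 2)


def apply_effect(byte: int, op: int) -> int:
    """Apply one of the eight effects to a selected byte."""
    if op in (OP_REVERSE, OP_REVERSE_INVERT):
        byte = reverse_bits(byte)
    elif op in (OP_ZERO, OP_ONES):
        byte = 0
    elif op in (OP_MSB, OP_MSB_INVERT):
        byte = 0xFF if byte & 0x80 else 0x00  # repeat the top bit
    return byte ^ 0xFF if op in INVERTING else byte


def permute_bytes(src1, src2, selector):
    """Build the output from (index, effect) pairs over both sources."""
    pool = list(src1) + list(src2)  # assumed order: src1 first, then src2
    return [apply_effect(pool[index], op) for index, op in selector]


first = list(range(16))            # bytes 0x00..0x0F
second = list(range(0x80, 0x90))   # bytes 0x80..0x8F
print(permute_bytes(first, second,
                    [(0, OP_COPY), (16, OP_COPY), (1, OP_REVERSE), (16, OP_MSB)]))
```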

Floating-point fraction extraction

These instructions extract the fractional part of a floating-point value, that is, the part that would be lost in conversion to an integer.
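As a scalar illustration, the fractional part can be obtained in Python with `math.modf`, which splits a value into fractional and integral parts. The function name `extract_fraction` is hypothetical, and the sketch assumes that conversion to integer means truncation toward zero, so a negative input yields a negative fraction; special values such as NaN and infinity are not modeled.

```python
import math


def extract_fraction(values):
    """Return the fractional part of each element (truncation model)."""
    return [math.modf(value)[0] for value in values]


print(extract_fraction([1.25, -2.5, 3.0]))
```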

CPUs with XOP

* AMD
** "Heavy Equipment" processors
*** Bulldozer-based processors, Q4 2011
*** Piledriver-based processors, Q4 2012
*** Steamroller-based processors, Q1 2014
*** Excavator-based processors (including "v2"), 2015
