Bit Manipulation Instruction Sets

Bit manipulation instruction sets (BMI sets) are extensions to the x86 instruction set architecture for microprocessors from Intel and AMD. The purpose of these instruction sets is to improve the speed of bit manipulation. All the instructions in these sets are non-SIMD and operate only on general-purpose registers.

There are two sets published by Intel: BMI (now referred to as BMI1) and BMI2. Both were introduced with the Haswell microarchitecture, with BMI1 matching features offered by AMD's ABM instruction set and BMI2 extending them. Another two sets were published by AMD: ABM (Advanced Bit Manipulation, which is also a subset of SSE4a, implemented by Intel as part of SSE4.2 and BMI1), and TBM (Trailing Bit Manipulation, an extension to BMI1 introduced with Piledriver-based processors but dropped again in Zen-based processors).


ABM (Advanced Bit Manipulation)

AMD was the first to introduce the instructions that now form Intel's BMI1, as part of its ABM (Advanced Bit Manipulation) instruction set, and later added support for Intel's new BMI2 instructions. AMD today advertises the availability of these features via Intel's BMI1 and BMI2 CPU flags and instructs programmers to target them accordingly.

While Intel considers POPCNT part of SSE4.2 and LZCNT part of BMI1, both Intel and AMD advertise the presence of these two instructions individually. POPCNT has a separate CPUID flag of the same name, and both vendors use AMD's ABM flag to indicate LZCNT support (since LZCNT combined with BMI1 and BMI2 completes the expanded ABM instruction set).

LZCNT is related to the Bit Scan Reverse (BSR) instruction but differs in its flag behavior: LZCNT sets ZF if the result is zero and CF if the source is zero, whereas BSR sets ZF if the source is zero. LZCNT also produces a defined result (the source operand size in bits) when the source operand is zero. For a non-zero argument, the LZCNT and BSR results sum to the argument bit width minus 1 (for example, for the 32-bit argument 0x000f0000, LZCNT gives 12 and BSR gives 19). LZCNT is encoded such that if ABM is not supported, the BSR instruction is executed instead.
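The relationship between LZCNT and BSR described above can be sketched with a small software model. This is an illustrative sketch only, not how the hardware is implemented; the function names `lzcnt` and `bsr`, the `width` parameter, and the use of `None` to stand for BSR's undefined destination on a zero source are all assumptions made for this example.

```python
from typing import Optional, Tuple


def lzcnt(src: int, width: int = 32) -> Tuple[int, bool, bool]:
    """Software model of LZCNT. Returns (result, ZF, CF)."""
    src &= (1 << width) - 1            # truncate to the operand size
    result = width - src.bit_length()  # bit_length() is 0 when src == 0
    zf = result == 0                   # ZF is set when the result is zero
    cf = src == 0                      # CF is set when the source is zero
    return result, zf, cf


def bsr(src: int, width: int = 32) -> Tuple[Optional[int], bool]:
    """Software model of BSR. Returns (index of highest set bit, ZF).

    The destination is not defined for a zero source; None models that here.
    """
    src &= (1 << width) - 1
    if src == 0:
        return None, True              # ZF is set when the source is zero
    return src.bit_length() - 1, False


if __name__ == "__main__":
    count, _, _ = lzcnt(0x000F0000)
    index, _ = bsr(0x000F0000)
    # For a non-zero argument the two results sum to width - 1.
    print(count, index, count + index)
```

The model also shows why the encoding fallback matters: code that issues LZCNT on a processor without ABM silently gets the BSR answer (19 instead of 12 for 0x000f0000), so software normally checks the ABM flag before relying on LZCNT.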


BMI1 (Bit Manipulation Instruction Set 1)

The BMI1 instructions (ANDN, BEXTR, BLSI, BLSMSK, BLSR and TZCNT) are those enabled by the BMI bit in CPUID. Intel officially considers LZCNT part of BMI, but advertises LZCNT support using the ABM CPUID feature flag. BMI1 is available in AMD's Jaguar, Piledriver and newer processors, and in Intel's Haswell and newer processors.

TZCNT is almost identical to the Bit Scan Forward (BSF) instruction but differs in its flag behavior: TZCNT sets ZF if the result is zero and CF if the source is zero, whereas BSF sets ZF if the source is zero. For a non-zero argument, TZCNT and BSF produce the same result. As with LZCNT, TZCNT is encoded such that if BMI1 is not supported, the BSF instruction is executed instead.
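The TZCNT and BSF behavior described above can be modelled in a few lines. This is a hedged sketch for illustration: the names `tzcnt` and `bsf`, the `width` parameter, and the use of `None` for BSF's undefined destination on a zero source are assumptions of this example, not part of any vendor API.

```python
from typing import Optional, Tuple


def tzcnt(src: int, width: int = 32) -> Tuple[int, bool, bool]:
    """Software model of TZCNT. Returns (result, ZF, CF)."""
    src &= (1 << width) - 1
    if src == 0:
        result = width                 # defined result for a zero source
    else:
        # src & -src isolates the lowest set bit; its index is the count.
        result = (src & -src).bit_length() - 1
    zf = result == 0                   # ZF is set when the result is zero
    cf = src == 0                      # CF is set when the source is zero
    return result, zf, cf


def bsf(src: int, width: int = 32) -> Tuple[Optional[int], bool]:
    """Software model of BSF. Returns (index of lowest set bit, ZF)."""
    src &= (1 << width) - 1
    if src == 0:
        return None, True              # destination undefined, ZF set
    return (src & -src).bit_length() - 1, False


if __name__ == "__main__":
    # For a non-zero argument both instructions report the same index.
    print(tzcnt(0x000F0000)[0], bsf(0x000F0000)[0])
```

Because the two instructions agree for every non-zero input, the fallback to BSF on processors without BMI1 only changes the outcome for a zero source and for code that inspects the flags.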


BMI2 (Bit Manipulation Instruction Set 2)

Intel introduced BMI2 together with BMI1 in its line of Haswell processors. Only AMD has produced processors supporting BMI1 without BMI2; BMI2 is supported by AMD's Excavator architecture and newer.


Parallel bit deposit and extract

The PDEP and PEXT instructions are generalized bit-level compress and expand instructions. They take two inputs: a source and a selector. The selector is a bitmap selecting the bits that are to be packed or unpacked. PEXT copies the selected bits from the source to contiguous low-order bits of the destination; higher-order destination bits are cleared. PDEP does the opposite for the selected bits: contiguous low-order bits of the source are copied to the selected bits of the destination; other destination bits are cleared. This can be used to extract any bitfield of the input, and even to perform bit-level shuffling that previously would have been expensive.

While these instructions resemble bit-level gather-scatter SIMD instructions, PDEP and PEXT (like the rest of the BMI instruction sets) operate on general-purpose registers. The instructions are available in 32-bit and 64-bit versions. As an example with an arbitrary source and selector in 32-bit mode, PEXT with source 0x12345678 and selector 0xff00fff0 yields 0x00012567, and PDEP with source 0x00012567 and the same selector yields 0x12005670.

AMD processors before Zen 3 that implement PDEP and PEXT do so in microcode, with a latency of 18 cycles, compared with 3 cycles on Zen 3. As a result, it is often faster to use other instructions on these processors.
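The semantics of the two instructions can be made concrete with a bit-by-bit software model. The sketch below is illustrative only: the function names `pext` and `pdep` and the `width` parameter are assumptions of this example, and the loop is a plain reference model rather than an optimized fallback.

```python
def pext(src: int, mask: int, width: int = 32) -> int:
    """Reference model of PEXT: pack the bits of src selected by mask
    into the contiguous low-order bits of the result."""
    result = 0
    out_bit = 0                        # next free low-order position
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((src >> i) & 1) << out_bit
            out_bit += 1
    return result


def pdep(src: int, mask: int, width: int = 32) -> int:
    """Reference model of PDEP: scatter the contiguous low-order bits
    of src to the bit positions selected by mask."""
    result = 0
    in_bit = 0                         # next low-order source bit to place
    for i in range(width):
        if (mask >> i) & 1:
            result |= ((src >> in_bit) & 1) << i
            in_bit += 1
    return result


if __name__ == "__main__":
    selector = 0xFF00FFF0
    packed = pext(0x12345678, selector)
    print(hex(packed))
    print(hex(pdep(packed, selector)))
```

With a selector whose set bits are contiguous, `pext` reduces to ordinary bitfield extraction; for instance, a selector of 0x0ff0 pulls out bits 4 through 11 of the source. Depositing the packed value back with the same selector restores the selected bits of the original and leaves the others cleared.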


TBM (Trailing Bit Manipulation)

TBM consists of instructions complementary to the instruction set started by BMI1. Because they are complementary, they do not necessarily need to be used directly; an optimizing compiler can generate them when they are supported. AMD introduced TBM together with BMI1 in its Piledriver line of processors; later AMD Jaguar and Zen-based processors do not support TBM. No Intel processors (at least through Alder Lake) support TBM.


Supporting CPUs

* Intel
** Intel Nehalem processors and newer (like Sandy Bridge, Ivy Bridge) (POPCNT supported)
** Intel Silvermont processors (POPCNT supported)
** Intel Haswell processors and newer (like Skylake, Broadwell) (ABM, BMI1 and BMI2 supported)
* AMD
** K10-based processors (ABM supported)
** "Cat" low-power processors
*** Bobcat-based processors (ABM supported)
*** Jaguar-based processors and newer (ABM and BMI1 supported)
*** Puma-based processors and newer (ABM and BMI1 supported)
** "Heavy Equipment" processors
*** Bulldozer-based processors (ABM supported)
*** Piledriver-based processors (ABM, BMI1 and TBM supported)
*** Steamroller-based processors (ABM, BMI1 and TBM supported)
*** Excavator-based processors and newer (ABM, BMI1, BMI2 and TBM supported; microcoded PEXT and PDEP)
** Zen-based, Zen+-based, and Zen 2-based processors (ABM, BMI1 and BMI2 supported; microcoded PEXT and PDEP)
** Zen 3 processors and newer (ABM, BMI1 and BMI2 supported; full hardware implementation)

Note that instruction extension support means only that the processor is capable of executing the supported instructions, for software compatibility purposes. The processor might not perform well doing so. For example, Excavator through Zen 2 processors implement the PEXT and PDEP instructions using microcode, so these instructions execute significantly slower than the same behaviour recreated using other instructions. (A software method called "zp7" is, in fact, faster on these machines.) For optimum performance, it is recommended that compiler developers choose individual instructions from the extensions based on architecture-specific performance profiles rather than on extension availability.
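As a rough illustration of checking which of these extensions a given machine advertises, the sketch below parses CPU feature flags from text in the style of Linux's /proc/cpuinfo. It is a minimal sketch under stated assumptions: the flag spellings (`popcnt`, `abm`, `bmi1`, `bmi2`, `tbm`) are those used by Linux, the helper names `bit_manip_flags` and `read_host_flags` are hypothetical, and other operating systems expose CPUID information differently.

```python
from typing import Set

# Flag names as spelled on the "flags" line of Linux /proc/cpuinfo
# (assumption: a Linux host; this is not a portable CPUID query).
BIT_MANIP_FLAGS = ("popcnt", "abm", "bmi1", "bmi2", "tbm")


def bit_manip_flags(cpuinfo_text: str) -> Set[str]:
    """Return the bit-manipulation flags found on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return {flag for flag in BIT_MANIP_FLAGS if flag in present}
    return set()


def read_host_flags(path: str = "/proc/cpuinfo") -> Set[str]:
    """Hypothetical convenience wrapper; returns an empty set if the
    file cannot be read (for example, on a non-Linux system)."""
    try:
        with open(path) as handle:
            return bit_manip_flags(handle.read())
    except OSError:
        return set()


if __name__ == "__main__":
    print(sorted(read_host_flags()))
```

A flag check like this reports availability only. As noted above, the presence of `bmi2` does not indicate whether PEXT and PDEP are implemented in hardware or in microcode, so performance-sensitive code would also need to consider the specific microarchitecture.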


See also

* Advanced Vector Extensions (AVX)
* AES instruction set
* CLMUL instruction set
* F16C
* FMA instruction set
* Intel ADX
* XOP instruction set
* Intel BCD opcodes (also used for advanced bit manipulation techniques)


External links


Intel Intrinsics Guide