Transactional Synchronization Extensions (TSX), also called Transactional Synchronization Extensions New Instructions (TSX-NI), is an extension to the x86 instruction set architecture (ISA) that adds hardware transactional memory support, speeding up execution of multi-threaded software through lock elision. According to different benchmarks, TSX/TSX-NI can provide around 40% faster application execution in specific workloads, and 4–5 times more database transactions per second (TPS).
TSX/TSX-NI was documented by Intel in February 2012, and debuted in June 2013 on selected Intel microprocessors based on the Haswell microarchitecture. Haswell processors below 45xx as well as R-series and K-series (with unlocked multiplier) SKUs do not support TSX/TSX-NI.
In August 2014, Intel announced a bug in the TSX/TSX-NI implementation on the then-current steppings of Haswell, Haswell-E, Haswell-EP and early Broadwell CPUs, which resulted in disabling the TSX/TSX-NI feature on affected CPUs via a microcode update.
In 2016, a side-channel timing attack was found that abuses the way TSX/TSX-NI handles transactional faults (i.e. page faults) in order to break kernel address space layout randomization (KASLR) on all major operating systems.
In 2021, Intel released a microcode update that disabled the TSX/TSX-NI feature on CPU generations from Skylake to Coffee Lake, as a mitigation for discovered security issues.
Support for TSX/TSX-NI emulation is provided as part of the Intel Software Development Emulator. There is also experimental support for TSX/TSX-NI emulation in a QEMU fork.
Features
TSX/TSX-NI provides two software interfaces for designating code regions for transactional execution. Hardware Lock Elision (HLE) is an instruction prefix-based interface designed to be backward compatible with processors without TSX/TSX-NI support. Restricted Transactional Memory (RTM) is a new instruction set interface that provides greater flexibility for programmers.
TSX/TSX-NI enables optimistic execution of transactional code regions. The hardware monitors multiple threads for conflicting memory accesses, and aborts and rolls back transactions that cannot be successfully completed. Mechanisms are provided for software to detect and handle failed transactions.
In other words, lock elision through transactional execution uses memory transactions as a fast path where possible, while the slow (fallback) path is still a normal lock.
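The fast-path/slow-path structure described above can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not Intel's or any library's implementation: the `elided_lock` type, the helper names, and the retry limit `MAX_TX_ATTEMPTS` are hypothetical. The intrinsics `_xbegin`, `_xend` and `_xabort` come from `<immintrin.h>` and are only compiled in when the compiler defines `__RTM__` (for example with `-mrtm`); the caller is assumed to have established at run time that RTM is usable and passes that in as `rtm_usable`.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__RTM__)
#include <immintrin.h>  /* _xbegin, _xend, _xabort, _XBEGIN_STARTED, _XABORT_RETRY */
#endif

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct { atomic_int held; } elided_lock;

enum { MAX_TX_ATTEMPTS = 3 };  /* assumed retry budget, not a recommended value */

/* Slow path: an ordinary spinlock acquire. */
static void fallback_acquire(elided_lock *l) {
    int expected = 0;
    while (!atomic_compare_exchange_weak_explicit(
               &l->held, &expected, 1,
               memory_order_acquire, memory_order_relaxed)) {
        expected = 0;  /* CAS overwrote it with the observed value */
    }
}

/* Returns true if the critical section runs as a transaction (lock elided),
 * false if the real lock was taken. */
static bool elided_acquire(elided_lock *l, bool rtm_usable) {
#if defined(__RTM__)
    if (rtm_usable) {
        for (int attempt = 0; attempt < MAX_TX_ATTEMPTS; ++attempt) {
            unsigned status = _xbegin();
            if (status == _XBEGIN_STARTED) {
                /* Reading the lock word puts it in the read set, so a thread
                 * that later takes the real lock aborts this transaction. */
                if (atomic_load_explicit(&l->held, memory_order_relaxed) == 0)
                    return true;
                _xabort(0xff);  /* lock is held: abort explicitly */
            }
            /* Execution resumes here after an abort, with the abort status. */
            if (!(status & _XABORT_RETRY))
                break;  /* hardware hints that retrying is unlikely to help */
        }
    }
#else
    (void)rtm_usable;  /* built without RTM support: always use the lock */
#endif
    fallback_acquire(l);
    return false;
}

static void elided_release(elided_lock *l, bool transactional) {
#if defined(__RTM__)
    if (transactional) {
        _xend();  /* commit the transaction; the lock word was never written */
        return;
    }
#else
    (void)transactional;
#endif
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* Example critical section: add to a shared counter under the elided lock. */
static long counter_add(elided_lock *l, long *counter, long delta, bool rtm_usable) {
    bool tx = elided_acquire(l, rtm_usable);
    *counter += delta;
    long value = *counter;
    elided_release(l, tx);
    return value;
}
```

In this sketch the lock word is only read on the fast path, so concurrent transactions do not conflict on it; a thread that falls back to `fallback_acquire` writes the lock word and thereby aborts any transaction that has it in its read set.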
Hardware Lock Elision
Hardware Lock Elision (HLE) adds two new instruction prefixes, XACQUIRE and XRELEASE. These two prefixes reuse the opcodes of the existing REPNE/REPE prefixes (F2H/F3H). On processors that do not support HLE, REPNE/REPE prefixes are ignored on instructions for which the XACQUIRE/XRELEASE prefixes are valid, thus enabling backward compatibility.
The XACQUIRE prefix hint can only be used with the following instructions with an explicit LOCK prefix: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. The XCHG instruction can be used without the LOCK prefix as well.
The XRELEASE prefix hint can be used both with the instructions listed above, and with the MOV mem, reg and MOV mem, imm instructions.
HLE allows optimistic execution of a critical section by skipping the write to a lock, so that the lock appears to be free to other threads. A failed transaction results in execution restarting from the XACQUIRE-prefixed instruction, but treating the instruction as if the XACQUIRE prefix were not present.
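As a hedged illustration of how the prefixes are reached from C, the sketch below uses the GCC-specific memory-model flags `__ATOMIC_HLE_ACQUIRE` and `__ATOMIC_HLE_RELEASE`, which GCC documents for x86 targets as a way to emit XACQUIRE and XRELEASE on atomic built-ins. The function names `hle_lock` and `hle_unlock` are hypothetical. Where the flags are not defined (other compilers or architectures), the code assumes nothing and degrades to a plain spinlock.

```c
/* Use the HLE hint flags where the compiler provides them; otherwise fall
 * back to no hint, which yields an ordinary spinlock. */
#if defined(__ATOMIC_HLE_ACQUIRE) && defined(__ATOMIC_HLE_RELEASE)
#define HLE_ACQUIRE_HINT __ATOMIC_HLE_ACQUIRE
#define HLE_RELEASE_HINT __ATOMIC_HLE_RELEASE
#else
#define HLE_ACQUIRE_HINT 0
#define HLE_RELEASE_HINT 0
#endif

/* Acquire: an atomic exchange (XCHG), which may carry the XACQUIRE prefix.
 * With HLE active the write of 1 is elided and the section runs
 * transactionally; otherwise this is a normal test-and-set. */
static void hle_lock(int *lock) {
    while (__atomic_exchange_n(lock, 1, __ATOMIC_ACQUIRE | HLE_ACQUIRE_HINT) != 0) {
        /* Lock really is held: spin on plain loads until it looks free. */
        while (__atomic_load_n(lock, __ATOMIC_RELAXED) != 0) {
        }
    }
}

/* Release: a store of 0 (MOV mem, imm), which may carry the XRELEASE prefix. */
static void hle_unlock(int *lock) {
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE | HLE_RELEASE_HINT);
}
```

Because the prefixes are ignored on processors without HLE, the same binary behaves as a conventional spinlock there. Production spin loops typically also issue a pause instruction while waiting; it is omitted here to keep the sketch portable.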
Restricted Transactional Memory
Restricted Transactional Memory (RTM) is an alternative implementation to HLE which gives the programmer the flexibility to specify a fallback code path that is executed when a transaction cannot be successfully executed. Unlike HLE, RTM is not backward compatible with processors that do not support it. For backward compatibility, programs are required to detect support for RTM in the CPU before using the new instructions.
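A minimal sketch of such a detection step is shown below. It relies on the feature flags that Intel documents in CPUID leaf 7, sub-leaf 0, where EBX bit 4 reports HLE and EBX bit 11 reports RTM. The helper names are hypothetical, and the live query assumes a GCC- or Clang-compatible compiler that ships `<cpuid.h>` with `__get_cpuid_count`; on any other target the sketch conservatively reports no support.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>  /* __get_cpuid_count (GCC/Clang helper) */
#define HAVE_CPUID_H 1
#endif

/* Feature bits in CPUID.(EAX=7,ECX=0):EBX. */
#define TSX_EBX_HLE (1u << 4)
#define TSX_EBX_RTM (1u << 11)

/* Pure decoders, usable on any EBX value (for example one read from a log). */
static bool ebx_has_hle(uint32_t ebx) { return (ebx & TSX_EBX_HLE) != 0; }
static bool ebx_has_rtm(uint32_t ebx) { return (ebx & TSX_EBX_RTM) != 0; }

/* Queries the running CPU; returns false when CPUID leaf 7 is unavailable
 * or when not built for x86 with a compatible compiler. */
static bool cpu_reports_rtm(void) {
#ifdef HAVE_CPUID_H
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return ebx_has_rtm(ebx);
#endif
    return false;
}
```

A set RTM bit only means that the instructions can be executed; transactions may still abort every time, so a non-transactional fallback path remains necessary.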
RTM adds three new instructions: XBEGIN, XEND and XABORT. The XBEGIN and XEND instructions mark the start and the end of a transactional code region; the XABORT instruction explicitly aborts a transaction. Transaction failure redirects the processor to the fallback code path specified by the XBEGIN instruction, with the abort status returned in the EAX register.
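The abort status in EAX is a bit field, and a fallback path usually inspects it to decide whether to retry. The sketch below decodes it using the bit layout that Intel documents for RTM (the same layout as the `_XABORT_*` macros in `<immintrin.h>`); the struct and function names are hypothetical and the code is plain bit manipulation, so it does not need TSX hardware to run.

```c
#include <stdbool.h>
#include <stdint.h>

/* RTM abort status bits as returned in EAX. */
#define ABORT_EXPLICIT (1u << 0)  /* caused by XABORT */
#define ABORT_RETRY    (1u << 1)  /* the transaction may succeed on retry */
#define ABORT_CONFLICT (1u << 2)  /* another logical processor conflicted */
#define ABORT_CAPACITY (1u << 3)  /* an internal buffer overflowed */
#define ABORT_DEBUG    (1u << 4)  /* a debug breakpoint was hit */
#define ABORT_NESTED   (1u << 5)  /* abort occurred in a nested transaction */

/* Hypothetical decoded view of the status word. */
typedef struct {
    bool explicit_abort;
    bool may_retry;
    bool conflict;
    bool capacity;
    bool debug;
    bool nested;
    uint8_t xabort_code;  /* XABORT immediate, meaningful if explicit_abort */
} abort_info;

static abort_info decode_abort_status(uint32_t eax) {
    abort_info info;
    info.explicit_abort = (eax & ABORT_EXPLICIT) != 0;
    info.may_retry      = (eax & ABORT_RETRY) != 0;
    info.conflict       = (eax & ABORT_CONFLICT) != 0;
    info.capacity       = (eax & ABORT_CAPACITY) != 0;
    info.debug          = (eax & ABORT_DEBUG) != 0;
    info.nested         = (eax & ABORT_NESTED) != 0;
    info.xabort_code    = (uint8_t)(eax >> 24);  /* bits 31:24 */
    return info;
}
```

For example, a status of `0xAB000001` decodes as an explicit abort with XABORT code `0xAB`, while `0x00000006` indicates a data conflict for which a retry may succeed.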
XTEST instruction
TSX/TSX-NI provides a new XTEST instruction that returns whether the processor is executing a transactional region. This instruction is supported by the processor if it supports HLE or RTM or both.
TSX Suspend Load Address Tracking
TSX/TSX-NI Suspend Load Address Tracking (TSXLDTRK) is an instruction set extension that makes it possible to temporarily disable the tracking of loads from memory in a section of code within a transactional region. This feature extends HLE and RTM, and its support in the processor must be detected separately.
TSXLDTRK introduces two new instructions, XSUSLDTRK and XRESLDTRK, for suspending and resuming load address tracking, respectively. While tracking is suspended, loads from memory are not added to the transaction read set. This means that, unless these memory locations were added to the transaction read or write sets outside the suspend region, writes to these locations by other threads will not cause a transaction abort. Suspending load address tracking for a portion of code within a transactional region reduces the amount of memory that needs to be tracked for read-write conflicts and therefore increases the probability that the transaction commits successfully.
Implementation
Intel's TSX/TSX-NI specification describes how the transactional memory is exposed to programmers, but withholds details on the actual transactional memory implementation.
Intel specifies in its developer's and optimization manuals that Haswell maintains both read-sets and write-sets at the granularity of a cache line, tracking addresses in the L1 data cache of the processor.
Intel also states that data conflicts are detected through the cache coherence protocol.
Haswell's L1 data cache has an associativity of eight. This means that in this implementation, a transactional execution that writes to nine distinct locations mapping to the same cache set will abort. However, due to micro-architectural implementation details, this does not mean that fewer accesses to the same set are guaranteed never to abort. Additionally, in CPU configurations with Hyper-Threading Technology, the L1 cache is shared between the two threads on the same core, so operations in a sibling logical processor of the same core can cause evictions.
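The capacity limit can be made concrete with a little arithmetic. The sketch below assumes a 32 KiB L1 data cache with 64-byte lines; together with the eight-way associativity stated above this gives 64 sets, so addresses that are a multiple of 4096 bytes apart map to the same set. The cache size and line size are assumptions of this example rather than figures from the text, and the function names are hypothetical. The model only counts distinct written lines per set and ignores every other cause of aborts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed L1D geometry: 32 KiB, 64-byte lines, 8 ways => 64 sets. */
enum {
    LINE_BYTES = 64,
    L1D_BYTES  = 32 * 1024,
    L1D_WAYS   = 8,
    L1D_SETS   = L1D_BYTES / (LINE_BYTES * L1D_WAYS)
};

/* Set index of the cache line containing addr. */
static unsigned l1d_set_index(uintptr_t addr) {
    return (unsigned)((addr / LINE_BYTES) % L1D_SETS);
}

/* Returns true if the distinct cache lines touched by the given write
 * addresses would need more than L1D_WAYS lines in a single set, which in
 * this simplified model forces a transactional abort. */
static bool write_set_overflows(const uintptr_t *addrs, size_t n) {
    unsigned lines_in_set[L1D_SETS] = {0};
    for (size_t i = 0; i < n; ++i) {
        uintptr_t line = addrs[i] / LINE_BYTES;
        bool seen = false;
        for (size_t j = 0; j < i; ++j) {  /* count each cache line only once */
            if (addrs[j] / LINE_BYTES == line) {
                seen = true;
                break;
            }
        }
        if (seen)
            continue;
        if (++lines_in_set[line % L1D_SETS] > (unsigned)L1D_WAYS)
            return true;
    }
    return false;
}
```

Under these assumptions, nine writes at a stride of 4096 bytes overflow one set, whereas nine writes at a stride of 64 bytes land in nine different sets and fit.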
Independent research suggests that Haswell's transactional memory is most likely a deferred update system that uses the per-core caches for transactional data and register checkpoints.
In other words, Haswell is more likely to use a cache-based transactional memory system, as it is a much less risky implementation choice. On the other hand, Intel's Skylake or later may combine this cache-based approach with the memory ordering buffer (MOB) for the same purpose, possibly also providing multi-versioned transactional memory that is more amenable to speculative multithreading.
History and bugs
In August 2014, Intel announced that a bug exists in the TSX/TSX-NI implementation on Haswell, Haswell-E, Haswell-EP and early Broadwell CPUs, which resulted in disabling the TSX/TSX-NI feature on affected CPUs via a microcode update.
The bug was fixed in F-0 steppings of the vPro-enabled Core M-5Y70 Broadwell CPU in November 2014.
The bug was found and then reported during a diploma thesis in the School of Electrical and Computer Engineering of the National Technical University of Athens.
In October 2018, Intel disclosed a TSX/TSX-NI memory ordering issue found in some Skylake processors. As a result of a microcode update, HLE support was disabled in the affected CPUs, and RTM was mitigated by sacrificing one performance counter when used outside of Intel SGX mode or System Management Mode (SMM). System software would have to either effectively disable RTM or update performance monitoring tools not to use the affected performance counter.
In June 2021, Intel published a microcode update that further disables TSX/TSX-NI on various Xeon and Core processor models from Skylake through Coffee Lake and Whiskey Lake as a mitigation for the TSX Asynchronous Abort (TAA) vulnerability. The earlier mitigation for the memory ordering issue was removed. By default, with the updated microcode, the processor would still indicate support for RTM but would always abort the transaction. System software is able to detect this mode of operation and mask support for TSX/TSX-NI from the CPUID instruction, preventing detection of TSX/TSX-NI by applications. System software may also enable the "Unsupported Software Development Mode", where RTM is fully active, but in this case RTM usage may be subject to the issues described earlier, and therefore this mode should not be enabled on production systems. On some systems RTM cannot be re-enabled when SGX is active. HLE is always disabled.
According to the Intel 64 and IA-32 Architectures Software Developer's Manual from May 2020, Volume 1, Chapter 2.5 Intel Instruction Set Architecture And Features Removed, HLE has been removed from Intel products released in 2019 and later. RTM is not documented as removed. However, Intel 10th generation Comet Lake and Ice Lake CPUs, which were released in 2020, do not support TSX/TSX-NI, including both HLE and RTM. Engineering versions of Comet Lake processors still retained TSX/TSX-NI support.
In the Intel Architecture Instruction Set Extensions Programming Reference revision 41 from October 2020, a new TSXLDTRK instruction set extension was documented and slated for inclusion in the upcoming Sapphire Rapids processors.
See also
* Advanced Synchronization Facility – AMD's competing technology
Further reading
* Software-based improvements to hardware lock-elision in Intel TSX.
External links
* Presentation from IDF 2012 (PDF)
* Adding lock elision to Linux, Linux Plumbers Conference 2012 (PDF)
* Lock elision in the GNU C library, LWN.net, January 30, 2013, by Andi Kleen
* TSX Optimization Guide, Chapter 12 (PDF)
* Software Developers Manual, Volume 1, Chapter 2.5 (PDF)
* Web Resources about Intel Transactional Synchronization Extensions
* x86, microcode: BUG: microcode update that changes x86_capability, LKML, September 2014 (there is also another similar bug report)
* Intel microcode, Gentoo, September 19, 2015