AArch64 or ARM64 is the 64-bit extension of the ARM architecture family. It was first introduced with the Armv8-A architecture. Arm releases a new extension every year.
ARMv8.x and ARMv9.x extensions and features
Announced in October 2011, ARMv8-A represents a fundamental change to the ARM architecture. It adds an optional 64-bit architecture, named "AArch64", and the associated new "A64" instruction set. AArch64 provides user-space compatibility with the existing 32-bit architecture ("AArch32" / ARMv7-A), and instruction set ("A32"). The 16/32-bit Thumb instruction set is referred to as "T32" and has no 64-bit counterpart. ARMv8-A allows 32-bit applications to be executed in a 64-bit OS, and a 32-bit OS to be under the control of a 64-bit hypervisor.
ARM announced their Cortex-A53 and Cortex-A57 cores on 30 October 2012.
Apple was the first to release an ARMv8-A compatible core (Cyclone) in a consumer product (iPhone 5S). AppliedMicro, using an FPGA, was the first to demo ARMv8-A.
The first ARMv8-A SoC from Samsung is the Exynos 5433 used in the Galaxy Note 4, which features two clusters of four Cortex-A57 and four Cortex-A53 cores in a big.LITTLE configuration; however, it runs only in AArch32 mode.
To both AArch32 and AArch64, ARMv8-A makes VFPv3/v4 and Advanced SIMD (Neon) standard. It also adds cryptography instructions supporting AES, SHA-1/SHA-256 and finite field arithmetic.
Naming conventions
* 64 + 32 bit
** Architecture: AArch64
** Specification: ARMv8-A
** Instruction sets: A64 + A32
** Suffixes: v8-A
* 32 + 16 (Thumb) bit
** Architecture: AArch32
** Specification: ARMv8-R / ARMv7-A
** Instruction sets: A32 + T32
** Suffixes: -A32 / -R / v7-A
** Example: ARMv8-R, Cortex-A32
AArch64 features
* New instruction set, A64
** Has 31 general-purpose 64-bit registers.
** Has dedicated zero or stack pointer (SP) register (depending on instruction).
** The program counter (PC) is no longer directly accessible as a register.
** Instructions are still 32 bits long and mostly the same as A32 (with LDM/STM instructions and most conditional execution dropped).
*** Has paired loads/stores (in place of LDM/STM).
*** No predication for most instructions (except branches).
** Most instructions can take 32-bit or 64-bit arguments.
** Addresses assumed to be 64-bit.
* Advanced SIMD (Neon) enhanced
** Has 32 × 128-bit registers (up from 16), also accessible via VFPv4.
** Supports double-precision floating-point format.
** Fully IEEE 754 compliant.
** AES encrypt/decrypt and SHA-1/SHA-2 hashing instructions also use these registers.
* A new exception system
** Fewer banked registers and modes.
* Memory translation from 48-bit virtual addresses based on the existing Large Physical Address Extension (LPAE), which was designed to be easily extended to 64-bit.
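As an illustration of how a 48-bit virtual address can be consumed by a multi-level table walk, the following is a minimal sketch. It assumes a 4 KB translation granule with four lookup levels of 9 index bits each plus a 12-bit page offset, which is one common configuration and is not stated in the list above. The function name `split_va` and the field names are hypothetical.

```python
# Sketch: split a 48-bit virtual address into translation-table indices.
# Assumption (not stated in the text above): a 4 KB granule with four
# lookup levels, each indexed by 9 bits, plus a 12-bit page offset.

PAGE_OFFSET_BITS = 12
INDEX_BITS = 9
LEVELS = 4
VA_BITS = PAGE_OFFSET_BITS + INDEX_BITS * LEVELS  # 12 + 4 * 9 = 48


def split_va(va: int) -> dict:
    """Return the per-level table indices and page offset of a 48-bit VA."""
    if not 0 <= va < (1 << VA_BITS):
        raise ValueError("address does not fit in 48 bits")
    fields = {"offset": va & ((1 << PAGE_OFFSET_BITS) - 1)}
    for level in range(LEVELS):
        # Level 0 uses the most significant index bits, level 3 the least.
        shift = PAGE_OFFSET_BITS + INDEX_BITS * (LEVELS - 1 - level)
        fields[f"l{level}"] = (va >> shift) & ((1 << INDEX_BITS) - 1)
    return fields
```

For example, `split_va(0x1000)` selects entry 1 of the level 3 table with a page offset of 0.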
Extension: Data gathering hint (ARMv8.0-DGH)
AArch64 was introduced in ARMv8-A and is included in subsequent versions of ARMv8-A. It was also introduced in ARMv8-R as an option, after its introduction in ARMv8-A; it is not included in ARMv8-M.
Instruction formats
The main opcode for selecting which group an A64 instruction belongs to is at bits 25-28.
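The field described above can be extracted from a 32-bit instruction word with a shift and a mask. The sketch below does that and then maps the field to a top-level group. The bit patterns and group names reflect one reading of the architecture's top-level encoding table and should be treated as an assumption; the function names are hypothetical.

```python
# Sketch: extract the main opcode field (bits 25-28) of an A64 instruction.

def op0_field(instruction: int) -> int:
    """Return bits 28:25 of a 32-bit A64 instruction word."""
    if not 0 <= instruction <= 0xFFFFFFFF:
        raise ValueError("A64 instructions are 32 bits long")
    return (instruction >> 25) & 0xF


def top_level_group(instruction: int) -> str:
    """Classify an instruction by its op0 field (assumed group table)."""
    op0 = op0_field(instruction)
    if op0 & 0b1110 == 0b1000:   # 100x
        return "data processing (immediate)"
    if op0 & 0b1110 == 0b1010:   # 101x
        return "branches, exception generating and system"
    if op0 & 0b0101 == 0b0100:   # x1x0
        return "loads and stores"
    if op0 & 0b0111 == 0b0101:   # x101
        return "data processing (register)"
    if op0 & 0b0111 == 0b0111:   # x111
        return "data processing (scalar floating-point and Advanced SIMD)"
    return "other or unallocated"
```

With these definitions, the word `0xD503201F` (NOP) yields the field value `0b1010`, and `0x8B020020` (`ADD X0, X1, X2`) yields `0b0101`.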
ARMv8.1-A
In December 2014, ARMv8.1-A, an update with "incremental benefits over v8.0", was announced. The enhancements fell into two categories: changes to the instruction set, and changes to the exception model and memory translation.
Instruction set enhancements included the following:
* A set of AArch64 atomic read-write instructions.
* Additions to the Advanced SIMD instruction set for both AArch32 and AArch64 to enable opportunities for some library optimizations:
** Signed Saturating Rounding Doubling Multiply Accumulate, Returning High Half.
** Signed Saturating Rounding Doubling Multiply Subtract, Returning High Half.
** The instructions are added in vector and scalar forms.
* A set of AArch64 load and store instructions that can provide memory access order that is limited to configurable address regions.
* The optional CRC instructions in v8.0 become a requirement in ARMv8.1.
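The arithmetic behind the two saturating rounding doubling multiply instructions listed above can be modelled for a single element as follows. This is a sketch based on one reading of the architecture's pseudocode, not a reference implementation; the function names and the default 16-bit element size are illustrative choices.

```python
# Sketch: scalar model of "Signed Saturating Rounding Doubling Multiply
# Accumulate/Subtract, Returning High Half" for one element.

def _saturate_signed(value: int, bits: int) -> int:
    """Clamp value to the range of a signed integer of the given width."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def sqrdmlah(acc: int, a: int, b: int, bits: int = 16) -> int:
    """acc + high half of the rounded, doubled product a * b, saturated."""
    total = (acc << bits) + 2 * a * b + (1 << (bits - 1))
    # Python's >> on negative integers rounds toward minus infinity.
    return _saturate_signed(total >> bits, bits)


def sqrdmlsh(acc: int, a: int, b: int, bits: int = 16) -> int:
    """acc - high half of the rounded, doubled product a * b, saturated."""
    total = (acc << bits) - 2 * a * b + (1 << (bits - 1))
    return _saturate_signed(total >> bits, bits)
```

Multiplying the most negative value by itself, as in `sqrdmlah(0, -32768, -32768)`, would overflow a 16-bit result and therefore saturates to 32767.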
Enhancements for the exception model and memory translation system included the following:
* A new Privileged Access Never (PAN) state bit provides control that prevents privileged access to user data unless explicitly enabled.
* An increased VMID range for virtualization; supports a larger number of virtual machines.
* Optional support for hardware update of the page table access flag, and the standardization of an optional, hardware updated, dirty bit mechanism.
* The Virtualization Host Extensions (VHE). These enhancements improve the performance of Type 2 hypervisors by reducing the software overhead associated with transitioning between the Host and Guest operating systems. The extensions allow the Host OS to execute at EL2, as opposed to EL1, without substantial modification.
* A mechanism to free up some translation table bits for operating system use, where the hardware support is not needed by the OS.
* Top byte ignore for memory tagging.
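The top-byte-ignore feature in the list above lets software keep a tag in bits 63:56 of a pointer, because those bits do not take part in address translation. The following sketch models this on plain integers. The helper names are hypothetical, and the model assumes that the ignored bits are treated as copies of bit 55.

```python
# Sketch: storing an 8-bit software tag in the top byte of a 64-bit pointer.

MASK64 = (1 << 64) - 1
TAG_SHIFT = 56
TAG_MASK = 0xFF << TAG_SHIFT


def set_tag(ptr: int, tag: int) -> int:
    """Return ptr with its top byte replaced by tag."""
    return (ptr & ~TAG_MASK & MASK64) | ((tag & 0xFF) << TAG_SHIFT)


def get_tag(ptr: int) -> int:
    """Return the top byte of ptr."""
    return (ptr >> TAG_SHIFT) & 0xFF


def translated_address(ptr: int) -> int:
    """Address as seen by translation: top byte replaced by copies of bit 55."""
    low = ptr & ((1 << TAG_SHIFT) - 1)
    if ptr & (1 << 55):
        return low | TAG_MASK  # upper (kernel) address range
    return low                 # lower (user) address range
```

A tagged user-space pointer therefore translates exactly like the untagged one: `translated_address(set_tag(p, 0xA5)) == p` for any `p` whose bits 63:55 are zero.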
ARMv8.2-A
In January 2016, ARMv8.2-A was announced. Its enhancements fell into four categories:
* Optional half-precision floating-point data processing (half-precision was already supported, but only as a storage format, not for processing).
* Memory model enhancements
* Introduction of Reliability, Availability and Serviceability Extension (RAS Extension)
* Introduction of statistical profiling
Scalable Vector Extension (SVE)
The Scalable Vector Extension (SVE) is "an optional extension to the ARMv8.2-A architecture and newer" developed specifically for vectorization of high-performance computing scientific workloads. The specification allows for variable vector lengths to be implemented from 128 to 2048 bits. The extension is complementary to, and does not replace, the NEON extensions.
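Because the vector length is chosen by the implementation, SVE code is usually written so that it does not depend on any particular length. The sketch below models that style in plain Python: a loop advances by however many lanes the hardware provides, and a per-lane predicate disables lanes that fall past the end of the data. The function name is hypothetical, and the check that the length is a multiple of 128 bits is an assumption about permitted lengths.

```python
# Sketch: a vector-length-agnostic element-wise add, modelled in Python.

def vla_add(a, b, vector_bits, element_bits=32):
    """Add two equal-length lists using 'vectors' of vector_bits bits."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    # Assumed constraint: 128 to 2048 bits, in multiples of 128.
    if not 128 <= vector_bits <= 2048 or vector_bits % 128:
        raise ValueError("unsupported vector length")
    lanes = vector_bits // element_bits
    n = len(a)
    out = [0] * n
    for base in range(0, n, lanes):
        # The predicate marks which lanes still hold valid elements.
        predicate = [base + lane < n for lane in range(lanes)]
        for lane, active in enumerate(predicate):
            if active:
                out[base + lane] = a[base + lane] + b[base + lane]
    return out
```

The same function gives identical results for a 128-bit and a 2048-bit vector length; only the number of loop iterations changes.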
A 512-bit SVE variant has already been implemented on the Fugaku supercomputer using the Fujitsu A64FX ARM processor. Fugaku aims to be the world's highest-performing supercomputer with "the goal of beginning full operations around 2021."
SVE is supported by the GCC compiler, with GCC 8 supporting automatic vectorization and GCC 10 supporting C intrinsics. As of July 2020, LLVM and clang support C and IR intrinsics. ARM's own fork of LLVM supports auto-vectorization.
ARMv8.3-A
In October 2016, ARMv8.3-A was announced. Its enhancements fell into six categories:
* Pointer authentication (AArch64 only); mandatory extension (based on a new block cipher, QARMA) to the architecture (compilers need to exploit the security feature, but as the instructions are in NOP space, they are backwards compatible albeit providing no extra security on older chips).
* Nested virtualization (AArch64 only)
* Advanced SIMD complex number support (AArch64 and AArch32); e.g. rotations by multiples of 90 degrees.
* New FJCVTZS (Floating-point JavaScript Convert to Signed fixed-point, rounding toward Zero) instruction.
* A change to the memory consistency model (AArch64 only); to support the (non-default) weaker RCpc (Release Consistent processor consistent) model of C++11/C11 (the default C++11/C11 consistency model was already supported in previous ARMv8).
* ID mechanism support for larger system-visible caches (AArch64 and AArch32)
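The FJCVTZS instruction in the list above exists because JavaScript's conversion from a double to a 32-bit integer differs from an ordinary saturating conversion: the value is truncated toward zero and then wrapped modulo 2^32, while NaN and infinities become 0. A minimal Python model of that conversion follows. The function name is hypothetical, and the condition flags the instruction also sets are not modelled.

```python
# Sketch: JavaScript-style double to signed 32-bit integer conversion.
import math


def js_to_int32(x: float) -> int:
    """Truncate toward zero, then wrap into the signed 32-bit range."""
    if math.isnan(x) or math.isinf(x):
        return 0
    n = math.trunc(x) & 0xFFFFFFFF      # wrap modulo 2**32
    return n - (1 << 32) if n >= (1 << 31) else n
```

Out-of-range inputs wrap rather than saturate, so `js_to_int32(2.0**32 + 5)` is 5 and `js_to_int32(2.0**31)` is -2147483648.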
ARMv8.3-A architecture is now supported by (at least) the GCC 7 compiler.
ARMv8.4-A
In November 2017, ARMv8.4-A was announced. Its enhancements fell into these categories:
* "SHA3 / SHA512 / SM3 / SM4 crypto extensions"
* Improved virtualization support
* Memory Partitioning and Monitoring (MPAM) capabilities
* A new Secure EL2 state and Activity Monitors
* Signed and unsigned integer dot product (SDOT and UDOT) instructions.
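The dot product instructions in the list above treat each 32-bit accumulator lane as the sum of four 8-bit by 8-bit products. The sketch below models that behaviour on Python lists. The function names and the list-based representation of vector registers are illustrative assumptions.

```python
# Sketch: lane-wise model of a 4-way 8-bit dot product into 32-bit lanes.

def _to_signed(value: int, bits: int) -> int:
    """Reinterpret the low bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def dot_accumulate(acc, a, b, signed):
    """Return acc with four byte products added to each 32-bit lane."""
    if len(a) != len(b) or len(a) != 4 * len(acc):
        raise ValueError("need four bytes of each input per accumulator lane")
    out = []
    for lane, acc_value in enumerate(acc):
        total = acc_value
        for k in range(4):
            x = a[4 * lane + k] & 0xFF
            y = b[4 * lane + k] & 0xFF
            if signed:
                x, y = _to_signed(x, 8), _to_signed(y, 8)
            total += x * y
        total &= 0xFFFFFFFF  # accumulation wraps at 32 bits
        out.append(_to_signed(total, 32) if signed else total)
    return out
```

The same byte pattern gives different results in the two forms: with `a = [0xFF, 0, 0, 0]` and `b = [2, 0, 0, 0]`, the signed form adds -2 to the lane and the unsigned form adds 510.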
ARMv8.5-A and ARMv9.0-A
In September 2018, ARMv8.5-A was announced. Its enhancements fell into these categories:
* Memory Tagging Extension (MTE)
* Branch Target Indicators (BTI) to reduce "the ability of an attacker to execute arbitrary code",
* Random Number Generator instructions – "providing Deterministic and True Random Numbers conforming to various National and International Standards"
On 2 August 2019, Google announced Android would adopt Memory Tagging Extension (MTE).
In March 2021, ARMv9-A was announced. ARMv9-A's baseline is all the features from ARMv8.5. ARMv9-A also adds:
* Scalable Vector Extension 2 (SVE2). SVE2 builds on SVE's scalable vectorization for increased fine-grain Data Level Parallelism (DLP), to allow more work done per instruction. SVE2 aims to bring these benefits to a wider range of software including DSP and multimedia SIMD code that currently uses Neon. The LLVM/Clang 9.0 and GCC 10.0 development code was updated to support SVE2.
* Transactional Memory Extension (TME). Following the x86 extensions, TME brings support for Hardware Transactional Memory (HTM) and Transactional Lock Elision (TLE). TME aims to bring scalable concurrency to increase coarse-grained Thread Level Parallelism (TLP), to allow more work done per thread. The LLVM/Clang 9.0 and GCC 10.0 development code was updated to support TME.
* Confidential Compute Architecture (CCA)
* Scalable Matrix Extension (SME). SME adds new features to process matrices efficiently, such as:
** Matrix tile storage
** On-the-fly matrix transposition
** Load/store/insert/extract tile vectors
** Matrix outer product of SVE vectors
** "Streaming mode" SVE
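The outer-product feature in the list above is useful because a matrix product can be written as a sum of outer products, each of which updates a whole tile of results at once. The following sketch shows that decomposition with plain Python lists. It is a conceptual model only: the function names are hypothetical, and it does not model tile storage, predication or streaming mode.

```python
# Sketch: matrix multiplication as a sum of outer products into a tile.

def outer_product_accumulate(tile, column, row):
    """Add the outer product of column and row into tile, in place."""
    for i, x in enumerate(column):
        for j, y in enumerate(row):
            tile[i][j] += x * y
    return tile


def matmul_by_outer_products(a, b):
    """Multiply an m x k matrix by a k x n matrix, one outer product per k."""
    m, k, n = len(a), len(b), len(b[0])
    tile = [[0] * n for _ in range(m)]
    for p in range(k):
        column = [a[i][p] for i in range(m)]  # p-th column of a
        outer_product_accumulate(tile, column, b[p])
    return tile
```

Each step of the loop in `matmul_by_outer_products` touches every element of the result tile, which is the access pattern a hardware outer-product instruction is designed to accelerate.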
ARMv8.6-A and ARMv9.1-A
In September 2019, ARMv8.6-A was announced. Its enhancements fell into these categories:
* General Matrix Multiply (GEMM)
* Bfloat16 format support
* SIMD matrix manipulation instructions, BFDOT, BFMMLA, BFMLAL and BFCVT
* enhancements for virtualization, system management and security
* and the following extensions (that LLVM 11 already added support for):
** Enhanced Counter Virtualization (ARMv8.6-ECV)
** Fine-Grained Traps (ARMv8.6-FGT)
** Activity Monitors virtualization (ARMv8.6-AMU)
Examples of these enhancements include fine-grained traps, Wait-for-Event (WFE) instructions, EnhancedPAC2 and FPAC. The Bfloat16 extensions for SVE and Neon are mainly for deep learning use.
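Bfloat16 keeps the sign bit, the 8 exponent bits and the top 7 fraction bits of an IEEE 754 single-precision value, so conversion amounts to keeping the upper 16 bits of the 32-bit encoding. The sketch below shows that conversion with round-to-nearest-even. The function names are hypothetical, and whether a given instruction uses exactly this rounding and NaN handling is an assumption.

```python
# Sketch: converting between float32 and bfloat16 bit patterns.
import struct


def float_to_bf16(x: float) -> int:
    """Return the 16-bit bfloat16 encoding of x (round to nearest even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040  # keep NaNs as quiet NaNs
    # Add just under half an LSB, plus one if the kept LSB is odd.
    bits += 0x7FFF + ((bits >> 16) & 1)
    return (bits >> 16) & 0xFFFF


def bf16_to_float(h: int) -> float:
    """Expand a bfloat16 encoding to a Python float (exact)."""
    return struct.unpack("<f", struct.pack("<I", (h & 0xFFFF) << 16))[0]
```

Since only 7 fraction bits survive, a value such as 3.140625 is representable exactly (encoding `0x4049`), while nearby values round to it.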
ARMv8.7-A and ARMv9.2-A
In September 2020, ARMv8.7-A was announced. Its enhancements fell into these categories:
* Enhanced support for PCIe hot plug (AArch64)
* Atomic 64-byte loads and stores to accelerators (AArch64)
* Wait For Instruction (WFI) and Wait For Event (WFE) with timeout (AArch64)
* Branch-Record recording (ARMv9.2 only)
ARMv8.8-A and ARMv9.3-A
In September 2021, ARMv8.8-A and ARMv9.3-A were announced. Their enhancements fell into these categories:
* Non-maskable interrupts (AArch64)
* Instructions to optimize memcpy() and memset() style operations (AArch64)
* Enhancements to PAC (AArch64)
* Hinted conditional branches (AArch64)
ARMv8.9-A and ARMv9.4-A
In September 2022, ARMv8.9-A and ARMv9.4-A were announced, including:
* 2022 Virtual Memory System Architecture (VMSA) enhancements
** Permission indirection and overlays
** Translation hardening
** 128-bit translation tables (ARMv9 only)
* SME2 (ARMv9 only)
** Multi-vector instructions
** Multi-vector predicates
** 2b/4b weight compression
** 1b binary networks
** Range Prefetch
* Guarded Control Stack (GCS) (ARMv9 only)
* Confidential Computing
** Memory Encryption Contexts
** Device Assignment
Armv8-R (real-time architecture)
Optional AArch64 support was added to the Armv8-R profile, with the first Arm core implementing it being the Cortex-R82. It adds the A64 instruction set, with some changes to the memory barrier instructions.