AArch64, also known as ARM64, is a 64-bit version of the ARM architecture family, a widely used set of computer processor designs. It was introduced in 2011 with the ARMv8 architecture and later became part of the ARMv9 series. AArch64 allows processors to address more memory and to operate directly on 64-bit data, which can make calculations faster than on earlier 32-bit versions. It is designed to work alongside the older 32-bit mode, known as AArch32, allowing compatibility with a wide range of software. Devices that use AArch64 include smartphones, tablets, personal computers, and servers. The AArch64 architecture has continued to evolve through updates that improve performance, security, and support for advanced computing tasks.

AArch64 Execution state

In ARMv8-A, ARMv8-R, and ARMv9-A, an "Execution state" defines key characteristics of the processor’s environment. This includes the number of bits used in the primary processor registers, the supported instruction sets, and other aspects of the processor's execution environment. These versions of the ARM architecture support two Execution states: the 64-bit AArch64 state and the 32-bit AArch32 state.

Naming conventions

* 64-bit:
** Execution state: AArch64
** Instruction sets: A64
* 32-bit:
** Execution state: AArch32
** Instruction sets: A32 + T32
** Example: ARMv8-R, Cortex-A32

AArch64 features

* New instruction set, A64:
** Has 31 general-purpose 64-bit registers
** Has a dedicated zero or stack pointer (SP) register (depending on instruction)
** The program counter (PC) is no longer directly accessible as a register
** Instructions are still 32 bits long and mostly the same as A32 (with LDM/STM instructions and most conditional execution dropped)
*** Has paired loads/stores (in place of LDM/STM)
*** No predication for most instructions (except branches)
** Most instructions can take 32-bit or 64-bit arguments
** Addresses assumed to be 64-bit
* Advanced SIMD (Neon) enhanced:
** Has 32 × 128-bit registers (up from 16), also accessible via VFPv4
** Supports double-precision floating-point format
** Fully IEEE 754 compliant
** AES encrypt/decrypt and SHA-1/SHA-2 hashing instructions also use these registers
* A new exception system:
** Fewer banked registers and modes
* Memory translation from 48-bit virtual addresses based on the existing Large Physical Address Extension (LPAE), which was designed to be easily extended to 64-bit
* Extension: Data gathering hint (ARMv8.0-DGH).

AArch64 was introduced in ARMv8-A and is included in subsequent versions of ARMv8-A, and in all versions of ARMv9-A. It was also introduced in ARMv8-R as an option, after its introduction in ARMv8-A; it is not included in ARMv8-M.
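The register rules in the list above can be illustrated with a small C model. It is a sketch, not an emulator: the type and function names (`a64_regs`, `read_x`, `write_x`, `write_w`, `r31_mode`) are hypothetical, and the model captures only two A64 facts: a write to a 32-bit W register zero-extends into the full 64-bit X register, and register number 31 means either the zero register or the stack pointer depending on the instruction.

```c
#include <stdint.h>

/* Hypothetical model of the A64 general-purpose register file.
 * X0..X30 are 64-bit; W0..W30 name their low 32 bits.
 * Register number 31 is the zero register (XZR/WZR) or SP, per instruction. */
typedef struct {
    uint64_t x[31]; /* X0..X30 */
    uint64_t sp;    /* stack pointer */
} a64_regs;

typedef enum { R31_IS_ZR, R31_IS_SP } r31_mode;

uint64_t read_x(const a64_regs *r, unsigned n, r31_mode m) {
    if (n == 31)
        return m == R31_IS_SP ? r->sp : 0; /* XZR always reads as zero */
    return r->x[n];
}

void write_x(a64_regs *r, unsigned n, r31_mode m, uint64_t v) {
    if (n == 31) {
        if (m == R31_IS_SP)
            r->sp = v; /* writes to XZR are simply discarded */
        return;
    }
    r->x[n] = v;
}

/* A 32-bit (W) write zero-extends: the upper 32 bits of Xn become 0. */
void write_w(a64_regs *r, unsigned n, r31_mode m, uint32_t v) {
    write_x(r, n, m, (uint64_t)v);
}
```

For example, writing `0x12345678` to W0 after X0 held all ones leaves X0 equal to `0x12345678`, not `0xFFFFFFFF12345678`.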

A64 instruction formats

The main opcode for selecting which group an A64 instruction belongs to is at bits 25–28.
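As a sketch of how that field partitions the encoding space, the C function below classifies a 32-bit instruction word by bits 28 to 25 (often called op0). The group boundaries follow the top-level A64 encoding table of the Arm Architecture Reference Manual as understood here; the enum and function names are hypothetical, and the later use of the reserved group by SME is not modelled.

```c
#include <stdint.h>

typedef enum {
    A64_RESERVED,    /* op0 = 0000 (later partly used by SME) */
    A64_UNALLOCATED, /* op0 = 0001, 0011 */
    A64_SVE,         /* op0 = 0010 */
    A64_DP_IMM,      /* op0 = 100x: data processing, immediate */
    A64_BRANCH_SYS,  /* op0 = 101x: branches, exception generation, system */
    A64_LOAD_STORE,  /* op0 = x1x0: loads and stores */
    A64_DP_REG,      /* op0 = x101: data processing, register */
    A64_DP_FP_SIMD   /* op0 = x111: scalar floating point and Advanced SIMD */
} a64_group;

a64_group a64_classify(uint32_t insn) {
    uint32_t op0 = (insn >> 25) & 0xFu; /* bits 28..25 */
    if ((op0 & 0xEu) == 0x8u) return A64_DP_IMM;
    if ((op0 & 0xEu) == 0xAu) return A64_BRANCH_SYS;
    if ((op0 & 0x5u) == 0x4u) return A64_LOAD_STORE;
    if ((op0 & 0x7u) == 0x5u) return A64_DP_REG;
    if ((op0 & 0x7u) == 0x7u) return A64_DP_FP_SIMD;
    if (op0 == 0x2u) return A64_SVE;
    if (op0 == 0x0u) return A64_RESERVED;
    return A64_UNALLOCATED;
}
```

For example, the word `0x8B020020` encodes ADD X0, X1, X2 and falls in the data-processing (register) group, while `0xD65F03C0` (RET) falls in the branch and system group.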

ARM-A (application architecture)

Announced in October 2011, ARMv8-A represents a fundamental change to the ARM architecture. It adds an optional 64-bit Execution state, named "AArch64", and the associated new "A64" instruction set, in addition to a 32-bit Execution state, "AArch32", supporting the 32-bit "A32" (original 32-bit ARM) and "T32" (Thumb/Thumb-2) instruction sets. The latter instruction sets provide user-space compatibility with the existing 32-bit ARMv7-A architecture. ARMv8-A allows 32-bit applications to be executed in a 64-bit OS, and a 32-bit OS to be under the control of a 64-bit hypervisor. ARM announced their Cortex-A53 and Cortex-A57 cores on 30 October 2012.

Apple was the first to release an ARMv8-A compatible core (Cyclone) in a consumer product (iPhone 5S). AppliedMicro, using an FPGA, was the first to demo ARMv8-A. The first ARMv8-A SoC from Samsung is the Exynos 5433 used in the Galaxy Note 4, which features a cluster of four Cortex-A57 cores and a cluster of four Cortex-A53 cores in a big.LITTLE configuration, but it runs only in AArch32 mode. ARMv8-A includes the VFPv3/v4 and advanced SIMD (Neon) as standard features in both AArch32 and AArch64. It also adds cryptography instructions supporting AES, SHA-1, SHA-256 and finite field arithmetic. An ARMv8-A processor can support one or both of AArch32 and AArch64; it may support AArch32 and AArch64 at lower Exception levels and only AArch64 at higher Exception levels. For example, the ARM Cortex-A32 supports only AArch32, the ARM Cortex-A34 supports only AArch64, and the ARM Cortex-A72 supports both AArch64 and AArch32. An ARMv9-A processor must support AArch64 at all Exception levels, and may support AArch32 at EL0.
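The finite field support corresponds to the 64-bit polynomial multiply (PMULL), which multiplies two 64-bit values as polynomials over GF(2), that is, a carry-less multiplication used for example in GCM authentication. The portable C sketch below computes the same 128-bit product bit by bit; the struct and function names are hypothetical, and real code would use the instruction or a compiler intrinsic rather than a loop.

```c
#include <stdint.h>

/* 128-bit result split into two 64-bit halves (hypothetical type). */
typedef struct { uint64_t lo, hi; } u128_pair;

/* Carry-less 64 x 64 -> 128-bit multiply: partial products are combined
 * with XOR (addition in GF(2)), so no carries propagate between bits. */
u128_pair clmul64(uint64_t a, uint64_t b) {
    u128_pair r = {0, 0};
    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1u) {
            r.lo ^= a << i;
            if (i != 0)
                r.hi ^= a >> (64 - i); /* bits shifted out of the low word */
        }
    }
    return r;
}
```

For instance, `clmul64(3, 3)` yields 5, because (x + 1)^2 = x^2 + 1 when coefficients are added without carries.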

ARMv8.1-A

In December 2014, ARMv8.1-A, an update with "incremental benefits over v8.0", was announced. The enhancements fell into two categories: changes to the instruction set, and changes to the exception model and memory translation.

Instruction set enhancements included the following:
* A set of AArch64 atomic read-write instructions.
* Additions to the Advanced SIMD instruction set for both AArch32 and AArch64 to enable opportunities for some library optimizations:
** Signed Saturating Rounding Doubling Multiply Accumulate, Returning High Half.
** Signed Saturating Rounding Doubling Multiply Subtract, Returning High Half.
** The instructions are added in vector and scalar forms.
* A set of AArch64 load and store instructions that can provide memory access order that is limited to configurable address regions.
* The optional CRC instructions in v8.0 become a requirement in ARMv8.1.

Enhancements for the exception model and memory translation system included the following:
* A new Privileged Access Never (PAN) state bit provides control that prevents privileged access to user data unless explicitly enabled.
* An increased VMID range for virtualization, supporting a larger number of virtual machines.
* Optional support for hardware update of the page table access flag, and the standardization of an optional, hardware-updated dirty bit mechanism.
* The Virtualization Host Extensions (VHE). These enhancements improve the performance of Type 2 hypervisors by reducing the software overhead associated with transitioning between the Host and Guest operating systems. The extensions allow the Host OS to execute at EL2, as opposed to EL1, without substantial modification.
* A mechanism to free up some translation table bits for operating system use, where the hardware support is not needed by the OS.
* Top byte ignore for memory tagging.
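The two new Advanced SIMD operations are the SQRDMLAH and SQRDMLSH instructions. The C sketch below models a single 16-bit lane of each, assuming the semantics of the Arm pseudocode: the accumulator is scaled by 2^16, twice the product is added (or subtracted), a rounding constant of 2^15 is added, and the high half is taken with signed saturation. The function names are hypothetical.

```c
#include <stdint.h>

/* Floor division by 2^n; avoids implementation-defined right shifts
 * of negative signed values in C. */
static int64_t asr64(int64_t x, unsigned n) {
    return x >= 0 ? x >> n : -((-x - 1) >> n) - 1;
}

static int16_t sat16(int64_t v) {
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* One 16-bit lane of SQRDMLAH: acc + 2*a*b, rounded, high half, saturated. */
int16_t sqrdmlah16(int16_t acc, int16_t a, int16_t b) {
    int64_t accum = (int64_t)acc * 65536 + 2 * (int64_t)a * b;
    return sat16(asr64(accum + (1 << 15), 16));
}

/* One 16-bit lane of SQRDMLSH: the doubled product is subtracted instead. */
int16_t sqrdmlsh16(int16_t acc, int16_t a, int16_t b) {
    int64_t accum = (int64_t)acc * 65536 - 2 * (int64_t)a * b;
    return sat16(asr64(accum + (1 << 15), 16));
}
```

Interpreting the lanes as Q15 fixed-point numbers, `sqrdmlah16(0, 0x4000, 0x4000)` computes 0.5 × 0.5 and returns `0x2000` (0.25), and an accumulation that would exceed the range saturates at 32767 instead of wrapping.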

ARMv8.2-A

ARMv8.2-A was announced in January 2016. Its enhancements fall into four categories:
* Optional half-precision floating-point data processing (half-precision was previously supported only as a storage format, not for arithmetic).
* Memory model enhancements.
* Introduction of the Reliability, Availability and Serviceability Extension (RAS Extension).
* Introduction of statistical profiling.
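Half-precision arithmetic operates on the IEEE 754 binary16 format: 1 sign bit, 5 exponent bits with a bias of 15, and 10 fraction bits. A hypothetical C decoder from a binary16 bit pattern to `float` shows how the fields are interpreted; it avoids `libm` by scaling with exact integer shifts and divisions.

```c
#include <math.h>
#include <stdint.h>

/* Decode an IEEE 754 binary16 bit pattern (hypothetical helper). */
float half_to_float(uint16_t h) {
    unsigned expo = (h >> 10) & 0x1Fu;
    unsigned frac = h & 0x3FFu;
    float v;
    if (expo == 0x1F) {
        v = frac ? NAN : INFINITY;          /* all-ones exponent: NaN or infinity */
    } else if (expo == 0) {
        v = (float)frac / 16777216.0f;      /* zero or subnormal: frac * 2^-24 */
    } else {
        uint32_t m = frac | 0x400u;         /* restore the implicit leading 1 */
        int shift = (int)expo - 25;         /* value = m * 2^(expo - 25) */
        v = shift >= 0 ? (float)(m << shift)
                       : (float)m / (float)(1u << -shift);
    }
    return (h & 0x8000u) ? -v : v;
}
```

With this decoder, `0x3C00` is 1.0 and `0x7BFF` is 65504, the largest finite half-precision value.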

Scalable Vector Extension (SVE)

The Scalable Vector Extension (SVE) is "an optional extension to the ARMv8.2-A architecture and newer" developed specifically for vectorization of high-performance computing scientific workloads. The specification allows for variable vector lengths to be implemented from 128 to 2048 bits. The extension is complementary to, and does not replace, the Neon extensions. A 512-bit SVE variant has been implemented on the Fugaku supercomputer using the Fujitsu A64FX ARM processor; this computer was the fastest supercomputer in the world for two years, from June 2020 to May 2022. A more flexible version, 2x256 SVE, was implemented by the AWS Graviton3 ARM processor. SVE is supported by GCC, with GCC 8 supporting automatic vectorization and GCC 10 supporting C intrinsics. LLVM and Clang support C and IR intrinsics. ARM's own fork of LLVM supports auto-vectorization.
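SVE code is typically written to be vector-length agnostic: the same binary runs on any implemented vector length, and a predicate (generated by an instruction such as WHILELT) disables the lanes past the end of the data. The plain C sketch below simulates that style instead of using the real ACLE intrinsics; `lanes` is a hypothetical stand-in for the hardware vector length (2 to 32 lanes of 64-bit values for 128- to 2048-bit vectors), and `daxpy_vla` is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>

/* y = a*x + y in a vector-length-agnostic style (simulated). */
void daxpy_vla(size_t n, double a, const double *x, double *y, size_t lanes) {
    for (size_t i = 0; i < n; i += lanes) {      /* advance by one "vector" */
        for (size_t k = 0; k < lanes; k++) {     /* lanes of one vector op */
            bool active = i + k < n;             /* WHILELT-style predicate */
            if (active)
                y[i + k] = a * x[i + k] + y[i + k];
        }
    }
}
```

Calling `daxpy_vla` with `lanes` set to 2 or to 32 gives the same result without a separate scalar tail loop, which is the property that lets one SVE binary run on both narrow and wide implementations.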

ARMv8.3-A

In October 2016, ARMv8.3-A was announced. Its enhancements fell into six categories:
* Pointer authentication (PAC) (AArch64 only): a mandatory extension to the architecture, based on a new block cipher, QARMA. Compilers need to emit the instructions to exploit the security feature; because the instructions are in NOP space, they are backward compatible, although they provide no extra security on older chips.
* Nested virtualization (AArch64 only).
* Advanced SIMD complex number support (AArch64 and AArch32), e.g. rotations by multiples of 90 degrees.
* New FJCVTZS (Floating-point JavaScript Convert to Signed fixed-point, rounding toward Zero) instruction.
* A change to the memory consistency model (AArch64 only) to support the (non-default) weaker RCpc (Release Consistent processor consistent) model of C++11/C11 (the default C++11/C11 consistency model was already supported in previous ARMv8).
* ID mechanism support for larger system-visible caches (AArch64 and AArch32).

The ARMv8.3-A architecture is supported by GCC from version 7.
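FJCVTZS performs the double-to-32-bit-integer conversion that JavaScript's ToInt32 operation requires: NaN and infinities become 0, the value is truncated toward zero, and the result is reduced modulo 2^32 rather than saturated. The C sketch below reproduces that result in software by working on the IEEE 754 bit pattern; the function name is hypothetical, and it does not model the condition flag that the instruction also sets.

```c
#include <stdint.h>
#include <string.h>

/* ToInt32-style conversion (hypothetical software model of FJCVTZS's result). */
int32_t js_to_int32(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);              /* reinterpret the encoding */
    int expo = (int)((bits >> 52) & 0x7FF);
    if (expo == 0x7FF)
        return 0;                                /* NaN or infinity */
    uint64_t mant = bits & 0xFFFFFFFFFFFFFull;
    if (expo != 0)
        mant |= 1ull << 52;                      /* implicit leading 1 */
    int shift = expo - 1075;                     /* |d| = mant * 2^shift */
    uint32_t low;                                /* trunc(|d|) mod 2^32 */
    if (shift >= 32)
        low = 0;                                 /* a multiple of 2^32 */
    else if (shift >= 0)
        low = (uint32_t)(mant << shift);         /* high bits wrap away */
    else if (shift > -64)
        low = (uint32_t)(mant >> -shift);        /* drop the fraction: truncation */
    else
        low = 0;                                 /* |d| < 1 */
    if (bits >> 63)
        low = 0u - low;                          /* apply the sign modulo 2^32 */
    /* Map [0, 2^32) onto [-2^31, 2^31) without implementation-defined casts. */
    return low <= INT32_MAX ? (int32_t)low : (int32_t)(low - 2147483648u) + INT32_MIN;
}
```

For instance, `js_to_int32(1e20)` gives 1661992960, the same value as the JavaScript expression `1e20 | 0`, and `js_to_int32(2147483648.0)` wraps to -2147483648.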

ARMv8.4-A

In November 2017, ARMv8.4-A was announced. Its enhancements fell into these categories:
* "SHA3 / SHA512 / SM3 / SM4 crypto extensions", provided as optional instructions.
* Improved virtualization support.
* Memory Partitioning and Monitoring (MPAM) capabilities.
* A new Secure EL2 state and Activity Monitors.
* Signed and unsigned integer dot product (SDOT and UDOT) instructions.
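The dot-product instructions treat each 32-bit lane as a group of four 8-bit elements. A C model of the signed 128-bit vector form of SDOT (sixteen `int8_t` inputs, four `int32_t` accumulators) is sketched below; the function name is hypothetical, the model assumes the accumulators do not overflow, and UDOT is the same with unsigned types.

```c
#include <stdint.h>

/* Each 32-bit accumulator lane adds the dot product of the
 * corresponding four signed 8-bit elements of a and b. */
void sdot_4x32(int32_t acc[4], const int8_t a[16], const int8_t b[16]) {
    for (int lane = 0; lane < 4; lane++) {
        int32_t sum = 0;
        for (int j = 0; j < 4; j++)
            sum += (int32_t)a[4 * lane + j] * b[4 * lane + j];
        acc[lane] += sum; /* overflow is not modelled here */
    }
}
```

One such instruction therefore performs sixteen 8-bit multiplies and sixteen additions, which is why these instructions are attractive for quantized neural-network inference.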

ARMv8.5-A and ARMv9.0-A

In September 2018, ARMv8.5-A was announced. Its enhancements fell into these categories:
* Memory Tagging Extension (MTE) (AArch64).
* Branch Target Indicators (BTI) (AArch64) to reduce "the ability of an attacker to execute arbitrary code". Like pointer authentication, the relevant instructions are no-ops on earlier versions of ARMv8-A.
* Random Number Generator instructions: "providing Deterministic and True Random Numbers conforming to various National and International Standards".

On 2 August 2019, Google announced that Android would adopt the Memory Tagging Extension (MTE).

In March 2021, ARMv9-A was announced. ARMv9-A's baseline is all the features from ARMv8.5. ARMv9-A also adds:
* Scalable Vector Extension 2 (SVE2). SVE2 builds on SVE's scalable vectorization for increased fine-grain Data Level Parallelism (DLP), to allow more work done per instruction. SVE2 aims to bring these benefits to a wider range of software, including DSP and multimedia SIMD code that currently uses Neon. The Clang 9.0 and GCC 10.0 development code was updated to support SVE2.
* Transactional Memory Extension (TME). Following the x86 extensions, TME brings support for Hardware Transactional Memory (HTM) and Transactional Lock Elision (TLE). TME aims to bring scalable concurrency to increase coarse-grained Thread Level Parallelism (TLP), to allow more work done per thread. The Clang 9.0 and GCC 10.0 development code was updated to support TME.
* Confidential Compute Architecture (CCA).
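Memory tagging attaches a 4-bit tag to every 16-byte granule of memory and a matching 4-bit tag to pointers, carried in address bits 59 to 56 (bits that are not used for translation when top-byte ignore is enabled); a load or store whose pointer tag differs from the memory tag faults. The toy C model below simulates that check on a small array. The heap, the tag storage and all function names are hypothetical; real software reaches the IRG, STG and related instructions through an allocator or the C library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GRANULE 16u    /* MTE tags memory in 16-byte granules */
#define HEAP_SIZE 256u /* hypothetical toy heap */

static uint8_t heap[HEAP_SIZE];
static uint8_t tag_mem[HEAP_SIZE / GRANULE]; /* one 4-bit tag per granule */

/* Put a logical tag into pointer bits 59..56. */
uint64_t make_tagged(uint64_t offset, unsigned tag) {
    return offset | ((uint64_t)(tag & 0xFu) << 56);
}
static unsigned ptr_tag(uint64_t p) { return (unsigned)((p >> 56) & 0xFu); }
static uint64_t ptr_addr(uint64_t p) { return p & 0x00FFFFFFFFFFFFFFull; }

/* Colour every granule overlapping [offset, offset + len) with a tag. */
void tag_region(uint64_t offset, size_t len, unsigned tag) {
    for (uint64_t g = offset / GRANULE; g < (offset + len + GRANULE - 1) / GRANULE; g++)
        tag_mem[g] = (uint8_t)(tag & 0xFu);
}

/* Checked load: returns false where hardware would report a tag-check fault. */
bool checked_load(uint64_t p, uint8_t *out) {
    uint64_t a = ptr_addr(p);
    if (a >= HEAP_SIZE || tag_mem[a / GRANULE] != ptr_tag(p))
        return false;
    *out = heap[a];
    return true;
}
```

A use-after-free or buffer overflow is caught when the stale or overrunning pointer still carries the old tag while the memory it reaches has a different one.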

ARMv8.6-A and ARMv9.1-A

In September 2019, ARMv8.6-A was announced. Its enhancements fell into these categories:
* General Matrix Multiply (GEMM).
* Bfloat16 format support.
* SIMD matrix manipulation instructions, BFDOT, BFMMLA, BFMLAL and BFCVT.
* Enhancements for virtualization, system management and security.
* And the following extensions (that

11 already added support for):
** Enhanced Counter Virtualization (ARMv8.6-ECV).
** Fine-Grained Traps (ARMv8.6-FGT).
** Activity Monitors virtualization (ARMv8.6-AMU).

For example, fine-grained traps, Wait-for-Event (WFE) instructions, EnhancedPAC2 and FPAC. The bfloat16 extensions for SVE and Neon are mainly for deep learning use.
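A bfloat16 value is the upper 16 bits of an IEEE 754 single-precision `float`: it keeps the 8-bit exponent but only 7 fraction bits, preserving range at the cost of precision. The C sketch below converts between the two formats, assuming round-to-nearest, ties-to-even for the narrowing direction; the function names are hypothetical, and NaNs are simply kept quiet rather than handled exactly as BFCVT does.

```c
#include <stdint.h>
#include <string.h>

/* Widening is exact: place the 16 bits in the top half of a float. */
float bf16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrowing with round-to-nearest, ties-to-even. */
uint16_t float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7F800000u) == 0x7F800000u && (bits & 0x007FFFFFu))
        return (uint16_t)((bits >> 16) | 0x0040u); /* keep NaNs quiet */
    uint32_t lsb = (bits >> 16) & 1u;              /* ties go to the even result */
    bits += 0x7FFFu + lsb;
    return (uint16_t)(bits >> 16);
}
```

For example, pi (`0x40490FDB` as a float) narrows to `0x4049`, which widens back to 3.140625.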

ARMv8.7-A and ARMv9.2-A

In September 2020, ARMv8.7-A was announced. Its enhancements fell into these categories:
* Scalable Matrix Extension (SME) (ARMv9.2 only). SME adds new features to process matrices efficiently, such as:
** Matrix tile storage.
** On-the-fly matrix transposition.
** Load/store/insert/extract tile vectors.
** Matrix outer product of SVE vectors.
** "Streaming mode" SVE.
* Enhanced support for PCIe hot plug (AArch64).
* Atomic 64-byte loads and stores to accelerators (AArch64).
* Wait For Interrupt (WFI) and Wait For Event (WFE) with timeout (AArch64).
* Branch-Record recording (ARMv9.2 only).
* Call Stack Recorder
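SME's central operation accumulates the outer product of two vectors into a two-dimensional ZA tile (instructions such as FMOPA). The C model below, with a hypothetical fixed tile size `SVL` of four `float` lanes and no predication, shows how a full matrix product is built from a sequence of such rank-1 updates, one per column of A and row of B. The function names are hypothetical.

```c
#include <stddef.h>

#define SVL 4 /* hypothetical streaming vector length: 4 float lanes */

/* One outer-product-accumulate step: za[i][j] += a[i] * b[j]. */
void fmopa(float za[SVL][SVL], const float a[SVL], const float b[SVL]) {
    for (size_t i = 0; i < SVL; i++)
        for (size_t j = 0; j < SVL; j++)
            za[i][j] += a[i] * b[j];
}

/* C = A * B as a sum of SVL rank-1 updates. */
void matmul_outer(float c[SVL][SVL], float a[SVL][SVL], float b[SVL][SVL]) {
    for (size_t i = 0; i < SVL; i++)
        for (size_t j = 0; j < SVL; j++)
            c[i][j] = 0.0f;              /* clear the tile first */
    for (size_t k = 0; k < SVL; k++) {
        float col[SVL];
        for (size_t i = 0; i < SVL; i++)
            col[i] = a[i][k];            /* gather column k of A */
        fmopa(c, col, b[k]);             /* combine with row k of B */
    }
}
```

Each step reuses every loaded element of `col` and `b[k]` SVL times, which is the data reuse that makes outer products efficient for matrix multiplication.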

ARMv8.8-A and ARMv9.3-A

In September 2021, ARMv8.8-A and ARMv9.3-A were announced. Their enhancements fell into these categories:
* Non-maskable interrupts (AArch64).
* Instructions to optimize memcpy() and memset() style operations (AArch64).
* Enhancements to PAC (AArch64).
* Hinted conditional branches (AArch64).

15 supports ARMv8.8-A and ARMv9.3-A.
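The ARMv8.8-A copy instructions come in groups of three, a prologue, a main and an epilogue instruction (for example CPYFP, CPYFM and CPYFE), each doing part of the copy and updating the destination, source and size registers; how the work is divided among them is left to the implementation. The C sketch below imitates that structure with one arbitrary, hypothetical split (align the destination to 16 bytes, copy whole 16-byte blocks, copy the tail) for non-overlapping buffers; it does not describe the behavior of any particular core, and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the destination, source and size registers. */
typedef struct {
    unsigned char *dst;
    const unsigned char *src;
    size_t n;
} cpy_state;

static void copy_bytes(cpy_state *s, size_t k) {
    for (size_t i = 0; i < k; i++)
        s->dst[i] = s->src[i];
    s->dst += k;
    s->src += k;
    s->n -= k;
}

void cpy_prologue(cpy_state *s) { /* bring dst up to 16-byte alignment */
    size_t mis = (size_t)((uintptr_t)s->dst & 15u);
    size_t k = mis ? 16 - mis : 0;
    copy_bytes(s, k < s->n ? k : s->n);
}

void cpy_main(cpy_state *s) { copy_bytes(s, s->n & ~(size_t)15); } /* whole blocks */

void cpy_epilogue(cpy_state *s) { copy_bytes(s, s->n); } /* remaining tail */

void *model_memcpy(void *dst, const void *src, size_t n) {
    cpy_state s = {dst, src, n};
    cpy_prologue(&s);
    cpy_main(&s);
    cpy_epilogue(&s);
    return dst;
}
```

Because the three phases always run in order and each consumes whatever the previous one left, the result is correct for any split, which is what leaves implementations free to choose their own.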

ARMv8.9-A and ARMv9.4-A

In September 2022, ARMv8.9-A and ARMv9.4-A were announced, including:
* Virtual Memory System Architecture (VMSA) enhancements.
** Permission indirection and overlays.
** Translation hardening.
** 128-bit translation tables (ARMv9 only).
* Scalable Matrix Extension 2 (SME2) (ARMv9 only).
** Multi-vector instructions.
** Multi-vector predicates.
** 2b/4b weight compression.
** 1b binary networks.
** Range Prefetch.
* Guarded Control Stack (GCS) (ARMv9 only).
* Confidential Computing.
** Memory Encryption Contexts.
** Device Assignment.
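A guarded control stack is a hardware shadow stack: procedure calls also record the return address on a separate stack that ordinary stores cannot modify, and returns check the two copies against each other. The C model below shows only that bookkeeping; the type, the depth and the function names are hypothetical, and a `false` result stands in for the fault the hardware would raise.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GCS_DEPTH 64 /* hypothetical capacity */

typedef struct {
    uint64_t slot[GCS_DEPTH];
    size_t top;
} gcs;

/* On a call, record the return address on the guarded stack. */
bool gcs_call(gcs *g, uint64_t return_addr) {
    if (g->top == GCS_DEPTH)
        return false; /* overflow */
    g->slot[g->top++] = return_addr;
    return true;
}

/* On a return, the address about to be used must match the recorded one. */
bool gcs_ret(gcs *g, uint64_t return_addr) {
    if (g->top == 0)
        return false;
    return g->slot[--g->top] == return_addr;
}
```

An attacker who overwrites a return address on the ordinary stack cannot make the two copies agree, so the return is rejected.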

ARMv9.5-A

In October 2023, ARMv9.5-A was announced, including:
* FP8 support (E5M2 and E4M3 formats) added to:
** SME2
** SVE2
** Advanced SIMD (Neon)
* Live migration of virtual machines using hardware dirty-state tracking structures (FEAT_HDBSS)
* Checked Pointer Arithmetic
* Support for using a combination of the PC and SP as the modifier when generating or checking Pointer Authentication codes.
* Support for Realm Management Extension (RME) enabled designs, support for non-secure only in the Granule Protection Tables and the ability to disable certain Physical Address Spaces (PAS).
* EL3 configuration write-traps.
* Breakpoint support for address range and mismatch triggering without the need for linking.
* Support for efficiently delegating SErrors from EL3 to EL2 or EL1.
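The two FP8 formats trade precision for range: E4M3 has 4 exponent bits (bias 7) and 3 mantissa bits, and E5M2 has 5 exponent bits (bias 15) and 2 mantissa bits. The C decoder below assumes the Open Compute Project 8-bit floating-point conventions, under which E5M2 behaves like a small IEEE 754 format with infinities and NaNs, while E4M3 has no infinities and reserves only the pattern S.1111.111 for NaN, extending its largest finite value to 448. The function names are hypothetical.

```c
#include <math.h>
#include <stdint.h>

/* Exact m * 2^shift for the small ranges used here, without libm. */
static float scale_pow2(uint32_t m, int shift) {
    return shift >= 0 ? (float)(m << shift) : (float)m / (float)(1u << -shift);
}

/* E4M3: no infinities; only S.1111.111 is NaN. */
float fp8_e4m3_to_float(uint8_t b) {
    unsigned expo = (b >> 3) & 0xFu, man = b & 0x7u;
    if (expo == 0xF && man == 0x7)
        return NAN;
    float v = expo == 0 ? scale_pow2(man, -9)                  /* man/8 * 2^-6 */
                        : scale_pow2(8u + man, (int)expo - 10); /* (1 + man/8) * 2^(expo-7) */
    return (b & 0x80u) ? -v : v;
}

/* E5M2: IEEE-like, with infinities and NaNs. */
float fp8_e5m2_to_float(uint8_t b) {
    unsigned expo = (b >> 2) & 0x1Fu, man = b & 0x3u;
    float v;
    if (expo == 0x1F)
        v = man ? NAN : INFINITY;
    else if (expo == 0)
        v = scale_pow2(man, -16);                 /* man/4 * 2^-14 */
    else
        v = scale_pow2(4u + man, (int)expo - 17); /* (1 + man/4) * 2^(expo-15) */
    return (b & 0x80u) ? -v : v;
}
```

Under these conventions E4M3 reaches at most 448 while E5M2 reaches 57344, which is why the narrower-mantissa E5M2 is often preferred for gradients and E4M3 for weights and activations.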

ARMv9.6-A

In October 2024, ARMv9.6-A was announced, including:
* Improved SME efficiency with structured sparsity and quarter tile operations
* MPAM Domains to better support shared-memory computer systems on multi-chiplet and multi-chip systems
* Hypervisor memory control for Trace and Statistical Profiling on virtual machines
* Improved Caching and Data Placement
* Granular Data Isolation for Confidential Compute
* Bitwise locking of EL1 system registers
* Improved scaling of Granule Protection Tables (GPT) for large memory systems
* New SVE instructions for expand/compact and finding first/last active element
* Additional unprivileged load and store instructions to enable the OS to interact with application memory
* A new compare and branch instruction
* Injection of Undefined-Instruction exceptions from EL3
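A compact operation moves the active elements of a vector, as selected by a predicate, into the lowest-numbered lanes, and an expand operation does the reverse. The C sketch below models both on plain arrays with a Boolean predicate; the function names are hypothetical, and inactive lanes are zeroed in this model, which may differ in detail from the rules of the real instructions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Pack active elements of src into the low lanes of dst; zero the rest.
 * Returns the number of active elements. */
size_t compact(int32_t *dst, const int32_t *src, const bool *pred, size_t lanes) {
    size_t j = 0;
    for (size_t i = 0; i < lanes; i++)
        if (pred[i])
            dst[j++] = src[i];
    for (size_t i = j; i < lanes; i++)
        dst[i] = 0;
    return j;
}

/* Inverse: hand consecutive src elements out to the active lanes of dst. */
void expand(int32_t *dst, const int32_t *src, const bool *pred, size_t lanes) {
    size_t j = 0;
    for (size_t i = 0; i < lanes; i++)
        dst[i] = pred[i] ? src[j++] : 0;
}
```

Compacting is the vector form of a filter loop, and expanding scatters filtered results back to their original positions.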

ARM-R (real-time architecture)

The ''ARM-R'' architecture, specifically the Armv8-R profile, is designed to address the needs of real-time applications, where predictable and deterministic behavior is essential. This profile focuses on delivering high performance, reliability, and efficiency in embedded systems where real-time constraints are critical. With the introduction of optional AArch64 support in the Armv8-R profile, the real-time capabilities have been further enhanced. The Cortex-R82 is the first processor to implement this extended support, bringing several new features and improvements to the real-time domain.

Key Features of Armv8-R with AArch64 Support

# AArch64 Instruction Set (A64):
#* The A64 instruction set in the Cortex-R82 provides 64-bit data handling and operations, which improves performance for certain computational tasks and enhances overall system efficiency.
#* Example Instruction: ADD X0, X1, X2 adds the values in 64-bit registers X1 and X2 and stores the result in X0. This 64-bit operation allows for larger and more complex calculations compared to the 32-bit operations of the previous A32 instruction set.
# Enhanced Memory Management:
#* Memory Barrier Instructions: The Cortex-R82 provides the A64 memory barrier instructions to enforce the ordering of memory operations, which is critical in real-time systems where the timing of memory operations must be strictly controlled.
#** Data Synchronization Barrier (DSB): Ensures that all memory accesses before the barrier have completed before any instruction after the barrier executes.
#** Data Memory Barrier (DMB): Ensures that all memory accesses before the barrier are observed before any memory accesses after the barrier.
#* Example: In a real-time automotive control system, DSB might be used to ensure that sensor data is fully written to memory before the system proceeds with processing or decision-making, preventing data corruption or inconsistencies.
# Improved Address Space:
#* 64-bit Addressing: AArch64 allows the Cortex-R82 to address a much larger memory space compared to its 32-bit predecessors, making it suitable for applications requiring extensive memory.
#* Example: A complex industrial automation system can utilize the expanded address space to manage large data sets and buffers more efficiently, improving system performance and capability.
# Real-Time Performance Enhancements:
#* Interrupt Handling: With AArch64 support, the Cortex-R82 can handle interrupts with lower latency and improved predictability, crucial for real-time operations.
#* Example: In a robotics application, the Cortex-R82's enhanced interrupt handling can ensure timely responses to external stimuli, such as changes in sensor data or control commands.
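In portable code, ordering between ordinary memory accesses is usually expressed through the C11 memory model rather than written as barriers by hand. In the sketch below, the release store and acquire load guarantee that a consumer which sees the flag also sees the data written before it; on AArch64, GCC and Clang typically compile them to the STLR and LDAR instructions, while an explicit `atomic_thread_fence` typically becomes a DMB. The variable and function names are hypothetical, and DSB, which is needed for effects on devices and system state rather than ordinary memory, has no direct C11 equivalent.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int sensor_value;   /* hypothetical shared payload */
static atomic_bool ready;  /* zero-initialized: false */

/* Producer: write the data, then publish it with a release store. */
void publish(int v) {
    sensor_value = v;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: an acquire load of the flag orders the later data read. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = sensor_value; /* sees the value written before the release store */
    return true;
}
```

In a real system `publish` and `try_consume` would run on different cores, for example a sensor interrupt handler and a control loop.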


External links

Arm Developer