ARM Cortex-A77

The ARM Cortex-A77 is a central processing unit implementing the ARMv8.2-A 64-bit instruction set designed by ARM Holdings' Austin design centre. ARM announced an increase of 23% and 35% in integer and floating point performance, respectively. Memory bandwidth increased 15% relative to the A76.


Design

The Cortex-A77 serves as the successor of the Cortex-A76. It is a 4-wide decode out-of-order superscalar design with a new 1.5K-entry (1536-entry) L0 macro-op (MOP) cache. It can fetch 4 instructions and 6 MOPs per cycle, and rename and dispatch 6 MOPs and 13 µops per cycle. The out-of-order window (re-order buffer) has been increased to 160 entries, up from 128. The backend has 12 execution ports, a 50% increase over the Cortex-A76. It has a pipeline depth of 13 stages and execution latencies of 10 stages.
There are six pipelines in the integer cluster, two more than in the Cortex-A76. One of the changes from the Cortex-A76 is the unification of the issue queues: previously each pipeline had its own issue queue, whereas the Cortex-A77 has a single unified issue queue, which improves efficiency. The Cortex-A77 added a fourth general math ALU, which typically handles simple math operations in 1 cycle and some more complex operations in 2 cycles. In total, there are three simple ALUs that perform arithmetic and logical data processing operations and a fourth port which supports complex arithmetic (e.g. MAC, DIV). The Cortex-A77 also added a second branch ALU, doubling the throughput for branches.
There are two ASIMD/FP execution pipelines, unchanged from the Cortex-A76. What did change is the issue queues: as with the integer cluster, the ASIMD cluster now features a unified issue queue for both pipelines, improving efficiency. As with the Cortex-A76, both ASIMD pipelines on the Cortex-A77 are 128 bits wide, each capable of 2 double-precision, 4 single-precision, 8 half-precision, or 16 8-bit integer operations. These pipelines can also execute the cryptographic instructions if the extension is supported (it is not offered by default and requires an additional license from Arm). The Cortex-A77 added a second AES unit to improve the throughput of cryptography operations.
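The per-pipeline operation counts quoted for the 128-bit ASIMD units follow from dividing the vector width by the element width. A minimal sketch of that arithmetic is given here; the helper name `lanes_per_vector` is hypothetical and is not part of any Arm tool or library.

```python
# Illustrative arithmetic only: how many elements of a given width fit in
# one 128-bit ASIMD vector (the pipeline width stated in the text).
ASIMD_WIDTH_BITS = 128


def lanes_per_vector(element_bits, vector_bits=ASIMD_WIDTH_BITS):
    """Return the number of elements of `element_bits` bits per vector."""
    if element_bits <= 0 or vector_bits % element_bits != 0:
        raise ValueError("element width must evenly divide the vector width")
    return vector_bits // element_bits


if __name__ == "__main__":
    element_types = [
        ("double-precision", 64),
        ("single-precision", 32),
        ("half-precision", 16),
        ("8-bit integer", 8),
    ]
    for name, bits in element_types:
        print(f"{name}: {lanes_per_vector(bits)} per 128-bit vector")
```

The same function can be reused for other vector widths by passing `vector_bits` explicitly.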
The core supports unprivileged 32-bit applications, but privileged applications must utilize the 64-bit ARMv8-A ISA. It also supports Load acquire (LDAPR) instructions (ARMv8.3-A), Dot Product instructions (ARMv8.4-A), and PSTATE Speculative Store Bypass Safe (SSBS) bit instructions (ARMv8.5-A). The Cortex-A77 supports ARM's DynamIQ technology, and is expected to be used as high-performance cores in combination with Cortex-A55 power-efficient cores.
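On a Linux system, one way to see whether a core reports the optional features mentioned above is to look at the `Features` line of `/proc/cpuinfo`. The sketch below parses that text. It assumes the arm64 Linux feature strings `lrcpc`, `asimddp` and `ssbs` correspond to LDAPR, Dot Product and SSBS respectively; the function names are hypothetical, and what a given kernel actually reports depends on the kernel version and the SoC.

```python
# Assumed mapping from the features named in the text to the strings that
# arm64 Linux prints on the "Features" line of /proc/cpuinfo.
FEATURE_FLAGS = {
    "LDAPR (ARMv8.3-A)": "lrcpc",
    "Dot Product (ARMv8.4-A)": "asimddp",
    "SSBS (ARMv8.5-A)": "ssbs",
}


def parse_features(cpuinfo_text):
    """Collect every flag listed on any 'Features' line of cpuinfo text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features":
            flags.update(value.split())
    return flags


def report_features(cpuinfo_text):
    """Map each feature of interest to True/False based on the flags found."""
    flags = parse_features(cpuinfo_text)
    return {name: flag in flags for name, flag in FEATURE_FLAGS.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not on Linux, or the file is unreadable
    for name, present in report_features(text).items():
        print(f"{name}: {'reported' if present else 'not reported'}")
```

Parsing is kept separate from file access so that the logic can be exercised with any captured cpuinfo text.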


Architecture changes in comparison with ARM Cortex-A76

* Front-end
** Branch-prediction
*** Better accuracy
*** Up to 64B runahead window (from 32B)
*** Increased L1 BTB capacity, up to 64-entry (from 16-entry)
*** Increased main BTB capacity, up to 8K-entry (from 6K-entry)
** Improved prefetcher
** New L0 macro-op cache
** Wider instruction fetch, up to 6 instructions/cycle (from 4 instructions/cycle)
* Execution engine
** Wider instruction fetch, up to 6 instructions/cycle (from 4 instructions/cycle)
** Larger re-order buffer, up to 160-entry (from 128-entry)
** Wider dispatch, up to 10-way (from 8-way)
** Wider issue, up to 12-way (from 8-way)
*** Execution units
**** New integer ALU unit and port
**** New branch unit and port
**** New dedicated store data ports
**** New AES unit added
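The before-and-after figures in the list above can be turned into relative increases with simple arithmetic. The following is a sketch only: the dictionary holds values copied from the list, "6K" and "8K" are assumed to mean 6 × 1024 and 8 × 1024 entries, and the names `CHANGES` and `percent_increase` are illustrative.

```python
# (before, after) pairs copied from the comparison list; the K-entry
# figures are assumed to be multiples of 1024.
CHANGES = {
    "runahead window (bytes)": (32, 64),
    "BTB capacity (entries)": (6 * 1024, 8 * 1024),
    "instruction fetch (per cycle)": (4, 6),
    "re-order buffer (entries)": (128, 160),
    "dispatch width": (8, 10),
    "issue width": (8, 12),
}


def percent_increase(before, after):
    """Relative growth from `before` to `after`, in percent."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (after - before) / before * 100


if __name__ == "__main__":
    for name, (before, after) in CHANGES.items():
        growth = percent_increase(before, after)
        print(f"{name}: {before} -> {after} (+{growth:.1f}%)")
```

For the issue width, this gives the same 50% increase that the Design section quotes for the execution ports.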


Licensing

The Cortex-A77 is available as a SIP core to licensees, and its design makes it suitable for integration with other SIP cores (e.g. GPU, display controller, DSP, image processor, etc.) into one die constituting a system on a chip (SoC).


Usage

The Samsung Exynos 980 was introduced in September 2019 as the first SoC to use the Cortex-A77 microarchitecture. It was followed by a lower-end variant, the Exynos 880, in May 2020. The MediaTek Dimensity 1000, 1000L and 1000+ SoCs also use the Cortex-A77 microarchitecture. Derivatives named Kryo 585, Kryo 570 and Kryo 560 are used in the Snapdragon 865, 750G, and 690 respectively.


See also

* ARM Cortex-A76, predecessor
* ARM Cortex-A78, successor
* Comparison of ARMv8-A cores, ARMv8 family

