ARM big.LITTLE is a heterogeneous computing architecture developed by ARM Holdings, coupling relatively battery-saving and slower processor cores (''LITTLE'') with relatively more powerful and power-hungry ones (''big''). Typically, only one "side" or the other will be active at once, but all cores have access to the same memory regions, so workloads can be swapped between big and LITTLE cores on the fly. The intention is to create a multi-core processor that can adjust better to dynamic computing needs and use less power than clock scaling alone. ARM's marketing material promises up to 75% savings in power usage for some activities.
Most commonly, ARM big.LITTLE architectures are used to create a multi-processor system-on-chip (MPSoC).
In October 2011, big.LITTLE was announced along with the Cortex-A7, which was designed to be architecturally compatible with the Cortex-A15.
In October 2012, ARM announced the Cortex-A53 and Cortex-A57 (ARMv8-A) cores, which are also intercompatible to allow their use in a big.LITTLE chip.
ARM later announced the Cortex-A12 at Computex 2013, followed by the Cortex-A17 in February 2014. Both the Cortex-A12 and the Cortex-A17 can also be paired in a big.LITTLE configuration with the Cortex-A7.
The problem that big.LITTLE solves
For a given library of CMOS logic, active power increases as the logic switches more per second, while leakage increases with the number of transistors. So, CPUs designed to run fast are different from CPUs designed to save power. When a very fast out-of-order CPU is idling at very low speeds, a CPU with much less leakage (fewer transistors) could do the same work. For example, it might use a smaller (fewer transistors) memory cache, or a simpler microarchitecture such as an in-order pipeline. big.LITTLE is a way to optimize for both cases, power and speed, in the same system.
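The trade-off can be illustrated with the usual first-order CMOS power model, in which dynamic power scales with switched capacitance, the square of the supply voltage, and clock frequency, while leakage is a fixed cost that grows with transistor count. The Python sketch below is illustrative only: the `Core` class, the `cheapest_core` helper, and every number in it are hypothetical and are not measurements of any ARM core. It also assumes a fixed supply voltage, whereas real DVFS lowers voltage along with frequency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    switched_cap_nf: float   # effective switched capacitance in nF (hypothetical)
    leakage_mw: float        # static leakage in mW; grows with transistor count
    work_per_cycle: float    # relative work completed per clock cycle
    max_mhz: float           # highest supported clock frequency

    def power_mw(self, volts, mhz):
        # Dynamic power C * V^2 * f; nF * V^2 * MHz comes out in mW.
        dynamic_mw = self.switched_cap_nf * volts ** 2 * mhz
        return dynamic_mw + self.leakage_mw


# Hypothetical cores: the big one does more per cycle but leaks ten times more.
BIG = Core(name="big", switched_cap_nf=1.0, leakage_mw=150.0,
           work_per_cycle=2.0, max_mhz=2000.0)
LITTLE = Core(name="LITTLE", switched_cap_nf=0.25, leakage_mw=15.0,
              work_per_cycle=1.0, max_mhz=1200.0)


def cheapest_core(cores, demand, volts=0.9):
    """Return the name of the lowest-power core that can sustain `demand`.

    `demand` is in millions of work units per second, so the clock a core
    needs is demand / work_per_cycle (in MHz).
    """
    feasible = []
    for core in cores:
        mhz = demand / core.work_per_cycle
        if mhz <= core.max_mhz:
            feasible.append((core.power_mw(volts, mhz), core.name))
    if not feasible:
        raise ValueError("no core can meet the demand")
    return min(feasible)[1]
```

With these made-up figures, a light load is cheapest on the LITTLE core because the big core's leakage dominates, while a heavy load can only be met by the big core.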
In practice, a big.LITTLE system can be surprisingly inflexible. One issue is the number and types of power and clock domains that the IC provides. These may not match the standard power management features offered by an operating system. Another is that the CPUs no longer have equivalent abilities, and matching the right software task to the right CPU becomes more difficult. Most of these problems are being solved by making the electronics and software more flexible.
Run-state migration
There are three ways for the different processor cores to be arranged in a big.LITTLE design, depending on the scheduler implemented in the kernel.
Clustered switching
The clustered model approach is the first and simplest implementation, arranging the processor into identically sized clusters of "big" or "LITTLE" cores. The operating system scheduler can only see one cluster at a time; when the load on the whole processor changes between low and high, the system transitions to the other cluster. All relevant data are then passed through the common L2 cache, the active core cluster is powered off and the other one is activated. A Cache Coherent Interconnect (CCI) is used. This model has been implemented in the Samsung Exynos 5 Octa (5410).
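The switching decision described above can be sketched as a small state machine. This is a minimal illustration, not any vendor's implementation: the `ClusterSwitcher` class is hypothetical, and the two thresholds (with a gap between them to avoid flapping between clusters) are an assumption, since the text only says that the switch happens when the whole-processor load moves between low and high.

```python
class ClusterSwitcher:
    """Tracks which cluster is visible to the OS scheduler (hypothetical)."""

    def __init__(self, up_threshold=0.8, down_threshold=0.3):
        # Assumed thresholds on whole-processor utilisation in [0, 1].
        self.up_threshold = up_threshold
        self.down_threshold = down_threshold
        self.active = "LITTLE"

    def update(self, load):
        """Feed the latest whole-processor load; return the active cluster."""
        if self.active == "LITTLE" and load > self.up_threshold:
            # State would be handed over via the shared L2/CCI here.
            self.active = "big"
        elif self.active == "big" and load < self.down_threshold:
            self.active = "LITTLE"
        return self.active


switcher = ClusterSwitcher()
history = [switcher.update(load) for load in (0.2, 0.9, 0.5, 0.1)]
```

In this run the system moves to the big cluster at load 0.9, stays there at 0.5 because that is still above the lower threshold, and returns to the LITTLE cluster at 0.1.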
In-kernel switcher (CPU migration)
CPU migration via the in-kernel switcher (IKS) involves pairing up a 'big' core with a 'LITTLE' core, with possibly many identical pairs in one chip. Each pair operates as one so-termed ''virtual core'', and only one real core is (fully) powered up and running at a time. The 'big' core is used when the demand is high and the 'LITTLE' core is employed when demand is low. When demand on the virtual core changes (between high and low), the incoming core is powered up, the running state is transferred, the outgoing core is shut down, and processing continues on the new core. Switching is done via the cpufreq framework. A complete big.LITTLE IKS implementation was added in Linux 3.11. big.LITTLE IKS is an improvement on cluster migration (clustered switching), the main difference being that each pair is visible to the scheduler.
A more complex arrangement involves a non-symmetric grouping of 'big' and 'LITTLE' cores. A single chip could have one or two 'big' cores and many more 'LITTLE' cores, or vice versa. Nvidia created something similar to this with the low-power 'companion core' in their Tegra 3 system-on-chip.
Heterogeneous multi-processing (global task scheduling)
The most powerful use model of big.LITTLE architecture is ''Heterogeneous Multi-Processing'' (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the "big" cores while threads with less priority or less computational intensity, such as background tasks, can be performed by the "LITTLE" cores.
This model has been implemented in the Samsung Exynos starting with the Exynos 5 Octa series (5420, 5422, 5430), and Apple A series processors starting with the Apple A11.
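The allocation rule for HMP described above (high-priority or compute-heavy threads on "big" cores, the rest on "LITTLE" cores) can be sketched as follows. The `Task` type, the `load` metric, and the 0.6 threshold are hypothetical choices made for illustration; a real scheduler tracks load per thread over time and also balances within each cluster.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    load: float                  # recent CPU demand in [0, 1] (assumed metric)
    high_priority: bool = False


def place(tasks, load_threshold=0.6):
    """Assign each task to the 'big' or 'LITTLE' cluster.

    Both clusters are usable at the same time, so this is a per-task
    decision rather than a whole-system switch.
    """
    placement = {"big": [], "LITTLE": []}
    for task in tasks:
        heavy = task.load >= load_threshold
        cluster = "big" if (task.high_priority or heavy) else "LITTLE"
        placement[cluster].append(task.name)
    return placement


result = place([
    Task("render", load=0.9),
    Task("sync", load=0.1),
    Task("ui", load=0.2, high_priority=True),
])
```

Here `render` lands on the big cluster because of its load, `ui` because of its priority, and the background `sync` task stays on a LITTLE core.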
Scheduling
The paired arrangement allows for switching to be done transparently to the operating system using the existing dynamic voltage and frequency scaling (DVFS) facility. The existing DVFS support in the kernel (e.g. cpufreq in Linux) will simply see a list of frequencies/voltages and will switch between them as it sees fit, just like it does on the existing hardware. However, the low-end slots will activate the 'LITTLE' core and the high-end slots will activate the 'big' core. This is the early solution provided by Linux's "deadline" CPU scheduler (not to be confused with the I/O scheduler with the same name) since 2012.
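The idea of hiding the core switch behind a single frequency list can be sketched in a few lines. This is not the cpufreq API: the function names and the operating points are hypothetical, and the sketch assumes every 'big' operating point is faster than every 'LITTLE' one so that the merged list stays monotonic.

```python
# Hypothetical operating points (MHz) for one big/LITTLE pair.
LITTLE_MHZ = [400, 600, 800, 1000]
BIG_MHZ = [1200, 1400, 1600, 1800]


def build_virtual_table(little, big):
    """Merge both cores' operating points into the one list DVFS sees."""
    table = [(mhz, "LITTLE") for mhz in sorted(little)]
    table += [(mhz, "big") for mhz in sorted(big)]
    return table


def select_slot(table, requested_mhz):
    """Return the lowest slot meeting the request, or the top slot if none does.

    The governor only asks for a frequency; which physical core runs is a
    side effect of where that frequency falls in the table.
    """
    for mhz, core in table:
        if mhz >= requested_mhz:
            return (mhz, core)
    return table[-1]


table = build_virtual_table(LITTLE_MHZ, BIG_MHZ)
```

A request for 700 MHz resolves to a LITTLE slot, while a request for 1100 MHz crosses into the big core's range and would trigger a core switch.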
Alternatively, all the cores may be exposed to the kernel scheduler, which will decide where each process/thread is executed. This will be required for the non-paired arrangement but could possibly also be used on the paired cores. It poses unique problems for the kernel scheduler, which, at least with modern commodity hardware, has been able to assume all cores in an SMP system are equal rather than heterogeneous. A 2019 addition to Linux 5.0 called ''Energy Aware Scheduling'' is an example of a scheduler that considers cores differently.
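To show what "considering cores differently" can mean, the sketch below places a task on whichever CPU has room for it at the lowest estimated energy cost. It is a deliberately simplified illustration and not the Linux Energy Aware Scheduling algorithm: the dictionary fields, the linear `mw_per_util` cost, and all numbers are hypothetical.

```python
def pick_cpu(cpus, task_util):
    """Return the id of the cheapest CPU that can fit the task, or None.

    Each CPU is a dict with:
      id          - CPU number
      capacity    - maximum utilisation the CPU can serve (bigger for 'big')
      util        - utilisation already committed
      mw_per_util - assumed energy cost per unit of added utilisation
    """
    best = None
    for cpu in cpus:
        if cpu["util"] + task_util > cpu["capacity"]:
            continue  # the task would not fit on this CPU
        cost = task_util * cpu["mw_per_util"]
        if best is None or cost < best[0]:
            best = (cost, cpu["id"])
    return None if best is None else best[1]


# Hypothetical system: CPU 0 is a LITTLE core, CPU 1 is a big core.
CPUS = [
    {"id": 0, "capacity": 400, "util": 100, "mw_per_util": 0.5},
    {"id": 1, "capacity": 1024, "util": 200, "mw_per_util": 2.0},
]
```

A small task goes to the LITTLE core because it is cheaper there; a task too large for the LITTLE core's remaining capacity falls through to the big core.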
Advantages of global task scheduling
* Finer-grained control of workloads that are migrated between cores. Because the scheduler is directly migrating tasks between cores, kernel overhead is reduced and power savings can be correspondingly increased.
* Implementation in the scheduler also makes switching decisions faster than in the cpufreq framework implemented in IKS.
* The ability to easily support non-symmetrical clusters (e.g. with 2 Cortex-A15 cores and 4 Cortex-A7 cores).
* The ability to use all cores simultaneously to provide improved peak performance throughput of the SoC compared to IKS.
Successor
In May 2017, ARM announced DynamIQ as the successor to big.LITTLE. DynamIQ is expected to allow for more flexibility and scalability when designing multi-core processors. In contrast to big.LITTLE, it increases the maximum number of cores in a cluster to 8, allows for varying core designs within a single cluster, and supports up to 32 total clusters. The technology also offers more fine-grained per-core voltage control and faster L2 cache speeds. However, DynamIQ is incompatible with previous ARM designs and is initially only supported by the Cortex-A75 and Cortex-A55 CPU cores.
Further reading
* Andrew Cunningham (30 October 2012). "ARM goes 64-bit with new Cortex-A53 and Cortex-A57 designs". ''Ars Technica''. https://arstechnica.com/information-technology/2012/10/arm-goes-64-bit-with-new-cortex-a53-and-cortex-a57-designs/ (accessed 2012-10-31).
External links
* big.LITTLE Processing
* big.LITTLE Processing with ARM Cortex™-A15 & Cortex-A7 (PDF) (full technical explanation)