ARM big.LITTLE is a

heterogeneous computing Heterogeneous computing refers to systems that use more than one kind of processor or cores. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incor ...

architecture developed by

ARM Holdings Arm is a British semiconductor and software design company based in Cambridge, England. Its primary business is in the design of ARM processors (CPUs). It also designs other chips, provides software development tools under the DS-5, RealView ...

, coupling relatively battery-saving and slower processor cores (''LITTLE'') with relatively more powerful and power-hungry ones (''big''). Typically, only one "side" or the other will be active at once, but all cores have access to the same memory regions, so workloads can be swapped between Big and Little cores on the fly. The intention is to create a

multi-core processor A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions (such a ...

that can adjust better to dynamic computing needs and use less power than clock scaling alone. ARM's marketing material promises up to a 75% savings in power usage for some activities. Most commonly, ARM big.LITTLE architectures are used to create a multi-processor system-on-chip (MPSoC). In October 2011, big.LITTLE was announced along with the Cortex-A7, which was designed to be architecturally compatible with the

Cortex-A15 The ARM Cortex-A15 MPCore is a 32-bit processor core licensed by ARM Holdings implementing the ARMv7-A architecture. It is a multicore processor with out-of-order superscalar pipeline running at up to 2.5 GHz. Overview ARM has claimed th ...

. In October 2012 ARM announced the

Cortex-A53 The ARM Cortex-A53 is one of the first two central processing units implementing the ARMv8-A ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of reduced i ...

and Cortex-A57 (

ARMv8-A ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of reduced instruction set computer (RISC) instruction set architectures for computer processors, configured ...

) cores, which are also intercompatible to allow their use in a big.LITTLE chip. ARM later announced the Cortex-A12 at Computex 2013 followed by the Cortex-A17 in February 2014. Both the Cortex-A12 and the Cortex-A17 can also be paired in a big.LITTLE configuration with the Cortex-A7.

The problem that big.LITTLE solves

For a given library of

CMOS Complementary metal–oxide–semiconductor (CMOS, pronounced "sea-moss", ) is a type of metal–oxide–semiconductor field-effect transistor (MOSFET) fabrication process that uses complementary and symmetrical pairs of p-type and n-type MOS ...

logic, active power increases as the logic switches more per second, while leakage increases with the number of transistors. So, CPUs designed to run fast are different from CPUs designed to save power. When a very fast out-of-order CPU is idling at very low speeds, a CPU with much less leakage (fewer transistors) could do the same work. For example, it might use a smaller (fewer transistors)

memory cache In computing, a cache ( ) is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewher ...

, or a simpler microarchitecture such as a

pipeline Pipeline may refer to: Electronics, computers and computing * Pipeline (computing), a chain of data-processing stages or a CPU optimization found on ** Instruction pipelining, a technique for implementing instruction-level parallelism within a s ...

. big.LITTLE is a way to optimize for both cases: Power and speed, in the same system. In practice, a big.LITTLE system can be surprisingly inflexible. One issue is the number and types of power and clock domains that the IC provides. These may not match the standard power management features offered by an operating system. Another is that the CPUs no longer have equivalent abilities, and matching the right software task to the right CPU becomes more difficult. Most of these problems are being solved by making the electronics and software more flexible.

Run-state migration

There are three ways for the different processor cores to be arranged in a big.LITTLE design, depending on the

scheduler A schedule or a timetable, as a basic time-management tool, consists of a list of times at which possible tasks, events, or actions are intended to take place, or of a sequence of events in the chronological order in which such things are ...

implemented in the

kernel Kernel may refer to: Computing * Kernel (operating system), the central component of most operating systems * Kernel (image processing), a matrix used for image convolution * Compute kernel, in GPGPU programming * Kernel method, in machine learni ...

Clustered switching

The clustered model approach is the first and simplest implementation, arranging the processor into identically sized clusters of "big" or "LITTLE" cores. The operating system scheduler can only see one cluster at a time; when the load on the whole processor changes between low and high, the system transitions to the other cluster. All relevant data are then passed through the common

L2 cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which ...

, the active core cluster is powered off and the other one is activated. A Cache Coherent Interconnect (CCI) is used. This model has been implemented in the

Samsung The Samsung Group (or simply Samsung) ( ko, 삼성 ) is a South Korean multinational manufacturing conglomerate headquartered in Samsung Town, Seoul, South Korea. It comprises numerous affiliated businesses, most of them united under the ...

Exynos Exynos, formerly Hummingbird (), is a series of ARM-based system-on-chips developed by Samsung Electronics' System LSI division and manufactured by Samsung Foundry. It is a continuation of Samsung's earlier S3C, S5L and S5P line of SoCs. Ex ...

5 Octa (5410).

In-kernel switcher (CPU migration)

CPU migration via the in-kernel switcher (IKS) involves pairing up a 'big' core with a 'LITTLE' core, with possibly many identical pairs in one chip. Each pair operates as one so-termed ''virtual core'', and only one real core is (fully) powered up and running at a time. The 'big' core is used when the demand is high and the 'LITTLE' core is employed when demand is low. When demand on the virtual core changes (between high and low), the incoming core is powered up, running state is transferred, the outgoing is shut down, and processing continues on the new core. Switching is done via the cpufreq framework. A complete big.LITTLE IKS implementation was added in Linux 3.11. big.LITTLE IKS is an improvement of cluster migration (), the main difference being that each pair is visible to the scheduler. A more complex arrangement involves a non-symmetric grouping of 'big' and 'LITTLE' cores. A single chip could have one or two 'big' cores and many more 'LITTLE' cores, or vice versa. Nvidia created something similar to this with the low-power 'companion core' in their

Tegra 3 Tegra is a system on a chip (SoC) series developed by Nvidia for mobile devices such as smartphones, personal digital assistants, and mobile Internet devices. The Tegra integrates an ARM architecture central processing unit (CPU), graphics proc ...

System-on-Chip A system on a chip or system-on-chip (SoC ; pl. ''SoCs'' ) is an integrated circuit that integrates most or all components of a computer or other electronic system. These components almost always include a central processing unit (CPU), memory ...

Heterogeneous multi-processing (global task scheduling)

The most powerful use model of big.LITTLE architecture is ''

Heterogeneous Homogeneity and heterogeneity are concepts often used in the sciences and statistics relating to the uniformity of a substance or organism. A material or image that is homogeneous is uniform in composition or character (i.e. color, shape, size ...

Multi-Processing Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them. There ar ...

'' (HMP), which enables the use of all physical cores at the same time. Threads with high priority or computational intensity can in this case be allocated to the "big" cores while threads with less priority or less computational intensity, such as background tasks, can be performed by the "LITTLE" cores. This model has been implemented in the

starting with the Exynos 5 Octa series (5420, 5422, 5430), and Apple A series processors starting with the

Apple A11 The Apple A11 Bionic is a 64-bit ARM-based system on a chip (SoC), designed by Apple Inc. and manufactured by TSMC. It first appeared in the iPhone 8 and 8 Plus, and iPhone X which were introduced on September 12, 2017. Apple states that the t ...

Scheduling

The paired arrangement allows for switching to be done transparently to the

operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also ...

using the existing dynamic voltage and

frequency scaling In computer architecture, frequency scaling (also known as frequency ramping) is the technique of increasing a processor's frequency so as to enhance the performance of the system containing the processor in question. Frequency ramping was the dom ...

(DVFS) facility. The existing DVFS support in the kernel (e.g. cpufreq in Linux) will simply see a list of frequencies/voltages and will switch between them as it sees fit, just like it does on the existing hardware. However, the low-end slots will activate the 'Little' core and the high-end slots will activate the 'Big' core. This is the early solution provided by Linux's "deadline" CPU scheduler (not to be confused with the I/O scheduler with the same name) since 2012. Alternatively, all the cores may be exposed to the kernel scheduler, which will decide where each process/thread is executed. This will be required for the non-paired arrangement but could possibly also be used on the paired cores. It poses unique problems for the kernel scheduler, which, at least with modern commodity hardware, has been able to assume all cores in a SMP system are equal rather than

heterogeneous Homogeneity and heterogeneity are concepts often used in the sciences and statistics relating to the uniformity of a substance or organism. A material or image that is homogeneous is uniform in composition or character (i.e. color, shape, size ...

. A 2019 addition to Linux 5.0 called ''Energy Aware Scheduling'' is an example of a scheduler that considers cores differently.

Advantages of global task scheduling

* Finer-grained control of workloads that are migrated between cores. Because the scheduler is directly migrating tasks between cores, kernel overhead is reduced and power savings can be correspondingly increased. * Implementation in the scheduler also makes switching decisions faster than in the cpufreq framework implemented in IKS. * The ability to easily support non-symmetrical clusters (e.g. with 2 Cortex-A15 cores and 4 Cortex-A7 cores). * The ability to use all cores simultaneously to provide improved peak performance throughput of the SoC compared to IKS.

Successor

In May 2017, ARM announce
DynamIQ
as the successor to big.LITTLE. DynamIQ is expected to allow for more flexibility and scalability when designing multi-core processors. In contrast to big.LITTLE, it increases the maximum number of cores in a cluster to 8, allows for varying core designs within a single cluster, and up to 32 total clusters. The technology also offers more fine grained per core voltage control and faster L2 cache speeds. However, DynamIQ is incompatible with previous ARM designs and is initially only supported by the Cortex-A75 and

Cortex-A55 The ARM Cortex-A55 is a central processing unit implementing the ARMv8.2-A 64-bit instruction set designed by ARM Holdings' Cambridge design centre. The Cortex-A55 is a 2-wide decode in-order superscalar pipeline. Design The Cortex-A55 serves a ...

CPU cores.

References

External links

big.LITTLE Processing

big.LITTLE Processing with ARM CortexTM-A15 & Cortex-A7
(PDF) (full technical explanation) ARM architecture Heterogeneous computing