ARM big.LITTLE is a

heterogeneous computing Heterogeneous computing refers to systems that use more than one kind of processor or cores. These systems gain performance or energy efficiency not just by adding the same type of processors, but by adding dissimilar coprocessors, usually incorp ...

architecture developed by

ARM Holdings Arm is a British semiconductor and software design company based in Cambridge, England. Its primary business is in the design of ARM processors (CPUs). It also designs other chips, provides software development tools under the DS-5, RealView an ...

, coupling relatively battery-saving and slower processor cores (''LITTLE'') with relatively more powerful and power-hungry ones (''big''). Typically, only one "side" or the other will be active at once, but all cores have access to the same memory regions, so workloads can be swapped between Big and Little cores on the fly. The intention is to create a

multi-core processor A multi-core processor is a microprocessor on a single integrated circuit with two or more separate processing units, called cores, each of which reads and executes program instructions. The instructions are ordinary CPU instructions (such a ...

that can adjust better to dynamic computing needs and use less power than clock scaling alone. ARM's marketing material promises up to a 75% savings in power usage for some activities. Most commonly, ARM big.LITTLE architectures are used to create a

multi-processor system-on-chip A multiprocessor system on a chip (, ' or ) is a system on a chip (SoC) which includes multiple microprocessors. As such, it is a multi-core system on a chip. MPSoCs are usually targeted for embedded applications. It is used by platforms that co ...

(MPSoC). In October 2011, big.LITTLE was announced along with the

Cortex-A7 The ARM Cortex-A7 MPCore is a 32-bit microprocessor core licensed by ARM Holdings implementing the ARMv7-A architecture announced in 2011. Overview It has two target applications; firstly as a smaller, simpler, and more power-efficient succes ...

, which was designed to be

architecturally Architecture is the art and technique of designing and building, as distinguished from the skills associated with construction. It is both the process and the product of sketching, conceiving, planning, designing, and constructing buildings ...

compatible with the

Cortex-A15 The ARM Cortex-A15 MPCore is a 32-bit processor core licensed by ARM Holdings implementing the ARMv7-A architecture. It is a multicore processor with out-of-order superscalar pipeline running at up to 2.5 GHz. Overview ARM has claimed t ...

. In October 2012 ARM announced the

Cortex-A53 The ARM Cortex-A53 is one of the first two central processing units implementing the ARMv8-A 64-bit instruction set designed by ARM Holdings' Cambridge design centre. The Cortex-A53 is a 2-wide decode superscalar processor, capable of dual-iss ...

and

Cortex-A57 The ARM Cortex-A57 is a central processing unit implementing the ARMv8-A 64-bit instruction set designed by ARM Holdings. The Cortex-A57 is an out-of-order superscalar pipeline. It is available as SIP core to licensees, and its design makes ...

(

ARMv8-A ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of reduced instruction set computer (RISC) instruction set architectures for computer processors, configured ...

) cores, which are also intercompatible to allow their use in a big.LITTLE chip. ARM later announced the Cortex-A12 at Computex 2013 followed by the

Cortex-A17 The ARM Cortex-A17 is a 32-bit processor core implementing the ARMv7-A architecture, licensed by ARM Holdings. Providing up to four cache-coherent cores, it serves as the successor to the Cortex-A9 and replaces the previous ARM Cortex-A12 spe ...

in February 2014. Both the Cortex-A12 and the Cortex-A17 can also be paired in a big.LITTLE configuration with the Cortex-A7.

The problem that big.LITTLE solves

For a given library of

CMOS Complementary metal–oxide–semiconductor (CMOS, pronounced "sea-moss", ) is a type of metal–oxide–semiconductor field-effect transistor (MOSFET) fabrication process that uses complementary and symmetrical pairs of p-type and n-type MOSFE ...

logic, active power increases as the logic switches more per second, while leakage increases with the number of transistors. So, CPUs designed to run fast are different from CPUs designed to save power. When a very fast out-of-order CPU is idling at very low speeds, a CPU with much less leakage (fewer transistors) could do the same work. For example, it might use a smaller (fewer transistors)

memory cache In computing, a cache ( ) is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsewher ...

, or a simpler microarchitecture such as a

pipeline Pipeline may refer to: Electronics, computers and computing * Pipeline (computing), a chain of data-processing stages or a CPU optimization found on ** Instruction pipelining, a technique for implementing instruction-level parallelism within a s ...

. big.LITTLE is a way to optimize for both cases: Power and speed, in the same system. In practice, a big.LITTLE system can be surprisingly inflexible. One issue is the number and types of power and clock domains that the IC provides. These may not match the standard power management features offered by an operating system. Another is that the CPUs no longer have equivalent abilities, and matching the right software task to the right CPU becomes more difficult. Most of these problems are being solved by making the electronics and software more flexible.

Run-state migration

There are three ways for the different processor cores to be arranged in a big.LITTLE design, depending on the

scheduler A schedule or a timetable, as a basic time-management tool, consists of a list of times at which possible task (project management), tasks, events, or actions are intended to take place, or of a sequence of events in the chronological order ...

implemented in the

kernel Kernel may refer to: Computing * Kernel (operating system), the central component of most operating systems * Kernel (image processing), a matrix used for image convolution * Compute kernel, in GPGPU programming * Kernel method, in machine learnin ...

Clustered switching

The clustered model approach is the first and simplest implementation, arranging the processor into identically sized clusters of "big" or "LITTLE" cores. The operating system scheduler can only see one cluster at a time; when the

load Load or LOAD may refer to: Aeronautics and transportation *Load factor (aeronautics), the ratio of the lift of an aircraft to its weight *Passenger load factor, the ratio of revenue passenger miles to available seat miles of a particular transpo ...

on the whole processor changes between low and high, the system transitions to the other cluster. All relevant data are then passed through the common

L2 cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which ...

, the active core cluster is powered off and the other one is activated. A Cache Coherent Interconnect (CCI) is used. This model has been implemented in the

Samsung The Samsung Group (or simply Samsung) ( ko, 삼성 ) is a South Korean multinational manufacturing conglomerate headquartered in Samsung Town, Seoul, South Korea. It comprises numerous affiliated businesses, most of them united under the ...

Exynos Exynos, formerly Hummingbird (), is a series of ARM-based system-on-chips developed by Samsung Electronics' System LSI division and manufactured by Samsung Foundry. It is a continuation of Samsung's earlier S3C, S5L and S5P line of SoCs. Exy ...

5 Octa (5410).

In-kernel switcher (CPU migration)

CPU migration via the in-kernel switcher (IKS) involves pairing up a 'big' core with a 'LITTLE' core, with possibly many identical pairs in one chip. Each pair operates as one so-termed ''virtual core'', and only one real core is (fully) powered up and running at a time. The 'big' core is used when the demand is high and the 'LITTLE' core is employed when demand is low. When demand on the virtual core changes (between high and low), the incoming core is powered up, running state is transferred, the outgoing is shut down, and processing continues on the new core. Switching is done via the cpufreq framework. A complete big.LITTLE IKS implementation was added in Linux 3.11. big.LITTLE IKS is an improvement of cluster migration (), the main difference being that each pair is visible to the scheduler. A more complex arrangement involves a non-symmetric grouping of 'big' and 'LITTLE' cores. A single chip could have one or two 'big' cores and many more 'LITTLE' cores, or vice versa. Nvidia created something similar to this with the low-power 'companion core' in their

Tegra 3 Tegra is a system on a chip (SoC) series developed by Nvidia for mobile devices such as smartphones, personal digital assistants, and mobile Internet devices. The Tegra integrates an ARM architecture central processing unit (CPU), graphics proc ...

System-on-Chip A system on a chip or system-on-chip (SoC ; pl. ''SoCs'' ) is an integrated circuit that integrates most or all components of a computer or other electronic system. These components almost always include a central processing unit (CPU), memory ...

Heterogeneous multi-processing (global task scheduling)

The most powerful use model of big.LITTLE architecture is ''

Heterogeneous Homogeneity and heterogeneity are concepts often used in the sciences and statistics relating to the uniformity of a substance or organism. A material or image that is homogeneous is uniform in composition or character (i.e. color, shape, siz ...

Multi-Processing Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them. There ar ...

'' (HMP), which enables the use of all physical cores at the same time. Threads with

high priority ''High Priority'' is the second studio album by United States, American singer Cherrelle. Released on October 20, 1985, it reached #9 on the Top R&B/Hip-Hop albums chart and #36 on the Billboard 200. It generated Cherrelle's biggest pop hit with ...

or computational intensity can in this case be allocated to the "big" cores while threads with less priority or less computational intensity, such as background tasks, can be performed by the "LITTLE" cores. This model has been implemented in the

starting with the Exynos 5 Octa series (5420, 5422, 5430), and

Apple A series Apple silicon is a series of system on a chip (SoC) and system in a package (SiP) processors designed by Apple Inc., mainly using the ARM architecture. It is the basis of most new Mac computers as well as iPhone, iPad, iPod Touch, Apple T ...

processors starting with the

Apple A11 The Apple A11 Bionic is a 64-bit ARM-based system on a chip (SoC), designed by Apple Inc. and manufactured by TSMC. It first appeared in the iPhone 8 and 8 Plus, and iPhone X which were introduced on September 12, 2017. Apple states that the ...

Scheduling

The paired arrangement allows for switching to be done transparently to the

operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also in ...

using the existing dynamic voltage and

frequency scaling In computer architecture, frequency scaling (also known as frequency ramping) is the technique of increasing a processor's frequency so as to enhance the performance of the system containing the processor in question. Frequency ramping was the dom ...

(DVFS) facility. The existing DVFS support in the kernel (e.g. cpufreq in Linux) will simply see a list of frequencies/voltages and will switch between them as it sees fit, just like it does on the existing hardware. However, the low-end slots will activate the 'Little' core and the high-end slots will activate the 'Big' core. This is the early solution provided by Linux's "deadline" CPU scheduler (not to be confused with the I/O scheduler with the same name) since 2012. Alternatively, all the cores may be exposed to the kernel scheduler, which will decide where each process/thread is executed. This will be required for the non-paired arrangement but could possibly also be used on the paired cores. It poses unique problems for the kernel scheduler, which, at least with modern commodity hardware, has been able to assume all cores in a SMP system are equal rather than

heterogeneous Homogeneity and heterogeneity are concepts often used in the sciences and statistics relating to the uniformity of a substance or organism. A material or image that is homogeneous is uniform in composition or character (i.e. color, shape, siz ...

. A 2019 addition to Linux 5.0 called ''Energy Aware Scheduling'' is an example of a scheduler that considers cores differently.

Advantages of global task scheduling

* Finer-grained control of workloads that are migrated between cores. Because the scheduler is directly migrating tasks between cores, kernel overhead is reduced and

power Power most often refers to: * Power (physics), meaning "rate of doing work" ** Engine power, the power put out by an engine ** Electric power * Power (social and political), the ability to influence people or events ** Abusive power Power may a ...

savings can be correspondingly increased. * Implementation in the scheduler also makes switching decisions faster than in the cpufreq framework implemented in IKS. * The ability to easily support non-symmetrical clusters (e.g. with 2 Cortex-A15 cores and 4 Cortex-A7 cores). * The ability to use all cores simultaneously to provide improved peak performance throughput of the SoC compared to IKS.

Successor

In May 2017, ARM announce
DynamIQ
as the successor to big.LITTLE. DynamIQ is expected to allow for more flexibility and scalability when designing multi-core processors. In contrast to big.LITTLE, it increases the maximum number of cores in a cluster to 8, allows for varying core designs within a single cluster, and up to 32 total clusters. The technology also offers more fine grained per core voltage control and faster L2 cache speeds. However, DynamIQ is incompatible with previous ARM designs and is initially only supported by the Cortex-A75 and Cortex-A55 CPU cores.

References

External links

big.LITTLE Processing

big.LITTLE Processing with ARM CortexTM-A15 & Cortex-A7
(PDF) (full technical explanation) ARM architecture Heterogeneous computing