
Non-uniform memory access (NUMA) is a
computer memory
Computer memory stores information, such as data and programs, for immediate use in the computer. The term ''memory'' is often synonymous with the terms ''RAM,'' ''main memory,'' or ''primary storage.'' Archaic synonyms for main memory include ...
design used in
multiprocessing
Multiprocessing (MP) is the use of two or more central processing units (CPUs) within a single computer system. The term also refers to the ability of a system to support more than one processor or the ability to allocate tasks between them. The ...
, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own
local memory faster than non-local memory (memory local to another processor or memory shared between processors). NUMA is beneficial for workloads with high memory
locality of reference
In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
and low
lock contention, because a processor may operate on a subset of memory mostly or entirely within its own cache node, reducing traffic on the memory bus.
NUMA architectures logically follow in scaling from
symmetric multiprocessing
Symmetric multiprocessing or shared-memory multiprocessing (SMP) involves a multiprocessor computer hardware and software architecture where two or more identical processors are connected to a single, shared main memory, have full access to all ...
(SMP) architectures. They were developed commercially during the 1990s by
Unisys
Unisys Corporation is a global technology solutions company founded in 1986 and headquartered in Blue Bell, Pennsylvania. The company provides cloud, AI, digital workplace, logistics, and enterprise computing services.
History Founding
Unis ...
,
Convex Computer (later
Hewlett-Packard
The Hewlett-Packard Company, commonly shortened to Hewlett-Packard ( ) or HP, was an American multinational information technology company. It was founded by Bill Hewlett and David Packard in 1939 in a one-car garage in Palo Alto, California ...
),
Honeywell
Honeywell International Inc. is an American publicly traded, multinational conglomerate corporation headquartered in Charlotte, North Carolina. It primarily operates in four areas of business: aerospace, building automation, industrial automa ...
Information Systems Italy (HISI) (later
Groupe Bull
Bull SAS (also known as Groupe Bull, Bull Information Systems, or simply Bull) is a French computer company headquartered in Les Clayes-sous-Bois, in the western suburbs of Paris. The company has also been known at various times as Bull General ...
),
Silicon Graphics
Silicon Graphics, Inc. (stylized as SiliconGraphics before 1999, later rebranded SGI, historically known as Silicon Graphics Computer Systems or SGCS) was an American high-performance computing manufacturer, producing computer hardware and soft ...
(later
Silicon Graphics International
Silicon Graphics International Corp. (SGI; formerly Rackable Systems, Inc.) was an American manufacturer of computer hardware and software, including high-performance computing systems, x86-based servers for datacenter deployment, and visuali ...
),
Sequent Computer Systems
Sequent Computer Systems, Inc. was a computer company that designed and manufactured multiprocessing computer systems. They were among the pioneers in high-performance symmetric multiprocessing (SMP) Open system (computing), open systems, innovatin ...
(later
IBM
International Business Machines Corporation (using the trademark IBM), nicknamed Big Blue, is an American Multinational corporation, multinational technology company headquartered in Armonk, New York, and present in over 175 countries. It is ...
),
Data General
Data General Corporation was an early minicomputer firm formed in 1968. Three of the four founders were former employees of Digital Equipment Corporation (DEC).
Their first product, 1969's Data General Nova, was a 16-bit minicomputer intended to ...
(later
EMC, now
Dell Technologies
Dell Technologies Inc. is an American multinational technology company headquartered in Round Rock, Texas. It was formed as a result of the September 2016 merger of Dell and EMC Corporation. Dell Technologies ranked 48th on the 2024 Fortune ...
),
Digital
Digital usually refers to something using discrete digits, often binary digits.
Businesses
*Digital bank, a form of financial institution
*Digital Equipment Corporation (DEC) or Digital, a computer company
*Digital Research (DR or DRI), a software ...
(later
Compaq
Compaq Computer Corporation was an American information technology, information technology company founded in 1982 that developed, sold, and supported computers and related products and services. Compaq produced some of the first IBM PC compati ...
, then
HP, now
HPE) and
ICL. Techniques developed by these companies later featured in a variety of
Unix-like
A Unix-like (sometimes referred to as UN*X, *nix or *NIX) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Uni ...
operating system
An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ...
s, and to an extent in
Windows NT
Windows NT is a Proprietary software, proprietary Graphical user interface, graphical operating system produced by Microsoft as part of its Windows product line, the first version of which, Windows NT 3.1, was released on July 27, 1993. Original ...
.
The first commercial implementation of a NUMA-based Unix system was the Symmetrical Multi Processing XPS-100 family of servers, designed by Dan Gielan of VAST Corporation for
Honeywell Information Systems Italy.
Overview
Modern CPUs operate considerably faster than the main memory they use. In the early days of computing and data processing, the CPU generally ran slower than its own memory. The performance lines of processors and memory crossed in the 1960s with the advent of the first
supercomputer
A supercomputer is a type of computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instruc ...
s. Since then, CPUs increasingly have found themselves "starved for data" and having to stall while waiting for data to arrive from memory (e.g. for Von-Neumann architecture-based computers, see
Von Neumann bottleneck
The von Neumann architecture—also known as the von Neumann model or Princeton architecture—is a computer architecture based on the '' First Draft of a Report on the EDVAC'', written by John von Neumann in 1945, describing designs discus ...
). Many supercomputer designs of the 1980s and 1990s focused on providing high-speed memory access as opposed to faster processors, allowing the computers to work on large data sets at speeds other systems could not approach.
Limiting the number of memory accesses provided the key to extracting high performance from a modern computer. For commodity processors, this meant installing an ever-increasing amount of high-speed
cache memory
In computing, a cache ( ) is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsew ...
and using increasingly sophisticated algorithms to avoid
cache miss
In computing, a cache ( ) is a hardware or software component that stores data so that future requests for that data can be served faster; the data stored in a cache might be the result of an earlier computation or a copy of data stored elsew ...
es. But the dramatic increase in size of the operating systems and of the applications run on them has generally overwhelmed these cache-processing improvements. Multi-processor systems without NUMA make the problem considerably worse. Now a system can starve several processors at the same time, notably because only one processor can access the computer's memory at a time.
NUMA attempts to address this problem by providing separate memory for each processor, avoiding the performance hit when several processors attempt to address the same memory. For problems involving spread data (common for
servers and similar applications), NUMA can improve the performance over a single shared memory by a factor of roughly the number of processors (or separate memory banks).
Another approach to addressing this problem is the
multi-channel memory architecture, in which a linear increase in the number of memory channels increases the memory access concurrency linearly.
Of course, not all data ends up confined to a single task, which means that more than one processor may require the same data. To handle these cases, NUMA systems include additional hardware or software to move data between memory banks. This operation slows the processors attached to those banks, so the overall speed increase due to NUMA heavily depends on the nature of the running tasks.
Implementations
AMD
Advanced Micro Devices, Inc. (AMD) is an American multinational corporation and technology company headquartered in Santa Clara, California and maintains significant operations in Austin, Texas. AMD is a hardware and fabless company that de ...
implemented NUMA with its
Opteron
Opteron is AMD's x86 former server and workstation Microprocessor, processor line, and was the first processor which supported the AMD64 instruction set architecture (known generically as x86-64). It was released on April 22, 2003, with the ''Sl ...
processor (2003), using
HyperTransport
HyperTransport (HT), formerly known as Lightning Data Transport, is a technology for interconnection of computer Processor (computing), processors. It is a bidirectional Serial communication, serial/Parallel communication, parallel high-Bandwi ...
.
Intel
Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer compo ...
announced NUMA compatibility for its x86 and
Itanium
Itanium (; ) is a discontinued family of 64-bit computing, 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). The Itanium architecture originated at Hewlett-Packard (HP), and was later jointly dev ...
servers in late 2007 with its
Nehalem and
Tukwila CPUs. Both Intel CPU families share a common
chipset
In a computer system, a chipset is a set of electronic components on one or more integrated circuits that manages the data flow between the processor, memory and peripherals. The chipset is usually found on the motherboard of computers. Chips ...
; the interconnection is called Intel
QuickPath Interconnect (QPI), which provides extremely high bandwidth to enable high on-board scalability and was replaced by a new version called Intel
UltraPath Interconnect with the release of
Skylake (2017).
Cache coherent NUMA (ccNUMA)
Nearly all CPU architectures use a small amount of very fast non-shared memory known as
cache to exploit
locality of reference
In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference localit ...
in memory accesses. With NUMA, maintaining
cache coherence
In computer architecture, cache coherence is the uniformity of shared resource data that is stored in multiple local caches. In a cache coherent system, if multiple clients have a cached copy of the same region of a shared memory resource, all ...
across shared memory has a significant overhead. Although simpler to design and build, non-cache-coherent NUMA systems become prohibitively complex to program in the standard
von Neumann architecture
The von Neumann architecture—also known as the von Neumann model or Princeton architecture—is a computer architecture based on the '' First Draft of a Report on the EDVAC'', written by John von Neumann in 1945, describing designs discus ...
programming model.
Typically, ccNUMA uses inter-processor communication between cache controllers to keep a consistent memory image when more than one cache stores the same memory location. For this reason, ccNUMA may perform poorly when multiple processors attempt to access the same memory area in rapid succession. Support for NUMA in
operating system
An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ...
s attempts to reduce the frequency of this kind of access by allocating processors and memory in NUMA-friendly ways and by avoiding scheduling and locking algorithms that make NUMA-unfriendly accesses necessary.
Alternatively, cache coherency protocols such as the
MESIF protocol attempt to reduce the communication required to maintain cache coherency.
Scalable Coherent Interface (SCI) is an
IEEE
The Institute of Electrical and Electronics Engineers (IEEE) is an American 501(c)(3) organization, 501(c)(3) public charity professional organization for electrical engineering, electronics engineering, and other related disciplines.
The IEEE ...
standard defining a directory-based cache coherency protocol to avoid scalability limitations found in earlier multiprocessor systems. For example, SCI is used as the basis for the NumaConnect technology.
NUMA vs. cluster computing
One can view NUMA as a tightly coupled form of
cluster computing
A computer cluster is a set of computers that work together so that they can be viewed as a single system. Unlike Grid computing, grid computers, computer clusters have each Node (networking), node set to perform the same task, controlled an ...
. The addition of
virtual memory
In computing, virtual memory, or virtual storage, is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a ver ...
paging to a cluster architecture can allow the implementation of NUMA entirely in software. However, the inter-node latency of software-based NUMA remains several orders of magnitude greater (slower) than that of hardware-based NUMA.
Software support
Since NUMA largely influences memory access performance, certain software optimizations are needed to allow scheduling threads and processes close to their in-memory data.
*
Microsoft
Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...
Windows 7
Windows 7 is a major release of the Windows NT operating system developed by Microsoft. It was Software release life cycle#Release to manufacturing (RTM), released to manufacturing on July 22, 2009, and became generally available on October 22, ...
and
Windows Server 2008 R2
Windows Server 2008 R2, codenamed "Windows Server 7" or "Windows Server 2008 Release 2", is the eighth major version of the Windows NT operating system produced by Microsoft to be released under the Windows Server brand name. It was release ...
added support for NUMA architecture over 64 logical cores.
*
Java 7 added support for NUMA-aware memory allocator and
garbage collector.
*
Linux kernel
The Linux kernel is a Free and open-source software, free and open source Unix-like kernel (operating system), kernel that is used in many computer systems worldwide. The kernel was created by Linus Torvalds in 1991 and was soon adopted as the k ...
:
**Version 2.5 provided a basic NUMA support, which was further improved in subsequent kernel releases.
**Version 3.8 of the Linux kernel brought a new NUMA foundation that allowed development of more efficient NUMA policies in later kernel releases.
**Version 3.13 of the Linux kernel brought numerous policies that aim at putting a process near its memory, together with the handling of cases such as having
memory pages shared between processes, or the use of transparent
huge pages; new
sysctl settings allow NUMA balancing to be enabled or disabled, as well as the configuration of various NUMA memory balancing parameters.
*
OpenSolaris
OpenSolaris () is a discontinued open-source computer operating system for SPARC and x86 based systems, created by Sun Microsystems and based on Solaris. Its development began in the mid 2000s and ended in 2010.
OpenSolaris was developed as ...
models NUMA architecture with lgroups.
*
FreeBSD
FreeBSD is a free-software Unix-like operating system descended from the Berkeley Software Distribution (BSD). The first version was released in 1993 developed from 386BSD, one of the first fully functional and free Unix clones on affordable ...
added support for NUMA architecture in version 9.0.
*
Silicon Graphics
Silicon Graphics, Inc. (stylized as SiliconGraphics before 1999, later rebranded SGI, historically known as Silicon Graphics Computer Systems or SGCS) was an American high-performance computing manufacturer, producing computer hardware and soft ...
IRIX
IRIX (, ) is a discontinued operating system developed by Silicon Graphics (SGI) to run on the company's proprietary MIPS architecture, MIPS workstations and servers. It is based on UNIX System V with Berkeley Software Distribution, BSD extensio ...
(discontinued as of 2021) support for ccNUMA architecture over 1240 CPU with Origin server series.
Hardware support
As of 2011, ccNUMA systems are multiprocessor systems based on the
AMD Opteron processor, which can be implemented without external logic, and the Intel
Itanium processor, which requires the chipset to support NUMA. Examples of ccNUMA-enabled chipsets are the SGI Shub (Super hub), the Intel E8870, the
HP sx2000 (used in the Integrity and Superdome servers), and those found in NEC Itanium-based systems. Earlier ccNUMA systems such as those from
Silicon Graphics
Silicon Graphics, Inc. (stylized as SiliconGraphics before 1999, later rebranded SGI, historically known as Silicon Graphics Computer Systems or SGCS) was an American high-performance computing manufacturer, producing computer hardware and soft ...
were based on
MIPS processors and the
DEC Alpha 21364 (EV7) processor.
See also
*
Uniform memory access (UMA)
*
Cache-only memory architecture (COMA)
*
HiperDispatch
HiperDispatch is a workload dispatching feature found in recent IBM mainframe models (the System z10 and IBM zEnterprise System processors and later models) running recent releases of z/OS. HiperDispatch was introduced in February 2008. Support w ...
*
Partitioned global address space
*
Nodal architecture
*
Scratchpad memory (SPM)
References
External links
NUMA FAQOpenSolaris NUMA ProjectIntroduction video for the Alpha EV7 system architectureMore videos related to EV7 systems: CPU, IO, etcNUMA optimization in Windows ApplicationsNUMA Support in Linux at SGIIntel TukwilaIntel QPI (CSI) explainedcurrent Itanium NUMA systems
{{Parallel Computing
Parallel computing
Computer memory