A memory management unit (MMU), sometimes called paged memory management unit (PMMU), is a
computer hardware
Computer hardware includes the physical parts of a computer, such as the central processing unit (CPU), random-access memory (RAM), motherboard, computer data storage, graphics card, sound card, and computer case. It includes external devices ...
unit that examines all references to
memory
Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remembe ...
, and translates the memory addresses being referenced, known as
virtual memory addresses, into
physical address
In computing, a physical address (also real address, or binary address), is a memory address that is represented in the form of a binary number on the address bus circuitry in order to enable the data bus to access a ''particular'' storage cell o ...
es in
main memory
Computer data storage or digital data storage is a technology consisting of computer components and recording media that are used to retain digital data. It is a core function and fundamental component of computers.
The central processin ...
.
In modern systems, programs generally have addresses that access the theoretical maximum memory of the
computer architecture, 32 or 64 bits. The MMU maps the addresses from each program into separate areas in physical memory, which is generally much smaller than the theoretical maximum. This is possible because programs rarely use large amounts of memory at any one time.
Most modern
operating system
An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ...
s (OS) work in concert with an MMU to provide
virtual memory
In computing, virtual memory, or virtual storage, is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a ver ...
(VM) support.
The MMU tracks memory use in fixed-size blocks known as ''pages''.
If a program refers to a location in a page that is not in physical memory, the MMU sends an
interrupt
In digital computers, an interrupt (sometimes referred to as a trap) is a request for the processor to ''interrupt'' currently executing code (when permitted), so that the event can be processed in a timely manner. If the request is accepted ...
to the
operating system
An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ...
.
The OS selects a lesser-used block in memory, writes it to
backing storage such as a
hard drive
A hard disk drive (HDD), hard disk, hard drive, or fixed disk is an electro-mechanical data storage device that stores and retrieves digital data using magnetic storage with one or more rigid rapidly rotating hard disk drive platter, pla ...
if it has been modified since it was read in, reads the page from backing storage into that block, and sets up the MMU to map the block to the originally requested page so the program can use it.
This is known as
demand paging.
Some simpler
real-time operating system
A real-time operating system (RTOS) is an operating system (OS) for real-time computing applications that processes data and events that have critically defined time constraints. A RTOS is distinct from a time-sharing operating system, such as Unix ...
s do not support virtual memory and do not need an MMU, but still need a hardware
memory protection unit.
MMUs generally provide
memory protection
Memory protection is a way to control memory access rights on a computer, and is a part of most modern instruction set architectures and operating systems. The main purpose of memory protection is to prevent a process from accessing memory that h ...
to block attempts by a program to access memory it has not previously requested, which prevents a misbehaving program from using up all memory or malicious code from reading data from another program.
In some early
microprocessor
A microprocessor is a computer processor (computing), processor for which the data processing logic and control is included on a single integrated circuit (IC), or a small number of ICs. The microprocessor contains the arithmetic, logic, a ...
designs,
memory management
Memory management (also dynamic memory management, dynamic storage allocation, or dynamic memory allocation) is a form of Resource management (computing), resource management applied to computer memory. The essential requirement of memory manag ...
was performed by a separate
integrated circuit
An integrated circuit (IC), also known as a microchip or simply chip, is a set of electronic circuits, consisting of various electronic components (such as transistors, resistors, and capacitors) and their interconnections. These components a ...
such as the
VLSI Technology VI475 (1986), the
Motorola 68851 (1984) used with the
Motorola 68020 CPU in the
Macintosh II, or the Z8010 and Z8015 (1985) used with the
Zilog Z8000 family of processors. Later microprocessors (such as the
Motorola 68030 and the
Zilog Z280) placed the MMU together with the CPU on the same integrated circuit, as did the
Intel 80286
The Intel 80286 (also marketed as the iAPX 286 and often called Intel 286) is a 16-bit microprocessor that was introduced on February 1, 1982. It was the first 8086-based CPU with separate, non- multiplexed address and data buses and also the f ...
and later
x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
microprocessors.
Some early systems, especially
8-bit
In computer architecture, 8-bit integers or other data units are those that are 8 bits wide (1 octet). Also, 8-bit central processing unit (CPU) and arithmetic logic unit (ALU) architectures are those that are based on registers or data bu ...
systems, used very simple MMUs to perform
bank switching.
Types of address translation
Early systems used
base and bounds addressing that further developed into
segmentation, or used a fixed set of blocks instead of loading them on demand. The difference between these two approaches is the size of the contiguous block of memory; paged systems break up main memory into a series of equal sized blocks, while segmented systems generally allow for variable sizes.
Segmented translation
In segmented translation, a memory address contains a segment number and an offset within the segment.
Segments are variable-length, and may have permissions, such as read, write, and execute, associated with them.
[
] A segment is loaded into a contiguous area of physical memory. Typically, the segment number is used as an index into a ''segment table''; each entry in the segment table holds the address of the area of physical memory, the length of the segment, and other information such as permission flags.
This style has the advantage of simplicity; the memory blocks are continuous and thus only the two values, base and limit, need to be stored for mapping purposes.
The disadvantage of this approach is that it leads to an effect known as ''
external fragmentation
External may refer to:
* Externality
In economics, an externality is an Indirect costs, indirect cost (external cost) or indirect benefit (external benefit) to an uninvolved third party that arises as an effect of another party's (or parties') ...
''. This occurs when memory allocations are released but are non-contiguous. In this case, enough memory may be available to handle a request, but this is spread out and cannot be allocated to a single segment. On systems where programs start and stop over time, this can eventually lead to memory being highly fragmented and no large blocks remaining; in this case, segments would need to be moved in memory, and their segment table entries modified to reflect the new physical address, to make a contiguous space large enough for a segment available.
Some models of the
PDP-11
The PDP–11 is a series of 16-bit minicomputers originally sold by Digital Equipment Corporation (DEC) from 1970 into the late 1990s, one of a set of products in the Programmed Data Processor (PDP) series. In total, around 600,000 PDP-11s of a ...
16-bit
16-bit microcomputers are microcomputers that use 16-bit microprocessors.
A 16-bit register can store 216 different values. The range of integer values that can be stored in 16 bits depends on the integer representation used. With the two ...
minicomputer
A minicomputer, or colloquially mini, is a type of general-purpose computer mostly developed from the mid-1960s, built significantly smaller and sold at a much lower price than mainframe computers . By 21st century-standards however, a mini is ...
have a segmented memory management unit with a set of ''page address registers'' (PARs) and ''page description registers'' (PDRs); this maps an 16-bit virtual address to an 18-bit physical address. The
PDP-11/70 expands that to produce a 22-bit physical address.
Segmenting was widely used on
microcomputer
A microcomputer is a small, relatively inexpensive computer having a central processing unit (CPU) made out of a microprocessor. The computer also includes memory and input/output (I/O) circuitry together mounted on a printed circuit board (P ...
platforms of the 1980s. Among the MMUs that used this concept were the
Motorola 68451[ and the Zilog Z8010, but many other examples exist.
The ]Intel 8086
The 8086 (also called iAPX 86) is a 16-bit computing, 16-bit microprocessor chip designed by Intel between early 1976 and June 8, 1978, when it was released. The Intel 8088, released July 1, 1979, is a slightly modified chip with an external 8-b ...
, Intel 8088
The Intel 8088 ("''eighty-eighty-eight''", also called iAPX 88) microprocessor is a variant of the Intel 8086. Introduced on June 1, 1979, the 8088 has an eight-bit external data bus instead of the 16-bit bus of the 8086. The 16-bit registers ...
, Intel 80186
The Intel 80186, also known as the iAPX 186, or just 186, is a microprocessor and microcontroller introduced in 1982. It was based on the Intel 8086 and, like it, had a 16-bit external Bus (computing)#Address bus, data bus multiplexed with a 20 ...
, and Intel 80188 provide crude memory segmentation and no memory protection
Memory protection is a way to control memory access rights on a computer, and is a part of most modern instruction set architectures and operating systems. The main purpose of memory protection is to prevent a process from accessing memory that h ...
. (Every byte of every segment is always available to any program.) The 16-bit segment registers allow for 65,536 segments; each segment begins at a fixed offset equal to 16 times the segment number; the segment starting address granularity is 16 bytes. Each segment grants read-write access to 64 KiB (65,536 bytes) of address space (this limit is set by the 16-bit PC and SP registers; the processor does no bounds checking). Offset+address exceeding 0xFFFFF wraps around to 0x00000. Each 64 KiB segment overlaps the next 4,095 segments; each physical address can be denoted by 4,096 segment–offset pairs. This scheme can address only 1 MiB (1024 KiB) of physical memory (and memory-mapped i/o). (Optional expanded memory
In DOS memory management, expanded memory is a system of bank switching that provided additional memory to DOS programs beyond the limit of conventional memory (640 KiB).
''Expanded memory'' is an umbrella term for several incompatible tech ...
hardware can add bank-switched memory under software control.) Later x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
processors, starting with the Intel 80286
The Intel 80286 (also marketed as the iAPX 286 and often called Intel 286) is a 16-bit microprocessor that was introduced on February 1, 1982. It was the first 8086-based CPU with separate, non- multiplexed address and data buses and also the f ...
, supported real segmented mapping, with a segment table.
Paged translation
In paged translation, the address space
In computing, an address space defines a range of discrete addresses, each of which may correspond to a network host, peripheral device, disk sector, a memory cell or other logical or physical entity.
For software programs to save and retrieve ...
(the range of addresses used by the processor) is divided into pages, each having a size which is a power of 2, usually a few kilobyte
The kilobyte is a multiple of the unit byte for Computer data storage, digital information.
The International System of Units (SI) defines the prefix ''kilo-, kilo'' as a multiplication factor of 1000 (103); therefore, one kilobyte is 1000&nbs ...
s, but they may be much larger. Programs reference memory using the natural address size of the machine, typically 32 or 64 bits in modern systems. The bottom bits of the address (the offset within a page) are left unchanged. The upper address bits are the virtual page numbers.
Most MMUs use an in-memory table of items called a page table, containing one page table entry (PTE) per virtual page, to map virtual page numbers to physical page numbers in main memory. Multi-level page tables are often used to reduce the size of the page table. An associative cache of PTEs is called a translation lookaside buffer
A translation lookaside buffer (TLB) is a memory CPU cache, cache that stores the recent translations of virtual memory address to a physical memory Memory_address, location. It is used to reduce the time taken to access a user memory location. It ...
(TLB) and is used to avoid the necessity of accessing the main memory every time a virtual address is mapped.[
Other MMUs may have a private array of memory, a set of registers, or a one-or-more-level array of static RAM to store a set of mapping information.
The MMU splits the virtual address into a virtual page number and an offset within the page. The virtual page number is used to select a page table entry; if the page table entry is found, and the page is marked as being in memory, the physical page number in the page table entry is combined with the offset to construct the physical address corresponding to the virtual address.]
The virtual page number may be directly used as an index into the page table or other mapping information, or it may be further divided, with bits at a given level used as an index into a table of lower-level tables into which bits at the next level down are used as an index, with two or more levels of indexing.
One issue with paged translation is that as the virtual address space expands, the amount of memory needed to hold the mapping increases as well.
For instance, in the 68020 the addresses are 32 bits wide, meaning the virtual page number for an 8 kB page size is the upper 19 bits of the address, and a single-level page table would be 512 kB in size.[ In the 1980s, for an in-memory page table, this might be a significant fraction of the main memory of the machine, and, for an MMU that holds the page map in static RAM, might require a costly amount of static RAM.
This problem can be reduced by making the pages larger, say 64 kB instead of 8.
Now the page index uses 16 bits and the resulting page table is 64 kB, which is more tractable.
Moving to a larger page size leads to the second problem: increased internal fragmentation.
A program that generates a series of requests for small blocks will be assigned large blocks and thereby waste large amounts of memory.][
If the address space is sparse, so that not all regions of it are allocated, the problem can be reduced by using a multi-level page table or static RAM map, and not allocate all the page table entries that would be needed for an empty region.
The paged translation approach was widely used by microprocessor MMUs in the 1970s and 1980s, including the 68020's 68851 and the on-chip MMU of the 68030, the Zilog Z8000's Z8015, and the NS32000 series's NS16082.
A page table entry or other per-page information may also include information about whether the page has been written to (the '' dirty bit''), when it was last used (the ''accessed bit'', for a ]least recently used
In computing, cache replacement policies (also known as cache replacement algorithms or cache algorithms) are Program optimization, optimizing instructions or algorithms which a computer program or hardware-maintained structure can utilize to ma ...
(LRU) page replacement algorithm), what kind of processes (user mode
A modern computer operating system usually uses virtual memory to provide separate address spaces or regions of a single address space, called user space and kernel space. This separation primarily provides memory protection and hardware prote ...
or supervisor mode) may read and write it, and whether it should be cached.
Sometimes, a page table entry or other per-page information prohibits access to a particular virtual page, perhaps because no physical random-access memory
Random-access memory (RAM; ) is a form of Computer memory, electronic computer memory that can be read and changed in any order, typically used to store working Data (computing), data and machine code. A random-access memory device allows ...
(RAM) has been allocated to that virtual page. In this case, the MMU signals a page fault
In computing, a page fault is an exception that the memory management unit (MMU) raises when a process accesses a memory page without proper preparations. Accessing the page requires a mapping to be added to the process's virtual address space ...
to the CPU. The operating system
An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs.
Time-sharing operating systems scheduler (computing), schedule tasks for ...
(OS) then handles the situation, perhaps by trying to find a spare frame of RAM and set up the page map to map it to the requested virtual address. If no RAM is free, it may be necessary to choose an existing page (known as a ''victim''), using some replacement algorithm
In mathematics and computer science, an algorithm () is a finite sequence of Rigour#Mathematics, mathematically rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algo ...
, and save it to disk (a process called ''paging
In computer operating systems, memory paging is a memory management scheme that allows the physical Computer memory, memory used by a program to be non-contiguous. This also helps avoid the problem of memory fragmentation and requiring compact ...
''). With some MMUs, there can also be a shortage of PTEs, in which case the OS will have to free one for the new mapping.
The MMU may also generate illegal access error conditions or invalid page faults upon illegal or non-existing memory accesses, respectively, leading to segmentation fault or bus error
In computing, a bus error is a Trap (computing), fault raised by hardware, notifying an operating system (OS) that a process is trying to access computer data storage, memory that the Central processing unit, CPU cannot physically address: an inva ...
conditions when handled by the operating system.
Paged translation mitigates the problem of external fragmentation of memory. After blocks of memory have been allocated and freed, the free memory may become fragmented (discontinuous) so that the largest contiguous block of free memory may be much smaller than the total amount. With virtual memory, a contiguous range of virtual addresses can be mapped to several non-contiguous blocks of physical memory; this non-contiguous allocation is one of the benefits of paging
In computer operating systems, memory paging is a memory management scheme that allows the physical Computer memory, memory used by a program to be non-contiguous. This also helps avoid the problem of memory fragmentation and requiring compact ...
.
However, paged translation causes another problem, '' internal fragmentation''. This occurs when a program requests a block of memory that does not cleanly map into a page, for instance, if a program requests a 1 KB buffer to perform file work. In this case, the request results in an entire page being set aside even though only 1 KB of the page will ever be used; if pages are larger than 1 KB, the remainder of the page is wasted. If many small allocations of this sort are made, memory can be used up even though much of it remains empty.[
]
Segmentation plus paging
Some systems, such as the GE 645 and its successors, used both segmentation and paging. The table of segments, instead of containing per-segment entries giving the physical base address and length of the segment, contains entries giving the physical base address of a page table for the segment, in addition to the length of the segment. Physical memory is divided into fixed-size pages, and the same techniques used for purely page-based demand paging are used for segment-and-page-based demand paging.
The Signetics
Signetics Corporation was an American electronics manufacturer specifically established to make integrated circuits. Founded in 1961, they went on to develop a number of early microprocessors and support chips, as well as the widely used 555 time ...
68905 MMU, designed for the Motorola
Motorola, Inc. () was an American multinational telecommunications company based in Schaumburg, Illinois. It was founded by brothers Paul and Joseph Galvin in 1928 and had been named Motorola since 1947. Many of Motorola's products had been ...
68000
The Motorola 68000 (sometimes shortened to Motorola 68k or m68k and usually pronounced "sixty-eight-thousand") is a 16/32-bit complex instruction set computer (CISC) microprocessor, introduced in 1979 by Motorola Semiconductor Products Sector ...
, 68010, and 68012 microprocessors, supported segmentation and paging. Both Signetics and Philips
Koninklijke Philips N.V. (), simply branded Philips, is a Dutch multinational health technology company that was founded in Eindhoven in 1891. Since 1997, its world headquarters have been situated in Amsterdam, though the Benelux headquarter ...
produced a version of the 68000 that combined the 68905 on the same physical chip as the processor, the 68070.[
]
Benefits
An MMU can provide memory protection
Memory protection is a way to control memory access rights on a computer, and is a part of most modern instruction set architectures and operating systems. The main purpose of memory protection is to prevent a process from accessing memory that h ...
, so that attempts by code to access or modify data to which it does not have access results in a failure; this catches both software bug
A software bug is a design defect ( bug) in computer software. A computer program with many or serious bugs may be described as ''buggy''.
The effects of a software bug range from minor (such as a misspelled word in the user interface) to sev ...
s and deliberate attempts to access or modify that data. Typically, an operating system assigns each program its own virtual address space.
An MMU can also be used to expand the size of the physical address when the virtual address is too small. For instance, the PDP-11
The PDP–11 is a series of 16-bit minicomputers originally sold by Digital Equipment Corporation (DEC) from 1970 into the late 1990s, one of a set of products in the Programmed Data Processor (PDP) series. In total, around 600,000 PDP-11s of a ...
originally had a 16-bit address that made it too small as memory sizes increased in the 1970s. This was addressed by expanding the physical memory bus to 18 bits, and using an MMU to add two more bits based on other pins on the processor bus to indicate which program was accessing memory.
Other memory-related MMU functions
Some MMUs such as the Signetics
Signetics Corporation was an American electronics manufacturer specifically established to make integrated circuits. Founded in 1961, they went on to develop a number of early microprocessors and support chips, as well as the widely used 555 time ...
68905, also included a controller to manage a processor cache, which stores recently accessed data in a very fast memory and thus reduces the need to talk to the slower main memory. In some implementations, MMUs are also responsible for bus arbitration
Arbitration is a formal method of dispute resolution involving a third party neutral who makes a binding decision. The third party neutral (the 'arbitrator', 'arbiter' or 'arbitral tribunal') renders the decision in the form of an 'arbitrati ...
, controlling access to the memory bus among the many parts of the computer that desire access.
IOMMUs
Hardware that performs direct memory access
Direct memory access (DMA) is a feature of computer systems that allows certain hardware subsystems to access main system computer memory, memory independently of the central processing unit (CPU).
Without DMA, when the CPU is using programmed i ...
completely bypasses the CPU's MMU.
Some I/O systems allow DMA hardware to use physical addresses to read or write directly anywhere in physical RAM without any restrictions.
Other I/O systems have a separate MMU called an input-output memory management unit (IOMMU) that can be programmed by the OS to translate device addresses from the DMA hardware to specific, limited physical pages chosen by the OS.
The IOMMU can be used to block DMA attacks.
Examples
Most modern systems divide memory into pages that are in size, often with the capability to use so-called huge pages of or in size (often both variants are possible). Page translations are cached in a translation lookaside buffer
A translation lookaside buffer (TLB) is a memory CPU cache, cache that stores the recent translations of virtual memory address to a physical memory Memory_address, location. It is used to reduce the time taken to access a user memory location. It ...
(TLB). Some systems, mainly older RISC
In electronics and computer science, a reduced instruction set computer (RISC) is a computer architecture designed to simplify the individual instructions given to the computer to accomplish tasks. Compared to the instructions given to a comp ...
designs, trap into the OS when a page translation is not found in the TLB. Most systems use a hardware-based tree walker. Most systems allow the MMU to be disabled, but some disable the MMU when trapping into OS code.
IBM System/360 Model 67, IBM System/370, and successors
The IBM System/360 Model 67, which was introduced August, 1965, included an MMU called a dynamic address translation (DAT) box. It has the unusual feature of storing accessed and dirty bits outside of the page table (along with the four bit protection key for all S/360 processors). They refer to physical memory rather than virtual memory, and are accessed by special-purpose instructions.[ This reduces overhead for the OS, which would otherwise need to propagate accessed and dirty bits from the page tables to a more physically oriented data structure. This makes OS-level virtualization, later called paravirtualization, easier.
Starting in August, 1972, the ]IBM System/370
The IBM System/370 (S/370) is a range of IBM mainframe computers announced as the successors to the IBM System/360, System/360 family on June 30, 1970. The series mostly maintains backward compatibility with the S/360, allowing an easy migrati ...
has a similar MMU, although it initially supported only a 24-bit virtual address space rather than the 32-bit virtual address space of the System/360 Model 67. It also stores the accessed and dirty bits outside the page table. In early 1983, the System/370-XA architecture expanded the virtual address space to 31 bits, and in 2000, the 64-bit
In computer architecture, 64-bit integers, memory addresses, or other data units are those that are 64 bits wide. Also, 64-bit central processing units (CPU) and arithmetic logic units (ALU) are those that are based on processor registers, a ...
z/Architecture
z/Architecture, initially and briefly called ESA Modal Extensions (ESAME), is IBM's 64-bit complex instruction set computer (CISC) instruction set architecture, implemented by its mainframe computers. IBM introduced its first z/Architecture ...
was introduced, with the address space expanded to 64 bits; those continue to store the accessed and dirty bits outside the page table.
VAX
VAX pages are 512 bytes, which is very small. An OS may treat multiple pages as if they were a single larger page. For example, Linux
Linux ( ) is a family of open source Unix-like operating systems based on the Linux kernel, an kernel (operating system), operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically package manager, pac ...
on VAX groups eight pages together. Thus, the system is viewed as having pages. The VAX divides memory into four fixed-purpose regions, each in size. They are:
;P0 space: Used for general-purpose per-process memory such as heaps.
;P1 space: (Or control space) which is also per-process and is typically used for supervisor, executive, kernel, user stacks and other per-process control structures managed by the operating system.
;S0 space: (Or system space) which is global to all processes and stores operating system code and data, whether paged or not, including pagetables.
;S1 space: Which is unused and "Reserved to Digital
Digital usually refers to something using discrete digits, often binary digits.
Businesses
*Digital bank, a form of financial institution
*Digital Equipment Corporation (DEC) or Digital, a computer company
*Digital Research (DR or DRI), a software ...
".
Page tables are big linear arrays. Normally, this would be very wasteful when addresses are used at both ends of the possible range, but the page tables for P0 and P1 space are stored in the paged S0 space. Thus, there is effectively a two-level tree
In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, e.g., including only woody plants with secondary growth, only ...
, allowing applications to have sparse memory layout without wasting a lot of space on unused page table entries. Unlike page table entries in most MMUs, page table entries in the VAX MMU lack an accessed bit. OSes which implement paging must find some way to emulate the accessed bit if they are to operate efficiently. Typically, the OS will periodically unmap pages so that page-not-present faults can be used to let the OS set an accessed bit.
ARM
ARM architecture
ARM (stylised in lowercase as arm, formerly an acronym for Advanced RISC Machines and originally Acorn RISC Machine) is a family of reduced instruction set computer, RISC instruction set architectures (ISAs) for central processing unit, com ...
-based application processors implement an MMU defined by ARM's virtual memory system architecture (VMSA). The current architecture defines PTEs for describing and pages, sections and super-sections; legacy versions also defined a tiny page. ARM uses a two-level page table if using and pages, or just a one-level page table for sections and sections.
TLB updates are performed automatically by page table walking hardware. PTEs include read/write access permission based on privilege, cacheability information, an NX bit
The NX bit (no-execute bit) is a processor feature that separates areas of a virtual address space (the memory layout a program uses) into sections for storing data or program instructions. An operating system supporting the NX bit can mark certai ...
, and a non-secure bit.
DEC Alpha
DEC Alpha
Alpha (original name Alpha AXP) is a 64-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). Alpha was designed to replace 32-bit VAX complex instruction set computers ( ...
processors divide memory into , , , or ; the page size is dependent on the processor. pages. After a TLB miss, low-level firmware
In computing
Computing is any goal-oriented activity requiring, benefiting from, or creating computer, computing machinery. It includes the study and experimentation of algorithmic processes, and the development of both computer hardware, h ...
machine code (here called PALcode
PALcode (Privileged Architecture Library code) is the name used by DEC in the DEC Alpha, Alpha instruction set architecture (ISA) for a set of functions in the System Reference Manual (SRM) or AlphaBIOS firmware, providing a hardware abstraction la ...
) walks a page table.
The OpenVMS
OpenVMS, often referred to as just VMS, is a multi-user, multiprocessing and virtual memory-based operating system. It is designed to support time-sharing, batch processing, transaction processing and workstation applications. Customers using Op ...
AXP PALcode and DEC OSF/1 PALcode walk a three-level tree-structured page table. Addresses are broken down into an unused set of bits (containing the same value as the uppermost bit of the index into the root level of the tree), a set of bits to index the root level of the tree, a set of bits to index the middle level of the tree, a set of bits to index the leaf level of the tree, and remaining bits that pass through to the physical address without modification, indexing a byte within the page. The sizes of the fields are dependent on the page size; all three tree index fields are the same size. The OpenVMS AXP PALcode supports full read and write permission bits for user, supervisor, executive, and kernel modes, and also supports fault on read/write/execute bits are also supported. The DEC OSF/1 PALcode supports full read and write permission bits for user and kernel modes, and also supports fault on read/write/execute bits are also supported.
The Windows NT
Windows NT is a Proprietary software, proprietary Graphical user interface, graphical operating system produced by Microsoft as part of its Windows product line, the first version of which, Windows NT 3.1, was released on July 27, 1993. Original ...
AXP PALcode can either walk a single-level page table in a virtual address space or a two-level page table in physical address space. The upper 32 bits of an address are ignored.
For a single-level page table, addresses are broken down into a set of bits to index the page table and remaining bits that pass through to the physical address without modification, indexing a byte within the page.
For a two-level page table, addresses are broken down into a set of bits to index the root level of the tree, a set of bits to index the top level of the tree, a set of bits to index the leaf level of the tree, and remaining bits that pass through to the physical address without modification, indexing a byte within the page. The sizes of the fields are dependent on the page size. The Windows NT AXP PALcode supports a page being accessible only from kernel mode or being accessible from user and kernel mode, and also supports a fault on write bit.
MIPS
The MIPS architecture
MIPS (Microprocessor without Interlocked Pipelined Stages) is a family of reduced instruction set computer (RISC) instruction set architectures (ISA)Price, Charles (September 1995). ''MIPS IV Instruction Set'' (Revision 3.2), MIPS Technologies ...
supports one to 64 entries in the TLB. The number of TLB entries is configurable at CPU configuration before synthesis. TLB entries are dual. Each TLB entry maps a virtual page number (VPN2) to either one of two page frame numbers (PFN0 or PFN1), depending on the least significant bit of the virtual address that is not part of the page mask
A mask is an object normally worn on the face, typically for protection, disguise, performance, or entertainment, and often employed for rituals and rites. Masks have been used since antiquity for both ceremonial and practical purposes, ...
. This bit and the page mask bits are not stored in the VPN2. Each TLB entry has its own page size, which can be any value from to in multiples of four. Each PFN in a TLB entry has a caching attribute, a dirty and a valid status bit. A VPN2 has a global status bit and an OS assigned ID which participates in the virtual address TLB entry match, if the global status bit is set to zero. A PFN stores the physical address without the page mask bits.
A TLB refill exception is generated when there are no entries in the TLB that match the mapped virtual address. A TLB invalid exception is generated when there is a match but the entry is marked invalid. A TLB modified exception is generated when a store instruction references a mapped address and the matching entry's dirty status is not set. If a TLB exception occurs when processing a TLB exception, a double fault TLB exception, it is dispatched to its own exception handler.
MIPS32 and MIPS32r2 support 32 bits of virtual address space and up to 36 bits of physical address space. MIPS64 supports up to 64 bits of virtual address space and up to 59 bits of physical address space.
Sun MMU
The original Sun-1 is a single-board computer
A single-board computer (SBC) is a complete computer built on a single circuit board, with microprocessor(s), memory, input/output (I/O) and other features required of a functional computer. Single-board computers are commonly made as demonst ...
built around the Motorola 68000
The Motorola 68000 (sometimes shortened to Motorola 68k or m68k and usually pronounced "sixty-eight-thousand") is a 16/32-bit complex instruction set computer (CISC) microprocessor, introduced in 1979 by Motorola Semiconductor Products Sector ...
microprocessor and introduced in 1982. It includes the original Sun 1 memory management unit that provides address translation, memory protection, memory sharing and memory allocation for multiple processes running on the CPU. All access of the CPU to private on-board RAM, external Multibus memory, on-board I/O and the Multibus I/O runs through the MMU, where address translation and protection are done in a uniform fashion. The MMU is implemented in hardware on the CPU board.
The MMU consists of a context register, a segment map and a page map. Virtual addresses from the CPU are translated into intermediate addresses by the segment map, which in turn are translated into physical addresses by the page map. The page size is and the segment size is which gives 16 pages per segment. Up to 16 contexts can be mapped concurrently. The maximum logical address space for a context is or The maximum physical address that can be mapped simultaneously is also
The context register is important in a multitasking operating system because it allows the CPU to switch between processes without reloading all the translation state information. The 4-bit context register can switch between 16 sections of the segment map under supervisor control, which allows 16 contexts to be mapped concurrently. Each context has its own virtual address space. Sharing of virtual address space and inter-context communications can be provided by writing the same values in to the segment or page maps of different contexts. Additional contexts can be handled by treating the segment map as a context cache and replacing out-of-date contexts on a least-recently used basis.
The context register makes no distinction between user and supervisor states. Interrupts and traps do not switch contexts, which requires that all valid interrupt vectors always be mapped in page 0 of context, as well as the valid supervisor stack.
The Sun-2 workstations are similar; they are built around the Motorola 68010
The Motorola MC68010 and Motorola MC68012 are 16/32-bit microprocessor, microprocessors from Motorola, released in 1982 as successors to the Motorola 68000. The 68010 and 68012 added virtualization features, optimized loops and fixed several sma ...
microprocessor and have a similar memory management unit, with pages and segments. The context register has a 3-bit system context used in supervisor state and a 3-bit user context used in user state.
The Sun-3
Sun-3 is a series of UNIX computer workstations and servers produced by Sun Microsystems, launched on September 9, 1985. The Sun-3 series are VMEbus-based systems similar to some of the earlier Sun-2 series, but using the Motorola 68020 mic ...
workstations, except for the Sun-3/80, Sun-3/460, Sun-3/470, and Sun-3/480, are built around the Motorola 68020, and have a similar memory management unit. The page size is increased to . (The later models are built around the Motorola 68030 and use the 68030's on-chip MMU.)
The Sun-4
Sun-4 is a series of Unix workstations and servers produced by Sun Microsystems, first appearing in July 1987, with the launch of the Sun 4/260. The original Sun-4 series were VMEbus-based systems similar to the earlier Sun-3 series, but employi ...
workstations are built around various SPARC microprocessors, and have a memory management unit similar to that of the Sun-3 workstations.
PowerPC
In PowerPC
PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple Inc., App ...
G1, G2, G3, and G4 pages are normally After a TLB miss, the standard PowerPC MMU begins two simultaneous lookups. One lookup attempts to match the address with one of four or eight data block address translation (DBAT) registers, or four or eight instruction block address translation registers (IBAT), as appropriate. The BAT registers can map linear chunks of memory as large as and are normally used by an OS to map large portions of the address space for the OS kernel's own use. If the BAT lookup succeeds, the other lookup is halted and ignored.
The other lookup, not directly supported by all processors in this family, is via a so-called inverted page table, which acts as a hashed off-chip extension of the TLB. First, the top four bits of the address are used to select one of 16 segment registers. Then 24 bits from the segment register replace those four bits, producing a 52-bit address. The use of segment registers allows multiple processes to share the same hash table
In computer science, a hash table is a data structure that implements an associative array, also called a dictionary or simply map; an associative array is an abstract data type that maps Unique key, keys to Value (computer science), values. ...
.
The 52-bit address is hashed, then used as an index into the off-chip table. There, a group of eight-page table entries is scanned for one that matches. If none match due to excessive hash collision
In computer science, a hash collision or hash clash is when two distinct pieces of data in a hash table share the same hash value. The hash value in this case is derived from a hash function which takes a data input and returns a fixed length of ...
s, the processor tries again with a slightly different hash function
A hash function is any Function (mathematics), function that can be used to map data (computing), data of arbitrary size to fixed-size values, though there are some hash functions that support variable-length output. The values returned by a ...
. If this, too, fails, the CPU traps into OS (with MMU disabled) so that the problem may be resolved. The OS needs to discard an entry from the hash table to make space for a new entry. The OS may generate the new entry from a more normal tree-like page table or from per-mapping data structures which are likely to be slower and more space-efficient. Support for no-execute control is in the segment registers, leading to granularity.
A major problem with this design is poor cache locality caused by the hash function. Tree-based designs avoid this by placing the page table entries for adjacent pages in adjacent locations. An operating system running on the PowerPC may minimize the size of the hash table to reduce this problem.
It is also somewhat slow to remove the page table entries of a process. The OS may avoid reusing segment values to delay facing this, or it may elect to suffer the waste of memory associated with per-process hash tables. G1 chips do not search for page table entries, but they do generate the hash, with the expectation that an OS will search the standard hash table via software. The OS can write to the TLB. G2, G3, and early G4 chips use hardware to search the hash table. The latest chips allow the OS to choose either method. On chips that make this optional or do not support it at all, the OS may choose to use a tree-based page table exclusively.
x86
The x86
x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel, based on the 8086 microprocessor and its 8-bit-external-bus variant, the 8088. Th ...
architecture has evolved over a very long time while maintaining full software compatibility, even for OS code. Thus, the MMU is extremely complex, with many different possible operating modes.
The 8086
The 8086 (also called iAPX 86) is a 16-bit microprocessor chip designed by Intel between early 1976 and June 8, 1978, when it was released. The Intel 8088, released July 1, 1979, is a slightly modified chip with an external 8-bit data bus (allo ...
/ 8088 and 80186/80188 have no memory management unit; they support segmentation, but only to support more physical memory than a 16-bit address can support, as the segment number, in a segment register, is multiplied by 16 and added to the segment offset to generate a physical address.
The 80286 added an MMU that supports segmentation, but not paging. When segmentation is enabled by turning on protected mode
In computing, protected mode, also called protected virtual address mode, is an operational mode of x86-compatible central processing units (CPUs). It allows system software to use features such as Memory_segmentation, segmentation, virtual mem ...
, the segment number acts as an index into a table of segment descriptors; a segment descriptor contains a base physical address, a segment length, a presence bit to indicate whether the segment is currently in memory, permission bits, and control bits, If the offset in the segment is within the bounds specified by the segment descriptor, that offset is added to the base physical address to generate a physical address.
The 80386, which introduced the 32-bit IA-32
IA-32 (short for "Intel Architecture, 32-bit", commonly called ''i386'') is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the i386, 80386 microprocessor in 1985. IA-32 is the first incarn ...
version of x86, and subsequent x86 CPUs, support segmentation and paging. If paging is enabled, the base address in a segment descriptor is an address in a linear paged address space divided into pages, so when that is added to the offset in the segment, the resulting address is a linear address in that address space; in IA-32, that address is then masked to be no larger than 32 bits. The result may be looked up via a tree-structured page table, with the bits of the address being split as follows: 10 bits for the branch of the tree, 10 bits for the leaves of the branch, and the 12 lowest bits being directly copied to the result.
Segment registers, used in pre-80386 CPUs to extend the address space, are not used in modern OSes, with one major exception: access to thread-specific data for applications or CPU-specific data for OS kernels, which is done with explicit use of the FS and GS segment registers. All memory access involves a segment register, chosen according to the code being executed. Except when using FS or GS, the OS ensures that the offset will be zero. Some operating systems, such as OpenBSD
OpenBSD is a security-focused operating system, security-focused, free software, Unix-like operating system based on the Berkeley Software Distribution (BSD). Theo de Raadt created OpenBSD in 1995 by fork (software development), forking NetBSD ...
with its W^X feature, and Linux with the Exec Shield or PaX patches, may also limit the length of the code segment, as specified by the CS register, to disallow execution of code in modifiable regions of the address space.
Minor revisions of the MMU introduced with the Pentium
Pentium is a series of x86 architecture-compatible microprocessors produced by Intel from 1993 to 2023. The Pentium (original), original Pentium was Intel's fifth generation processor, succeeding the i486; Pentium was Intel's flagship proce ...
have allowed very large pages by skipping the bottom level of the tree (this leaves 10 bits for indexing the first level of page hierarchy with the remaining 10+12 bits being directly copied to the result). Minor revisions of the MMU introduced with the Pentium Pro
The Pentium Pro is a sixth-generation x86 microprocessor developed and manufactured by Intel and introduced on November 1, 1995. It implements the P6 (microarchitecture), P6 microarchitecture (sometimes termed i686), and was the first x86 Intel C ...
introduced the physical address extension
In computing, Physical Address Extension (PAE), sometimes referred to as Page Address Extension,
is a memory management feature for the x86 architecture. PAE was first introduced by Intel in the Pentium Pro, and later by AMD in the Athlon process ...
(PAE) feature, enabling 36-bit physical addresses with 2+9+9 bits for three-level page tables and 12 lowest bits being directly copied to the result. Large pages () are also available by skipping the bottom level of the tree (resulting in 2+9 bits for two-level table hierarchy and the remaining 9+12 lowest bits copied directly). In addition, the page attribute table
The page attribute table (PAT) is a processor supplementary capability extension to the page table format of certain x86 and x86-64 microprocessors. Like memory type range registers (MTRRs), they allow for fine-grained control over how areas ...
allowed specification of cacheability by looking up a few high bits in a small on-CPU table.
No-execute support was originally only provided on a per-segment basis, making it very awkward to use.
More recent x86 chips provide a per-page NX bit
The NX bit (no-execute bit) is a processor feature that separates areas of a virtual address space (the memory layout a program uses) into sections for storing data or program instructions. An operating system supporting the NX bit can mark certai ...
(no-execute bit) in the PAE mode.
The W^X, Exec Shield, and PaX mechanisms described above emulate per-page non-execute support on machines with x86 processors that lack the NX bit by setting the length of the code segment, with a performance loss and a reduction in the available address space.
x86-64
x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit extension of the x86 instruction set architecture, instruction set. It was announced in 1999 and first available in the AMD Opteron family in 2003. It introduces two new ope ...
, the 64-bit version of the x86 architecture, almost entirely removes segmentation in favor of the flat memory model used by almost all operating systems for the 386 or newer processors. In long mode, all segment offsets are ignored, except for the FS and GS segments; linear addresses are 64-bit rather than 32-bit, with the lowest 48 bits of the address being significant. When used with pages, the page table tree has four levels instead of three, to handle the larger linear addresses; in some newer x86-64 processors, a fifth page table level can be enabled, to support 57-bit linear addresses. In all levels of the page table, the page table entry includes an NX bit.
48-bit linear addresses are divided as follows: 16 bits unused, nine bits each for four tree levels (for a total of 36 bits), and the 12 lowest bits directly copied to the result. With pages, there are only three levels of page table, for a total of 27 bits used in paging and 21 bits of offset. Some newer CPUs also support a page with two levels of paging and of offset. CPUID can be used to determine if pages are supported.
In all three cases, the bits are required to be equal to the 48th bit, or in other words, the low are sign extended to the higher bits. This is done to allow further expansion of the addressable range, without compromising backwards compatibility.
57-bit linear addresses are divided as follows: 7 bits unused, nine bits each for five tree levels (for a total of 45 bits), and the 12 lowest bits directly copied to the result. The low are sign extended.
Alternatives
Burroughs B5000 series, B6x00/B7x00/B5900/A-series/Unisys MCP systems
The Burroughs B5000 from 1961 was the first commercial system to support virtual memory
In computing, virtual memory, or virtual storage, is a memory management technique that provides an "idealized abstraction of the storage resources that are actually available on a given machine" which "creates the illusion to users of a ver ...
(after the Atlas
An atlas is a collection of maps; it is typically a bundle of world map, maps of Earth or of a continent or region of Earth. Advances in astronomy have also resulted in atlases of the celestial sphere or of other planets.
Atlases have traditio ...
), has no need for an external MMU. The B5000 and its successors up to current Unisys ClearPath MCP (Libra) systems provide the two functions of an MMU — virtual memory addresses and memory protection — with a different architectural approach. Rather than add virtual memory onto a processor not designed for virtual memory, it is integrated in the core design of the processor/system so there is no requirement for an external unit to add functionality of address translation or bounds checking, resulting in unprecedented memory safety and security.
First, in the mapping of virtual memory addresses, instead of needing an MMU, the machines are descriptor-based. The uppermost bit of a 48-bit memory word in the Burroughs B5000 series systems indicate whether the word is an user data or a descriptor/control word; descriptors were read-only to user processes and could only be updated by the system (hardware or operating system). Memory words in the B6x00/B7x00/B5900/A-series/Unisys MCP systems have 48 data bits and 3 tag bits (and later systems to current have 4 tag bits) words whose tag is an odd number are read-only to user processes – descriptors have a tag of 5 and code words have a tag of 3.
Each allocated memory block is given a master descriptor with the properties of the block — physical address, size, and whether it is present in main memory or not, in which case on first access the block must be allocated, or if there is an address, it is an address on secondary storage and must be loaded to main memory. All references to code and data are made using a descriptor. When a request is made to access the block for reading or writing, the hardware checks its presence via the presence bit (pbit) in the descriptor.
A pbit of 1 indicates the presence of the block. In this case, the block can be accessed via the physical main-memory address in the descriptor. If the pbit is zero, a pbit interrupt is generated for the MCP (operating system) to make the block present. If the address field is zero, this is the first access to this block, and it is allocated (an init (initial) pbit). If the address field is non-zero, it is a disk address of the block, which has previously been rolled out — the block is fetched from disk, the pbit is set to one and the physical memory address updated to point to the block in memory. This makes descriptors equivalent to a page-table entry in an MMU system, but descriptors are free of a table.
System performance can be monitored through the number of pbit interrupts, which is the count of accesses to descriptors with a zero bit. 'Init' pbits indicate initial allocations or new memory blocks, but a high level of 'other' pbits indicate that the system had to load a block from virtual memory secondary storage. In normal operation this will not occur very much. If there are above average number of other pbits occurring the system might be thrashing, indicating system overload. In production systems, the machine load will be more constant and thus the system configured or tuned to avoid thrashing.
All memory allocation is therefore completely automatic (one of the features of modern systems) and there is no way to allocate blocks other than this mechanism. There are no such calls as malloc or dealloc, since memory blocks are also automatically allocated on pbit interrupt or discarded. The scheme is also lazy, since a block will not be allocated until it is actually referenced. When memory is nearly full, the MCP examines the working set, trying compaction (since the system is segmented, not paged), deallocating read-only segments (such as code-segments which can be restored from their original copy) and, as a last resort, rolling dirty (that is updated) data segments out to disk.
Another way these systems provide a function of a MMU is in protection. Since all accesses are via the descriptor, the hardware can check that all accesses are within bounds and, in the case of a write, that the process has write permission. The MCP system is inherently secure and thus has no need of an MMU to provide this level of memory protection.
Blocks can be shared between processes via copy descriptors in process stacks (a stack is a special system structure representing the execution state of a process).
Thus, some processes may have write permission, whereas others do not.
Code segments are read-only, thus reentrant and shared between processes.
Copy descriptors contain a 20-bit address field giving the index of the master descriptor in the master descriptor array.
This also implements a very efficient and secure Interprocess Communication (IPC) mechanism.
Blocks can easily be relocated, since only the master descriptor needs to be updated when a block's status changes.
Another aspect is performance do MMU-based or non-MMU-based systems provide better performance? MCP systems may be implemented on top of standard hardware that does have an MMU (for example, a standard PC). Even if the system implementation uses the MMU in some way, this will not be at all visible at the MCP level.
Bank switching
A simple technique called bank switching was widely used by early 8-bit microprocessor
A microprocessor is a computer processor (computing), processor for which the data processing logic and control is included on a single integrated circuit (IC), or a small number of ICs. The microprocessor contains the arithmetic, logic, a ...
s like the MOS 6502
The MOS Technology 6502 (typically pronounced "sixty-five-oh-two" or "six-five-oh-two") William Mensch and the moderator both pronounce the 6502 microprocessor as ''"sixty-five-oh-two"''. is an 8-bit microprocessor that was designed by a small ...
. For instance, the Atari MMU would express additional bits on the address bus to select among several ''banks'' of DRAM
Dram, DRAM, or drams may refer to:
Technology and engineering
* Dram (unit), a unit of mass and volume, and an informal name for a small amount of liquor, especially whisky or whiskey
* Dynamic random-access memory, a type of electronic semicondu ...
memory based on which of the chips was currently active, normally the CPU or ANTIC. This was used to expand the available memory on the Atari 130XE to 128 kB. The Commodore 128 used a similar approach.
See also
* Burroughs large systems descriptors
* Memory controller
A memory controller, also known as memory chip controller (MCC) or a memory controller unit (MCU), is a digital circuit that manages the flow of data going to and from a computer's main memory. When a memory controller is integrated into anothe ...
* Memory protection unit
References
{{DEFAULTSORT:Memory Management Unit
Central processing unit
Digital circuits
Memory management
Virtual memory