Nehalem is the
codename
A code name, call sign or cryptonym is a Code word (figure of speech), code word or name used, sometimes clandestinely, to refer to another name, word, project, or person. Code names are often used for military purposes, or in espionage. They may ...
for
Intel
Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
's
45 nm microarchitecture released in November 2008. It was used in the first-generation of the
Intel Core
Intel Core is a line of streamlined midrange consumer, workstation and enthusiast computer central processing units (CPUs) marketed by Intel Corporation. These processors displaced the existing mid- to high-end Pentium processors at the time ...
i5 and
i7 processors, and succeeds the older
Core microarchitecture
The Intel Core microarchitecture (provisionally referred to as Next Generation Micro-architecture, and developed as Merom) is a multi-core processor microarchitecture launched by Intel in mid-2006. It is a major evolution over the Yonah, the p ...
used on
Core 2 processors. The term "Nehalem" comes from the
Nehalem River
The Nehalem River is a river on the Pacific coast of northwest Oregon in the United States, approximately long. It drains part of the Northern Oregon Coast Range northwest of Portland, originating on the east side of the mountains and flowing i ...
.
Nehalem is built on the
45 nm process, is able to run at higher clock speeds, and is more energy-efficient than
Penryn microprocessors.
Hyper-threading
Hyper-threading (officially called Hyper-Threading Technology or HT Technology and abbreviated as HTT or HT) is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multip ...
is reintroduced, along with a reduction in L2 cache size, as well as an enlarged L3 cache that is shared among all cores. Nehalem is an architecture that differs radically from
Netburst
The NetBurst microarchitecture, called P68 inside Intel, was the successor to the P6 microarchitecture in the x86 family of central processing units (CPUs) made by Intel. The first CPU to use this architecture was the Willamette-core Pentium ...
, while retaining some of the latter's minor features.
Nehalem later received a die-shrink to
32 nm
The 32 nm node is the step following the 45 nm process in CMOS (MOSFET) semiconductor device fabrication. "32-nanometre" refers to the average half-pitch (i.e., half the distance between identical features) of a memory cell (computing), memory cel ...
with
Westmere, and was fully succeeded by "second-generation"
Sandy Bridge
Sandy Bridge is the codename for Intel's 32 nm microarchitecture used in the second generation of the Intel Core processors (Core i7, i5, i3). The Sandy Bridge microarchitecture is the successor to Nehalem and Westmere microarchitecture ...
in January 2011.
Technology
* Cache line block on L2/L3 cache was reduced from 128 bytes in Netburst & Conroe/Penryn to 64 bytes per line in this generation (same size as Yonah and Pentium M).
*
Hyper-threading
Hyper-threading (officially called Hyper-Threading Technology or HT Technology and abbreviated as HTT or HT) is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multip ...
reintroduced.
* Intel
Turbo Boost
Intel Turbo Boost is Intel's trade name for central processing units (CPUs) dynamic frequency scaling feature that automatically raises certain versions of its operating frequency when demanding tasks are running, thus enabling a higher resulting ...
1.0.
* 2–24 MiB
L3 cache
A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, whic ...
*
Instruction Fetch Unit (IFU) containing second-level
branch predictor
In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow ...
with two level
Branch Target Buffer
In computer architecture, a branch target predictor is the part of a processor that predicts the target of a taken conditional branch or an unconditional branch instruction before the target of the branch instruction is computed by the execution ...
(BTB) and
Return Stack Buffer
Return may refer to:
In business, economics, and finance
* Return on investment (ROI), the financial gain after an expense.
* Rate of return, the financial term for the profit or loss derived from an investment
* Tax return, a blank document or t ...
(RSB). Nehalem also supports all predictor types previously used in Intel's processors like Indirect Predictor and Loop Detector.
* sTLB (second level unified
translation lookaside buffer
A translation lookaside buffer (TLB) is a memory cache that stores the recent translations of virtual memory to physical memory. It is used to reduce the time taken to access a user memory location. It can be called an address-translation cache. ...
) (i.e. both instructions and data) that contains 512 entries for small pages only, and is again 4 way associative.
* 3 integer ALU, 2 vector ALU and 2 AGU per core.
* Native (all processor cores on a single die) quad- and octa-core processors
*
Intel QuickPath Interconnect
The Intel QuickPath Interconnect (QPI) is a point-to-point microprocessor, processor electrical connection, interconnect developed by Intel which replaced the front-side bus (FSB) in Xeon, Itanium, and certain desktop platforms starting in 2008. I ...
in high-end models replacing the legacy
front side bus
A front-side bus (FSB) is a computer communication interface (bus) that was often used in Intel-chip-based computers during the 1990s and 2000s. The EV6 bus served the same function for competing AMD CPUs. Both typically carry data between the ...
* 64 KB L1 cache per core (32 KB L1 data and 32 KB L1 instruction), and 256 KB L2 cache per core.
* Integration of
PCI Express
PCI Express (Peripheral Component Interconnect Express), officially abbreviated as PCIe or PCI-e, is a high-speed serial computer expansion bus standard, designed to replace the older PCI, PCI-X and AGP bus standards. It is the common ...
and
DMI into the processor in mid-range models, replacing the
northbridge
* Integrated
memory controller
The memory controller is a digital circuit that manages the flow of data going to and from the computer's main memory. A memory controller can be a separate chip or integrated into another chip, such as being placed on the same die or as an int ...
supporting two or three memory channels of
DDR3 SDRAM
Double Data Rate 3 Synchronous Dynamic Random-Access Memory (DDR3 SDRAM) is a type of synchronous dynamic random-access memory (SDRAM) with a high bandwidth (" double data rate") interface, and has been in use since 2007. It is the higher-speed ...
or four
FB-DIMM2 channels
* Second-generation Intel Virtualization Technology, which introduced
Extended Page Table Second Level Address Translation (SLAT), also known as nested paging, is a hardware-assisted virtualization technology which makes it possible to avoid the overhead associated with software-managed shadow page tables.
AMD has supported SLAT throu ...
support, virtual processor identifiers (VPIDs), and
non-maskable interrupt
In computing, a non-maskable interrupt (NMI) is a hardware interrupt that standard interrupt-masking techniques in the system cannot ignore. It typically occurs to signal attention for non-recoverable hardware errors. Some NMIs may be masked, but ...
-window exiting
*
SSE4.2
SSE4 (Streaming SIMD Extensions 4) is a SIMD CPU instruction set used in the Intel Core (microarchitecture), Core microarchitecture and AMD K10, AMD K10 (K8L). It was announced on September 27, 2006, at the Fall 2006 Intel Developer Forum, with vag ...
and POPCNT instructions
*
Macro-op fusion now works in 64-bit mode.
* 20 to 24 pipeline stages
:
Performance and power improvements
It has been reported that Nehalem has a focus on performance, thus the increased core size.
Compared to Penryn, Nehalem has:
* 10–25% better single-threaded performance / 20–100% better
multithreaded performance at the same power level
* 30% lower
power consumption
Electric energy consumption is the form of energy consumption that uses electrical energy. Electric energy consumption is the actual energy demand made on existing electricity supply for transportation, residential, industrial, commercial, and o ...
for the same
performance
A performance is an act of staging or presenting a play, concert, or other form of entertainment. It is also defined as the action or process of carrying out or accomplishing an action, task, or function.
Management science
In the work place ...
* On average, Nehalem provides a 15–20% clock-for-clock increase in performance per core.
Overclocking is possible with Bloomfield processors and the
X58 chipset.
Lynnfield processors use a
PCH removing the need for a northbridge.
Nehalem processors incorporate
SSE 4.2 SIMD
Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...
instructions, adding seven new instructions to the SSE 4.1 set in the Core 2 series. The Nehalem architecture reduces atomic operation latency by 50% in an attempt to eliminate overhead on atomic operations such as the
LOCK CMPXCHG
compare-and-swap
In computer science, compare-and-swap (CAS) is an atomic instruction used in multithreading to achieve synchronization. It compares the contents of a memory location with a given value and, only if they are the same, modifies the contents of tha ...
instruction.
Variants
* Lynnfield processors feature 16
PCIe
PCI Express (Peripheral Component Interconnect Express), officially abbreviated as PCIe or PCI-e, is a high-speed serial computer expansion bus standard, designed to replace the older PCI, PCI-X and AGP bus standards. It is the common mo ...
lanes, which can be used in 1x16 or 2x8 configuration.
*
1 6500 series scalable up to 2 sockets, 7500 series scalable up to 4/8 sockets.
Server and desktop processors
* Intel states the Gainestown processors have six memory channels. Gainestown processors have dual QPI links and have a separate set of memory registers for each link in effect, a multiplexed six-channel system.
[
]
Mobile processors
See also
*
List of Intel CPU microarchitectures
The following is a ''partial'' list of Intel CPU microarchitectures. The list is ''incomplete''. Additional details can be found in Intel's Tick–tock model and Process–architecture–optimization model.
x86 microarchitectures
16-bit
; ...
*
Tick–tock model
Tick–tock was a production model adopted in 2007 by chip manufacturer Intel. Under this model, every microarchitecture change (tock) was followed by a die shrink of the process technology (tick). It was replaced by the process–architecture– ...
References
Further reading
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
*
External links
Nehalem processorat
Intel
Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
.com
{{Intel processor roadmap
Intel x86 microprocessors
Intel microarchitectures
X86 microarchitectures