IA-64 (Intel Itanium architecture) is the

instruction set architecture In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...

(ISA) of the

Itanium Itanium ( ) is a discontinued family of 64-bit Intel microprocessors that implement the Intel Itanium architecture (formerly called IA-64). Launched in June 2001, Intel marketed the processors for enterprise servers and high-performance computin ...

family of 64-bit

Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...

microprocessor A microprocessor is a computer processor where the data processing logic and control is included on a single integrated circuit, or a small number of integrated circuits. The microprocessor contains the arithmetic, logic, and control circu ...

s. The basic ISA specification originated at

Hewlett-Packard The Hewlett-Packard Company, commonly shortened to Hewlett-Packard ( ) or HP, was an American multinational information technology company headquartered in Palo Alto, California. HP developed and provided a wide variety of hardware components ...

(HP), and was subsequently implemented by Intel in collaboration with HP. The first Itanium processor, codenamed ''Merced'', was released in 2001. The Itanium architecture is based on explicit

instruction-level parallelism Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically ILP refers to the average number of instructions run per step of this parallel execution. Discu ...

, in which the

compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...

decides which instructions to execute in parallel. This contrasts with

superscalar A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a sup ...

architectures, which depend on the processor to manage instruction dependencies at runtime. In all Itanium models, up to and including '' Tukwila'', cores execute up to six instructions per clock cycle. In 2008, Itanium was the fourth-most deployed microprocessor architecture for enterprise-class systems, behind

x86-64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mod ...

Power ISA Power ISA is a reduced instruction set computer (RISC) instruction set architecture (ISA) currently developed by the OpenPOWER Foundation, led by IBM. It was originally developed by IBM and the now-defunct Power.org industry group. Power IS ...

, and

SPARC SPARC (Scalable Processor Architecture) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system developed ...

History

Development: 1989–2000

In 1989, HP began to become concerned that

reduced instruction set computing In computer engineering, a reduced instruction set computer (RISC) is a computer designed to simplify the individual instructions given to the computer to accomplish tasks. Compared to the instructions given to a complex instruction set comput ...

(RISC) architectures were approaching a processing limit at one

instruction per cycle In computer architecture, instructions per cycle (IPC), commonly called instructions per clock is one aspect of a processor's performance: the average number of instructions executed for each clock cycle. It is the multiplicative inverse of cycl ...

. Both Intel and HP researchers had been exploring computer architecture options for future designs and separately began investigating a new concept known as

very long instruction word Very long instruction word (VLIW) refers to instruction set architectures designed to exploit instruction level parallelism (ILP). Whereas conventional central processing units (CPU, processor) mostly allow programs to specify instructions to exe ...

(VLIW) which came out of research by

Yale University Yale University is a private research university in New Haven, Connecticut. Established in 1701 as the Collegiate School, it is the third-oldest institution of higher education in the United States and among the most prestigious in the wo ...

in the early 1980s. VLIW is a computer architecture concept (like RISC and CISC) where a single instruction word contains multiple instructions encoded in one very long instruction word to facilitate the processor executing multiple

instructions Instruction or instructions may refer to: Computing * Instruction, one operation of a processor within a computer architecture instruction set * Computer program, a collection of instructions Music * Instruction (band), a 2002 rock band from Ne ...

in each clock cycle. Typical VLIW implementations rely heavily on sophisticated compilers to determine at compile time which instructions can be executed at the same time and the proper scheduling of these instructions for execution and also to help predict the direction of branch operations. The value of this approach is to do more useful work in fewer clock cycles and to simplify processor instruction scheduling and branch prediction hardware requirements, with a penalty in increased processor complexity, cost, and energy consumption in exchange for faster execution.

Production

During this time, HP had begun to believe that it was no longer cost-effective for individual enterprise systems companies such as itself to develop proprietary microprocessors. Intel had also been researching several architectural options for going beyond the x86 ISA to address high-end enterprise server and high-performance computing (HPC) requirements. Intel and HP partnered in 1994 to develop the IA-64 ISA, using a variation of VLIW design concepts which Intel named

explicitly parallel instruction computing Explicitly parallel instruction computing (EPIC) is a term coined in 1997 by the HP–Intel alliance to describe a computing paradigm that researchers had been investigating since the early 1980s. This paradigm is also called ''Independence'' a ...

(EPIC). Intel's goal was to leverage the expertise HP had developed in their early VLIW work along with their own to develop a volume product line targeted at the aforementioned high-end systems that could be sold to all original equipment manufacturers (OEMs), while HP wished to be able to purchase off-the-shelf processors built using Intel's volume manufacturing and contemporary process technology that were better than their PA-RISC processors. Intel took the lead on the design and commercialization process, while HP contributes to the ISA definition, the Merced/Itanium microarchitecture, and Itanium 2. The original goal year for delivering the first Itanium family product, Merced, was 1998.

Marketing

Intel's product marketing and industry engagement efforts were substantial and achieved design wins with the majority of enterprise server OEMs, including those based on RISC processors at the time.

Compaq Compaq Computer Corporation (sometimes abbreviated to CQ prior to a 2007 rebranding) was an American information technology company founded in 1982 that developed, sold, and supported computers and related products and services. Compaq produced ...

and

Silicon Graphics Silicon Graphics, Inc. (stylized as SiliconGraphics before 1999, later rebranded SGI, historically known as Silicon Graphics Computer Systems or SGCS) was an American high-performance computing manufacturer, producing computer hardware and soft ...

decided to abandon further development of the

Alpha Alpha (uppercase , lowercase ; grc, ἄλφα, ''álpha'', or ell, άλφα, álfa) is the first letter of the Greek alphabet. In the system of Greek numerals, it has a value of one. Alpha is derived from the Phoenician letter aleph , whic ...

and MIPS architectures respectively in favor of migrating to IA-64. By 1997, it was apparent that the IA-64 architecture and the compiler were much more difficult to implement than originally thought, and the delivery of Itanium began slipping. Since Itanium was the first ever EPIC processor, the development effort encountered more unanticipated problems than the team was accustomed to. In addition, the EPIC concept depends on compiler capabilities that had never been implemented before, so more research was needed. Several groups developed operating systems for the architecture, including

Microsoft Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...

Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and ot ...

and

Unix-like A Unix-like (sometimes referred to as UN*X or *nix) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Unix-li ...

systems such as

Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, which ...

HP-UX HP-UX (from "Hewlett Packard Unix") is Hewlett Packard Enterprise's proprietary implementation of the Unix operating system, based on Unix System V (initially System III) and first released in 1984. Current versions support HPE Integrity Ser ...

FreeBSD FreeBSD is a free and open-source Unix-like operating system descended from the Berkeley Software Distribution (BSD), which was based on Research Unix. The first version of FreeBSD was released in 1993. In 2005, FreeBSD was the most popular ...

Solaris Solaris may refer to: Arts and entertainment Literature, television and film * ''Solaris'' (novel), a 1961 science fiction novel by Stanisław Lem ** ''Solaris'' (1968 film), directed by Boris Nirenburg ** ''Solaris'' (1972 film), directed by ...

, Tru64 UNIX, and Monterey/64 (the last three were canceled before reaching the market). In 1999, Intel led the formation of an open-source industry consortium to port Linux to IA-64 they named "Trillium" (and later renamed "Trillian" due to a trademark issue), which was led by Intel and included

Caldera Systems Caldera International, Inc., earlier Caldera Systems, was an American software company that existed from 1998 to 2002 and developed and sold Linux- and Unix-based operating system products. Caldera Systems was created in August 1998 as a spinoff ...

CERN The European Organization for Nuclear Research, known as CERN (; ; ), is an intergovernmental organization that operates the largest particle physics laboratory in the world. Established in 1954, it is based in a northwestern suburb of Gene ...

Cygnus Solutions Cygnus Solutions, originally Cygnus Support, was founded in 1989 by John Gilmore, Michael Tiemann and David Henkel-Wallace to provide commercial support for free software. Its tagline was: ''Making free software affordable''. For years, employ ...

, Hewlett-Packard, IBM,

Red Hat Red Hat, Inc. is an American software company that provides open source software products to enterprises. Founded in 1993, Red Hat has its corporate headquarters in Raleigh, North Carolina, with other offices worldwide. Red Hat has become ass ...

SGI SGI may refer to: Companies *Saskatchewan Government Insurance *Scientific Games International, a gambling company *Silicon Graphics, Inc., a former manufacturer of high-performance computing products *Silicon Graphics International, formerly Rac ...

SuSE SUSE ( , ) is a German-based multinational open-source software company that develops and sells Linux products to business customers. Founded in 1992, it was the first company to market Linux for enterprise. It is the developer of SUSE Linux Ent ...

, TurboLinux and

VA Linux Systems Geeknet, Inc. is a Fairfax County, Virginia–based company that is a subsidiary of GameStop. The company was formerly known as VA Research, VA Linux Systems, VA Software, and SourceForge, Inc. History VA Research VA Research was founded in No ...

. As a result, a working IA-64 Linux was delivered ahead of schedule and was the first OS to run on the new Itanium processors. Intel announced the official name of the processor, ''Itanium'', on October 4, 1999. Within hours, the name ''Itanic'' had been coined on a

Usenet Usenet () is a worldwide distributed discussion system available on computers. It was developed from the general-purpose Unix-to-Unix Copy (UUCP) dial-up network architecture. Tom Truscott and Jim Ellis conceived the idea in 1979, and it was ...

newsgroup as a

pun A pun, also known as paronomasia, is a form of word play that exploits multiple meanings of a term, or of similar-sounding words, for an intended humorous or rhetorical effect. These ambiguities can arise from the intentional use of homophoni ...

on the name ''Titanic'', the "unsinkable"

ocean liner An ocean liner is a passenger ship primarily used as a form of transportation across seas or oceans. Ocean liners may also carry cargo or mail, and may sometimes be used for other purposes (such as for pleasure cruises or as hospital ships). Ca ...

that sank on its maiden voyage in 1912. The very next day on 5th October 1999, AMD announced their plans to extend Intel's x86 instruction set to include a fully downward compatible 64-bit mode – additionally revealing AMD's newly coming x86 64-bit architecture, which the company already worked on, to be incorporated into AMD's upcoming eighth-generation microprocessor, code-named ''SledgeHammer''. AMD also signaled a full disclosure of the architecture's specifications and further details to be available in August 2000. As AMD was never invited to be a contributing party for the IA-64 architecture and any kind of licensing seemed unlikely, AMD's

AMD64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mod ...

architecture-extension was positioned from the beginning as an evolutionary way to add

64-bit computing In computer architecture, 64-bit integers, memory addresses, or other data units are those that are 64 bits wide. Also, 64-bit CPUs and ALUs are those that are based on processor registers, address buses, or data buses of that size. A compute ...

capabilities to the existing x86 architecture, while still supporting legacy 32-bit x86

code In communications and information processing, code is a system of rules to convert information—such as a letter, word, sound, image, or gesture—into another form, sometimes shortened or secret, for communication through a communication ...

– as opposed to Intel's approach of creating an entirely new, completely x86-incompatible 64-bit architecture with IA-64.

Itanium (Merced): 2001

By the time Itanium was released in June 2001, its performance was not superior to competing RISC and CISC processors. Recognizing that the lack of software could be a serious problem for the future, Intel made thousands of these early systems available to independent software vendors (ISVs) to stimulate development. HP and Intel brought the next-generation Itanium 2 processor to market a year later.

Itanium 2: 2002–2010

The Itanium 2 processor was released in 2002. It relieved many of the performance problems of the original Itanium processor, which were mostly caused by an inefficient memory subsystem. In 2003,

AMD Advanced Micro Devices, Inc. (AMD) is an American multinational semiconductor company based in Santa Clara, California, that develops computer processors and related technologies for business and consumer markets. While it initially manufactur ...

released the

Opteron Opteron is AMD's x86 former server and workstation processor line, and was the first processor which supported the AMD64 instruction set architecture (known generically as x86-64 or AMD64). It was released on April 22, 2003, with the ''SledgeHa ...

, which implemented its own 64-bit architecture (

). Opteron gained rapid acceptance in the enterprise server space because it provided an easy upgrade from

x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was introd ...

. Intel responded by implementing x86-64 (as

Em64t x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging mod ...

) in its

Xeon Xeon ( ) is a brand of x86 microprocessors designed, manufactured, and marketed by Intel, targeted at the non-consumer workstation, server, and embedded system markets. It was introduced in June 1998. Xeon processors are based on the same arc ...

microprocessors in 2004. In November 2005, the major Itanium server manufacturers joined with Intel and a number of software vendors to form the Itanium Solutions Alliance to promote the architecture and accelerate software porting. In 2006, Intel delivered ''Montecito'' (marketed as the Itanium 2 9000 series), a dual-core processor that roughly doubled performance and decreased energy consumption by about 20 percent.

Itanium 9300 (Tukwila): 2010

The Itanium 9300 series processor, codenamed ''Tukwila'', was released on 8 February 2010 with greater performance and memory capacity. Tukwila had originally been slated for release in 2007. The device uses a 65 nm process, includes two to four cores, up to 24 MB on-die caches, Hyper-Threading technology and integrated memory controllers. It implements double-device data correction (DDDC), which helps to fix memory errors. Tukwila also implements

Intel QuickPath Interconnect The Intel QuickPath Interconnect (QPI) is a point-to-point microprocessor, processor electrical connection, interconnect developed by Intel which replaced the front-side bus (FSB) in Xeon, Itanium, and certain desktop platforms starting in 2008. I ...

(QPI) to replace the Itanium bus-based architecture. It has a peak interprocessor bandwidth of 96 GB/s and a peak memory bandwidth of 34 GB/s. With QuickPath, the processor has integrated memory controllers and interfaces the memory directly, using QPI interfaces to directly connect to other processors and I/O hubs. QuickPath is also used on Intel processors using the '' Nehalem'' microarchitecture, making it probable that Tukwila and Nehalem will be able to use the same chipsets. Tukwila incorporates four memory controllers, each of which supports multiple

DDR3 Double Data Rate 3 Synchronous Dynamic Random-Access Memory (DDR3 SDRAM) is a type of synchronous dynamic random-access memory (SDRAM) with a high bandwidth (" double data rate") interface, and has been in use since 2007. It is the higher-spee ...

DIMM A DIMM () (Dual In-line Memory Module), commonly called a RAM stick, comprises a series of dynamic random-access memory integrated circuits. These memory modules are mounted on a printed circuit board and designed for use in personal compute ...

s via a separate memory controller, much like the Nehalem-based Xeon processor code-named ''

Beckton Beckton is a suburb in east London, England, located east of Charing Cross and part of the London Borough of Newham. Adjacent to the River Thames, the area consisted of unpopulated marshland known as the East Ham Levels in the parishes of Bark ...

''.

Itanium 9500 (Poulson): 2012

The Itanium 9500 series processor, codenamed ''Poulson'', is the follow-on processor to Tukwila features eight cores, has a 12-wide issue architecture, multithreading enhancements, and new instructions to take advantage of parallelism, especially in virtualization. The Poulson L3 cache size is 32 MB. L2 cache size is 6 MB, 512 I KB, 256 D KB per core. Die size is 544 mm², less than its predecessor Tukwila (698.75 mm²). At

ISSCC International Solid-State Circuits Conference is a global forum for presentation of advances in solid-state electrical network, circuits and System-on-a-chip, Systems-on-a-Chip. The conference is held every year in February at the San Francisco ...

2011, Intel presented a paper called, "A 32nm 3.1 Billion Transistor 12-Wide-Issue Itanium Processor for Mission Critical Servers." Given Intel's history of disclosing details about Itanium microprocessors at ISSCC, this paper most likely refers to Poulson. Analyst David Kanter speculates that Poulson will use a new microarchitecture, with a more advanced form of multi-threading that uses as many as two threads, to improve performance for single threaded and multi-threaded workloads. Some new information was released at

Hotchips The Institute of Electrical and Electronics Engineers (IEEE) is a 501(c)(3) professional association for electronic engineering and electrical engineering (and associated disciplines) with its corporate office in New York City and its operation ...

conference. New information presents improvements in multithreading, resiliency improvements (Instruction Replay RAS) and few new instructions (thread priority, integer instruction, cache prefetching, data access hints).

Itanium 9700 (Kittson): 2017

The Kittson is the same as the 9500 Poulson, but slightly higher clocked.

End of life: 2021

In January 2019, Intel announced that Kittson would be discontinued, with a last order date of January 2020, and a last ship date of July 2021. There is no planned successor.

Architecture

Intel has extensively documented the Itanium

instruction set In computer science, an instruction set architecture (ISA), also called computer architecture, is an abstract model of a computer. A device that executes instructions described by that ISA, such as a central processing unit (CPU), is called an ' ...

and the technical press has provided overviews. The architecture has been renamed several times during its history. HP originally called it ''PA-WideWord''. Intel later called it ''IA-64'', then ''Itanium Processor Architecture'' (IPA), before settling on ''Intel Itanium Architecture'', but it is still widely referred to as ''IA-64''. It is a 64-bit register-rich explicitly parallel architecture. The base data word is 64 bits, byte-addressable. The

logical address In computing, a logical address is the address at which an item ( memory cell, storage element, network host) appears to reside from the perspective of an executing application program. A logical address may be different from the physical addr ...

space is 2⁶⁴ bytes. The architecture implements predication,

speculation In finance, speculation is the purchase of an asset (a commodity, good (economics), goods, or real estate) with the hope that it will become more valuable shortly. (It can also refer to short sales in which the speculator hopes for a decline i ...

, and

branch prediction In computer architecture, a branch predictor is a digital circuit that tries to guess which way a branch (e.g., an if–then–else structure) will go before this is known definitively. The purpose of the branch predictor is to improve the flow ...

. It uses variable-sized register windowing for parameter passing. The same mechanism is also used to permit parallel execution of loops. Speculation, prediction, predication, and renaming are under control of the compiler: each instruction word includes extra bits for this. This approach is the distinguishing characteristic of the architecture. The architecture implements a large number of registers: * 128 general integer registers, which are 64-bit plus one trap bit ("NaT", which stands for "not a thing") used for

speculative execution Speculative execution is an optimization technique where a computer system performs some task that may not be needed. Work is done before it is known whether it is actually needed, so as to prevent a delay that would have to be incurred by doing t ...

. 32 of these are static, the other 96 are stacked using variably-sized

register window In computer engineering, register windows are a feature which dedicates registers to a subroutine by dynamically aliasing a subset of internal registers to fixed, programmer-visible registers. Register windows are implemented to improve the perf ...

s, or rotating for pipelined loops. gr₀ always reads 0. * 128

floating-point In computing, floating-point arithmetic (FP) is arithmetic that represents real numbers approximately, using an integer with a fixed precision, called the significand, scaled by an integer exponent of a fixed base. For example, 12.345 can b ...

registers. The floating-point registers are 82 bits long to preserve precision for intermediate results. Instead of a dedicated "NaT" trap bit like the integer registers, floating-point registers have a trap value called "NaTVal" ("Not a Thing Value"), similar to (but distinct from)

NaN Nan or NAN may refer to: Places China * Nan County, Yiyang, Hunan, China * Nan Commandery, historical commandery in Hubei, China Thailand * Nan Province ** Nan, Thailand, the administrative capital of Nan Province * Nan River People Given name ...

. These also have 32 static registers and 96 windowed or rotating registers. fr₀ always reads +0.0, and fr₁ always reads +1.0. * 64 one-bit predicate registers. These also have 32 static registers and 96 windowed or rotating registers. pr₀ always reads 1 (true). * 8 branch registers, for the addresses of indirect jumps. br₀ is set to the return address when a function is called with br.call. * 128 special purpose (or "application") registers, which are mostly of interest to the kernel and not ordinary applications. For example, one register called bsp points to the second stack, which is where the hardware will automatically spill registers when the register window wraps around. Each 128-bit instruction word is called a ''bundle'', and contains three ''slots'' each holding a 41-bit instruction, plus a 5-bit ''template'' indicating which type of instruction is in each slot. Those types are M-unit (memory instructions), I-unit (integer ALU, non-ALU integer, or long immediate extended instructions), F-unit (floating-point instructions), or B-unit (branch or long branch extended instructions). The template also encodes ''stops'' which indicate that a data dependency exists between data before and after the stop. All instructions between a pair of stops constitute an ''instruction group'', regardless of their bundling, and must be free of many types of data dependencies; this knowledge allows the processor to execute instructions in parallel without having to perform its own complicated data analysis, since that analysis was already done when the instructions were written. Within each slot, all but a few instructions are predicated, specifying a predicate register, the value of which (true or false) will determine whether the instruction is executed. Predicated instructions which should always execute are predicated on pr₀, which always reads as true. The IA-64 assembly language and instruction format was deliberately designed to be written mainly by compilers, not by humans. Instructions must be grouped into bundles of three, ensuring that the three instructions match an allowed template. Instructions must issue stops between certain types of data dependencies, and stops can also only be used in limited places according to the allowed templates.

Instruction execution

The fetch mechanism can read up to two bundles per clock from the L1

cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache County ...

into the pipeline. When the compiler can take maximum advantage of this, the processor can execute six instructions per clock cycle. The processor has thirty functional execution units in eleven groups. Each unit can execute a particular subset of the

, and each unit executes at a rate of one instruction per cycle unless execution stalls waiting for data. While not all units in a group execute identical subsets of the instruction set, common instructions can be executed in multiple units. The execution unit groups include: * Six general-purpose ALUs, two integer units, one shift unit * Four data cache units * Six multimedia units, two parallel shift units, one parallel multiply, one population count * Two 82-bit floating-point multiply–accumulate units, two

SIMD Single instruction, multiple data (SIMD) is a type of parallel processing in Flynn's taxonomy. SIMD can be internal (part of the hardware design) and it can be directly accessible through an instruction set architecture (ISA), but it should ...

floating-point multiply–accumulate units (two 32-bit operations each) * Three branch units Ideally, the compiler can often group instructions into sets of six that can execute at the same time. Since the floating-point units implement a

multiply–accumulate operation In computing, especially digital signal processing, the multiply–accumulate (MAC) or multiply-add (MAD) operation is a common step that computes the product of two numbers and adds that product to an accumulator. The hardware unit that performs ...

, a single floating-point instruction can perform the work of two instructions when the application requires a multiply followed by an add: this is very common in scientific processing. When it occurs, the processor can execute four

FLOP In computing, floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate meas ...

s per cycle. For example, the 800 MHz Itanium had a theoretical rating of 3.2 G

FLOPS In computing, floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate meas ...

and the fastest Itanium 2, at 1.67 GHz, was rated at 6.67 GFLOPS. In practice, the processor may often be underutilized, with not all slots filled with useful instructions due to e.g. data dependencies or limitations in the available bundle templates. The densest possible code requires 42.6 bits per instruction, compared to 32 bits per instruction on traditional RISC processors of the time, and no-ops due to wasted slots further decrease the density of code. Additional instructions for speculative loads and hints for branches and cache are difficult to generate optimally, even with modern compilers.

Memory architecture

From 2002 to 2006, Itanium 2 processors shared a common cache hierarchy. They had 16 KB of Level 1 instruction cache and 16 KB of Level 1 data cache. The L2 cache was unified (both instruction and data) and is 256 KB. The Level 3 cache was also unified and varied in size from 1.5 MB to 24 MB. The 256 KB L2 cache contains sufficient logic to handle semaphore operations without disturbing the main

arithmetic logic unit In computing, an arithmetic logic unit (ALU) is a Combinational logic, combinational digital circuit that performs arithmetic and bitwise operations on integer binary numbers. This is in contrast to a floating-point unit (FPU), which operates on ...

(ALU). Main memory is accessed through a

bus A bus (contracted from omnibus, with variants multibus, motorbus, autobus, etc.) is a road vehicle that carries significantly more passengers than an average car or van. It is most commonly used in public transport, but is also in use for cha ...

to an off-chip

chipset In a computer system, a chipset is a set of electronic components An electronic component is any basic discrete device or physical entity in an electronic system used to affect electrons or their associated fields. Electronic components are ...

. The Itanium 2 bus was initially called the McKinley bus, but is now usually referred to as the Itanium bus. The speed of the bus has increased steadily with new processor releases. The bus transfers 2×128 bits per clock cycle, so the 200 MHz McKinley bus transferred 6.4 GB/s, and the 533 MHz Montecito bus transfers 17.056 GB/ s

Architectural changes

Itanium processors released prior to 2006 had hardware support for the

IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnation of ...

architecture to permit support for legacy server applications, but performance for IA-32 code was much worse than for native code and also worse than the performance of contemporaneous x86 processors. In 2005, Intel developed the

IA-32 Execution Layer The IA-32 Execution Layer (IA-32 EL) is a software emulator in the form of a software driver that improves performance of 32-bit applications running on 64-bit Intel Itanium-based systems, particularly those running Linux and Windows Server 2003 (i ...

(IA-32 EL), a software emulator that provides better performance. With Montecito, Intel therefore eliminated hardware support for IA-32 code. In 2006, with the release of Montecito, Intel made a number of enhancements to the basic processor architecture including: * Hardware multithreading: Each processor core maintains context for two threads of execution. When one thread stalls during memory access, the other thread can execute. Intel calls this "coarse multithreading" to distinguish it from the "

hyper-threading Hyper-threading (officially called Hyper-Threading Technology or HT Technology and abbreviated as HTT or HT) is Intel's proprietary simultaneous multithreading (SMT) implementation used to improve parallelization of computations (doing multip ...

technology" Intel integrated into some

and

microprocessors. * Hardware support for

virtualization In computing, virtualization or virtualisation (sometimes abbreviated v12n, a numeronym) is the act of creating a virtual (rather than actual) version of something at the same abstraction level, including virtual computer hardware platforms, stor ...

: Intel added Intel Virtualization Technology (Intel VT-i), which provides hardware assists for core virtualization functions. Virtualization allows a software "

hypervisor A hypervisor (also known as a virtual machine monitor, VMM, or virtualizer) is a type of computer software, firmware or hardware that creates and runs virtual machines. A computer on which a hypervisor runs one or more virtual machines is calle ...

" to run multiple operating system instances on the processor concurrently. * Cache enhancements: Montecito added a split L2 cache, which included a dedicated 1 MB L2 cache for instructions. The original 256 KB L2 cache was converted to a dedicated data cache. Montecito also included up to 12 MB of on-die L3 cache. See Chipsets...Other markets.

References

External links

Intel Itanium Home Page

*
IA-64 tutorial, including code examples

Itanium Docs at HP
{{Authority control Computer-related introductions in 2001 Instruction set architectures Intel microprocessors Very long instruction word computing 64-bit computers