Replay System

	Replay System The replay system is a subsystem within the Intel Pentium 4 processor. Its primary function is to catch operations that have been mistakenly sent for execution by the processor's scheduler. Operations caught by the replay system are then re-executed in a loop until the conditions necessary for their proper execution have been fulfilled. Overview The replay system came about as a result of Intel's quest for ever-increasing clock speeds. These higher clock speeds necessitated very lengthy pipelines (up to 31 stages in the Prescott core). Because of this, there are six stages between the scheduler and the execution units in the Prescott core. In an attempt to maintain acceptable performance, Intel engineers had to design the scheduler to be very optimistic. The scheduler in a Pentium 4 processor is so aggressive that it will send operations for execution without a guarantee that they can be successfully executed. (Among other things, the scheduler assumes all data is in l ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California, and Delaware General Corporation Law, incorporated in Delaware. Intel designs, manufactures, and sells computer components such as central processing units (CPUs) and related products for business and consumer markets. It is one of the world's List of largest semiconductor chip manufacturers, largest semiconductor chip manufacturers by revenue, and ranked in the Fortune 500, ''Fortune'' 500 list of the List of largest companies in the United States by revenue, largest United States corporations by revenue for nearly a decade, from 2007 to 2016 Fiscal year, fiscal years, until it was removed from the ranking in 2018. In 2020, it was reinstated and ranked 45th, being the List of Fortune 500 computer software and information companies, 7th-largest technology company in the ranking. It was one of the first companies listed on Nasdaq. Intel supplies List of I ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Pentium 4 Pentium 4 is a series of single-core central processing unit, CPUs for Desktop computer, desktops, laptops and entry-level Server (computing), servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008. All Pentium 4 CPUs are based on the NetBurst microarchitecture, the successor to the P6 (microarchitecture), P6. The Pentium 4 #Willamette, Willamette (180 nm) introduced SSE2, while the #Prescott, Prescott (90 nm) introduced SSE3 and later 64-bit technology. Later versions introduced Hyper-threading, Hyper-Threading Technology (HTT). The first Pentium 4-branded processor to implement x86-64, 64-bit was the Prescott (90 nm) (February 2004), but this feature was not enabled. Intel subsequently began selling 64-bit Pentium 4s using the ''"E0" revision'' of the Prescotts, being sold on the OEM market as the Pentium 4, model F. The E0 revision also adds eXecute Disable (XD) (Intel's name for the NX bit) to Intel 64. Int ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Instruction Scheduling In computer science, instruction scheduling is a compiler optimization used to improve instruction-level parallelism, which improves performance on machines with instruction pipelines. Put more simply, it tries to do the following without changing the meaning of the code: * Avoid pipeline stalls by rearranging the order of instructions. * Avoid illegal or semantically ambiguous operations (typically involving subtle instruction pipeline timing issues or non-interlocked resources). The pipeline stalls can be caused by structural hazards (processor resource limit), data hazards (output of one instruction needed by another instruction) and control hazards (branching). Data hazards Instruction scheduling is typically done on a single basic block. In order to determine whether rearranging the block's instructions in a certain way preserves the behavior of that block, we need the concept of a ''data dependency''. There are three types of dependencies, which also happen to be the thr ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Megahertz Myth The megahertz myth, or in more recent cases the gigahertz myth, refers to the misconception of only using clock rate (for example measured in megahertz or gigahertz) to compare the performance of different microprocessors. While clock rates are a valid way of comparing the performance of different speeds of the same model and type of processor, other factors such as an amount of execution units, pipeline depth, cache hierarchy, branch prediction, and instruction sets can greatly affect the performance when considering different processors. For example, one processor may take two clock cycles to add two numbers and another clock cycle to multiply by a third number, whereas another processor may do the same calculation in two clock cycles. Comparisons between different types of processors are difficult because performance varies depending on the type of task. A benchmark is a more thorough way of measuring and comparing computer performance. The myth started around 1984 when compa ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Pipeline (computing) In computing, a pipeline, also known as a data pipeline, is a set of data processing elements connected in series, where the output of one element is the input of the next one. The elements of a pipeline are often executed in parallel or in time-sliced fashion. Some amount of buffer storage is often inserted between elements. Concept and motivation Pipelining is a commonly used concept in everyday life. For example, in the assembly line of a car factory, each specific task—such as installing the engine, installing the hood, and installing the wheels—is often done by a separate work station. The stations carry out their tasks in parallel, each on a different car. Once a car has had one task performed, it moves to the next station. Variations in the time needed to complete the tasks can be accommodated by "buffering" (holding one or more cars in a space between the stations) and/or by "stalling" (temporarily halting the upstream stations), until the next station becomes avai ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Intel Prescott Pentium 4 is a series of single-core CPUs for desktops, laptops and entry-level servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008. All Pentium 4 CPUs are based on the NetBurst microarchitecture, the successor to the P6. The Pentium 4 Willamette (180 nm) introduced SSE2, while the Prescott (90 nm) introduced SSE3 and later 64-bit technology. Later versions introduced Hyper-Threading Technology (HTT). The first Pentium 4-branded processor to implement 64-bit was the Prescott (90 nm) (February 2004), but this feature was not enabled. Intel subsequently began selling 64-bit Pentium 4s using the ''"E0" revision'' of the Prescotts, being sold on the OEM market as the Pentium 4, model F. The E0 revision also adds eXecute Disable (XD) (Intel's name for the NX bit) to Intel 64. Intel's official launch of Intel 64 (under the name EM64T at that time) in mainstream desktop processors was the N0 stepping Pre ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Execution Unit In computer engineering, an execution unit (E-unit or EU) is a part of a processing unit that performs the operations and calculations forwarded from the instruction unit. It may have its own internal control sequence unit (not to be confused with a CPU's main control unit), some registers, and other internal units such as an arithmetic logic unit, address generation unit, floating-point unit, load–store unit, branch execution unit or other smaller and more specific components, and can be tailored to support a certain datatype, such as integers An integer is the number zero (0), a positive natural number (1, 2, 3, ...), or the negation of a positive natural number (−1, −2, −3, ...). The negations or additive inverses of the positive natural numbers are referred to as negative in ... or floating-points. It is common for modern processing units to have multiple parallel functional units within its execution units, which is referred to as superscalar design. Th ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Trace Cache In computer architecture, a trace cache or execution trace cache is a specialized instruction cache which stores the dynamic stream of instructions known as trace. It helps in increasing the instruction fetch bandwidth and decreasing power consumption (in the case of Intel Pentium 4) by storing traces of instructions that have already been fetched and decoded. A trace processor is an architecture designed around the trace cache and processes the instructions at trace level granularity. The formal mathematical theory of traces is described by trace monoids. Background The earliest academic publication of trace cache was "Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching". This widely acknowledged paper was presented by Eric Rotenberg, Steve Bennett, and Jim Smith at 1996 International Symposium on Microarchitecture (MICRO) conference. An earlier publication is US patent 5381533, by Alex Peleg and Uri Weiser of Intel, "Dynamic flow instruction cache me ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	CPU Cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels (L1, L2, often L3, and rarely even L4), with different instruction-specific and data-specific caches at level 1. The cache memory is typically implemented with static random-access memory (SRAM), in modern CPUs by far the largest part of them by chip area, but SRAM is not always used for all levels (of I- or D-cache), or even any level, sometimes some latter or all levels are implemented with eDRAM. Other types of caches exist (that are not counted towards the "cache size" of the most important caches mentioned above), such as the translation lookaside buffer (TLB) which is part of the memory management unit (M ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Instruction Pipeline In computer engineering, instruction pipelining is a technique for implementing instruction-level parallelism within a single processor. Pipelining attempts to keep every part of the processor busy with some instruction by dividing incoming Machine code, instructions into a series of sequential steps (the eponymous "Pipeline (computing), pipeline") performed by different Central processing unit#Structure and implementation, processor units with different parts of instructions processed in parallel. Concept and motivation In a pipelined computer, instructions flow through the central processing unit (CPU) in stages. For example, it might have one stage for each step of the von Neumann architecture, von Neumann cycle: Fetch the instruction, fetch the operands, do the instruction, write the results. A pipelined computer usually has "pipeline registers" after each stage. These store information from the instruction and calculations so that the logic gates of the next stage can do th ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Speculative Execution Speculative execution is an optimization (computer science), optimization technique where a computer system performs some task that may not be needed. Work is done before it is known whether it is actually needed, so as to prevent a delay that would have to be incurred by doing the work after it is known that it is needed. If it turns out the work was not needed after all, most changes made by the work are reverted and the results are ignored. The objective is to provide more Concurrency (computer science), concurrency if extra Resource (computer science), resources are available. This approach is employed in a variety of areas, including branch predictor, branch prediction in instruction pipeline, pipelined CPU, processors, value prediction for exploiting value locality, prefetching Instruction prefetch, memory and File system, files, and optimistic concurrency control in Relational database management system, database systems. Speculative multithreading is a special case of specu ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]