Hardware Scout
Hardware scout is a technique that uses otherwise idle processor execution resources to perform prefetching during cache misses. When a thread stalls on a cache miss, the processor pipeline checkpoints the register file, switches to runahead mode, and continues to issue instructions from the thread that is waiting for memory. The thread of execution in runahead mode is known as a "scout thread". When the data returns from memory, the processor restores the register file contents from the checkpoint and switches back to normal execution mode. The computation performed during runahead mode is discarded; nevertheless, scouting provides speedup because memory-level parallelism (MLP) is increased: the cache lines brought into the cache hierarchy are often used again by the processor after it switches back to normal mode.

Rock processor scout
Sun's Rock processor (later canceled) used a form of hardware scout. However, any computations in runahead mode that do not depend ...
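The checkpoint/scout/restore cycle described above can be sketched in software. Below is a minimal, hypothetical Python model (the class, method and instruction names are invented for illustration and are not Rock's actual design) showing how scout mode issues prefetches whose register results are later discarded, while the prefetched cache lines remain available to normal execution:

# Minimal sketch of hardware scout (illustrative, not a real microarchitecture).
# On a cache miss the "processor" checkpoints its registers, keeps executing the
# same thread in scout mode to issue prefetches, then restores and resumes.

MISS_LATENCY = 4          # instructions run ahead per miss (arbitrary toy value)

class ScoutCPU:
    def __init__(self, program):
        self.program = program            # list of ("load", dst, addr) / ("add", dst, a, b)
        self.regs = {}
        self.cache = set()                # addresses currently cached

    def run(self):
        pc = 0
        while pc < len(self.program):
            op = self.program[pc]
            if op[0] == "load" and op[2] not in self.cache:
                self.scout(pc)            # miss: run ahead while waiting
                self.cache.add(op[2])     # miss data finally arrives
            self.execute(op)              # normal-mode execution (results kept)
            pc += 1

    def scout(self, pc):
        checkpoint = dict(self.regs)      # checkpoint architectural state
        for op in self.program[pc + 1:pc + 1 + MISS_LATENCY]:
            if op[0] == "load":
                self.cache.add(op[2])     # issue prefetch; data will hit later
            self.execute(op)              # results computed here are thrown away
        self.regs = checkpoint            # discard scout-mode computation

    def execute(self, op):
        if op[0] == "load":
            self.regs[op[1]] = op[2] * 10     # fake memory contents
        elif op[0] == "add":
            self.regs[op[1]] = self.regs.get(op[2], 0) + self.regs.get(op[3], 0)

prog = [("load", "r1", 100), ("add", "r2", "r1", "r1"), ("load", "r3", 200)]
cpu = ScoutCPU(prog)
cpu.run()
print(cpu.regs)   # {'r1': 1000, 'r2': 2000, 'r3': 2000}

In this toy model the scout window is a fixed number of instructions; a real implementation runs ahead for as long as the miss is outstanding.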
Central Processing Unit
A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and input/output (I/O) operations specified by the instructions in the program. This contrasts with external components such as main memory and I/O circuitry, and specialized processors such as graphics processing units (GPUs). The form, design, and implementation of CPUs have changed over time, but their fundamental operation remains almost unchanged. Principal components of a CPU include the arithmetic–logic unit (ALU) that performs arithmetic and logic operations, processor registers that supply operands to the ALU and store the results of ALU operations, and a control unit that orchestrates the fetching (from memory), decoding and execution (of instructions) by directing the coordinated operations of the ALU, registers and other components ...
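As a rough illustration of the fetch–decode–execute cycle described above, here is a toy Python loop in which a small control structure fetches and decodes instructions while simple arithmetic stands in for the ALU; the instruction encoding is invented for the example:

# Toy fetch-decode-execute loop: a control unit steps through memory,
# the ALU does the arithmetic, and registers hold operands and results.
memory = [
    ("LOAD", 0, 7),      # r0 <- 7
    ("LOAD", 1, 5),      # r1 <- 5
    ("ADD",  2, 0, 1),   # r2 <- r0 + r1
    ("HALT",),
]
registers = [0] * 4
pc = 0                   # program counter

while True:
    instr = memory[pc]           # fetch
    opcode = instr[0]            # decode
    if opcode == "LOAD":         # execute
        registers[instr[1]] = instr[2]
    elif opcode == "ADD":
        registers[instr[1]] = registers[instr[2]] + registers[instr[3]]
    elif opcode == "HALT":
        break
    pc += 1

print(registers)   # [7, 5, 12, 0]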
Instruction Prefetch
Instruction or instructions may refer to:

Computing
* Instruction, one operation of a processor within a computer architecture instruction set
* Computer program, a collection of instructions

Music
* Instruction (band), a 2002 rock band from New York City, US
* "Instruction" (song), a 2017 song by English DJ Jax Jones
* "Instructions" (album), a 2001 album by Jermaine Dupri

Other uses
* Instruction, teaching or education performed by a teacher
* Sebayt, a work of the ancient Egyptian didactic literature aiming to teach ethical behaviour
* Instruction, the pre-trial phase of an investigation led by a judge in an inquisitorial system of justice
* Instruction manual, an instructional book or booklet
** Instruction manual (gaming), a booklet that instructs the player on how to play the game

See also
* Instructor (other)
* Command (other)

Command may refer to:

Computing
* Command (computing), a statement in a computer language
* COMMAND.COM, the default operating ...
CPU Cache
A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels (L1, L2, often L3, and rarely even L4), with separate instruction-specific and data-specific caches at level 1. Cache memory is typically implemented with static random-access memory (SRAM), which in modern CPUs is by far the largest part of the chip by area, but SRAM is not always used for all levels (of I- or D-cache), or even any level; sometimes the later levels, or all of them, are implemented with eDRAM. Other types of caches exist (that are not counted towards the "cache size" of the most important caches mentioned above), such as the translation lookaside buffer (TLB), which is part of the memory management unit (MMU) ...
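A minimal sketch of a multi-level lookup, assuming invented capacities and latencies, illustrates why hits in the small, close levels are so much cheaper than going all the way to main memory:

# Sketch of a multi-level cache lookup (sizes and latencies are invented).
# Each level is checked in turn; a hit at a small, close level costs far less
# than a trip to main memory.
LEVELS = [                      # (name, capacity in lines, latency in cycles)
    ("L1", 4, 4),
    ("L2", 16, 12),
    ("L3", 64, 40),
]
MEMORY_LATENCY = 200

caches = {name: set() for name, _, _ in LEVELS}

def access(addr):
    """Return the cycle cost of accessing addr, filling caches on the way back."""
    for name, capacity, latency in LEVELS:
        if addr in caches[name]:
            return latency
    # Miss everywhere: fetch from memory and install the line in every level.
    for name, capacity, latency in LEVELS:
        if len(caches[name]) >= capacity:
            caches[name].pop()          # crude eviction, no real replacement policy
        caches[name].add(addr)
    return MEMORY_LATENCY

cold = access(0x40)     # 200 cycles: line not cached yet
warm = access(0x40)     # 4 cycles: now an L1 hit
print(cold, warm)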
Register File
A register file is an array of processor registers in a central processing unit (CPU). Register banking is the method of using a single name to access multiple different physical registers depending on the operating mode. Modern integrated-circuit-based register files are usually implemented by way of fast static RAMs with multiple ports. Such RAMs are distinguished by having dedicated read and write ports, whereas ordinary multiported SRAMs will usually read and write through the same ports. The instruction set architecture of a CPU will almost always define a set of registers which are used to stage data between memory and the functional units on the chip. In simpler CPUs, these "architectural registers" correspond one-for-one to the entries in a physical register file (PRF) within the CPU. More complicated CPUs use register renaming, so that the mapping of which physical entry stores a particular architectural register changes dynamically during execution. The register file ...
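A small sketch of register renaming, with invented sizes and helper names, shows how an architectural register name can point at different physical register file entries over time:

# Sketch of register renaming (illustrative): architectural register names are
# mapped onto entries of a larger physical register file (PRF), and the mapping
# changes every time a register is written.
NUM_PHYS = 8
free_list = list(range(NUM_PHYS))        # unused physical registers
rename_table = {}                        # architectural name -> physical index
prf = [0] * NUM_PHYS                     # the physical register file itself

def write(arch_reg, value):
    phys = free_list.pop(0)              # allocate a fresh physical entry
    rename_table[arch_reg] = phys        # remap the architectural name
    prf[phys] = value

def read(arch_reg):
    return prf[rename_table[arch_reg]]

write("r1", 10)
write("r1", 20)                          # same architectural name, new PRF entry
print(read("r1"), rename_table)          # 20 {'r1': 1}

A real renamer would also return the previous physical entry to the free list once the new mapping commits; that step is omitted here.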
Runahead
Runahead is a technique that allows a microprocessor to pre-process instructions during cache miss cycles instead of stalling. The pre-processed instructions are used to generate instruction and data stream prefetches: the otherwise idle execution resources calculate instruction and data fetch addresses from whatever information is independent of the outstanding cache miss, so that future misses are detected before they would otherwise occur. The principal hardware cost is a means of checkpointing the register file state and preventing pre-processed stores from modifying memory. This checkpointing can be accomplished with very little hardware, since all results computed during runahead are discarded after the cache miss has been serviced, at which point normal execution resumes from the checkpointed register file state. Branch outcomes computed during runahead mode can be saved into a shift register, which can be used as a highly accurate branch predictor ...
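Two of the details above, suppressing pre-processed stores and recording branch outcomes in a shift register, can be sketched as follows (a hypothetical model; the names are illustrative):

# Sketch of two runahead details (toy model): pre-processed stores must not
# modify memory, and branch outcomes seen while running ahead can be pushed
# into a shift register and replayed as predictions later.
memory = {}
runahead_mode = False
branch_shift_register = []          # outcomes recorded while running ahead

def store(addr, value):
    if runahead_mode:
        return                      # drop the store; runahead results are discarded
    memory[addr] = value

def record_branch(taken):
    if runahead_mode:
        branch_shift_register.append(taken)

def predict_next_branch():
    # Replay the recorded outcome, if any, when normal execution resumes.
    return branch_shift_register.pop(0) if branch_shift_register else False

runahead_mode = True
store(0x10, 99)                     # suppressed
record_branch(True)
runahead_mode = False
store(0x10, 42)                     # real store
print(memory, predict_next_branch())   # {16: 42} True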
Memory Level Parallelism
Memory-level parallelism (MLP) is a term in computer architecture referring to the ability to have multiple memory operations pending at the same time, in particular cache misses or translation lookaside buffer (TLB) misses. In a single processor, MLP may be considered a form of instruction-level parallelism (ILP). However, ILP is often conflated with superscalar execution, the ability to execute more than one instruction at the same time. For example, a processor such as the Intel Pentium Pro is five-way superscalar, able to start executing five different microinstructions in a given cycle, yet it can handle four different cache misses for up to 20 different load microinstructions at any time. It is possible to have a machine that is not superscalar but which nevertheless has high MLP. Arguably, a machine that has no ILP, which is not superscalar, which executes one instruction at a time in a non-pipelined manner, but which performs hardware prefetching (not software instruction-level ...
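A back-of-the-envelope calculation with invented numbers shows why overlapping misses matters:

# Illustration of why MLP matters (numbers are invented).
# With misses serviced one at a time, their latencies add up; with several
# misses outstanding at once, their latencies overlap.
misses = 8
miss_latency = 200          # cycles per miss
mlp = 4                     # misses the machine can keep in flight

serial_cycles = misses * miss_latency
overlapped_cycles = (misses / mlp) * miss_latency
print(serial_cycles, overlapped_cycles)   # 1600 400.0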
Rock Processor
Rock (or ROCK) was a multithreading, multicore, SPARC microprocessor under development at Sun Microsystems. Canceled in 2010, it was a separate project from the SPARC T-Series (CoolThreads/Niagara) family of processors. Rock aimed at higher per-thread performance, higher floating-point performance, and greater SMP scalability than the Niagara family. The Rock processor targeted traditional high-end data-facing workloads, such as back-end database servers, as well as floating-point-intensive high-performance computing workloads, whereas the Niagara family targets network-facing workloads such as web servers.

Processor core
The Rock processor implements the 64-bit SPARC V9 instruction set and the VIS 3.0 SIMD multimedia instruction set extension. Each Rock processor has 16 cores, with each core capable of running two threads simultaneously, yielding 32 threads per chip. Servers built with Rock use FB-DIMMs to increase the reliability, speed and density of memory systems. The Rock ...
Instruction-level Parallelism
Instruction-level parallelism (ILP) is the parallel or simultaneous execution of a sequence of instructions in a computer program. More specifically, ILP refers to the average number of instructions run per step of this parallel execution.

Discussion
ILP must not be confused with concurrency. In ILP there is a single specific thread of execution of a process. On the other hand, concurrency involves the assignment of multiple threads to a CPU's core in strict alternation, or in true parallelism if there are enough CPU cores, ideally one core for each runnable thread. There are two approaches to instruction-level parallelism: hardware and software. The hardware level works on dynamic parallelism, whereas the software level works on static parallelism. Dynamic parallelism means the processor decides at run time which instructions to execute in parallel, whereas static parallelism means the compiler decides which instructions to execute in parallel. The Pentium processor ...
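A short worked example (values chosen arbitrarily) of the "average instructions per step" definition:

# Illustration of instruction-level parallelism in a 3-instruction sequence.
# The first two operations have no data dependence, so a superscalar or
# pipelined machine could execute them in the same step; the third must wait.
#
#   e = a + b      # step 1
#   f = c + d      # step 1 (independent of e)
#   g = e * f      # step 2 (needs both results)
#
# Three instructions in two steps gives an ILP of 3/2.
a, b, c, d = 1, 2, 3, 4
e = a + b
f = c + d
g = e * f
print(g)   # 21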
Simultaneous Multithreading
Simultaneous multithreading (SMT) is a technique for improving the overall efficiency of superscalar CPUs with hardware multithreading. SMT permits multiple independent threads of execution to better use the resources provided by modern processor architectures.

Details
The term "multithreading" is ambiguous, because not only can multiple threads be executed simultaneously on one CPU core, but so can multiple tasks (with different page tables, different task state segments, different protection rings, different I/O permissions, etc.). Although running on the same core, they are completely separated from each other. Multithreading is similar in concept to preemptive multitasking but is implemented at the thread level of execution in modern superscalar processors. Simultaneous multithreading (SMT) is one of the two main implementations of multithreading, the other form being temporal multithreading (also known as super-threading). In temporal multithreading, only one thread of ...
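A toy Python model, under the simplifying assumption of a 2-wide issue machine and two ready threads, contrasts temporal multithreading's one-thread-per-cycle issue with SMT's sharing of issue slots within a single cycle:

# Toy model contrasting the two multithreading styles described above.
# In temporal multithreading only one thread may issue in a given cycle;
# in SMT both threads can share the issue slots of the same cycle.
thread_a = ["a0", "a1", "a2", "a3"]
thread_b = ["b0", "b1"]

def temporal(threads):
    cycles, active = [], 0
    while any(threads):
        if not threads[active]:
            active = 1 - active               # switch to the other thread
        cycles.append([threads[active].pop(0)])
        active = 1 - active                   # alternate every cycle
    return cycles

def smt(threads, width=2):
    cycles = []
    while any(threads):
        slot = []
        while len(slot) < width and any(threads):
            for t in threads:                 # fill issue slots from any ready thread
                if t and len(slot) < width:
                    slot.append(t.pop(0))
        cycles.append(slot)
    return cycles

print(temporal([list(thread_a), list(thread_b)]))   # 6 cycles, one instruction each
print(smt([list(thread_a), list(thread_b)]))        # 3 cycles, two instructions each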
Random Access Memory
Random-access memory (RAM) is a form of computer memory that can be read and changed in any order, typically used to store working data and machine code. A random-access memory device allows data items to be read or written in almost the same amount of time irrespective of the physical location of data inside the memory, in contrast with other direct-access data storage media (such as hard disks, CD-RWs, DVD-RWs and the older magnetic tapes and drum memory), where the time required to read and write data items varies significantly depending on their physical locations on the recording medium, due to mechanical limitations such as media rotation speeds and arm movement. RAM contains multiplexing and demultiplexing circuitry to connect the data lines to the addressed storage for reading or writing the entry. Usually more than one bit of storage is accessed by the same address, and RAM ...
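As a loose software analogy for the address decoding mentioned above (the sizes and helper names are invented), a demultiplexer-style one-hot select reaches any location in the same way, which is what makes the access time uniform:

# Toy illustration of the random-access property: the address decoder
# (a demultiplexer) selects one word line directly, so every location costs
# the same to reach, unlike seeking along a tape or a spinning disk.
WORD_BITS = 8
ram = [0] * 16                       # 16 words of storage

def decode(address):
    # One-hot select line per word, as a demultiplexer would produce.
    return [1 if i == address else 0 for i in range(len(ram))]

def write(address, value):
    select = decode(address)
    for i, s in enumerate(select):
        if s:
            ram[i] = value & ((1 << WORD_BITS) - 1)

def read(address):
    select = decode(address)
    return sum(ram[i] for i, s in enumerate(select) if s)

write(5, 0xAB)
print(read(5), read(6))   # 171 0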