Hardware Performance Counter
In computers, hardware performance counters (HPCs), or hardware counters are a set of special-purpose registers built into modern microprocessors to store the counts of hardware-related activities. Advanced users often rely on those counters to conduct low-level performance analysis or tuning. Implementations The number of available hardware counters in a processor is limited while each CPU model might have a lot of different events that a developer might like to measure. Each counter can be programmed with the index of an event type to be monitored, like a L1 cache miss or a branch misprediction. One of the first processors to implement a hardware counter and an associated instruction to access it (the RDPMC instruction) was the Intel Pentium, but they were not documented until Terje Mathisen wrote an article about reverse engineering them in ''Byte'' July 1994. The following table shows some examples of CPUs and the number of available hardware counters: Versus software ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Computer
A computer is a machine that can be Computer programming, programmed to automatically Execution (computing), carry out sequences of arithmetic or logical operations (''computation''). Modern digital electronic computers can perform generic sets of operations known as Computer program, ''programs'', which enable computers to perform a wide range of tasks. The term computer system may refer to a nominally complete computer that includes the Computer hardware, hardware, operating system, software, and peripheral equipment needed and used for full operation; or to a group of computers that are linked and function together, such as a computer network or computer cluster. A broad range of Programmable logic controller, industrial and Consumer electronics, consumer products use computers as control systems, including simple special-purpose devices like microwave ovens and remote controls, and factory devices like industrial robots. Computers are at the core of general-purpose devices ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Athlon
AMD Athlon is the brand name applied to a series of x86, x86-compatible microprocessors designed and manufactured by AMD, Advanced Micro Devices. The original Athlon (now called Athlon Classic) was the first seventh-generation x86 processor and the first desktop processor to reach speeds of one gigahertz (GHz). It made its debut as AMD's high-end processor brand on June 23, 1999. Over the years AMD has used the Athlon name with the 64-bit Athlon 64 architecture, the Athlon II, and Accelerated Processing Unit (APU) chips targeting the Socket AM1 desktop System on a chip, SoC architecture, and Socket AM4 Zen (microarchitecture). The modern Zen-based Athlon with a Radeon, Radeon Graphics processor was introduced in 2019 as AMD's highest-performance entry-level processor. Brand history K7 design and development The first Athlon processor was a result of AMD's development of K7 processors in the 1990s. AMD founder and then-CEO Jerry Sanders (businessman), Jerry Sa ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Perf (Linux)
perf (sometimes called perf_events or perf tools, originally Performance Counters for Linux, PCL) is a performance analyzing tool in Linux, available from Linux kernel version 2.6.31 in 2009. Userspace controlling utility, named perf, is accessed from the command line and provides a number of subcommands; it is capable of statistical profiling of the entire system (both kernel and userland code). It supports hardware performance counters, tracepoints, software performance counters (e.g. hrtimer), and dynamic probes (for example, kprobes or uprobes). In 2012, two IBM engineers recognized perf (along with OProfile) as one of the two most commonly used performance counter profiling tools on Linux. Implementation The interface between the perf utility and the kernel consists of only one syscall and is done via a file descriptor and a mapped memory region.Roberto A. Vitillo ( LBNL)PERFORMANCE TOOLS DEVELOPMENTS 16 June 2011, presentation from "Future computing in particle ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Out-of-order Execution
In computer engineering, out-of-order execution (or more formally dynamic execution) is an instruction scheduling paradigm used in high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a processor executes instructions in an order governed by the availability of input data and execution units, rather than by their original order in a program. In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next instructions that are able to run immediately and independently. History Out-of-order execution is a restricted form of dataflow architecture, which was a major research area in computer architecture in the 1970s and early 1980s. Early use in supercomputers The first machine to use out-of-order execution was the CDC 6600 (1964), designed by James E. Thornton, which uses a scoreboard to avoid conflicts. It permits ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Superscalar
A superscalar processor (or multiple-issue processor) is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a superscalar processor can execute or start executing more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput (the number of instructions that can be executed in a unit of time which can even be less than 1) than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor (or a core if the processor is a multi-core processor), but an execution resource within a single CPU such as an arithmetic logic unit. While a superscalar CPU is typically also pipelined, superscalar and pipelining execution are considered different performance enhancement techni ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Pentium 4
Pentium 4 is a series of single-core central processing unit, CPUs for Desktop computer, desktops, laptops and entry-level Server (computing), servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008. All Pentium 4 CPUs are based on the NetBurst microarchitecture, the successor to the P6 (microarchitecture), P6. The Pentium 4 #Willamette, Willamette (180 nm) introduced SSE2, while the #Prescott, Prescott (90 nm) introduced SSE3 and later 64-bit technology. Later versions introduced Hyper-threading, Hyper-Threading Technology (HTT). The first Pentium 4-branded processor to implement x86-64, 64-bit was the Prescott (90 nm) (February 2004), but this feature was not enabled. Intel subsequently began selling 64-bit Pentium 4s using the ''"E0" revision'' of the Prescotts, being sold on the OEM market as the Pentium 4, model F. The E0 revision also adds eXecute Disable (XD) (Intel's name for the NX bit) to Intel 64. Int ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
POWER4
The POWER4 is a microprocessor developed by IBM, International Business Machines (IBM) that implemented the 64-bit PowerPC and PowerPC AS instruction set architectures. Released in 2001, the POWER4 succeeded the POWER3 and RS64 microprocessors, enabling RS/6000 and IBM AS/400, eServer iSeries models of AS/400 computer servers to run on the same processor, as a step toward converging the two lines. The POWER4 was a Multi-core processor, multicore microprocessor, with two cores on a single die, the first non-embedded microprocessor to do so. POWER4 Chip was first commercially available multiprocessor chip.William Stallings, ''Computer Organization and Architecture'', Seventh Edition, -pp 44 The original POWER4 had a clock speed of 1.1 and 1.3 GHz, while an enhanced version, the POWER4+, reached a clock speed of 1.9 GHz. The PowerPC 970 is a derivative of the POWER4. Functional layout The POWER4 has a unified L2 cache, divided into three equal parts. Each has its own in ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
ARM Cortex-A9 MPCore
The ARM Cortex-A9 MPCore is a 32-bit multi-core processor that provides up to 4 cache-coherent cores, each implementing the ARM v7 architecture instruction set. It was introduced in 2007. Features Key features of the Cortex-A9 core are: * Out-of-order speculative issue superscalar execution 8-stage pipeline giving 8.50 DMIPS/MHz/core. * NEON SIMD instruction set extension performing up to 16 operations per instruction (optional). * High performance VFPv3 floating point unit doubling the performance of previous ARM FPUs (optional). * Thumb-2 instruction set encoding reduces the size of programs with little impact on performance. * TrustZone security extensions. * Jazelle DBX support for Java execution. * Jazelle RCT for JIT compilation. * Program Trace Macrocell and CoreSight Design Kit for non-intrusive tracing of instruction execution. * L2 cache controller (0–4 MB). * Multi-core processing. ARM states that the TSMC 40G hard macro implementation typically o ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
ARM Cortex-A8
The ARM Cortex-A8 is a 32-bit processor core licensed by ARM Holdings implementing the ARM architecture, ARMv7-A architecture. Compared to the ARM11, the Cortex-A8 is a dual-issue superscalar processor, superscalar design, achieving roughly twice the instructions per cycle. The Cortex-A8 was the first Cortex design to be adopted on a large scale in consumer devices. Features Key features of the Cortex-A8 core are: * Frequency from 600 MHz to 1 GHz and above * Superscalar dual-issue microarchitecture * ARM NEON, NEON SIMD instruction set extension * 13-stage integer instruction pipeline, pipeline and 10-stage NEON pipeline * VFPv3 floating-point unit * Thumb-2 instruction set encoding * Jazelle RCT (also known as ThumbEE instruction set) * Advanced branch predictor, branch prediction unit with >95% accuracy * Integrated level 2 Cache (0–4 MiB) * 2.0 Dhrystone, DMIPS/MHz Chips Several system-on-a-chip, system-on-chips (SoC) have implemented the Cortex-A8 core, incl ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
ARM Cortex-A5
The ARM Cortex-A5 is a 32-bit processor core licensed by ARM Holdings implementing the ARMv7-A architecture announced in 2009. Overview The Cortex-A5 is intended to replace the ARM9 and ARM11 cores for use in low-end devices. The Cortex-A5 offers features of the ARMv7 architecture focusing on internet applications e.g. VFPv4 and NEON advanced SIMD. Key features of the Cortex-A5 core are: * Single-issue, in-order microarchitecture with an 8-stage pipeline * NEON SIMD instruction set extension (optional) * VFPv4 floating-point unit (optional) * Thumb-2 instruction set encoding * Jazelle RCT * 1.57 DMIPS / MHz Chips Several system-on-chips (SoC) have implemented the Cortex-A5 core, including: * Actions Semiconductor ATM7029 (gs702a) is a quad-core Cortex-A5 configuration * AMD APUs include a Cortex-A5 as a security co-processor * Amlogic S805, M805 and A111 * Analog Devices ADSP-SC57x, ADSP-SC58x series ARM Cortex-A5 + SHARC+ multicore DSP * Atmel SAMA5Dxx * Freescale V ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
IA-64
IA-64 (Intel Itanium architecture) is the instruction set architecture (ISA) of the discontinued Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by Intel in collaboration with HP. The first Itanium processor, codenamed ''Merced'', was released in 2001. The Itanium architecture is based on explicit instruction-level parallelism, in which the compiler decides which instructions to execute in parallel. This contrasts with superscalar architectures, which depend on the processor to manage instruction dependencies at runtime. In all Itanium models, up to and including '' Tukwila'', cores execute up to six instructions per cycle. In 2008, Itanium was the fourth-most deployed microprocessor architecture for enterprise-class systems, behind x86-64, Power ISA, and SPARC. In 2019, Intel announced the discontinuation of the last of the CPUs supporting the IA-64 architecture. Microsoft Win ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |