Hardware Performance Counters
In computers, hardware performance counters (HPC), or hardware counters are a set of special-purpose registers built into modern microprocessors to store the counts of hardware-related activities within computer systems. Advanced users often rely on those counters to conduct low-level performance analysis or tuning. Implementations The number of available hardware counters in a processor is limited while each CPU model might have a lot of different events that a developer might like to measure. Each counter can be programmed with the index of an event type to be monitored, like a L1 cache miss or a branch misprediction. One of the first processors to implement such counter and an associated instruction RDPMC to access it was the Intel Pentium, but they were not documented until Terje Mathisen wrote an article about reverse engineering them in ''Byte'' July 1994. The following table shows some examples of CPUs and the number of available hardware counters: Versus software t ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Computer
A computer is a machine that can be programmed to Execution (computing), carry out sequences of arithmetic or logical operations (computation) automatically. Modern digital electronic computers can perform generic sets of operations known as Computer program, programs. These programs enable computers to perform a wide range of tasks. A computer system is a nominally complete computer that includes the Computer hardware, hardware, operating system (main software), and peripheral equipment needed and used for full operation. This term may also refer to a group of computers that are linked and function together, such as a computer network or computer cluster. A broad range of Programmable logic controller, industrial and Consumer electronics, consumer products use computers as control systems. Simple special-purpose devices like microwave ovens and remote controls are included, as are factory devices like industrial robots and computer-aided design, as well as general-purpose devi ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Athlon
Athlon is the brand name applied to a series of x86-compatible microprocessors designed and manufactured by Advanced Micro Devices (AMD). The original Athlon (now called Athlon Classic) was the first seventh-generation x86 processor and the first desktop processor to reach speeds of one gigahertz (GHz). It made its debut as AMD's high-end processor brand on June 23, 1999. Over the years AMD has used the Athlon name with the 64-bit Athlon 64 architecture, the Athlon II, and Accelerated Processing Unit (APU) chips targeting the Socket AM1 desktop System on a chip, SoC architecture, and Socket AM4 Zen microarchitecture. The modern Zen-based Athlon with a Radeon, Radeon Graphics processor was introduced in 2019 as AMD's highest-performance entry-level processor. Athlon comes from the Ancient Greek (''athlon''), meaning "(sport) contest", or "prize of a contest", or "place of a contest; arena". With the Athlon name originally used for AMD's high-end processors, AMD ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Perf (Linux)
perf (sometimes called perf_events or perf tools, originally Performance Counters for Linux, PCL) is a performance analyzing tool in Linux, available from Linux kernel version 2.6.31 in 2009. Userspace controlling utility, named perf, is accessed from the command line and provides a number of subcommands; it is capable of statistical profiling of the entire system (both kernel and userland code). It supports hardware performance counters, tracepoints, software performance counters (e.g. hrtimer), and dynamic probes (for example, kprobes or uprobes). In 2012, two IBM engineers recognized perf (along with OProfile) as one of the two most commonly used performance counter profiling tools on Linux. Implementation The interface between the perf utility and the kernel consists of only one syscall and is done via a file descriptor and a mapped memory region.Roberto A. Vitillo (LBNL)PERFORMANCE TOOLS DEVELOPMENTS 16 June 2011, presentation from "Future computing in particle phys ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Out-of-order Execution
In computer engineering, out-of-order execution (or more formally dynamic execution) is a paradigm used in most high-performance central processing units to make use of instruction cycles that would otherwise be wasted. In this paradigm, a processor executes instructions in an order governed by the availability of input data and execution units, rather than by their original order in a program. In doing so, the processor can avoid being idle while waiting for the preceding instruction to complete and can, in the meantime, process the next instructions that are able to run immediately and independently. History Out-of-order execution is a restricted form of data flow computation, which was a major research area in computer architecture in the 1970s and early 1980s. The first machine to use out-of-order execution was the CDC 6600 (1964), designed by James E. Thornton, which uses a scoreboard to avoid conflicts. It permits an instruction to execute if its source operand (read) a ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Superscalar
A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a superscalar processor can execute more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to different execution units on the processor. It therefore allows more throughput (the number of instructions that can be executed in a unit of time) than would otherwise be possible at a given clock rate. Each execution unit is not a separate processor (or a core if the processor is a multi-core processor), but an execution resource within a single CPU such as an arithmetic logic unit. In Flynn's taxonomy, a single-core superscalar processor is classified as an SISD processor (single instruction stream, single data stream), though a single-core superscalar processor that supports short vector operations could ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
Pentium 4
Pentium 4 is a series of single-core CPUs for desktops, laptops and entry-level servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008. The production of Netburst processors was active from 2000 until May 21, 2010. All Pentium 4 CPUs are based on the NetBurst microarchitecture. The Pentium 4 '' Willamette'' (180 nm) introduced SSE2, while the '' Prescott'' (90 nm) introduced SSE3. Later versions introduced Hyper-Threading Technology (HTT). The first Pentium 4-branded processor to implement 64-bit was the ''Prescott'' (90 nm) (February 2004), but this feature was not enabled. Intel subsequently began selling 64-bit Pentium 4s using the ''"E0" revision'' of the Prescotts, being sold on the OEM market as the Pentium 4, model F. The E0 revision also adds eXecute Disable (XD) (Intel's name for the NX bit) to Intel 64. Intel's official launch of Intel 64 (under the name EM64T at that time) in mainstream deskt ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
POWER4
The POWER4 is a microprocessor developed by International Business Machines (IBM) that implemented the 64-bit PowerPC and PowerPC AS instruction set architectures. Released in 2001, the POWER4 succeeded the POWER3 and RS64 microprocessors, enabling RS/6000 and eServer iSeries models of AS/400 computer servers to run on the same processor, as a step toward converging the two lines. The POWER4 was a multicore microprocessor, with two cores on a single die, the first non-embedded microprocessor to do so. POWER4 Chip was first commercially available multiprocessor chip.William Stallings, ''Computer Organization and Architecture'', Seventh Edition, -pp 44 The original POWER4 had a clock speed of 1.1 and 1.3 GHz, while an enhanced version, the POWER4+, reached a clock speed of 1.9 GHz. The PowerPC 970 is a derivative of the POWER4. Functional layout The POWER4 has a unified L2 cache, divided into three equal parts. Each has its own independent L2 controller which can f ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
ARM Cortex-A9 MPCore
The ARM Cortex-A9 MPCore is a 32-bit multi-core processor that provides up to 4 cache-coherent cores, each implementing the ARM v7 architecture instruction set. It was introduced in 2007. Features Key features of the Cortex-A9 core are: * Out-of-order speculative issue superscalar execution 8-stage pipeline giving 2.50 DMIPS/MHz/core. * NEON SIMD instruction set extension performing up to 16 operations per instruction (optional). * High performance VFPv3 floating point unit doubling the performance of previous ARM FPUs (optional). * Thumb-2 instruction set encoding reduces the size of programs with little impact on performance. * TrustZone security extensions. * Jazelle DBX support for Java execution. * Jazelle RCT for JIT compilation. * Program Trace Macrocell and CoreSight Design Kit for non-intrusive tracing of instruction execution. * L2 cache controller (0–4 MB). * Multi-core processing. ARM states that the TSMC 40G hard macro implementation typically oper ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
ARM Cortex-A8
The ARM Cortex-A8 is a 32-bit processor core licensed by ARM Holdings implementing the ARMv7-A architecture. Compared to the ARM11, the Cortex-A8 is a dual-issue superscalar design, achieving roughly twice the instructions per cycle. The Cortex-A8 was the first Cortex design to be adopted on a large scale in consumer devices. Features Key features of the Cortex-A8 core are: * Frequency from 600 MHz to 1 GHz and above * Superscalar dual-issue microarchitecture * NEON SIMD instruction set extension * 13-stage integer pipeline and 10-stage NEON pipeline * VFPv3 Floating Point Unit * Thumb-2 instruction set encoding * Jazelle RCT (Also known as ThumbEE instruction set) * Advanced branch prediction unit with >95% accuracy * Integrated level 2 Cache (0–4 MiB) * 2.0 DMIPS/MHz Chips Several system-on-chips (SoC) have implemented the Cortex-A8 core, including: * Allwinner A1X * Apple A4 * Freescale Semiconductor i.MX51 * Rockchip RK2918, RK2906 * Samsung Exyn ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
ARM Cortex-A5
The ARM Cortex-A5 is a 32-bit processor core licensed by ARM Holdings implementing the ARMv7-A architecture announced in 2009. Overview The Cortex-A5 is intended to replace the ARM9 and ARM11 cores for use in low-end devices. The Cortex-A5 offers features of the ARMv7 architecture focusing on internet applications e.g. VFPv4 and NEON advanced SIMD. Key features of the Cortex-A5 core are: * Single-issue, in-order microarchitecture with an 8-stage pipeline * NEON SIMD instruction set extension (optional) * VFPv4 floating-point unit (optional) * Thumb-2 instruction set encoding * Jazelle RCT * 1.57 DMIPS / MHz Chips Several system-on-chips (SoC) have implemented the Cortex-A5 core, including: * Actions Semiconductor ATM7029 (gs702a) is a quad-core Cortex-A5 configuration * AMD Fusion APUs include a Cortex-A5 as a security co-processor * Amlogic S805, M805 and A111 * Analog Devices ADSP-SC57x, ADSP-SC58x series ARM Cortex-A5 + SHARC+ multicore DSP * Atmel SAMA5Dxx * Freescale V ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |
|
IA-64
IA-64 (Intel Itanium architecture) is the instruction set architecture (ISA) of the Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by Intel in collaboration with HP. The first Itanium processor, codenamed ''Merced'', was released in 2001. The Itanium architecture is based on explicit instruction-level parallelism, in which the compiler decides which instructions to execute in parallel. This contrasts with superscalar architectures, which depend on the processor to manage instruction dependencies at runtime. In all Itanium models, up to and including '' Tukwila'', cores execute up to six instructions per clock cycle. In 2008, Itanium was the fourth-most deployed microprocessor architecture for enterprise-class systems, behind x86-64, Power ISA, and SPARC. History Development: 1989–2000 In 1989, HP began to become concerned that reduced instruction set computing (RISC) archite ... [...More Info...]       [...Related Items...]     OR:     [Wikipedia]   [Google]   [Baidu]   |