Hardware performance counter
   HOME

TheInfoList



OR:

In
computer A computer is a machine that can be programmed to Execution (computing), carry out sequences of arithmetic or logical operations (computation) automatically. Modern digital electronic computers can perform generic sets of operations known as C ...
s, hardware performance counters (HPC), or hardware counters are a set of special-purpose
register Register or registration may refer to: Arts entertainment, and media Music * Register (music), the relative "height" or range of a note, melody, part, instrument, etc. * ''Register'', a 2017 album by Travis Miller * Registration (organ), the ...
s built into modern
microprocessor A microprocessor is a computer processor where the data processing logic and control is included on a single integrated circuit, or a small number of integrated circuits. The microprocessor contains the arithmetic, logic, and control circu ...
s to store the counts of hardware-related activities within computer systems. Advanced users often rely on those counters to conduct low-level performance analysis or
tuning Tuning can refer to: Common uses * Tuning, the process of tuning a tuned amplifier or other electronic component * Musical tuning, musical systems of tuning, and the act of tuning an instrument or voice ** Guitar tunings ** Piano tuning, adjusting ...
.


Implementations

The number of available hardware counters in a processor is limited while each
CPU A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and ...
model might have a lot of different events that a developer might like to measure. Each counter can be programmed with the index of an event type to be monitored, like a L1 cache miss or a branch misprediction. One of the first processors to implement such counter and an associated instruction RDPMC to access it was the
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
Pentium Pentium is a brand used for a series of x86 architecture-compatible microprocessors produced by Intel. The original Pentium processor from which the brand took its name was first released on March 22, 1993. After that, the Pentium II and Pe ...
, but they were not documented until Terje Mathisen wrote an article about reverse engineering them in ''
Byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable unit ...
'' July 1994. The following table shows some examples of CPUs and the number of available hardware counters:


Versus software techniques

Compared to software
profilers Offender profiling, also known as criminal profiling, is an investigative strategy used by law enforcement agencies to identify likely suspects and has been used by investigators to link cases that may have been committed by the same perpetrator ...
, hardware counters provide low-overhead access to a wealth of detailed performance information related to CPU's functional units, caches and main memory etc. Another benefit of using them is that no source code modifications are needed in general. However, the types and meanings of hardware counters vary from one kind of architecture to another due to the variation in hardware organizations. There can be difficulties correlating the low level performance metrics back to source code. The limited number of registers to store the counters often force users to conduct multiple measurements to collect all desired performance metrics.


Instruction based sampling

Modern
superscalar A superscalar processor is a CPU that implements a form of parallelism called instruction-level parallelism within a single processor. In contrast to a scalar processor, which can execute at most one single instruction per clock cycle, a sup ...
processors schedule and execute multiple instructions out-of-order at one time. These "in-flight" instructions can retire at any time, depending on memory access, hits in cache, stalls in the pipeline and many other factors. This can cause performance counter events to be attributed to the wrong instructions, making precise performance analysis difficult or impossible. AMD introduced methods to mitigate some of these drawbacks. For example, the Opteron processors have implemented in 2007 a technique known as Instruction Based Sampling (or IBS). AMD's implementation of IBS provides hardware counters for both fetch sampling (the front of the superscalar pipeline) and op sampling (the back of the pipeline). This results in discrete performance data associating retired instructions with the "parent" AMD64 instruction.


See also

*
perf (Linux) perf (sometimes called perf_events or perf tools, originally Performance Counters for Linux, PCL) is a performance analyzing tool in Linux, available from Linux kernel version 2.6.31 in 2009. Userspace controlling utility, named perf, is access ...
*
Row hammer Row hammer (also written as rowhammer) is a security exploit that takes advantage of an unintended and undesirable side effect in dynamic random-access memory (DRAM) in which memory cells interact electrically between themselves by leaking thei ...


References

{{DEFAULTSORT:Hardware Performance Counter Central processing unit Profilers