Profiling (computer programming)
   HOME

TheInfoList



OR:

In
software engineering Software engineering is a systematic engineering approach to software development. A software engineer is a person who applies the principles of software engineering to design, develop, maintain, test, and evaluate computer software. The term '' ...
, profiling ("program profiling", "software profiling") is a form of
dynamic program analysis Dynamic program analysis is the analysis of computer software that is performed by executing programs on a real or virtual processor. For dynamic program analysis to be effective, the target program must be executed with sufficient test inputs ...
that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or the frequency and duration of function calls. Most commonly, profiling information serves to aid
program optimization In computer science, program optimization, code optimization, or software optimization, is the process of modifying a software system to make some aspect of it work more efficiently or use fewer resources. In general, a computer program may be o ...
, and more specifically, performance engineering. Profiling is achieved by instrumenting either the program
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the w ...
or its binary executable form using a tool called a ''profiler'' (or ''code profiler''). Profilers may use a number of different techniques, such as event-based, statistical, instrumented, and simulation methods.


Gathering program events

Profilers use a wide variety of techniques to collect data, including hardware interrupts, code instrumentation, instruction set simulation, operating system hooks, and performance counters.


Use of profilers

The output of a profiler may be: * A statistical ''summary'' of the events observed (a profile) :Summary profile information is often shown annotated against the source code statements where the events occur, so the size of measurement data is linear to the code size of the program. /* ------------ source------------------------- count */ 0001 IF X = "A" 0055 0002 THEN DO 0003 ADD 1 to XCOUNT 0032 0004 ELSE 0005 IF X = "B" 0055 * A stream of recorded events (a trace) :For sequential programs, a summary profile is usually sufficient, but performance problems in parallel programs (waiting for messages or synchronization issues) often depend on the time relationship of events, thus requiring a full trace to get an understanding of what is happening. : The size of a (full) trace is linear to the program's
instruction path length In computer performance, the instruction path length is the number of machine code instructions required to execute a section of a computer program. The total path length for the entire program could be deemed a measure of the algorithm's performa ...
, making it somewhat impractical. A trace may therefore be initiated at one point in a program and terminated at another point to limit the output. * An ongoing interaction with the
hypervisor A hypervisor (also known as a virtual machine monitor, VMM, or virtualizer) is a type of computer software, firmware or hardware that creates and runs virtual machines. A computer on which a hypervisor runs one or more virtual machines is called ...
(continuous or periodic monitoring via on-screen display for instance) : This provides the opportunity to switch a trace on or off at any desired point during execution in addition to viewing on-going metrics about the (still executing) program. It also provides the opportunity to suspend asynchronous processes at critical points to examine interactions with other parallel processes in more detail. A profiler can be applied to an individual method or at the scale of a module or program, to identify performance bottlenecks by making long-running code obvious. A profiler can be used to understand code from a timing point of view, with the objective of optimizing it to handle various runtime conditions or various loads. Profiling results can be ingested by a compiler that provides profile-guided optimization. Profiling results can be used to guide the design and optimization of an individual algorithm; the
Krauss matching wildcards algorithm In computer science, the Krauss wildcard-matching algorithm is a pattern matching algorithm. Based on the wildcard syntax in common use, e.g. in the Microsoft Windows command-line interface, the algorithm provides a non- recursive mechanism for mat ...
is an example. Profilers are built into some
application performance management In the fields of information technology and systems management, application performance management (APM) is the monitoring and management of the performance and availability of software applications. APM strives to detect and diagnose complex appl ...
systems that aggregate profiling data to provide insight into transaction workloads in
distributed Distribution may refer to: Mathematics *Distribution (mathematics), generalized functions used to formulate solutions of partial differential equations *Probability distribution, the probability of a particular value or value range of a varia ...
applications.


History

Performance-analysis tools existed on IBM/360 and
IBM/370 The IBM System/370 (S/370) is a model range of IBM mainframe computers announced on June 30, 1970, as the successors to the System/360 family. The series mostly maintains backward compatibility with the S/360, allowing an easy migration path f ...
platforms from the early 1970s, usually based on timer interrupts which recorded the
program status word The program status word (PSW) is a register that performs the function of a status register and program counter, and sometimes more. The term is also applied to a copy of the PSW in storage. This article only discusses the PSW in the IBM System/3 ...
(PSW) at set timer-intervals to detect "hot spots" in executing code. This was an early example of sampling (see below). In early 1974 instruction-set simulators permitted full trace and other performance-monitoring features. Profiler-driven program analysis on Unix dates back to 1973,Unix Programmer's Manual, 4th Edition
/ref> when Unix systems included a basic tool, prof, which listed each function and how much of program execution time it used. In 1982 gprof extended the concept to a complete
call graph A call graph (also known as a call multigraph) is a control-flow graph, which represents calling relationships between subroutines in a computer program. Each node represents a procedure and each edge ''(f, g)'' indicates that procedure ''f'' cal ...
analysis. S.L. Graham, P.B. Kessler, and M.K. McKusick
''gprof: a Call Graph Execution Profiler''
Proceedings of the SIGPLAN '82 Symposium on Compiler Construction, ''
SIGPLAN SIGPLAN is the Association for Computing Machinery's Special Interest Group on programming languages. Conferences * Principles of Programming Languages (POPL) * Programming Language Design and Implementation (PLDI) * International Symposium on ...
Notices'', Vol. 17, No 6, pp. 120-126; doi:10.1145/800230.806987
In 1994, Amitabh Srivastava and
Alan Eustace Robert Alan Eustace (born 1956/1957) is an American computer scientist who served as Senior Vice President of Engineering at Google until retiring in 2015. On October 24, 2014, he made a free-fall jump from the stratosphere, breaking Felix Bau ...
of
Digital Equipment Corporation Digital Equipment Corporation (DEC ), using the trademark Digital, was a major American company in the computer industry from the 1960s to the 1990s. The company was co-founded by Ken Olsen and Harlan Anderson in 1957. Olsen was president un ...
published a paper describing ATOM (Analysis Tools with OM). The ATOM platform converts a program into its own profiler: at
compile time In computer science, compile time (or compile-time) describes the time window during which a computer program is compiled. The term is used as an adjective to describe concepts related to the context of program compilation, as opposed to concep ...
, it inserts code into the program to be analyzed. That inserted code outputs analysis data. This technique - modifying a program to analyze itself - is known as "
instrumentation Instrumentation a collective term for measuring instruments that are used for indicating, measuring and recording physical quantities. The term has its origins in the art and science of scientific instrument-making. Instrumentation can refer to ...
". In 2004 both the gprof and ATOM papers appeared on the list of the 50 most influential PLDI papers for the 20-year period ending in 1999.


Profiler types based on output


Flat profiler

Flat profilers compute the average call times, from the calls, and do not break down the call times based on the callee or the context.


Call-graph profiler

Call graph A call graph (also known as a call multigraph) is a control-flow graph, which represents calling relationships between subroutines in a computer program. Each node represents a procedure and each edge ''(f, g)'' indicates that procedure ''f'' cal ...
profilers show the call times, and frequencies of the functions, and also the call-chains involved based on the callee. In some tools full context is not preserved.


Input-sensitive profiler

Input-sensitive profilersE. Coppa, C. Demetrescu, and I. Finocchi
Profiling''
IEEE Trans. Software Eng. 40(12): 1185-1205 (2014); doi:10.1109/TSE.2014.2339825
add a further dimension to flat or call-graph profilers by relating performance measures to features of the input workloads, such as input size or input values. They generate charts that characterize how an application's performance scales as a function of its input.


Data granularity in profiler types

Profilers, which are also programs themselves, analyze target programs by collecting information on their execution. Based on their data granularity, on how profilers collect information, they are classified into event based or statistical profilers. Profilers interrupt program execution to collect information, which may result in a limited resolution in the time measurements, which should be taken with a grain of salt.
Basic block In compiler construction, a basic block is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit. This restricted form makes a basic block highly amenable to analysis. Compilers usually deco ...
profilers report a number of machine
clock cycles In electronics and especially synchronous digital circuits, a clock signal (historically also known as ''logic beat'') oscillates between a high and a low state and is used like a metronome to coordinate actions of digital circuits. A clock signa ...
devoted to executing each line of code, or a timing based on adding these together; the timings reported per basic block may not reflect a difference between
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache County ...
hits and misses.


Event-based profilers

The programming languages listed here have event-based profilers: *
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mos ...
: the JVMTI (JVM Tools Interface) API, formerly JVMPI (JVM Profiling Interface), provides hooks to profilers, for trapping events like calls, class-load, unload, thread enter leave. * .NET: Can attach a profiling agent as a ''COM'' server to the ''CLR'' using Profiling ''API''. Like Java, the runtime then provides various callbacks into the agent, for trapping events like method
JIT Jit (also known as jiti, jit-jive and the Harare beat) is a style of popular Zimbabwean dance music. It features a swift rhythm played on drums and accompanied by a guitar. Jit evolved out many diverse influences, including domestic chimurenga, ...
/ enter / leave, object creation, etc. Particularly powerful in that the profiling agent can rewrite the target application's bytecode in arbitrary ways. *
Python Python may refer to: Snakes * Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia ** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia * Python (mythology), a mythical serpent Computing * Python (pro ...
: Python profiling includes the profile module, hotshot (which is call-graph based), and using the 'sys.setprofile' function to trap events like c_, python_. *
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called ...
: Ruby also uses a similar interface to Python for profiling. Flat-profiler in profile.rb, module, and ruby-prof a C-extension are present.


Statistical profilers

Some profilers operate by sampling. A sampling profiler probes the target program's
call stack In computer science, a call stack is a stack data structure that stores information about the active subroutines of a computer program. This kind of stack is also known as an execution stack, program stack, control stack, run-time stack, or m ...
at regular intervals using
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common services for computer programs. Time-sharing operating systems schedule tasks for efficient use of the system and may also i ...
interrupt In digital computers, an interrupt (sometimes referred to as a trap) is a request for the processor to ''interrupt'' currently executing code (when permitted), so that the event can be processed in a timely manner. If the request is accepted, ...
s. Sampling profiles are typically less numerically accurate and specific, but allow the target program to run at near full speed. The resulting data are not exact, but a statistical approximation. "The actual amount of error is usually more than one sampling period. In fact, if a value is n times the sampling period, the expected error in it is the square-root of n sampling periods." In practice, sampling profilers can often provide a more accurate picture of the target program's execution than other approaches, as they are not as intrusive to the target program, and thus don't have as many side effects (such as on memory caches or instruction decoding pipelines). Also since they don't affect the execution speed as much, they can detect issues that would otherwise be hidden. They are also relatively immune to over-evaluating the cost of small, frequently called routines or 'tight' loops. They can show the relative amount of time spent in user mode versus interruptible kernel mode such as
system call In computing, a system call (commonly abbreviated to syscall) is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services (for example, acc ...
processing. Still, kernel code to handle the interrupts entails a minor loss of CPU cycles, diverted cache usage, and is unable to distinguish the various tasks occurring in uninterruptible kernel code (microsecond-range activity). Dedicated hardware can go beyond this: ARM Cortex-M3 and some recent MIPS processors JTAG interface have a PCSAMPLE register, which samples the program counter in a truly undetectable manner, allowing non-intrusive collection of a flat profile. Some commonly used statistical profilers for Java/managed code are SmartBear Software's AQtime and
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washin ...
's CLR Profiler. Those profilers also support native code profiling, along with
Apple Inc. Apple Inc. is an American multinational technology company headquartered in Cupertino, California, United States. Apple is the largest technology company by revenue (totaling in 2021) and, as of June 2022, is the world's biggest company ...
's
Shark Sharks are a group of elasmobranch fish characterized by a cartilaginous skeleton, five to seven gill slits on the sides of the head, and pectoral fins that are not fused to the head. Modern sharks are classified within the clade Selachi ...
(OSX),
OProfile In computing, OProfile is a system-wide statistical profiling tool for Linux. John Levon wrote it in 2001 for Linux kernel version 2.4 after his M.Sc. project; it consists of a kernel module, a user-space daemon and several user-space tools. De ...
(Linux),
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
VTune and Parallel Amplifier (part of Intel Parallel Studio), and Oracle
Performance Analyzer Performance Analyzer is a commercial utility software for software performance analysis for x86 or SPARC machines. It has both a graphical user interface and a command line interface. It is available for both Linux and Solaris operating systems. ...
, among others.


Instrumentation

This technique effectively adds instructions to the target program to collect the required information. Note that instrumenting a program can cause performance changes, and may in some cases lead to inaccurate results and/or
heisenbug In computer programming jargon, a heisenbug is a software bug that seems to disappear or alter its behavior when one attempts to study it. The term is a pun on the name of Werner Heisenberg, the physicist who first asserted the observer effect of ...
s. The effect will depend on what information is being collected, on the level of timing details reported, and on whether basic block profiling is used in conjunction with instrumentation. For example, adding code to count every procedure/routine call will probably have less effect than counting how many times each statement is obeyed. A few computers have special hardware to collect information; in this case the impact on the program is minimal. Instrumentation is key to determining the level of control and amount of time resolution available to the profilers. * Manual: Performed by the programmer, e.g. by adding instructions to explicitly calculate runtimes, simply count events or calls to measurement
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
s such as the
Application Response Measurement Application Response Measurement (ARM) is an open standard published by the Open Group for monitoring and diagnosing performance bottlenecks within complex enterprise applications that use loosely-coupled designs or service-oriented architectures ...
standard. * Automatic source level: instrumentation added to the source code by an automatic tool according to an instrumentation policy. * Intermediate language: instrumentation added to assembly or decompiled
bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
s giving support for multiple higher-level source languages and avoiding (non-symbolic) binary offset re-writing issues. * Compiler assisted * Binary translation: The tool adds instrumentation to a compiled executable. * Runtime instrumentation: Directly before execution the code is instrumented. The program run is fully supervised and controlled by the tool. * Runtime injection: More lightweight than runtime instrumentation. Code is modified at runtime to have jumps to helper functions.


Interpreter instrumentation

* Interpreter debug options can enable the collection of performance metrics as the interpreter encounters each target statement. A
bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
, control table or
JIT Jit (also known as jiti, jit-jive and the Harare beat) is a style of popular Zimbabwean dance music. It features a swift rhythm played on drums and accompanied by a guitar. Jit evolved out many diverse influences, including domestic chimurenga, ...
interpreters are three examples that usually have complete control over execution of the target code, thus enabling extremely comprehensive data collection opportunities.


Hypervisor/Simulator

* Hypervisor: Data are collected by running the (usually) unmodified program under a
hypervisor A hypervisor (also known as a virtual machine monitor, VMM, or virtualizer) is a type of computer software, firmware or hardware that creates and runs virtual machines. A computer on which a hypervisor runs one or more virtual machines is called ...
. Example:
SIMMON SIMMON (SIMulation MONitor) was a proprietary software testing system developed in the late 1960s in the IBM Product Test Laboratory, then at Poughkeepsie, N.Y. It was designed for the then-new line of System/360 computers as a vehicle for testi ...
* Simulator and Hypervisor: Data collected interactively and selectively by running the unmodified program under an
Instruction Set Simulator An instruction set simulator (ISS) is a simulation model, usually coded in a high-level programming language, which mimics the behavior of a mainframe or microprocessor by "reading" instructions and maintaining internal variables which represent t ...
.


See also

*
Algorithmic efficiency In computer science, algorithmic efficiency is a property of an algorithm which relates to the amount of computational resources used by the algorithm. An algorithm must be analyzed to determine its resource usage, and the efficiency of an algo ...
*
Benchmark Benchmark may refer to: Business and economics * Benchmarking, evaluating performance within organizations * Benchmark price * Benchmark (crude oil), oil-specific practices Science and technology * Benchmark (surveying), a point of known elevati ...
*
Java performance In software development, the programming language Java was historically considered slower than the fastest 3rd generation typed languages such as C and C++. The main reason being a different language design, where after compiling, Java progr ...
*
List of performance analysis tools This is a list of performance analysis tools for use in software development. General purpose, language independent The following tools work based on log files that can be generated from various systems. * time (Unix) - can be used to determi ...
* PAPI is a portable interface (in the form of a library) to hardware performance counters on modern microprocessors. * Performance engineering * Performance prediction *
Performance tuning Performance tuning is the improvement of system performance. Typically in computer systems, the motivation for such activity is called a performance problem, which can be either real or anticipated. Most systems will respond to increased load with ...
* Runtime verification * Profile-guided optimization *
Static code analysis In computer science, static program analysis (or static analysis) is the analysis of computer programs performed without executing them, in contrast with dynamic program analysis, which is performed on programs during their execution. The term ...
* Software archaeology *
Worst-case execution time The worst-case execution time (WCET) of a computational task is the maximum length of time the task could take to execute on a specific hardware platform. What it is used for Worst case execution time is typically used in reliable real-time sys ...
(WCET)


References


External links

* Article
Need for speed — Eliminating performance bottlenecks
on doing execution time analysis of Java applications using IBM Rational Application Developer.
Profiling Runtime Generated and Interpreted Code using the VTune Performance Analyzer
{{DEFAULTSORT:Software Performance Analysis Software optimization *