Speedup

In computer architecture, speedup is a number that measures the relative performance of two systems processing the same problem. More technically, it is the improvement in speed of execution of a task executed on two similar architectures with different resources. The notion of speedup was established by Amdahl's law, which was particularly focused on parallel processing. However, speedup can be used more generally to show the effect on performance after any resource enhancement.


Definitions

Speedup can be defined for two different types of quantities: ''latency'' and ''throughput''.

''Latency'' of an architecture is the reciprocal of the execution speed of a task:

: L = \frac{1}{v} = \frac{T}{W},

where
* ''v'' is the execution speed of the task;
* ''T'' is the execution time of the task;
* ''W'' is the execution workload of the task.

''Throughput'' of an architecture is the execution rate of a task:

: Q = \rho v A = \frac{\rho A W}{T} = \frac{\rho A}{L},

where
* ''ρ'' is the execution density (e.g., the number of stages in an instruction pipeline for a pipelined architecture);
* ''A'' is the execution capacity (e.g., the number of processors for a parallel architecture).

Latency is often measured in seconds per unit of execution workload. Throughput is often measured in units of execution workload per second. Another unit of throughput is instructions per cycle (IPC), and its reciprocal, cycles per instruction (CPI), is another unit of latency. Speedup is dimensionless and defined differently for each type of quantity so that it is a consistent metric.
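These definitions translate directly into code. The following minimal sketch (the function names and numeric values are ours, chosen for illustration) computes latency and throughput from the symbols ''W'', ''T'', ''ρ'' and ''A'' above:

```python
# Latency and throughput as defined above.
# W: execution workload (e.g., instructions), T: execution time (s),
# rho: execution density, A: execution capacity.

def latency(T, W):
    # L = 1/v = T/W, in seconds per unit of workload
    return T / W

def throughput(T, W, rho=1, A=1):
    # Q = rho*v*A = rho*A*W/T, in workload units per second
    return rho * A * W / T

W, T = 1_000_000, 2.0            # e.g., 1e6 instructions in 2 s
print(latency(T, W))             # 2e-06 s per instruction
print(throughput(T, W))          # 500000.0 instructions per second
print(throughput(T, W, A=4))     # 4 processors quadruple the rate
```

Note that latency and throughput are reciprocal only when ''ρ'' = ''A'' = 1; a pipelined or parallel architecture raises ''Q'' without changing ''L''.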


Speedup in latency

Speedup in ''latency'' is defined by the following formula:

: S_\text{latency} = \frac{L_1}{L_2} = \frac{T_1 W_2}{T_2 W_1},

where
* ''S''latency is the speedup in latency of the architecture 2 with respect to the architecture 1;
* ''L''1 is the latency of the architecture 1;
* ''L''2 is the latency of the architecture 2.

Speedup in latency can be predicted from Amdahl's law or Gustafson's law.


Speedup in throughput

Speedup in ''throughput'' is defined by the formula:

: S_\text{throughput} = \frac{Q_2}{Q_1} = \frac{\rho_2 A_2 T_1 W_2}{\rho_1 A_1 T_2 W_1} = \frac{\rho_2 A_2}{\rho_1 A_1} S_\text{latency},

where
* ''S''throughput is the speedup in throughput of the architecture 2 with respect to the architecture 1;
* ''Q''1 is the throughput of the architecture 1;
* ''Q''2 is the throughput of the architecture 2.
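As the formula shows, speedup in throughput scales the latency speedup by the ratio of execution densities and capacities. A minimal sketch (function names and numbers are assumptions for the example):

```python
# S_throughput = (rho2*A2)/(rho1*A1) * S_latency.

def speedup_latency(L1, L2):
    return L1 / L2

def speedup_throughput(rho1, A1, rho2, A2, s_latency):
    return (rho2 * A2) / (rho1 * A1) * s_latency

s_lat = speedup_latency(2.25, 1.50)             # 1.5
# Same density and capacity on both architectures: the two speedups agree.
print(speedup_throughput(1, 1, 1, 1, s_lat))    # 1.5
# Doubling the execution capacity (A2 = 2) at the same latency:
print(speedup_throughput(1, 1, 1, 2, s_lat))    # 3.0
```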


Examples


Using execution times

We are testing the effectiveness of a branch predictor on the execution of a program. First, we execute the program with the standard branch predictor on the processor, which yields an execution time of 2.25 seconds. Next, we execute the program with our modified (and hopefully improved) branch predictor on the same processor, which produces an execution time of 1.50 seconds. In both cases the execution workload is the same. Using our speedup formula, we know

: S_\text{latency} = \frac{L_\text{old}}{L_\text{new}} = \frac{2.25~\text{s}}{1.50~\text{s}} = 1.5.

Our new branch predictor has provided a 1.5x speedup over the original.
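Because the workload is identical in both runs, the workload terms cancel and the speedup reduces to a ratio of execution times. In code (the function name is ours):

```python
# Speedup in latency from measured execution times; with the same
# workload, only the times remain in the ratio.
def speedup_from_times(t_old, t_new):
    return t_old / t_new

print(speedup_from_times(2.25, 1.50))  # 1.5
```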


Using cycles per instruction and instructions per cycle

We can also measure speedup in cycles per instruction (CPI), which is a latency. First, we execute the program with the standard branch predictor, which yields a CPI of 3. Next, we execute the program with our modified branch predictor, which yields a CPI of 2. In both cases the execution workload is the same and both architectures are not pipelined nor parallel. Using the speedup formula gives

: S_\text{latency} = \frac{\text{CPI}_\text{old}}{\text{CPI}_\text{new}} = \frac{3}{2} = 1.5.

We can also measure speedup in instructions per cycle (IPC), which is a throughput and the inverse of CPI. Using the speedup formula gives

: S_\text{throughput} = \frac{\text{IPC}_\text{new}}{\text{IPC}_\text{old}} = \frac{1/2}{1/3} = 1.5.

We achieve the same 1.5x speedup, though we measured different quantities.
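The agreement between the latency (CPI) and throughput (IPC) views can be verified directly (variable names are ours):

```python
# CPI is a latency; IPC = 1/CPI is the corresponding throughput.
cpi_old, cpi_new = 3, 2

s_latency = cpi_old / cpi_new                  # CPI_old / CPI_new
s_throughput = (1 / cpi_new) / (1 / cpi_old)   # IPC_new / IPC_old
print(s_latency, s_throughput)                 # both ≈ 1.5
```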


Additional details

Let ''S'' be the speedup of execution of a task and ''s'' the speedup of execution of the part of the task that benefits from the improvement of the resources of an architecture. ''Linear speedup'' or ''ideal speedup'' is obtained when ''S'' = ''s''. When running a task with linear speedup, doubling the local speedup doubles the overall speedup. As this is ideal, it is considered very good scalability.

''Efficiency'' is a metric of the utilization of the resources of the improved system, defined as

: \eta = \frac{S}{s}.

Its value is typically between 0 and 1. Programs with linear speedup and programs running on a single processor have an efficiency of 1, while many difficult-to-parallelize programs have efficiency such as 1/ln(''s'') that approaches 0 as the number of processors increases.

In engineering contexts, efficiency curves are more often used for graphs than speedup curves, since
* all of the area in the graph is useful (whereas in speedup curves half of the space is wasted);
* it is easy to see how well the improvement of the system is working;
* there is no need to plot a "perfect speedup" curve.

In marketing contexts, speedup curves are more often used, largely because they go up and to the right and thus appear better to the less-informed.
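The efficiency metric is a one-line computation; the example values below are assumptions for illustration:

```python
# Efficiency eta = S / s, where S is the overall speedup and s the
# speedup of the improved part (e.g., the number of processors).

def efficiency(S, s):
    return S / s

print(efficiency(8, 8))   # linear speedup on 8 processors -> 1.0
print(efficiency(4, 8))   # only 4x speedup on 8 processors -> 0.5
```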


Super-linear speedup

Sometimes a speedup of more than ''A'' when using ''A'' processors is observed in parallel computing, which is called ''super-linear speedup''. Super-linear speedup rarely happens and often confuses beginners, who believe the theoretical maximum speedup should be ''A'' when ''A'' processors are used.

One possible reason for super-linear speedup in low-level computations is the cache effect resulting from the different memory hierarchies of a modern computer: in parallel computing, not only do the numbers of processors change, but so does the size of accumulated caches from different processors. With the larger accumulated cache size, more or even all of the working set can fit into caches and the memory access time reduces dramatically, which causes the extra speedup in addition to that from the actual computation. An analogous situation occurs when searching large datasets, such as the genomic data searched by BLAST implementations. There the accumulated RAM from each of the nodes in a cluster enables the dataset to move from disk into RAM, thereby drastically reducing the time required by e.g. mpiBLAST to search it.

Super-linear speedups can also occur when performing backtracking in parallel: an exception in one thread can cause several other threads to backtrack early, before they reach the exception themselves. Super-linear speedups can also occur in parallel implementations of branch-and-bound for optimization: the processing of one node by one processor may affect the work other processors need to do for the other nodes.
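The early-termination effect behind some super-linear speedups can be illustrated with a toy model of partitioned search (the names, the partitioning scheme, and the lockstep assumption are ours; this is a counting argument, not a real benchmark):

```python
# Super-linear speedup in parallel search: the worker whose partition
# contains the target may stop early, so for some inputs the parallel
# run takes fewer than 1/A of the sequential steps.

def sequential_steps(data, target):
    # Steps a single left-to-right scan takes to find the target.
    return data.index(target) + 1

def parallel_steps(data, target, A):
    # Split the data into A contiguous partitions; workers scan in
    # lockstep and all stop once one of them finds the target.
    size = len(data) // A
    chunks = [data[i * size:(i + 1) * size] for i in range(A)]
    return min(c.index(target) + 1 for c in chunks if target in c)

data = list(range(1000))
target = 990                             # near the end of a single scan
seq = sequential_steps(data, target)     # 991 steps
par = parallel_steps(data, target, A=4)  # 241 steps in the last chunk
print(seq / par)                         # > 4 with only 4 workers
```

The effect is input-dependent: averaged over all target positions, the speedup cannot exceed ''A'', which is why super-linear results usually point to a changed per-processor environment (cache, RAM) or to pruned work rather than to extra raw compute.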


See also

* Amdahl's law
* Gustafson's law
* Brooks's law
* Karp–Flatt metric
* Parallel slowdown
* Scalability

