Analysis of PRAM Algorithms

In computer science, analysis of parallel algorithms is the process of finding the computational complexity of algorithms executed in parallel – the amount of time, storage, or other resources needed to execute them. In many respects, analysis of parallel algorithms is similar to the analysis of sequential algorithms, but is generally more involved because one must reason about the behavior of multiple cooperating threads of execution. One of the primary goals of parallel analysis is to understand how a parallel algorithm's use of resources (speed, space, etc.) changes as the number of processors is changed.


Background

A so-called work-time (WT) framework (sometimes called work-depth, or work-span) was originally introduced by Shiloach and Vishkin for conceptualizing and describing parallel algorithms. In the WT framework, a parallel algorithm is first described in terms of parallel rounds. For each round, the operations to be performed are characterized, but several issues can be suppressed. For example, the number of operations at each round need not be clear, processors need not be mentioned, and any information that may help with the assignment of processors to jobs need not be accounted for. Second, the suppressed information is provided. The inclusion of the suppressed information is guided by the proof of a scheduling theorem due to Brent, which is explained later in this article. The WT framework is useful because, while it can greatly simplify the initial description of a parallel algorithm, inserting the details suppressed by that initial description is often not very difficult. For example, the WT framework was adopted as the basic presentation framework in books on parallel algorithms for the parallel random-access machine (PRAM) model, as well as in class notes. The overview below explains how the WT framework can be used for analyzing more general parallel algorithms, even when their description is not available within the WT framework.
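
As a concrete illustration (not drawn from the sources above), the following Python sketch describes a parallel summation in the WT style: the algorithm is expressed as a sequence of parallel rounds, and no operations are assigned to specific processors. The function name and the sequential simulation of each round are assumptions made for illustration only.

# A hypothetical WT-style description of parallel summation. Each iteration of
# the while loop is one parallel round; within a round, every pair-addition is
# independent of the others and could be performed simultaneously on a PRAM.
# Here the rounds are simply simulated sequentially in plain Python.

def parallel_sum_rounds(values):
    """Sum the input in ceil(log2 n) parallel rounds (simulated sequentially)."""
    data = list(values)
    while len(data) > 1:
        # One parallel round: add adjacent pairs. The processor assignment is
        # left unspecified, exactly as the WT framework allows.
        data = [sum(data[i:i + 2]) for i in range(0, len(data), 2)]
    return data[0] if data else 0

print(parallel_sum_rounds(range(1, 9)))  # prints 36 after 3 rounds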


Definitions

Suppose computations are executed on a machine that has p processors. Let T_p denote the time that elapses between the start of the computation and its end. Analysis of the computation's running time focuses on the following notions:

* The ''work'' of a computation executed by p processors is the total number of primitive operations that the processors perform. Ignoring communication overhead from synchronizing the processors, this is equal to the time used to run the computation on a single processor, denoted T_1.
* The ''depth'' or ''span'' is the length of the longest series of operations that have to be performed sequentially due to data dependencies (the ''critical path''). The depth may also be called the ''critical path length'' of the computation. Minimizing the depth/span is important in designing parallel algorithms, because the depth/span determines the shortest possible execution time. Alternatively, the span can be defined as the time T_\infty spent computing using an idealized machine with an infinite number of processors.
* The ''cost'' of the computation is the quantity pT_p. This expresses the total time spent, by all processors, in both computing and waiting.

Several useful results follow from the definitions of work, span and cost:

* ''Work law''. The cost is always at least the work: pT_p \geq T_1. This follows from the fact that p processors can perform at most p operations in parallel.
* ''Span law''. A finite number p of processors cannot outperform an infinite number, so that T_p \geq T_\infty.

Using these definitions and laws, the following measures of performance can be given:

* ''Speedup'' is the gain in speed made by parallel execution compared to sequential execution: S_p = T_1 / T_p. When the speedup is \Omega(p) for p processors (using big O notation), the speedup is linear, which is optimal in simple models of computation because the work law implies that T_1 / T_p \leq p (super-linear speedup can occur in practice due to memory hierarchy effects). The situation T_1 / T_p = p is called perfect linear speedup. An algorithm that exhibits linear speedup is said to be scalable. Analytical expressions for the speedup of many important parallel algorithms are presented in the literature.
* ''Efficiency'' is the speedup per processor, S_p / p.
* ''Parallelism'' is the ratio T_1 / T_\infty. It represents the maximum possible speedup on any number of processors. By the span law, the parallelism bounds the speedup: if p > T_1 / T_\infty, then
:\frac{T_1}{T_p} \leq \frac{T_1}{T_\infty} < p.
* The ''slackness'' is T_1 / (pT_\infty). A slackness less than one implies (by the span law) that perfect linear speedup is impossible on p processors.
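
To make these measures concrete, the short Python sketch below (an illustration, not part of the original text) evaluates them for the parallel summation of n numbers, whose work is T_1 = n - 1 additions and whose span is T_\infty = ceil(log2 n) rounds of additions.

import math

def measures(n, p):
    """Work, span, parallelism and slackness for summing n numbers on p processors."""
    t1 = n - 1                          # work: total number of additions
    t_inf = math.ceil(math.log2(n))     # span: additions on the critical path
    return {
        "work T_1": t1,
        "span T_inf": t_inf,
        "parallelism T_1/T_inf": t1 / t_inf,
        "speedup bound min(p, T_1/T_inf)": min(p, t1 / t_inf),  # work and span laws
        "slackness T_1/(p*T_inf)": t1 / (p * t_inf),
    }

print(measures(n=1024, p=16))   # slackness > 1: near-linear speedup is possible
print(measures(n=1024, p=512))  # slackness < 1: perfect linear speedup is impossible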


Execution on a limited number of processors

Analysis of parallel algorithms is usually carried out under the assumption that an unbounded number of processors is available. This is unrealistic, but not a problem, since any computation that can run in parallel on N processors can be executed on p < N processors by letting each processor execute multiple units of work. A result called Brent's law states that one can perform such a "simulation" in time T_p, bounded by
:T_p \leq T_N + \frac{T_1 - T_N}{p},
or, less precisely,
:T_p = O\left(T_N + \frac{T_1}{p}\right).
An alternative statement of the law bounds T_p above and below by
:\frac{T_1}{p} \leq T_p \leq \frac{T_1}{p} + T_\infty,
showing that the span (depth) T_\infty and the work T_1 together provide reasonable bounds on the computation time.
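
As a rough illustration of these bounds (an assumption-laden sketch, not taken from the source), the helper below evaluates the lower bound max(T_1/p, T_\infty) implied by the work and span laws and the upper bound T_1/p + T_\infty given by Brent's law for a few processor counts.

def brent_bounds(t1, t_inf, p):
    """Lower bound (work and span laws) and upper bound (Brent's law) on T_p."""
    lower = max(t1 / p, t_inf)   # T_p >= T_1/p and T_p >= T_inf
    upper = t1 / p + t_inf       # greedy scheduling achieves T_p <= T_1/p + T_inf
    return lower, upper

for p in (1, 8, 64, 10**6):
    lo, hi = brent_bounds(t1=10_000, t_inf=50, p=p)
    print(f"p={p:>7}: {lo:9.1f} <= T_p <= {hi:9.1f}")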

