High performance computing High-performance computing (HPC) is the use of supercomputers and computer clusters to solve advanced computation problems. Overview HPC integrates systems administration (including network and security knowledge) and parallel programming into ...

applications run on

massively parallel Massively parallel is the term for using a large number of computer processors (or separate computers) to simultaneously perform a set of coordinated computations in parallel. GPUs are massively parallel architecture with tens of thousands of ...

supercomputers A supercomputer is a type of computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instru ...

consist of concurrent programs designed using

multi-threaded In computer architecture, multithreading is the ability of a central processing unit (CPU) (or a single core in a multi-core processor) to provide multiple threads of execution. Overview The multithreading paradigm has become more popular a ...

, multi-process models. The applications may consist of various constructs ( threads, local processes, distributed processes, etc.) with varying degree of parallelism. Although high performance concurrent programs use similar design patterns, models and principles as that of sequential programs, unlike sequential programs, they typically demonstrate non-deterministic behavior. The probability of bugs increases with the number of interactions between the various parallel constructs.

Race conditions A race condition or race hazard is the condition of an electronics, software, or other system where the system's substantive behavior is dependent on the sequence or timing of other uncontrollable events, leading to unexpected or inconsistent ...

, data races, deadlocks, missed signals and live lock are common error types.

Challenges

Parallel programs can be divided into two general categories: explicitly and

implicitly parallel In computer science, implicit parallelism is a characteristic of a programming language that allows a compiler or interpreter to automatically exploit the parallelism inherent to the computations expressed by some of the language's construct ...

. Using parallel language constructs defined for process creation,

communication Communication is commonly defined as the transmission of information. Its precise definition is disputed and there are disagreements about whether Intention, unintentional or failed transmissions are included and whether communication not onl ...

and

synchronization Synchronization is the coordination of events to operate a system in unison. For example, the Conductor (music), conductor of an orchestra keeps the orchestra synchronized or ''in time''. Systems that operate with all parts in synchrony are sa ...

make an application explicitly parallel. Using a tool or parallelizing compiler to convert a serial program into a parallel one, makes it implicitly parallel. Both categories are equally bug-prone.

Heisenbugs

Concurrent applications should execute correctly on every possible thread schedule in the underlying operating system. However, traditional testing methods detect few bugs, chiefly because of the Heisenbugs problem. A Heisenbug is an error that changes or disappears when an attempt is made to isolate and probe them via

debugger A debugger is a computer program used to test and debug other programs (the "target" programs). Common features of debuggers include the ability to run or halt the target program using breakpoints, step through code line by line, and display ...

, by adding some constructs such as synchronization requests or delay statements.

Non-repeatability

Another issue is caused due to the unpredictable behavior of the

scheduler A schedule (, ) or a timetable, as a basic time-management tool, consists of a list of times at which possible tasks, events, or actions are intended to take place, or of a sequence of events in the chronological order in which such things ...

. Differences in system load influence scheduler behavior. This behavior cannot be changed manually. To counter this indeterminism, the program must be executed many times under various execution environments. Still, it is not guaranteed that a bug can be reproduced. Most of the time, the program runs correctly, and the bug is visible only when specific conditions are matched. As a result, non-repeatability of the concurrent programs is a major source of roadblock for detecting error. As an example, consider the following. Clearly, this has a problem of causing deadlocks. Yet, it may cause deadlock in some runs of the program while in others, it may run successfully.

Probe effect

Probe effect is seen in parallel programs when delay-statements are inserted in parallel programs facing synchronization problems. This effect, like Heisenbugs, alters behavior changes that may obscure problems. Detecting the source of a probe effect is a great challenge in testing parallel applications.
The main difference between Probe effect and Heisenbugs is that Heisenbugs are seen when additional delay statements and/or synchronization requests are added to the concurrent application during testing, while probe effect is seen when the developer adds delay statements to concurrent applications with poor synchronization.

Testing strategies

The differences between sequential and concurrent programs lead to the differences in their testing strategies. Strategies for sequential programs can be modified to make them suitable for concurrent applications. Specialized strategies have also been developed. Conventionally, testing includes designing test cases and checking that the program produces the expected results. Thus, errors in specification, functionality, etc. are detected by running the application and subjecting it to testing methods such as

Functional Testing In software development, functional testing is a form of software testing that verifies whether a system meets its functional requirements.ISO/IEC/IEEE 24765:2017, "Systems and software engineering — Vocabulary", International Organization fo ...

, White Box,

Black Box In science, computing, and engineering, a black box is a system which can be viewed in terms of its inputs and outputs (or transfer characteristics), without any knowledge of its internal workings. Its implementation is "opaque" (black). The te ...

and Grey Box Testing.

Static analysis Static analysis, static projection, or static scoring is a simplified analysis wherein the effect of an immediate change to a system is calculated without regard to the longer-term response of the system to that change. If the short-term effect i ...

is also used for detecting errors in high performance software using methods such as

Data Flow Analysis Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program. It forms the foundation for a wide variety of compiler optimizations and program verification techn ...

Control Flow Analysis In computer science, control-flow analysis (CFA) is a static-code-analysis technique for determining the control flow of a program. The control flow is expressed as a control-flow graph (CFG). For both functional programming languages and objec ...

, Cyclomatic Complexities, Thread Escape Analysis and Static Slicing Analysis to find problems. Using static analysis before functionality testing can save time. It can detect ‘what the error is’ find the error source. Static analysis techniques can detect problems like lack of

, improper synchronizations, predict occurrence of

deadlock Deadlock commonly refers to: * Deadlock (computer science), a situation where two processes are each waiting for the other to finish * Deadlock (locksmithing) or deadbolt, a physical door locking mechanism * Political deadlock or gridlock, a si ...

s and ''post-wait'' errors in rendezvous requests. Details:

Deterministic scheduling / reproducible testing

The indeterminacy of scheduling has two sources. :1. Time slicing ::Scheduler switches contexts at equal intervals of time. This interval depends on the speed of individual processors, memory-cache hierarchy state and system load. Even on the same processor, under the same load, the interval varies slightly due to minor variations in frequency of the system clock. :2. Page Faults ::Scheduler suspends a program if a

page fault In computing, a page fault is an exception that the memory management unit (MMU) raises when a process accesses a memory page without proper preparations. Accessing the page requires a mapping to be added to the process's virtual address space ...

occurs, letting other threads proceed while the system fetches the page. As the occurrence of page faults depends upon what other processes are running, the timing of a context switch becomes indeterminate. To make concurrent programs repeatable, an external scheduler is used. The program under test is instrumented to add calls to this scheduler. Such calls are made at the beginning and end of each thread as well as before every synchronization request. This scheduler selectively blocks threads of execution by maintaining a semaphore associated with each thread, such that only one thread is ready for execution at any given time. Thus, it converts parallel non-deterministic application into a serial execution sequence in order to achieve repeatability. The number of scheduling decisions made by the serializing scheduler is given by – (N * K / P)*
Where
N = number of threads
K = potential context switch points
P = budget of pre-emptive context switches

Feedback-directed testing

To obtain more accurate results using deterministic scheduling, an alternate approach can be chosen. A few properly-placed pre-emptions in the concurrent program can detect bugs related to data-races. Bugs are found in clusters. The existence of one bug establishes a high probability of more bugs in the same region of code. Thus each pass of the testing process identifies sections of code with bugs. The next pass more thoroughly scrutinizes those sections by adding scheduler calls around them. Allowing the problematic locations to execute in a different order can reveal unexpected behavior.

Timing-related testing

This strategy ensures that the application is not prone to the Probe Effect. Sources of errors that cause the Probe Effect can range from task creation issues to synchronization and communication problems. Requirements of timing related tests: * Delay duration must vary * Delay points must cover appropriate program locations * Delay statements must be inserted, removed or relocated to induce Probe Effect The number of test cases per input data set is: ⁿC₁ + ⁿC₁ + … + ⁿC₁ = 2ⁿ -1
Where n = total number of synchronization, process creation and communication calls. This equation has exponential order. In order to reduce the number of test cases, either Deterministic Execution Method (DET) or Multiple Execution Technique (MET) is used. Various issues must be handled: * Delayed execution :Addition of delays is a straightforward task. A typical sleep() statement can be used to insert delays. *Deciding where to insert delays :Insertion locations are known as delay-points. As the objective of timing related test cases is to detect synchronization, communication and thread creation errors, delay statements are added just in front of these statements.

Advantages

* Easy to implement on multiple processors without any need of ordering the synchronization requests. * No need to generate concurrency graph * More effective for fault detection * Total number of test cases are less, yet code coverage is more, due to static analysis

All Du-Path testing

This method applies the concept of define-use pair, in order to determine the paths to be tested.

Verification strategies

Software verification Software verification is a discipline of software engineering, programming languages, and theory of computation whose goal is to assure that software satisfies the expected requirements. Broad scope and classification A broad definition of verif ...

is a process that proves that software is working correctly and is performing the intended task as designed.

Test calculations

Input is given to the system to generate a known result. This input-result pair can be obtained from previous empirical results and/or manual calculations. This is a system-level test that can be performed only when all relevant modules are integrated. Moreover, it only shows that bugs exist. It offers no detailed information regarding the number of bugs, their location or nature.

Symmetry tests

These tests are primarily used for scientific simulations. The output of the simulation often cannot be predicted. Since these simulations attempt to describe scientific laws, any symmetries in the theory must be honored by the simulation. Thus, by varying input conditions along the lines of symmetry and then comparing the obtained results with externally derived results, the existence of bugs can be detected. Hpc graph

In scientific computing most data lies in the central region of the simulation conditions. As a result, it is difficult to perform Boundary-value testing with real time experimental data. Thus, center of simulation (for example, for data value of 10 in Figure 1) is shifted to one of the boundaries so as to test the boundary condition effectively.

Parallel implementation tests

Parallel implementation tests are usually used for applications that use

distributed memory In computer science, distributed memory refers to a Multiprocessing, multiprocessor computer system in which each Central processing unit, processor has its own private Computer memory, memory. Computational tasks can only operate on local data ...

programming models such as

message passing In computer science, message passing is a technique for invoking behavior (i.e., running a program) on a computer. The invoking program sends a message to a process (which may be an actor or object) and relies on that process and its supporting ...

. These tests are often applied to programs that use regular grids of processors.

Global summation

Many parallel databases use distributed parallel processing to execute the queries. While executing an aggregate function such as sum, the following strategy is used: * Compute a partial sum locally and concurrently at each processor using the data present in the associated disk partition with it. * Add these local subtotals to get the final result. The final result may contain some rounding error as each processor independently rounds-off local results. One test is to ensure that such errors do not occur. This requires showing that the aggregate sum is decomposition-independent. An alternate summation scheme is to send all of the individual values to one processor for summation. This result can be compared with the distributed result to ensure consistency.

Tools

Microsoft CHESS

This tool eliminates non-determinacy using deterministic scheduling. It tracks schedule paths executed previously and guarantees that each time a new schedule path is executed.

References

{{Reflist, 30em Software testing