Mutation testing (or ''mutation analysis'' or ''program mutation'') is used to design new software tests and evaluate the quality of existing software tests. Mutation testing involves modifying a program in small ways. Each mutated version is called a ''mutant'', and tests detect and reject mutants by causing the behaviour of the mutant to differ from that of the original version. This is called ''killing'' the mutant. Test suites are measured by the percentage of mutants that they kill. New tests can be designed to kill additional mutants. Mutants are based on well-defined ''mutation operators'' that either mimic typical programming errors (such as using the wrong operator or variable name) or force the creation of valuable tests (such as dividing each expression by zero). The purpose is to help the tester develop effective tests or locate weaknesses in the test data used for the program or in sections of the code that are seldom or never accessed during execution. Mutation testing is a form of white-box testing.

Introduction

Most of this article is about "program mutation", in which the program is modified. A more general definition of ''mutation analysis'' is using well-defined rules, defined on syntactic structures, to make systematic changes to software artifacts (Paul Ammann and Jeff Offutt, Introduction to Software Testing, Cambridge University Press, 2008). Mutation analysis has been applied to other problems, but is usually applied to testing. ''Mutation testing'' is therefore defined as using mutation analysis to design new software tests or to evaluate existing software tests. Mutation analysis and testing can be applied to design models, specifications, databases, tests, XML, and other types of software artifacts, although program mutation is the most common.

Overview

Tests can be created to verify the correctness of the implementation of a given software system, but creating tests still raises the question of whether the tests themselves are correct and sufficiently cover the requirements that gave rise to the implementation. (This technological problem is itself an instance of a deeper philosophical problem named "Quis custodiet ipsos custodes?", that is, "Who will guard the guards?") The idea behind mutation testing is that introducing a mutant normally causes a bug in the program's functionality, which the tests should find. This way, the tests are tested. If a mutant is not detected by the test suite, this typically indicates that the test suite is unable to locate the faults represented by the mutant, but it can also indicate that the mutation introduces no faults, that is, the mutation is a valid change that does not affect functionality. One common way a mutant can be valid is that the changed code is "dead code" that is never executed.

For mutation testing to function at scale, a large number of mutants are usually introduced, leading to the compilation and execution of an extremely large number of copies of the program. This expense had reduced the practical use of mutation testing as a method of software testing. However, the increased use of object-oriented programming languages and unit testing frameworks has led to the creation of mutation testing tools that test individual portions of an application.

Goals

The goals of mutation testing are multiple:
* identify weakly tested pieces of code (those for which mutants are not killed)
* identify weak tests (those that never kill mutants)
* compute the mutation score, which is the number of mutants killed divided by the total number of mutants
* learn about error propagation and state infection in the program
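The mutation score mentioned in the goals above can be sketched in a few lines of C++. This is only an illustration: the `MutantStatus` type and the `mutation_score` function are hypothetical names introduced here, and returning 0 for an empty set of mutants is an assumption rather than an established convention.

```cpp
#include <cstddef>
#include <vector>

// Outcome of running the test suite against one mutant.
// (Hypothetical type, introduced only for this illustration.)
enum class MutantStatus { Killed, Survived };

// Mutation score = mutants killed / total mutants, in the range [0, 1].
// Assumption: an empty set of mutants yields 0.0, which avoids dividing by zero.
double mutation_score(const std::vector<MutantStatus>& results) {
    if (results.empty()) {
        return 0.0;
    }
    std::size_t killed = 0;
    for (MutantStatus status : results) {
        if (status == MutantStatus::Killed) {
            ++killed;
        }
    }
    return static_cast<double>(killed) / static_cast<double>(results.size());
}
```

Under these assumptions, a test suite that kills three out of four mutants receives a score of 0.75.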

History

Mutation testing was originally proposed by Richard Lipton as a student in 1971 (A. Jefferson Offutt and Roland H. Untch, "Mutation 2000: Uniting the Orthogonal"), and first developed and published by DeMillo, Lipton and Sayward (Richard A. DeMillo, Richard J. Lipton, and Fred G. Sayward, "Hints on test data selection: Help for the practicing programmer", IEEE Computer, 11(4):34-41, April 1978). The first implementation of a mutation testing tool was by Timothy Budd as part of his PhD work (titled ''Mutation Analysis'') in 1980 at Yale University (Tim A. Budd, "Mutation Analysis of Program Test Data", PhD thesis, Yale University, New Haven CT, 1980).

Recently, with the availability of massive computing power, there has been a resurgence of mutation analysis within the computer science community, and work has been done to define methods of applying mutation testing to object-oriented programming languages and non-procedural languages such as XML, SMV, and finite state machines.

In 2004 a company called Certess Inc. (now part of Synopsys) extended many of the principles into the hardware verification domain. Whereas mutation analysis only expects to detect a difference in the output produced, Certess extends this by verifying that a checker in the testbench will actually detect the difference. This extension means that all three stages of verification, namely activation, propagation and detection, are evaluated. They called this functional qualification.

Fuzzing can be considered to be a special case of mutation testing. In fuzzing, the messages or data exchanged inside communication interfaces (both inside and between software instances) are mutated to catch failures or differences in processing the data. Codenomicon (2001) and Mu Dynamics (2005) evolved fuzzing concepts to a fully stateful mutation testing platform, complete with monitors for thoroughly exercising protocol implementations.

Mutation testing overview

Mutation testing is based on two hypotheses. The first is the ''competent programmer'' hypothesis, which states that competent programmers write programs that are close to being correct. "Close" is intended to be based on behavior, not syntax. The second hypothesis is called the ''coupling effect''. The coupling effect asserts that simple faults can cascade or ''couple'' to form other emergent faults (A. Jefferson Offutt, "Investigations of the software testing coupling effect", ACM Trans. Softw. Eng. Methodol. 1, 1 (January 1992), 5-20; A. T. Acree, T. A. Budd, R. A. DeMillo, R. J. Lipton, and F. G. Sayward, "Mutation Analysis", Georgia Institute of Technology, Atlanta, Georgia, Technical Report GIT-ICS-79/08, 1979).

Subtle and important faults are also revealed by higher-order mutants, which further support the coupling effect (Yue Jia and M. Harman, "Constructing Subtle Faults Using Higher Order Mutation Testing", Eighth IEEE International Working Conference on Source Code Analysis and Manipulation, pp. 249-258, 28-29 Sept. 2008; Maryam Umar, "An Evaluation of Mutation Operators For Equivalent Mutants", MS Thesis, 2006; Smith B., "On Guiding Augmentation of an Automated Test Suite via Mutation Analysis", 2008; Polo M. and Piattini M., "Mutation Testing: practical aspects and cost analysis", University of Castilla-La Mancha (Spain), Presentation, 2009; Anderson S., "Mutation Testing", the University of Edinburgh, School of Informatics, Presentation, 2011). Higher-order mutants are created by applying more than one mutation to the same program.

Mutation testing is done by selecting a set of mutation operators and then applying them to the source program one at a time for each applicable piece of the source code. The result of applying one mutation operator to the program is called a ''mutant''. If the test suite is able to detect the change (i.e. one of the tests fails), then the mutant is said to be ''killed''.
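The process just described, running the test suite against each mutant and recording which ones are killed, can be sketched as follows. This is a minimal illustration under stated assumptions, not how any particular tool works: the program under test is modeled as a function of two integers, the mutants of `a + b` are written by hand instead of being generated, and all names (`Program`, `TestCase`, `is_killed`, `count_killed`, `example_mutants`) are hypothetical.

```cpp
#include <functional>
#include <vector>

// The program under test, modeled as a function of two ints.
// (An assumption made to keep the sketch small.)
using Program = std::function<int(int, int)>;

// One test: inputs plus the output expected from the original program.
struct TestCase {
    int a;
    int b;
    int expected;
};

// A mutant is killed if at least one test fails when run against it.
bool is_killed(const Program& mutant, const std::vector<TestCase>& tests) {
    for (const TestCase& test : tests) {
        if (mutant(test.a, test.b) != test.expected) {
            return true;
        }
    }
    return false;
}

// Runs the whole suite against every mutant and counts the killed ones.
int count_killed(const std::vector<Program>& mutants,
                 const std::vector<TestCase>& tests) {
    int killed = 0;
    for (const Program& mutant : mutants) {
        if (is_killed(mutant, tests)) {
            ++killed;
        }
    }
    return killed;
}

// Hand-written first-order mutants of the original program `a + b`.
std::vector<Program> example_mutants() {
    return {
        [](int a, int b) { return a - b; },  // + replaced with -
        [](int a, int b) { return a * b; },  // + replaced with *
        [](int a, int b) { return a + a; },  // variable b replaced with a
    };
}
```

With the single test (a = 2, b = 2, expected 4), only the subtraction mutant is killed, because 2 * 2 and 2 + 2 both equal 4. Adding the test (a = 1, b = 3, expected 4) kills the remaining two mutants, which illustrates how surviving mutants point to tests worth adding.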
For example, consider the following C++ code fragment:

    if (a && b) {
        c = 1;
    } else {
        c = 0;
    }

The condition mutation operator would replace && with || and produce the following mutant:

    if (a || b) {
        c = 1;
    } else {
        c = 0;
    }

Now, for a test to kill this mutant, the following three conditions should be met:
# A test must ''reach'' the mutated statement.
# Test input data should ''infect'' the program state by causing different program states for the mutant and the original program. For example, a test with a = 1 and b = 0 would do this.
# The incorrect program state (the value of 'c') must ''propagate'' to the program's output and be checked by the test.

These conditions are collectively called the ''RIP model''. ''Weak mutation testing'' (or ''weak mutation coverage'') requires that only the first and second conditions are satisfied. ''Strong mutation testing'' requires that all three conditions are satisfied. Strong mutation is more powerful, since it ensures that the test suite can really catch the problems. Weak mutation is closely related to code coverage methods. It requires much less computing power to ensure that the test suite satisfies weak mutation testing than strong mutation testing.

However, there are cases where it is not possible to find a test case that could kill a mutant, because the resulting program is behaviorally equivalent to the original one. Such mutants are called ''equivalent mutants''. Detecting equivalent mutants is one of the biggest obstacles to the practical use of mutation testing. The effort needed to check whether mutants are equivalent can be very high even for small programs. A systematic literature review of a wide range of approaches to overcome the Equivalent Mutant Problem identified 17 relevant techniques (in 22 articles) and three categories of techniques: detecting (DEM); suggesting (SEM); and avoiding equivalent mutant generation (AEMG). The experiment indicated that Higher Order Mutation in general and the JudyDiffOp strategy in particular provide a promising approach to the Equivalent Mutant Problem.

In addition to equivalent mutants, there are ''subsumed mutants'': a mutant B is said to be "subsumed" by another mutant A when every test that kills A also kills B. Subsumed mutants are redundant, since they provide no information about the test suite beyond what the subsuming mutant already provides. For example, suppose two mutants, A and B, change the same line of code, and every test that fails on Mutant A also fails on Mutant B. In this case, Mutant B is subsumed by Mutant A: once a test kills Mutant A, Mutant B is known to be killed as well, so it does not need to be tested separately.
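The difference between weak and strong mutation, and the notion of an equivalent mutant, can be illustrated with a small sketch. It reuses the `a && b` fragment from the example above, but everything else is an assumption made for illustration: the `scale` parameter is added only so that an infected state can fail to propagate, and the names `Trace`, `run_original`, `run_mutant`, `weakly_kills`, `strongly_kills`, `loop_original` and `loop_mutant` are hypothetical.

```cpp
// Internal state and final output of one run.
struct Trace {
    int c;       // program state right after the (possibly mutated) statement
    int output;  // value the program finally returns
};

// Original program: the fragment from the text, followed by an output step.
// Assumption: `scale` is added so that infection can be masked (scale == 0).
Trace run_original(bool a, bool b, int scale) {
    int c;
    if (a && b) {
        c = 1;
    } else {
        c = 0;
    }
    return Trace{c, c * scale};
}

// Mutant: && replaced with ||.
Trace run_mutant(bool a, bool b, int scale) {
    int c;
    if (a || b) {
        c = 1;
    } else {
        c = 0;
    }
    return Trace{c, c * scale};
}

// Weak mutation: the statement is reached (always, here) and the state differs.
bool weakly_kills(bool a, bool b, int scale) {
    return run_original(a, b, scale).c != run_mutant(a, b, scale).c;
}

// Strong mutation: the difference also propagates to the checked output.
bool strongly_kills(bool a, bool b, int scale) {
    return run_original(a, b, scale).output != run_mutant(a, b, scale).output;
}

// Equivalent mutant: index starts at 0 and rises by exactly 1, so it reaches
// 10 exactly and replacing == with >= cannot change the behaviour.
int loop_original() {
    int index = 0;
    while (true) {
        ++index;
        if (index == 10) break;
    }
    return index;
}

int loop_mutant() {
    int index = 0;
    while (true) {
        ++index;
        if (index >= 10) break;  // == replaced with >=
    }
    return index;
}
```

In this sketch the input a = 1, b = 0 with `scale` equal to 0 infects the state (the values of `c` differ) but does not propagate, so it kills the mutant only in the weak sense; any nonzero `scale` makes the same input a strong kill. No test can distinguish `loop_mutant` from `loop_original`, which is what makes it an equivalent mutant.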

Mutation operators

Many mutation operators have been explored by researchers. Here are some examples of mutation operators for imperative languages:
* Statement deletion
* Statement duplication or insertion, e.g. goto fail;
* Replacement of boolean subexpressions with ''true'' and ''false''
* Replacement of some arithmetic operations with others, e.g. + with *, - with /
* Replacement of some boolean relations with others, e.g. > with >= and <=
* Replacement of variables with others from the same scope (variable types must be compatible)
* Removal of a method body, implemented in Pitest

These mutation operators are also called traditional mutation operators. There are also mutation operators for object-oriented languages, for concurrent constructions, complex objects like containers, etc. Operators for containers are called ''class-level'' mutation operators. For example, the muJava tool offers various class-level mutation operators such as Access Modifier Change, Type Cast Operator Insertion, and Type Cast Operator Deletion. Mutation operators have also been developed to perform security vulnerability testing of programs (H. Shahriar and M. Zulkernine, "Mutation-based Testing of Buffer Overflows, SQL Injections, and Format String Bugs").
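One of the traditional operators listed above, replacing an arithmetic or relational operator with another, can be sketched as a function that produces one mutant per occurrence of the operator. This is a deliberately naive, text-based illustration: the `replace_operator` function is a hypothetical name, and a real tool would normally work on a parsed syntax tree so that, for instance, a `+` inside `++` or inside a string literal is not mutated.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Generates first-order mutants of `source`: each mutant replaces exactly one
// occurrence of the operator text `from` with `to`.
// Assumption: plain substring matching is good enough for the illustration.
std::vector<std::string> replace_operator(const std::string& source,
                                          const std::string& from,
                                          const std::string& to) {
    std::vector<std::string> mutants;
    if (from.empty()) {
        return mutants;  // nothing to search for
    }
    std::size_t pos = source.find(from);
    while (pos != std::string::npos) {
        std::string mutant = source;           // start from the original text
        mutant.replace(pos, from.size(), to);  // mutate a single occurrence
        mutants.push_back(mutant);
        pos = source.find(from, pos + from.size());
    }
    return mutants;
}
```

Applied to the expression `x + y + z` with `+` replaced by `*`, the sketch yields the two mutants `x * y + z` and `x + y * z`, each containing a single change.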
