In software development, effort estimation is the process of predicting the most realistic amount of effort (expressed in terms of person-hours or money) required to develop or maintain software based on incomplete, uncertain and noisy input. Effort estimates may be used as input to project plans, iteration plans, budgets, investment analyses, pricing processes and bidding rounds.


State of practice

Published surveys on estimation practice suggest that expert estimation is the dominant strategy when estimating software development effort. Typically, effort estimates are over-optimistic, and there is strong over-confidence in their accuracy. The mean effort overrun seems to be about 30% and is not decreasing over time. Reviews of effort estimation error surveys are available in the literature; however, the measurement of estimation error is itself problematic (see Assessing the accuracy of estimates below). The strong over-confidence in the accuracy of effort estimates is illustrated by the finding that, on average, when a software professional is 90% confident or "almost sure" that a minimum-maximum interval will include the actual effort, the observed frequency of the interval including the actual effort is only 60-70%.

Currently, the term "effort estimate" is used to denote different concepts, such as the most likely use of effort (modal value), the effort that corresponds to a 50% probability of not being exceeded (median), the planned effort, the budgeted effort, or the effort used to propose a bid or price to a client. This is believed to be unfortunate, because communication problems may occur and because the concepts serve different goals.
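The calibration of effort prediction intervals can be checked directly against completed projects by counting how often the actual effort falls inside the stated minimum-maximum interval. The following is a minimal Python sketch of such a hit-rate check; all project records in it are hypothetical.

 # Minimal sketch: hit rate of 90%-confidence effort prediction intervals.
 # All project data below are hypothetical (person-hours).
 projects = [
     # (minimum estimate, maximum estimate, actual effort)
     (80, 120, 110),
     (200, 300, 340),   # actual effort outside the interval
     (50, 90, 70),
     (400, 600, 650),   # actual effort outside the interval
     (150, 250, 240),
 ]

 hits = sum(1 for low, high, actual in projects if low <= actual <= high)
 hit_rate = hits / len(projects)

 # Well-calibrated 90% intervals should include the actual effort about
 # 90% of the time; the surveys above report observed rates of only 60-70%.
 print(f"hit rate: {hit_rate:.0%}")   # -> hit rate: 60%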


History

Software researchers and practitioners have been addressing the problems of effort estimation for software development projects since at least the 1960s; see, e.g., work by Farr and Nelson. Most of the research has focused on the construction of formal software effort estimation models. The early models were typically based on regression analysis or mathematically derived from theories in other domains. Since then, a large number of model-building approaches have been evaluated, such as approaches founded on case-based reasoning, classification and regression trees, simulation, neural networks, Bayesian statistics, lexical analysis of requirement specifications, genetic programming, linear programming, economic production models, soft computing, fuzzy logic modeling, statistical bootstrapping, and combinations of two or more of these models. Perhaps the most common estimation methods today are the parametric estimation models COCOMO, SEER-SEM and SLIM. They have their basis in estimation research conducted in the 1970s and 1980s and have since been updated with new calibration data, the last major release being COCOMO II in the year 2000. The estimation approaches based on functionality-based size measures, e.g., function points, are also rooted in research conducted in the 1970s and 1980s, but were re-calibrated with modified size measures and different counting approaches, such as use case points or object points, in the 1990s.


Estimation approaches

There are many ways of categorizing estimation approaches; see, for example, the reviews in the literature. The top-level categories are the following:

* Expert estimation: the quantification step, i.e., the step where the estimate is produced, is based on judgmental processes.
* Formal estimation model: the quantification step is based on mechanical processes, e.g., the use of a formula derived from historical data (a minimal sketch follows below).
* Combination-based estimation: the quantification step is based on a judgmental and mechanical combination of estimates from different sources.
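As a minimal illustration of a formal estimation model, the sketch below applies the basic COCOMO 81 formula, effort = a × KLOC^b (in person-months), with the published coefficients for its three project modes; the 32 KLOC input is a made-up example, not data from any real project.

 # Minimal sketch of a formal estimation model: basic COCOMO 81.
 # Effort (person-months) = a * KLOC**b, using the published coefficients.
 COEFFICIENTS = {
     "organic":       (2.4, 1.05),
     "semi-detached": (3.0, 1.12),
     "embedded":      (3.6, 1.20),
 }

 def basic_cocomo(kloc: float, mode: str = "organic") -> float:
     """Estimated effort in person-months for `kloc` thousand lines of code."""
     a, b = COEFFICIENTS[mode]
     return a * kloc ** b

 # Hypothetical 32 KLOC project developed in the 'organic' mode:
 print(f"{basic_cocomo(32):.0f} person-months")   # -> 91 person-months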


Selection of estimation approaches

The evidence on differences in estimation accuracy between estimation approaches and models suggests that there is no "best approach" and that the relative accuracy of one approach or model compared to another depends strongly on context. This implies that different organizations benefit from different estimation approaches. Findings that may support the selection of an estimation approach based on its expected accuracy include:

* Expert estimation is on average at least as accurate as model-based effort estimation. In particular, situations with unstable relationships, or where highly important information is not included in the model, may suggest the use of expert estimation. This assumes, of course, that experts with relevant experience are available.
* Formal estimation models not tailored to a particular organization's own context may be very inaccurate. Use of the organization's own historical data is consequently crucial if one cannot be sure that the estimation model's core relationships (e.g., formula parameters) are based on similar project contexts.
* Formal estimation models may be particularly useful in situations where the model is tailored to the organization's context (either through use of its own historical data or because the model is derived from similar projects and contexts), and where the experts' estimates are likely to be subject to a strong degree of wishful thinking.

The most robust finding, in many forecasting domains, is that combining estimates from independent sources, preferably ones applying different approaches, will on average improve estimation accuracy (see the sketch below for a minimal illustration). It is important to be aware of the limitations of each traditional approach to measuring software development productivity. In addition, other factors, such as the ease of understanding and communicating the results of an approach, the ease of use of an approach, and the cost of introducing an approach, should be considered in the selection process.
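A minimal sketch of such a mechanical combination, assuming hypothetical estimates from two experts and one parametric model (all in person-hours):

 # Minimal sketch: mechanical combination of independent effort estimates.
 # The individual estimates (person-hours) are hypothetical.
 from statistics import mean, median

 estimates = {"expert A": 120, "expert B": 180, "parametric model": 210}

 values = list(estimates.values())
 print(f"mean combination:   {mean(values):.0f} person-hours")   # -> 170
 print(f"median combination: {median(values):.0f} person-hours") # -> 180

 # The median is less sensitive to a single extreme estimate, which matters
 # when one source is strongly over-optimistic.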


Assessing the accuracy of estimates

The most common measure of average estimation accuracy is the MMRE (Mean Magnitude of Relative Error), where the MRE of each estimate is defined as:

 MRE = |actual effort − estimated effort| / actual effort

and MMRE is the mean of the MRE values over all estimates. This measure has been criticized, and there are several alternative measures, such as more symmetric measures, the Weighted Mean of Quartiles of relative errors (WMQ) and the Mean Variation from Estimate (MVFE). MRE is not reliable if the individual items are skewed, so PRED(25) is often preferred as a measure of estimation accuracy. PRED(25) measures the percentage of predicted values that are within 25 percent of the actual value.

A high estimation error cannot automatically be interpreted as an indicator of low estimation ability. Alternative, competing or complementing reasons include low cost control in the project, high complexity of the development work, and more delivered functionality than originally estimated. Frameworks for improved use and interpretation of estimation error measurement have been proposed in the literature.
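The sketch below computes MRE, MMRE and PRED(25) directly from these definitions; the (actual, estimated) effort pairs are made up for illustration.

 # Minimal sketch: MMRE and PRED(25) from (actual, estimated) effort pairs.
 # The data points are hypothetical, in person-hours.
 pairs = [(100, 80), (250, 300), (60, 50), (400, 260), (120, 115)]

 def mre(actual: float, estimated: float) -> float:
     """Magnitude of Relative Error of a single estimate."""
     return abs(actual - estimated) / actual

 mres = [mre(a, e) for a, e in pairs]
 mmre = sum(mres) / len(mres)                               # mean MRE
 pred_25 = sum(1 for m in mres if m <= 0.25) / len(mres)    # within 25%

 print(f"MMRE:     {mmre:.2f}")      # -> MMRE:     0.19
 print(f"PRED(25): {pred_25:.0%}")   # -> PRED(25): 80%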


Psychological issues

There are many psychological factors that potentially explain the strong tendency towards over-optimistic effort estimates and that need to be dealt with to increase the accuracy of effort estimates. These factors are essential even when using formal estimation models, because much of the input to these models is judgment-based. Factors that have been demonstrated to be important are wishful thinking, anchoring, the planning fallacy and cognitive dissonance. A discussion of these and other factors can be found in work by Jørgensen and Grimstad.

* It's easy to estimate what is known.
* It's hard to estimate what is known to be unknown (known unknowns).
* It's very hard to estimate what is not known to be unknown (unknown unknowns).


Humor

The chronic underestimation of development effort has led to the coinage and popularity of numerous humorous adages, such as ironically referring to a task as a "small matter of programming" (when much effort is likely required), and citing laws about underestimation:

* Ninety-ninety rule: "The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time."
* Hofstadter's law: "It always takes longer than you expect, even when you take into account Hofstadter's Law."
* Fred Brooks' law: "Adding manpower to a late software project makes it later."


Comparison of development estimation software


See also


References
