
The root mean square deviation (RMSD) or root mean square error (RMSE) is either of two closely related and frequently used measures of the differences between true or predicted values on the one hand and observed values or an estimator on the other. The deviation is typically simply a difference of scalars; it can also be generalized to the vector lengths of a displacement, as in the bioinformatics concept of root mean square deviation of atomic positions.


RMSD of a sample

The RMSD of a sample is the quadratic mean of the differences between the observed values and predicted ones. These deviations are called ''residuals'' when the calculations are performed over the data sample that was used for estimation (and are therefore always in reference to an estimate) and are called ''errors'' (or prediction errors) when computed out-of-sample (that is, on the full set, referencing a true value rather than an estimate). The RMSD aggregates the magnitudes of the errors in predictions for various data points into a single measure of predictive power. RMSD is a measure of accuracy used to compare forecasting errors of different models for a particular dataset, and not between datasets: because the measure depends on the scale of the numbers used, comparisons across different types of data would be invalid.

RMSD is always non-negative, and a value of 0 (almost never achieved in practice) would indicate a perfect fit to the data. In general, a lower RMSD is better than a higher one. RMSD is the square root of the average of squared errors. The effect of each error on RMSD is proportional to the size of the squared error; thus larger errors have a disproportionately large effect on RMSD. Consequently, RMSD is sensitive to outliers.
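The scale dependence described above can be illustrated with a minimal Python sketch. The function name `rmsd` and the numbers are illustrative assumptions, not part of any particular library or dataset.

```python
import math

def rmsd(predicted, observed):
    """Square root of the mean squared difference between paired values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical measurements in metres; every prediction is off by 0.5 m.
observed_m = [1.0, 2.0, 3.0, 4.0]
predicted_m = [1.5, 2.5, 2.5, 3.5]

# The same data expressed in millimetres.
observed_mm = [v * 1000 for v in observed_m]
predicted_mm = [v * 1000 for v in predicted_m]

rmsd_m = rmsd(predicted_m, observed_m)      # 0.5 (metres)
rmsd_mm = rmsd(predicted_mm, observed_mm)   # 500.0 (millimetres)
perfect = rmsd(observed_m, observed_m)      # 0.0 for a perfect fit
```

The fit is identical in both cases, yet the RMSD changes by the same factor as the units, which is why raw RMSD values are only comparable on a common scale.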


Formulas


Estimator

The RMSD of an estimator \hat{\theta} with respect to an estimated parameter \theta is defined as the square root of the mean squared error:
:\operatorname{RMSD}(\hat{\theta}) = \sqrt{\operatorname{MSE}(\hat{\theta})} = \sqrt{\operatorname{E}\big((\hat{\theta} - \theta)^2\big)}.
For an unbiased estimator, the RMSD is the square root of the variance, known as the standard deviation.
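The statement about unbiased estimators follows from the standard decomposition of the mean squared error into variance and squared bias. A short derivation is sketched here; the symbol \operatorname{Bias}(\hat{\theta}) = \operatorname{E}(\hat{\theta}) - \theta is notation introduced for this sketch and does not appear in the text above.

```latex
\begin{aligned}
\operatorname{MSE}(\hat{\theta})
  &= \operatorname{E}\big((\hat{\theta} - \theta)^2\big) \\
  &= \operatorname{E}\big((\hat{\theta} - \operatorname{E}(\hat{\theta}))^2\big)
     + \big(\operatorname{E}(\hat{\theta}) - \theta\big)^2 \\
  &= \operatorname{Var}(\hat{\theta}) + \operatorname{Bias}(\hat{\theta})^2 ,
\end{aligned}
\qquad\text{so}\qquad
\operatorname{RMSD}(\hat{\theta})
  = \sqrt{\operatorname{Var}(\hat{\theta}) + \operatorname{Bias}(\hat{\theta})^2} .
```

The cross term vanishes because \operatorname{E}\big(\hat{\theta} - \operatorname{E}(\hat{\theta})\big) = 0. When the bias is zero, only the variance term remains and the RMSD reduces to the standard deviation of the estimator.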


Samples

If X_1, \dots, X_n is a sample of a population with true mean value x_0, then the RMSD of the sample is
:\operatorname{RMSD} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - x_0)^2}.
The RMSD of predicted values \hat y_t for times ''t'' of a regression's dependent variable y_t, with variables observed over ''T'' times, is computed for ''T'' different predictions as the square root of the mean of the squares of the deviations:
:\operatorname{RMSD} = \sqrt{\frac{\sum_{t=1}^{T}(y_t - \hat y_t)^2}{T}}.
(For regressions on cross-sectional data, the subscript ''t'' is replaced by ''i'' and ''T'' is replaced by ''n''.)

In some disciplines, the RMSD is used to compare differences between two things that may vary, neither of which is accepted as the "standard". For example, when measuring the average difference between two time series x_{1,t} and x_{2,t}, the formula becomes
:\operatorname{RMSD} = \sqrt{\frac{\sum_{t=1}^{T}(x_{1,t} - x_{2,t})^2}{T}}.
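The sample formulas can be written out directly. The following Python sketch uses hypothetical function names (`rmsd_about_value`, `rmsd_between`) and made-up numbers; the regression case and the two-series case share one function because they are the same computation applied to different pairs.

```python
import math

def rmsd_about_value(sample, x0):
    """RMSD of a sample about a fixed reference value x0 (e.g. the true mean)."""
    if not sample:
        raise ValueError("sample must be non-empty")
    return math.sqrt(sum((x - x0) ** 2 for x in sample) / len(sample))

def rmsd_between(series_a, series_b):
    """RMSD between two paired series of equal length T."""
    if len(series_a) != len(series_b) or not series_a:
        raise ValueError("series must be non-empty and of equal length")
    total = sum((a - b) ** 2 for a, b in zip(series_a, series_b))
    return math.sqrt(total / len(series_a))

# Sample about an assumed true mean x0 = 4: squared deviations 4, 0, 0, 4.
sample_rmsd = rmsd_about_value([2.0, 4.0, 4.0, 6.0], 4.0)   # sqrt(2)

# Observed y_t versus predicted y_hat_t: squared deviations 1, 0, 1, 0.
y = [3.0, 5.0, 7.0, 9.0]
y_hat = [2.0, 5.0, 8.0, 9.0]
regression_rmsd = rmsd_between(y, y_hat)                     # sqrt(0.5)
```

Because the deviations are squared, `rmsd_between` is symmetric in its arguments, which suits the case where neither series is treated as the standard.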


Normalization

Normalizing the RMSD facilitates the comparison between datasets or models with different scales. Though there is no consistent means of normalization in the literature, common choices are the mean or the range (defined as the maximum value minus the minimum value) of the measured data:
:\mathrm{NRMSD} = \frac{\mathrm{RMSD}}{y_{\max} - y_{\min}} or \mathrm{NRMSD} = \frac{\mathrm{RMSD}}{\bar y}.
This value is commonly referred to as the ''normalized root mean square deviation'' or ''error'' (NRMSD or NRMSE), and is often expressed as a percentage, where lower values indicate less residual variance. It is also called the coefficient of variation or percent RMS. In many cases, especially for smaller samples, the sample range is likely to be affected by the size of the sample, which would hamper comparisons.

Another possible method to make the RMSD a more useful comparison measure is to divide the RMSD by the interquartile range (IQR). When the RMSD is divided by the IQR, the normalized value becomes less sensitive to extreme values in the target variable:
:\mathrm{RMSDIQR} = \frac{\mathrm{RMSD}}{IQR}
where IQR = Q_3 - Q_1 with Q_1 = \text{CDF}^{-1}(0.25) and Q_3 = \text{CDF}^{-1}(0.75), where \text{CDF}^{-1} is the quantile function.

When normalizing by the mean value of the measurements, the term ''coefficient of variation of the RMSD, CV(RMSD)'' may be used to avoid ambiguity. This is analogous to the coefficient of variation with the RMSD taking the place of the standard deviation:
:\mathrm{CV(RMSD)} = \frac{\mathrm{RMSD}}{\bar y}.
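The three normalizations can be compared side by side. This is a minimal sketch using only the Python standard library; the function names and data are hypothetical, and the quartiles use the `inclusive` method of `statistics.quantiles`, which is only one of several quantile conventions and is an assumption here.

```python
import math
import statistics

def rmsd(predicted, observed):
    """Square root of the mean squared difference between paired values."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def nrmsd_range(predicted, observed):
    """RMSD divided by the range of the measured (observed) data."""
    return rmsd(predicted, observed) / (max(observed) - min(observed))

def cv_rmsd(predicted, observed):
    """RMSD divided by the mean of the measured data, i.e. CV(RMSD)."""
    return rmsd(predicted, observed) / statistics.fmean(observed)

def rmsd_iqr(predicted, observed):
    """RMSD divided by the interquartile range Q3 - Q1 of the measured data."""
    q1, _median, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return rmsd(predicted, observed) / (q3 - q1)

# Hypothetical data: squared errors 4, 4, 9, 9, 0, so RMSD = sqrt(5.2).
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 37.0, 50.0]

by_range = nrmsd_range(predicted, observed)  # sqrt(5.2) / 40
by_mean = cv_rmsd(predicted, observed)       # sqrt(5.2) / 30
by_iqr = rmsd_iqr(predicted, observed)       # sqrt(5.2) / 20
```

The sketch does not guard against a zero denominator (constant data, or a mean of zero); a practical implementation would need to decide how to handle those cases.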


Mean absolute error

Some researchers have recommended the use of the mean absolute error (MAE) instead of the root mean square deviation. MAE, the average of the absolute values of the errors, is easier to interpret than the square root of the average of squared errors. Furthermore, each error influences MAE in direct proportion to its absolute value, which is not the case for RMSD.
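The difference in how the two measures weight individual errors can be seen in a small Python sketch. The function names and the two error sets are made up for illustration: both sets have the same total absolute error, but one concentrates it in a single large error.

```python
import math

def mae(errors):
    """Mean of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Square root of the mean of the squared errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

evenly_spread = [2.0, -2.0, 2.0, -2.0]   # four errors of magnitude 2
one_large = [0.0, 0.0, 0.0, 8.0]         # a single error of magnitude 8

mae_even, rmse_even = mae(evenly_spread), rmse(evenly_spread)   # 2.0, 2.0
mae_large, rmse_large = mae(one_large), rmse(one_large)         # 2.0, 4.0
```

MAE does not distinguish the two cases, while the RMSE doubles when the same total error is concentrated in one observation.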


Applications

*In meteorology, to see how effectively a mathematical model predicts the behavior of the atmosphere.
*In bioinformatics, the root mean square deviation of atomic positions is the measure of the average distance between the atoms of superimposed proteins.
*In structure based drug design, the RMSD is a measure of the difference between the crystal conformation of the ligand and a docking prediction.
*In economics, the RMSD is used to determine whether an economic model fits economic indicators. Some experts have argued that RMSD is less reliable than Relative Absolute Error.
*In experimental psychology, the RMSD is used to assess how well mathematical or computational models of behavior explain the empirically observed behavior.
*In GIS, the RMSD is one measure used to assess the accuracy of spatial analysis and remote sensing.
*In hydrogeology, RMSD and NRMSD are used to evaluate the calibration of a groundwater model.
*In imaging science, the RMSD is part of the peak signal-to-noise ratio, a measure used to assess how well a method to reconstruct an image performs relative to the original image.
*In computational neuroscience, the RMSD is used to assess how well a system learns a given model.
*In protein nuclear magnetic resonance spectroscopy, the RMSD is used as a measure to estimate the quality of the obtained bundle of structures.
*Submissions for the Netflix Prize were judged using the RMSD from the test dataset's undisclosed "true" values.
*In the simulation of energy consumption of buildings, the RMSE and CV(RMSE) are used to calibrate models to measured building performance.
*In X-ray crystallography, RMSD (and RMSZ) is used to measure the deviation of the molecular internal coordinates from the restraints library values.
*In control theory, the RMSE is used as a quality measure to evaluate the performance of a state observer (https://kalman-filter.com/root-mean-square-error).
*In fluid dynamics, normalized root mean square deviation (NRMSD), coefficient of variation (CV), and percent RMS are used to quantify the uniformity of flow behavior such as velocity profile, temperature distribution, or gas species concentration. The value is compared to industry standards to optimize the design of flow and thermal equipment and processes.
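For the bioinformatics case in the list above, the deviation is the length of a displacement vector rather than a scalar difference. A minimal Python sketch follows, assuming the two coordinate sets are already superimposed and paired atom by atom; the function name and coordinates are hypothetical, and the superposition step itself is not shown.

```python
import math

def rmsd_positions(coords_a, coords_b):
    """RMSD of paired points: root of the mean squared displacement length."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    total = 0.0
    for point_a, point_b in zip(coords_a, coords_b):
        # Squared Euclidean distance between the paired points.
        total += sum((a - b) ** 2 for a, b in zip(point_a, point_b))
    return math.sqrt(total / len(coords_a))

# Hypothetical 3D positions; each atom is displaced by a distance of 1.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, -1.0)]

positional_rmsd = rmsd_positions(structure_a, structure_b)   # 1.0
```

With one coordinate per point, this reduces to the scalar RMSD between two series.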


See also

* Root mean square
* Mean absolute error
* Average absolute deviation
* Mean signed deviation
* Mean squared error
* Squared deviations from the mean
* Errors and residuals
* Coefficient of variation

