In statistics, the strictly standardized mean difference (SSMD) is a measure of effect size. It is the mean divided by the standard deviation of a difference between two random values, each from one of two groups. It was initially proposed for quality control and hit selection in high-throughput screening (HTS) and has become a statistical parameter measuring effect sizes for the comparison of any two groups with random values.


Background

In high-throughput screening (HTS), quality control (QC) is critical. An important QC characteristic in an HTS assay is how much the positive controls, test compounds, and negative controls differ from one another. This QC characteristic can be evaluated by comparing two well types in HTS assays. Signal-to-noise ratio (S/N), signal-to-background ratio (S/B), and the Z-factor have been adopted to evaluate the quality of HTS assays through the comparison of two investigated types of wells. However, the S/B does not take into account any information on variability, and the S/N captures the variability in only one group and hence cannot assess the quality of an assay when the two groups have different variabilities.

Zhang JH et al. proposed the Z-factor. The advantage of the Z-factor over the S/N and S/B is that it takes into account the variabilities in both compared groups. As a result, the Z-factor has been broadly used as a QC metric in HTS assays. However, the absolute value in the Z-factor makes it inconvenient to derive its statistical inference mathematically.

To derive a more interpretable parameter for measuring the differentiation between two groups, Zhang XHD proposed SSMD to evaluate the differentiation between a positive control and a negative control in HTS assays. SSMD has a probabilistic basis due to its strong link with d+-probability (i.e., the probability that the difference between two groups is positive). To some extent, the d+-probability is equivalent to the well-established probabilistic index P(''X'' > ''Y''), which has been studied and applied in many areas. Supported by this probabilistic basis, SSMD has been used for both quality control and hit selection in high-throughput screening.
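The link between SSMD and the d+-probability can be made concrete with a short worked derivation. This is a sketch under an added assumption that the passage above does not state: the difference D between a value from group 1 and a value from group 2 is normally distributed with mean \mu_D and standard deviation \sigma_D. The symbol \Phi (the standard normal cumulative distribution function) is introduced here only for illustration.

```latex
D \sim N(\mu_D, \sigma_D^2), \qquad \beta = \frac{\mu_D}{\sigma_D}

P(D > 0)
  = P\!\left(\frac{D - \mu_D}{\sigma_D} > -\frac{\mu_D}{\sigma_D}\right)
  = 1 - \Phi(-\beta)
  = \Phi(\beta)
```

Under this normality assumption, \beta = 0 corresponds to a d+-probability of 0.5, and \beta \approx 1.645 corresponds to a d+-probability of about 0.95. Without normality the relation between the two quantities need not take this simple form.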


Concept


Statistical parameter

As a statistical parameter, SSMD (denoted as \beta) is defined as the ratio of the mean to the standard deviation of the difference of two random values, one from each of two groups. Assume that one group with random values has mean \mu_1 and variance \sigma_1^2 and another group has mean \mu_2 and variance \sigma_2^2. The covariance between the two groups is \sigma_{12}. Then, the SSMD for the comparison of these two groups is defined as
:\beta = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}}}.
If the two groups are independent,
:\beta = \frac{\mu_1 - \mu_2}{\sqrt{\sigma_1^2 + \sigma_2^2}}.
If the two independent groups have equal variances \sigma^2,
:\beta = \frac{\mu_1 - \mu_2}{\sqrt{2}\,\sigma}.
In the situation where the two groups are correlated, a commonly used strategy to avoid the calculation of \sigma_{12} is first to obtain paired observations from the two groups and then to estimate SSMD based on the paired observations. Based on a paired difference D with population mean \mu_D and variance \sigma_D^2, SSMD is
:\beta = \frac{\mu_D}{\sigma_D}.
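The population definitions above can be checked numerically with a minimal Python sketch. The function name `ssmd_population` and the example parameter values are hypothetical and chosen only for illustration; the independent and equal-variance cases fall out of the general formula by setting the covariance to zero.

```python
import math


def ssmd_population(mu1, mu2, var1, var2, cov12=0.0):
    """Population SSMD: mean of the difference divided by its standard deviation.

    cov12 is the covariance between the two groups (0 for independent groups).
    """
    var_diff = var1 + var2 - 2.0 * cov12
    if var_diff <= 0:
        raise ValueError("variance of the difference must be positive")
    return (mu1 - mu2) / math.sqrt(var_diff)


# Independent groups with unequal variances: (10 - 4) / sqrt(9 + 16) = 1.2
beta_indep = ssmd_population(10.0, 4.0, 9.0, 16.0)

# Independent groups with equal variance 2: (5 - 3) / (sqrt(2) * sqrt(2)) = 1.0
beta_equal = ssmd_population(5.0, 3.0, 2.0, 2.0)

# Correlated groups: (10 - 4) / sqrt(9 + 16 - 2 * 4.5) = 1.5
beta_corr = ssmd_population(10.0, 4.0, 9.0, 16.0, cov12=4.5)
```

In this illustration a positive covariance shrinks the variance of the difference, so the same mean difference yields a larger SSMD.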


Statistical estimation

In the situation where the two groups are independent, Zhang XHD derived the maximum-likelihood estimate (MLE) and method-of-moment (MM) estimate of SSMD. Assume that groups 1 and 2 have sample means \bar{X}_1, \bar{X}_2, and sample variances s_1^2, s_2^2. The MM estimate of SSMD is then
:\hat{\beta} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2 + s_2^2}}.
When the two groups have normal distributions with equal variance, the uniformly minimal variance unbiased estimate (UMVUE) of SSMD is
:\hat{\beta} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{2}{K}\left((n_1 - 1)s_1^2 + (n_2 - 1)s_2^2\right)}},
where n_1, n_2 are the sample sizes in the two groups and K \approx n_1 + n_2 - 3.48.

In the situation where the two groups are correlated, based on a paired difference with sample size n, sample mean \bar{D} and sample variance s_D^2, the MM estimate of SSMD is
:\hat{\beta} = \frac{\bar{D}}{s_D}.
The UMVUE estimate of SSMD is
:\hat{\beta} = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n-2}{2}\right)} \sqrt{\frac{2}{n-1}} \, \frac{\bar{D}}{s_D}.
SSMD looks similar to the t-statistic and Cohen's d, but the three differ from one another, as illustrated in the literature.
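The estimators in this section can be sketched in Python using only the standard library. The function names are hypothetical, and the sketch assumes that the sample variances are the usual unbiased ones with an n - 1 denominator (which is what `statistics.variance` computes); the text above does not state the denominator explicitly.

```python
import math
from statistics import mean, variance


def ssmd_mm(x1, x2):
    """Method-of-moment estimate for two independent groups."""
    return (mean(x1) - mean(x2)) / math.sqrt(variance(x1) + variance(x2))


def ssmd_umvue(x1, x2):
    """UMVUE for two independent normal groups with equal variance."""
    n1, n2 = len(x1), len(x2)
    k = n1 + n2 - 3.48  # approximation of K quoted in the text
    pooled = (n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)
    return (mean(x1) - mean(x2)) / math.sqrt(2.0 / k * pooled)


def ssmd_paired_mm(d):
    """Method-of-moment estimate from paired differences d."""
    return mean(d) / math.sqrt(variance(d))


def ssmd_paired_umvue(d):
    """UMVUE from paired differences d (requires at least 3 pairs)."""
    n = len(d)
    if n < 3:
        raise ValueError("need at least 3 paired differences")
    # The ratio of gamma functions is computed on the log scale for stability.
    gamma_ratio = math.exp(math.lgamma((n - 1) / 2) - math.lgamma((n - 2) / 2))
    return gamma_ratio * math.sqrt(2.0 / (n - 1)) * mean(d) / math.sqrt(variance(d))
```

For small n the gamma-ratio factor is noticeably below 1, so the paired UMVUE is smaller in magnitude than the paired MM estimate; the two converge as n grows.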


Application in high-throughput screening assays

SSMD is the ratio of the mean to the standard deviation of the difference between two groups. When the data are log-transformed, as is usual in HTS experiments, SSMD is the mean of the log fold change divided by the standard deviation of the log fold change with respect to a negative reference. In other words, SSMD is the average fold change (on the log scale) penalized by the variability of fold change (on the log scale). For quality control, one index for the quality of an HTS assay is the magnitude of difference between a positive control and a negative reference in an assay plate. For hit selection, the size of effects of a compound (i.e., a small molecule or an siRNA) is represented by the magnitude of difference between the compound and a negative reference. SSMD directly measures the magnitude of difference between two groups. Therefore, SSMD can be used for both quality control and hit selection in HTS experiments.


Quality control

The number of wells for the positive and negative controls in a plate in the 384-well or 1536-well platform is normally designed to be reasonably large. Assume that the positive and negative controls in a plate have sample means \bar{X}_P, \bar{X}_N, sample variances s_P^2, s_N^2, and sample sizes n_P, n_N. Usually, the assumption that the controls have equal variance in a plate holds. In such a case, the SSMD for assessing quality in that plate is estimated as
:\hat{\beta} = \frac{\bar{X}_P - \bar{X}_N}{\sqrt{\frac{2}{K}\left((n_P - 1)s_P^2 + (n_N - 1)s_N^2\right)}},
where K \approx n_P + n_N - 3.48. When the assumption of equal variance does not hold, the SSMD for assessing quality in that plate is estimated as
:\hat{\beta} = \frac{\bar{X}_P - \bar{X}_N}{\sqrt{s_P^2 + s_N^2}}.
If there are clearly outliers in the controls, the SSMD can be estimated as
:\hat{\beta} = \frac{\tilde{X}_P - \tilde{X}_N}{1.4826\sqrt{\tilde{s}_P^2 + \tilde{s}_N^2}},
where \tilde{X}_P, \tilde{X}_N, \tilde{s}_P, \tilde{s}_N are the medians and median absolute deviations in the positive and negative controls, respectively.

The Z-factor-based QC criterion is popularly used in HTS assays. However, it has been demonstrated that this QC criterion is most suitable for an assay with very or extremely strong positive controls. In an RNAi HTS assay, a strong or moderate positive control is usually more instructive than a very or extremely strong positive control because the effectiveness of this control is more similar to the hits of interest. In addition, the positive controls in two different HTS experiments may theoretically have different sizes of effects. Consequently, the QC thresholds for a moderate control should be different from those for a strong control in these two experiments. Furthermore, it is common that two or more positive controls are adopted in a single experiment. Applying the same Z-factor-based QC criteria to both controls leads to inconsistent results, as illustrated in the literature.

The SSMD-based QC criteria listed in the following table take into account the effect size of a positive control in an HTS assay where the positive control (such as an inhibition control) theoretically has values less than the negative reference. In application, if the effect size of a positive control is known biologically, adopt the corresponding criterion based on this table. Otherwise, the following strategy should help to determine which QC criterion should be applied: (i) in many small molecule HTS assays with one positive control, usually criterion D (and occasionally criterion C) should be adopted because this control usually has very or extremely strong effects; (ii) for RNAi HTS assays in which cell viability is the measured response, criterion D should be adopted for the controls without cells (namely, the wells with no cells added) or background controls; (iii) in a viral assay in which the amount of virus in host cells is of interest, criterion C is usually used, and criterion D is occasionally used for the positive control consisting of siRNA from the virus. Similar SSMD-based QC criteria can be constructed for an HTS assay where the positive control (such as an activation control) theoretically has values greater than the negative reference. More details about how to apply SSMD-based QC criteria in HTS experiments can be found in a book.
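The three plate-level QC estimates described above can be sketched as follows. This is an illustrative Python sketch, not a reference implementation: the function names and the `mad` helper are hypothetical, sample variances are assumed to use an n - 1 denominator, and the median absolute deviation is assumed to be the unscaled median of absolute deviations from the median, with 1.4826 as the usual scaling constant toward a standard deviation under normality.

```python
import math
from statistics import mean, median, variance

MAD_SCALE = 1.4826  # usual constant relating the MAD to a normal standard deviation


def mad(values):
    """Unscaled median absolute deviation from the median."""
    m = median(values)
    return median([abs(v - m) for v in values])


def ssmd_qc_equal_var(pos, neg):
    """Plate QC estimate when the controls are assumed to have equal variance."""
    n_p, n_n = len(pos), len(neg)
    k = n_p + n_n - 3.48
    pooled = (n_p - 1) * variance(pos) + (n_n - 1) * variance(neg)
    return (mean(pos) - mean(neg)) / math.sqrt(2.0 / k * pooled)


def ssmd_qc_unequal_var(pos, neg):
    """Plate QC estimate without the equal-variance assumption."""
    return (mean(pos) - mean(neg)) / math.sqrt(variance(pos) + variance(neg))


def ssmd_qc_robust(pos, neg):
    """Outlier-resistant plate QC estimate based on medians and MADs."""
    spread = MAD_SCALE * math.sqrt(mad(pos) ** 2 + mad(neg) ** 2)
    return (median(pos) - median(neg)) / spread


# Hypothetical log-scale readings for an inhibition control and a negative reference.
positive_wells = [2.0, 2.2, 1.8, 2.1, 1.9]
negative_wells = [5.0, 5.3, 4.7, 5.1, 4.9]
qc_values = (
    ssmd_qc_equal_var(positive_wells, negative_wells),
    ssmd_qc_unequal_var(positive_wells, negative_wells),
    ssmd_qc_robust(positive_wells, negative_wells),
)
```

All three values are negative for an inhibition control, matching the convention above in which the positive control has values less than the negative reference. The resulting estimate would then be compared against whichever QC criterion is appropriate for the strength of the control.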


Hit selection

In an HTS assay, one primary goal is to select compounds with a desired size of inhibition or activation effect. The size of the compound effect is represented by the magnitude of difference between a test compound and a negative reference group with no specific inhibition/activation effects. A compound with a desired size of effects in an HTS screen is called a hit. The process of selecting hits is called hit selection. There are two main strategies for selecting hits with large effects. One is to use certain metric(s) to rank and/or classify the compounds by their effects and then to select the largest number of potent compounds that is practical for validation assays. The other strategy is to test whether a compound has effects strong enough to reach a pre-set level. In this strategy, false-negative rates (FNRs) and/or false-positive rates (FPRs) must be controlled. SSMD can not only rank the size of effects but also classify effects as shown in the following table based on the population value (\beta) of SSMD.

The estimation of SSMD for screens without replicates differs from that for screens with replicates. In a primary screen without replicates, assuming the measured value (usually on the log scale) in a well for a tested compound is X_i and the negative reference in that plate has sample size n_N, sample mean \bar{X}_N, median \tilde{X}_N, standard deviation s_N and median absolute deviation \tilde{s}_N, the SSMD for this compound is estimated as
:\text{SSMD} = \frac{X_i - \bar{X}_N}{s_N \sqrt{2(n_N - 1)/K}},
where K \approx n_N - 2.48. When there are outliers in an assay, which is common in HTS experiments, a robust version of SSMD can be obtained using
:\text{SSMD*} = \frac{X_i - \tilde{X}_N}{1.4826\, \tilde{s}_N \sqrt{2(n_N - 1)/K}}.
In a confirmatory or primary screen with replicates, for the i-th test compound with n replicates, we calculate the paired difference between the measured value (usually on the log scale) of the compound and the median value of a negative control in a plate, then obtain the mean \bar{d}_i and variance s_i^2 of the paired difference across replicates. The SSMD for this compound is estimated as
:\text{SSMD} = \frac{\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n-2}{2}\right)} \sqrt{\frac{2}{n-1}} \, \frac{\bar{d}_i}{s_i}.
In many cases, scientists may use both SSMD and average fold change for hit selection in HTS experiments. The dual-flashlight plot can display both average fold change and SSMD for all test compounds in an assay and help to integrate both of them to select hits in HTS experiments. The use of SSMD for hit selection in HTS experiments is illustrated step-by-step in the literature.
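The hit-selection estimates for screens without and with replicates can be sketched in Python. The function names and example readings are hypothetical; the sketch assumes the standard deviation uses an n - 1 denominator and that the median absolute deviation is the unscaled median of absolute deviations from the median.

```python
import math
from statistics import mean, median, stdev

MAD_SCALE = 1.4826  # usual constant relating the MAD to a normal standard deviation


def ssmd_no_replicates(x_i, neg):
    """SSMD of one compound well against the plate's negative reference."""
    n = len(neg)
    k = n - 2.48  # approximation of K quoted in the text; needs n >= 3
    return (x_i - mean(neg)) / (stdev(neg) * math.sqrt(2.0 * (n - 1) / k))


def ssmd_no_replicates_robust(x_i, neg):
    """Robust variant using the median and median absolute deviation."""
    n = len(neg)
    k = n - 2.48
    center = median(neg)
    mad = median([abs(v - center) for v in neg])
    return (x_i - center) / (MAD_SCALE * mad * math.sqrt(2.0 * (n - 1) / k))


def ssmd_with_replicates(compound_values, neg_median):
    """SSMD from paired differences between replicates and the negative median.

    compound_values holds one reading per replicate; neg_median is the median
    of the negative control on the corresponding plate (one shared value here
    for simplicity, which is an assumption of this sketch).
    """
    d = [v - neg_median for v in compound_values]
    n = len(d)
    if n < 3:
        raise ValueError("need at least 3 replicates")
    gamma_ratio = math.exp(math.lgamma((n - 1) / 2) - math.lgamma((n - 2) / 2))
    return gamma_ratio * math.sqrt(2.0 / (n - 1)) * mean(d) / stdev(d)


# Hypothetical log-scale readings.
negative_wells = [5.0, 5.3, 4.7, 5.1, 4.9]
single_well = ssmd_no_replicates(4.0, negative_wells)
single_well_robust = ssmd_no_replicates_robust(4.0, negative_wells)
replicated = ssmd_with_replicates([3.0, 4.0, 5.0], 5.0)
```

Compounds could then be ranked by these values, or plotted against average fold change in the manner of the dual-flashlight plot mentioned above.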


See also

* Effect size
* High-throughput screening
* Z-factor
* Hit selection
* SMCV
* c+-probability
* Contrast variable
* Dual-flashlight plot


Further reading

* Zhang XHD (2011). "Optimal High-Throughput Screening: Practical Experimental Design and Data Analysis for Genome-scale RNAi Research". Cambridge University Press.

