Homogeneity (statistics)
   HOME

TheInfoList



OR:

In statistics, homogeneity and its opposite, heterogeneity, arise in describing the properties of a
dataset A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the ...
, or several datasets. They relate to the validity of the often convenient assumption that the statistical properties of any one part of an overall dataset are the same as any other part. In
meta-analysis A meta-analysis is a statistical analysis that combines the results of multiple scientific studies. Meta-analyses can be performed when there are multiple scientific studies addressing the same question, with each individual study reporting me ...
, which combines the data from several studies, homogeneity measures the differences or similarities between the several studies (see also
Study heterogeneity In statistics, (between-) study heterogeneity is a phenomenon that commonly occurs when attempting to undertake a meta-analysis. In a simplistic scenario, studies whose results are to be combined in the meta-analysis would all be undertaken in the ...
). Homogeneity can be studied to several degrees of complexity. For example, considerations of
homoscedasticity In statistics, a sequence (or a vector) of random variables is homoscedastic () if all its random variables have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. Th ...
examine how much the variability of data-values changes throughout a dataset. However, questions of homogeneity apply to all aspects of the statistical distributions, including the
location parameter In geography, location or place are used to denote a region (point, line, or area) on Earth's surface or elsewhere. The term ''location'' generally implies a higher degree of certainty than ''place'', the latter often indicating an entity with an ...
. Thus, a more detailed study would examine changes to the whole of the
marginal distribution In probability theory and statistics, the marginal distribution of a subset of a collection of random variables is the probability distribution of the variables contained in the subset. It gives the probabilities of various values of the varia ...
. An intermediate-level study might move from looking at the variability to studying changes in the
skewness In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness value can be positive, zero, negative, or undefined. For a unimodal ...
. In addition to these, questions of homogeneity apply also to the
joint distributions Given two random variables that are defined on the same probability space, the joint probability distribution is the corresponding probability distribution on all possible pairs of outputs. The joint distribution can just as well be considered ...
. The concept of homogeneity can be applied in many different ways and, for certain types of statistical analysis, it is used to look for further properties that might need to be treated as varying within a dataset once some initial types of non-homogeneity have been dealt with.


Examples


Regression

Differences in the typical values across the dataset might initially be dealt with by constructing a regression model using certain explanatory variables to relate variations in the typical value to known quantities. There should then be a later stage of analysis to examine whether the errors in the predictions from the regression behave in the same way across the dataset. Thus the question becomes one of the homogeneity of the distribution of the residuals, as the explanatory variables change. See
regression analysis In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one ...
.


Time series

The initial stages in the analysis of a time series may involve plotting values against time to examine homogeneity of the series in various ways: stability across time as opposed to a trend; stability of local fluctuations over time.


Combining information across sites

In
hydrology Hydrology () is the scientific study of the movement, distribution, and management of water on Earth and other planets, including the water cycle, water resources, and environmental watershed sustainability. A practitioner of hydrology is call ...
, data-series across a number of sites composed of annual values of the within-year annual maximum river-flow are analysed. A common model is that the distributions of these values are the same for all sites apart from a simple scaling factor, so that the location and scale are linked in a simple way. There can then be questions of examining the homogeneity across sites of the distribution of the scaled values.


Combining information sources

In
meteorology Meteorology is a branch of the atmospheric sciences (which include atmospheric chemistry and physics) with a major focus on weather forecasting. The study of meteorology dates back millennia, though significant progress in meteorology did no ...
, weather datasets are acquired over many years of record and, as part of this, measurements at certain stations may cease occasionally while, at around the same time, measurements may start at nearby locations. There are then questions as to whether, if the records are combined to form a single longer set of records, those records can be considered homogeneous over time. An example of homogeneity testing of wind speed and direction data can be found in Romanić ''et al''., 2015.Romanić D. Ćurić M- Jovičić I. Lompar M. 2015. Long-term trends of the ‘Koshava’ wind during the period 1949–2010. International Journal of Climatology 35(2):288-302. DOI:10.1002/joc.3981.


Homogeneity within populations

Simple populations surveys may start from the idea that responses will be homogeneous across the whole of a population. Assessing the homogeneity of the population would involve looking to see whether the responses of certain identifiable
subpopulation In statistics, a population is a Set (mathematics), set of similar items or events which is of interest for some question or experiment. A statistical population can be a group of existing objects (e.g. the set of all stars within the Milky Way g ...
s differ from those of others. For example, car-owners may differ from non-car-owners, or there may be differences between different age-groups.


Tests

A test for homogeneity, in the sense of exact equivalence of statistical distributions, can be based on an
E-statistic Energy distance is a statistical distance between probability distributions. If X and Y are independent random vectors in ''R''d with cumulative distribution functions (cdf) F and G respectively, then the energy distance between the distribution ...
. A
location test A location test is a statistical hypothesis test that compares the location parameter of a statistical population to a given constant, or that compares the location parameters of two statistical populations to each other. Most commonly, the locat ...
tests the simpler hypothesis that distributions have the same
location parameter In geography, location or place are used to denote a region (point, line, or area) on Earth's surface or elsewhere. The term ''location'' generally implies a higher degree of certainty than ''place'', the latter often indicating an entity with an ...
.


See also

* Consistency (statistics) *
Reliability (statistics) In statistics and psychometrics, reliability is the overall consistency of a measure. A measure is said to have a high reliability if it produces similar results under consistent conditions:"It is the characteristic of a set of test scores that ...


References


Further reading

* Hall, M.J. (2003) The interpretation of non-homogeneous hydrometeorological time series a case study. ''
Meteorological Applications ''Meteorological Applications'' is a peer-reviewed scientific journal of meteorology published four times per year since 1994. It is published by John Wiley & Sons on behalf of the Royal Meteorological Society. Abstracting and indexing The journ ...
'', 10, 61–67. {{doi, 10.1017/S1350482703005061 * Krus, D.J., & Blackman, H.S. (1988).Test reliability and homogeneity from perspective of the ordinal test theory. ''Applied Measurement in Education,'' 1, 79–8
(Request reprint).
* Loevinger, J. (1948). The technic of homogeneous tests compared with some aspects of scale analysis and factor analysis. ''Psychological Bulletin,'' 45, 507–529. Statistical analysis Meta-analysis