In
statistics
Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
, Gower's distance between two mixed-type objects is a
similarity measure
In statistics and related fields, a similarity measure or similarity function or similarity metric is a real-valued function that quantifies the similarity between two objects. Although no single definition of a similarity exists, usually such mea ...
that can handle different types of data within the same dataset and is particularly useful in
cluster analysis
Cluster analysis or clustering is the data analyzing technique in which task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more Similarity measure, similar (in some specific sense defined by the ...
or other multivariate statistical techniques. Data can be binary, ordinal, or continuous variables. It works by normalizing the differences between each pair of variables and then computing a weighted average of these differences. The distance was defined in 1971 by Gower and it takes values between 0 and 1 with smaller values indicating higher similarity.
Definition
For two objects
and
having
descriptors, the similarity
is defined as:
where the
are non-negative weights usually set to
and
is the similarity between the two objects regarding their
-th variable. If the variable is binary or ordinal, the values of
are 0 or 1, with 1 denoting equality. If the variable is continuous,
with
being the range of
-th variable and thus ensuring
. As a result, the overall similarity
between two objects is the weighted average of the
similarities calculated for all their descriptors.
In its original exposition, the distance does not treat ordinal variables in a special manner. In the 1990s, first Kaufman and
Rousseeuw and later Podani suggested extensions where the ordering of an ordinal feature is used. For example, Podani obtains relative rank differences as
with
being the ranks corresponding to the ordered categories of the
-th variable.
Software implementations
Many programming languages and statistical packages, such as
R,
Python, etc., include implementations of Gower's distance. The implementations may follow Kaufmann and Rousseeuw's extensions, which change the similarity for continuous variables to
References
{{reflist
Statistical distance
Similarity measures