Multidimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate "information about the pairwise 'distances' among a set of $n$ objects or individuals" into a configuration of $n$ points mapped into an abstract Cartesian space. More technically, MDS refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. It is a form of non-linear dimensionality reduction.

Given a distance matrix with the distances between each pair of objects in a set, and a chosen number of dimensions, $N$, an MDS algorithm places each object into $N$-dimensional space (a lower-dimensional representation) such that the between-object distances are preserved as well as possible. For $N$ = 1, 2, and 3, the resulting points can be visualized on a scatter plot.

Core theoretical contributions to MDS were made by James O. Ramsay of McGill University, who is also regarded as the founder of functional data analysis.

Types

MDS algorithms fall into a taxonomy, depending on the meaning of the input matrix:

Classical multidimensional scaling

It is also known as Principal Coordinates Analysis (PCoA), Torgerson Scaling or Torgerson–Gower scaling. It takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain, which is given by

$\text{Strain}_D(x_1,x_2,\ldots,x_n)=\Biggl(\frac{\sum_{i,j}\bigl(b_{ij}-x_i^T x_j\bigr)^2}{\sum_{i,j}b_{ij}^2}\Biggr)^{1/2},$

where $x_i$ denote vectors in $N$-dimensional space, $x_i^T x_j$ denotes the scalar product between $x_i$ and $x_j$, and $b_{ij}$ are the elements of the matrix $B$ defined in step 2 of the following algorithm, which are computed from the distances.

Steps of a classical MDS algorithm:

Classical MDS uses the fact that the coordinate matrix $X$ can be derived by eigenvalue decomposition from $B=XX'$, and that the matrix $B$ can be computed from the proximity matrix $D$ by using double centering.

1. Set up the squared proximity matrix $D^{(2)}=[d_{ij}^2]$.
2. Apply double centering: $B=-\frac{1}{2}CD^{(2)}C$, using the centering matrix $C=I-\frac{1}{n}J_n$, where $n$ is the number of objects, $I$ is the $n \times n$ identity matrix, and $J_n$ is an $n\times n$ matrix of all ones.
3. Determine the $m$ largest eigenvalues $\lambda_1,\lambda_2,\ldots,\lambda_m$ and corresponding eigenvectors $e_1,e_2,\ldots,e_m$ of $B$ (where $m$ is the number of dimensions desired for the output, denoted $N$ above).
4. Now, $X=E_m\Lambda_m^{1/2}$, where $E_m$ is the matrix of $m$ eigenvectors and $\Lambda_m$ is the diagonal matrix of $m$ eigenvalues of $B$.

Classical MDS assumes Euclidean distances, so it is not applicable to direct dissimilarity ratings.
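The four steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a reference implementation: the function name `classical_mds` and the unit-square test data are assumptions made for the example, and eigenvalues are clipped at zero in case the input is not exactly Euclidean.

```python
import numpy as np

def classical_mds(D, m):
    """Embed n objects in m dimensions from an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    D2 = D ** 2                              # step 1: squared proximities
    C = np.eye(n) - np.ones((n, n)) / n      # centering matrix C = I - J_n / n
    B = -0.5 * C @ D2 @ C                    # step 2: double centering
    eigvals, eigvecs = np.linalg.eigh(B)     # B is symmetric, so eigh applies
    top = np.argsort(eigvals)[::-1][:m]      # step 3: indices of the m largest
    lam = np.clip(eigvals[top], 0.0, None)   # guard against small negative values
    return eigvecs[:, top] * np.sqrt(lam)    # step 4: X = E_m Lambda_m^{1/2}

# Hypothetical input: distances between the corners of a unit square.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

X = classical_mds(D, 2)
D_hat = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
```

Because the input distances here are exactly Euclidean in two dimensions, the recovered configuration `X` reproduces them up to rounding, although the points themselves may come back rotated or reflected relative to `points`.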

Metric multidimensional scaling (mMDS)

It is a superset of classical MDS that generalizes the optimization procedure to a variety of loss functions and input matrices of known distances with weights and so on. A useful loss function in this context is called stress, a residual sum of squares that is often minimized using a procedure called stress majorization:

$\text{Stress}_D(x_1,x_2,\ldots,x_n)=\sqrt{\sum_{i\ne j=1,\ldots,n}\bigl(d_{ij}-\|x_i-x_j\|\bigr)^2}.$

Metric scaling uses a power transformation with a user-controlled exponent $p$: $d_{ij}^p$ and $-d_{ij}^{2p}$ for distance. In classical scaling $p=1$. Non-metric scaling is defined by the use of isotonic regression to nonparametrically estimate a transformation of the dissimilarities.
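As an illustration of stress majorization, the sketch below evaluates the stress defined above and repeatedly applies the Guttman transform, which is the update used by the SMACOF algorithm when all pairs have unit weight. The helper names (`pairwise_distances`, `raw_stress`, `smacof_step`), the random test data, and the fixed count of 200 iterations are assumptions made for the example.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def raw_stress(D, X):
    """Square root of the summed squared residuals over all pairs i != j."""
    return float(np.sqrt(np.sum((D - pairwise_distances(X)) ** 2)))

def smacof_step(D, X):
    """One Guttman transform with unit weights: X <- (1/n) B(X) X."""
    n = D.shape[0]
    dist = pairwise_distances(X)
    # ratio_ij = d_ij / ||x_i - x_j||, taken as 0 where two points coincide
    ratio = np.divide(D, dist, out=np.zeros_like(D), where=dist > 0)
    B = -ratio
    B[np.diag_indices(n)] = ratio.sum(axis=1)   # makes every row of B sum to zero
    return B @ X / n

rng = np.random.default_rng(0)
target = rng.normal(size=(6, 2))        # hypothetical "true" configuration
D = pairwise_distances(target)          # dissimilarities to be reproduced
X = rng.normal(size=(6, 2))             # random starting configuration

history = [raw_stress(D, X)]
for _ in range(200):
    X = smacof_step(D, X)
    history.append(raw_stress(D, X))
```

Each majorization step cannot increase the stress, so `history` is non-increasing. The iteration may still settle in a local minimum, which is one reason practical programs typically run several random starts.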

Non-metric multidimensional scaling (NMDS)

In contrast to metric MDS, non-metric MDS finds both a non-parametric monotonic relationship between the dissimilarities in the item-item matrix and the Euclidean distances between items, and the location of each item in the low-dimensional space.

The relationship is typically found using isotonic regression: let $x$ denote the vector of proximities, $f(x)$ a monotonic transformation of $x$, and $d$ the point distances; then coordinates have to be found that minimize the so-called stress,

$\text{Stress}=\sqrt{\frac{\sum\bigl(f(x)-d\bigr)^2}{\sum d^2}}.$

A few variants of this cost function exist. MDS programs automatically minimize stress in order to obtain the MDS solution.

The core of a non-metric MDS algorithm is a twofold optimization process. First, the optimal monotonic transformation of the proximities has to be found. Second, the points of a configuration have to be optimally arranged, so that their distances match the scaled proximities as closely as possible.

The basic steps in a non-metric MDS algorithm are:

1. Find a random configuration of points, e.g. by sampling from a normal distribution.
2. Calculate the distances $d$ between the points.
3. Find the optimal monotonic transformation of the proximities, in order to obtain optimally scaled data $f(x)$.
4. Minimize the stress between the optimally scaled data and the distances by finding a new configuration of points.
5. Compare the stress to some criterion. If the stress is small enough, exit the algorithm; otherwise return to step 2.

Louis Guttman's smallest space analysis (SSA) is an example of a non-metric MDS procedure.
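Step 3 and the stress formula can be illustrated with a small pure-Python sketch. It uses the pool-adjacent-violators algorithm, a standard way to compute an isotonic regression. The function names and the five example pairs are hypothetical, and ties among the proximities are not given any special treatment.

```python
import math

def isotonic_fit(values):
    """Pool-adjacent-violators: least-squares non-decreasing fit to values."""
    blocks = []  # each block holds [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # merge backwards while a block mean exceeds the mean of the next block
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

def optimal_scaling(proximities, distances):
    """Monotone values f(x), fitted to the distances d in the order of x."""
    order = sorted(range(len(proximities)), key=lambda k: proximities[k])
    fitted = isotonic_fit([distances[k] for k in order])
    scaled = [0.0] * len(proximities)
    for rank, k in enumerate(order):
        scaled[k] = fitted[rank]
    return scaled

def stress(scaled, distances):
    """sqrt( sum (f(x) - d)^2 / sum d^2 )."""
    num = sum((f - d) ** 2 for f, d in zip(scaled, distances))
    den = sum(d ** 2 for d in distances)
    return math.sqrt(num / den)

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical proximities for five pairs
d = [0.5, 1.5, 1.0, 2.0, 3.0]   # current point distances for the same pairs
f_x = optimal_scaling(x, d)
s = stress(f_x, d)
```

In this example the second and third distances violate the ordering implied by `x`, so the fit pools them to their mean and `f_x` becomes `[0.5, 1.25, 1.25, 2.0, 3.0]`. A full non-metric MDS would alternate this scaling step with a configuration update (step 4) until the stress criterion is met.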

Generalized multidimensional scaling (GMDS)

It is an extension of metric multidimensional scaling in which the target space is an arbitrary smooth non-Euclidean space. In cases where the dissimilarities are distances on a surface and the target space is another surface, GMDS allows finding the minimum-distortion embedding of one surface into another.

Details

The data to be analyzed is a collection of $M$ objects (colors, faces, stocks, ...) on which a distance function is defined,

$d_{i,j} :=$ distance between the $i$-th and $j$-th objects.

These distances are the entries of the dissimilarity matrix

$D :=
\begin{pmatrix}
d_{1,1} & d_{1,2} & \cdots & d_{1,M} \\
d_{2,1} & d_{2,2} & \cdots & d_{2,M} \\
\vdots & \vdots & & \vdots \\
d_{M,1} & d_{M,2} & \cdots & d_{M,M}
\end{pmatrix}.$

The goal of MDS is, given $D$, to find $M$ vectors $x_1,\ldots,x_M \in \mathbb{R}^N$ such that

$\|x_i - x_j\| \approx d_{i,j}$ for all $i,j\in\{1,\dots,M\}$,

where $\|\cdot\|$ is a vector norm. In classical MDS, this norm is the Euclidean distance, but, in a broader sense, it may be a metric or arbitrary distance function (Kruskal, J. B., and Wish, M. (1978), Multidimensional Scaling, Sage University Paper series on Quantitative Application in the Social Sciences, 07-011. Beverly Hills and London: Sage Publications).

In other words, MDS attempts to find a mapping from the $M$ objects into $\mathbb{R}^N$ such that distances are preserved. If the dimension $N$ is chosen to be 2 or 3, we may plot the vectors $x_i$ to obtain a visualization of the similarities between the $M$ objects. Note that the vectors $x_i$ are not unique: with the Euclidean distance, they may be arbitrarily translated, rotated, and reflected, since these transformations do not change the pairwise distances $\|x_i - x_j\|$.

(Note: The symbol $\mathbb{R}$ indicates the set of real numbers, and the notation $\mathbb{R}^N$ refers to the Cartesian product of $N$ copies of $\mathbb{R}$, which is an $N$-dimensional vector space over the field of the real numbers.)

There are various approaches to determining the vectors $x_i$. Usually, MDS is formulated as an optimization problem, where $(x_1,\ldots,x_M)$ is found as a minimizer of some cost function, for example,

$\underset{x_1,\ldots,x_M}{\mathrm{argmin}} \sum_{i<j} ( \|x_i - x_j\| - d_{i,j} )^2.$

A solution may then be found by numerical optimization techniques. For some particularly chosen cost functions, minimizers can be stated analytically in terms of matrix eigendecompositions.
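The non-uniqueness noted above is easy to check numerically. The sketch below evaluates the example cost function for a configuration and for a rotated, reflected and translated copy of it. The function name `cost`, the random data, and the particular angle and shift are arbitrary choices made for illustration.

```python
import numpy as np

def cost(X, D):
    """Sum over pairs i < j of (||x_i - x_j|| - d_ij)^2."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    i, j = np.triu_indices(len(X), k=1)
    return float(np.sum((dist[i, j] - D[i, j]) ** 2))

rng = np.random.default_rng(1)
Y = rng.normal(size=(5, 2))                  # hypothetical objects, M = 5
D = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)
X = rng.normal(size=(5, 2))                  # some candidate configuration in R^2

theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by theta
F = np.diag([1.0, -1.0])                          # reflection across the first axis
shift = np.array([3.0, -2.0])                     # translation

X_moved = X @ R.T @ F + shift
```

Here `cost(X, D)` and `cost(X_moved, D)` agree up to rounding, so a numerical optimizer can return any member of this family. Implementations often fix the ambiguity by centering the result or aligning it to principal axes.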

Procedure

There are several steps in conducting MDS research:

1. Formulating the problem: What variables do you want to compare? How many variables do you want to compare? What purpose is the study to be used for?
2. Obtaining input data: For example, respondents are asked a series of questions. For each product pair, they are asked to rate similarity (usually on a 7-point Likert scale from very similar to very dissimilar). The first question could be for Coke/Pepsi, the next for Coke/Hires root beer, the next for Pepsi/Dr Pepper, the next for Dr Pepper/Hires root beer, etc. The number of questions is a function of the number of brands and can be calculated as $Q = N (N - 1) / 2$, where $Q$ is the number of questions and $N$ is the number of brands. This approach is referred to as the "Perception data: direct approach". There are two other approaches. There is the "Perception data: derived approach", in which products are decomposed into attributes that are rated on a semantic differential scale. The other is the "Preference data approach", in which respondents are asked their preference rather than similarity.
3. Running the MDS statistical program: Software for running the procedure is available in many statistical software packages. Often there is a choice between metric MDS (which deals with interval or ratio level data) and non-metric MDS (which deals with ordinal data).
4. Deciding the number of dimensions: The researcher must decide on the number of dimensions they want the computer to create. Interpretability of the MDS solution is often important, and lower-dimensional solutions will typically be easier to interpret and visualize. However, dimension selection is also an issue of balancing underfitting and overfitting. Lower-dimensional solutions may underfit by leaving out important dimensions of the dissimilarity data. Higher-dimensional solutions may overfit to noise in the dissimilarity measurements. Model selection tools like AIC, BIC, Bayes factors, or cross-validation can thus be useful to select the dimensionality that balances underfitting and overfitting.
5. Mapping the results and defining the dimensions: The statistical program (or a related module) will map the results. The map will plot each product (usually in two-dimensional space). The proximity of products to each other indicates either how similar they are or how preferred they are, depending on which approach was used. How the dimensions of the embedding actually correspond to dimensions of system behavior, however, is not necessarily obvious. Here, a subjective judgment about the correspondence can be made (see perceptual mapping).
6. Testing the results for reliability and validity: Compute R-squared to determine what proportion of variance of the scaled data can be accounted for by the MDS procedure. An R-square of 0.6 is considered the minimum acceptable level. An R-square of 0.8 is considered good for metric scaling and 0.9 is considered good for non-metric scaling. Other possible tests are Kruskal's stress, split data tests, data stability tests (i.e., eliminating one brand), and test-retest reliability.
7. Reporting the results comprehensively: Along with the mapping, at least the distance measure (e.g., Sørensen index, Jaccard index) and the reliability (e.g., stress value) should be given. It is also advisable to report the algorithm (e.g., Kruskal, Mather), which is often determined by the program used (naming the program sometimes replaces the algorithm report), whether a start configuration was supplied or chosen at random, the number of runs, the assessment of dimensionality, the Monte Carlo method results, the number of iterations, the assessment of stability, and the proportional variance of each axis (R-square).
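As a small worked example of the question count in step 2, the sketch below enumerates every product pair for the four brands mentioned there and compares the count with $Q = N(N-1)/2$. The variable names are chosen for illustration.

```python
from itertools import combinations

brands = ["Coke", "Pepsi", "Dr Pepper", "Hires root beer"]

# Every unordered pair of brands corresponds to one similarity question.
pairs = list(combinations(brands, 2))

n = len(brands)
q = n * (n - 1) // 2   # Q = N (N - 1) / 2
```

With four brands this gives six questions. The count grows quadratically, which is worth keeping in mind when planning how many products respondents are asked to compare.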

Implementations

* ELKI includes two MDS implementations.
* MATLAB includes two MDS implementations (cmdscale for classical MDS and mdscale for non-classical MDS).
* The R programming language offers several MDS implementations, e.g. the base cmdscale function and the packages smacof (mMDS and nMDS) and vegan (weighted MDS).
* scikit-learn contains the class sklearn.manifold.MDS.
