HOME

TheInfoList



OR:

In
mathematics Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...
,
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
and especially
graph theory In mathematics and computer science, graph theory is the study of ''graph (discrete mathematics), graphs'', which are mathematical structures used to model pairwise relations between objects. A graph in this context is made up of ''Vertex (graph ...
, a distance matrix is a
square matrix In mathematics, a square matrix is a Matrix (mathematics), matrix with the same number of rows and columns. An ''n''-by-''n'' matrix is known as a square matrix of order Any two square matrices of the same order can be added and multiplied. Squ ...
(two-dimensional array) containing the
distance Distance is a numerical or occasionally qualitative measurement of how far apart objects, points, people, or ideas are. In physics or everyday usage, distance may refer to a physical length or an estimation based on other criteria (e.g. "two co ...
s, taken pairwise, between the elements of a set. Depending upon the application involved, the ''distance'' being used to define this matrix may or may not be a
metric Metric or metrical may refer to: Measuring * Metric system, an internationally adopted decimal system of measurement * An adjective indicating relation to measurement in general, or a noun describing a specific type of measurement Mathematics ...
. If there are elements, this matrix will have size . In graph-theoretic applications, the elements are more often referred to as points, nodes or vertices.


Non-metric distance matrix

In general, a distance matrix is a weighted
adjacency matrix In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph (discrete mathematics), graph. The elements of the matrix (mathematics), matrix indicate whether pairs of Vertex (graph theory), vertices ...
of some graph. In a network, a directed graph with weights assigned to the arcs, the distance between two nodes of the network can be defined as the minimum of the sums of the weights on the shortest paths joining the two nodes (where the number of steps in the path is bounded). This distance function, while well defined, is not a metric. There need be no restrictions on the weights other than the need to be able to combine and compare them, so negative weights are used in some applications. Since paths are directed, symmetry can not be guaranteed, and if negative-weight cycles exist the distance matrix may not be hollow (and in the absence of a bound on the step count, the matrix may be undefined). An algebraic formulation of the above can be obtained by using the min-plus algebra. Matrix multiplication in this system is defined as follows: Given two matrices and , their distance product is defined as an matrix such that :c_ = \min_^n \. Note that the off-diagonal elements that are not connected directly will need to be set to infinity or a suitable large value for the min-plus operations to work correctly. A zero in these locations will be incorrectly interpreted as an edge with no distance, cost, etc. If is an matrix containing the edge weights of a
graph Graph may refer to: Mathematics *Graph (discrete mathematics), a structure made of vertices and edges **Graph theory, the study of such graphs and their properties *Graph (topology), a topological space resembling a graph in the sense of discret ...
, then (using this distance product) gives the distances between vertices using paths of length at most edges, and so is the distance matrix of the graph when the step count bound is set to ''k''. If there are no loops of negative weight, will give the true distance matrix, with no bound, because removing repeated vertices from a path cannot lower its weight. On the other hand, if ''i'' and ''j'' are on a negative-weight loop, will decrease without bound as ''k'' increases. An arbitrary graph on vertices can be modeled as a weighted
complete graph In the mathematical field of graph theory, a complete graph is a simple undirected graph in which every pair of distinct vertices is connected by a unique edge. A complete digraph is a directed graph in which every pair of distinct vertices i ...
on vertices by assigning a weight of one to each edge of the complete graph that corresponds to an edge of and infinity to all other edges. for this complete graph is the
adjacency matrix In graph theory and computer science, an adjacency matrix is a square matrix used to represent a finite graph (discrete mathematics), graph. The elements of the matrix (mathematics), matrix indicate whether pairs of Vertex (graph theory), vertices ...
of . The distance matrix of can be computed from as above; by contrast, if normal matrix multiplication is used, and unlinked vertices are represented with 0, would instead encode the number of paths between any two vertices of length exactly .


Metric distance matrix

The value of a distance matrix formalism in many applications is in how the distance matrix can manifestly encode the metric axioms and in how it lends itself to the use of linear algebra techniques. That is, if with is a distance matrix for a metric distance, then # the entries on the main diagonal are all zero (that is, the matrix is a
hollow matrix In mathematics, a hollow matrix may refer to one of several related classes of matrix: a sparse matrix; a matrix with a large block of zeroes; or a matrix with diagonal entries all zero. Definitions Sparse A ''hollow matrix'' may be one with "f ...
), i.e. for all , # all the off-diagonal entries are positive ( if ), (that is, a non-negative matrix), # the matrix is a
symmetric matrix In linear algebra, a symmetric matrix is a square matrix that is equal to its transpose. Formally, Because equal matrices have equal dimensions, only square matrices can be symmetric. The entries of a symmetric matrix are symmetric with ...
(), and # for any and , for all (the triangle inequality). This can be stated in terms of tropical matrix multiplication When a distance matrix satisfies the first three axioms (making it a semi-metric) it is sometimes referred to as a pre-distance matrix. A pre-distance matrix that can be embedded in a Euclidean space is called a Euclidean distance matrix. For mixed-type data that contain numerical as well as categorical descriptors,
Gower's distance In statistics, Gower's distance between two mixed-type objects is a similarity measure that can handle different types of data within the same dataset and is particularly useful in cluster analysis Cluster analysis or clustering is the data an ...
is a common alternative. Another common example of a metric distance matrix arises in
coding theory Coding theory is the study of the properties of codes and their respective fitness for specific applications. Codes are used for data compression, cryptography, error detection and correction, data transmission and computer data storage, data sto ...
when in a
block code In coding theory, block codes are a large and important family of Channel coding, error-correcting codes that encode data in blocks. There is a vast number of examples for block codes, many of which have a wide range of practical applications. Th ...
the elements are strings of fixed length over an alphabet and the distance between them is given by the
Hamming distance In information theory, the Hamming distance between two String (computer science), strings or vectors of equal length is the number of positions at which the corresponding symbols are different. In other words, it measures the minimum number ...
metric. The smallest non-zero entry in the distance matrix measures the error correcting and error detecting capability of the code.


Additive distance matrix

An additive distance matrix is a special type of matrix used in
bioinformatics Bioinformatics () is an interdisciplinary field of science that develops methods and Bioinformatics software, software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, ...
to build a
phylogenetic tree A phylogenetic tree or phylogeny is a graphical representation which shows the evolutionary history between a set of species or taxa during a specific time.Felsenstein J. (2004). ''Inferring Phylogenies'' Sinauer Associates: Sunderland, MA. In ...
. Let be the lowest common ancestor between two species and , we expect . This is where the additive metric comes from. A distance matrix for a set of species is said to be additive if and only if there exists a phylogeny for such that: * Every edge in is associated with a positive weight * For every , equals the sum of the weights along the path from to in For this case, is called an additive matrix and is called an additive tree. Below we can see an example of an additive distance matrix and its corresponding tree:


Ultrametric distance matrix

The ultrametric distance matrix is defined as an additive matrix which models the constant
molecular clock The molecular clock is a figurative term for a technique that uses the mutation rate of biomolecules to deduce the time in prehistory when two or more life forms diverged. The biomolecular data used for such calculations are usually nucleot ...
. It is used to build a phylogenetic tree. A matrix is said to be ultrametric if there exists a tree such that: * equals the sum of the edge weights along the path from to in * A root of the tree can be identified with the distance to all the leaves being the same Here is an example of an ultrametric distance matrix with its corresponding tree:


Bioinformatics

The distance matrix is widely used in the bioinformatics field, and it is present in several methods, algorithms and programs. Distance matrices are used to represent
protein Proteins are large biomolecules and macromolecules that comprise one or more long chains of amino acid residue (biochemistry), residues. Proteins perform a vast array of functions within organisms, including Enzyme catalysis, catalysing metab ...
structures in a coordinate-independent manner, as well as the pairwise distances between two sequences in
sequence space In functional analysis and related areas of mathematics, a sequence space is a vector space whose elements are infinite sequences of real or complex numbers. Equivalently, it is a function space whose elements are functions from the natural num ...
. They are used in
structural A structure is an arrangement and organization of interrelated elements in a material object or system, or the object or system so organized. Material structures include man-made objects such as buildings and machines and natural objects such as ...
and
sequential In mathematics, a sequence is an enumerated collection of objects in which repetitions are allowed and order matters. Like a set, it contains members (also called ''elements'', or ''terms''). The number of elements (possibly infinite) is call ...
alignment, and for the determination of protein structures from
NMR Nuclear magnetic resonance (NMR) is a physical phenomenon in which atomic nucleus, nuclei in a strong constant magnetic field are disturbed by a weak oscillating magnetic field (in the near and far field, near field) and respond by producing ...
or
X-ray crystallography X-ray crystallography is the experimental science of determining the atomic and molecular structure of a crystal, in which the crystalline structure causes a beam of incident X-rays to Diffraction, diffract in specific directions. By measuring th ...
. Sometimes it is more convenient to express data as a similarity matrix. It is also used to define the
distance correlation In statistics and in probability theory, distance correlation or distance covariance is a measure of dependence between two paired random vectors of arbitrary, not necessarily equal, dimension. The population distance correlation coefficient is ze ...
.


Sequence alignment In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural biology, structural, or evolutionary relationships between ...

An alignment of two sequences is formed by inserting spaces in arbitrary locations along the sequences so that they end up with the same length and there are no two spaces at the same position of the two augmented sequences. One of the primary methods for sequence alignment is dynamic programming. The method is used to fill the distance matrix and then obtain the alignment. In typical usage, for sequence alignment a matrix is used to assign scores to amino-acid matches or mismatches, and a gap penalty for matching an amino-acid in one sequence with a gap in the other.


Global alignment

The Needleman–Wunsch algorithm used to calculate global alignment uses dynamic programming to obtain the distance matrix.


Local alignment

The Smith–Waterman algorithm is also dynamic programming based which consists also in obtaining the distance matrix and then obtain the local alignment.


Multiple sequence alignment

Multiple sequence alignment Multiple sequence alignment (MSA) is the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. These alignments are used to infer evolutionary relationships via phylogenetic analysis an ...
is an extension of pairwise alignment to align several sequences at a time. Different MSA methods are based on the same idea of the distance matrix as global and local alignments. * Center star method. This method defines a center sequence which minimizes the distance between the sequence and any other sequence . Then it generates a multiple alignment for the set of sequences so that for every the alignment distance is the optimal pairwise alignment. This method has the characteristic that the computed alignment for whose sum-of-pair distance is at most twice the optimal multiple alignment. * Progressive alignment method. This heuristic method to create MSA first aligns the two most related sequences, and then it progressively aligns the next two most related sequences until all sequences are aligned. There are other methods that have their own program due to their popularity: * ClustalW *
MUSCLE Muscle is a soft tissue, one of the four basic types of animal tissue. There are three types of muscle tissue in vertebrates: skeletal muscle, cardiac muscle, and smooth muscle. Muscle tissue gives skeletal muscles the ability to muscle contra ...
* MAFFT * MANGO * And many more


= MAFFT

= Multiple alignment using fast Fourier transform (MAFFT) is a program with an algorithm based on progressive alignment, and it offers various multiple alignment strategies. First, MAFFT constructs a distance matrix based on the number of shared 6-tuples. Second, it builds the guide tree based on the previous matrix. Third, it clusters the sequences with the help of the
fast Fourier transform A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). A Fourier transform converts a signal from its original domain (often time or space) to a representation in ...
and starts the alignment. Based on the new alignment, it reconstructs the guide tree and align again.


Phylogenetic analysis

To perform
phylogenetic In biology, phylogenetics () is the study of the evolutionary history of life using observable characteristics of organisms (or genes), which is known as phylogenetic inference. It infers the relationship among organisms based on empirical dat ...
analysis, the first step is to reconstruct the phylogenetic tree: given a collection of species, the problem is to reconstruct or infer the ancestral relationships among the species, i.e., the phylogenetic tree among the species. Distance matrix methods perform this activity.


Distance matrix methods

Distance matrix methods of phylogenetic analysis explicitly rely on a measure of "genetic distance" between the sequences being classified, and therefore require multiple sequences as an input. Distance methods attempt to construct an all-to-all matrix from the sequence query set describing the distance between each sequence pair. From this is constructed a phylogenetic tree that places closely related sequences under the same interior node and whose branch lengths closely reproduce the observed distances between sequences. Distance-matrix methods may produce either rooted or unrooted trees, depending on the algorithm used to calculate them. Given species, the input is an distance matrix where is the mutation distance between species and . The aim is to output a tree of degree which is consistent with the distance matrix. They are frequently used as the basis for progressive and iterative types of
multiple sequence alignment Multiple sequence alignment (MSA) is the process or the result of sequence alignment of three or more biological sequences, generally protein, DNA, or RNA. These alignments are used to infer evolutionary relationships via phylogenetic analysis an ...
. The main disadvantage of distance-matrix methods is their inability to efficiently use information about local high-variation regions that appear across multiple subtrees. Despite potential problems, distance methods are extremely fast, and they often produce a reasonable estimate of phylogeny. They also have certain benefits over the methods that use characters directly. Notably, distance methods allow use of data that may not be easily converted to character data, such as
DNA–DNA hybridization In genomics, DNA–DNA hybridization is a molecular biology technique that measures the degree of genetic similarity between DNA sequences. It is used to determine the genetic distance between two organisms and has been used extensively in phylo ...
assays. The following are distance based methods for phylogeny reconstruction: * Additive tree reconstruction *
UPGMA UPGMA (unweighted pair group method with arithmetic mean) is a simple agglomerative (bottom-up) hierarchical clustering method. It also has a weighted variant, WPGMA, and they are generally attributed to Sokal and Michener. Note that the unwei ...
*
Neighbor joining In bioinformatics, neighbor joining is a bottom-up (agglomerative) clustering method for the creation of phylogenetic trees, created by Naruya Saitou and Masatoshi Nei in 1987. Usually based on DNA or protein sequence data, the algorithm require ...
* Fitch–Margoliash


= Additive tree reconstruction

= Additive tree reconstruction is based on additive and ultrametric distance matrices. These matrices have a special characteristic: Consider an additive matrix . For any three species the corresponding tree is unique. Every ultrametric distance matrix is an additive matrix. We can observe this property for the tree below, which consists on the species . The additive tree reconstruction technique starts with this tree. And then adds one more species each time, based on the distance matrix combined with the property mentioned above. For example, consider an additive matrix and 5 species and . First we form an additive tree for two species and . Then we chose a third one, let's say and attach it to a point on the edge between and . The edge weights are computed with the property above. Next we add the fourth species to any of the edges. If we apply the property then we identify that should be attached to only one specific edge. Finally, we add following the same procedure as before.


= UPGMA

= The basic principle of UPGMA (Unweighted Pair Group Method with Arithmetic Mean) is that similar species should be closer in the phylogenetic tree. Hence, it builds the tree by clustering similar sequences iteratively. The method works by building the phylogenetic tree bottom up from its leaves. Initially, we have leaves (or singleton trees), each representing a species in . Those leaves are referred as clusters. Then, we perform iterations. In each iteration, we identify two clusters and with the smallest average distance and merge them to form a bigger cluster . If we suppose is ultrametric, for any cluster created by the UPGMA algorithm, is a valid ultrametric tree.


= Neighbor joining

= Neighbor is a bottom-up clustering method. It takes a distance matrix specifying the distance between each pair of sequences. The algorithm starts with a completely unresolved tree, whose topology corresponds to that of a
star network A star network is an implementation of a spoke–hub distribution paradigm in computer networks. In a star network, every host is connected to a central hub. In its simplest form, one central hub acts as a conduit to transmit messages. The ...
, and iterates over the following steps until the tree is completely resolved and all branch lengths are known: # Based on the current distance matrix calculate the matrix (defined below). # Find the pair of distinct taxa i and j (i.e. with) for which has its lowest value. These taxa are joined to a newly created node, which is connected to the central node. # Calculate the distance from each of the
taxa In biology, a taxon (back-formation from ''taxonomy''; : taxa) is a group of one or more populations of an organism or organisms seen by taxonomists to form a unit. Although neither is required, a taxon is usually known by a particular name and ...
in the pair to this new node. # Calculate the distance from each of the taxa outside of this pair to the new node. # Start the algorithm again, replacing the pair of joined neighbors with the new node and using the distances calculated in the previous step.


= Fitch–Margoliash

= The Fitch–Margoliash method uses a weighted
least squares The method of least squares is a mathematical optimization technique that aims to determine the best fit function by minimizing the sum of the squares of the differences between the observed values and the predicted values of the model. The me ...
method for clustering based on genetic distance. Closely related sequences are given more weight in the tree construction process to correct for the increased inaccuracy in measuring distances between distantly related sequences. The least-squares criterion applied to these distances is more accurate but less efficient than the neighbor-joining methods. An additional improvement that corrects for correlations between distances that arise from many closely related sequences in the data set can also be applied at increased computational cost.


Data Mining and Machine Learning


Data Mining

A common function in data mining is applying
cluster analysis Cluster analysis or clustering is the data analyzing technique in which task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more Similarity measure, similar (in some specific sense defined by the ...
on a given set of data to group data based on how similar or more similar they are when compared to other groups. Distance matrices became heavily dependent and utilized in
cluster analysis Cluster analysis or clustering is the data analyzing technique in which task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more Similarity measure, similar (in some specific sense defined by the ...
since similarity can be measured with a distance metric. Thus, distance matrix became the representation of the similarity measure between all the different pairs of data in the set.


Hierarchical clustering

A distance matrix is necessary for traditional
hierarchical clustering In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two ...
algorithms which are often heuristic methods employed in biological sciences such as phylogeny reconstruction. When implementing any of the hierarchical clustering algorithms in data mining, the distance matrix will contain all pair-wise distances between every point and then will begin to create clusters between two different points or clusters based entirely on distances from the distance matrix. If N be the number of points, the complexity of hierarchical clustering is: * Time complexity is O(N^3) due to the repetitive calculations done after every cluster to update the distance matrix * Space complexity is O(N^2)


Machine Learning

Distance metrics are a key part of several machine learning algorithms, which are used in both supervised and
unsupervised learning Unsupervised learning is a framework in machine learning where, in contrast to supervised learning, algorithms learn patterns exclusively from unlabeled data. Other frameworks in the spectrum of supervisions include weak- or semi-supervision, wh ...
. They are generally used to calculate the similarity between data points: this is where the distance matrix is an essential element. The use of an effective distance matrix improves the performance of the machine learning model, whether it is for classification tasks or for clustering.


K-Nearest Neighbors

A distance matrix is utilized in the k-NN algorithm which is one of the slowest but simplest and most used instance-based machine learning algorithms that can be used both in classification and regression tasks. It is one of the slowest machine learning algorithms since each test sample's predicted result requires a fully computed distance matrix between the test sample and each training sample in the training set. Once the distance matrix is computed, the algorithm selects the K number of training samples that are the closest to the test sample to predict the test sample's result based on the selected set's majority (classification) or average (regression) value. * Prediction time complexity is O(k * n * d), to compute the distance between each test sample with every training sample to construct the distance matrix where: # k = number of nearest neighbors selected # n = size of the training set # d = number of dimensions being used for the data This classification focused model predicts the label of the target based on the distance matrix between the target and each of the training samples to determine the K-number of samples that are the closest/nearest to the target.


Computer Vision

A distance matrix can be used in
neural networks A neural network is a group of interconnected units called neurons that send signals to one another. Neurons can be either Cell (biology), biological cells or signal pathways. While individual neurons are simple, many of them together in a netwo ...
for 2D to 3D regression in image predicting machine learning models.


Information retrieval


Distance matrices using Gaussian mixture distance



Gaussian mixture distance for performing accurate
nearest neighbor search Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically expressed in terms of a dissimilarity function: ...
for information retrieval. Under an established Gaussian finite mixture model for the distribution of the data in the database, the Gaussian mixture distance is formulated based on minimizing the Kullback-Leibler divergence between the distribution of the retrieval data and the data in database. In the comparison of performance of the Gaussian mixture distance with the well-known Euclidean and Mahalanobis distances based on a precision performance measurement, experimental results demonstrate that the Gaussian mixture distance function is superior in the others for different types of testing data. Potential basic algorithms worth noting on the topic of information retrieval is Fish School Search algorithm an information retrieval that partakes in the act of using distance matrices in order for gathering collective behavior of fish schools. By using a feeding operator to update their weights Eq. A: : x_i(t+1)=x_(t)- step_ rand(0,1)\frac, Eq. B: : x_i(t+1)=x_(t)+step_ rand(0,1)\frac, Stepvol defines the size of the maximum volume displacement preformed with the distance matrix, specifically using a
Euclidean distance In mathematics, the Euclidean distance between two points in Euclidean space is the length of the line segment between them. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, and therefore is o ...
matrix.


Evaluation of the similarity or dissimilarity of Cosine similarity and Distance matrices



While the
Cosine similarity In data analysis, cosine similarity is a measure of similarity between two non-zero vectors defined in an inner product space. Cosine similarity is the cosine of the angle between the vectors; that is, it is the dot product of the vectors divided ...
measure is perhaps the most frequently applied proximity measure in information retrieval by measuring the angles between documents in the search space on the base of the cosine. Euclidean distance is invariant to mean-correction. The sampling distribution of a mean is generated by repeated sampling from the same population and recording of the sample means obtained. This forms a distribution of different means, and this distribution has its own mean and variance. For the data which can be negative as well as positive, the
null distribution Null may refer to: Science, technology, and mathematics Astronomy *Nuller, an optical tool using interferometry to block certain sources of light Computing *Null (SQL) (or NULL), a special marker and keyword in SQL indicating that a data value do ...
for cosine similarity is the distribution of the
dot product In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a Scalar (mathematics), scalar as a result". It is also used for other symmetric bilinear forms, for example in a pseudo-Euclidean space. N ...
of two independent random unit vectors. This distribution has a mean of zero and a variance of 1/n. While
Euclidean distance In mathematics, the Euclidean distance between two points in Euclidean space is the length of the line segment between them. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, and therefore is o ...
will be invariant to this correction.


Clustering Documents

The implementation of
hierarchical clustering In data mining and statistics, hierarchical clustering (also called hierarchical cluster analysis or HCA) is a method of cluster analysis that seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two ...
with distance-based metrics to organize and group similar documents together will require the need and utilization of a distance matrix. The distance matrix will represent the degree of association that a document has with another document that will be used to create clusters of closely associated documents that will be utilized in retrieval methods of relevant documents for a user's query.


Isomap

Isomap incorporates distance matrices to utilize geodesic distances to able to compute lower-dimensional embeddings. This helps to address a collection of documents that reside within a massive number of dimensions and empowers to perform document clustering.


Neighborhood Retrieval Visualizer (NeRV)

An algorithm used for both unsupervised and supervised visualization that uses distance matrices to find similar data based on the similarities shown on a display/screen. The distance matrix needed for Unsupervised NeRV can be computed through fixed input pairwise distances. The distance matrix needed for Supervised NeRV requires formulating a supervised distance metric to be able to compute the distance of the input in a supervised manner.


Chemistry

The distance matrix is a mathematical object widely used in both graphical-theoretical (topological) and geometric (topographic) versions of chemistry. The distance matrix is used in chemistry in both explicit and implicit forms.


Interconversion mechanisms between two permutational isomers

Distance matrices were used as the main approach to depict and reveal the shortest path sequence needed to determine the rearrangement between the two permutational isomers.


Distance Polynomials and Distance Spectra

Explicit use of Distance matrices is required in order to construct the distance polynomials and distance spectra of molecular structures.


Structure-property model

Implicit use of Distance matrices was applied through the use of the distance based metric Weiner number/ Weiner Index which was formulated to represent the distances in all chemical structures. The Weiner number is equal to half-sum of the elements of the distance matrix.


Graph-theoretical Distance matrix

Distance matrix in chemistry that are used for the 2-D realization of molecular graphs, which are used to illustrate the main foundational features of a molecule in a myriad of applications. # Creating a label tree that represents the carbon skeleton of a molecule based on its distance matrix. The distance matrix is imperative in this application because similar molecules can have a myriad of label tree variants of their carbon skeleton. The labeled tree structure of
hexane Hexane () or ''n''-hexane is an organic compound, a straight-chain alkane with six carbon atoms and the molecular formula C6H14. Hexane is a colorless liquid, odorless when pure, and with a boiling point of approximately . It is widely used as ...
(C6H14) carbon skeleton that is created based on the distance matrix in the example, has different carbon skeleton variants that affect both the distance matrix and the labeled tree # Creating a labeled graph with edge weights, used in
chemical graph theory Chemical graph theory is the topology branch of mathematical chemistry which applies graph theory to mathematical modelling of chemical phenomena. The pioneers of chemical graph theory are Alexandru Balaban, Ante Graovac, Iván Gutman, Haruo ...
, that represent molecules with hetero-atoms. # Le Verrier-Fadeev-Frame (LVFF) method is a computer oriented used to speed up the process of detecting the graph center in polycyclic graphs. However, LVFF requires the input to be a diagonalized distance matrix which is easily resolved by implementing the Householder tridiagonal-QL algorithm that takes in a distance matrix and returns the diagonalized distance needed for the LVFF method.


Geometric-Distance Matrix

While the graph-theoretical distance matrix 2-D captures the constitutional features of the molecule, its three-dimensional (3D) character is encoded in the geometric-distance matrix. The geometric-distance matrix is a different type of distance matrix that is based on the graph-theoretical distance matrix of a molecule to represent and graph the 3-D molecule structure. The geometric-distance matrix of a molecular structure is a real symmetric matrix defined in the same way as a 2-D matrix. However, the matrix elements will hold a collection of shortest Cartesian distances between and in . Also known as topographic matrix, the geometric-distance matrix can be constructed from the known geometry of the molecule. As an example, the geometric-distance matrix of the carbon skeleton of ''2,4-dimethylhexane'' is shown below:


Other Applications


Time Series Analysis

Dynamic Time Warping In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walk ...
distance matrices are utilized with the clustering and classification algorithms of a collection/group of time series objects.


Examples

For example, suppose these data are to be analyzed, where
pixel In digital imaging, a pixel (abbreviated px), pel, or picture element is the smallest addressable element in a Raster graphics, raster image, or the smallest addressable element in a dot matrix display device. In most digital display devices, p ...
Euclidean distance In mathematics, the Euclidean distance between two points in Euclidean space is the length of the line segment between them. It can be calculated from the Cartesian coordinates of the points using the Pythagorean theorem, and therefore is o ...
is the distance metric. The distance matrix would be: These data can then be viewed in graphic form as a
heat map A heat map (or heatmap) is a 2-dimensional data visualization technique that represents the magnitude of individual values within a dataset as a color. The variation in color may be by hue or intensity. In some applications such as crime analy ...
. In this image, black denotes a distance of 0 and white is maximal distance.


See also

*
Computer vision Computer vision tasks include methods for image sensor, acquiring, Image processing, processing, Image analysis, analyzing, and understanding digital images, and extraction of high-dimensional data from the real world in order to produce numerical ...
*
Data clustering Cluster analysis or clustering is the data analyzing technique in which task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some specific sense defined by the analyst) to each o ...
*
Distance set In geometry, the distance set of a collection of points is the Set (mathematics), set of distances between distinct pairs of points. Thus, it can be seen as the generalization of a Minkowski difference, difference set, the set of distances (and th ...
*
Hollow matrix In mathematics, a hollow matrix may refer to one of several related classes of matrix: a sparse matrix; a matrix with a large block of zeroes; or a matrix with diagonal entries all zero. Definitions Sparse A ''hollow matrix'' may be one with "f ...
* Min-plus matrix multiplication


References

{{Matrix classes Metric geometry Bioinformatics Matrices (mathematics) Graph distance