Dot Plot (bioinformatics)

In bioinformatics, a dot plot is a graphical method for comparing two biological sequences and identifying regions of close similarity after sequence alignment. It is a type of recurrence plot.


History

One way to visualize the similarity between two protein or nucleic acid sequences is to use a similarity matrix, known as a dot plot. Dot plots were introduced by Gibbs and McIntyre in 1970. They are two-dimensional matrices with the two sequences being compared laid out along the vertical and horizontal axes. For a simple visual representation of similarity, a cell in the matrix is shaded black if the residues at its row and column are identical, so that matching sequence segments appear as diagonal runs across the matrix.
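The matrix described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method of any tool listed in this article; the function names `dot_matrix` and `render` are hypothetical, and the sketch assumes exact, case-sensitive identity between residues.

```python
def dot_matrix(seq_x, seq_y):
    """Build the identity matrix: one row per residue of seq_y,
    one column per residue of seq_x, True where the residues match."""
    return [[a == b for a in seq_x] for b in seq_y]


def render(matrix, seq_x, seq_y):
    """Draw the matrix as text, with '*' for a shaded cell and '.' otherwise."""
    lines = ["  " + seq_x]  # header row: the horizontal-axis sequence
    for residue, row in zip(seq_y, matrix):
        lines.append(residue + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


if __name__ == "__main__":
    seq = "GATTACA"  # illustrative sequence, compared with itself
    print(render(dot_matrix(seq, seq), seq, seq))
```

Comparing a sequence with itself in this way shades every cell on the main diagonal, plus any off-diagonal cell where the same residue happens to recur.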


Interpretation

Some idea of the similarity of the two sequences can be gleaned from the number and length of the matching segments shown in the matrix. Identical sequences produce a single unbroken line along the main diagonal of the matrix. Insertions and deletions between the sequences disrupt this diagonal. Regions of local similarity and repetitive sequences give rise to further diagonal matches in addition to the central diagonal, and chance matches between single residues add background noise. One way of reducing this noise is to shade only runs, or 'tuples', of residues; for example, a tuple of 3 corresponds to three matching residues in a row. This is effective because the probability of matching three residues in a row by chance is much lower than that of a single-residue match.

A dot plot compares two sequences by placing one along the x-axis and the other along the y-axis. Wherever the residue at a position in one sequence matches the residue at a position in the other, a dot is drawn at the corresponding coordinates. The sequences can be written forwards or backwards, but both axes must use the same direction, and the direction chosen determines the direction of the lines on the plot. Once the dots have been plotted, adjacent dots combine to form lines. The more similar the two sequences are, the closer the resulting line lies to the straight diagonal that a graph of a direct relationship would show.

This relationship is affected by sequence features such as frame shifts, direct repeats, and inverted repeats. Frame shifts include insertions, deletions, and mutations. The presence of one or more of these features causes multiple lines to be plotted in a variety of configurations, depending on which features are present. Low-complexity regions produce a very different result. These are regions of the sequence composed of only a few distinct amino acids, which makes the region highly redundant. They typically lie around the diagonal and may or may not appear as a square in the middle of the dot plot.
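The tuple filter described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any particular program: the function name `tuple_dot_plot` and its parameter `k` are hypothetical, matches must be exact, and both sequences are read in the same direction.

```python
def tuple_dot_plot(seq_x, seq_y, k=3):
    """Return the set of (x, y) cells covered by an exact match of length k.

    x indexes seq_x (horizontal axis) and y indexes seq_y (vertical axis).
    With k=1 every identical residue pair is kept; larger k keeps only
    cells that belong to a diagonal run of at least k matching residues.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    dots = set()
    for y in range(len(seq_y) - k + 1):
        for x in range(len(seq_x) - k + 1):
            if seq_x[x:x + k] == seq_y[y:y + k]:
                # Shade every cell along this k-long diagonal run.
                dots.update((x + i, y + i) for i in range(k))
    return dots


if __name__ == "__main__":
    seq = "GATTACA"  # illustrative sequence, compared with itself
    print(len(tuple_dot_plot(seq, seq, k=1)))  # dots without filtering
    print(len(tuple_dot_plot(seq, seq, k=3)))  # dots after tuple filtering
```

As a rough guide to why this reduces noise, assume four equally frequent and independent nucleotides: a single position then matches by chance with probability 1/4, whereas three consecutive positions match with probability (1/4)^3 = 1/64. Real sequences do not satisfy this assumption exactly, so the figures are indicative only.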


Software to create dot plots


* ANACON – Contact analysis of dot plots.
* D-Genies – Specializes in interactive whole-genome dot plots of large genomes.
* Dotlet – Provides a program allowing you to construct a dot plot with your own sequences.
* dotmatcher – Web tool to generate dot plots (and part of the EMBOSS suite).
* An easy (educational) HTML5 tool to generate dot plots from RNA sequences.
* dotplot – R package to rapidly generate dot plots as either traditional or ggplot graphics.
* A stand-alone program to generate dot plots.
* JDotter – Java version of Dotter.
* Flexidot – Customizable and ambiguity-aware dot plot suite for aesthetics, batch analyses and printing (implemented in Python).
* Gepard – Dot plot tool suitable for even genome scale.
* Genomdiff – An open-source Java dot plot program for viruses.
* LAST – For whole-genome "split-alignment".
* lastz and laj – Programs to prepare and visualize genomic alignments.
* yass – Web-based tool to generate (both forward and reverse complement) dot plots from genomic alignments.
* An R package to generate dot plots.
* SynMap – An easy-to-use, web-based tool to generate dot plots for many species with access to an extensive genome database. Offered by the comparative genomics platform CoGe.
* UGENE Dot Plot viewer – Open-source dot plot visualizer.
* General introduction to dot plots with example algorithms, and a software tool to create small and medium-size dot plots.

In addition to the tools listed above, the NCBI BLAST server at https://blast.ncbi.nlm.nih.gov/Blast.cgi includes dot plots in its output.


See also

* Protein contact map
* Recurrence plot
* Self-similarity matrix


References

Gibbs, Adrian J.; McIntyre, George A. (1970). "The Diagram, a Method for Comparing Sequences. Its Use with Amino Acid and Nucleotide Sequences". Eur. J. Biochem. 16 (1): 1–11. doi:10.1111/j.1432-1033.1970.tb01046.x. PMID 5456129.