HOME

TheInfoList



OR:

In social sciences and other domains, representative sequences are whole sequences that best characterize or summarize a set of sequences. In bioinformatics, representative sequences also designate substrings of a sequence that characterize the sequence.


Social sciences

In
Sequence analysis in social sciences In social sciences, sequence analysis (SA) is concerned with the analysis of sets of categorical sequences that typically describe longitudinal data. Analyzed sequences are encoded representations of, for example, individual life trajectories such ...
, representative sequences are used to summarize sets of sequences describing for example the family life course or professional career of several thousands individuals. The identification of representative sequences proceeds from the pairwise dissimilarities between sequences. One typical solution is the medoid sequence, i.e., the observed sequence that minimizes the sum of its distances to all other sequences in the set. An other solution is the densest observed sequence, i.e., the sequence with the greatest number of other sequences in its neighborhood. When the diversity of the sequences is large, a single representative is often insufficient to efficiently characterize the set. In such cases, an as small as possible set of representative sequences covering (i.e., which includes in at least one neighborhood of a representative) a given percentage of all sequences is searched. A solution also considered is to select the medoids of relative frequency groups. More specifically, the method consists in sorting the sequences (for example, according to the first principal coordinate of the pairwise dissimilarity matrix), splitting the sorted list into equal sized groups (called relative frequency groups), and selecting the medoids of the equal sized groups. The methods for identifying representative sequences described above have been implemented in the R packag
TraMineR


Bioinformatics

Representative sequences are short regions within
protein sequences Protein primary structure is the linear sequence of amino acids in a peptide or protein. By convention, the primary structure of a protein is reported starting from the amino-terminal (N) end to the carboxyl-terminal (C) end. Protein biosynthes ...
that can be used to approximate the
evolutionary relationships Evolution is change in the heritable characteristics of biological populations over successive generations. These characteristics are the expressions of genes, which are passed on from parent to offspring during reproduction. Variation t ...
of those proteins, or the organisms from which they come. Representative sequences are contiguous subsequences (typically 300 residues) from
ubiquitous Omnipresence or ubiquity is the property of being present anywhere and everywhere. The term omnipresence is most often used in a religious context as an attribute of a deity or supreme being, while the term ubiquity is generally used to describe ...
, conserved proteins, such that each
orthologous Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a spec ...
family of representative sequences taken alone gives a
distance matrix In mathematics, computer science and especially graph theory, a distance matrix is a square matrix (two-dimensional array) containing the distances, taken pairwise, between the elements of a set. Depending upon the application involved, the ''dist ...
in close agreement with the consensus matrix.


Use

Protein sequence Protein primary structure is the linear sequence of amino acids in a peptide or protein. By convention, the primary structure of a protein is reported starting from the amino-terminal (N) end to the carboxyl-terminal (C) end. Protein biosynthesi ...
s can provide data about the
biological function In evolutionary biology, function is the reason some object or process occurred in a system that evolved through natural selection. That reason is typically that it achieves some result, such as that chlorophyll helps to capture the energy of sunl ...
and
evolution Evolution is change in the heritable characteristics of biological populations over successive generations. These characteristics are the expressions of genes, which are passed on from parent to offspring during reproduction. Variation ...
of proteins and
protein domain In molecular biology, a protein domain is a region of a protein's polypeptide chain that is self-stabilizing and that folds independently from the rest. Each domain forms a compact folded three-dimensional structure. Many proteins consist of s ...
s. Grouping and interrelating protein sequences can therefore provide information about both human biological processes, and the evolutionary development of biological processes on earth; such sequence clusters allow for the effective coverage of sequence space. Sequence clusters can reduce a large database of sequences to a smaller set of ''sequence representatives'', each of which should represent its cluster at the sequence level. Sequence representatives allow the effective coverage of the original database with fewer sequences. The database of sequence representatives is called ''non-redundant'', as similar (or redundant) sequences have been removed at a certain similarity threshold.


See also

Sequence analysis in social sciences In social sciences, sequence analysis (SA) is concerned with the analysis of sets of categorical sequences that typically describe longitudinal data. Analyzed sequences are encoded representations of, for example, individual life trajectories such ...
Sequence analysis In bioinformatics, sequence analysis is the process of subjecting a DNA, RNA or peptide sequence to any of a wide range of analytical methods to understand its features, function, structure, or evolution. Methodologies used include sequence alignm ...
in bioinformatics


References

Protein structure Bioinformatics {{Bioinformatics-stub Social sciences