Neighbor-joining

Neighbor-joining
In bioinformatics, neighbor joining is a bottom-up (agglomerative) clustering method for the creation of phylogenetic trees, created by Naruya Saitou and Masatoshi Nei in 1987. Usually based on DNA or protein sequence data, the algorithm requires knowledge of the distance between each pair of taxa (e.g., species or sequences) to create the phylogenetic tree.

The algorithm
Neighbor joining takes a distance matrix, which specifies the distance between each pair of taxa, as input. The algorithm starts with a completely unresolved tree, whose topology corresponds to that of a star network, and iterates over the following steps until the tree is completely resolved and all branch lengths are known:
1. Based on the current distance matrix, calculate a matrix $Q$ (defined below).
2. Find the pair of distinct taxa $i$ and $j$ (i.e., with $i \neq j$) for which $Q(i,j)$ is smallest. Make a new node that joins the taxa $i$ and $j$, and connect the new node to the central node. For example, in part (B ...
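
The two steps above can be sketched in Python. This is a minimal illustration and not Saitou and Nei's own code. The excerpt does not include the definition of $Q$, so the sketch assumes the commonly stated form $Q(i,j) = (n-2)\,d(i,j) - \sum_k d(i,k) - \sum_k d(j,k)$ for $n$ taxa. It also uses the standard neighbor-joining formulas for the two new branch lengths and the updated distances. The function names `q_matrix`, `closest_pair`, and `join_pair` are hypothetical.

```python
def q_matrix(d):
    """Neighbor-joining Q matrix for a symmetric distance matrix d (list of lists).

    Assumes Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k).
    Diagonal entries are left at 0 and never used.
    """
    n = len(d)
    row_sums = [sum(row) for row in d]
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
    return q


def closest_pair(d):
    """Return the pair (i, j), i < j, with the smallest Q(i, j)."""
    q = q_matrix(d)
    n = len(d)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda p: q[p[0]][p[1]])


def join_pair(d, i, j):
    """Join taxa i and j into a new node u (needs at least 3 taxa).

    Returns (branch length i-u, branch length j-u, indices of the remaining
    taxa, new distance matrix with u appended as the last row and column).
    """
    n = len(d)
    if n < 3:
        raise ValueError("need at least 3 taxa to join a pair")
    row_sums = [sum(row) for row in d]
    # Branch length from i to u; the j-u branch takes the rest of d(i, j).
    li = d[i][j] / 2 + (row_sums[i] - row_sums[j]) / (2 * (n - 2))
    lj = d[i][j] - li
    rest = [k for k in range(n) if k not in (i, j)]
    # Distance from the new node u to every remaining taxon k.
    du = [(d[i][k] + d[j][k] - d[i][j]) / 2 for k in rest]
    new_d = [[d[a][b] for b in rest] + [du[r]] for r, a in enumerate(rest)]
    new_d.append(du + [0.0])
    return li, lj, rest, new_d


# Small example matrix for five taxa a..e.
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
i, j = closest_pair(D)                 # (0, 1): Q(a, b) = -50 is the minimum
li, lj, rest, D2 = join_pair(D, i, j)  # li = 2.0, lj = 3.0
```

A full run would repeat `closest_pair` and `join_pair` on the shrinking matrix until only three nodes remain.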

Computational Phylogenetics
Computational phylogenetics is the application of computational algorithms, methods, and programs to phylogenetic analyses.

Phylogenetics
In biology, phylogenetics (from Greek φυλή/φῦλον, "tribe, clan, race", and γενετικός, "origin, source, birth") is the study of the evolutionary history and relationships among or within groups of organisms. These relationships are determined by phylogenetic inference methods that focus on observed heritable traits, such as DNA sequences, protein amino acid sequences, or morphology. The result of such an analysis is a phylogenetic tree, a diagram containing a hypothesis of relationships that reflects the evolutionary history of a group of organisms. The tips of a phylogenetic tree can be living taxa or fossils, and represent the "end" or the present time in an evolutionary lineage. A phylogenetic diagram can be rooted or unrooted. A rooted tree diagram indicates the hypothetical common ancestor of the tree. An un ...

Distance Matrix
In mathematics, computer science and especially graph theory, a distance matrix is a square matrix (two-dimensional array) containing the distances, taken pairwise, between the elements of a set. Depending upon the application involved, the distance being used to define this matrix may or may not be a metric. If there are $N$ elements, this matrix will have size $N \times N$. In graph-theoretic applications the elements are more often referred to as points, nodes or vertices.

Non-metric distance matrix
In general, a distance matrix is a weighted adjacency matrix of some graph. In a network (a directed graph with weights assigned to the arcs), the distance between two nodes can be defined as the minimum, over all paths joining the two nodes, of the sum of the weights along the path. This distance function, while well defined, is not a metric; for one thing, the distance from $u$ to $v$ need not equal the distance from $v$ to $u$. There need be no restrictions on the weights other than the need to be able to combine and compare them, so negative weights are used in some appli ...
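
To make the non-metric case concrete, here is a minimal Python sketch that turns a small weighted directed graph into its shortest-path distance matrix. The text does not prescribe a method, so the sketch uses the Floyd-Warshall all-pairs shortest-path algorithm. The function name `shortest_path_matrix` and the example graph are hypothetical. Because the arcs are one-way, the resulting matrix is not symmetric.

```python
import math


def shortest_path_matrix(n, edges):
    """All-pairs shortest-path distances for a directed graph (Floyd-Warshall).

    n: number of nodes labelled 0..n-1; edges: iterable of (u, v, weight) arcs.
    Negative weights are fine as long as there is no negative cycle.
    Unreachable pairs stay at math.inf.
    """
    dist = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Relax the path i -> k -> j.
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist


# Hypothetical directed cycle 0 -> 1 -> 2 -> 0.
M = shortest_path_matrix(3, [(0, 1, 1), (1, 2, 2), (2, 0, 4)])
print(M)  # [[0, 1, 3], [6, 0, 2], [4, 5, 0]]: M[0][2] != M[2][0]
```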

Phylogenetic Trees
A phylogenetic tree (also phylogeny or evolutionary tree; Felsenstein J. (2004). Inferring Phylogenies. Sinauer Associates: Sunderland, MA.) is a branching diagram or a tree showing the evolutionary relationships among various biological species or other entities based upon similarities and differences in their physical or genetic characteristics. All life on Earth is part of a single phylogenetic tree, indicating common ancestry. In a rooted phylogenetic tree, each node with descendants represents the inferred most recent common ancestor of those descendants, and the edge lengths in some trees may be interpreted as time estimates. Each node is called a taxonomic unit. Internal nodes are generally called hypothetical taxonomic units, as they cannot be directly observed. Trees are useful in fields of biology such as bioinformatics, systematics, and phylogenetics. Unrooted trees illustrate only the relatedness of the leaf nodes and do not require the ancestral root to be ...

Masatoshi Nei
Masatoshi Nei (born January 2, 1931) is a Japanese-born American evolutionary biologist currently affiliated with the Department of Biology at Temple University as a Carnell Professor. He was, until recently, Evan Pugh Professor of Biology at Pennsylvania State University and Director of the Institute of Molecular Evolutionary Genetics; he was there from 1990 to 2015. Nei was born in 1931 in Miyazaki Prefecture, on Kyūshū Island, Japan. He was associate professor and professor of biology at Brown University from 1969 to 1972 and professor of population genetics at the Center for Demographic and Population Genetics, University of Texas Health Science Center at Houston (UTHealth), from 1972 to 1990. Acting alone or working with his students, he has continuously developed statistical theories of molecular evolution taking into account discoveries in molecular biology. He has also developed concepts in evolutionary theory and advanced the theory of mutation-driven evolution. Together with W ...

Bioinformatics Algorithms
Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biological data, in particular when the data sets are large and complex. As an interdisciplinary field of science, bioinformatics combines biology, chemistry, physics, computer science, information engineering, mathematics and statistics to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of biological queries using computational and statistical techniques. Bioinformatics includes biological studies that use computer programming as part of their methodology, as well as specific analysis "pipelines" that are repeatedly used, particularly in the field of genomics. Common uses of bioinformatics include the identification of candidate genes and single nucleotide polymorphisms (SNPs). Often, such identification is made with the aim of better understanding the genetic basis of disease, unique adaptations, desirable properties ...

Nearest Neighbor Search
Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values. Formally, the nearest-neighbor (NN) search problem is defined as follows: given a set $S$ of points in a space $M$ and a query point $q \in M$, find the closest point in $S$ to $q$. Donald Knuth in vol. 3 of The Art of Computer Programming (1973) called it the post-office problem, referring to an application of assigning to a residence the nearest post office. A direct generalization of this problem is a $k$-NN search, where we need to find the $k$ closest points. Most commonly $M$ is a metric space and dissimilarity is expressed as a distance metric, which is symmetric and satisfies the triangle inequality. Even more commonly, $M$ is taken ...
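
The formal definition translates directly into a brute-force linear scan. The minimal Python sketch below assumes Euclidean distance, computed with `math.dist` (Python 3.8+). The helper names and sample points are hypothetical. Practical NNS systems often use spatial index structures instead of scanning every point.

```python
import heapq
import math


def nearest_neighbor(points, q):
    """Return the point in `points` closest to query q (linear scan, O(n))."""
    return min(points, key=lambda p: math.dist(p, q))


def k_nearest(points, q, k):
    """Return the k points closest to q, nearest first (k-NN by linear scan)."""
    return heapq.nsmallest(k, points, key=lambda p: math.dist(p, q))


S = [(0, 0), (3, 4), (1, 1), (5, 5)]
print(nearest_neighbor(S, (2, 2)))  # (1, 1)
print(k_nearest(S, (2, 2), 2))      # [(1, 1), (3, 4)]
```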

Molecular Clock Hypothesis
The molecular clock is a figurative term for a technique that uses the mutation rate of biomolecules to deduce the time in prehistory when two or more life forms diverged. The biomolecular data used for such calculations are usually nucleotide sequences for DNA or RNA, or amino acid sequences for proteins. The benchmarks for determining the mutation rate are often fossil or archaeological dates. The molecular clock was first tested in 1962 on the hemoglobin protein variants of various animals, and is commonly used in molecular evolution to estimate times of speciation or radiation. It is sometimes called a gene clock or an evolutionary clock.

Early discovery and genetic equidistance
The notion of the existence of a so-called "molecular clock" was first attributed to Émile Zuckerkandl and Linus Pauling who, in 1962, noticed that the number of amino acid differences in hemoglobin between different lineages changes roughly linearly with time, as estimated from fossil evidence. T ...
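
As a worked illustration of how a rate converts into a time, here is a minimal sketch under a strict-clock assumption that the text does not spell out: a constant rate $r$ (substitutions per site per year) on both lineages. If two sequences differ by $K$ substitutions per site, each lineage has accumulated about $K/2$ since the split, so $T = K / (2r)$. The function name and the numbers below are hypothetical, not calibrated values.

```python
def divergence_time(k_subs_per_site, rate_per_site_per_year):
    """Years since two lineages split, assuming a strict, constant molecular clock.

    Differences accumulate along both lineages, hence the factor of 2.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return k_subs_per_site / (2 * rate_per_site_per_year)


# Hypothetical inputs: 2% divergence at 1e-9 substitutions per site per year.
t = divergence_time(0.02, 1e-9)  # roughly 1e7 years
```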

WPGMA
WPGMA (Weighted Pair Group Method with Arithmetic Mean) is a simple agglomerative (bottom-up) hierarchical clustering method, generally attributed to Sokal and Michener. The WPGMA method is similar to its unweighted variant, the UPGMA method.

Algorithm
The WPGMA algorithm constructs a rooted tree (dendrogram) that reflects the structure present in a pairwise distance matrix (or a similarity matrix). At each step, the nearest two clusters, say $i$ and $j$, are combined into a higher-level cluster $i \cup j$. Then, its distance to another cluster $k$ is simply the arithmetic mean of the average distances between members of $k$ and $i$ and of $k$ and $j$:
$$d_{(i \cup j),k} = \frac{d_{i,k} + d_{j,k}}{2}$$
The WPGMA algorithm produces rooted dendrograms and requires a constant-rate assumption: it produces an ultrametric tree in which the distances from the root to every branch tip are equal. This ultrametricity assumption is called the molecular clock when the tips involve DNA, RNA and protein data.

Working example
This work ...
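
Here is a minimal Python sketch of the WPGMA loop. It assumes the distances are supplied as a dictionary keyed by unordered pairs (`frozenset`). The function name and data layout are hypothetical. Each merge applies the arithmetic-mean rule described above, and node heights are taken as half the merge distance, which places every leaf at the same distance from the root.

```python
from itertools import combinations


def wpgma(labels, dist):
    """WPGMA clustering.

    labels: list of leaf names; dist: {frozenset({x, y}): distance} for every pair.
    Returns (merged_cluster, height) in merge order; a merged cluster is a
    nested tuple and its height is half the merge distance.
    """
    d = dict(dist)
    clusters = list(labels)
    merges = []
    while len(clusters) > 1:
        # Nearest pair of current clusters (ties go to the first pair found).
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        new = (a, b)
        merges.append((new, d[frozenset((a, b))] / 2))
        clusters = [c for c in clusters if c != a and c != b]
        for k in clusters:
            # WPGMA update: plain mean of the two old distances, ignoring cluster sizes.
            d[frozenset((new, k))] = (d[frozenset((a, k))] + d[frozenset((b, k))]) / 2
        clusters.append(new)
    return merges


pairs = {("a", "b"): 2, ("a", "c"): 6, ("b", "c"): 6,
         ("a", "d"): 10, ("b", "d"): 10, ("c", "d"): 16}
tree = wpgma(["a", "b", "c", "d"], {frozenset(p): v for p, v in pairs.items()})
print(tree)  # last merge: ('d', ('c', ('a', 'b'))) at height 6.5
```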

UPGMA
UPGMA (unweighted pair group method with arithmetic mean) is a simple agglomerative (bottom-up) hierarchical clustering method. The method is generally attributed to Sokal and Michener. The UPGMA method is similar to its weighted variant, the WPGMA method. Note that the term "unweighted" indicates that all distances contribute equally to each average that is computed and does not refer to the math by which it is achieved. Thus the simple averaging in WPGMA produces a weighted result and the proportional averaging in UPGMA produces an unweighted result (see the working example).

Algorithm
The UPGMA algorithm constructs a rooted tree (dendrogram) that reflects the structure present in a pairwise similarity matrix (or a dissimilarity matrix). At each step, the nearest two clusters are combined into a higher-level cluster. The distance between any two clusters $\mathcal{A}$ and $\mathcal{B}$, of sizes (i.e., cardinalities) $|\mathcal{A}|$ and $|\mathcal{B}|$, is taken to be the average of all distances $d(x,y)$ ...
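
The UPGMA cluster distance, the average of $d(x,y)$ over all cross-cluster pairs, can be sketched in a few lines of Python. The helper names `upgma_distance` and `leaf_dist` and the example distances are hypothetical. The result can differ from the WPGMA rule whenever the two clusters being compared were built from subclusters of different sizes.

```python
from itertools import product


def upgma_distance(A, B, d):
    """Average of d(x, y) over all x in cluster A and y in cluster B.

    A, B: iterables of leaf labels; d: function giving the leaf-to-leaf distance.
    """
    A, B = list(A), list(B)
    total = sum(d(x, y) for x, y in product(A, B))
    return total / (len(A) * len(B))


# Hypothetical leaf distances, stored symmetrically by unordered pair.
LEAF = {frozenset(p): v for p, v in [(("a", "d"), 10), (("b", "d"), 10), (("c", "d"), 16)]}


def leaf_dist(x, y):
    return LEAF[frozenset((x, y))]


print(upgma_distance(["a", "b", "c"], ["d"], leaf_dist))  # 12.0
```

For comparison, the WPGMA rule applied to the same numbers, with $\{a, b\}$ merged before $c$ joins, gives $((10 + 10)/2 + 16)/2 = 13$. It counts the single leaf $c$ as heavily as the pair $\{a, b\}$, whereas UPGMA counts every leaf once.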

Statistical Consistency
In statistics, a consistent estimator or asymptotically consistent estimator is an estimator (a rule for computing estimates of a parameter $\theta_0$) having the property that as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to $\theta_0$. This means that the distributions of the estimates become more and more concentrated near the true value of the parameter being estimated, so that the probability of the estimator being arbitrarily close to $\theta_0$ converges to one. In practice one constructs an estimator as a function of an available sample of size $n$, and then imagines being able to keep collecting data and expanding the sample ad infinitum. In this way one would obtain a sequence of estimates indexed by $n$, and consistency is a property of what occurs as the sample size “grows to infinity”. If the sequence of estimates can be mathematically shown to converge in probability to the true value $\theta_0$ ...
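
A short simulation illustrates the idea with a familiar consistent estimator: the sample mean of Bernoulli($\theta_0$) draws. The text does not name an estimator, so this choice is for illustration only. The function name, the seed, and $\theta_0 = 0.3$ are arbitrary assumptions, and the fixed seed makes the run repeatable. Estimates typically tighten around $\theta_0$ as $n$ grows, though any single run can still fluctuate at small $n$.

```python
import random


def sample_mean_estimates(theta0, sizes, seed=0):
    """Sample-mean estimates of a Bernoulli parameter, one fresh sample per size n."""
    rng = random.Random(seed)
    estimates = {}
    for n in sizes:
        # Each draw is 1 with probability theta0; the estimate is the fraction of 1s.
        hits = sum(1 for _ in range(n) if rng.random() < theta0)
        estimates[n] = hits / n
    return estimates


for n, est in sample_mean_estimates(0.3, [10, 1_000, 100_000]).items():
    print(n, est, abs(est - 0.3))
```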

Lior Pachter
Lior Samuel Pachter is a computational biologist. He works at the California Institute of Technology, where he is the Bren Professor of Computational Biology. He has widely varied research interests including genomics, combinatorics, computational geometry, machine learning, scientific computing, and statistics.

Early life and education
Pachter was born in Israel and grew up in South Africa. He earned a bachelor's degree in mathematics from the California Institute of Technology in 1994. He completed his doctorate in mathematics at the Massachusetts Institute of Technology in 1999, supervised by Bonnie Berger, with Eric Lander and Daniel Kleitman as co-advisors.

Career and research
Pachter was on the University of California, Berkeley faculty from 1999 to 2018 and was given the Sackler Chair in 2012. As well as for his technical contributions, Pachter is known for using new media to promote open science and for a thought experiment he posted on his blog according to which ...