Neighbor Joining
In bioinformatics, neighbor joining is a bottom-up (agglomerative) clustering method for the creation of phylogenetic trees, created by Naruya Saitou and Masatoshi Nei in 1987. Usually based on DNA or protein sequence data, the algorithm requires knowledge of the distance between each pair of taxa (e.g., species or sequences) to create the phylogenetic tree.

The algorithm. Neighbor joining takes a distance matrix, which specifies the distance between each pair of taxa, as input. The algorithm starts with a completely unresolved tree, whose topology corresponds to that of a star network, and iterates over the following steps until the tree is completely resolved and all branch lengths are known:
1. Based on the current distance matrix, calculate a matrix Q (defined below).
2. Find the pair of distinct taxa i and j (i.e., with i ≠ j) for which Q(i,j) is smallest. Make a new node that joins the taxa i and j, and connect the new node to the central node. For example, in part (B ...
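As a concrete illustration of the Q-matrix step: with n taxa, Q(i,j) = (n − 2)·d(i,j) − Σ_k d(i,k) − Σ_k d(j,k). Here is a minimal Python sketch; the distance values are illustrative and the helper name q_matrix is my own.

    def q_matrix(d):
        """Compute Q from a symmetric distance matrix d (zero diagonal)."""
        n = len(d)
        row_sums = [sum(row) for row in d]
        q = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i != j:
                    q[i][j] = (n - 2) * d[i][j] - row_sums[i] - row_sums[j]
        return q

    # Illustrative 5-taxon distances:
    d = [[0, 5, 9, 9, 8],
         [5, 0, 10, 10, 9],
         [9, 10, 0, 8, 7],
         [9, 10, 8, 0, 3],
         [8, 9, 7, 3, 0]]
    q = q_matrix(d)
    # The smallest off-diagonal entry of Q picks the first pair to join.
    i, j = min(((a, b) for a in range(5) for b in range(a + 1, 5)),
               key=lambda p: q[p[0]][p[1]])
    print(i, j, q[i][j])   # 0 1 -50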


Bioinformatics
Bioinformatics is an interdisciplinary field of science that develops methods and software tools for understanding biological data, especially when the data sets are large and complex. Bioinformatics uses biology, chemistry, physics, computer science, data science, computer programming, information engineering, mathematics, and statistics to analyze and interpret biological data. The process of analyzing and interpreting data is sometimes referred to as computational biology; however, this distinction between the two terms is often disputed. To some, the term "computational biology" refers to building and using models of biological systems. Computational, statistical, and computer programming techniques have been used for in silico (computer simulation) analyses of biological queries. They include reused specific analysis "pipelines", particularly in the field of genomics, such as the identification of genes and single nucleotide polymorphisms ...




Computational Phylogenetics
Computational phylogenetics, phylogeny inference, or phylogenetic inference focuses on the computational and optimization algorithms, heuristics, and approaches involved in phylogenetic analyses. The goal is to find a phylogenetic tree representing optimal evolutionary ancestry between a set of genes, species, or taxa. Maximum likelihood, maximum parsimony, Bayesian inference, and minimum evolution are typical optimality criteria used to assess how well a phylogenetic tree topology describes the sequence data. Nearest Neighbour Interchange (NNI), Subtree Prune and Regraft (SPR), and Tree Bisection and Reconnection (TBR), known as tree rearrangements, are deterministic algorithms used to search for the optimal or best-scoring phylogenetic tree. The space and landscape searched for the optimal phylogenetic tree is known as the phylogeny search space. Maximum Likelihood (al ...
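To make the tree-rearrangement idea concrete, here is a minimal Python sketch of a single NNI move on an unrooted tree stored as an adjacency map; the node labels and example tree are hypothetical.

    def nni(adj, u, v, a, b):
        """Swap subtree a (a neighbour of u) with subtree b (a neighbour
        of v) across the internal edge (u, v)."""
        assert v in adj[u] and a in adj[u] and b in adj[v]
        adj[u].remove(a); adj[u].append(b)
        adj[v].remove(b); adj[v].append(a)
        adj[a].remove(u); adj[a].append(v)
        adj[b].remove(v); adj[b].append(u)

    # Unrooted quartet ((A,B),(C,D)) with internal nodes u and v:
    adj = {"u": ["A", "B", "v"], "v": ["C", "D", "u"],
           "A": ["u"], "B": ["u"], "C": ["v"], "D": ["v"]}
    nni(adj, "u", "v", "B", "C")   # produces the topology ((A,C),(B,D))

Each internal edge admits two such swaps, so a search visits O(n) NNI neighbours per tree; SPR and TBR explore larger neighbourhoods.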


Bioinformatics Algorithms


Minimum Evolution
Minimum evolution is a distance method employed in phylogenetics modeling. It shares with maximum parsimony the aspect of searching for the phylogeny that has the shortest total sum of branch lengths. The theoretical foundations of the minimum evolution (ME) criterion lie in the seminal works of Kidd and Sgaramella-Zonta (1971) and Rzhetsky and Nei (1993). In these frameworks, the molecular sequences from taxa are replaced by a set of measures of their dissimilarity (the so-called "evolutionary distances"), and a fundamental result states that if such distances are unbiased estimates of the true evolutionary distances from taxa (i.e., the distances one would obtain if all the molecular data from the taxa were available), then the true phylogeny of the taxa has an expected length shorter than any other possible phylogeny T compatible with those distances.

Relationships with and comparison with other methods. Maximum parsimony. It is worth noting her ...
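Stated compactly (a LaTeX sketch; the symbols L(T) and b_e are my own notation, not from the excerpt), the ME criterion selects, among candidate topologies, the tree whose fitted branch lengths have the smallest sum:

    \hat{T}_{\mathrm{ME}} = \arg\min_{T} L(T), \qquad L(T) = \sum_{e \in E(T)} \hat{b}_e,

where E(T) is the edge set of topology T and \hat{b}_e is the branch length fitted to edge e (e.g., by least squares) from the evolutionary distances.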


Nearest Neighbor Search
Nearest neighbor search (NNS), as a form of proximity search, is the optimization problem of finding the point in a given set that is closest (or most similar) to a given point. Closeness is typically expressed in terms of a dissimilarity function: the less similar the objects, the larger the function values. Formally, the nearest-neighbor (NN) search problem is defined as follows: given a set S of points in a space M and a query point q ∈ M, find the closest point in S to q. Donald Knuth, in vol. 3 of The Art of Computer Programming (1973), called it the post-office problem, referring to an application of assigning to a residence the nearest post office. A direct generalization of this problem is k-NN search, where we need to find the k closest points. Most commonly, M is a metric space and dissimilarity is expressed as a distance metric, which is symmetric and satisfies the triangle inequality. Even more commonly, M is take ...
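The brute-force baseline is a linear scan over S, shown here as a minimal Python sketch for Euclidean distance; the points and query are made up.

    import math

    def nearest(points, q):
        """Exact NN by linear scan: O(|S|) distance evaluations."""
        return min(points, key=lambda p: math.dist(p, q))

    def k_nearest(points, q, k):
        """k-NN by sorting; fine for small S."""
        return sorted(points, key=lambda p: math.dist(p, q))[:k]

    S = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
    print(nearest(S, (0.9, 1.2)))        # (1.0, 1.0)
    print(k_nearest(S, (0.9, 1.2), 2))   # [(1.0, 1.0), (0.0, 0.0)]

Index structures such as k-d trees trade preprocessing time for sublinear queries in low dimensions.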


Molecular Clock Hypothesis
The molecular clock is a figurative term for a technique that uses the mutation rate of biomolecules to deduce the time in prehistory when two or more life forms diverged. The biomolecular data used for such calculations are usually nucleotide sequences for DNA or RNA, or amino acid sequences for proteins.

Early discovery and genetic equidistance. The notion of the existence of a so-called "molecular clock" was first attributed to Émile Zuckerkandl and Linus Pauling, who in 1962 noticed that the number of amino acid differences in hemoglobin between different lineages changes roughly linearly with time, as estimated from fossil evidence. They generalized this observation to assert that the rate of evolutionary change of any specified protein was approximately constant over time and over different lineages (known as the molecular clock hypothesis). The genetic equidistance phenomenon was first noted in 1963 by Emanuel Margoliash, who wrote: "It appears that the number of residue ...
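The basic arithmetic behind the technique: under a strict clock with substitution rate r per site per year, two lineages that diverged t years ago each accumulate about r·t substitutions per site, so their pairwise distance is d ≈ 2rt and the divergence time is estimated as

    t = d / (2r).

For example (made-up numbers), d = 0.02 substitutions per site and r = 10^-9 substitutions per site per year give t = 0.02 / (2 × 10^-9) = 10^7 years.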


WPGMA
WPGMA (Weighted Pair Group Method with Arithmetic Mean) is a simple agglomerative (bottom-up) hierarchical clustering method, generally attributed to Sokal and Michener. The WPGMA method is similar to its unweighted variant, the UPGMA method.

Algorithm. The WPGMA algorithm constructs a rooted tree (dendrogram) that reflects the structure present in a pairwise distance matrix (or a similarity matrix). At each step, the nearest two clusters, say i and j, are combined into a higher-level cluster i ∪ j. Then, its distance to another cluster k is simply the arithmetic mean of the average distances between members of k and i and of k and j:

    d_{(i ∪ j),k} = (d_{i,k} + d_{j,k}) / 2

The WPGMA algorithm produces rooted dendrograms and requires a constant-rate assumption: it produces an ultrametric tree in which the distances from the root to every branch tip are equal. This ultrametricity assumption is called the molecular clock when the tips involve DNA, RNA, and protein data.

Working example. This wor ...
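A minimal Python sketch of this update, using a dict-of-dicts distance matrix keyed by cluster labels (the labels and distances are made up):

    def wpgma_merge(d, i, j):
        """Merge clusters i and j; the new cluster's distance to every
        other cluster k is the plain average of d[i][k] and d[j][k]."""
        new = i + j
        others = [k for k in d if k not in (i, j)]
        d[new] = {k: (d[i][k] + d[j][k]) / 2 for k in others}
        for k in others:
            d[k][new] = d[new][k]
            del d[k][i], d[k][j]
        del d[i], d[j]
        return new

    d = {"a": {"b": 17, "c": 21, "d": 31},
         "b": {"a": 17, "c": 30, "d": 34},
         "c": {"a": 21, "b": 30, "d": 28},
         "d": {"a": 31, "b": 34, "c": 28}}
    wpgma_merge(d, "a", "b")   # nearest pair first
    print(d["ab"])             # {'c': 25.5, 'd': 32.5}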


UPGMA
UPGMA (unweighted pair group method with arithmetic mean) is a simple agglomerative (bottom-up) hierarchical clustering method. It also has a weighted variant, WPGMA, and they are generally attributed to Sokal and Michener. Note that the unweighted term indicates that all distances contribute equally to each average that is computed and does not refer to the math by which it is achieved. Thus the simple averaging in WPGMA produces a weighted result and the proportional averaging in UPGMA produces an unweighted result (see the working example).

Algorithm. The UPGMA algorithm constructs a rooted tree (dendrogram) that reflects the structure present in a pairwise similarity matrix (or a dissimilarity matrix). At each step, the nearest two clusters are combined into a higher-level cluster. The distance between any two clusters A and B, of sizes (i.e., cardinalities) |A| and |B|, is taken to be the average of all distances d(x,y) between pairs of objects x in A ...
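Operationally, when clusters A and B merge, the distance to any other cluster C can be updated incrementally by weighting with cluster sizes, so every leaf pair still counts exactly once: d_{A∪B,C} = (|A|·d_{A,C} + |B|·d_{B,C}) / (|A| + |B|). A minimal Python sketch (the labels, sizes, and distances are made up):

    def upgma_merge(d, size, a, b):
        """Merge clusters a and b; size-weighted averaging keeps the
        result an unweighted mean over all leaf pairs."""
        new = a + b
        size[new] = size[a] + size[b]
        others = [k for k in d if k not in (a, b)]
        d[new] = {k: (size[a] * d[a][k] + size[b] * d[b][k]) / size[new]
                  for k in others}
        for k in others:
            d[k][new] = d[new][k]
            del d[k][a], d[k][b]
        del d[a], d[b], size[a], size[b]
        return new

    size = {"a": 1, "b": 1, "c": 1, "d": 1}
    d = {"a": {"b": 17, "c": 21, "d": 31},
         "b": {"a": 17, "c": 30, "d": 34},
         "c": {"a": 21, "b": 30, "d": 28},
         "d": {"a": 31, "b": 34, "c": 28}}
    upgma_merge(d, size, "a", "b")
    print(d["ab"], size["ab"])   # {'c': 25.5, 'd': 32.5} 2

With singleton clusters the first merge coincides with WPGMA; the two methods diverge once clusters of unequal size are averaged.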


Statistical Consistency
Statistics (from German Statistik, "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects, such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments. When census data (comprising every member of the target population) cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can reasonably extend from the sample to the population as a whole. An experimental study involves taking measurements of the sy ...


Lior Pachter
Lior Samuel Pachter is a computational biologist. He works at the California Institute of Technology, where he is the Bren Professor of Computational Biology. He has widely varied research interests, including genomics, combinatorics, computational geometry, machine learning, scientific computing, and statistics.

Early life and education. Pachter was born in Israel and grew up in South Africa. He earned a bachelor's degree in mathematics from the California Institute of Technology in 1994. He completed his doctorate in mathematics at the Massachusetts Institute of Technology in 1999, supervised by Bonnie Berger, with Eric Lander and Daniel Kleitman as co-advisors.

Career and research. Pachter was on the University of California, Berkeley faculty from 1999 to 2018 and was given the Sackler Chair in 2012. In addition to his technical contributions, Pachter is known for using new media to promote open science and for a thought experiment he posted on his blog according to ...




Computation
A computation is any type of arithmetic or non-arithmetic calculation that is well-defined. Common examples of computation are mathematical equation solving and the execution of computer algorithms. Mechanical or electronic devices (or, historically, people) that perform computations are known as computers. Computer science is an academic field that involves the study of computation.

Introduction. The notion that mathematical statements should be 'well-defined' had been argued by mathematicians since at least the 1600s, but agreement on a suitable definition proved elusive. A candidate definition was proposed independently by several mathematicians in the 1930s. The best-known variant was formalised by the mathematician Alan Turing, who defined a well-defined statement or calculation as any statement that could be expressed in terms of the initialisation parameters of a Turing machine. Other (mathematically equivalent) definitions include Alonzo Church's lambda-defin ...


Maximum Likelihood
In statistics, maximum likelihood estimation (MLE) is a method of estimating the parameters of an assumed probability distribution, given some observed data. This is achieved by maximizing a likelihood function so that, under the assumed statistical model, the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference. If the likelihood function is differentiable, the derivative test for finding maxima can be applied. In some cases, the first-order conditions of the likelihood function can be solved analytically; for instance, the ordinary least squares estimator for a linear regression model maximizes the likelihood when the random errors are assumed to have normal distributions with the same variance. From the perspective of Bayesian inference, ML ...
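As a minimal worked instance in Python (the data values are made up): for a normal model, the likelihood is maximized in closed form by the sample mean together with the variance computed with an n (not n − 1) denominator.

    def normal_mle(xs):
        """Closed-form MLE of mean and variance for i.i.d. normal data."""
        n = len(xs)
        mu = sum(xs) / n                            # maximizer in the mean
        var = sum((x - mu) ** 2 for x in xs) / n    # note: n, not n - 1
        return mu, var

    print(normal_mle([2.1, 1.9, 2.4, 2.0]))   # approximately (2.1, 0.035)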