Position Weight Matrix
A position weight matrix (PWM), also known as a position-specific weight matrix (PSWM) or position-specific scoring matrix (PSSM), is a commonly used representation of motifs (patterns) in biological sequences. PWMs are often derived from a set of aligned sequences that are thought to be functionally related, and they have become an important part of many software tools for computational motif discovery. A PWM has one row for each symbol of the alphabet (4 rows for nucleotides in DNA sequences or 20 rows for amino acids in protein sequences) and one column for each position in the pattern. The first step in constructing a PWM is to create a basic position frequency matrix (PFM) by counting the occurrences of each nucleotide at each position. From the PFM, a position probability matrix (PPM) can then be created by dividing the nucleotide count at each position by the number of sequences, thereb ...
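The counting and normalization steps described in this entry can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the four aligned sequences are hypothetical, and the sketch assumes gap-free DNA sequences of equal length.

```python
def position_frequency_matrix(sequences, alphabet="ACGT"):
    """Count how often each symbol occurs at each aligned position."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    pfm = {symbol: [0] * length for symbol in alphabet}
    for seq in sequences:
        for position, symbol in enumerate(seq):
            pfm[symbol][position] += 1
    return pfm


def position_probability_matrix(pfm, n_sequences):
    """Divide every count by the number of sequences."""
    return {
        symbol: [count / n_sequences for count in counts]
        for symbol, counts in pfm.items()
    }


# Hypothetical aligned sequences, used only for illustration.
aligned = ["GAGGTAAAC", "TCCGTAAGT", "CAGGTTGGA", "ACAGTCAGT"]
pfm = position_frequency_matrix(aligned)
ppm = position_probability_matrix(pfm, len(aligned))
```

Each matrix has one row per symbol and one column per position, so every column of `ppm` sums to 1.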
LexA Gram Positive Bacteria Sequence Logo
Repressor LexA or LexA is a transcriptional repressor that represses SOS response genes coding primarily for error-prone DNA polymerases, DNA repair enzymes and cell division inhibitors. LexA forms ''de facto'' a two-component regulatory system with RecA, which senses DNA damage at stalled replication forks, forming monofilaments and acquiring an active conformation capable of binding to LexA and causing LexA to cleave itself, in a process called autoproteolysis. DNA damage can be inflicted by the action of antibiotics, bacteriophages, and UV light. Of potential clinical interest is the induction of the SOS response by antibiotics, such as ciprofloxacin. Bacteria require topoisomerases such as DNA gyrase or topoisomerase IV for DNA replication. Antibiotics such as ciprofloxacin are able to prevent the action of these molecules by attaching themselves to the gyrase–DNA complex, leading to replication fork stalling and the induction of the SOS response. The expression of error-pron ...
GC-content
In molecular biology and genetics, GC-content (or guanine-cytosine content) is the percentage of nitrogenous bases in a DNA or RNA molecule that are either guanine (G) or cytosine (C). This measure indicates the proportion of G and C bases out of an implied four total bases, also including adenine and thymine in DNA and adenine and uracil in RNA. GC-content may be given for a certain fragment of DNA or RNA or for an entire genome. When it refers to a fragment, it may denote the GC-content of an individual gene or section of a gene (domain), a group of genes or gene clusters, a non-coding region, or a synthetic oligonucleotide such as a primer. Qualitatively, guanine (G) and cytosine (C) undergo a specific hydrogen bonding with each other, whereas adenine (A) bonds specifically with thymine (T) in DNA and with uracil (U) in RNA. Quantitatively, each GC base pair is held together by three hydrogen bonds, while AT and AU base pairs are held together by two hydrogen bonds. ...
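As a small worked example of the definition above, GC-content can be computed by counting G and C bases and dividing by the sequence length. The function name `gc_content` is hypothetical, and the sketch assumes the input contains only the unambiguous bases A, C, G, and T (or U for RNA).

```python
def gc_content(sequence):
    """Return the percentage of bases that are G or C."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


# Four of the six bases in this hypothetical fragment are G or C.
fragment_gc = gc_content("ATGCGC")
```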
UniPROBE
The Universal PBM Resource for Oligonucleotide-Binding Evaluation (UniPROBE) is a database of DNA-binding proteins determined by protein-binding microarrays. See also: Protein microarray; DNA-binding domain. A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence (a recognition sequence) or have a genera ...
ScerTF
ScerTF is a comprehensive database of position weight matrices for the transcription factors of Saccharomyces. See also: Transcription factor; Gary Stormo. Gary Stormo (born 1950) is an American geneticist and currently Joseph Erlanger Professor in the Department of Genetics and the Center for Genome Sciences and Systems Biology at Washington University School of Medicine in St Louis. He is consider ... External link: http://stormo.wustl.edu/ScerTF
Kullback–Leibler Divergence
In mathematical statistics, the Kullback–Leibler divergence (also called relative entropy and I-divergence), denoted D_\text{KL}(P \parallel Q), is a type of statistical distance: a measure of how one probability distribution ''P'' is different from a second, reference probability distribution ''Q''. A simple interpretation of the KL divergence of ''P'' from ''Q'' is the expected excess surprise from using ''Q'' as a model when the actual distribution is ''P''. While it is a distance, it is not a metric, the most familiar type of distance: it is not symmetric in the two distributions (in contrast to variation of information), and does not satisfy the triangle inequality. Instead, in terms of information geometry, it is a type of divergence, a generalization of squared distance, and for certain classes of distributions (notably an exponential family), it satisfies a generalized Pythagorean theorem (which applies to squared distances). In the simple case, a relative entropy of 0 ...
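For discrete distributions, the divergence can be written as the sum over outcomes of P(x) log(P(x) / Q(x)). The sketch below is one way to compute it in bits; the function name `kl_divergence` and the example distributions are hypothetical, and both inputs are assumed to be probability vectors over the same outcomes. It also illustrates the asymmetry mentioned above.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence of P from Q, in bits (log base 2)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # terms with P(x) = 0 contribute nothing
        if q_i == 0:
            return math.inf  # P puts mass where Q puts none
        total += p_i * math.log2(p_i / q_i)
    return total


p = [0.5, 0.5]
q = [0.9, 0.1]
forward = kl_divergence(p, q)   # excess surprise when Q models P
backward = kl_divergence(q, p)  # generally a different value
```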
Thermophilic
A thermophile is an organism, a type of extremophile, that thrives at relatively high temperatures. Many thermophiles are archaea, though they can be bacteria or fungi. Thermophilic eubacteria are suggested to have been among the earliest bacteria. Thermophiles are found in various geothermally heated regions of the Earth, such as hot springs like those in Yellowstone National Park and deep sea hydrothermal vents, as well as decaying plant matter, such as peat bogs and compost. Thermophiles can survive at high temperatures, whereas other bacteria or archaea would be damaged and sometimes killed if exposed to the same temperatures. The enzymes in thermophiles function at high temperatures. Some of these enzymes are used in molecular biology, for example the ''Taq'' polymerase used in PCR. "Thermophile" is derived from the Greek θερμότητα (''thermotita''), meaning heat, and φίλια (''philia''), love. Thermophiles can be c ...
Self-information
In information theory, the information content, self-information, surprisal, or Shannon information is a basic quantity derived from the probability of a particular event occurring from a random variable. It can be thought of as an alternative way of expressing probability, much like odds or log-odds, but which has particular mathematical advantages in the setting of information theory. The Shannon information can be interpreted as quantifying the level of "surprise" of a particular outcome. As it is such a basic quantity, it also appears in several other settings, such as the length of a message needed to transmit the event given an optimal source coding of the random variable. The Shannon information is closely related to ''entropy'', which is the expected value of the self-information of a random variable, quantifying how surprising the random variable is "on average". This is the average amount of self-information an observer would expect to gain about a random variable wh ...
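The relationship between self-information and entropy can be made concrete with a short sketch. It uses the standard definition of self-information as the negative logarithm of the probability, here with base 2 so that results are in bits; the function names are hypothetical.

```python
import math


def self_information(probability):
    """Surprisal of an outcome with the given probability, in bits."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


def entropy(distribution):
    """Expected self-information over a discrete distribution, in bits."""
    return sum(p * self_information(p) for p in distribution if p > 0)


coin_flip_bits = self_information(0.5)   # one fair coin flip
uniform_dna_bits = entropy([0.25] * 4)   # four equally likely symbols
```

Rarer outcomes carry more self-information, and entropy averages that quantity over all outcomes weighted by their probabilities.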
Uniform Distribution (discrete)
In probability theory and statistics, the discrete uniform distribution is a symmetric probability distribution wherein a finite number of values are equally likely to be observed; every one of ''n'' values has equal probability 1/''n''. Another way of saying "discrete uniform distribution" would be "a known, finite number of outcomes equally likely to happen". A simple example of the discrete uniform distribution is throwing a fair die. The possible values are 1, 2, 3, 4, 5, 6, and each time the die is thrown the probability of a given score is 1/6. If two dice are thrown and their values added, the resulting distribution is no longer uniform because not all sums have equal probability. Although it is convenient to describe discrete uniform distributions over integers, such as this, one can also consider discrete uniform distributions over any finite set. For instance, a random permutation is a permutation generated uniformly from the permutations of a given length, and a unif ...
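The dice example can be checked by enumeration. The sketch below, with hypothetical variable names, lists the uniform distribution for one fair die and then tallies all 36 equally likely ordered pairs to obtain the distribution of the sum of two dice.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)

# One fair die: every face has probability 1/6.
single_die = {face: Fraction(1, 6) for face in faces}

# Two dice: count how many of the 36 ordered pairs give each sum.
pair_counts = Counter(a + b for a, b in product(faces, repeat=2))
two_dice_sum = {
    total: Fraction(count, 36) for total, count in pair_counts.items()
}
```

The probabilities in `two_dice_sum` differ from one sum to the next, which is why the distribution of the sum is not uniform.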
Information Content
In information theory, the information content, self-information, surprisal, or Shannon information is a basic quantity derived from the probability of a particular event occurring from a random variable. It can be thought of as an alternative way of expressing probability, much like odds or log-odds, but which has particular mathematical advantages in the setting of information theory. The Shannon information can be interpreted as quantifying the level of "surprise" of a particular outcome. As it is such a basic quantity, it also appears in several other settings, such as the length of a message needed to transmit the event given an optimal source coding of the random variable. The Shannon information is closely related to ''entropy'', which is the expected value of the self-information of a random variable, quantifying how surprising the random variable is "on average". This is the average amount of self-information an observer would expect to gain about a random variable when ...
Dirichlet Distribution
In probability and statistics, the Dirichlet distribution (after Peter Gustav Lejeune Dirichlet), often denoted \operatorname{Dir}(\boldsymbol\alpha), is a family of continuous multivariate probability distributions parameterized by a vector \boldsymbol\alpha of positive reals. It is a multivariate generalization of the beta distribution, hence its alternative name of multivariate beta distribution (MBD). Dirichlet distributions are commonly used as prior distributions in Bayesian statistics, and in fact, the Dirichlet distribution is the conjugate prior of the categorical distribution and multinomial distribution. The infinite-dimensional generalization of the Dirichlet distribution is the ''Dirichlet process''. The Dirichlet distribution of order ''K'' ≥ 2 with parameters α_1, ..., α_K > 0 has a probability density function with respect to Lebesgue m ...
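Conjugacy has a simple computational consequence: combining a Dirichlet prior with observed category counts gives another Dirichlet whose parameters are the prior parameters plus the counts. The sketch below computes the resulting posterior mean; the function name, the counts, and the choice of a uniform prior with all parameters equal to 1 are assumptions made for illustration.

```python
def dirichlet_posterior_mean(counts, alpha):
    """Posterior mean of category probabilities under a Dirichlet prior.

    The posterior is Dirichlet(alpha + counts), and its mean for
    category i is (alpha_i + count_i) / sum(alpha + counts).
    """
    if len(counts) != len(alpha):
        raise ValueError("counts and alpha must have the same length")
    posterior = [a + c for a, c in zip(alpha, counts)]
    total = sum(posterior)
    return [value / total for value in posterior]


# Hypothetical counts of A, C, G, T at one position, with a uniform prior.
observed_counts = [3, 0, 1, 0]
prior_alpha = [1, 1, 1, 1]
smoothed = dirichlet_posterior_mean(observed_counts, prior_alpha)
```

In this example the unobserved categories still receive a nonzero probability, because the prior parameters act like extra counts.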
Sequence Motif
In biology, a sequence motif is a nucleotide or amino-acid sequence pattern that is widespread and usually assumed to be related to biological function of the macromolecule. For example, an ''N''-glycosylation site motif can be defined as ''Asn, followed by anything but Pro, followed by either Ser or Thr, followed by anything but Pro residue''. When a sequence motif appears in the exon of a gene, it may encode the "structural motif" of a protein; that is, a stereotypical element of the overall structure of the protein. Nevertheless, motifs need not be associated with a distinctive secondary structure. "Noncoding" sequences are not translated into proteins, and nucleic acids with such motifs need not deviate from the typical shape (e.g. the "B-form" DNA double helix). Outside of gene exons, there exist regulatory sequence motifs and motifs within the "junk", such as satellite DNA. Some of these are believed to affect the shape of nucleic acids (see for example RN ...
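The ''N''-glycosylation definition quoted above maps directly onto a regular expression over one-letter amino-acid codes (N for Asn, P for Pro, S for Ser, T for Thr). The sketch below is illustrative only: the function name and the example protein string are hypothetical, and the input is assumed to be an uppercase one-letter sequence. A lookahead is used so that overlapping sites are also reported.

```python
import re

# Asn, then anything but Pro, then Ser or Thr, then anything but Pro.
# Wrapping the pattern in a lookahead lets overlapping sites match too.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_n_glycosylation_sites(protein):
    """Return (start_index, matched_residues) for each candidate site."""
    return [
        (match.start(), match.group(1))
        for match in N_GLYCOSYLATION.finditer(protein)
    ]


# Hypothetical sequence: only the Asn at index 1 satisfies the pattern.
sites = find_n_glycosylation_sites("MNGSAANPTANKTP")
```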
Laplace Estimator
In probability theory, the rule of succession is a formula introduced in the 18th century by Pierre-Simon Laplace in the course of treating the sunrise problem. The formula is still used, particularly to estimate underlying probabilities when there are few observations or for events that have not been observed to occur at all in (finite) sample data. If we repeat an experiment that we know can result in a success or failure, ''n'' times independently, and get ''s'' successes, and ''n − s'' failures, then what is the probability that the next repetition will succeed? More abstractly: If X_1, ..., X_{n+1} are conditionally independent random variables that each can assume the value 0 or 1, then, if we know nothing more about them, P(X_{n+1}=1 \mid X_1+\cdots+X_n=s)=\frac{s+1}{n+2}. Since we have the prior knowledge that we are looking at an experiment for which both success and failure are possible, our estimate is as if we had obse ...
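The rule of succession estimates the probability of success on the next trial as (s + 1) / (n + 2), given s successes in n trials. A minimal sketch follows, using exact fractions; the function name `rule_of_succession` and the example numbers are hypothetical.

```python
from fractions import Fraction


def rule_of_succession(successes, trials):
    """Laplace's estimate of success on the next trial: (s + 1) / (n + 2)."""
    if not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials")
    return Fraction(successes + 1, trials + 2)


no_data = rule_of_succession(0, 0)        # no observations yet
never_seen = rule_of_succession(0, 10)    # event not observed in 10 trials
mostly_seen = rule_of_succession(9, 10)   # 9 successes in 10 trials
```

Unlike the raw frequency s / n, the estimate is defined even with no observations and never reaches exactly 0 or 1 for finite data.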