DNA-binding site

DNA binding sites are a type of binding site found in DNA where other molecules may bind. DNA binding sites are distinct from other binding sites in that (1) they are part of a DNA sequence (e.g. a genome) and (2) they are bound by DNA-binding proteins. DNA binding sites are often associated with specialized proteins known as transcription factors, and are thus linked to transcriptional regulation. The sum of DNA binding sites of a specific transcription factor is referred to as its cistrome. DNA binding sites also encompass the targets of other proteins, like restriction enzymes, site-specific recombinases (see site-specific recombination) and methyltransferases. DNA binding sites can thus be defined as short DNA sequences (typically 4 to 30 base pairs long, but up to 200 bp for recombination sites) that are specifically bound by one or more DNA-binding proteins or protein complexes. It has been reported that some binding sites have the potential to undergo fast evolutionary change.


Types of DNA binding sites

DNA binding sites can be categorized according to their biological function. Thus, we can distinguish between transcription factor-binding sites, restriction sites and recombination sites. Some authors have proposed that binding sites could also be classified according to their most convenient mode of representation. On the one hand, restriction sites can generally be represented by consensus sequences. This is because they target mostly identical sequences and restriction efficiency decreases abruptly for less similar sequences. On the other hand, DNA binding sites for a given transcription factor are usually all different, with varying degrees of affinity of the transcription factor for the different binding sites. This makes it difficult to accurately represent transcription factor binding sites using consensus sequences, and they are typically represented using position specific frequency matrices (PSFM), which are often graphically depicted using sequence logos. This argument, however, is partly arbitrary. Restriction enzymes, like transcription factors, yield a gradual, though sharp, range of affinities for different sites and are thus also best represented by PSFM. Likewise, site-specific recombinases show a varied range of affinities for different target sites.
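Because a consensus representation treats a site as either present or absent, searching for it reduces to exact string matching on both strands. The minimal Python sketch below illustrates this under simple assumptions (uppercase ACGT input, one fixed recognition sequence); the function names are hypothetical, and GAATTC, the EcoRI recognition sequence, is used only as an example. A single mismatch yields no hit at all, which is the all-or-nothing behavior that a PSFM-based score is meant to soften.

```python
# Translation table for complementing the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_consensus_sites(seq: str, site: str) -> list[tuple[int, str]]:
    """Report exact matches of `site` on either strand as (forward position, strand)."""
    width = len(site)
    rc_site = reverse_complement(site)
    hits = []
    for pos in range(len(seq) - width + 1):
        window = seq[pos:pos + width]
        if window == site:
            hits.append((pos, "+"))
        # A match to the reverse complement on the forward strand is a site on the
        # reverse strand; skip it for palindromic sites to avoid double counting.
        if rc_site != site and window == rc_site:
            hits.append((pos, "-"))
    return hits


if __name__ == "__main__":
    print(find_consensus_sites("AAGAATTCAA", "GAATTC"))  # [(2, '+')]
    print(find_consensus_sites("AAGAATTGAA", "GAATTC"))  # [] (one mismatch, no hit)
```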


History and main experimental techniques

The existence of something akin to DNA binding sites was suspected from experiments on the biology of bacteriophage lambda and the regulation of the Escherichia coli lac operon. DNA binding sites were finally confirmed in both systems with the advent of DNA sequencing techniques. From then on, DNA binding sites for many transcription factors, restriction enzymes and site-specific recombinases have been discovered using a profusion of experimental methods. Historically, the experimental techniques of choice to discover and analyze DNA binding sites have been the DNase footprinting assay and the electrophoretic mobility shift assay (EMSA). However, the development of DNA microarrays and fast sequencing techniques has led to new, massively parallel methods for in vivo identification of binding sites, such as ChIP-chip and ChIP-Seq. To quantify the binding affinity of proteins and other molecules to specific DNA binding sites, the biophysical method microscale thermophoresis (MST) can be used.


Databases

Due to the diverse nature of the experimental techniques used in determining binding sites and to the patchy coverage of most organisms and transcription factors, there is no central database (akin to GenBank at the National Center for Biotechnology Information) for DNA binding sites. Even though NCBI provides for DNA binding site annotation in its reference sequences (RefSeq), most submissions omit this information. Moreover, due to the limited success of bioinformatics in producing efficient DNA binding site prediction tools (large false positive rates are often associated with in silico motif discovery and site search methods), there has been no systematic effort to computationally annotate these features in sequenced genomes. There are, however, several private and public databases devoted to the compilation of experimentally reported, and sometimes computationally predicted, binding sites for different transcription factors in different organisms. Below is a non-exhaustive table of available databases:
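Part of the false-positive problem can be made concrete with a back-of-the-envelope calculation. Assuming a random genome with equal, independent base frequencies (a simplification; real genomes are not uniform), one exact site of length k is expected to occur about 2(L - k + 1)/4^k times in a genome of length L when both strands are scanned. The sketch below evaluates this expression; the function name and the 4.6 Mb genome length (roughly E. coli-sized) are illustrative assumptions. A 6-bp site is expected over two thousand times by chance, whereas a 16-bp site is essentially never expected, which helps explain why short and degenerate transcription factor sites yield so many spurious predictions.

```python
def expected_random_hits(genome_length: int, site_length: int, both_strands: bool = True) -> float:
    """Expected exact matches of one fixed site in a uniform random sequence.

    Assumes equal, independent base frequencies (0.25 each). Scanning both strands
    doubles the count; for a palindromic site the two strands report the same hits.
    """
    windows = genome_length - site_length + 1
    match_probability = 0.25 ** site_length
    strands = 2 if both_strands else 1
    return strands * windows * match_probability


if __name__ == "__main__":
    genome_length = 4_600_000  # illustrative, roughly an E. coli-sized genome
    for k in (6, 10, 16):
        print(k, round(expected_random_hits(genome_length, k), 3))
```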


Representation of DNA binding sites

A collection of DNA binding sites, typically referred to as a DNA binding motif, can be represented by a consensus sequence. This representation has the advantage of being compact, but at the expense of disregarding a substantial amount of information. A more accurate way of representing binding sites is through position specific frequency matrices (PSFM). These matrices give the frequency of each base at each position of the DNA binding motif. PSFM are usually built on the implicit assumption of positional independence (different positions in the DNA binding site contribute independently to the site function), although this assumption has been disputed for some DNA binding sites. Frequency information in a PSFM can be formally interpreted within the framework of information theory, leading to its graphical representation as a sequence logo.

Table: PSFM for the transcriptional repressor LexA, as derived from 56 LexA-binding sites stored in Prodoric. Relative frequencies are obtained by dividing the counts in each cell by the total count (56).
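The construction described above (count the bases at each aligned position, then divide by the number of sites) can be sketched in a few lines of Python. The sketch assumes the sites are already aligned and of equal length, and it uses a small hypothetical set of sites rather than the LexA data. It also derives the consensus and a per-position information content in bits, computed as 2 minus the Shannon entropy of each column under a uniform background and without small-sample correction; this is the quantity drawn as stack height in a sequence logo. All function names are hypothetical.

```python
import math

BASES = "ACGT"


def count_matrix(sites):
    """Count each base at each position of a set of aligned, equal-length sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("sites must be aligned and of equal length")
    return [{b: sum(1 for site in sites if site[j] == b) for b in BASES} for j in range(width)]


def psfm(counts):
    """Relative frequencies: each cell divided by its column total (the number of sites)."""
    return [{b: column[b] / sum(column.values()) for b in BASES} for column in counts]


def consensus(freqs):
    """Most frequent base per position (ties resolved in A, C, G, T order)."""
    return "".join(max(BASES, key=lambda b: column[b]) for column in freqs)


def information_content(freqs):
    """Bits per position, 2 - H, assuming a uniform background and no small-sample correction."""
    bits = []
    for column in freqs:
        entropy = -sum(p * math.log2(p) for p in column.values() if p > 0)
        bits.append(2.0 - entropy)
    return bits


if __name__ == "__main__":
    sites = ["ACGTGA", "ACGTGC", "ACGTGA", "ACCTGA"]  # hypothetical toy sites
    freqs = psfm(count_matrix(sites))
    print(consensus(freqs))  # ACGTGA
    print([round(b, 2) for b in information_content(freqs)])  # [2.0, 2.0, 1.19, 2.0, 2.0, 1.19]
```

The consensus keeps only the top base of each column, so the two partially conserved positions look identical to the fully conserved ones; the PSFM and its information content retain that difference.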


Computational search and discovery of binding sites

In bioinformatics, one can distinguish between two separate problems regarding DNA binding sites: searching for additional members of a known DNA binding motif (the site search problem) and discovering novel DNA binding motifs in collections of functionally related sequences (the sequence motif discovery problem). Many different methods have been proposed to search for binding sites. Most of them rely on the principles of information theory and have available web servers (Yellaboina)(Munch), while other authors have resorted to machine learning methods, such as artificial neural networks.

A plethora of algorithms is also available for sequence motif discovery. These methods rely on the hypothesis that a set of sequences share a binding motif for functional reasons. Binding motif discovery methods can be divided roughly into enumerative, deterministic and stochastic approaches. MEME and Consensus are classical examples of deterministic optimization, while the Gibbs sampler is the conventional implementation of a purely stochastic method for DNA binding motif discovery. Another instance of this class of methods is SeSiMCMC, which focuses on weak transcription factor binding sites (TFBS) with symmetry. While enumerative methods often resort to a regular expression representation of binding sites, PSFM and their formal treatment under information theory are the representation of choice for both deterministic and stochastic methods. Hybrid methods, e.g. ChIPMunk, which combines greedy optimization with subsampling, also use PSFM. Recent advances in sequencing have led to the introduction of comparative genomics approaches to DNA binding motif discovery, as exemplified by PhyloGibbs. More complex methods for binding site search and motif discovery rely on base stacking and other interactions between DNA bases, but due to the small sample sizes typically available for binding sites in DNA, their potential has not yet been fully realized. An example of such a tool is ULPB.
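As an illustration of an information-theory-based site search, the sketch below converts aligned sites into a log-odds position weight matrix (log2 of each PSFM frequency over a background frequency) and slides it along both strands of a sequence, reporting windows whose additive score reaches a threshold. Summing per-position scores relies on the positional independence assumption discussed in the representation section. The pseudocount, uniform background, threshold, toy sites and function names are all assumptions chosen for illustration; published tools differ in these details.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def log_odds_matrix(sites, pseudocount=0.25, background=None):
    """Per-position log2(frequency / background) scores from aligned, equal-length sites."""
    background = background or {b: 0.25 for b in BASES}
    n = len(sites)
    matrix = []
    for j in range(len(sites[0])):
        column = [site[j] for site in sites]
        matrix.append({
            b: math.log2(((column.count(b) + pseudocount) / (n + 4 * pseudocount)) / background[b])
            for b in BASES
        })
    return matrix


def score(matrix, window):
    """Additive score: assumes positions contribute independently."""
    return sum(column[base] for column, base in zip(matrix, window))


def scan(seq, matrix, threshold):
    """Return (forward position, strand, score) for windows scoring >= threshold."""
    width = len(matrix)
    reverse = seq.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse)):
        for pos in range(len(strand_seq) - width + 1):
            s = score(matrix, strand_seq[pos:pos + width])
            if s >= threshold:
                # Map reverse-strand windows back to forward-strand coordinates.
                start = pos if strand == "+" else len(seq) - pos - width
                hits.append((start, strand, round(s, 2)))
    return hits


if __name__ == "__main__":
    sites = ["ACGTGA", "ACGTGC", "ACGTGA", "ACCTGA"]  # hypothetical toy sites
    pwm = log_odds_matrix(sites)
    print(scan("TTTTACGTGATTTT", pwm, threshold=8.0))  # [(4, '+', 9.82)]
```

Unlike the exact consensus match, near-matches receive graded scores, so the threshold becomes the knob that trades sensitivity against false positives.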

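The Gibbs sampler mentioned in the motif discovery discussion can be sketched as follows, in the spirit of the classic site-sampling scheme: keep one candidate site per sequence, repeatedly set one sequence aside, build a profile from the remaining sites, and resample that sequence's site position in proportion to how well each window fits the profile. This minimal version assumes exactly one site of known width per sequence and a uniform background (so the ratio of profile probability to background probability reduces to the profile probability alone); it omits the refinements of production tools, and all names and the planted-motif data are hypothetical.

```python
import random

BASES = "ACGT"


def profile_without(seqs, starts, width, skip, pseudocount=1.0):
    """Position-specific frequencies from all current sites except sequence `skip`."""
    counts = [[pseudocount] * 4 for _ in range(width)]
    for i, (seq, start) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j, base in enumerate(seq[start:start + width]):
            counts[j][BASES.index(base)] += 1
    total = (len(seqs) - 1) + 4 * pseudocount
    return [[c / total for c in column] for column in counts]


def window_weight(profile, window):
    """Probability of a window under the profile (uniform background assumed)."""
    weight = 1.0
    for column, base in zip(profile, window):
        weight *= column[BASES.index(base)]
    return weight


def gibbs_motif_sampler(seqs, width, iterations=2000, seed=0):
    """Return one start position per sequence after `iterations` resampling steps."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        i = rng.randrange(len(seqs))  # sequence whose site is resampled
        profile = profile_without(seqs, starts, width, skip=i)
        seq = seqs[i]
        weights = [window_weight(profile, seq[p:p + width]) for p in range(len(seq) - width + 1)]
        # Sample a new start in proportion to the window weights.
        starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


if __name__ == "__main__":
    rng = random.Random(1)
    width = 6
    # Hypothetical data: random sequences with "TTGACA" planted at position 10.
    seqs = ["".join(rng.choice(BASES) for _ in range(40)) for _ in range(8)]
    seqs = [s[:10] + "TTGACA" + s[10 + width:] for s in seqs]
    starts = gibbs_motif_sampler(seqs, width)
    print(starts)
    print([s[p:p + width] for s, p in zip(seqs, starts)])
```

Because the sampler is stochastic, its result depends on the seed and it can settle on a local optimum, so in practice it is typically run from several random starting points.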

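Enumerative methods often express a motif as a regular expression. One common way to do this is to write the motif with IUPAC nucleotide codes (for example, N for any base, R for A or G) and translate each code into a character class. The sketch below does this with Python's `re` module, using a zero-width lookahead so that overlapping occurrences are reported; it scans only the strand it is given, and the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(motif: str) -> str:
    """Translate an IUPAC-coded motif into an equivalent regular expression."""
    return "".join(IUPAC[code] for code in motif.upper())


def find_motif(seq: str, motif: str) -> list[int]:
    """Start positions of all (possibly overlapping) matches on the given strand."""
    # A zero-width lookahead lets finditer report overlapping occurrences.
    pattern = re.compile("(?=" + iupac_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    print(iupac_to_regex("TGNCA"))             # TG[ACGT]CA
    print(find_motif("ATGACATGTCA", "TGNCA"))  # [1, 6]
```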
See also

* DNA binding protein
* Binding site
* Transcriptional regulation


References


External links


ENCODE threads Explorer
Transcription factor motifs in Nature
Manually Curated TF Binding Motifs for 157 plant species