A DNA-binding domain (DBD) is an independently folded protein domain that contains at least one structural motif that recognizes double- or single-stranded DNA. A DBD can recognize a specific DNA sequence (a recognition sequence) or have a general affinity for DNA. Some DNA-binding domains may also include nucleic acids in their folded structure.


Function

One or more DNA-binding domains are often part of a larger protein consisting of further protein domains with differing functions. The extra domains often regulate the activity of the DNA-binding domain. The function of DNA binding is either structural or involves transcription regulation, with the two roles sometimes overlapping.

DNA-binding domains with functions involving DNA structure have biological roles in DNA replication, repair, storage, and modification, such as methylation. Many proteins involved in the regulation of gene expression contain DNA-binding domains. For example, proteins that regulate transcription by binding DNA are called transcription factors. The final output of most cellular signaling cascades is gene regulation.

The DBD interacts with the nucleotides of DNA in a sequence-specific or non-sequence-specific manner, but even non-sequence-specific recognition involves some form of molecular complementarity between protein and DNA. DNA recognition by the DBD can occur at the major or minor groove of DNA, or at the sugar-phosphate backbone (see the structure of DNA). Each type of DNA recognition is tailored to the protein's function. For example, the DNA-cutting enzyme DNase I cuts DNA almost randomly and so must bind DNA in a non-sequence-specific manner. Even so, DNase I recognizes a certain 3-D DNA structure, yielding a somewhat specific cleavage pattern that is useful for studying DNA recognition by a technique called DNA footprinting.

Many DNA-binding domains must recognize specific DNA sequences, such as the DBDs of transcription factors that activate specific genes, or those of enzymes that modify DNA at specific sites, like restriction enzymes and telomerase. The hydrogen-bonding pattern in the DNA major groove is less degenerate than that of the minor groove, making the major groove a more attractive site for sequence-specific DNA recognition.

The specificity of DNA-binding proteins can be studied using many biochemical and biophysical techniques, such as gel electrophoresis, analytical ultracentrifugation, calorimetry, DNA mutagenesis, protein mutagenesis or modification, nuclear magnetic resonance, X-ray crystallography, surface plasmon resonance, electron paramagnetic resonance, cross-linking and microscale thermophoresis (MST).
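Sequence-specific recognition means a protein binds only where a particular recognition sequence occurs, on either strand of the duplex. The sketch below treats this as exact string matching on both strands, which is a deliberate simplification: real DBDs tolerate some mismatches and their affinity also depends on DNA shape. The function names are hypothetical, and EcoRI's well-known site GAATTC is used only as a familiar example.

```python
# Complement table for the four canonical DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (read 5'->3')."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(dna: str, site: str) -> list[tuple[int, str]]:
    """Find exact matches of a recognition site on both strands.

    Returns (0-based position on the given strand, strand) pairs, where
    "+" means the site reads 5'->3' on the given strand and "-" means it
    reads 5'->3' on the complementary strand.
    """
    dna, site = dna.upper(), site.upper()
    rc_site = reverse_complement(site)
    hits = []
    for i in range(len(dna) - len(site) + 1):
        window = dna[i:i + len(site)]
        if window == site:
            hits.append((i, "+"))
        # Palindromic sites (e.g. GAATTC) equal their own reverse complement;
        # skip the "-" hit so each occurrence is reported once.
        if window == rc_site and rc_site != site:
            hits.append((i, "-"))
    return hits


if __name__ == "__main__":
    print(find_sites("ttGAATTCaaGAATTCgg", "GAATTC"))
```

A non-sequence-specific binder such as DNase I would, in this picture, have no `site` to match at all, which is why its cleavage pattern reflects DNA structure rather than sequence.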


DNA-binding protein in genomes

A large fraction of genes in each genome encodes DNA-binding proteins (see Table). However, only a fairly small number of protein families are DNA-binding. For instance, more than 2000 of the ~20,000 human proteins are "DNA-binding", including about 750 zinc-finger proteins.
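The human counts above translate into rough proportions. This is a back-of-the-envelope sketch using only the approximate figures quoted in the text; because the DNA-binding count is given as "more than 2000", its share is a lower bound and the zinc-finger share of DNA-binding proteins is an upper bound.

```python
# Approximate counts quoted in the text (human proteome).
human_proteins = 20_000
dna_binding = 2_000   # "more than 2000", so treat as a lower bound
zinc_fingers = 750    # "about 750"

share_dna_binding = dna_binding / human_proteins  # at least ~10% of all proteins
share_zf_of_dbp = zinc_fingers / dna_binding      # at most ~37.5% of DNA-binding proteins
share_zf_overall = zinc_fingers / human_proteins  # ~3.75% of all proteins

print(f"DNA-binding share of proteome: >= {share_dna_binding:.1%}")
print(f"Zinc fingers among DNA-binding proteins: <= {share_zf_of_dbp:.1%}")
print(f"Zinc fingers in the whole proteome: ~{share_zf_overall:.2%}")
```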


Types


Helix-turn-helix

Originally discovered in bacteria, the helix-turn-helix motif is commonly found in repressor proteins and is about 20 amino acids long. In eukaryotes, the homeodomain contains this motif: two of its helices form the helix-turn-helix, and one of them, the recognition helix, binds the DNA. Homeodomains are common in proteins that regulate developmental processes (PROSITE: HTH).


Zinc finger

The zinc finger domain is mostly found in eukaryotes, but some examples have been found in bacteria. The zinc finger domain is generally between 23 and 28 amino acids long and is stabilized by coordinating zinc ions with regularly spaced zinc-coordinating residues (either histidines or cysteines). The most common class of zinc finger (Cys2His2) coordinates a single zinc ion and consists of a recognition helix and a 2-strand beta-sheet. In transcription factors these domains are often found in arrays (usually separated by short linker sequences), and adjacent fingers are spaced at 3-base-pair intervals when bound to DNA.
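Because adjacent fingers are spaced at 3-base-pair intervals, an array of n Cys2His2 fingers reads a site of roughly 3n base pairs, one triplet per finger. The sketch below assembles that site from per-finger triplets. Two points are assumptions not stated in the text: the input format and function name are hypothetical, and it assumes the commonly described antiparallel arrangement in which the N-terminal finger contacts the 3'-most triplet. The Zif268 (Egr1) example with site GCG TGG GCG is a widely used illustration.

```python
def zf_target_site(triplets_n_to_c: list[str]) -> str:
    """Assemble the DNA site (5'->3') bound by a Cys2His2 zinc finger array.

    triplets_n_to_c: the 3-bp subsite read by each finger, listed from the
    N-terminal finger to the C-terminal finger (hypothetical input format).
    Assumes the N-terminal finger contacts the 3'-most triplet.
    """
    for t in triplets_n_to_c:
        if len(t) != 3 or set(t.upper()) - set("ACGT"):
            raise ValueError(f"not a DNA triplet: {t!r}")
    # Reverse the finger order so the assembled site reads 5'->3'.
    return "".join(t.upper() for t in reversed(triplets_n_to_c))


if __name__ == "__main__":
    # Zif268: fingers 1-3 read GCG, TGG, GCG; 3 fingers x 3 bp = 9 bp.
    site = zf_target_site(["GCG", "TGG", "GCG"])
    print(site, len(site))
```

This modular picture is an idealization: neighbouring fingers and linkers can influence each other's specificity, so real arrays do not always behave as independent 3-bp readers.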


Leucine zipper

The basic leucine zipper (bZIP) domain is found mainly in eukaryotes and to a limited extent in bacteria. The bZIP domain contains an alpha helix with a leucine at every 7th amino acid. If two such helices find one another, the leucines can interact like the teeth of a zipper, allowing dimerization of two proteins. When binding to DNA, basic amino acid residues bind to the sugar-phosphate backbone while the helices sit in the major grooves. Proteins containing a bZIP domain regulate gene expression.
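The "leucine at every 7th amino acid" pattern can be illustrated by scanning a protein sequence for leucines spaced exactly seven residues apart. This is a toy sketch: the function name and the `min_repeats` threshold are hypothetical, and real leucine zippers tolerate substitutions at some of these positions, so practical bZIP or coiled-coil prediction relies on profile-based tools rather than an exact scan like this.

```python
def heptad_leucine_runs(seq: str, min_repeats: int = 4) -> list[tuple[int, int]]:
    """Find runs of leucines spaced exactly 7 residues apart.

    Returns (start, count) pairs: a run has leucines at the 0-based indices
    start, start + 7, start + 14, ... (count of them in total). Only runs
    with at least `min_repeats` leucines are reported.
    """
    seq = seq.upper()
    runs = []
    for start, residue in enumerate(seq):
        if residue != "L":
            continue
        # Skip leucines already counted in a run that began 7 residues earlier.
        if start >= 7 and seq[start - 7] == "L":
            continue
        count, pos = 0, start
        while pos < len(seq) and seq[pos] == "L":
            count += 1
            pos += 7
        if count >= min_repeats:
            runs.append((start, count))
    return runs


if __name__ == "__main__":
    # Synthetic sequence: four heptads, each with a single leucine.
    print(heptad_leucine_runs("MS" + "LKEAVEK" * 4))
```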


Winged helix

Consisting of about 110 amino acids, the winged helix (WH) domain has four helices and a two-strand beta-sheet.


Winged helix-turn-helix

The winged helix-turn-helix (wHTH) domain is typically 85-90 amino acids long. It is formed by a 3-helical bundle and a 4-strand beta-sheet (wing).


Helix-loop-helix

The basic helix-loop-helix (bHLH) domain is found in some transcription factors and is characterized by two alpha helices connected by a loop. One helix is typically smaller and, owing to the flexibility of the loop, can fold and pack against another helix, allowing dimerization. The larger helix typically contains the DNA-binding regions.


HMG-box

HMG-box domains are found in high mobility group proteins, which are involved in a variety of DNA-dependent processes such as replication and transcription. They also alter the flexibility of DNA by inducing bends. The domain consists of three alpha helices separated by loops.


Wor3 domain

Wor3 domains, named after the White–Opaque Regulator 3 (Wor3) in ''Candida albicans'', arose more recently in evolutionary time than most previously described DNA-binding domains and are restricted to a small number of fungi.


OB-fold domain

The OB-fold is a small structural motif originally named for its oligonucleotide/oligosaccharide binding properties. OB-fold domains range between 70 and 150 amino acids in length. OB-folds bind single-stranded DNA, and proteins containing them therefore act as single-stranded binding proteins. OB-fold proteins have been identified as critical for DNA replication, DNA recombination, DNA repair, transcription, translation, cold shock response, and telomere maintenance.


Unusual


Immunoglobulin fold

The immunoglobulin domain consists of a beta-sheet structure with large connecting loops, which serve to recognize either DNA major grooves or antigens. Usually found in immunoglobulin proteins, immunoglobulin folds are also present in STAT proteins of the cytokine pathway. This is likely because the cytokine pathway evolved relatively recently and made use of systems that were already functional, rather than creating its own.


B3 domain

The B3 DBD is found only in transcription factors from higher plants and in the restriction endonucleases EcoRII and BfiI, and typically consists of 100-120 residues. It includes seven beta strands and two alpha helices, which form a DNA-binding pseudobarrel protein fold.


TAL effector

TAL effectors are found in bacterial plant pathogens of the genus ''Xanthomonas'' and are involved in regulating the genes of the host plant in order to facilitate bacterial virulence, proliferation, and dissemination. They contain a central region of tandem 33-35-residue repeats, and each repeat encodes a single DNA base in the TALE's binding site. Within the repeat, residue 13 alone directly contacts the DNA base and determines sequence specificity, while other positions contact the DNA backbone and stabilize the DNA-binding interaction. Each repeat within the array takes the form of paired alpha helices, while the whole repeat array forms a right-handed superhelix that wraps around the DNA double helix.

TAL effector repeat arrays have been shown to contract upon DNA binding, and a two-state search mechanism has been proposed in which the elongated TALE begins to contract around the DNA after a successful thymine recognition by a unique repeat unit N-terminal to the core TAL effector repeat array. Related proteins are found in the bacterial plant pathogen ''Ralstonia solanacearum'', the fungal endosymbiont ''Burkholderia rhizoxinica'' and two as-yet unidentified marine microorganisms. The DNA-binding code and the structure of the repeat array are conserved between these groups, referred to collectively as the TALE-likes.
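Because each repeat specifies one base through residue 13, a TALE's target can be predicted by reading its repeats in order. The sketch below does this with a lookup keyed on residue 13, the second residue of each repeat's "repeat-variable diresidue" (RVD, positions 12-13). The residue-to-base assignments (I to A as in RVD NI, D to C as in HD, G to T as in NG, N to G or A as in NN) are not given in the text; they are the commonly reported TALE code, included here as an added assumption and deliberately limited to four entries. It also assumes repeats are read N- to C-terminal along the target 5' to 3', preceded by the 5' thymine mentioned above. The function name is hypothetical.

```python
# Residue 13 of a repeat -> most commonly specified base (simplified subset
# of the reported TALE code; position 12 is ignored in this sketch).
RESIDUE13_TO_BASE = {
    "I": "A",  # as in RVD NI
    "D": "C",  # as in RVD HD
    "G": "T",  # as in RVD NG
    "N": "R",  # as in RVD NN; IUPAC R = G or A
}


def predict_tale_site(rvds: list[str], include_t0: bool = True) -> str:
    """Predict a TALE's DNA target (5'->3') from its repeat RVDs.

    rvds: two-letter RVDs, listed from the N-terminal to the C-terminal repeat.
    include_t0: prepend the 5' thymine recognised N-terminal to the repeats.
    """
    bases = []
    for rvd in rvds:
        if len(rvd) != 2:
            raise ValueError(f"RVD must have two residues, got {rvd!r}")
        residue13 = rvd[1].upper()
        if residue13 not in RESIDUE13_TO_BASE:
            raise ValueError(f"no base assigned to residue 13 = {residue13!r}")
        bases.append(RESIDUE13_TO_BASE[residue13])
    return ("T" if include_t0 else "") + "".join(bases)


if __name__ == "__main__":
    print(predict_tale_site(["NI", "HD", "NG", "NN", "HD"]))
```

This one-repeat-one-base modularity is what made TALE arrays attractive for designing custom DNA-binding proteins.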


RNA-guided

The CRISPR/Cas system of ''Streptococcus pyogenes'' can be programmed to direct both activation and repression to natural and artificial eukaryotic promoters through the simple engineering of guide RNAs with base-pairing complementarity to target DNA sites. Cas9 can therefore be used as a customizable RNA-guided DNA-binding platform: it can be functionalized with regulatory domains of interest (e.g., activation, repression, or epigenetic effector domains) or with an endonuclease domain, making it a versatile tool for genome engineering, and then targeted to multiple loci using different guide RNAs.
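Choosing a guide RNA starts with finding DNA stretches that the complex can actually target. The text does not state the targeting rule; this sketch adds the widely reported requirement for ''S. pyogenes'' Cas9 that a roughly 20-nucleotide protospacer be immediately followed (3') by an NGG protospacer-adjacent motif (PAM). The function name and default length are assumptions, and only the given strand is scanned; the other strand could be handled by running the same scan on its reverse complement.

```python
import re


def find_spcas9_sites(dna: str, protospacer_len: int = 20) -> list[tuple[int, str, str]]:
    """List candidate SpCas9 target sites on the given strand of `dna`.

    A candidate is a `protospacer_len`-nt stretch immediately followed by an
    NGG PAM. Returns (start, protospacer, PAM) tuples; a matching guide RNA
    would carry the protospacer sequence with U in place of T.
    """
    dna = dna.upper()
    # The lookahead makes matches zero-width, so overlapping sites are all found.
    pattern = re.compile(rf"(?=([ACGT]{{{protospacer_len}}})([ACGT]GG))")
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    print(find_spcas9_sites("A" * 20 + "TGG"))
```

Retargeting the same Cas9 fusion to another locus then only requires a different protospacer in the guide, which is the flexibility the paragraph above describes.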


See also

* Comparison of nucleic acid simulation software


External links


* DBD database of predicted transcription factors: uses a curated set of DNA-binding domains to predict transcription factors in all completely sequenced genomes
* Table of DNA-binding motifs
* DNA-binding domains in PROSITE