ChIP-seq

ChIP-sequencing, also known as ChIP-seq, is a method used to analyze protein interactions with DNA. ChIP-seq combines chromatin immunoprecipitation (ChIP) with massively parallel DNA sequencing to identify the binding sites of DNA-associated proteins. It can be used to map global binding sites precisely for any protein of interest. Previously, ChIP-on-chip was the most common technique used to study these protein–DNA relations.


Uses

ChIP-seq is primarily used to determine how transcription factors and other chromatin-associated proteins influence phenotype-affecting mechanisms. Determining how proteins interact with DNA to regulate gene expression is essential for fully understanding many biological processes and disease states. This epigenetic information is complementary to genotype and expression analysis. ChIP-seq technology is currently seen primarily as an alternative to ChIP-chip, which requires a hybridization array. Arrays introduce some bias, as an array is restricted to a fixed number of probes. Sequencing, by contrast, is thought to have less bias, although the sequencing bias of different sequencing technologies is not yet fully understood. Specific DNA sites in direct physical interaction with transcription factors and other proteins can be isolated by chromatin immunoprecipitation. ChIP produces a library of target DNA sites bound to a protein of interest. Massively parallel sequence analyses are used in conjunction with whole-genome sequence databases to analyze the interaction pattern of any protein with DNA, or the pattern of any epigenetic chromatin modifications. This can be applied to the set of ChIP-able proteins and modifications, such as transcription factors, polymerases and transcriptional machinery, structural proteins, protein modifications, and DNA modifications. As an alternative to relying on specific antibodies, other methods, such as DNase-seq and FAIRE-seq, have been developed to find the superset of all nucleosome-depleted or nucleosome-disrupted active regulatory regions in the genome.


Workflow of ChIP-sequencing


ChIP

ChIP is a powerful method to selectively enrich for DNA sequences bound by a particular protein in living cells. However, the widespread use of this method has been limited by the lack of a sufficiently robust method to identify all of the enriched DNA sequences. The ChIP wet-lab protocol contains ChIP and hybridization. The ChIP protocol can be understood as five essential steps. The first step is cross-linking with formaldehyde, performed on large batches of starting material in order to obtain a useful amount of DNA. Cross-links form between protein and DNA, but also between RNA and other proteins. The second step is chromatin fragmentation, which breaks up the chromatin to yield high-quality DNA fragments for the final ChIP analysis; for the best genome-mapping outcome, the fragments should be under 500 base pairs each. The third step is chromatin immunoprecipitation itself, from which ChIP takes its name. It enriches specific cross-linked DNA–protein complexes using an antibody against the protein of interest, followed by incubation and centrifugation to obtain the immunoprecipitate. The immunoprecipitation step also allows for the removal of non-specific binding. The fourth step is DNA recovery and purification: the cross-links between DNA and protein are reversed to separate them, and the DNA is cleaned up by extraction. The fifth and final step is analysis of the recovered DNA by qPCR, ChIP-on-chip (hybridization array) or ChIP sequencing. For sequencing, oligonucleotide adaptors are then added to the small stretches of DNA that were bound to the protein of interest to enable massively parallel sequencing. Through the analysis, the sequences can then be identified and interpreted in terms of the gene or region to which the protein was bound.
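Interpreting binding sites in terms of nearby genes is often done by assigning each site to its nearest transcription start site (TSS). The Python sketch below illustrates only that idea, under simplifying assumptions: all coordinates lie on one chromosome, strand is ignored, and the gene names and positions are hypothetical. The function name `nearest_tss` is illustrative, not an established API.

```python
import bisect


def nearest_tss(sites, tss):
    """Assign each binding-site position to the closest transcription start site.

    sites: list of integer positions (e.g. peak summits) on one chromosome.
    tss:   dict mapping a (hypothetical) gene name -> TSS position on that chromosome.
    Returns (site, gene, signed_distance) tuples, where
    signed_distance = site - tss_position (strand is ignored in this sketch).
    """
    # Sort genes by TSS so the neighbours of each site can be found by binary search.
    ordered = sorted(tss.items(), key=lambda item: item[1])
    positions = [pos for _, pos in ordered]
    result = []
    for site in sites:
        i = bisect.bisect_left(positions, site)
        # Only the TSS just before and just after the site can be the closest one.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(positions)]
        best = min(candidates, key=lambda j: abs(site - positions[j]))
        gene, pos = ordered[best]
        result.append((site, gene, site - pos))
    return result


if __name__ == "__main__":
    genes = {"geneA": 1_000, "geneB": 5_000, "geneC": 12_000}  # hypothetical
    for site, gene, dist in nearest_tss([1_150, 4_700, 9_000], genes):
        print(site, gene, dist)
```

Dedicated annotation tools additionally handle multiple chromosomes, strand orientation and full gene models; the binary search here simply keeps each lookup logarithmic in the number of genes.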


Sequencing

After size selection, all the resulting ChIP-DNA fragments are sequenced simultaneously using a genome sequencer. A single sequencing run can scan for genome-wide associations with high resolution, meaning that features can be located precisely on the chromosomes. ChIP-chip, by contrast, requires large sets of tiling arrays for lower resolution. Several sequencing technologies can be used in this step. Some of them use cluster amplification of adapter-ligated ChIP DNA fragments on a solid flow-cell substrate to create clusters of approximately 1,000 clonal copies each. The resulting high-density array of template clusters on the flow-cell surface is then sequenced on a genome analyzer. Each template cluster undergoes sequencing-by-synthesis in parallel using fluorescently labelled reversible-terminator nucleotides, so templates are sequenced base by base during each read. Finally, data collection and analysis software aligns the sample sequences to a known genomic sequence to identify the ChIP-DNA fragments.
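The alignment step can be illustrated with a toy exact-match mapper. This is a minimal sketch and not how production aligners work: it assumes error-free reads, searches only the forward strand, and uses a simple k-mer seed index (the seed length `k` is an arbitrary choice), whereas widely used short-read aligners such as Bowtie and BWA rely on compressed Burrows–Wheeler indexes and tolerate mismatches. Reads that match zero or several genomic positions are reported as unplaced, reflecting the common practice of setting aside unmapped and multi-mapping reads. The function names are illustrative.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Map every length-k substring of the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def map_reads(reads, reference, k):
    """Return, for each read in order, its unique exact-match position or None.

    The first k bases of each read act as a seed; every seed hit is then
    verified against the full read. Reads with zero or multiple full
    matches are returned as None (unmapped or multi-mapping).
    Only the forward strand is searched, a deliberate simplification.
    """
    index = build_kmer_index(reference, k)
    placements = []
    for read in reads:
        hits = [p for p in index.get(read[:k], [])
                if reference[p:p + len(read)] == read]
        placements.append(hits[0] if len(hits) == 1 else None)
    return placements


if __name__ == "__main__":
    ref = "AAAACCCCGGGGTTTTACGTACGT"  # toy reference sequence
    print(map_reads(["CCCCGGGG", "ACGT", "GGTTTTAC", "TTTTGGGG"], ref, k=4))
```

In this toy reference, `ACGT` occurs twice and is therefore left unplaced, which is the same ambiguity that repetitive genomic content causes for real read mapping.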


Quality control

ChIP-seq offers a fast analysis; however, quality control must be performed to make sure that the results obtained are reliable:
* Non-redundant fraction (NRF): the number of distinct mapped read positions divided by the total number of mapped reads; low values indicate a library dominated by duplicate (redundant) reads. Low-complexity regions should be removed, as they are not informative and may interfere with mapping to the reference genome.
* Fraction of reads in peaks (FRiP): the proportion of all mapped reads that fall inside called peaks.
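The two quality-control metrics above, the non-redundant fraction and the reads-in-peaks fraction, can be computed directly from mapped reads and called peaks. The sketch below assumes reads have already been reduced to simple tuples and peaks to half-open `(chrom, start, end)` intervals; the function names are illustrative, and real pipelines would read BAM and BED files instead. The peak lookup is a plain linear scan chosen for readability, adequate for small examples but not for genome-scale data.

```python
def non_redundant_fraction(reads):
    """NRF: distinct read positions divided by total mapped reads.

    reads: list of (chrom, five_prime_position, strand) tuples, one per mapped read.
    Identical tuples are treated as duplicates (e.g. PCR copies).
    """
    if not reads:
        return 0.0
    return len(set(reads)) / len(reads)


def fraction_of_reads_in_peaks(read_positions, peaks):
    """FRiP: reads falling inside any peak divided by total mapped reads.

    read_positions: list of (chrom, position) tuples, one per mapped read.
    peaks: list of (chrom, start, end) half-open intervals.
    """
    if not read_positions:
        return 0.0
    in_peaks = sum(
        1
        for chrom, pos in read_positions
        # Linear scan over all peaks: simple, but O(reads x peaks).
        if any(c == chrom and start <= pos < end for c, start, end in peaks)
    )
    return in_peaks / len(read_positions)


if __name__ == "__main__":
    reads = [("chr1", 100, "+"), ("chr1", 100, "+"), ("chr1", 250, "-"), ("chr2", 40, "+")]
    print("NRF:", non_redundant_fraction(reads))
    positions = [("chr1", 100), ("chr1", 150), ("chr1", 900), ("chr2", 40), ("chr2", 60)]
    print("FRiP:", fraction_of_reads_in_peaks(positions, [("chr1", 90, 200), ("chr2", 0, 50)]))
```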


Sensitivity

The sensitivity of this technology depends on the depth of the sequencing run (i.e. the number of mapped sequence tags), the size of the genome and the distribution of the target factor. Sequencing depth is directly correlated with cost. If abundant binders in large genomes have to be mapped with high sensitivity, costs are high because a very large number of sequence tags is required. This is in contrast to ChIP-chip, in which costs are not correlated with sensitivity. Unlike microarray-based ChIP methods, the precision of the ChIP-seq assay is not limited by the spacing of predetermined probes. By integrating a large number of short reads, highly precise binding-site localization is obtained. Compared with ChIP-chip, ChIP-seq data can be used to locate a binding site to within a few tens of base pairs of the actual protein binding site. Tag densities at binding sites are a good indicator of protein–DNA binding affinity, which makes it easier to quantify and compare the binding affinities of a protein for different DNA sites.
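The dependence of sensitivity on sequencing depth and genome size can be made concrete with a simple background model. Assuming background tags fall uniformly along the genome, a window of width w in a genome of size G sequenced to N mapped tags expects λ = N·w/G tags by chance, and the probability of observing at least k tags by chance is the Poisson upper tail. Peak callers such as MACS use Poisson background models of this general kind, though with a locally estimated λ; the sketch below is a simplification, and the genome size (3.1 billion bp, roughly human-sized), window width and depths are assumed values.

```python
import math


def expected_background(total_tags, window_bp, genome_bp):
    """Expected background tags in a window, assuming uniform coverage: N * w / G."""
    return total_tags * window_bp / genome_bp


def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if k <= 0:
        return 1.0

    def log_pmf(i):
        return -lam + i * math.log(lam) - math.lgamma(i + 1)

    if k <= lam:
        # The tail is not tiny here, so 1 - CDF loses little precision.
        return max(0.0, 1.0 - sum(math.exp(log_pmf(i)) for i in range(k)))
    # For k > lam the terms shrink as i grows: sum upward until negligible,
    # avoiding the cancellation of 1 - CDF for very small p-values.
    term = math.exp(log_pmf(k))
    total, i = 0.0, k
    while term > 0.0 and term >= 1e-16 * total:
        total += term
        i += 1
        term *= lam / i  # P(i) = P(i - 1) * lam / i
    return total


if __name__ == "__main__":
    GENOME_BP = 3_100_000_000  # assumed, roughly human-sized
    WINDOW_BP = 200            # assumed window width
    for depth in (10_000_000, 40_000_000):
        lam = expected_background(depth, WINDOW_BP, GENOME_BP)
        k = math.ceil(5 * lam)  # a window holding about 5x the background count
        p = poisson_upper_tail(k, lam)
        print(f"{depth:,} tags: lambda={lam:.2f}, k={k}, p={p:.2e}")
```

For the same fivefold enrichment over background, the deeper run gives a much smaller p-value, illustrating why detecting binding sites with high sensitivity in a large genome requires many sequence tags.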


Current research

* STAT1 DNA association: ChIP-seq was used to study STAT1 targets in HeLa S3 cells, a clone of the HeLa line used for analysis of cell populations. The performance of ChIP-seq was then compared with the alternative protein–DNA interaction methods ChIP-PCR and ChIP-chip.
* Nucleosome architecture of promoters: Using ChIP-seq, it was determined that yeast genes seem to have a minimal nucleosome-free promoter region of 150 bp in which RNA polymerase can initiate transcription.
* Transcription factor conservation: ChIP-seq was used to compare the conservation of transcription factors (TFs) in the forebrain and heart tissue of embryonic mice. The authors identified and validated the heart functionality of transcription enhancers, and determined that transcription enhancers for the heart are less conserved than those for the forebrain during the same developmental stage.
* Genome-wide ChIP-seq: ChIP-seq was performed on the worm C. elegans to explore the genome-wide binding sites of 22 transcription factors. Up to 20% of the annotated candidate genes were assigned to transcription factors. Several transcription factors were assigned to non-coding RNA regions and may be subject to developmental or environmental variables. The functions of some of the transcription factors were also identified. Some of the transcription factors regulate genes that control other transcription factors; these genes are not regulated by other factors. Most transcription factors serve as both targets and regulators of other factors, demonstrating a network of regulation.
* Inferring regulatory networks: ChIP-seq signals of histone modifications were shown to be more correlated with transcription factor motifs at promoters than RNA levels were. Hence the authors proposed that histone-modification ChIP-seq would provide more reliable inference of gene-regulatory networks than methods based on expression.

ChIP-seq offers an alternative to ChIP-chip. STAT1 ChIP-seq data have a high degree of similarity to results obtained by ChIP-chip for the same type of experiment, with more than 64% of peaks in shared genomic regions. Because the data are sequence reads, ChIP-seq offers a rapid analysis pipeline as long as a high-quality genome sequence is available for read mapping and the genome does not have repetitive content that confuses the mapping process. ChIP-seq also has the potential to detect mutations in binding-site sequences, which may directly support any observed changes in protein binding and gene regulation.
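A peak-concordance figure such as the 64% above can be computed as the fraction of peaks from one experiment that overlap at least one peak from the other. This is a minimal sketch assuming half-open `(chrom, start, end)` intervals and a one-base overlap criterion; it does not reproduce the procedure of any particular study, published comparisons may use different overlap rules, and the measure is asymmetric, so comparing A against B can give a different value from B against A. The function name is illustrative.

```python
from collections import defaultdict


def overlap_fraction(peaks_a, peaks_b):
    """Fraction of peaks in A that overlap at least one peak in B.

    Peaks are (chrom, start, end) half-open intervals; two intervals overlap
    when a_start < b_end and b_start < a_end.
    """
    if not peaks_a:
        return 0.0
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks_b:
        by_chrom[chrom].append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()  # sorted by start, enabling the early exit below

    shared = 0
    for chrom, start, end in peaks_a:
        for b_start, b_end in by_chrom.get(chrom, []):
            if b_start >= end:
                break  # every later B interval starts even further right
            if start < b_end:
                shared += 1
                break
    return shared / len(peaks_a)


if __name__ == "__main__":
    a = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 20)]
    b = [("chr1", 150, 300), ("chr2", 20, 30), ("chr2", 0, 5)]
    print(overlap_fraction(a, b))
```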


Computational analysis

As with many high-throughput sequencing approaches, ChIP-seq generates extremely large data sets, for which appropriate computational analysis methods are required. To predict DNA-binding sites from ChIP-seq read-count data, peak calling methods have been developed. One of the most popular is MACS, which empirically models the shift size of ChIP-seq tags and uses it to improve the spatial resolution of predicted binding sites. MACS is optimized for higher-resolution peaks, while another popular algorithm, SICER, is designed to call broader peaks, spanning kilobases to megabases, in order to find broader chromatin domains. SICER is more useful for histone marks spanning gene bodies. A more mathematically rigorous method, BCP (Bayesian Change Point), can be used for both sharp and broad peaks with faster computational speed; see the benchmark comparison of ChIP-seq peak-calling tools by Thomas et al. (2017). Another relevant computational problem is differential peak calling, which identifies significant differences between two ChIP-seq signals from distinct biological conditions. Differential peak callers segment two ChIP-seq signals and identify differential peaks using hidden Markov models. Examples of two-stage differential peak callers are ChIPDiff and ODIN. To reduce spurious sites in ChIP-seq, multiple experimental controls can be used to detect binding sites from an IP experiment. Bay2Ctrls adopts a Bayesian model to integrate the DNA input control for the IP, the mock IP and its corresponding DNA input control to predict binding sites from the IP. This approach is particularly effective for complex samples such as whole model organisms. In addition, the analysis indicates that for complex samples, mock IP controls substantially outperform DNA input controls, probably because of the active genomes of the samples.
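The shift that MACS models arises because plus-strand tags mark fragment starts upstream of a binding site, while minus-strand tags mark fragment ends downstream of it, so the two strand profiles are offset by roughly the fragment length d. MACS itself estimates d from paired strand peaks in strongly enriched regions; the sketch below instead uses strand cross-correlation, a related and widely used technique that picks the lag at which the two strand profiles agree best. It assumes tags on a single chromosome given as 0-based 5' positions; `max_lag` is an arbitrary search limit and the function names are illustrative.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def estimate_fragment_length(plus_5prime, minus_5prime, chrom_len, max_lag):
    """Estimate fragment length by strand cross-correlation.

    plus_5prime:  0-based 5' positions of plus-strand tags (fragment starts).
    minus_5prime: 0-based 5' positions of minus-strand tags (fragment ends).
    max_lag must be smaller than chrom_len.
    """
    plus = [0] * chrom_len
    minus = [0] * chrom_len
    for pos in plus_5prime:
        plus[pos] += 1
    for pos in minus_5prime:
        minus[pos] += 1

    best_lag, best_r = 0, float("-inf")
    for lag in range(1, max_lag + 1):
        # Compare plus-strand counts at x with minus-strand counts at x + lag.
        r = pearson(plus[:chrom_len - lag], minus[lag:])
        if r > best_r:
            best_lag, best_r = lag, r
    # A fragment [s, s + d) has its minus-strand 5' end at s + d - 1,
    # so the best-correlating lag corresponds to d - 1.
    return best_lag + 1
```

With an estimate of d, each tag can then be shifted by d/2 toward the fragment center, so that plus- and minus-strand signal pile up over the binding site itself rather than on either side of it.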


See also

* ChIP-on-chip
* ChIP-PCR
* ChIP-PET
* Mammalian promoter database


Similar methods

* CUT&RUN sequencing, antibody-targeted controlled cleavage by micrococcal nuclease instead of ChIP, allowing for enhanced signal-to-noise ratio during sequencing.
* CUT&Tag sequencing, antibody-targeted controlled cleavage by transposase Tn5 instead of ChIP, allowing for enhanced signal-to-noise ratio during sequencing.
* Sono-Seq, identical to ChIP-seq but skipping the immunoprecipitation step.
* HITS-CLIP (also called CLIP-Seq), for finding interactions with RNA rather than DNA.
* PAR-CLIP, another method for identifying the binding sites of cellular RNA-binding proteins (RBPs).
* RIP-Chip, with the same goal and first steps, but without cross-linking and using a microarray instead of sequencing.
* SELEX, a method for finding a consensus binding sequence.
* Competition-ChIP, to measure relative replacement dynamics on DNA.
* ChIRP-seq, to measure RNA-bound DNA and proteins.
* ChIP-exo, which uses exonuclease treatment to achieve up to single base-pair resolution.
* ChIP-nexus, an improved version of ChIP-exo, to achieve up to single base-pair resolution.
* DRIP-seq, which uses the S9.6 antibody to precipitate three-stranded DNA:RNA hybrids called R-loops.
* TCP-seq, a method similar in principle, used to measure mRNA translation dynamics.
* Calling Cards, which uses a transposase to mark the sequence where a transcription factor binds.




External links


ReMap catalogue
An integrative and uniform ChIP-seq analysis of regulatory elements from more than 2,800 ChIP-seq datasets, giving a catalogue of 80 million peaks from 485 transcription regulators.
ChIPBase database
a database for exploring transcription factor binding maps from ChIP-Seq data. It provides the most comprehensive ChIP-Seq data set for various cell/tissue types and conditions.
GeneProf database and analysis tool
GeneProf is a freely accessible, easy-to-use analysis environment for ChIP-seq and RNA-seq data and comes with a large database of ready-analysed public experiments, e.g. for transcription factor binding and histone modifications.
Differential Peak Calling
Tutorial for differential peak calling with ODIN.
Bioinformatic analysis of ChIP-seq data
Comprehensive analysis of ChIP-seq data.
KLTepigenome
Uncovering correlated variability in epigenomic datasets using the Karhunen-Loeve transform.
SignalSpider
a tool for probabilistic pattern discovery on multiple normalized ChIP-Seq signal profiles
FullSignalRanker
a tool for regression and peak prediction on multiple normalized ChIP-Seq signal profiles