Chromosome Conformation Capture

Chromosome conformation capture techniques (often abbreviated to 3C technologies or 3C-based methods) are a set of molecular biology methods used to analyze the spatial organization of chromatin in a cell. These methods quantify the number of interactions between genomic loci that are nearby in 3-D space, but may be separated by many nucleotides in the linear genome. Such interactions may result from biological functions, such as promoter-enhancer interactions, or from random polymer looping, where undirected physical motion of chromatin causes loci to collide. Interaction frequencies may be analyzed directly, or they may be converted to distances and used to reconstruct 3-D structures. The chief difference between 3C-based methods is their scope. For example, when PCR is used to detect interactions in a 3C experiment, the interactions between two specific fragments are quantified. In contrast, Hi-C quantifies interactions between all possible pairs of fragments simultaneously. Deep sequencing of material produced by 3C also produces genome-wide interaction maps.


History

Historically, microscopy was the primary method of investigating nuclear organization, which can be dated back to 1590.

* In 1879, Walther Flemming coined the term chromatin.
* In 1883, August Weismann connected chromatin with heredity.
* In 1884, Albrecht Kossel discovered histones.
* In 1888, Sutton and Boveri proposed the theory of continuity of chromatin during the cell cycle.
* In 1889, Wilhelm von Waldeyer created the term "chromosome".
* In 1928, Emil Heitz coined the terms heterochromatin and euchromatin.
* In 1942, Conrad Waddington postulated the epigenetic landscapes.
* In 1948, Rollin Hotchkiss discovered DNA methylation.
* In 1953, Watson and Crick discovered the double helix structure of DNA.
* In 1961, Mary Lyon postulated the principle of X-inactivation.
* In 1973/1974, chromatin fiber was discovered.
* In 1975, Pierre Chambon coined the term nucleosome.
* In 1982, chromosome territories were discovered.
* In 1984, John T. Lis developed the chromatin immunoprecipitation technique.
* In 1993, the Nuclear Ligation Assay was published, a method that could determine circularization frequencies of DNA in solution. This assay was used to show that estrogen induces an interaction between the prolactin gene promoter and a nearby enhancer.
* In 2002, Job Dekker introduced the idea that dense matrices of interaction frequencies between loci could be used to infer the spatial organization of genomes. This idea was the basis for the chromosome conformation capture (3C) assay, published the same year by Dekker and colleagues in the Kleckner lab at Harvard University.
* In 2003, the Human Genome Project was finished.
* In 2006, Marieke Simonis invented 4C, and Dostie, in the Dekker lab, invented 5C.
* In 2007, B. Franklin Pugh developed the ChIP-seq technique.
* In 2009, Erez Lieberman Aiden and Job Dekker invented Hi-C, and Melissa J. Fullwood and Yijun Ruan invented ChIA-PET.
* In 2012, the Ren group and the groups led by Edith Heard and Job Dekker discovered topologically associating domains (TADs) in mammals.
* In 2013, Takashi Nagano and Peter Fraser introduced in-nucleus ligation for Hi-C and single-cell Hi-C.
* In 2014, Suhas Rao, Miriam Huntley, et al. developed in-situ Hi-C and the use of 4-cutter restriction enzymes, and released the first high-resolution datasets, down to kilobase resolution, for several human cell lines. They also identified the first clear evidence of CTCF-cohesin looping in Hi-C maps and the convergent CTCF motif rule underlying these loops.


Experimental methods

All 3C methods start with a similar set of steps, performed on a sample of cells. First, the cell genomes are cross-linked with formaldehyde, which introduces bonds that "freeze" interactions between genomic loci. Treatment of cells with 1-3% formaldehyde for 10-30 min at room temperature is most common; however, the conditions must be standardized to avoid excessive protein-DNA cross-linking, which may reduce the efficiency of restriction digestion in the subsequent step. The genome is then cut into fragments with a restriction endonuclease. The size of the restriction fragments determines the resolution of interaction mapping. Restriction enzymes (REs) that cut at 6 bp recognition sequences, such as EcoRI or HindIII, are used for this purpose, as they cut the genome about once every 4000 bp, giving ~1 million fragments in the human genome. For more precise interaction mapping, an RE that recognizes a 4 bp sequence may also be used. The next step is proximity-based ligation. This takes place at low DNA concentrations or within intact, permeabilized nuclei in the presence of T4 DNA ligase, such that ligation between cross-linked interacting fragments is favored over ligation between fragments that are not cross-linked. Subsequently, interacting loci are quantified by amplifying ligated junctions by PCR methods.
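
The link between recognition-site length and fragment size can be illustrated with a small sketch. It assumes a uniform random sequence, in which a k-bp site is expected about once every 4^k bp, and it runs a toy in-silico digest. The function names, the toy sequence, and the choice to cut at the start of each site are illustrative assumptions; real enzymes cut at a fixed offset inside the site, and real genomes are not uniform in base composition.

```python
def expected_cut_spacing(site_length: int) -> int:
    """Expected distance (bp) between sites of a given length in a
    uniform random sequence: one match per 4**site_length positions."""
    return 4 ** site_length


def digest(sequence: str, site: str) -> list:
    """Split `sequence` at every occurrence of `site`.

    Simplification: the cut is placed at the start of the site; real
    enzymes cut at a fixed offset inside the recognition sequence.
    """
    fragments = []
    start = 0
    pos = sequence.find(site)
    while pos != -1:
        if pos > start:
            fragments.append(sequence[start:pos])
        start = pos
        pos = sequence.find(site, pos + 1)
    fragments.append(sequence[start:])
    return fragments


six_cutter = expected_cut_spacing(6)   # 6 bp site, e.g. HindIII (AAGCTT)
four_cutter = expected_cut_spacing(4)  # 4 bp site
toy = "GGAAGCTTCCCAAGCTTTT"            # made-up sequence with two sites
toy_fragments = digest(toy, "AAGCTT")
```

Under this simplified model a 6 bp site gives a spacing of 4096 bp, consistent with the "about once every 4000 bp" figure above, while a 4 bp site gives 256 bp, which is why 4-cutters yield finer maps.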


Original methods


3C (one-vs-one)

The chromosome conformation capture (3C) experiment quantifies interactions between a single pair of genomic loci. For example, 3C can be used to test a candidate promoter-enhancer interaction. Ligated fragments are detected using PCR with known primers, which is why this technique requires prior knowledge of the interacting regions.


4C (one-vs-all)

Chromosome conformation capture-on-chip (4C) (also known as circular chromosome conformation capture) captures interactions between one locus and all other genomic loci. It involves a second ligation step to create self-circularized DNA fragments, which are used to perform inverse PCR. Inverse PCR allows the known sequence to be used to amplify the unknown sequence ligated to it. In contrast to 3C and 5C, the 4C technique does not require prior knowledge of both interacting chromosomal regions. Results obtained using 4C are highly reproducible, and most of the detected interactions are between regions proximal to one another. On a single microarray, approximately a million interactions can be analyzed.


5C (many-vs-many)

Chromosome conformation capture carbon copy (5C) detects interactions between all restriction fragments within a given region, which is typically no larger than a megabase. This is done by ligating universal primers to all fragments. However, 5C has relatively low coverage. The 5C technique overcomes the junctional problems at the intramolecular ligation step and is useful for mapping complex interactions among specific loci of interest. The approach is unsuitable for genome-wide interaction mapping, since that would require millions of 5C primers.


Hi-C (all-vs-all)

Hi-C uses high-throughput sequencing to find the nucleotide sequence of fragments and uses paired-end sequencing, which retrieves a short sequence from each end of each ligated fragment. As such, for a given ligated fragment, the two sequences obtained should represent two different restriction fragments that were ligated together in the proximity-based ligation step. The two sequences of each pair are aligned to the genome individually, thus determining the fragments involved in that ligation event. Hence, all possible pairwise interactions between fragments are tested.
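
Once both ends of each pair are aligned, the ligation events can be tallied into a contact map. The following is a minimal sketch, assuming a single chromosome, already-aligned positions, and fixed-size genomic bins rather than restriction fragments. The function name, the input format, and the toy coordinates are hypothetical.

```python
import numpy as np


def contact_matrix(read_pairs, chrom_length, bin_size):
    """Count ligation events per pair of fixed-size genomic bins.

    `read_pairs` is an iterable of (pos_a, pos_b) alignment positions
    on a single chromosome (a hypothetical, already-aligned input).
    """
    n_bins = -(-chrom_length // bin_size)  # ceiling division
    matrix = np.zeros((n_bins, n_bins), dtype=int)
    for pos_a, pos_b in read_pairs:
        i, j = pos_a // bin_size, pos_b // bin_size
        matrix[i, j] += 1
        if i != j:
            matrix[j, i] += 1  # mirror off-diagonal counts to keep the map symmetric
    return matrix


# Toy aligned pairs on a 3000 bp "chromosome", binned at 1000 bp.
pairs = [(100, 2500), (150, 2700), (900, 1100), (2100, 2900)]
hic_map = contact_matrix(pairs, chrom_length=3000, bin_size=1000)
```

Each entry of `hic_map` is the number of read pairs linking two bins. Binning is a simplification chosen here for brevity; the same counting could be done per restriction fragment.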


Sequence capture-based methods

A number of methods use oligonucleotide capture to enrich 3C and Hi-C libraries for specific loci of interest. These methods include Capture-C, NG Capture-C, Capture-3C, HiCap, Capture Hi-C, and Micro Capture-C. They can produce higher resolution and sensitivity than 4C-based methods. Micro Capture-C provides the highest resolution of the available 3C techniques and can generate base-pair resolution data.


Single-cell methods

Single-cell adaptations of these methods, such as ChIP-seq and Hi-C, can be used to investigate the interactions occurring in individual cells.


Multi-interaction methods

A number of methods sequence multiple ligation junctions simultaneously to detect higher-order structures where multiple regions of chromatin may be interacting. These methods include Tri-C, 3way 4C/C-walks, and multi-contact 4C (MC-4C).


Immunoprecipitation-based methods


ChIP-loop

ChIP-loop combines 3C with ChIP-seq to detect interactions between two loci of interest mediated by a protein of interest. ChIP-loop may be useful in identifying long-range cis interactions and trans interactions mediated through proteins, since frequent DNA collisions will not occur.


Genome wide methods

ChIA-PET combines Hi-C with ChIP-seq to detect all interactions mediated by a protein of interest. HiChIP was designed to allow analysis similar to ChIA-PET with less input material.


Biological impact

3C methods have led to a number of biological insights, including the discovery of new structural features of chromosomes, the cataloguing of chromatin loops, and increased understanding of transcriptional regulation mechanisms (the disruption of which can lead to disease). 3C methods have demonstrated the importance of spatial proximity of regulatory elements to the genes that they regulate. For example, in tissues that express globin genes, the β-globin locus control region forms a loop with these genes. This loop is not found in tissues where the gene is not expressed. This technology has further aided the genetic and epigenetic study of chromosomes both in model organisms and in humans. These methods have revealed large-scale organization of the genome into topologically associating domains (TADs), which correlate with epigenetic markers. Some TADs are transcriptionally active, while others are repressed. Many TADs have been found in D. melanogaster, mouse and human. Moreover, CTCF and cohesin play important roles in determining TADs and enhancer-promoter interactions. Results show that the CTCF binding motifs in an enhancer-promoter loop should face each other in order for the enhancer to find its correct target.


Human disease

There are several diseases caused by defects in promoter-enhancer interactions. Beta thalassemia is a blood disorder that can be caused by a deletion of the LCR enhancer element. Holoprosencephaly is a cephalic disorder that can be caused by a mutation in the SBE2 enhancer element, which in turn weakens expression of the SHH gene. PPD2 (polydactyly of a triphalangeal thumb) is caused by a mutation of the ZRS enhancer, which in turn strengthens expression of the SHH gene. Adenocarcinoma of the lung can be caused by a duplication of an enhancer element for the MYC gene. T-cell acute lymphoblastic leukemia can be caused by the introduction of a new enhancer.


Data analysis

The different 3C-style experiments produce data with very different structures and statistical properties. As such, specific analysis packages exist for each experiment type. Hi-C data is often used to analyze genome-wide chromatin organization, such as topologically associating domains (TADs), linearly contiguous regions of the genome that are associated in 3-D space. Several algorithms have been developed to identify TADs from Hi-C data. Hi-C and its subsequent analyses are evolving. Fit-Hi-C is a method based on a discrete binning approach, modified to account for the distance between interacting loci (initial spline fitting, aka spline-1) and to refine the null model (spline-2). The result of Fit-Hi-C is a list of pairwise intra-chromosomal interactions with their p-values and q-values. The 3-D organization of the genome can also be analyzed via eigendecomposition of the contact matrix. Each eigenvector corresponds to a set of loci, which are not necessarily linearly contiguous, that share structural features. A significant confounding factor in 3C technologies is the frequent non-specific interactions between genomic loci that occur due to random polymer behavior. An interaction between two loci must be confirmed as specific through statistical significance testing.
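
The eigenvector idea can be sketched on a toy matrix. The example below is an illustration under stated assumptions, not a description of any particular package. The contact values are made up, and the contact profiles are converted to a correlation matrix before decomposition, a preprocessing choice assumed here so that the leading eigenvector reflects contact pattern rather than overall coverage.

```python
import numpy as np

# Toy symmetric contact matrix: loci of the same (hidden) type contact
# each other 3 times, loci of different types once.
hidden_type = np.array([1, 1, -1, -1, 1, -1])
contacts = 2 + np.outer(hidden_type, hidden_type)

# Assumption: correlate contact profiles first so that the leading
# eigenvector reflects contact pattern rather than overall coverage.
correlation = np.corrcoef(contacts)

# eigh is for symmetric matrices; eigenvalues come back in ascending order.
eigenvalues, eigenvectors = np.linalg.eigh(correlation)
leading = eigenvectors[:, -1]

# The sign of an eigenvector is arbitrary, so orient it by the first locus.
groups = np.sign(leading) * np.sign(leading[0])
```

In this toy case the signs in `groups` recover the two hidden sets of loci even though their members are not linearly contiguous, which is the property described above.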


Normalization of Hi-C contact map

There are two major ways of normalizing raw Hi-C contact heat maps. The first is to assume equal visibility, meaning that each chromosomal position has an equal chance of taking part in an interaction. Under this assumption, the true signal of a Hi-C contact map should be a balanced matrix, that is, one with constant row sums and column sums. An example of an algorithm that assumes equal visibility is the Sinkhorn-Knopp algorithm, which scales the raw Hi-C contact map into a balanced matrix. The other way is to assume that there is a bias associated with each chromosomal position. The contact map value at each coordinate is then the true signal at that position multiplied by the biases of the two contacting positions. An example of an algorithm that solves this bias model is iterative correction, which iteratively regresses out row and column biases from the raw Hi-C contact map. There are a number of software tools available for analysis of Hi-C data.
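
A minimal sketch of matrix balancing follows, written in the spirit of the approaches above rather than as a faithful copy of any published implementation. It assumes a symmetric, strictly positive toy matrix and the multiplicative model raw[i, j] = bias[i] * bias[j] * balanced[i, j]. The function name, iteration limit, tolerance, and toy values are all illustrative assumptions.

```python
import numpy as np


def balance(raw, n_iter=200, tol=1e-10):
    """Scale a symmetric, non-negative contact matrix so that all row
    (and column) sums are equal, returning (balanced, bias).

    Model assumed: raw[i, j] = bias[i] * bias[j] * balanced[i, j].
    """
    balanced = raw.astype(float).copy()
    bias = np.ones(raw.shape[0])
    for _ in range(n_iter):
        row_sums = balanced.sum(axis=1)
        scale = row_sums / row_sums.mean()
        if np.max(np.abs(scale - 1.0)) < tol:
            break  # row sums are already (nearly) equal
        balanced /= np.outer(scale, scale)
        bias *= scale
    return balanced, bias


# Toy data: a balanced "true" signal distorted by per-position biases.
true_signal = np.array([[4.0, 1.0, 1.0],
                        [1.0, 4.0, 1.0],
                        [1.0, 1.0, 4.0]])
true_bias = np.array([1.0, 2.0, 0.5])
raw_map = np.outer(true_bias, true_bias) * true_signal

balanced_map, estimated_bias = balance(raw_map)
```

On this toy input the balanced map is proportional to the undistorted signal, and the estimated bias is proportional to the bias that was applied. The overall scale is not identifiable from the row sums alone, so only ratios are meaningful.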


DNA motif analysis

DNA motifs are specific short DNA sequences, often 8-20 nucleotides in length, which are statistically overrepresented in a set of sequences with a common biological function. Currently, regulatory motifs in long-range chromatin interactions have not been studied extensively. Several studies have focused on elucidating the impact of DNA motifs in promoter-enhancer interactions. Bailey et al. identified that the ZNF143 motif in promoter regions provides sequence specificity for promoter-enhancer interactions. Mutation of the ZNF143 motif decreased the frequency of promoter-enhancer interactions, suggesting that ZNF143 is a novel chromatin-looping factor. For genome-scale motif analysis, in 2016, Wong et al. reported a list of 19,491 DNA motif pairs for promoter-enhancer interactions in the K562 cell line. They proposed that motif pairing multiplicity (the number of motifs that are paired with a given motif) is linked to interaction distance and regulatory region type. In the following year, Wong published another article reporting 18,879 motif pairs in 6 human cell lines (Ka-Chun Wong; MotifHyades: expectation maximization for de novo DNA motif pair discovery on paired sequences, Bioinformatics, Volume 33, Issue 19, 1 October 2017, Pages 3028–3035, https://doi.org/10.1093/bioinformatics/btx381). A novel contribution of this work is MotifHyades, a motif discovery tool that can be directly applied to paired sequences.
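
Motif pairing multiplicity, as defined above, can be computed from a list of motif pairs with a few lines of code. This is only a sketch of the definition, not the procedure used in the cited studies. The motif names are hypothetical, and pooling both members of a pair into one namespace is a simplifying assumption.

```python
from collections import defaultdict


def pairing_multiplicity(motif_pairs):
    """Map each motif to the number of distinct motifs it is paired with."""
    partners = defaultdict(set)
    for motif_a, motif_b in motif_pairs:
        partners[motif_a].add(motif_b)
        partners[motif_b].add(motif_a)
    return {motif: len(others) for motif, others in partners.items()}


# Hypothetical motif names, for illustration only; the repeated pair
# ("M1", "M2") is counted once because partners are stored in a set.
motif_pairs = [("M1", "M2"), ("M1", "M3"), ("M2", "M3"), ("M1", "M4"), ("M1", "M2")]
multiplicity = pairing_multiplicity(motif_pairs)
```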


Cancer genome analysis

The 3C-based techniques can provide insights into chromosomal rearrangements in cancer genomes. Moreover, they can show changes in the spatial proximity of regulatory elements and their target genes, which brings a deeper understanding of the structural and functional basis of the genome.

