ScGET-seq

Single-cell genome and epigenome by transposases sequencing (scGET-seq) is a DNA sequencing method for profiling open and closed chromatin. In contrast to single-cell assay for transposase-accessible chromatin with sequencing (scATAC-seq), which only targets active euchromatin, scGET-seq is also capable of probing inactive heterochromatin. This is achieved through the use of TnH, which is created by linking the chromodomain (CD) of heterochromatin protein-1-alpha (HP-1α) to the Tn5 transposase. TnH is then able to target histone 3 lysine 9 trimethylation (H3K9me3), a marker for heterochromatin. Akin to RNA velocity, which uses the ratio of spliced to unspliced RNA to infer the kinetics of changes in gene expression over the course of cellular development, the ratio of TnH to Tn5 signals obtained from scGET-seq can be used to calculate chromatin velocity, which measures the dynamics of chromatin accessibility over the course of cellular developmental pathways.


History

Transcriptional regulation is tightly linked to chromatin states. Chromatin that is open, or permissive to transcription, makes up only 2-3% of the genome but encompasses 94.4% of transcription factor binding sites. Conversely, more tightly packed DNA, or heterochromatin, is responsible for genome organization and stability. Chromatin density also changes over the course of cellular differentiation processes, but there is a lack of high-throughput sequencing methods for directly assaying heterochromatin. Many genome-related diseases, such as cancer, are closely linked to changes in the epigenome. Cancers in particular are characterized by single-cell heterogeneity, which can drive metastasis and treatment resistance. The mechanisms that underlie these processes are still largely unknown, although the advent of single-cell technologies, including single-cell epigenomics, has contributed greatly to their elucidation. In 2015, ATAC-seq, which uses the Tn5 transposase to fragment and tag accessible chromatin, or euchromatin, for sequencing, became feasible at single-cell resolution. scGET-seq builds upon this technology by also providing information on heterochromatin, giving a more comprehensive look at chromatin structure and dynamics within each cell.


Methods


Sample preparation

Sample preparation for scGET-seq starts with obtaining a suspension of nuclei from cells using a method appropriate for the starting material. The next step is to produce the TnH transposase. Tn5 is a transposase that cuts and ligates adapters to genomic regions unbound by nucleosomes (open chromatin). HP-1α is a member of the HP1 family and is able to recognize and specifically bind to H3K9me3. Its chromodomain uses an induced-fit mechanism for recognizing this chromatin modification. Linking the first 112 amino acids of HP-1α, which contain the chromodomain, to Tn5 using a three poly-tyrosine-glycine-serine (TGS) linker creates the TnH transposase, which is capable of targeting heterochromatin marked by H3K9me3. Library preparation is done using a modified protocol for single-cell ATAC-seq, in which the nuclei suspension is incubated sequentially, first with the Tn5 transposase and then with TnH.


Data analysis

The goals of the data analysis are:
# To identify and characterize distinct cell populations using clustering
# To profile chromatin accessibility across the genome
# To predict copy-number variants and single-nucleotide variants


Pre-processing

# Post-sequencing, reads need to be demultiplexed and mapped to the appropriate reference genome. Duplicated reads are identified and removed.
# "Peaks", or regions in the DNA enriched in the number of reads mapped, are identified.
# Quality control is performed, and cells with low numbers of reads or few detected features are filtered out.
# Four count matrices (matrices where each column is a cell and each row is a feature) are generated: Tn5-dhs, Tn5-complement, TnH-dhs and TnH-complement, representing signal from accessible and compacted chromatin.
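The quality-control step above can be illustrated with a minimal sketch. It assumes a count matrix stored as a list of feature rows with one column per cell, as described in the last step; the function name `qc_filter` and the thresholds `min_reads` and `min_features` are hypothetical and are not taken from any scGET-seq pipeline.

```python
def qc_filter(matrix, cell_ids, min_reads, min_features):
    """Drop cells with too few reads or too few detected features.

    matrix[i][j] is the read count for feature i in cell j
    (rows are features, columns are cells).
    """
    keep = []
    for j in range(len(cell_ids)):
        column = [row[j] for row in matrix]
        total_reads = sum(column)
        detected = sum(1 for count in column if count > 0)
        if total_reads >= min_reads and detected >= min_features:
            keep.append(j)
    filtered = [[row[j] for j in keep] for row in matrix]
    return filtered, [cell_ids[j] for j in keep]


# Toy example: 3 features x 3 cells (thresholds are illustrative only).
matrix = [
    [5, 0, 2],
    [3, 0, 0],
    [4, 1, 0],
]
cell_ids = ["cell_a", "cell_b", "cell_c"]
filtered, kept = qc_filter(matrix, cell_ids, min_reads=3, min_features=2)
print(kept)      # ['cell_a']
print(filtered)  # [[5], [3], [4]]
```

In practice the same filter would be applied consistently to all four matrices so that they keep the same set of cells.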


Analysis


Dimension reduction, visualization and clustering

Each of the matrices is filtered of shared regions and then normalized and log2-transformed. Linear dimension reduction is done using principal component analysis (PCA). Groups of cells are identified using a k-nearest neighbors (k-NN) algorithm and the Leiden algorithm. Finally, the four matrices are combined using matrix factorization and UMAP reduction.
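As a rough illustration of two of the steps described here, the sketch below normalizes a count matrix per cell, applies a log2 transform, and finds each cell's nearest neighbours. It is a simplification under stated assumptions: the scaling factor, the pseudocount of 1 and the function names are hypothetical, the neighbour search runs directly on the transformed values rather than on principal components, and PCA, Leiden clustering, matrix factorization and UMAP are omitted.

```python
import math


def normalize_log2(matrix, scale=10_000):
    """Scale each cell (column) to `scale` total counts, then take log2(x + 1)."""
    n_cells = len(matrix[0])
    totals = [sum(row[j] for row in matrix) for j in range(n_cells)]
    return [
        [
            math.log2(row[j] * scale / totals[j] + 1) if totals[j] else 0.0
            for j in range(n_cells)
        ]
        for row in matrix
    ]


def knn(matrix, k):
    """For each cell, return the indices of its k nearest cells (Euclidean)."""
    n_cells = len(matrix[0])
    cells = [[row[j] for row in matrix] for j in range(n_cells)]
    neighbours = []
    for i, a in enumerate(cells):
        dists = sorted(
            (math.dist(a, b), j) for j, b in enumerate(cells) if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


# Toy matrix: 3 features x 4 cells; cells 0-1 and cells 2-3 look alike.
counts = [
    [8, 6, 0, 0],
    [0, 0, 8, 6],
    [2, 4, 2, 4],
]
norm = normalize_log2(counts)
neighbours = knn(norm, k=1)
print(neighbours)  # [[1], [0], [3], [2]]
```

A community-detection method such as Leiden would then be run on the graph defined by these neighbour lists.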


Cell identity annotation

There are two approaches to cell identity annotation: annotation based on feature annotation of ATAC peaks, and annotation based on integration with reference scRNA-seq data.


Applications


Current

By using the ratio of Tn5 to TnH signals, quantitative values describing how quickly and in what direction chromatin remodelling is taking place can be calculated (chromatin velocity). By isolating regions that are most dynamic and identifying which transcription factors bind there, chromatin velocity can be used to infer the dynamic epigenetic processes happening within a given cell and the contributions of various transcription factors to those processes.
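To make the idea of a Tn5-to-TnH ratio concrete, the following sketch computes a per-region log2 ratio of the two signals for one cell. This is not the published chromatin velocity computation; it only shows the kind of quantity such a calculation starts from. The function name and the pseudocount are assumptions introduced for illustration.

```python
import math


def accessibility_log_ratio(tn5_counts, tnh_counts, pseudocount=1.0):
    """Per-region log2((Tn5 + p) / (TnH + p)).

    Positive values mean relatively more Tn5 (accessible) signal,
    negative values relatively more TnH (compacted) signal.
    The pseudocount p avoids division by zero and is an assumption.
    """
    if len(tn5_counts) != len(tnh_counts):
        raise ValueError("Tn5 and TnH vectors must cover the same regions")
    return [
        math.log2((tn5 + pseudocount) / (tnh + pseudocount))
        for tn5, tnh in zip(tn5_counts, tnh_counts)
    ]


# Toy counts for three regions in one cell.
tn5 = [7, 0, 3]
tnh = [1, 3, 3]
ratios = accessibility_log_ratio(tn5, tnh)
print(ratios)  # [2.0, -2.0, 0.0]
```

Regions whose ratio changes most along a developmental trajectory would be the natural candidates for the "most dynamic" regions mentioned above.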


Future

Chromatin remodelling precedes changes in gene expression, so measuring it enhances the understanding of the trajectories and mechanisms of cellular change. Thus, platforms and tools for integration of multimodal data are areas of active research. Incorporating temporal and directionality elements through integration of chromatin velocity with RNA velocity has been proposed to reveal even more information about differentiation pathways.


Limitations

scGET-seq has some of the same limitations as scATAC-seq. Both methods require nuclei from viable cells and high cellular viability. Low cellular viability leads to high background DNA contamination that does not represent authentic biological signal. Additionally, the sparse and noisy nature of scATAC-seq and scGET-seq data makes analysis challenging, and there is no consensus yet on how best to manage these data. Another limitation is that SNV results from scGET-seq still need to be validated by bulk genome sequencing. Even though mutations detected by bulk exome sequencing and by scGET-seq are highly correlated, scGET-seq fails to capture all exome SNVs.

