Perturb-seq

Perturb-seq (also known as CRISP-seq and CROP-seq) is a high-throughput method for performing single-cell RNA sequencing (scRNA-seq) on pooled genetic perturbation screens. Perturb-seq combines multiplexed CRISPR-mediated gene inactivations with single-cell RNA sequencing to assess comprehensive gene expression phenotypes for each perturbation. Inferring a gene's function by applying genetic perturbations to knock down or knock out a gene and studying the resulting phenotype is known as reverse genetics. Perturb-seq is a reverse genetics approach that allows phenotypes to be investigated at the level of the transcriptome, to elucidate gene functions in many cells in a massively parallel fashion. The Perturb-seq protocol uses CRISPR technology to inactivate specific genes, and DNA barcoding of each guide RNA so that all perturbations can be pooled together and later deconvoluted, with each phenotype assigned to a specific guide RNA. Droplet-based microfluidics platforms (or other cell sorting and separating techniques) are used to isolate individual cells, and scRNA-seq is then performed to generate gene expression profiles for each cell. Upon completion of the protocol, bioinformatics analyses are conducted to associate each specific cell and perturbation with a transcriptomic profile that characterizes the consequences of inactivating each gene.


History

In the December 2016 issue of the journal Cell, two companion papers were published that each introduced and described this technique. A third paper describing a conceptually similar approach (termed CRISP-seq) was published in the same issue. In October 2016, the CROP-seq method for single-cell CRISPR screening was presented in a preprint on bioRxiv and later published in the journal Nature Methods. While the papers shared the core principle of combining CRISPR-mediated perturbation with scRNA-seq, their experimental, technological and analytical approaches differed in several respects and addressed distinct biological questions, demonstrating the broad utility of the methodology. For example, the CRISP-seq paper demonstrated the feasibility of in vivo studies using this technology, and the CROP-seq protocol facilitates large screens by providing a vector that makes the guide RNA itself readable (rather than relying on expressed barcodes), which allows for single-step guide RNA cloning. A June 2022 paper in Cell published results from one of the first genome-scale Perturb-seq screens, which uncovered new perturbations that promote chromosomal instability as well as variations in the expression of mitochondrially encoded transcripts in response to different forms of mitochondrial stress.


Experimental workflow


CRISPR single guide RNA library design and selection

Pooled CRISPR libraries that enable gene inactivation come in two forms: knockout and interference. Knockout libraries perturb genes through double-stranded breaks that prompt the error-prone non-homologous end joining repair pathway to introduce disruptive insertions or deletions. CRISPR interference (CRISPRi), on the other hand, uses a catalytically inactive nuclease to physically block RNA polymerase, effectively preventing or halting transcription. Perturb-seq has been used with both the knockout and CRISPRi approaches, in the Dixit et al. paper and the Adamson et al. paper, respectively. Pooling all guide RNAs into a single screen relies on DNA barcodes that act as identifiers for each unique guide RNA. Several pooled CRISPR libraries are commercially available, including the guide barcode library used in the study by Adamson et al. CRISPR libraries can also be custom made using tools for sgRNA design, many of which are listed on the CRISPR/Cas9 tools Wikipedia page.
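Custom library design usually begins by listing candidate target sites in a gene of interest. The following is a minimal sketch of that first step only, under an assumption that is not stated in the text above: the nuclease is Streptococcus pyogenes Cas9, which targets a 20-nucleotide protospacer immediately followed by an NGG PAM. The function names are hypothetical, and real design tools additionally score candidates for on-target activity and off-target risk, which this sketch does not attempt.

```python
import re

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_protospacers(seq: str, length: int = 20):
    """List candidate sites as (strand, start, protospacer, pam).

    Assumes an NGG PAM directly 3' of the protospacer (SpCas9).
    `start` is 0-based and measured on the strand that was scanned,
    so minus-strand positions refer to the reverse complement.
    """
    seq = seq.upper()
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % length)
    hits = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        # The lookahead makes the scan report overlapping sites too.
        for match in pattern.finditer(strand_seq):
            hits.append((strand, match.start(), match.group(1), match.group(2)))
    return hits

# Hypothetical 23-nt input: one 20-nt protospacer followed by an AGG PAM.
example_hits = find_protospacers("GATTACAGATTACAGATTACAGG")
```

Each returned protospacer would then be paired with a unique guide barcode before cloning, so that the guide can be identified after pooling.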


Lentiviral vectors

The sgRNA expression vector design will depend largely on the experiment performed but requires the following central components:

1. Promoter
2. Restriction sites
3. Primer binding sites
4. sgRNA
5. Guide barcode
6. Reporter gene:
   - Fluorescent gene: vectors are often constructed to include a gene encoding a fluorescent protein, such that successfully transduced cells can be visually and quantitatively assessed by their expression.
   - Antibiotic resistance gene: similar to fluorescent markers, antibiotic resistance genes are often incorporated into vectors to allow for selection of successfully transduced cells.
7. CRISPR-associated endonuclease: Cas9 or other CRISPR-associated endonucleases such as Cpf1 must be introduced to cells that do not endogenously express them. Due to the large size of these genes, a two-vector system can be used to express the endonuclease separately from the sgRNA expression vector.


Transduction and selection

Cells are typically transduced at a multiplicity of infection (MOI) of 0.4 to 0.6 lentiviral particles per cell, a low value chosen so that most transduced cells contain a single guide RNA. If the effects of simultaneous perturbations are of interest, a higher MOI may be applied to increase the number of transduced cells carrying more than one guide RNA. Selection for successfully transduced cells is then performed using a fluorescence assay or an antibiotic assay, depending on the reporter gene used in the expression vector.
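The effect of the MOI on guide multiplicity can be illustrated with a short calculation. This sketch assumes that the number of integrations per cell follows a Poisson distribution with mean equal to the MOI; that is a common modelling assumption rather than something stated in the text above, and the function names are hypothetical.

```python
import math

def poisson_pmf(k: int, moi: float) -> float:
    """Probability that a cell receives exactly k viral integrations."""
    return math.exp(-moi) * moi ** k / math.factorial(k)

def transduction_summary(moi: float) -> dict:
    """Expected cell fractions at a given MOI under the Poisson assumption."""
    untransduced = poisson_pmf(0, moi)
    single = poisson_pmf(1, moi)
    transduced = 1.0 - untransduced
    return {
        "untransduced": untransduced,
        "single": single,
        "multiple": transduced - single,
        # Fraction of cells surviving selection that carry one guide.
        "single_among_transduced": single / transduced,
    }

for moi in (0.4, 0.5, 0.6):
    summary = transduction_summary(moi)
    print(moi, {name: round(value, 3) for name, value in summary.items()})
```

Under this model, roughly three quarters of the transduced cells at an MOI of 0.5 carry a single guide, and raising the MOI shifts cells into the multiple-guide class, which is the trade-off described above.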


Single-cell library preparation

After successfully transduced cells have been selected, single cells must be isolated to conduct scRNA-seq. Perturb-seq and CROP-seq have been performed using droplet-based technology for single-cell isolation, while the closely related CRISP-seq was performed with a microwell-based approach. Once cells have been isolated at the single-cell level, reverse transcription, amplification and sequencing take place to produce gene expression profiles for each cell. Many scRNA-seq approaches incorporate unique molecular identifiers (UMIs) and cell barcodes during the reverse transcription step to index individual RNA molecules and cells, respectively. These additional barcodes help to quantify RNA transcripts and to associate each sequence with its cell of origin.
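The role of the two barcodes can be shown with a small counting sketch. It assumes that each aligned read has already been reduced to a (cell barcode, UMI, gene) triple, a hypothetical input format chosen for illustration. Reads sharing all three values are treated as amplification copies of one original molecule. Production pipelines also correct sequencing errors in barcodes and UMIs, which is omitted here.

```python
from collections import defaultdict

def count_umis(reads):
    """Collapse reads into per-cell, per-gene molecule counts.

    `reads` is an iterable of (cell_barcode, umi, gene) triples.
    Returns {cell_barcode: {gene: number_of_distinct_umis}}.
    """
    umis_seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        # Duplicate (cell, gene, UMI) combinations count once.
        umis_seen[(cell_barcode, gene)].add(umi)

    counts = defaultdict(dict)
    for (cell_barcode, gene), umis in umis_seen.items():
        counts[cell_barcode][gene] = len(umis)
    return dict(counts)

# Hypothetical reads: the first two are PCR duplicates of one molecule.
example_reads = [
    ("CELL_A", "AACG", "GENE1"),
    ("CELL_A", "AACG", "GENE1"),
    ("CELL_A", "TTGC", "GENE1"),
    ("CELL_A", "AACG", "GENE2"),
    ("CELL_B", "AACG", "GENE1"),
]
example_counts = count_umis(example_reads)
```

The same counting applies to guide barcode transcripts, which is what later allows each cell to be linked to its perturbation.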


Bioinformatics analysis

Read alignment and processing are performed to map quality reads to a reference genome. Deconvolution of cell barcodes, guide barcodes and UMIs enables the association of guide RNAs with the cells that contain them, so that the gene expression profile of each cell can be linked to a particular perturbation. Further downstream analyses of the transcriptional profiles depend on the biological question of interest. t-distributed stochastic neighbor embedding (t-SNE) is a commonly used machine learning algorithm for visualizing the high-dimensional data that results from scRNA-seq in a two-dimensional scatterplot. The authors who first performed Perturb-seq developed an in-house computational framework called MIMOSCA that predicts the effects of each perturbation using a linear model and is available on an open software repository.
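The deconvolution and effect-estimation steps can be sketched as follows. This is not MIMOSCA and does not use its interface. It is a simplified illustration with hypothetical function names, thresholds and input formats. Guides are assigned with a dominant-guide rule, and effects are computed as the difference in mean expression between perturbed and control cells. When every cell carries exactly one guide, that difference equals the ordinary least-squares coefficient of a linear model with an intercept and one indicator variable per guide, with control cells as the baseline.

```python
from statistics import mean

def assign_guides(guide_umis, min_umis=3, min_fraction=0.8):
    """Assign one guide per cell from guide-barcode UMI counts.

    `guide_umis` maps cell -> {guide: umi_count}. A cell is assigned only
    if its top guide has enough UMIs and dominates the guide reads.
    Both thresholds are illustrative assumptions.
    """
    assigned = {}
    for cell, counts in guide_umis.items():
        total = sum(counts.values())
        if total == 0:
            continue
        guide, top = max(counts.items(), key=lambda item: item[1])
        if top >= min_umis and top / total >= min_fraction:
            assigned[cell] = guide
    return assigned

def perturbation_effects(expression, assigned, control="control"):
    """Mean expression shift of each guide relative to control cells.

    `expression` maps cell -> {gene: value}. Returns {guide: {gene: shift}}.
    """
    profiles_by_guide = {}
    for cell, guide in assigned.items():
        profiles_by_guide.setdefault(guide, []).append(expression[cell])
    genes = sorted({gene for profile in expression.values() for gene in profile})
    baseline = {
        gene: mean(p.get(gene, 0.0) for p in profiles_by_guide[control])
        for gene in genes
    }
    return {
        guide: {
            gene: mean(p.get(gene, 0.0) for p in profiles) - baseline[gene]
            for gene in genes
        }
        for guide, profiles in profiles_by_guide.items()
        if guide != control
    }

# Hypothetical toy data: c2 is ambiguous and c5 has too few guide UMIs.
guide_umis = {
    "c1": {"sgA": 10},
    "c2": {"sgA": 5, "sgB": 5},
    "c3": {"control": 8, "sgA": 1},
    "c4": {"control": 6},
    "c5": {"sgA": 2},
}
expression = {
    "c1": {"GENE1": 1.0, "GENE2": 5.0},
    "c3": {"GENE1": 4.0, "GENE2": 5.0},
    "c4": {"GENE1": 6.0, "GENE2": 3.0},
}
assigned = assign_guides(guide_umis)
effects = perturbation_effects(expression, assigned)
```

A regularized model fitted across all guides and covariates at once would be needed for cells with several guides or for confounders such as cell state, which this sketch ignores.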


Advantages and limitations

Perturb-seq makes use of current technologies in molecular biology to integrate a multi-step workflow that couples high-throughput screening with complex phenotypic outputs. When compared to alternative methods used for gene knockdowns or knockouts, such as RNAi, zinc finger nucleases or transcription activator-like effector nucleases (TALENs), CRISPR-based perturbations offer greater specificity, efficiency and ease of use. Another advantage of this protocol is that while most screening approaches can only assay for simple phenotypes, such as cellular viability, scRNA-seq allows for a much richer phenotypic readout, with quantitative measurements of gene expression in many cells simultaneously. Perturb-seq can therefore combine the high throughput of forward genetics, in terms of the number of genetic perturbations, with the rich phenotype dimension of reverse genetics.

However, while a large and comprehensive amount of data can be a benefit, it can also present a major challenge. Single-cell RNA expression readouts are known to produce 'noisy' data, with a significant number of false positives. Both the large size and the noise associated with scRNA-seq will likely require new and powerful computational methods and bioinformatics pipelines to better make sense of the resulting data. Another challenge associated with this protocol is the creation of large-scale CRISPR libraries. Preparing these extensive libraries requires a corresponding increase in the resources needed to culture the massive numbers of cells required for a successful screen of many perturbations.

In parallel to these single-cell methods, other approaches have been developed to reconstruct genetic pathways using whole-organism RNA sequencing. These methods use a single aggregate statistic, called the transcriptome-wide epistasis coefficient, to guide pathway reconstruction. In contrast with the statistical framework of the methods described above, this coefficient may be more robust to noise and is intuitively interpretable in terms of Batesonian epistasis. This approach was used to identify a new state in the life cycle of the nematode C. elegans.


Applications

Perturb-seq and other conceptually similar protocols can be used to address a broad scope of biological questions, and the applications of this technology will likely grow over time. Three papers on this topic, published in the December 2016 issue of the journal Cell, demonstrated the utility of this method by applying it to the investigation of several distinct biological functions. In the paper "Perturb-Seq: Dissecting Molecular Circuits with Scalable Single-Cell RNA Profiling of Pooled Genetic Screens", the authors used Perturb-seq to conduct knockouts of transcription factors related to the immune response in hundreds of thousands of cells to investigate the cellular consequences of their inactivation. They also explored the effects of transcription factors on cell states in the context of the cell cycle. In the study led by UCSF, "A Multiplexed Single-Cell CRISPR Screening Platform Enables Systematic Dissection of the Unfolded Protein Response", the researchers suppressed multiple genes in each cell to study the unfolded protein response (UPR) pathway. With a similar methodology, but using the term CRISP-seq instead of Perturb-seq, the paper "Dissecting Immune Circuits by Linking CRISPR-Pooled Screens with Single-Cell RNA-Seq" performed a proof-of-concept experiment by using the technique to probe regulatory pathways related to innate immunity in mice. The lethality of each perturbation and epistasis in cells with multiple perturbations were also investigated in these papers. These initial Perturb-seq studies used relatively few perturbations per experiment, but the approach can be scaled up to address the whole genome. Finally, the October 2016 preprint and subsequent paper demonstrated the bioinformatic reconstruction of the T cell receptor signaling pathway in Jurkat cells based on CROP-seq data. While these publications used the protocols to answer complex biological questions, the technology can also be used as a validation assay to ensure the specificity of any CRISPR-based knockdown or knockout: the expression levels of the target genes as well as others can be measured with single-cell resolution in parallel, to detect whether the perturbation was successful and to assess the experiment for off-target effects. Furthermore, these protocols make it possible to perform perturbation screens in heterogeneous tissues while obtaining cell-type-specific gene expression responses.

