HOME

TheInfoList



OR:

Essential genes are indispensable
gene In biology, the word gene (from , ; "... Wilhelm Johannsen coined the word gene to describe the Mendelian units of heredity..." meaning ''generation'' or ''birth'' or ''gender'') can have several different meanings. The Mendelian gene is a ...
s for organisms to grow and reproduce offspring under certain environment. However, being ''essential'' is highly dependent on the circumstances in which an organism lives. For instance, a gene required to digest
starch Starch or amylum is a polymeric carbohydrate consisting of numerous glucose units joined by glycosidic bonds. This polysaccharide is produced by most green plants for energy storage. Worldwide, it is the most common carbohydrate in human d ...
is only essential if starch is the only source of energy. Recently, systematic attempts have been made to identify those genes that are absolutely required to maintain life, provided that all nutrients are available. Such experiments have led to the conclusion that the absolutely required number of genes for bacteria is on the order of about 250–300. Essential genes of single-celled organisms encode proteins for three basic functions including genetic information processing, cell envelopes and energy production. Those gene functions are used to maintain a central
metabolism Metabolism (, from el, μεταβολή ''metabolē'', "change") is the set of life-sustaining chemical reactions in organisms. The three main functions of metabolism are: the conversion of the energy in food to energy available to run ...
, replicate DNA, translate genes into proteins, maintain a basic cellular structure, and mediate transport processes into and out of the cell. Compared with single-celled organisms, multicellular organisms have more essential genes related to communication and development. Most of the essential genes in viruses are related to the processing and maintenance of genetic information. In contrast to most single-celled organisms, viruses lack many essential genes for metabolism, which forces them to hijack the host's metabolism. Most genes are not essential but convey selective advantages and increased fitness. Hence, the vast majority of genes are not essential and many can be deleted without consequences, at least under most circumstances.


Bacteria: genome-wide studies

Two main strategies have been employed to identify essential genes on a genome-wide basis: directed deletion of genes and random
mutagenesis Mutagenesis () is a process by which the genetic information of an organism is changed by the production of a mutation. It may occur spontaneously in nature, or as a result of exposure to mutagens. It can also be achieved experimentally using lab ...
using transposons. In the first case, annotated individual genes (or ORFs) are completely deleted from the
genome In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding ...
in a systematic way. In transposon-mediated mutagenesis, transposons are randomly inserted in as many positions in a genome as possible, aiming to disrupt the function of the targeted genes (see figure below). Insertion mutants that are still able to survive or grow suggest the transposon inserted in a gene that is not essential for survival. The location of the transposon insertions can be determined through hybridization to microarrays or through transposon sequencing . With the development of
CRISPR CRISPR () (an acronym for clustered regularly interspaced short palindromic repeats) is a family of DNA sequences found in the genomes of prokaryotic organisms such as bacteria and archaea. These sequences are derived from DNA fragments of bacte ...
, gene essentiality has also been determined through inhibition of gene expression through CRISPR interference. A summary of such screens is shown in the table. Table 1. Essential genes in bacteria. Mutagenesis: ''targeted'' mutants are gene deletions; ''random'' mutants are transposon insertions. Methods: ''Clones'' indicate single gene deletions, ''population'' indicates whole population mutagenesis, e.g. using transposons. Essential genes from population screens include genes essential for fitness (see text). ORFs: number of all
open reading frames In molecular biology, open reading frames (ORFs) are defined as spans of DNA sequence between the start and stop codons. Usually, this is considered within a studied region of a prokaryotic DNA sequence, where only one of the six possible readin ...
in that genome. Notes: (a) mutant collection available; (b) direct essentiality screening method (e.g. via antisense RNA) that does not provide information about nonessential genes. (c) Only partial dataset available. (d) Includes predicted gene essentiality and data compilation from published single-gene essentiality studies. (e) Project in progress. (f) Deduced by comparison of the two gene essentiality datasets obtained independently in the ''P. aeruginosa ''strains PA14 and PAO1. (g) The original result of 271 essential genes has been corrected to 261, with 31 genes that were thought to be essential being in fact non-essential whereas 20 novel essential genes have been described since then. (h) Counting genes with essential domains and those that lead to growth-defects when disrupted as essential, and those who lead to growth-advantage when disrupted as non-essential. (i) Involved a fully saturated mutant library of 14 replicates, with 84.3% of possible insertion sites with at least one transposon insertion. (j) Each essential gene has been independently confirmed at least five times. On the basis of genome-wide experimental studies and systems biology analysis, an essential gene database has been developed by Kong et al. (2019) for predicting > 4000 bacterial species.


Eukaryotes

In ''
Saccharomyces cerevisiae ''Saccharomyces cerevisiae'' () (brewer's yeast or baker's yeast) is a species of yeast (single-celled fungus microorganisms). The species has been instrumental in winemaking, baking, and brewing since ancient times. It is believed to have been ...
'' (budding yeast) 15-20% of all genes are essential. In '' Schizosaccharomyces pombe'' (fission yeast) 4,836 heterozygous deletions covering 98.4% of the 4,914 protein coding open reading frames have been constructed. 1,260 of these deletions turned out to be essential. Similar screens are more difficult to carry out in other multicellular organisms, including mammals (as a model for humans), due to technical reasons, and their results are less clear. However, various methods have been developed for the nematode worm '' C. elegans'', the fruit fly, and zebrafish (see table). A recent study of 900 mouse genes concluded that 42% of them were essential although the selected genes were not representative. Gene knockout experiments are not possible or at least not ethical in humans. However, natural mutations have led to the identification of mutations that lead to early embryonic or later death. Note that many genes in humans are not absolutely essential for survival but can cause severe disease when mutated. Such mutations are catalogued in the Online Mendelian Inheritance in Man (OMIM) database. In a computational analysis of genetic variation and mutations in 2,472 human orthologs of known essential genes in the mouse, Georgi et al. found strong, purifying selection and comparatively reduced levels of sequence variation, indicating that these human genes are essential too. While it may be difficult to prove that a gene is essential in humans, it can be demonstrated that a gene is not essential or not even causing disease. For instance, sequencing the genomes of 2,636 Icelandic citizens and the genotyping of 101,584 additional subjects found 8,041 individuals who had 1 gene completely knocked out (i.e. these people were homozygous for a non-functional gene). Of the 8,041 individuals with complete knock-outs, 6,885 were estimated to be
homozygotes Zygosity (the noun, zygote, is from the Greek "yoked," from "yoke") () is the degree to which both copies of a chromosome or gene have the same genetic sequence. In other words, it is the degree of similarity of the alleles in an organism. ...
, 1,249 were estimated to be compound
heterozygotes Zygosity (the noun, zygote, is from the Greek "yoked," from "yoke") () is the degree to which both copies of a chromosome or gene have the same genetic sequence. In other words, it is the degree of similarity of the alleles in an organism. Mo ...
(i.e. they had both
allele An allele (, ; ; modern formation from Greek ἄλλος ''állos'', "other") is a variation of the same sequence of nucleotides at the same place on a long DNA molecule, as described in leading textbooks on genetics and evolution. ::"The chro ...
s of a gene knocked out but the two alleles had different mutations). In these individuals, a total of 1,171 of the 19,135 human ( RefSeq) genes (6.1%) were completely knocked out. It was concluded that these 1,171 genes are ''non-essential'' in humans — at least no associated diseases were reported. Similarly, the exome sequences of 3222 British Pakistani-heritage adults with high parental relatedness revealed 1111 rare-variant homozygous genotypes with predicted loss of gene function (LOF = knockouts) in 781 genes. This study found an average of 140 predicted LOF genotypes (per subject), including 16 rare (minor
allele frequency Allele frequency, or gene frequency, is the relative frequency of an allele (variant of a gene) at a particular locus in a population, expressed as a fraction or percentage. Specifically, it is the fraction of all chromosomes in the population tha ...
<1%) heterozygotes, 0.34 rare homozygotes, 83.2 common heterozygotes and 40.6 common homozygotes. Nearly all rare homozygous LOF genotypes were found within autozygous segments (94.9%). Even though most of these individuals had no obvious health issue arising from their defective genes, it is possible that minor health issues may be found upon more detailed examination. A summary of essentiality screens is shown in the table below (mostly based on the Database of Essential Genes.


Viruses

Viruses lack many genes necessary for metabolism, forcing them to hijack the host's metabolism. Screens for essential genes have been carried out in a few viruses. For instance,
human cytomegalovirus ''Human betaherpesvirus 5'', also called human cytomegalovirus (HCMV), is species of virus in the genus ''Cytomegalovirus'', which in turn is a member of the viral family known as ''Herpesviridae'' or herpesviruses. It is also commonly called ...
(CMV) was found to have 41 essential, 88 nonessential, and 27 augmenting ORFs (150 total ORFs). Most essential and augmenting genes are located in the central region, and nonessential genes generally cluster near the ends of the viral genome. Tscharke and Dobson (2015) compiled a comprehensive survey of essential genes in Vaccinia Virus and assigned roles to each of the 223 ORFs of the Western Reserve (WR) strain and 207 ORFs of the Copenhagen strain, assessing their role in replication in cell culture. According to their definition, a gene is considered essential (i.e. has a role in cell culture) if its deletion results in a decrease in virus titre of greater than 10-fold in either a single or multiple step growth curve. All genes involved in wrapped virion production, actin tail formation, and extracellular virion release were also considered as essential. Genes that influence plaque size, but not replication were defined as non-essential. By this definition 93 genes are required for Vaccinia Virus replication in cell culture, while 108 and 94 ORFs, from WR and Copenhagen respectively, are non-essential. Vaccinia viruses with deletions at either end of the genome behaved as expected, exhibiting only mild or host range defects. In contrast, combining deletions at both ends of the genome for VACV strain WR caused a devastating growth defect on all cell lines tested. This demonstrates that single gene deletions are not sufficient to assess the essentiality of genes and that more genes are essential in Vaccinia virus than originally thought. One of the bacteriophages screened for essential genes includes mycobacteriophage Giles. At least 35 of the 78 predicted Giles genes (45%) are non-essential for lytic growth. 20 genes were found to be essential. A major problem with phage genes is that a majority of their genes remain functionally unknown, hence their role is difficult to assess. A screen of ''
Salmonella enterica ''Salmonella enterica'' (formerly ''Salmonella choleraesuis'') is a rod-headed, flagellate, facultative anaerobic, Gram-negative bacterium and a species of the genus '' Salmonella''. A number of its serovars are serious human pathogens. Epi ...
'' phage SPN3US revealed 13 essential genes although it remains a bit obscure how many genes were really tested.


Quantitative gene essentiality analysis

In theory, essential genes are qualitative. However, depending on the surrounding environment, certain essential gene mutants may show partial functions, which can be quantitatively determined in some studies. For instance, a particular gene deletion may reduce growth rate (or fertility rate or other characters) to 90% of the wild-type. If there are isozymes or alternative pathways for the essential genes, they can be deleted completely. Using CRISPR interference, the expression of essential genes can be modulated or "tuned", leading to quantitative (or continuous) relationships between the level of gene-expression and the magnitude of fitness cost exhibited by a given mutant.


Synthetic lethality

Two genes are synthetic lethal if neither one is essential but when both are mutated the double-mutant is lethal. Some studies have estimated that the number of synthetic lethal genes may be on the order of 45% of all genes.


Conditionally essential genes

Many genes are essential only under certain circumstances. For instance, if the amino acid
lysine Lysine (symbol Lys or K) is an α-amino acid that is a precursor to many proteins. It contains an α-amino group (which is in the protonated form under biological conditions), an α-carboxylic acid group (which is in the deprotonated −CO ...
is supplied to a cell any gene that is required to make lysine is non-essential. However, when there is no lysine supplied, genes encoding enzymes for lysine biosynthesis become essential, as no protein synthesis is possible without lysine. ''Streptococcus pneumoniae'' appears to require 147 genes for growth and survival in
saliva Saliva (commonly referred to as spit) is an extracellular fluid produced and secreted by salivary glands in the mouth. In humans, saliva is around 99% water, plus electrolytes, mucus, white blood cells, epithelial cells (from which DNA can be ...
, more than the 113-133 that have been found in previous studies. The deletion of a gene may result in death or in a block of cell division. While the latter case may implicate "survival" for some time, without cell division the cell may still die eventually. Similarly, instead of blocked cell division a cell may have reduced growth or metabolism ranging from nearly undetectable to almost normal. Thus, there is gradient from "essential" to completely non-essential, again depending on the condition. Some authors have thus distinguished between genes "essential for survival" and "essential for fitness". The role of genetic background. Similar to environmental conditions, the genetic background can determine the essentiality of a gene: a gene may be essential in one individual but not another, given his or her genetic background. Gene duplications are one possible explanation (see below). Metabolic dependency. Genes involved in certain biosynthetic pathways, such as amino acid synthesis, can become non-essential if one or more amino acids are supplied by culture medium or by another organism. This is the main reason why many parasites (e.g. '' Cryptosporidium hominis'') or endosymbiontic bacteria lost many genes (e.g. ''
Chlamydia Chlamydia, or more specifically a chlamydia infection, is a sexually transmitted infection caused by the bacterium ''Chlamydia trachomatis''. Most people who are infected have no symptoms. When symptoms do appear they may occur only several wee ...
''). Such genes may be essential but only present in the host organism. For instance, ''
Chlamydia trachomatis ''Chlamydia trachomatis'' (), commonly known as chlamydia, is a bacterium that causes chlamydia, which can manifest in various ways, including: trachoma, lymphogranuloma venereum, nongonococcal urethritis, cervicitis, salpingitis, pelvic inflamma ...
'' cannot synthesize
purine Purine is a heterocyclic aromatic organic compound that consists of two rings (pyrimidine and imidazole) fused together. It is water-soluble. Purine also gives its name to the wider class of molecules, purines, which include substituted purin ...
and
pyrimidine Pyrimidine (; ) is an aromatic, heterocyclic, organic compound similar to pyridine (). One of the three diazines (six-membered heterocyclics with two nitrogen atoms in the ring), it has nitrogen atoms at positions 1 and 3 in the ring. The othe ...
nucleotide Nucleotides are organic molecules consisting of a nucleoside and a phosphate. They serve as monomeric units of the nucleic acid polymers – deoxyribonucleic acid (DNA) and ribonucleic acid (RNA), both of which are essential biomolecule ...
s '' de novo'', so these bacteria are dependent on the nucleotide biosynthetic genes of the host. Another kind of metabolic dependency, unrelated to cross-species interactions, can be found when bacteria are grown under specific nutrient conditions. For example, more than 100 genes become essential when ''Escherichia coli'' is grown on nutrient-limited media. Specifically, isocitrate dehydrogenase (icd) and
citrate synthase The enzyme citrate synthase E.C. 2.3.3.1 (previously 4.1.3.7)] exists in nearly all living cells and stands as a pace-making enzyme in the first step of the citric acid cycle (or Krebs cycle). Citrate synthase is localized within eukaryotic cel ...
(gltA) are two enzymes that are part of the Citric acid cycle, tricarboxylic acid (TCA) cycle. Both genes are essential in M9 minimal media (which provides only the most basic nutrients). However, when the media is supplementing with 2-oxoglutarate or
glutamate Glutamic acid (symbol Glu or E; the ionic form is known as glutamate) is an α-amino acid that is used by almost all living beings in the biosynthesis of proteins. It is a non-essential nutrient for humans, meaning that the human body can synt ...
, these genes are not essential any more.


Gene duplications and alternative metabolic pathways

Many genes are duplicated within a genome and many organisms have different metabolic pathways (alternative metabolic pathway) to synthesis same products. Such duplications (
paralog Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a spe ...
s) and alternative metabolic pathways often render essential genes non-essential because the duplicate can replace the original copy. For instance, the gene encoding the enzyme aspartokinase is essential in ''E. coli''. By contrast, the ''
Bacillus subtilis ''Bacillus subtilis'', known also as the hay bacillus or grass bacillus, is a Gram-positive, catalase-positive bacterium, found in soil and the gastrointestinal tract of ruminants, humans and marine sponges. As a member of the genus '' Bacillus ...
'' genome contains three copies of this gene, none of which is essential on its own. However, a triple-deletion of all three genes is lethal. In such cases, the essentiality of a gene or a group of paralogs can often be predicted based on the essentiality of an essential single gene in a different species. In yeast, few of the essential genes are duplicated within the genome: 8.5% of the non-essential genes, but only 1% of the essential genes have a homologue in the yeast genome. In the worm '' C. elegans'', non-essential genes are highly over-represented among duplicates, possibly because duplication of essential genes causes overexpression of these genes. Woods et al. found that non-essential genes are more often successfully duplicated (fixed) and lost compared to essential genes. By contrast, essential genes are less often duplicated but upon successful duplication are maintained over longer periods.


Conservation

In bacteria, essential genes appear to be more conserved than nonessential genes but the correlation is not very strong. For instance, only 34% of the ''B. subtilis'' essential genes have reliable orthologs in all
Bacillota The Bacillota (synonym Firmicutes) are a phylum of bacteria, most of which have gram-positive cell wall structure. The renaming of phyla such as Firmicutes in 2021 remains controversial among microbiologists, many of whom continue to use the earl ...
and 61% of the ''E. coli'' essential genes have reliable orthologs in all Gamma-proteobacteria. Fang et al. (2005) defined persistent genes as the genes present in more than 85% of the genomes of the clade. They found 475 and 611 of such genes for ''B. subtilis'' and ''E. coli'', respectively. Furthermore, they classified genes into five classes according to persistence and essentiality: persistent genes, essential genes, persistent nonessential (PNE) genes (276 in ''B. subtilis'', 409 in ''E. coli''), essential nonpersistent (ENP) genes (73 in ''B. subtilis'', 33 in ''E. coli''), and nonpersistent nonessential (NPNE) genes (3,558 in ''B. subtilis'', 3,525 in ''E. coli''). Fang et al. found 257 persistent genes, which exist both in ''B. subtilis'' (for the Bacillota) and ''E. coli'' (for the Gamma-proteobacteria). Among these, 144 (respectively 139) were previously identified as essential in ''B. subtilis'' (respectively ''E. coli'') and 25 (respectively 18) of the 257 genes are not present in the 475 ''B. subtilis'' (respectively 611 ''E. coli)'' persistent genes. All the other members of the pool are PNE genes. In eukaryotes, 83% of the one-to-one orthologs between '' Schizosaccharomyces pombe'' and ''
Saccharomyces cerevisiae ''Saccharomyces cerevisiae'' () (brewer's yeast or baker's yeast) is a species of yeast (single-celled fungus microorganisms). The species has been instrumental in winemaking, baking, and brewing since ancient times. It is believed to have been ...
'' have conserved essentiality, that is, they are nonessential in both species or essential in both species. The remaining 17% of genes are nonessential in one species and essential in the other. This is quite remarkable, given that ''S. pombe'' is separated from ''S. cerevisiae'' by approximately 400 million years of evolution. In general, highly conserved and thus older genes (i.e. genes with earlier phylogenetic origin) are more likely to be essential than younger genes - even if they have been duplicated.


Study

The experimental study of essential genes is limited by the fact that, by definition, inactivation of an essential gene is lethal to the organism. Therefore, they cannot be simply deleted or mutated to analyze the resulting
phenotype In genetics, the phenotype () is the set of observable characteristics or traits of an organism. The term covers the organism's morphology or physical form and structure, its developmental processes, its biochemical and physiological prop ...
s (a common technique in
genetics Genetics is the study of genes, genetic variation, and heredity in organisms.Hartl D, Jones E (2005) It is an important branch in biology because heredity is vital to organisms' evolution. Gregor Mendel, a Moravian Augustinian friar wor ...
). There are, however, some circumstances in which essential genes can be manipulated. In
diploid Ploidy () is the number of complete sets of chromosomes in a cell, and hence the number of possible alleles for autosomal and pseudoautosomal genes. Sets of chromosomes refer to the number of maternal and paternal chromosome copies, respectiv ...
organisms, only a single functional copy of some essential genes may be needed ( haplosufficiency), with the heterozygote displaying an instructive phenotype. Some essential genes can tolerate mutations that are deleterious, but not wholly lethal, since they do not completely abolish the gene's function. Computational analysis can reveal many properties of proteins without analyzing them experimentally, e.g. by looking at homologous proteins, function, structure etc. (see also below, ''Predicting essential genes''). The products of essential genes can also be studied when expressed in other organisms, or when purified and studied ''in vitro''. Conditionally essential genes are easier to study. Temperature-sensitive variants of essential genes have been identified which encode products that lose function at high temperatures, and so only show a phenotype at increased temperature.


Reproducibility

If screens for essential genes are repeated in independent laboratories, they often result in different gene lists. For instance, screens in ''E. coli'' have yielded from ~300 to ~600 essential genes (see Table 1). Such differences are even more pronounced when different bacterial strains are used (see Figure 2). A common explanation is that the experimental conditions are different or that the nature of the mutation may be different (e.g. a complete gene deletion vs. a transposon mutant). Transposon screens in particular are hard to reproduce, given that a transposon can insert at many positions within a gene. Insertions towards the 3' end of an essential gene may not have a lethal phenotype (or no phenotype at all) and thus may not be recognized as such. This can lead to erroneous annotations (here: false negatives). Comparison of
CRISPR CRISPR () (an acronym for clustered regularly interspaced short palindromic repeats) is a family of DNA sequences found in the genomes of prokaryotic organisms such as bacteria and archaea. These sequences are derived from DNA fragments of bacte ...
/cas9 and
RNAi RNA interference (RNAi) is a biological process in which RNA molecules are involved in sequence-specific suppression of gene expression by double-stranded RNA, through translational or transcriptional repression. Historically, RNAi was known by o ...
screens. Screens to identify essential genes in the human
chronic myelogenous leukemia Chronic myelogenous leukemia (CML), also known as chronic myeloid leukemia, is a cancer of the white blood cells. It is a form of leukemia characterized by the increased and unregulated growth of myeloid cells in the bone marrow and the accumulati ...
cell line K562 with these two methods showed only limited overlap. At a 10% false positive rate there were ~4,500 genes identified in the Cas9 screen versus ~3,100 in the shRNA screen, with only ~1,200 genes identified in both.


Different essential genes in different organisms

Different organisms may have different essential genes. For instance, ''
Bacillus subtilis ''Bacillus subtilis'', known also as the hay bacillus or grass bacillus, is a Gram-positive, catalase-positive bacterium, found in soil and the gastrointestinal tract of ruminants, humans and marine sponges. As a member of the genus '' Bacillus ...
'' has 271 essential genes. About one-half (150) of the
orthologous Sequence homology is the biological homology between DNA, RNA, or protein sequences, defined in terms of shared ancestry in the evolutionary history of life. Two segments of DNA can have shared ancestry because of three phenomena: either a spec ...
genes in ''E. coli'' are also essential. Another 67 genes that are essential in ''E. coli'' are not essential in ''B. subtilis'', while 86 ''E. coli'' essential genes have no ''B. subtilis'' ortholog. In ''
Mycoplasma genitalium ''Mycoplasma genitalium'' (''MG'', commonly known as Mgen) is a sexually transmitted, small and pathogenic bacterium that lives on the mucous epithelial cells of the urinary and genital tracts in humans. Medical reports published in 2007 and 20 ...
'' at least 18 genes are essential that are not essential in ''M. bovis.'' Many of these different essential genes are caused by paralogs or alternative metabolic pathways. Such different essential genes in bacteria can be used to develop targeted antibacterial therapies against certain specific pathogens to reduce
antibiotic resistance Antimicrobial resistance (AMR) occurs when microbes evolve mechanisms that protect them from the effects of antimicrobials. All classes of microbes can evolve resistance. Fungi evolve antifungal resistance. Viruses evolve antiviral resistance. ...
in the microbiome era. Stone et al (2015) have used the difference in essential genes in bacteria to develop selective drugs against the oral pathogen ''
Porphyromonas gingivalis ''Porphyromonas gingivalis'' belongs to the phylum ''Bacteroidota'' and is a nonmotile, Gram-negative, rod-shaped, anaerobic, pathogenic bacterium. It forms black colonies on blood agar. It is found in the oral cavity, where it is implicated ...
'', rather than the beneficial bacteria ''Streptococcus sanguis''.


Prediction

Essential genes can be predicted computationally. However, most methods use experimental data ("training sets") to some extent. Chen et al. determined four criteria to select training sets for such predictions: (1) essential genes in the selected training set should be reliable; (2) the growth conditions in which essential genes are defined should be consistent in training and prediction sets; (3) species used as training set should be closely related to the target organism; and (4) organisms used as training and prediction sets should exhibit similar phenotypes or lifestyles. They also found that the size of the training set should be at least 10% of the total genes to yield accurate predictions. Some approaches for predicting essential genes are:
Comparative genomics Comparative genomics is a field of biological research in which the genomic features of different organisms are compared. The genomic features may include the DNA sequence, genes, gene order, regulatory sequences, and other genomic structural la ...
. Shortly after the first genomes (of ''
Haemophilus influenzae ''Haemophilus influenzae'' (formerly called Pfeiffer's bacillus or ''Bacillus influenzae'') is a Gram-negative, non-motile, coccobacillary, facultatively anaerobic, capnophilic pathogenic bacterium of the family Pasteurellaceae. The bacte ...
'' and ''
Mycoplasma genitalium ''Mycoplasma genitalium'' (''MG'', commonly known as Mgen) is a sexually transmitted, small and pathogenic bacterium that lives on the mucous epithelial cells of the urinary and genital tracts in humans. Medical reports published in 2007 and 20 ...
'') became available, Mushegian et al. tried to predict the number of essential genes based on common genes in these two species. It was surmised that only essential genes should be conserved over the long evolutionary distance that separated the two bacteria. This study identified approximately 250 candidate essential genes. As more genomes became available the number of predicted essential genes kept shrinking because more genomes shared fewer and fewer genes. As a consequence, it was concluded that the universal conserved core consists of less than 40 genes. However, this set of conserved genes is not identical to the set of essential genes as different species rely on different essential genes. A similar approach has been used to infer essential genes from the pan-genome of ''
Brucella ''Brucella'' is a genus of Gram-negative bacteria, named after David Bruce (1855–1931). They are small (0.5 to 0.7 by 0.6 to 1.5 µm), non encapsulated, nonmotile, facultatively intracellular coccobacilli. ''Brucella'' spp. are the caus ...
'' species. 42 complete ''Brucella'' genomes and a total of 132,143 protein-coding genes were used to predict 1252 potential essential genes, derived from the core genome by comparison with a prokaryote database of essential genes. Network analysis. After the first protein interaction networks of
yeast Yeasts are eukaryotic, single-celled microorganisms classified as members of the fungus kingdom. The first yeast originated hundreds of millions of years ago, and at least 1,500 species are currently recognized. They are estimated to constit ...
had been published, it was found that highly connected proteins (e.g. by protein-protein interactions) are more likely to be essential. However, highly connected proteins may be experimental artifacts and high connectivity may rather represent
pleiotropy Pleiotropy (from Greek , 'more', and , 'way') occurs when one gene influences two or more seemingly unrelated phenotypic traits. Such a gene that exhibits multiple phenotypic expression is called a pleiotropic gene. Mutation in a pleiotropic ...
instead of essentiality. Nevertheless, network methods have been improved by adding other criteria and therefore do have some value in predicting essential genes. Machine Learning. Hua et al. used
Machine Learning Machine learning (ML) is a field of inquiry devoted to understanding and building methods that 'learn', that is, methods that leverage data to improve performance on some set of tasks. It is seen as a part of artificial intelligence. Machin ...
to predict essential genes in 25 bacterial species''.'' Hurst index. Liu et al. (2015) used the Hurst exponent, a characteristic parameter to describe long-range correlation in DNA to predict essential genes. In 31 out of 33 bacterial genomes the significance levels of the Hurst exponents of the essential genes were significantly higher than for the corresponding full-gene-set, whereas the significance levels of the Hurst exponents of the nonessential genes remained unchanged or increased only slightly. Minimal genomes. It was also thought that essential genes could be inferred from minimal genomes which supposedly contain only essential genes. The problem here is that the smallest genomes belong to parasitic (or symbiontic) species which can survive with a reduced gene set as they obtain many nutrients from their hosts. For instance, one of the smallest genomes is that of '' Hodgkinia cicadicola'', a
symbiont Symbiosis (from Greek , , "living together", from , , "together", and , bíōsis, "living") is any type of a close and long-term biological interaction between two different biological organisms, be it mutualistic, commensalistic, or parasit ...
of cicadas, containing only 144 Kb of DNA encoding only 188 genes. Like other symbionts, ''Hodgkinia'' receives many of its nutrients from its host, so its genes do not need to be essential. Metabolic modelling. Essential genes may be also predicted in completely sequenced genomes by metabolic reconstruction, that is, by reconstructing the complete metabolism from the gene content and then identifying those genes and pathways that have been found to be essential in other species. However, this method can be compromised by proteins of unknown function. In addition, many organisms have backup or alternative pathways which have to be taken into account (see figure 1). Metabolic modeling was also used by Basler (2015) to develop a method to predict essential metabolic genes.
Flux balance analysis Flux balance analysis (FBA) is a mathematical method for simulating metabolism in genome-scale reconstructions of metabolic networks. In comparison to traditional methods of modeling, FBA is less intensive in terms of the input data required for c ...
, a method of metabolic modeling, has recently been used to predict essential genes in clear cell renal cell carcinoma metabolism. Genes of unknown function. Surprisingly, a significant number of essential genes has no known function. For instance, among the 385 essential candidates in ''M. genitalium'', no function could be ascribed to 95 genes even though this number had been reduced to 75 by 2011. Most of unknown functionally essential genes have potential biological functions related to one of the three fundamental functions. ZUPLS. Song et al. presented a novel method to predict essential genes that only uses the Z-curve and other sequence-based features. Such features can be calculated readily from the DNA/amino acid sequences. However, the reliability of this method remains a bit obscure. Essential gene prediction servers. Guo et al. (2015) have developed three online services to predict essential genes in bacterial genomes. These freely available tools are applicable for single gene sequences without annotated functions, single genes with definite names, and complete genomes of bacterial strains. Kong et al. (2019) have developed th
ePath
database, which can be used to search > 4000 bacterial species for predicting essential genes.


Essential protein domains

Although most essential genes encode proteins, many essential proteins consist of a single domain. This fact has been used to identify essential protein domains. Goodacre et al. have identified hundreds of essential
domains of unknown function A domain of unknown function (DUF) is a protein domain that has no characterised function. These families have been collected together in the Pfam database using the prefix DUF followed by a number, with examples being DUF2992 and DUF1220. As of 2 ...
(eDUFs). Lu et al. presented a similar approach and identified 3,450 domains that are essential in at least one microbial species.


See also

*
Essential amino acid An essential amino acid, or indispensable amino acid, is an amino acid that cannot be synthesized from scratch by the organism fast enough to supply its demand, and must therefore come from the diet. Of the 21 amino acids common to all life form ...
* Essential proteins in protein complexes *
Gene In biology, the word gene (from , ; "... Wilhelm Johannsen coined the word gene to describe the Mendelian units of heredity..." meaning ''generation'' or ''birth'' or ''gender'') can have several different meanings. The Mendelian gene is a ...
*
Genome In the fields of molecular biology and genetics, a genome is all the genetic information of an organism. It consists of nucleotide sequences of DNA (or RNA in RNA viruses). The nuclear genome includes protein-coding genes and non-coding ...
* Minimal genome *
Mutation In biology, a mutation is an alteration in the nucleic acid sequence of the genome of an organism, virus, or extrachromosomal DNA. Viral genomes contain either DNA or RNA. Mutations result from errors during DNA or viral replication, ...


References


Further reading

* * * {{refend


External links


Database of Essential Genes

OGEE: Online Essentiality Database

EGGS (Essential Genes on Genome Scale) database

ePath (Essential genes in pathway) database

Essential genes in E. coli (EcoliWiki

Essential genes in E. coli (Ecogene)
* Benjamin Lewin'
Essential Genes
(textbook), Pearson/Prentice-Hall. Genomics