In
biology
Biology is the scientific study of life. It is a natural science with a broad scope but has several unifying themes that tie it together as a single, coherent field. For instance, all organisms are made up of cells that process hereditar ...
, the word gene (from , ;
[ "... Wilhelm Johannsen coined the word gene to describe the Mendelian units of heredity..."] meaning ''generation''
or ''birth''
[ or ''gender'') can have several different meanings. The Mendelian gene is a basic unit of ]heredity
Heredity, also called inheritance or biological inheritance, is the passing on of traits from parents to their offspring; either through asexual reproduction or sexual reproduction, the offspring cells or organisms acquire the genetic inform ...
and the molecular gene is a sequence of nucleotides in DNA that is transcribed to produce a functional RNA. There are two types of molecular genes: protein-coding genes and noncoding genes.
During gene expression
Gene expression is the process by which information from a gene is used in the synthesis of a functional gene product that enables it to produce end products, protein or non-coding RNA, and ultimately affect a phenotype, as the final effect. The ...
, the DNA is first copied into RNA. The RNA can be directly functional or be the intermediate template for a protein that performs a function. The transmission of genes to an organism's offspring is the basis of the inheritance of phenotypic trait
A phenotypic trait, simply trait, or character state is a distinct variant of a phenotypic characteristic of an organism; it may be either inherited or determined environmentally, but typically occurs as a combination of the two.Lawrence, Eleano ...
s. These genes make up different DNA sequences called genotype
The genotype of an organism is its complete set of genetic material. Genotype can also be used to refer to the alleles or variants an individual carries in a particular gene or genetic location. The number of alleles an individual can have in a ...
s. Genotypes along with environmental and developmental factors determine what the phenotypes will be. Most biological traits are under the influence of polygenes (many different genes) as well as gene–environment interactions. Some genetic traits are instantly visible, such as eye color or the number of limbs, and some are not, such as blood type
A blood type (also known as a blood group) is a classification of blood, based on the presence and absence of antibodies and inherited antigenic substances on the surface of red blood cells (RBCs). These antigens may be proteins, carbohydrate ...
, the risk for specific diseases, or the thousands of basic biochemical
Biochemistry or biological chemistry is the study of chemical processes within and relating to living organisms. A sub-discipline of both chemistry and biology, biochemistry may be divided into three fields: structural biology, enzymology an ...
processes that constitute life.
Genes can acquire mutations in their sequence, leading to different variants, known as allele
An allele (, ; ; modern formation from Greek ἄλλος ''állos'', "other") is a variation of the same sequence of nucleotides at the same place on a long DNA molecule, as described in leading textbooks on genetics and evolution.
::"The chro ...
s, in the population. These alleles encode slightly different versions of a gene, which may cause different phenotypical traits. Usage of the term "having a gene" (e.g., "good genes," "hair color gene") typically refers to containing a different allele of the same, shared gene. Genes evolve due to natural selection / survival of the fittest
"Survival of the fittest" is a phrase that originated from Darwinian evolutionary theory as a way of describing the mechanism of natural selection. The biological concept of fitness is defined as reproductive success. In Darwinian terms, th ...
and genetic drift of the alleles.
The concept of ''gene'' continues to be refined as new phenomena are discovered.[ For example, regulatory regions of a gene can be far removed from its ]coding region
The coding region of a gene, also known as the coding sequence (CDS), is the portion of a gene's DNA or RNA that codes for protein. Studying the length, composition, regulation, splicing, structures, and functions of coding regions compared to no ...
s, and coding regions can be split into several exon
An exon is any part of a gene that will form a part of the final mature RNA produced by that gene after introns have been removed by RNA splicing. The term ''exon'' refers to both the DNA sequence within a gene and to the corresponding sequen ...
s. Some viruses store their genome in RNA instead of DNA and some gene products are functional non-coding RNAs. Therefore, a broad, modern working definition of a gene is any discrete locus of heritable, genomic sequence which affect an organism's traits by being expressed as a functional product or by regulation of gene expression
Regulation of gene expression, or gene regulation, includes a wide range of mechanisms that are used by cells to increase or decrease the production of specific gene products (protein or RNA). Sophisticated programs of gene expression are wide ...
.
The term ''gene'' was introduced by Danish botanist, plant physiologist and geneticist Wilhelm Johannsen in 1909.[ From p. 124: ''"Dieses "etwas" in den Gameten bezw. in der Zygote, … – kurz, was wir eben Gene nennen wollen – bedingt sind."'' (This "something" in the gametes or in the zygote, which has crucial importance for the character of the organism, is usually called by the quite ambiguous term ''Anlagen'' rimordium, from the German word ''Anlage'' for "plan, arrangement ; rough sketch" Many other terms have been suggested, mostly unfortunately in closer connection with certain hypothetical opinions. The word "pangene", which was introduced by Darwin, is perhaps used most frequently in place of ''Anlagen''. However, the word "pangene" was not well chosen, as it is a compound word containing the roots ''pan'' (the neuter form of Πας all, every) and ''gen'' (from γί-γ(ε)ν-ομαι, to become). Only the meaning of this latter .e., ''gen''comes into consideration here ; just the basic idea – amely,that a trait in the developing organism can be determined or is influenced by "something" in the gametes – should find expression. No hypothesis about the nature of this "something" should be postulated or supported by it. For that reason it seems simplest to use in isolation the last syllable ''gen'' from Darwin's well-known word, which alone is of interest to us, in order to replace, with it, the poor, ambiguous word ''Anlage''. Thus we will say simply "gene" and "genes" for "pangene" and "pangenes". The word gene is completely free of any hypothesis ; it expresses only the established fact that in any case many traits of the organism are determined by specific, separable, and thus independent "conditions", "foundations", "plans" – in short, precisely what we want to call genes.)] It is inspired by the Ancient Greek
Ancient Greek includes the forms of the Greek language used in ancient Greece and the ancient world from around 1500 BC to 300 BC. It is often roughly divided into the following periods: Mycenaean Greek (), Dark Ages (), the Archaic p ...
: γόνος, ''gonos'', that means offspring and procreation.
Conflicting definitions of 'gene'
There are lots of different ways to use the term "gene." Richard Dawkins, for example, wrote a book called "The Selfish Gene" where 'gene' simply meant any part of the chromosome that was subject to natural selection. This 'gene' is often referred to as the "Mendelian gene" whereas the physical gene described in this article is called the "molecular gene."[
The very first edition of the textbook "Molecular Biology of the Gene" (1965) described two kinds of molecular gene: protein-coding genes and those that specified functional RNA molecules such as ribosomal RNA and tRNA (noncoding genes).] But the idea of two kinds of genes dates back to the late 1950s when Jacob and Monod speculated that regulatory genes might produce repressor RNAs.
This idea of two kinds of genes is still part of the definition of a gene in most textbooks. For example,
::"The primary function of the genome is to produce RNA molecules. Selected portions of the DNA nucleotide sequence are copied into a corresponding RNA nucleotide sequence, which either encodes a protein (if it is an mRNA) or forms a 'structural' RNA, such as a transfer RNA (tRNA) or ribosomal RNA (rRNA) molecule. Each region of the DNA helix that produces a functional RNA molecule constitutes a gene."
::"We define a gene as a DNA sequence that is transcribed. This definition includes genes that do not encode proteins (not all transcripts are messenger RNA). The definition normally excludes regions of the genome that control transcription but are not themselves transcribed. We will encounter some exceptions to our definition of a gene - surprisingly, there is no definition that is entirely satisfactory."
::"A gene is a DNA sequence that codes for a diffusible product. This product may be protein (as is the case in the majority of genes) or may be RNA (as is the case of genes that code for tRNA and rRNA). The crucial feature is that the product diffuses away from its site of synthesis to act elsewhere."
The important parts of such definitions are: (1) that a gene corresponds to a transcription unit; (2) that genes produce both mRNA and noncoding RNAs; and (3) regulatory sequences control gene expression but are not part of the gene itself. However, there's one other important part of the definition and it is emphasized in Kostas Kampourakis' book "Making Sense of Genes."
::"Therefore in this book I will consider genes as DNA sequences encoding information for functional products, be it proteins or RNA molecles. With 'encoding information,' I mean that the DNA sequence is used as a template for the production of an RNA molecule or a protein that performs some function.'
The emphasis on function is essential because there are stretches of DNA that produce non-functional transcripts and they don't qualify as genes. These include obvious examples such as transcribed pseudogenes as well as less obvious examples such as junk RNA produced as noise due to transcription errors. In order to qualify as a true gene, by this definition, one has to prove that the transcript has a biological function.
Early speculations on the size of a typical gene were based on high resolution genetic mapping and on the size of proteins and RNA molecules. A length of 1500 base pairs seemed reasonable at the time (1965). This was based on the idea that the gene was the DNA that was directly responsible for production of the functional product. The discovery of introns in the 1970s meant that many eukaryotic genes were much larger than the size of the functional product would imply. Typical mammalian protein-coding genes, for example, are about 62,000 base pairs in length (transcribed region) and since there are about 20,000 of them they occupy about 35-40% of the mammalian genome (including the human genome).
In spite of the fact that both protein-coding genes and noncoding genes have been known for more than 50 years, there are still a number of textbooks, websites, and scientific publications that define a gene as a DNA sequence that specifies a protein. In other words, the definition is restricted to protein-coding genes. Here's an example from a recent article in American Scientist.
::What Is a Gene, Really?
::... to truly assess the potential significance of de novo genes, we relied on a strict definition of the word "gene" with which nearly every expert can agree. First, in order for a nucleotide sequence to be considered a true gene, an open reading frame (ORF) must be present. The ORF can be thought of as the "gene itself"; it begins with a starting mark common for every gene and ends with one of three possible finish line signals. One of the key enzymes in this process, the RNA polymerase, zips along the strand of DNA like a train on a monorail, transcribing it into its messenger RNA form. This point brings us to our second important criterion: A true gene is one that is both transcribed and translated. That is, a true gene is first used as a template to make transient messenger RNA, which is then translated into a protein.
This restricted definition is so common that it has spawned many recent articles that criticize this "standard definition" and call for a new expanded definition that includes noncoding genes. However, this so-called "new" definition has been around for more than half a century and it's not clear why some modern writers are ignoring noncoding genes.
There are exceptions to the standard definition of a gene; for example, some viruses have an RNA genome. The one important exception concerns bacterial operons where a contiguous stretch of DNA containing multiple protein-coding regions is transcribed into one large mRNA. Scientists usually refer to each of the coding regions as separate genes in this case. The only significant controversy over the definition of a gene is whether to include the regulatory sequences that control transcription of the gene. The general consensus among scientists is that regulatory elements control the expression of a gene but are not part of the gene.
History
Discovery of discrete inherited units
The existence of discrete inheritable units was first suggested by Gregor Mendel (1822–1884). From 1857 to 1864, in Brno
Brno ( , ; german: Brünn ) is a city in the South Moravian Region of the Czech Republic. Located at the confluence of the Svitava and Svratka rivers, Brno has about 380,000 inhabitants, making it the second-largest city in the Czech Republic ...
, Austrian Empire
The Austrian Empire (german: link=no, Kaiserthum Oesterreich, modern spelling , ) was a Central- Eastern European multinational great power from 1804 to 1867, created by proclamation out of the realms of the Habsburgs. During its existence ...
(today's Czech Republic), he studied inheritance patterns in 8000 common edible pea plants, tracking distinct traits from parent to offspring. He described these mathematically as 2n comb