Copy number variation (CNV) is a phenomenon in which sections of the genome are repeated and the number of repeats varies between individuals. Copy number variation is a type of structural variation: specifically, it is a type of duplication or deletion event that affects a considerable number of base pairs. Approximately two-thirds of the entire human genome may be composed of repeats, and 4.8–9.5% of the human genome can be classified as copy number variations. In mammals, copy number variations play an important role in generating necessary variation in the population as well as in disease phenotypes.

Copy number variations can generally be categorized into two main groups: short repeats and long repeats. However, there are no clear boundaries between the two groups, and the classification depends on the nature of the loci of interest. Short repeats include mainly dinucleotide repeats (two repeating nucleotides, e.g. A-C-A-C-A-C...) and trinucleotide repeats. Long repeats include repeats of entire genes. Classification by repeat size is the most obvious approach, as size is an important factor in determining which mechanisms most likely gave rise to the repeats and hence the likely effects of these repeats on phenotype.
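As a rough illustration of what "the number of repeats varies between individuals" means for a short repeat such as A-C-A-C-A-C, the following minimal Python sketch counts the longest uninterrupted run of a motif in a sequence. The function name `longest_tandem_run` and the two example sequences are hypothetical and chosen only for illustration; real repeat genotyping works on sequencing reads and is considerably more involved.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    motif = motif.upper()
    # One or more consecutive copies of the motif, matched greedily.
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


# Two hypothetical individuals carrying different numbers of the AC repeat
# at the same locus (flanking bases are made up).
individual_a = "GGT" + "AC" * 5 + "TTG"
individual_b = "GGT" + "AC" * 9 + "TTG"

print(longest_tandem_run(individual_a, "AC"))
print(longest_tandem_run(individual_b, "AC"))
```

The same function applies unchanged to trinucleotide motifs, since only the motif string differs.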


Types and chromosomal rearrangements

One of the best-known examples of a short copy number variation is the CAG trinucleotide repeat in the huntingtin gene, which is responsible for the neurological disorder Huntington's disease. In this case, once the CAG trinucleotide is repeated more than 36 times in a trinucleotide repeat expansion, Huntington's disease will likely develop in the individual and will likely be inherited by his or her offspring. The number of CAG repeats is correlated with the age of onset of Huntington's disease. These types of short repeats are often thought to be due to errors in polymerase activity during replication, including polymerase slippage, template switching, and fork switching, which are discussed in detail later. The short repeat size of these copy number variations lends itself to polymerase errors: the repeated regions are prone to misrecognition by the polymerase, and already replicated regions may be replicated again, leading to extra copies of the repeat. In addition, if these trinucleotide repeats are in the same reading frame in the coding portion of a gene, they may lead to a long chain of the same amino acid, possibly creating protein aggregates in the cell; if these short repeats fall into the non-coding portion of the gene, they may affect gene expression and regulation.

On the other hand, a variable number of repeats of entire genes is less commonly identified in the genome. One example of a whole-gene repeat is the
alpha-amylase 1 gene (AMY1), which encodes alpha-amylase and shows significant copy number variation between populations with different diets. Although the specific mechanism that allows the AMY1 gene to increase or decrease its copy number is still a topic of debate, some hypotheses suggest that non-homologous end joining or microhomology-mediated end joining is likely responsible for these whole-gene repeats. Repeats of entire genes have immediate effects on the expression of that particular gene, and the fact that the copy number variation of the AMY1 gene has been related to diet is a remarkable example of recent human evolutionary adaptation. Although these are the general groups into which copy number variations are classified, the exact number of base pairs that a copy number variation affects depends on the specific loci of interest. Currently, using data from all reported copy number variations, the mean size of a copy number variant is around 118 kb, and the median is around 18 kb.

In terms of the structural architecture of copy number variations, research has suggested and defined hotspot regions in the genome where copy number variations are four times more enriched. These hotspot regions were defined as regions containing long repeats that are 90–100% similar, known as segmental duplications, either tandem or interspersed; most importantly, these hotspot regions have an increased rate of chromosomal rearrangement. It was thought that these large-scale chromosomal rearrangements give rise to normal variation and genetic diseases, including copy number variations. Moreover, these copy number variation hotspots are consistent across many populations from different continents, implying that the hotspots were either independently acquired by all the populations and passed on through generations, or acquired in early human evolution before the populations split; the latter seems more likely.

Lastly, spatial biases in the locations at which copy number variations are most densely distributed do not seem to occur in the genome. Although it was originally detected by
fluorescent in situ hybridization and microsatellite analysis that copy number repeats are localized to highly repetitive regions such as telomeres, centromeres, and heterochromatin, recent genome-wide studies have concluded otherwise. Namely, the subtelomeric and pericentromeric regions are where most chromosomal rearrangement hotspots are found, yet there is no considerable increase in copy number variations in those regions. Furthermore, these chromosomal rearrangement hotspots do not have decreased gene numbers, again implying that there is minimal spatial bias in the genomic location of copy number variations.


Detection and identification

Copy number variation was initially thought, on the basis of cytogenetic observations, to occupy an extremely small and negligible portion of the genome. Copy number variations were generally associated only with small tandem repeats or specific genetic disorders; therefore, they were initially examined only in terms of specific loci. However, technological developments have led to an increasing number of highly accurate ways of identifying and studying copy number variations. Copy number variations were originally studied by cytogenetic techniques, which allow one to observe the physical structure of the chromosome. One of these techniques is fluorescent in situ hybridization (FISH), which uses fluorescent probes that bind only to regions of the genome with a high degree of sequence complementarity.
Comparative genomic hybridization was also commonly used to detect copy number variations by fluorophore visualization and then comparing the length of the chromosomes. One major drawback of these early techniques is that the genomic resolution is relatively low, so only large repeats such as whole-gene repeats can be detected.

Recent advances in genomics technologies have given rise to many important methods of extremely high genomic resolution and, as a result, an increasing number of copy number variations in the genome have been reported. Initially these advances involved using bacterial artificial chromosome (BAC) arrays with intervals of around 1 megabase throughout the entire genome. BACs can also detect copy number variations in rearrangement hotspots, which allowed for the detection of 119 novel copy number variations. High-throughput genomic sequencing has revolutionized the field of human genomics, and in silico studies have been performed to detect copy number variations in the genome. Reference sequences have been compared to other sequences of interest using fosmids, with the fosmid clones strictly controlled to be 40 kb. Sequencing the end reads provides adequate information to align the reference sequence to the sequence of interest, and any misalignments are easily noticeable and are thus concluded to be copy number variations within that region of the clone. This type of detection technique offers high genomic resolution and a precise location of the repeat in the genome, and it can also detect other types of structural variation such as inversions.

In addition, another way of detecting copy number variation is using
single nucleotide polymorphisms (SNPs). Due to the abundance of human SNP data, copy number variation detection has shifted to make use of these SNPs. Relying on the fact that human recombination is relatively rare and that many recombination events occur in specific regions of the genome known as recombination hotspots, linkage disequilibrium can be used to identify copy number variations. Efforts have been made to associate copy number variations with specific haplotype SNPs by analyzing linkage disequilibrium; using these associations, one is able to recognize copy number variations in the genome using SNPs as markers.
Next-generation sequencing techniques, including short- and long-read sequencing, are now increasingly used and have begun to replace array-based techniques for detecting copy number variations. In contrast to array-based techniques, sequencing-based detection methods readily identify other classes of structural variation such as inversions and translocations.
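Several of the methods above, comparative genomic hybridization in particular, rest on comparing a signal from a test sample with the signal from a reference sample at the same genomic position. The Python sketch below shows one simple way such a comparison could be expressed, using a log2 ratio per region. The function name `call_copy_number`, the cutoff values of ±0.3, and the example signal values are assumptions made for illustration only; they are not taken from any particular platform or published pipeline.

```python
import math


def call_copy_number(test_signal, reference_signal, gain_cutoff=0.3, loss_cutoff=-0.3):
    """Label each region as gain, loss, or neutral from the test/reference log2 ratio.

    Both inputs are equal-length sequences of positive signal values (for
    example probe intensities or read depths), one value per genomic region.
    The cutoffs are illustrative defaults, not calibrated thresholds.
    """
    calls = []
    for test, ref in zip(test_signal, reference_signal):
        if test <= 0 or ref <= 0:
            raise ValueError("signal values must be positive")
        log_ratio = math.log2(test / ref)
        if log_ratio >= gain_cutoff:
            state = "gain"
        elif log_ratio <= loss_cutoff:
            state = "loss"
        else:
            state = "neutral"
        calls.append((round(log_ratio, 2), state))
    return calls


# Hypothetical signals for four regions; the reference is assumed to carry two copies.
test_sample = [100.0, 150.0, 52.0, 98.0]
reference_sample = [100.0, 100.0, 100.0, 100.0]

for region, (log_ratio, state) in enumerate(call_copy_number(test_sample, reference_sample), start=1):
    print(f"region {region}: log2 ratio {log_ratio:+.2f} -> {state}")
```

With a two-copy reference, a region with three copies in the test sample would be expected to give a ratio near log2(3/2) and a region with one copy a ratio near log2(1/2), which is why symmetric cutoffs around zero are a common starting point in a sketch like this.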


Molecular mechanism

There are two main types of molecular mechanism for the formation of copy number variations: homologous-based and non-homologous-based. Although many suggestions have been put forward, most of these theories are speculation and conjecture. There is no conclusive evidence that correlates a specific copy number variation to a specific mechanism.

One of the best-recognized mechanisms leading to copy number variations, as well as deletions and inversions, is non-allelic homologous recombination. During meiotic recombination, homologous chromosomes pair up and form two-ended double-stranded breaks leading to Holliday junctions. However, in the aberrant mechanism, during the formation of Holliday junctions, the double-stranded breaks are misaligned and the crossover lands in non-allelic positions on the same chromosome. When the Holliday junction is resolved, the unequal crossing-over event allows transfer of genetic material between the two homologous chromosomes, and as a result, a portion of the DNA on both the homologues is repeated. Since the repeated regions are no longer segregating independently, the duplicated region of the chromosome is inherited.

Another homologous-recombination-based mechanism that can lead to copy number variation is known as break-induced replication. When a double-stranded break occurs unexpectedly in the genome, the cell activates pathways that mediate the repair of the break. Errors in repairing the break, as in non-allelic homologous recombination, can lead to an increase in the copy number of a particular region of the genome. During the repair of a double-stranded break, the broken end can invade its homologous chromosome instead of rejoining the original strand. As in the non-allelic homologous recombination mechanism, an extra copy of a particular region is transferred to another chromosome, leading to a duplication event. Furthermore,
cohesin proteins have been found to aid in the repair of double-stranded breaks by clamping the two ends in close proximity, which prevents interchromosomal invasion of the ends. If, for any reason, such as activation of ribosomal RNA, cohesin activity is affected, then there may be a local increase in double-stranded break repair errors.

The other class of mechanisms hypothesized to lead to copy number variations is non-homologous-based. To distinguish between this and homologous-based mechanisms, one must understand the concept of homology. Homologous pairing of chromosomes involves DNA strands that are highly similar to each other (~97%), and these strands must be longer than a certain length to avoid short but highly similar pairings. Non-homologous pairings, on the other hand, rely on only a few base pairs of similarity between two strands; it is therefore possible for genetic material to be exchanged or duplicated in the process of non-homologous-based double-stranded break repair.

One type of non-homologous-based mechanism is non-homologous end joining or microhomology-mediated end joining. These mechanisms are also involved in repairing double-stranded breaks but require no homology or only limited microhomology. When these strands are repaired, there are often small deletions or insertions in the repaired strand. It is possible that retrotransposons are inserted into the genome through this repair system. If retrotransposons are inserted into a non-allelic position on the chromosome, meiotic recombination can drive the insertion to be recombined into the same strand as an already existing copy of the same region. Another mechanism is the breakage-fusion-bridge cycle, which involves
sister chromatids that have both lost their telomeric regions due to double-stranded breaks. It is proposed that these sister chromatids fuse together to form one dicentric chromosome and then segregate into two different nuclei. Because pulling the dicentric chromosome apart causes a double-stranded break, the end regions can fuse to other double-stranded breaks and repeat the cycle. The fusion of two sister chromatids can cause an inverted duplication, and when these events are repeated throughout the cycle, the inverted region is repeated, leading to an increase in copy number.

The last mechanism that can lead to copy number variations is polymerase slippage, which is also known as template switching. During normal DNA replication, the polymerase on the lagging strand is required to unclamp and re-clamp the replication region continuously. When small-scale repeats already exist in the DNA sequence, the polymerase can be 'confused' when it re-clamps to continue replication: instead of clamping to the correct base pairs, it may shift a few base pairs and replicate a portion of the repeated region again. Although this has been experimentally observed and is a widely accepted mechanism, the molecular interactions that lead to this error remain unknown. In addition, because this type of mechanism requires the polymerase to jump around the DNA strand, and it is unlikely that the polymerase can re-clamp at another locus some kilobases away, it is more applicable to short repeats such as dinucleotide or trinucleotide repeats.
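The slippage idea described above can be made concrete with a deliberately simplified toy model in Python. It treats replication as copying a string from left to right and lets the "polymerase" re-clamp once, a few bases upstream of where it left off, so that part of the repeat is copied twice. The function name `replicate_with_slippage` and its parameters are hypothetical, and the model ignores strand complementarity, directionality, and every biochemical detail; it only shows why a backward shift inside a repeat adds whole repeat units.

```python
def replicate_with_slippage(template: str, slip_at: int, slip_back: int) -> str:
    """Copy `template` left to right with a single backward slip.

    After `slip_at` bases have been copied, the copying position moves
    `slip_back` bases upstream once, and copying then continues to the end.
    This is a toy model: the copy is written in the same letters as the
    template rather than as its complement.
    """
    if not 0 <= slip_back <= slip_at:
        raise ValueError("slip_back must be between 0 and slip_at")
    copied = []
    position = 0
    slipped = False
    while position < len(template):
        copied.append(template[position])
        position += 1
        if not slipped and position == slip_at:
            position -= slip_back  # re-clamp upstream, inside the repeat
            slipped = True
    return "".join(copied)


# Hypothetical template: four CAG units between made-up flanking bases.
template = "GG" + "CAG" * 4 + "TT"

# Slip back by exactly one repeat unit (3 bases) after copying two units.
daughter = replicate_with_slippage(template, slip_at=8, slip_back=3)
print(template)
print(daughter)
```

In this model a slip of one repeat length leaves the flanking sequence intact and adds exactly one unit, which matches the intuition that the shifted polymerase still sees "correct" bases because the repeat looks the same at both positions.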


Alpha-amylase gene

Amylase is an enzyme in saliva that is responsible for the breakdown of starch into monosaccharides, and one type of amylase is encoded by the alpha-amylase gene (AMY1). The AMY1 locus, as well as the amylase enzyme, is one of the most extensively studied and sequenced in the human genome. Its homologs are also found in other primates, and it is therefore likely that the primate AMY1 gene is ancestral to the human AMY1 gene and was adapted early in primate evolution. AMY1 is one of the best-studied genes with a wide range of copy numbers across different human populations. It is also one of the few genes studied that has displayed convincing evidence correlating its protein function with its copy number. Copy number is known to alter
transcription as well as translation levels of a particular gene; however, research has shown that the relationship between protein levels and copy number is variable. In the AMY1 genes of European Americans, the concentration of salivary amylase was found to be closely correlated with the copy number of the AMY1 gene. As a result, it was hypothesized that the copy number of the AMY1 gene is closely correlated with its protein function, which is to digest starch.

The AMY1 gene copy number has been found to be correlated with different levels of starch in the diets of different populations. Seven populations from different continents were categorized as having high-starch or low-starch diets, and their AMY1 gene copy number was visualized using high-resolution FISH and qPCR. It was found that the high-starch populations, which consist of the Japanese, Hadza, and European American populations, had a significantly higher (two times higher) average AMY1 copy number than the low-starch populations, including the Biaka, Mbuti, Datog, and Yakut populations. It was hypothesized that the levels of starch in one's regular diet, the substrate of salivary amylase, can directly affect the copy number of the AMY1 gene. Since it was concluded that the copy number of AMY1 is directly correlated with salivary amylase, the more starch present in a population's daily diet, the more evolutionarily favorable it is to have multiple copies of the AMY1 gene. The AMY1 gene was the first gene to provide strong evidence for evolution on a
molecular genetics level. Moreover, using comparative genomic hybridization, copy number variations across the entire genomes of the Japanese population were compared to those of the Yakut population. It was found that the copy number variation of the AMY1 gene was significantly different from the copy number variation in other genes or regions of the genome, suggesting that the AMY1 gene was under a strong selective pressure that had little or no influence on the other copy number variations. Finally, the variability in length of 783 microsatellites between the two populations was compared to the copy number variability of the AMY1 gene. It was found that the AMY1 gene copy number range was larger than that of over 97% of the microsatellites examined. This implies that natural selection played a considerable role in shaping the average number of AMY1 genes in these two populations. However, as only seven populations were studied, it is important to consider the possibility that factors in their diet or culture other than starch may have influenced the AMY1 copy number.

Although it is unclear when the AMY1 gene copy number began to increase, it is known and confirmed that the AMY1 gene existed in early primates.
Chimpanzees, the closest evolutionary relatives of humans, were found to have two diploid copies of the AMY1 gene, each identical in length to the human AMY1 gene; this is significantly fewer copies than are found in humans. Bonobos, also close relatives of modern humans, were found to have more than two diploid copies of the AMY1 gene. However, when the bonobo AMY1 genes were sequenced and analyzed, their coding sequences were found to be disrupted, which may lead to the production of dysfunctional salivary amylase. It can be inferred from these results that the increase in bonobo AMY1 copy number is likely not correlated with the amount of starch in their diet. It was further hypothesized that the increase in copy number began recently, during early hominin evolution, as none of the great apes had more than two copies of the AMY1 gene that produced functional protein. In addition, it was speculated that the increase in AMY1 copy number began around 20,000 years ago when humans shifted from a
hunter-gatherer lifestyle to agricultural societies, which was also when humans relied heavily on root vegetables high in starch. This hypothesis, although logical, lacks experimental evidence because shifts in human diet, especially in the consumption of starchy root vegetables, cannot be directly observed or tested. Recent breakthroughs in DNA sequencing have allowed researchers to sequence older DNA, such as that of Neanderthals, to a certain degree of accuracy. Sequencing Neanderthal DNA may provide a time marker for when the AMY1 gene copy number increased and offer insight into human diet and gene evolution.

It is currently unknown which mechanism gave rise to the initial duplication of the amylase gene. One suggestion is that retroviral sequences were inserted through non-homologous end joining, which caused the duplication of the AMY1 gene. However, there is currently no evidence to support this, and the hypothesis remains conjecture. The recent origin of the multi-copy AMY1 gene implies that, depending on the environment, AMY1 copy number can increase and decrease very rapidly relative to genes that do not interact as directly with the environment. The AMY1 gene is an excellent example of how gene dosage affects the survival of an organism in a given environment. Multiple copies of the AMY1 gene give those who rely more heavily on high-starch diets an evolutionary advantage, so the high gene copy number persists in the population.
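
The comparison of AMY1 with the microsatellite loci described above amounts to an empirical percentile: the share of background loci whose variability is smaller than that of the locus of interest. The following is a minimal sketch of that calculation only. The function name `fraction_exceeded` and all numbers are hypothetical and chosen for illustration; they are not data from any study.

```python
def fraction_exceeded(target, background):
    """Return the fraction of background values strictly smaller than target."""
    if not background:
        raise ValueError("background must not be empty")
    smaller = sum(1 for value in background if value < target)
    return smaller / len(background)


# Hypothetical variability scores for ten background loci (toy values only).
toy_microsatellite_scores = [0.01, 0.02, 0.02, 0.03, 0.05,
                             0.08, 0.10, 0.12, 0.20, 0.40]
# Hypothetical variability score for the locus of interest.
toy_amy1_score = 0.30

share = fraction_exceeded(toy_amy1_score, toy_microsatellite_scores)
print(f"Locus of interest exceeds {share:.0%} of the toy background loci")
```

A locus that exceeds nearly all neutral background loci in this way is a candidate for having been shaped by selection, although a real analysis would also need to account for the different mutation processes of microsatellites and gene copies.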


Brain cells

Among the neurons in the human brain, somatically derived copy number variations are frequent. Estimates of the proportion of affected neurons vary widely, from 9% to 100% of brain neurons in different studies. Most alterations are between 2 and 10 Mb in size, with deletions far outnumbering amplifications. Copy number variations appear to be more frequent in brain cells than in other cell types. A likely source of copy number variation is incorrect repair of DNA damage. Genomic duplication and triplication of the gene appear to be a rare cause of Parkinson's disease, although more common than point mutations. Copy number variants in the RCL1 gene are associated with a range of neuropsychiatric phenotypes in children.


Gene families and natural selection

Recently, there has been discussion connecting copy number variations to gene families. A gene family is defined as a set of related genes that serve similar functions but have minor temporal or spatial differences, and that likely derive from one ancestral gene. Copy number variations are connected to gene families mainly because the genes in a family may have arisen from one ancestral gene that was duplicated into several copies. Mutations accumulate in these copies over time, and with natural selection acting on the genes, some mutations confer advantages in a given environment, allowing those genes to be inherited; eventually, distinct gene families separate out. An example of a gene family that may have been created by copy number variation is the globin gene family. The globin gene family is an elaborate network of alpha and beta globin genes, including genes that are expressed in embryos and in adults, as well as pseudogenes. The globin genes are all well conserved and differ only in a small portion of the gene, indicating that they derive from a common ancestral gene, perhaps through duplication of the initial globin gene.

Research has shown that copy number variations are significantly more common in genes encoding proteins that interact directly with the environment than in genes encoding proteins involved in basic cellular activities. It was suggested that the gene dosage effect accompanying copy number variation may be detrimental if essential cellular functions are disrupted, and that proteins involved in cellular pathways are therefore subjected to strong
purifying selection. In addition, proteins function together and interact with proteins of other pathways, so it is important to view the effects of natural selection on biomolecular pathways rather than on individual proteins. Consistent with this, proteins at the periphery of a pathway were found to be enriched in copy number variations, whereas proteins at the center of a pathway are depleted in them. The proposed explanation is that peripheral proteins interact with fewer partners, so a change in protein dosage caused by a change in copy number may have a smaller effect on the overall outcome of the cellular pathway.

In the past few years, researchers seem to have shifted their focus from detecting, locating, and sequencing copy number variations to in-depth analyses of the role of these variations in the human genome and in nature in general. Further evidence is needed to validate the relationship between copy number variations and gene families, as well as the role that natural selection plays in shaping these relationships and changes. Researchers are also aiming to elucidate the molecular mechanisms involved in copy number variation, as these may reveal essential information about structural variation in general. More broadly, structural variation in the human genome appears to be a rapidly growing research topic. Not only can these research data provide additional evidence for evolution and natural selection, they can also be used to develop treatments for a wide range of genetic diseases.
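
The pathway-position argument in this section can be made concrete by counting interaction partners per protein and comparing genes that overlap a copy number variation with those that do not. This is a minimal sketch under stated assumptions: the pathway, the protein names, and the set of CNV-overlapping genes are all hypothetical, and the number of partners is used here as a simple stand-in for how central a protein is.

```python
from collections import defaultdict


def degrees(edges):
    """Count distinct interaction partners for each protein in an undirected edge list."""
    partners = defaultdict(set)
    for a, b in edges:
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(neighbours) for protein, neighbours in partners.items()}


def mean_degree(genes, degree_by_gene):
    """Average number of interaction partners over the given genes."""
    genes = list(genes)
    return sum(degree_by_gene[g] for g in genes) / len(genes)


# Hypothetical pathway: "hub" is central, P1 to P4 sit nearer the periphery.
toy_edges = [("hub", "P1"), ("hub", "P2"), ("hub", "P3"), ("hub", "P4"), ("P1", "P2")]
# Hypothetical set of genes overlapping a copy number variation.
toy_cnv_genes = {"P3", "P4"}

deg = degrees(toy_edges)
cnv_mean = mean_degree(toy_cnv_genes, deg)
other_mean = mean_degree((g for g in deg if g not in toy_cnv_genes), deg)
print(f"mean partners, CNV genes: {cnv_mean:.2f}; other genes: {other_mean:.2f}")
```

In this toy pathway the CNV-overlapping genes have fewer partners on average than the rest, which is the pattern the text describes; whether it holds in real data depends on the interaction network and CNV calls used.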


See also

* Comparative genomics
* Copy number analysis
* Human genome
* Inparanoid
* Molecular evolution
* Pseudogenes
* Segmental duplication
* Tandem exon duplication
* Virtual karyotype




External links


Copy Number Variation Project, Sanger Institute

The Claim: Identical Twins Have Identical DNA

Integrative annotation platform for copy number variations in humans

A bibliography on copy number variation

Database of Genomic Variants, a database of structural variants in the human genome
Copy Number Variation Detection via High-Density SNP Genotyping



BioDiscovery Nexus Copy Number

High-resolution mapping of copy number variations in 2,026 healthy individuals

IGSR: The International Genome Sample Resource


cn.MOPS: mixture of Poissons for discovering copy number variations in next generation sequencing data (software), http://www.bioinf.jku.at/software/cnmops/cnmops.html