The International HapMap Project was an organization that aimed to develop a haplotype map (HapMap) of the human genome, to describe the common patterns of human genetic variation. HapMap is used to find genetic variants affecting health, disease and responses to drugs and environmental factors. The information produced by the project is made freely available for research.
The International HapMap Project was a collaboration among researchers at academic centers, non-profit biomedical research groups and private companies in Canada, China (including Hong Kong), Japan, Nigeria, the United Kingdom, and the United States. It officially started with a meeting held from 27 to 29 October 2002 and was expected to take about three years. The project comprised three phases. The complete data obtained in Phase I were published on 27 October 2005, and the analysis of the Phase II dataset was published in October 2007. The Phase III dataset was released in spring 2009, and the publication presenting the final results appeared in September 2010.
Background
In contrast to the rarer Mendelian diseases, combinations of different genes and the environment play a role in the development and progression of common diseases (such as diabetes, cancer, heart disease, stroke, depression, and asthma), as well as in individual responses to pharmacological agents. To find the genetic factors involved in these diseases, one could in principle do a genome-wide association study: obtain the complete genetic sequence of several individuals, some with the disease and some without, and then search for differences between the two sets of genomes. At the time, this approach was not feasible because of the cost of full genome sequencing. The HapMap project proposed a shortcut.
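One common way to "search for differences" between people with and without a disease, one SNP at a time, is to compare allele counts in the two groups with a Pearson chi-square test on a 2x2 table. The sketch below is illustrative only: the function name `allelic_chi_square` and the example counts are hypothetical, and real studies also correct for the very large number of SNPs tested.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 table of allele counts.

    case_counts and control_counts are (count of allele A, count of allele B).
    Returns (chi2, p_value). No continuity correction is applied.
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("an empty row or column makes the test undefined")
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if allele frequency were the same in both groups.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 100 cases and 100 controls give 200 alleles per group.
chi2, p = allelic_chi_square((120, 80), (90, 110))
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

A small p-value flags a SNP whose allele frequencies differ between the groups; it says nothing by itself about which nearby variant is causal.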
Although any two unrelated people share about 99.5% of their DNA sequence, their genomes differ at specific nucleotide locations. Such sites are known as single nucleotide polymorphisms (SNPs), and each of the possible variants at such a site is called an allele.
The HapMap project focused only on common SNPs, those at which each allele occurs in at least 1% of the population.
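For a SNP with two alleles, requiring each allele to reach 1% is the same as requiring the rarer (minor) allele frequency to be at least 0.01. A minimal sketch, assuming a two-allele SNP and allele counts taken from a sample (so the frequency is only an estimate of the population value); the function names are hypothetical.

```python
def minor_allele_frequency(count_a: int, count_b: int) -> float:
    """Frequency of the rarer allele at a two-allele SNP, from allele counts."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("no alleles observed")
    return min(count_a, count_b) / total


def is_common(count_a: int, count_b: int, threshold: float = 0.01) -> bool:
    """True if each allele reaches `threshold` (1% by default)."""
    return minor_allele_frequency(count_a, count_b) >= threshold


# 120 diploid individuals carry 240 copies of an autosomal SNP.
print(is_common(237, 3))  # True: minor allele frequency 3/240 = 0.0125
print(is_common(239, 1))  # False: minor allele frequency 1/240 is about 0.0042
```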
Each person has two copies of all chromosomes, except the sex chromosomes in males. For each SNP, the combination of alleles a person has is called a genotype.
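A genotype at one SNP can be written as the unordered pair of alleles a person carries, with a single allele where only one copy exists (for example an X-chromosome SNP in a male). The sketch below assumes one-letter allele codes and writes each genotype with its alleles in alphabetical order; that convention and the function names are illustrative, not a HapMap file format.

```python
from collections import Counter


def genotype(allele1: str, allele2: str = "") -> str:
    """Unordered genotype string, e.g. ('G', 'A') -> 'AG'.

    Omit allele2 for a single-copy site, such as an X-chromosome SNP in a male.
    """
    return "".join(sorted(allele1 + allele2))


def allele_copies(gt: str, allele: str) -> int:
    """How many copies of `allele` the genotype carries (0, 1 or 2)."""
    return gt.count(allele)


def genotype_counts(pairs):
    """Tally genotypes over individuals, given each person's (allele1, allele2)."""
    return Counter(genotype(a, b) for a, b in pairs)


print(genotype("G", "A"))                 # AG
print(allele_copies("GG", "G"))           # 2
print(allele_copies(genotype("A"), "A"))  # 1 (single-copy site)
print(genotype_counts([("G", "A"), ("A", "A"), ("A", "G")]))
```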
Genotyping refers to uncovering what genotype a person has at a particular site. The HapMap project chose a sample of 269 individuals and selected several million well-defined SNPs, genotyped the individuals for these SNPs, and published the results.
The alleles of nearby SNPs on a single chromosome are correlated. Specifically, if the allele of one SNP for a given individual is known, the alleles of nearby SNPs can often be predicted, a process known as genotype imputation.
This is because each SNP arose in evolutionary history as a single point mutation, and was then passed down on the chromosome surrounded by other, earlier, point mutations. SNPs that are separated by a large distance on the chromosome are typically not very well correlated, because recombination occurs in each generation and mixes the allele sequences of the two chromosomes. A sequence of consecutive alleles on a particular chromosome is known as a haplotype.
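The correlation between two SNPs is commonly measured by r², computed from haplotypes (one string of alleles per chromosome). The sketch below assumes phased haplotypes and two-allele SNPs. `predict_allele` is a deliberately simple lookup meant only to show why correlation makes prediction possible; it is not how real imputation software works, which fits statistical haplotype models. Both function names are hypothetical.

```python
from collections import Counter


def r_squared(haplotypes, i, j):
    """Squared correlation r^2 between SNPs i and j.

    haplotypes: equal-length strings, one allele character per SNP, one string
    per chromosome. The first haplotype's alleles are used as the counted
    alleles; for two-allele SNPs r^2 does not depend on that choice.
    """
    n = len(haplotypes)
    a_i, a_j = haplotypes[0][i], haplotypes[0][j]
    p_i = sum(h[i] == a_i for h in haplotypes) / n
    p_j = sum(h[j] == a_j for h in haplotypes) / n
    p_ij = sum(h[i] == a_i and h[j] == a_j for h in haplotypes) / n
    d = p_ij - p_i * p_j  # linkage disequilibrium coefficient D
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    if denom == 0:
        raise ValueError("a SNP is monomorphic; r^2 is undefined")
    return d * d / denom


def predict_allele(haplotypes, i, j, observed):
    """Most frequent allele at SNP j among haplotypes carrying `observed` at SNP i."""
    matches = Counter(h[j] for h in haplotypes if h[i] == observed)
    if not matches:
        raise ValueError("observed allele not seen in reference haplotypes")
    return matches.most_common(1)[0][0]


reference = ["AC", "AC", "GT", "GT", "AC", "GT"]
print(r_squared(reference, 0, 1))            # 1.0: the two SNPs always co-occur
print(predict_allele(reference, 0, 1, "G"))  # T
```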
To find the genetic factors involved in a particular disease, one can proceed as follows. First, a region of interest in the genome is identified, possibly from earlier inheritance studies. In this region, one locates a set of tag SNPs from the HapMap data: SNPs that are very well correlated with all the other SNPs in the region, so that genotype imputation can determine (impute) the remaining SNPs, and thus the entire haplotype, with high confidence. Next, one determines the genotypes at these tag SNPs in several individuals, some with the disease and some without. By comparing the two groups, one determines the likely locations and haplotypes that are involved in the disease.
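Choosing tag SNPs can be framed as a covering problem: every SNP in the region should have r² at or above some threshold with at least one chosen tag. A simple greedy heuristic repeatedly picks the SNP that covers the most still-untagged SNPs. This is a sketch under assumptions: the function name, the input as a precomputed r² matrix, and the default threshold of 0.8 (a commonly used choice) are illustrative, and the algorithm behind HapMap's own tag-SNP tool is not described here.

```python
def greedy_tag_snps(r2, threshold=0.8):
    """Choose tag SNPs so that every SNP has r^2 >= threshold with some tag.

    r2 is a symmetric n x n matrix (list of lists) of pairwise r^2 values with
    1.0 on the diagonal. Each round picks the SNP that captures the most
    still-untagged SNPs; ties go to the lowest index.
    """
    if not 0 < threshold <= 1:
        raise ValueError("threshold must be in (0, 1]")
    n = len(r2)
    untagged = set(range(n))
    tags = []
    while untagged:
        # The diagonal is 1.0, so the best SNP always captures at least one
        # untagged SNP and the loop terminates.
        best = max(
            range(n),
            key=lambda s: sum(1 for t in untagged if r2[s][t] >= threshold),
        )
        tags.append(best)
        untagged -= {t for t in untagged if r2[best][t] >= threshold}
    return tags


# Hypothetical region: SNPs 0-2 are strongly correlated, SNP 3 is independent.
r2 = [
    [1.0, 0.9, 0.85, 0.1],
    [0.9, 1.0, 0.95, 0.05],
    [0.85, 0.95, 1.0, 0.2],
    [0.1, 0.05, 0.2, 1.0],
]
print(greedy_tag_snps(r2))  # [0, 3]
```

Here two tags stand in for four SNPs; the savings in a real region come from the same idea applied to many correlated SNPs.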
Samples used
Haplotypes are generally shared between populations, but their frequencies can differ widely. Four populations were selected for inclusion in the HapMap: 30 Yoruba trios (two parents and an adult child) from Ibadan, Nigeria (YRI); 30 trios of Utah residents of northern and western European ancestry (CEU); 44 unrelated Japanese individuals from Tokyo, Japan (JPT); and 45 unrelated Han Chinese individuals from Beijing, China (CHB). Although the haplotypes identified in these populations should be useful for studying many other populations, parallel studies are currently examining the usefulness of including additional populations in the project.
All samples were collected through a community engagement process with appropriate informed consent. The community engagement process was designed to identify and attempt to respond to culturally specific concerns and give participating communities input into the informed consent and sample collection processes.
In Phase III, 11 global ancestry groups were assembled [International HapMap Consortium et al. (2010). Integrating common and rare genetic variation in diverse human populations. Nature, 467, 52-58]:
* ASW: African ancestry in Southwest USA
* CEU: Utah residents with Northern and Western European ancestry from the CEPH collection
* CHB: Han Chinese in Beijing, China
* CHD: Chinese in Metropolitan Denver, Colorado
* GIH: Gujarati Indians in Houston, Texas
* JPT: Japanese in Tokyo, Japan
* LWK: Luhya in Webuye, Kenya
* MEX: Mexican ancestry in Los Angeles, California
* MKK: Maasai in Kinyawa, Kenya
* TSI: Tuscans in Italy
* YRI: Yoruba in Ibadan, Nigeria
Three combined panels have also been created, which allow better identification of SNPs in groups outside the nine homogeneous samples: CEU+TSI (combined panel of Utah residents with Northern and Western European ancestry from the CEPH collection and Tuscans in Italy); JPT+CHB (combined panel of Japanese in Tokyo, Japan and Han Chinese in Beijing, China); and JPT+CHB+CHD (combined panel of Japanese in Tokyo, Japan, Han Chinese in Beijing, China and Chinese in Metropolitan Denver, Colorado). CEU+TSI, for instance, is a better model of British individuals than CEU alone.
Scientific strategy
Because sequencing patients' whole genomes was expensive in the 1990s, the National Institutes of Health embraced the idea of a "shortcut": looking only at sites in the genome where many people carry a variant DNA unit. The theory behind the shortcut was that, since the major diseases are common, so too would be the genetic variants that caused them. Natural selection keeps the human genome free of variants that damage health before children are grown, the theory held, but fails against variants that strike later in life, allowing them to become quite common. In 2002 the National Institutes of Health started a $138 million project called the HapMap to catalog the common variants in European, East Asian and African genomes.
For the Phase I, one common SNP was genotyped every 5,000 bases. Overall, more than one million SNPs were genotyped. The genotyping was carried out by 10 centres using five different genotyping technologies. Genotyping quality was assessed by using duplicate or related samples and by having periodic quality checks where centres had to genotype common sets of SNPs.
The Canadian team was led by Thomas J. Hudson at McGill University in Montreal and focused on chromosomes 2 and 4p. The Chinese team was led by Huanming Yang in Beijing and Shanghai and by Lap-Chee Tsui in Hong Kong, and focused on chromosomes 3, 8p and 21. The Japanese team was led by Yusuke Nakamura at the University of Tokyo and focused on chromosomes 5, 11, 14, 15, 16, 17 and 19. The British team was led by David R. Bentley at the Sanger Institute and focused on chromosomes 1, 6, 10, 13 and 20. There were four United States genotyping centres: a team led by Mark Chee and Arnold Oliphant at Illumina Inc. in San Diego (chromosomes 8q, 9, 18q, 22 and X), a team led by David Altshuler and Mark Daly at the Broad Institute in Cambridge, Massachusetts (chromosomes 4q, 7q, 18p, Y and mitochondrial DNA), a team led by Richard Gibbs at the Baylor College of Medicine in Houston (chromosome 12), and a team led by Pui-Yan Kwok at the University of California, San Francisco (chromosome 7p).
To obtain enough SNPs to create the Map, the Consortium funded a large re-sequencing project to discover millions of additional SNPs. These were submitted to the public dbSNP database. As a result, by August 2006, the database included more than ten million SNPs, and more than 40% of them were known to be polymorphic. By comparison, at the start of the project, fewer than 3 million SNPs were identified, and no more than 10% of them were known to be polymorphic.
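The before-and-after dbSNP figures can be combined into rough bounds on the number of SNPs known to be polymorphic. Writing N for the number of SNPs in the database and f for the fraction known to be polymorphic, and using only the numbers stated in the paragraph above (these are derived bounds, not counts reported by the project):

```latex
\begin{aligned}
\text{Start of project:}\quad & N < 3\times10^{6},\; f \le 0.10
  &&\Rightarrow\; N f < 3\times10^{5}\\
\text{August 2006:}\quad & N > 10^{7},\; f > 0.40
  &&\Rightarrow\; N f > 4\times10^{6}
\end{aligned}
```

So the number of SNPs known to be polymorphic grew by more than a factor of $4\times10^{6} / 3\times10^{5} \approx 13$.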
During Phase II, more than two million additional SNPs were genotyped throughout the genome by David R. Cox, Kelly A. Frazer and others at Perlegen Sciences and 500,000 by the company Affymetrix.
Data access
All of the data generated by the project, including SNP frequencies, genotypes and haplotypes, were placed in the public domain and made available for download. The project website also contains a genome browser that allows users to find SNPs in any region of interest, their allele frequencies and their association with nearby SNPs. A tool that can determine tag SNPs for a given region of interest is also provided. These data can also be accessed directly from the widely used Haploview program.
Publications
* Secko, David (2005). "Phase I of the HapMap Complete". The Scientist.
See also
* Genealogical DNA test
* The 1000 Genomes Project
* Population groups in biomedicine
* Human Variome Project
* Human genetic variation
References
External links
International HapMap Project (HapMap Homepage)
National Human Genome Research Institute (NHGRI) HapMap Page
Browsing HapMap Data Using the Genome Browser
The Mexican Genome Diversity Project