Y Chromosome Haplotype Reference Database

The Y Chromosome Haplotype Reference Database (YHRD) is an open-access, annotated collection of population samples typed for Y chromosomal sequence variants. It pursues two objectives: (1) the generation of reliable frequency estimates for Y-STR haplotypes and Y-SNP haplotypes to be used in the quantitative assessment of matches in forensic and kinship cases, and (2) the characterization of male lineages to draw conclusions about the origins and history of human populations. The database is endorsed by the International Society for Forensic Genetics (ISFG). By October 2022, 350,500 9-STR locus haplotypes, among them 290,147 17-STR locus haplotypes, 103,631 23-STR locus haplotypes, 106,025 27-STR locus haplotypes and 31,377 Y-SNP haplotypes, sampled in 141 countries, had been directly submitted by forensic institutions and universities. In geographic terms, about 53% of the YHRD samples stem from Asia, 21% from Europe, 12% from North America, 10% from Latin America, 3% from Africa, 0.8% from Oceania/Australia and 0.2% from the Arctic. The 1,403 individual sampling projects are described in more than 780 peer-reviewed publications.


Submission and registration

The YHRD is built from direct submissions of population data by individual laboratories. Upon receipt of a submission, the YHRD staff examine the originality, relevance, plausibility and validity of the data and, if these criteria are met, assign an accession number to the population sample. The submissions are then registered in the public database, where the entries can be retrieved by searching for haplotypes, contributors or accession numbers. All population data published in forensic journals such as FSI: Genetics or International Journal of Legal Medicine must be validated by the YHRD custodians and are subsequently included in the YHRD.


Database structure

The database supports the most frequently used haplotype formats (e.g. Minimal, Powerplex Y12, YFiler, Powerplex Y23, YfilerPlus and Maximal), for which differently sized databases exist. Because strong correlations exist between geographic areas and Y chromosomal variants, the YHRD population database is structured to display the geographic, linguistic and phylogenetic relationships of searched haplotype profiles. Currently the YHRD recognizes four separate "metapopulation" structures, each with several categories: national, continental, linguistic/ethnic and phylogenetic affiliation. In population genetics, the term metapopulation describes discrete, spatially distributed population groups that are interconnected by gene flow and migration. By analogy, the term is used in forensic genetics to describe a set of geographically dispersed populations with shared ancestry and continuing gene flow. Population groups within a metapopulation are thus more similar to one another than to groups outside it.
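
The four structures can be pictured as four independent labels attached to every sample, so that the same search can be tallied under any of them. The following Python sketch is illustrative only: the `Sample` record, its field names and the example rows are assumptions made for this illustration and do not reflect the YHRD's actual schema or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One database entry (hypothetical layout, not the YHRD schema)."""
    haplotype: tuple      # repeat counts in a fixed locus order
    national: str
    continental: str
    linguistic: str
    phylogenetic: str


# Invented records for illustration only; these are not YHRD data.
SAMPLES = [
    Sample((14, 13, 29), "Germany", "Europe", "Eurasian", "hg-A"),
    Sample((14, 13, 29), "Poland", "Europe", "Eurasian", "hg-B"),
    Sample((15, 12, 28), "China", "Asia", "East Asian", "hg-C"),
    Sample((14, 13, 29), "USA", "North America", "Eurasian", "hg-A"),
]


def matches_by_metapopulation(query, samples, structure):
    """Return {category: (matches, sample_size)} for one structure.

    `structure` is one of "national", "continental", "linguistic"
    or "phylogenetic".
    """
    sizes = Counter(getattr(s, structure) for s in samples)
    hits = Counter(
        getattr(s, structure) for s in samples if s.haplotype == query
    )
    return {category: (hits[category], n) for category, n in sizes.items()}


if __name__ == "__main__":
    print(matches_by_metapopulation((14, 13, 29), SAMPLES, "continental"))
```

Keeping the labels independent means a sample collected in one country is always counted in that country's national metapopulation, while its linguistic or phylogenetic label may place it with samples from elsewhere.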


National

The concept of pooling data to build "national databases" has a straightforward explanation: law enforcement agencies and forensic services rely on their national population to build reference databases. In most instances offenders and victims stem from the national population, and their genetic profiles should thus be represented in the database. In countries such as the USA, Brazil, the UK or China, which are characterized by strong population substructure, national reference databases are often built on the basis of a historical concept of ethnic affiliation. For example, the US population is sub-structured into Caucasian, African, Hispanic, Asian and Native American populations, and the UK differentiates English, Afro-Caribbean, Indo-Pakistani and Chinese populations. Because of their importance in national legislation, national databases are searchable in the YHRD. Each national metapopulation in the YHRD comprises all individuals sampled in a particular country, regardless of their ancestry.


Continental

Each continental metapopulation in the YHRD comprises all individuals sampled on a particular continent, regardless of their ancestry. The YHRD defines seven continental metapopulations following the United Nations classification of geographical regions: Africa, Arctic, Asia, Europe, Latin America, North America and Oceania/Australia.


Linguistic/ethnic

The metapopulation structure built on the basis of ethnic/linguistic affiliation takes the ancestry of sampled individuals into account to a larger extent. "Ancestry" is a term collating historical, cultural, geographical and linguistic categories. A metapopulation concept based on "ethnicity" is by no means ideal, fully rational or fully translatable; it simply takes into account that, on a global level, categories other than "nation" or "geography" describe the observed genetic clustering and inhomogeneity of Y chromosome patterns far better. For a global reference database, the "major language group" criterion seems most appropriate for grouping data by ancestry and producing subdatabases that reflect genetic similarity. The reasoning is twofold. First, language is an inherited cultural trait, and language phyla therefore often correlate with genetic traits, not least with Y chromosome polymorphisms. Second, because languages are well studied and, owing to the long tradition of language research, mostly understood by the public, linguistic terminology is in principle more understandable and translatable into practice than its genetic counterpart. Aside from purely linguistic categories (e.g. the Altaic language family, comprising people speaking Turkic and Mongolic languages), unifying geographic criteria were also applied (e.g. Sub-Saharan Africa, comprising speakers of different African language groups who live south of the Sahara).

It is important to state that the current metapopulation structure is an a priori categorization that needs continuous evaluation and verification by statistical methods that quantify the genetic similarity or dissimilarity between samples. While the current categorization of eight large metapopulations gains some support from genetic distance analysis based on ~41,000 haplotypes, a further subdivision of the "Eurasian – European metapopulation" was implemented solely on the basis of Y-STR haplotypes. The analysis of ~12,000 European haplotypes by AMOVA demonstrates that three larger pools of European haplotypes exist: the western, eastern and southeastern metapopulations. Currently the YHRD has seven non-overlapping, broadly defined metapopulations: African, Afro-Asiatic, Native American, Australian Aboriginal, East Asian, Eskimo-Aleut and Eurasian. Some of these metapopulations are further subdivided, e.g. Eurasian into six subcategories, of which the European subgroup splits further into three groups of Western, Eastern and Southeastern Europeans.


Phylogenetic

The DNA profiling of Y chromosomes submitted to the YHRD is now continuously being extended to binary Y-SNP polymorphisms. The phylogeny of the Y chromosome defined by binary polymorphisms is well established and stable. All Y chromosomes sharing a mutation are related by descent, until a further mutation splits the branch. Haplotypes within a haplogroup can be highly similar or even "identical by descent" (IBD). The haplogroup can thus be used as a criterion to substructure the database according to the phylogenetic descent of samples. Even though the chronology of the SNP mutations is far less certain than the structure of the tree, many haplogroups can be equated with events in human prehistory. The worldwide distribution of human Y chromosome diversity has revealed clearly geographically associated haplogroups.
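
The idea that a shared derived mutation defines a branch, which later mutations split further, can be sketched as a walk through a small tree. The tree, the haplogroup labels and the marker names below are hypothetical placeholders chosen for illustration; they are not the real Y chromosome phylogeny and not the YHRD's implementation.

```python
# Hypothetical tree: each haplogroup is defined by one derived marker
# and points to its parent branch. Labels and markers are placeholders.
TREE = {
    "A": {"marker": "m1", "parent": None},
    "A1": {"marker": "m2", "parent": "A"},
    "A2": {"marker": "m3", "parent": "A"},
    "A1a": {"marker": "m4", "parent": "A1"},
}


def lineage(haplogroup, tree=TREE):
    """Return the path from a haplogroup back to the root of the tree."""
    path = []
    while haplogroup is not None:
        path.append(haplogroup)
        haplogroup = tree[haplogroup]["parent"]
    return path


def assign_haplogroup(derived_markers, tree=TREE):
    """Return the most terminal haplogroup supported by the derived markers.

    A haplogroup is supported only if its own marker and the markers of
    all its ancestors are in the derived state. Returns None if even the
    root marker is missing.
    """
    best, best_depth = None, 0
    for haplogroup in tree:
        path = lineage(haplogroup, tree)
        supported = all(tree[node]["marker"] in derived_markers for node in path)
        if supported and len(path) > best_depth:
            best, best_depth = haplogroup, len(path)
    return best


if __name__ == "__main__":
    print(assign_haplogroup({"m1", "m2", "m4"}))
```

Two samples assigned to the same terminal haplogroup share every mutation along that lineage, which is the sense in which the haplogroup can serve as a grouping criterion.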


Database tools


AMOVA

Analysis of molecular variance (AMOVA) is a method for analyzing population variation using molecular data, e.g. Y-STR haplotypes. With AMOVA it is possible to evaluate and quantify the extent of differentiation between two or more population samples. AMOVA is implemented as an online tool in the YHRD and provides a way of estimating ΦST and FST values. The online tool accepts Excel files and creates entry files from them. Up to nine reference populations selected from the YHRD, as well as population sets, can be added to the AMOVA analysis. The online calculation returns a *.csv table with pairwise FST or ΦST (RST) values plus p-values as a test for significance (10,000 permutations). In addition, an MDS plot is generated to illustrate the genetic distance between the analyzed populations graphically. The program shows the references for the selected population studies, which facilitates correct citation.
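
How a ΦST value and its permutation p-value arise can be illustrated with the standard AMOVA variance decomposition. The sketch below is not the YHRD tool's code: the function names are hypothetical, the squared difference in repeat counts is assumed as the distance (an RST-style choice), and the number of permutations is reduced for brevity.

```python
import random


def sq_dist(h1, h2):
    """Sum of squared repeat-count differences across loci (assumed distance)."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2))


def _ssd(group):
    """Sum of squared deviations: (1 / 2n) * sum over all ordered pairs."""
    n = len(group)
    return sum(sq_dist(a, b) for a in group for b in group) / (2 * n)


def phi_st(populations):
    """Phi_ST for a list of populations, each a list of haplotype tuples."""
    pooled = [h for pop in populations for h in pop]
    n_total, n_pops = len(pooled), len(populations)
    ssd_within = sum(_ssd(pop) for pop in populations)
    ssd_among = _ssd(pooled) - ssd_within
    msd_within = ssd_within / (n_total - n_pops)
    msd_among = ssd_among / (n_pops - 1)
    # Average effective sample size per population.
    n0 = (n_total - sum(len(p) ** 2 for p in populations) / n_total) / (n_pops - 1)
    var_among = (msd_among - msd_within) / n0
    denom = var_among + msd_within
    return var_among / denom if denom else 0.0


def permutation_p_value(populations, permutations=1000, seed=0):
    """Share of random reassignments with Phi_ST at least as large as observed."""
    rng = random.Random(seed)
    observed = phi_st(populations)
    pooled = [h for pop in populations for h in pop]
    sizes = [len(pop) for pop in populations]
    hits = 0
    for _ in range(permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if phi_st(shuffled) >= observed:
            hits += 1
    return hits / permutations


if __name__ == "__main__":
    pops = [[(14, 13), (14, 13), (15, 13)], [(16, 12), (16, 12), (17, 12)]]
    print(phi_st(pops), permutation_p_value(pops))
```

Two samples with no shared haplotypes and no internal variation give ΦST = 1, while samples of identical composition give a value at or below zero, which is usually read as no differentiation.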


Mixture

This tool can be applied in forensic cases in which a mixed trace (two or more male contributors) is to be analyzed. The result is a likelihood ratio of donorship vs. non-donorship of the putative contributor to the trace.


Kinship

This tool can be applied in kinship cases in which a relationship between upstream and downstream relatives (e.g. father-son or grandfather-grandson) is to be analyzed. The result is a likelihood ratio (or kinship index) of patrilineal relationship vs. patrilineal non-relationship of the analyzed persons.
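
For the simplest case, a father-son pair, such a likelihood ratio can be sketched as the probability of transmitting the son's haplotype from the father, divided by the frequency of the son's haplotype among unrelated men. The sketch below rests on assumptions not taken from the YHRD: a single meiosis, a strict single-step mutation model with equal probability of gaining or losing one repeat, and mutation rates and a haplotype frequency supplied by the user. All function and parameter names are hypothetical.

```python
def transmission_probability(father, son, mutation_rates):
    """Probability that the father's haplotype is passed on as the son's.

    Assumed model: per locus, no change with probability 1 - mu, a change
    of exactly one repeat with probability mu / 2 in either direction,
    and larger changes treated as impossible.
    """
    prob = 1.0
    for f_allele, s_allele, mu in zip(father, son, mutation_rates):
        step = abs(f_allele - s_allele)
        if step == 0:
            prob *= 1.0 - mu
        elif step == 1:
            prob *= mu / 2.0
        else:
            return 0.0
    return prob


def kinship_index(father, son, mutation_rates, son_haplotype_frequency):
    """Likelihood ratio: patrilineal father-son vs. unrelated men."""
    numerator = transmission_probability(father, son, mutation_rates)
    return numerator / son_haplotype_frequency


if __name__ == "__main__":
    rates = (0.002, 0.002)  # illustrative per-locus mutation rates
    print(kinship_index((14, 13), (14, 13), rates, 0.001))
```

A single one-step difference does not exclude the relationship under this model, but it lowers the ratio sharply, because the numerator then carries a factor of mu / 2 instead of 1 - mu.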


Match statistics

Searching the YHRD results in a match or a non-match between a searched haplotype and the reference samples in the database. The relative number of matches is described as the profile frequency. In forensic casework, the probability of a match, which is based on the profile frequency, is evaluated using different methods. Some of these are recommended by national guidelines, e.g. the augmented counting method with confidence intervals and/or theta subpopulation correction (SWGDAM Interpretation Guidelines for Y-Chromosome STR typing by Forensic Laboratories in the US, 2014) or the Discrete Laplace (DL) method (Andersen et al. 2013) as recommended in Germany (Willuweit et al. 2018). Both augmented counting and DL values are provided by the YHRD for different metapopulations.
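
The counting-based estimates can be illustrated with a short calculation. The sketch below computes the plain profile frequency, the augmented count (the searched profile is added to both the matches and the database size), and a one-sided upper confidence bound. The exact Clopper-Pearson form of the bound is an assumption made here, since the text does not state which interval is used; the theta correction and the Discrete Laplace method are not covered, and the function names are hypothetical.

```python
import math


def profile_frequency(matches, n):
    """Plain counting estimate: matches / database size."""
    return matches / n


def augmented_count(matches, n):
    """Augmented counting estimate: (matches + 1) / (n + 1)."""
    return (matches + 1) / (n + 1)


def _binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p), summed in log space."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if x >= n else 0.0
    total = 0.0
    for k in range(x + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)


def upper_bound(matches, n, alpha=0.05):
    """One-sided upper bound: the p at which P(X <= matches) drops to alpha.

    Found by bisection; assumes the exact (Clopper-Pearson) construction.
    """
    if matches >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if _binom_cdf(matches, n, mid) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    print(profile_frequency(0, 1000), augmented_count(0, 1000), upper_bound(0, 1000))
```

For a haplotype that is not observed in the database, the plain frequency is zero, whereas the augmented count and the upper bound remain positive; with zero matches the bound reduces to 1 - alpha^(1/n).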


See also

* Y chromosome
* Population genetics
* DNA profiling
* Short tandem repeat
* Single-nucleotide polymorphism
* List of online databases


External links


* YHRD.org
* www.ISFG.org