Phylogenetic Assignment of Named Global Outbreak Lineages

The Phylogenetic Assignment of Named Global Outbreak Lineages (PANGOLIN) is a software tool developed by Dr. Áine O'Toole and members of the Andrew Rambaut laboratory, with an associated web application developed by the Centre for Genomic Pathogen Surveillance in South Cambridgeshire. Its purpose is to implement a dynamic nomenclature (known as the PANGO nomenclature) to classify genetic lineages of SARS-CoV-2, the virus that causes COVID-19. A user with a full genome sequence of a SARS-CoV-2 sample can submit that sequence to the tool, which compares it with other genome sequences and assigns the most likely lineage (PANGO lineage). Single or multiple runs are possible, and the tool can return further information about the known history of the assigned lineage. It also interfaces with Microreact to show a time sequence of the locations where sequenced samples of the same lineage have been reported. This feature draws on publicly available genomes obtained from the COVID-19 Genomics UK Consortium and on those submitted to GISAID. The tool is named after the pangolin.


Context

PANGOLIN is a key component underpinning the PANGO nomenclature system. As described by Andrew Rambaut et al. (2020), a PANGO lineage is a cluster of sequences associated with an epidemiological event, for instance an introduction of the virus into a distinct geographic area with evidence of onward spread. Lineages are designed to capture the emerging edge of the pandemic and are at a fine-grained resolution suited to genomic epidemiological surveillance and outbreak investigation. Both the tool and the PANGO nomenclature system have been used extensively during the COVID-19 pandemic.


Description


Lineage designation

Distinct from the PANGOLIN tool, Pango lineages are regularly and manually curated based on the current globally circulating diversity. A large phylogenetic tree is constructed from an alignment containing publicly available SARS-CoV-2 genomes, and sub-clusters of sequences in this tree are manually examined and cross-referenced against epidemiological information to designate new lineages. New lineages can be designated by data producers, and lineage suggestions can be submitted to the Pango team via a GitHub issue.


Model training

These manually curated lineage designations, together with the associated genome sequences, are the input to the machine learning model training. The model, covering both training and assignment, has been termed 'pangoLEARN'. The current version of pangoLEARN uses a classification tree, based on the scikit-learn implementation of a decision tree classifier.
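
The general idea of training a classification tree on designated sequences can be illustrated with a minimal sketch. This is not the pangoLEARN code: the one-hot encoding, the `BASES` alphabet, the `one_hot` and `assign` helpers, and the six short toy sequences with their labels are all assumptions made for illustration. Only `DecisionTreeClassifier` is the real scikit-learn class.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical alphabet for aligned sequences; any other character
# (e.g. N) is left as all zeros for that site.
BASES = "ACGT-"


def one_hot(seq):
    """Encode an aligned sequence as a flat 0/1 vector,
    using one block of len(BASES) positions per alignment site."""
    vec = np.zeros(len(seq) * len(BASES), dtype=np.uint8)
    for site, base in enumerate(seq.upper()):
        idx = BASES.find(base)
        if idx >= 0:
            vec[site * len(BASES) + idx] = 1
    return vec


# Toy "designated" sequences with lineage labels (made up for illustration).
training = [
    ("ACGTAC", "A"),
    ("ACGTAT", "A"),
    ("ATGTAC", "B"),
    ("ATGTAT", "B"),
    ("ATGCAC", "B.1"),
    ("ATGCAT", "B.1"),
]

X = np.array([one_hot(seq) for seq, _ in training])
y = [label for _, label in training]

# Fit an unconstrained decision tree on the encoded alignment.
model = DecisionTreeClassifier(random_state=0).fit(X, y)


def assign(seqs):
    """Return the predicted lineage label for each aligned query sequence."""
    encoded = np.array([one_hot(seq) for seq in seqs])
    return [str(label) for label in model.predict(encoded)]
```

Calling `assign(["ATGCAT"])` returns the label the tree learned for that training sequence. Real genomes are roughly 30,000 sites long, so a production model would need a far larger training set and a more careful treatment of ambiguous bases than this sketch provides.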


Lineage assignment

Originally, PANGOLIN used a maximum-likelihood-based assignment algorithm to assign query SARS-CoV-2 sequences the most likely lineage. Since the release of Version 2.0 in July 2020, however, it has used the 'pangoLEARN' machine-learning-based assignment algorithm to assign lineages to new SARS-CoV-2 genomes. This approach is fast and can assign large numbers of SARS-CoV-2 genomes in a relatively short time.


Availability

PANGOLIN is available as a command-line-based tool, downloadable from Conda and from a GitHub repository, and as a web application with a drag-and-drop graphical user interface. The PANGOLIN web application had assigned more than 512,000 unique SARS-CoV-2 sequences as of January 2021.
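
As a rough illustration of how the command-line tool might be driven from a script, the sketch below wraps the command in Python and summarises a lineage report. It assumes that `pangolin` is installed and on the `PATH`, that it accepts a FASTA file as its query argument, and that its CSV report contains `taxon` and `lineage` columns. The helper names `run_pangolin` and `count_lineages` and the example report text are hypothetical.

```python
import csv
import io
import subprocess
from collections import Counter


def run_pangolin(fasta_path):
    """Invoke the pangolin CLI on a FASTA file.
    Assumes pangolin is installed; raises CalledProcessError on failure."""
    subprocess.run(["pangolin", fasta_path], check=True)


def count_lineages(report_text):
    """Count how many sequences were assigned to each lineage,
    given the text of a CSV report with a 'lineage' column (assumed)."""
    reader = csv.DictReader(io.StringIO(report_text))
    return Counter(row["lineage"] for row in reader)


# Made-up report text standing in for a real output file.
example_report = (
    "taxon,lineage\n"
    "sample1,B.1.1.7\n"
    "sample2,B.1.1.7\n"
    "sample3,B.1.351\n"
)
counts = count_lineages(example_report)
```

Here `counts` maps each lineage name to the number of samples assigned to it. In practice the report text would be read from the file the tool writes, not from an in-memory string.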


Creators and developers

PANGOLIN was created by Áine O'Toole and the Rambaut lab and released on 5 April 2020. The main developers of PANGOLIN are Áine O'Toole and Emily Scher; many others have contributed to various aspects of the tool, including Ben Jackson, J.T. McCrone, Verity Hill, and Rachel Colquhoun of the Rambaut Lab. The PANGOLIN web application was developed by the Centre for Genomic Pathogen Surveillance, namely Anthony Underwood, Ben Taylor, Corin Yeats, Khali Abu-Dahab, and David Aanensen.


See also

* Variants of SARS-CoV-2
* Nextstrain

