Multilocus sequence typing (MLST) is a technique in molecular biology for the typing of multiple loci, using DNA sequences of internal fragments of multiple housekeeping genes to characterize isolates of microbial species. The first MLST scheme to be developed was for ''Neisseria meningitidis'', the causative agent of meningococcal meningitis and septicaemia. Since its introduction for the research of evolutionary history, MLST has been used not only for human pathogens but also for plant pathogens.

Principle

MLST directly measures the DNA sequence variations in a set of housekeeping genes and characterizes strains by their unique allelic profiles. The principle of MLST is simple: the technique involves PCR amplification followed by DNA sequencing. Nucleotide differences between strains can be checked at a variable number of genes depending on the degree of discrimination desired.

The workflow of MLST involves: 1) data collection, 2) data analysis and 3) multilocus sequence analysis. In the data collection step, definitive identification of variation is obtained by nucleotide sequence determination of gene fragments. In the data analysis step, all unique sequences are assigned allele numbers, combined into an allelic profile and assigned a sequence type (ST). If new alleles and STs are found, they are stored in the database after verification. In the final analysis step, the relatedness of isolates is determined by comparing allelic profiles. Researchers carry out epidemiological and phylogenetic studies by comparing STs of different clonal complexes. A large volume of data is produced during the sequencing and identification process, so bioinformatic techniques are used to arrange, manage, analyze and merge the biological data.

To strike a balance between identification power, time and cost, about seven or eight housekeeping genes are commonly used. Taking ''Staphylococcus aureus'' as an example, seven housekeeping genes are used in MLST typing: carbamate kinase (''arcC''), shikimate dehydrogenase (''aroE''), glycerol kinase (''glpF''), guanylate kinase (''gmk''), phosphate acetyltransferase (''pta''), triosephosphate isomerase (''tpi'') and acetyl coenzyme A acetyltransferase (''yqiL''), as specified by the MLST website. However, it is not uncommon for up to ten housekeeping genes to be used. For ''Vibrio vulnificus'', the housekeeping genes used are glucose-6-phosphate isomerase (''glp''), DNA gyrase, subunit B (''gyrB''), malate-lactate dehydrogenase (''mdh''), methionyl-tRNA synthetase (''metG''), phosphoribosylaminoimidazole synthetase (''purM''), threonine dehydrogenase (''dtdS''), diaminopimelate decarboxylase (''lysA''), transhydrogenase alpha subunit (''pntA''), dihydroorotase (''pyrC'') and tryptophanase (''tnaA''). Thus both the number and type of housekeeping genes interrogated by MLST may differ from species to species.

For each of these housekeeping genes, the different sequences are assigned as alleles, and the alleles at the loci provide an allelic profile. A series of profiles can then serve as the identification marker for strain typing. Sequences that differ at even a single nucleotide are assigned as different alleles, and no weighting is given to the number of nucleotide differences between alleles, because it is not possible to distinguish whether differences at multiple nucleotide sites result from multiple point mutations or from a single recombinational exchange. The large number of potential alleles at each of the loci provides the ability to distinguish billions of different allelic profiles, and a strain with the most common allele at each locus would only be expected to occur by chance approximately once in 10,000 isolates. Although MLST provides high discriminatory power, the accumulation of nucleotide changes in housekeeping genes is a relatively slow process, and the allelic profile of a bacterial isolate is sufficiently stable over time for the method to be ideal for global epidemiology.

The relatedness of isolates is displayed as a dendrogram constructed using the matrix of pairwise differences between their allelic profiles, with eBURST, or as a minimum spanning tree (MST). The dendrogram is only a convenient way of displaying those isolates that have identical or very similar allelic profiles and can be assumed to be derived from a common ancestor; the relationships between isolates that differ at more than three out of seven loci are likely to be unreliable and should not be taken to infer their phylogeny. The MST connects all samples in such a way that the summed distance of all branches of the tree is minimal.

Alternatively, the relatedness of isolates can be analysed with multilocus sequence analysis (MLSA). This does not use the assigned alleles, but instead concatenates the sequences of the gene fragments of the housekeeping genes and uses this concatenated sequence to determine phylogenetic relationships. In contrast to MLST, this analysis assigns a higher similarity to sequences differing at only a single nucleotide and a lower similarity to sequences with multiple nucleotide differences. As a result, it is more suitable for organisms with a clonal evolution and less suitable for organisms in which recombinational events occur very often. It can also be used to determine phylogenetic relationships between closely related species. The terms MLST and MLSA are often treated as interchangeable. This is not correct, as each analysis method has its own distinctive features and uses, and care should be taken to use the correct term.
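
The allele numbering, profile building and profile comparison described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any particular MLST database: the isolate names and the six-base fragments are hypothetical toy data, only three of the ''S. aureus'' loci are used, and allele and ST numbers are handed out locally in order of first appearance, whereas a real scheme takes them from a curated reference database.

```python
from itertools import combinations

# Hypothetical toy data: isolate -> {locus: sequenced fragment}.
# Real fragments are several hundred bases long; these are shortened.
isolates = {
    "iso1": {"arcC": "ATGGCA", "aroE": "TTGACC", "gmk": "GGCATA"},
    "iso2": {"arcC": "ATGGCA", "aroE": "TTGACT", "gmk": "GGCATA"},
    "iso3": {"arcC": "ATGACA", "aroE": "TTCAGT", "gmk": "GGCATA"},
    "iso4": {"arcC": "ATGGCA", "aroE": "TTGACC", "gmk": "GGCATA"},
}
loci = ["arcC", "aroE", "gmk"]


def assign_alleles(isolates, loci):
    """Give every distinct sequence at a locus the next free allele number."""
    allele_numbers = {locus: {} for locus in loci}
    profiles = {}
    for name, fragments in isolates.items():
        profile = []
        for locus in loci:
            known = allele_numbers[locus]
            seq = fragments[locus]
            if seq not in known:
                # A single-base change is enough to create a new allele.
                known[seq] = len(known) + 1
            profile.append(known[seq])
        profiles[name] = tuple(profile)
    return profiles


def assign_sequence_types(profiles):
    """Give every distinct allelic profile the next free ST number."""
    st_of_profile = {}
    sts = {}
    for name, profile in profiles.items():
        if profile not in st_of_profile:
            st_of_profile[profile] = len(st_of_profile) + 1
        sts[name] = st_of_profile[profile]
    return sts


def allelic_distance(p, q):
    """Count loci whose allele numbers differ; base changes are not weighted."""
    return sum(a != b for a, b in zip(p, q))


profiles = assign_alleles(isolates, loci)
sequence_types = assign_sequence_types(profiles)
distances = {
    (a, b): allelic_distance(profiles[a], profiles[b])
    for a, b in combinations(profiles, 2)
}
```

In this toy set, iso1 and iso4 share a profile and therefore an ST. The ''aroE'' fragment of iso2 differs from that of iso1 at one base and the ''aroE'' fragment of iso3 differs at three, yet each counts as exactly one differing locus, which is the "no weighting" rule in practice. The `distances` dictionary is the kind of pairwise matrix from which a dendrogram or minimum spanning tree would be built.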

Comparison with other techniques

Earlier serological typing approaches had been established for differentiating bacterial isolates, but immunological typing has drawbacks such as reliance on few antigenic loci and unpredictable reactivities of antibodies with different antigenic variants. Several molecular typing schemes have been proposed to determine the relatedness of pathogens, such as pulsed-field gel electrophoresis (PFGE), ribotyping, and PCR-based fingerprinting, but these DNA banding-based subtyping methods do not provide meaningful evolutionary analyses. Although PFGE is considered by many researchers to be the “gold standard”, many strains are not typable by this technique because the DNA degrades during the process (gel smears).

The approach of MLST is distinct from multilocus enzyme electrophoresis (MLEE), which is based on different electrophoretic mobilities (EM) of multiple core metabolic enzymes. The alleles at each locus define the EM of their products, as different amino acid sequences between enzymes result in different mobilities and distinct bands when run on a gel. The relatedness of isolates can then be visualized with a dendrogram generated from the matrix of pairwise differences between the electrophoretic types. This method has a lower resolution than MLST for several reasons, all arising from the fact that enzymatic phenotype diversity is merely a proxy for DNA sequence diversity. First, enzymes may have different amino acid sequences without having sufficiently different EM to give distinct bands. Second, "silent mutations" may alter the DNA sequence of a gene without altering the encoded amino acids. Third, the phenotype of the enzyme can easily be altered in response to environmental conditions, which badly affects the reproducibility of MLEE results; common modifications of enzymes are phosphorylation, cofactor binding and cleavage of transport sequences. This also limits comparability of MLEE data obtained by different laboratories, whereas MLST provides portable and comparable DNA sequence data and has great potential for automation and standardization.

MLST should not be confused with DNA barcoding. The latter is a taxonomic method that uses short genetic markers to recognize particular species of eukaryotes. It is based on the fact that mitochondrial DNA (mtDNA) or some parts of the ribosomal DNA cistron have relatively fast mutation rates, which give significant variation in sequences between species. mtDNA methods are only possible in eukaryotes (as prokaryotes lack mitochondria), whereas MLST, although initially developed for prokaryotes, is now finding application in eukaryotes and in principle could be applied to any kingdom.
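
The silent-mutation argument made in the comparison with MLEE can be illustrated with a short Python sketch. The three 9-base fragments are hypothetical and chosen only for illustration; the codon table is the standard genetic code, built from its usual compact string form.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate an in-frame DNA fragment codon by codon."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Hypothetical in-frame fragments of one housekeeping gene.
allele_a = "GCTAAAGGT"  # Ala-Lys-Gly
allele_b = "GCCAAAGGT"  # GCT -> GCC, still Ala: a silent change
allele_c = "GCTGAAGGT"  # AAA -> GAA, Lys -> Glu: changes the protein

protein_a = translate(allele_a)
protein_b = translate(allele_b)
protein_c = translate(allele_c)

# Sequencing separates all three; the protein separates only allele_c.
dna_alleles_distinct = len({allele_a, allele_b, allele_c}) == 3
a_and_b_same_protein = protein_a == protein_b
```

Here `allele_a` and `allele_b` would receive different MLST allele numbers even though they encode the same peptide, so a protein-based method has no way of separating them. Only `allele_c`, with a charge-changing substitution, alters the product at all.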

Advantages and applications

MLST is unambiguous and portable. Materials required for ST determination can be exchanged between laboratories, and primer sequences and protocols can be accessed electronically. It is reproducible and scalable. MLST can be automated, and it combines advances in high-throughput sequencing and bioinformatics with established population genetics techniques. MLST data can be used to investigate evolutionary relationships among bacteria, and MLST provides good discriminatory power to differentiate isolates. MLST is widely applied and provides a resource for the scientific, public health, and veterinary communities as well as the food industry. The following are examples of MLST applications.

''Campylobacter''

''Campylobacter'' is a common causative agent of bacterial infectious intestinal disease, usually arising from undercooked poultry or unpasteurised milk. However, its epidemiology is poorly understood because outbreaks are rarely detected, so the sources and transmission routes of outbreaks are not easily traced. In addition, ''Campylobacter'' genomes are genetically diverse and unstable, with frequent inter- and intragenomic recombination together with phase variation, which complicates the interpretation of data from many typing methods. With the application of MLST, ''Campylobacter'' typing has achieved considerable success, and the results have been added to the MLST database. As of 1 May 2008, the ''Campylobacter'' MLST database contained 3516 isolates and about 30 publications that use or mention MLST in research on ''Campylobacter'' (http://pubmlst.org/campylobacter/).

''Neisseria meningitidis''

MLST has provided a more richly textured picture of bacteria within human populations and of strain variants that may be pathogenic to humans, plants and animals. MLST was first used by Maiden et al. (1) to characterize ''N. meningitidis'' using six loci. The application of MLST has clearly resolved the major meningococcal lineages known to be responsible for invasive disease around the world. To improve the level of discriminatory power between the major invasive lineages, seven loci are now used and have been accepted by many laboratories as the method of choice for characterizing meningococcal isolates. Recombinational exchanges commonly occur in ''N. meningitidis'', leading to rapid diversification of meningococcal clones. MLST has also provided a reliable method for characterization of clones within other bacterial species in which the rates of clonal diversification are generally lower.

''Staphylococcus aureus''

''S. aureus'' causes a number of diseases. Methicillin-resistant ''S. aureus'' (MRSA) has generated growing concern over its resistance to almost all antibiotics except vancomycin. However, most serious ''S. aureus'' infections in the community, and many in hospitals, are caused by methicillin-susceptible isolates (MSSA), and there have been few attempts to identify the hypervirulent MSSA clones associated with serious disease. MLST was therefore developed to provide an unambiguous method of characterizing MRSA clones and of identifying the MSSA clones associated with serious disease.

''Streptococcus pyogenes''

''S. pyogenes'' causes diseases ranging from pharyngitis and impetigo to life-threatening invasive disease, including necrotizing fasciitis. An MLST scheme for ''S. pyogenes'' has been developed. At present, the database (mlst.net) contains the allelic profiles of isolates that represent the worldwide diversity of the organism and of isolates from serious invasive disease.

''Candida albicans''

''C. albicans'' is a fungal pathogen of humans and is responsible for hospital-acquired bloodstream infections. MLST has been used to characterize ''C. albicans'' isolates. Combination of the alleles at the different loci results in unique diploid sequence types that can be used to discriminate strains. MLST has been successfully applied to study the epidemiology of ''C. albicans'' in the hospital as well as the diversity of ''C. albicans'' isolates obtained from diverse ecological niches, including human and animal hosts.

''Cronobacter''

The genus ''Cronobacter'' is composed of 7 species. Before 2007, the single species name ''Enterobacter sakazakii'' was applied to these organisms. The ''Cronobacter'' MLST was initially applied to distinguish between ''C. sakazakii'' and ''C. malonaticus'' because 16S rDNA sequencing is not always accurate enough, and biotyping is too subjective. The ''Cronobacter'' MLST scheme uses 7 loci (''atpD'', ''fusA'', ''glnS'', ''gltB'', ''gyrB'', ''infB'' and ''ppsA''), giving a concatenated sequence of 3036 bp for phylogenetic analysis (MLSA) and comparative genomics. MLST has also been used in the formal recognition of new ''Cronobacter'' species. The method has revealed a strong association between one genetic lineage, sequence type 4 (ST4), and cases of neonatal meningitis. The ''Cronobacter'' MLST site is at http://www.pubMLST.org/cronobacter.
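
The concatenation step used for MLSA can be sketched as follows. This is a hedged illustration only: the strain names and the four-base fragments are hypothetical (the real concatenated sequence is 3036 bp), the fragments are assumed to be already aligned and of equal length, and the simple proportion of differing sites stands in for whatever distance model a real phylogenetic analysis would use.

```python
# Locus order is fixed so that every strain is concatenated the same way.
CRONOBACTER_LOCI = ["atpD", "fusA", "glnS", "gltB", "gyrB", "infB", "ppsA"]


def concatenate(fragments, loci=CRONOBACTER_LOCI):
    """Join the per-locus fragments of one strain in scheme order."""
    return "".join(fragments[locus] for locus in loci)


def p_distance(seq_a, seq_b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    differences = sum(a != b for a, b in zip(seq_a, seq_b))
    return differences / len(seq_a)


# Hypothetical toy fragments, four bases per locus.
strain_x = {
    "atpD": "ATGC", "fusA": "GGTA", "glnS": "CCAT", "gltB": "TTAG",
    "gyrB": "GACT", "infB": "AGTC", "ppsA": "CATG",
}
strain_y = dict(strain_x, atpD="ATGT")  # one base changed in atpD
strain_z = dict(strain_x, atpD="TTCT")  # three bases changed in atpD

concat_x = concatenate(strain_x)
dist_xy = p_distance(concat_x, concatenate(strain_y))
dist_xz = p_distance(concat_x, concatenate(strain_z))
```

Both `strain_y` and `strain_z` differ from `strain_x` at a single locus, so their allelic profiles would be equally distant from it. On the concatenated sequence `strain_z` is three times as far away as `strain_y`, which is the weighting by nucleotide differences that separates MLSA from MLST.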

Limitations

MLST is best suited to population genetic studies, but it is expensive. Because of the sequence conservation in housekeeping genes, MLST sometimes lacks the discriminatory power to differentiate bacterial strains, which limits its use in epidemiological investigations. To improve the discriminatory power of MLST, a multi-virulence-locus sequence typing (MVLST) approach has been developed using ''Listeria monocytogenes''. MVLST broadens the benefits of MLST but targets virulence genes, which may be more polymorphic than housekeeping genes.

Population genetics is not the only relevant factor in an epidemic. Virulence factors are also important in causing disease, and population genetic studies struggle to monitor these, because the genes involved are often highly recombining and mobile between strains in comparison with the population genetic framework. Thus, for example in ''Escherichia coli'', identifying strains carrying toxin genes is more important than having a population genetics-based evaluation of prevalent strains.

The advent of second-generation sequencing technologies has made it possible to obtain sequence information across the entire bacterial genome at relatively modest cost and effort, and MLST can now be assigned from whole-genome sequence information rather than by sequencing each locus separately, as was the practice when MLST was first developed. Whole-genome sequencing provides richer information for differentiating bacterial strains (MLST uses approximately 0.1% of the genomic sequence to assign type while disregarding the rest of the bacterial genome). For example, whole-genome sequencing of numerous isolates has revealed that the single MLST lineage ST258 of ''Klebsiella pneumoniae'' comprises two distinct genetic clades, providing additional information about the evolution and spread of these multi-drug resistant organisms and disproving the previous hypothesis of a single clonal origin for ST258.
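
Assigning an MLST type from whole-genome sequence data can be pictured as a lookup of known allele sequences in the assembled genome. The sketch below is a deliberately simplified illustration, not how any specific tool works: it assumes exact substring matches on either strand, whereas practical software typically relies on sequence alignment and must handle partial or novel alleles. The contigs, allele sequences, allele numbers and the ST table are all hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def call_alleles(contigs, allele_db):
    """Return {locus: allele number, or None if no known allele matches}."""
    # Search both strands, since a gene may lie on either one.
    searchable = list(contigs) + [reverse_complement(c) for c in contigs]
    calls = {}
    for locus, alleles in allele_db.items():
        calls[locus] = None
        for number, seq in alleles.items():
            if any(seq in contig for contig in searchable):
                calls[locus] = number
                break
    return calls


def lookup_st(calls, loci, st_table):
    """Map the called profile to an ST, or None if the profile is unknown."""
    profile = tuple(calls[locus] for locus in loci)
    return st_table.get(profile)


# Hypothetical reference data and assembly.
allele_db = {
    "gmk": {1: "GGCATAC", 2: "GGCTTAC"},
    "pta": {1: "TTGACCA", 2: "TTGTCCA"},
}
st_table = {(1, 1): 1, (2, 1): 5}
contigs = [
    "AAAGGCTTACAAA",  # carries gmk allele 2 on the forward strand
    "CCCTGGTCAACCC",  # carries pta allele 1 on the reverse strand
]

calls = call_alleles(contigs, allele_db)
sequence_type = lookup_st(calls, ["gmk", "pta"], st_table)
```

A locus reported as `None`, or a profile missing from the ST table, corresponds to the case described earlier in which a possible new allele or ST has to be verified before it is added to the database.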

Databases

MLST databases contain the reference allele sequences and sequence types for each organism, as well as epidemiological data on isolates. The websites contain interrogation and analysis software which allows users to query their allele sequences and sequence types. MLST is widely used as a tool by researchers and public healthcare workers. The majority of MLST databases are hosted on a web server currently located at Oxford University (pubmlst.org). The databases hosted at the site hold the organism-specific reference allele sequences and lists of STs for individual organisms. To assist the gathering and formatting of the sequences used, a simple and free plug-in for Firefox has been developed.

External links

PubMLST - Oxford University

databases hosted at University College Cork

databases held at Pasteur Institute

BioNumerics
A bioinformatics package for storing and analyzing biological data. The entire MLST workflow can be performed and results compared to other typing methods and/or metadata. Allele sequences and IDs can also be derived from whole genome sequences.
CLC Microbial Genomics Module
Includes a set of tools and ready-to-use workflows for MLST in the context of meta-information such as outbreak, host and geographical location, and allows easy comparison to typing results obtained using other typing methods.