infile. If the phylip programs do not find this file, they ask the user to type in the file name of the data file.
File format
The component programs of phylip use several different formats, all of which are relatively simple. Programs for the analysis of DNA sequence alignments, protein sequence alignments, or discrete characters (e.g., morphological data) can accept those data in sequential or interleaved format, as shown below.
Sequential format:
5 42
Turkey    AAGCTNGGGC ATTTCAGGGT GAGCCCGGGC AATACAGGGT AT
Salmo schiAAGCCTTGGC AGTGCAGGGT GAGCCGTGGC CGGGCACGGT AT
H. sapiensACCGGTTGGC CGTTCAGGGT ACAGGTTGGC CGTTCAGGGT AA
Chimp     AAACCCTTGC CGTTACGCTT AAACCGAGGC CGGGACACTC AT
Gorilla   AAACCCTTGC CGGTACGCTT AAACCATTGC CGGTACGCTT AA
Interleaved format:
5 42
Turkey    AAGCTNGGGC ATTTCAGGGT
Salmo schiAAGCCTTGGC AGTGCAGGGT
H. sapiensACCGGTTGGC CGTTCAGGGT
Chimp     AAACCCTTGC CGTTACGCTT
Gorilla   AAACCCTTGC CGGTACGCTT
GAGCCCGGGC AATACAGGGT AT
GAGCCGTGGC CGGGCACGGT AT
ACAGGTTGGC CGTTCAGGGT AA
AAACCGAGGC CGGGACACTC AT
AAACCATTGC CGGTACGCTT AA
The numbers are the number of taxa (different species in the example shown above) followed by the number of characters (aligned nucleotides or amino acids in the case of molecular sequences). Restriction site data must include the number of enzymes as well. Names are limited to 10 characters by default; they must be blank-filled to that length and followed immediately by the character data in one-letter codes. The 10-character limit can be changed by a minor modification of the code (changing nmlngth in phylip.h and recompiling). All printable ASCII/ISO characters are allowed in names, except for parentheses ("(" and ")"), square brackets ("[" and "]"), colon (":"), semicolon (";") and comma (","). Spaces embedded in the alignment are ignored.
Many programs for phylogenetic analyses, including the commonly used RAxML and IQ-TREE programs, use the phylip format or a minor modification of that format called the relaxed phylip format.
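The strict sequential layout described above (a header with the number of taxa and characters, then a 10-character blank-filled name followed immediately by the data) can be read with a few lines of code. The following is a minimal sketch in Python, not part of phylip itself; the function name parse_strict_sequential and the name_len parameter are hypothetical, and the sketch assumes that a taxon's data continue on following lines until the declared number of characters has been read.

```python
def parse_strict_sequential(text, name_len=10):
    """Read a strict sequential phylip alignment into (name, sequence) pairs.

    Assumption: each taxon starts on a new line with a name occupying exactly
    name_len columns, and its data may continue on following lines.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_taxa, n_chars = (int(x) for x in lines[0].split()[:2])
    records = []
    i = 1
    for _ in range(n_taxa):
        line = lines[i]
        i += 1
        name = line[:name_len].strip()
        # Spaces embedded in the alignment are ignored.
        seq = "".join(line[name_len:].split())
        while len(seq) < n_chars and i < len(lines):
            seq += "".join(lines[i].split())
            i += 1
        if len(seq) != n_chars:
            raise ValueError(f"{name!r}: expected {n_chars} characters, got {len(seq)}")
        records.append((name, seq))
    return records


# The five-taxon example, with each sequence continued on a second line.
EXAMPLE = "\n".join([
    "5 42",
    "Turkey    AAGCTNGGGC ATTTCAGGGT",
    "GAGCCCGGGC AATACAGGGT AT",
    "Salmo schiAAGCCTTGGC AGTGCAGGGT",
    "GAGCCGTGGC CGGGCACGGT AT",
    "H. sapiensACCGGTTGGC CGTTCAGGGT",
    "ACAGGTTGGC CGTTCAGGGT AA",
    "Chimp     AAACCCTTGC CGTTACGCTT",
    "AAACCGAGGC CGGGACACTC AT",
    "Gorilla   AAACCCTTGC CGGTACGCTT",
    "AAACCATTGC CGGTACGCTT AA",
])

records = parse_strict_sequential(EXAMPLE)
for name, seq in records:
    print(name, len(seq))
```

Because the name is taken from fixed columns rather than split on white space, names such as "Salmo schi" and "H. sapiens" survive intact even though they contain a space and run directly into the data.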
Relaxed phylip format (sequential):
5 42
Turkey AAGCTNGGGCATTTCAGGGTGAGCCCGGGCAATACAGGGTAT
Salmo_schiefermuelleri AAGCCTTGGCAGTGCAGGGTGAGCCGTGGCCGGGCACGGTAT
H_sapiens ACCGGTTGGCCGTTCAGGGTACAGGTTGGCCGTTCAGGGTAA
Chimp AAACCCTTGCCGTTACGCTTAAACCGAGGCCGGGACACTCAT
Gorilla AAACCCTTGCCGGTACGCTTAAACCATTGCCGGTACGCTTAA
The primary difference in relaxed phylip format is the absence of the 10-character limit on names and of the need to blank-fill names to that length (although padding names so that the character matrix starts at the same position on every line can improve readability for users). This example of relaxed phylip format uses underscores rather than spaces in the names and uses spaces between the names and the aligned character data; it is often good practice to avoid white space within taxon names and to separate the character data from the name when generating files. Like strict phylip format files, relaxed phylip format files can be in interleaved format and can include spaces and line breaks within the sequence data.
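The practices mentioned above (no white space inside names, a space between name and data, optional padding for readability) can be illustrated by a small writer. This is a hedged sketch in Python, not the output routine of any particular program; the function name to_relaxed_phylip is hypothetical, and the two short sequences in the usage lines are made-up placeholders rather than real data.

```python
import re


def to_relaxed_phylip(records):
    """Format (name, sequence) pairs as relaxed sequential phylip text.

    White space inside names is replaced by underscores, and names are
    padded so that the character matrix starts in the same column.
    """
    n_chars = len(records[0][1])
    if any(len(seq) != n_chars for _, seq in records):
        raise ValueError("all sequences must have the same length")
    safe = [(re.sub(r"\s+", "_", name.strip()), seq) for name, seq in records]
    width = max(len(name) for name, _ in safe)
    lines = [f"{len(safe)} {n_chars}"]
    lines += [f"{name.ljust(width)} {seq}" for name, seq in safe]
    return "\n".join(lines) + "\n"


# Placeholder input: the sequences are illustrative only.
relaxed = to_relaxed_phylip([("H sapiens", "ACGT"), ("Chimp", "ACGA")])
print(relaxed, end="")
```

The padding is purely cosmetic; a reader of relaxed phylip format only needs the white space that separates the name from the data.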
The programs that use distance data, like the neighbor program that implements the neighbor-joining method, also use a simple distance matrix format that includes only the number of taxa, their names, and numerical values for the distances:
Phylip distance matrix:
7
Bovine 0.0000 1.6866 1.7198 1.6606 1.5243 1.6043 1.5905
Mouse 1.6866 0.0000 1.5232 1.4841 1.4465 1.4389 1.4629
Gibbon 1.7198 1.5232 0.0000 0.7115 0.5958 0.6179 0.5583
Orang 1.6606 1.4841 0.7115 0.0000 0.4631 0.5061 0.4710
Gorilla 1.5243 1.4465 0.5958 0.4631 0.0000 0.3484 0.3083
Chimp 1.6043 1.4389 0.6179 0.5061 0.3484 0.0000 0.2692
Human 1.5905 1.4629 0.5583 0.4710 0.3083 0.2692 0.0000
The number on the first line indicates the number of taxa, and the same limitations on taxon names apply. Note that this matrix is symmetric and the diagonal has values of 0 (since the distance between a taxon and itself is zero by definition).
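The two properties noted above, symmetry and a zero diagonal, are easy to verify when reading such a file. The following Python sketch is illustrative only; the names parse_distance_matrix and is_valid_distance_matrix are hypothetical, and it assumes a full square matrix with one taxon per line and names that contain no white space. The three rows used in the example are taken from the matrix shown above.

```python
def parse_distance_matrix(text):
    """Read a square phylip distance matrix into (names, rows).

    Assumption: one taxon per line, white space between the name and values.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])
    names, rows = [], []
    for line in lines[1:n + 1]:
        fields = line.split()
        names.append(fields[0])
        rows.append([float(x) for x in fields[1:]])
    if len(rows) != n or any(len(row) != n for row in rows):
        raise ValueError("matrix is not n x n")
    return names, rows


def is_valid_distance_matrix(rows, tol=1e-9):
    """Return True if the matrix is symmetric with a zero diagonal."""
    n = len(rows)
    zero_diag = all(abs(rows[i][i]) <= tol for i in range(n))
    symmetric = all(abs(rows[i][j] - rows[j][i]) <= tol
                    for i in range(n) for j in range(i + 1, n))
    return zero_diag and symmetric


# Sub-matrix for three of the taxa listed above.
SUBMATRIX = "\n".join([
    "3",
    "Gorilla 0.0000 0.3484 0.3083",
    "Chimp   0.3484 0.0000 0.2692",
    "Human   0.3083 0.2692 0.0000",
])

names, rows = parse_distance_matrix(SUBMATRIX)
print(names, is_valid_distance_matrix(rows))
```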
Programs that use trees as input accept the trees in Newick format, an informal standard agreed to in 1986 by authors of seven major phylogeny packages. Output is written onto files with names like outfile and outtree. Trees written onto outtree are in the Newick format.
Component programs
File format conversion
Many programs that convert among alignment formats will output data in phylip or relaxed phylip format. For example, conversion between the PHYLIP multiple sequence alignment format and Multi-FASTA format can be done with Genozip (Genozip: a universal extensible genomic data compressor, Bioinformatics, 2021) using genocat --fasta or genocat --phylip. The PAUP* software package is especially useful for converting between the Nexus format and phylip format.
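For simple cases the conversion can also be written directly. The sketch below, in Python, turns relaxed sequential phylip text into Multi-FASTA; it is not how Genozip or PAUP* perform the conversion, the function name relaxed_phylip_to_fasta is hypothetical, and it assumes one taxon per line with names free of white space. The two short sequences in the usage lines are placeholders.

```python
def relaxed_phylip_to_fasta(text, width=60):
    """Convert relaxed sequential phylip text to Multi-FASTA text.

    Assumption: each taxon occupies one line, as name, white space, data.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_taxa, n_chars = (int(x) for x in lines[0].split()[:2])
    if len(lines) - 1 < n_taxa:
        raise ValueError("fewer taxa than declared in the header")
    out = []
    for line in lines[1:n_taxa + 1]:
        name, data = line.split(None, 1)
        seq = "".join(data.split())  # drop spaces embedded in the data
        if len(seq) != n_chars:
            raise ValueError(f"{name}: expected {n_chars} characters, got {len(seq)}")
        out.append(">" + name)
        # Wrap the sequence at a fixed line width.
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# Placeholder input: the sequences are illustrative only.
fasta = relaxed_phylip_to_fasta("2 4\nChimp ACGT\nHuman AC GT\n")
print(fasta, end="")
```

Going the other way requires checking that all FASTA records have equal length, since phylip format stores an alignment rather than a collection of unaligned sequences.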
External links
* Phylogeny Programs List: a large list of phylogeny packages with details on each one. Current count at 366 (as of 26 January 2011).