PHYLIP

PHYLogeny Inference Package (PHYLIP) is a free computational phylogenetics package of programs for inferring evolutionary trees (phylogenies). It consists of 65 portable programs, i.e., the source code is written in the programming language C. As of version 3.696, it is licensed as open-source software; versions 3.695 and older were proprietary freeware. Releases occur as source code and as precompiled executables for many operating systems, including Windows (95, 98, ME, NT, 2000, XP, Vista), Mac OS 8, Mac OS 9, OS X, Linux (Debian, Red Hat), and FreeBSD (from FreeBSD.org). Full documentation is written for all the programs in the package and is included therein.

The programs in the PHYLIP package were written by Professor Joseph Felsenstein, of the Department of Genome Sciences and the Department of Biology, University of Washington, Seattle. Methods available in the package include parsimony, distance matrix, and likelihood methods, including bootstrapping and consensus trees. Data types that can be handled include molecular sequences, gene frequencies, restriction sites and fragments, distance matrices, and discrete characters.

Each program is controlled through a menu, which asks users which options they want to set and allows them to start the computation. The data are read into the program from a text file, which the user can prepare using any word processor or text editor. This file cannot be in the word processor's own format; it must instead be saved in "flat ASCII" or "text only" format. Some sequence analysis programs, such as the Clustal W alignment program, can write data files in the PHYLIP format. Most of the programs look for the data in a file called infile. If the PHYLIP programs do not find this file, they ask the user to type in the file name of the data file.


File format

The component programs of PHYLIP use several different formats, all of which are relatively simple. Programs for the analysis of DNA sequence alignments, protein sequence alignments, or discrete characters (e.g., morphological data) can accept those data in sequential or interleaved format, as shown below.

Sequential format:

      5    42
    Turkey    AAGCTNGGGC ATTTCAGGGT
              GAGCCCGGGC AATACAGGGT AT
    Salmo schiAAGCCTTGGC AGTGCAGGGT
              GAGCCGTGGC CGGGCACGGT AT
    H. sapiensACCGGTTGGC CGTTCAGGGT
              ACAGGTTGGC CGTTCAGGGT AA
    Chimp     AAACCCTTGC CGTTACGCTT
              AAACCGAGGC CGGGACACTC AT
    Gorilla   AAACCCTTGC CGGTACGCTT
              AAACCATTGC CGGTACGCTT AA

Interleaved format:

      5    42
    Turkey    AAGCTNGGGC ATTTCAGGGT
    Salmo schiAAGCCTTGGC AGTGCAGGGT
    H. sapiensACCGGTTGGC CGTTCAGGGT
    Chimp     AAACCCTTGC CGTTACGCTT
    Gorilla   AAACCCTTGC CGGTACGCTT

              GAGCCCGGGC AATACAGGGT AT
              GAGCCGTGGC CGGGCACGGT AT
              ACAGGTTGGC CGTTCAGGGT AA
              AAACCGAGGC CGGGACACTC AT
              AAACCATTGC CGGTACGCTT AA

The numbers on the first line are the number of taxa (different species in the example shown above) followed by the number of characters (aligned nucleotides or amino acids in the case of molecular sequences). Restriction site data must include the number of enzymes as well. Names are limited to 10 characters by default; each name must be padded with blanks to exactly that length and is followed immediately by the character data in one-letter codes. The 10-character limit can be changed by a minor modification of the code (changing nmlngth in phylip.h and recompiling). All printable ASCII/ISO characters are allowed in names, except for parentheses ("(" and ")"), square brackets ("[" and "]"), colon (":"), semicolon (";") and comma (","). Spaces embedded in the alignment are ignored.

Many programs for phylogenetic analyses, including the commonly used RAxML and IQ-TREE programs, use the PHYLIP format or a minor modification of that format called the relaxed PHYLIP format.

Relaxed PHYLIP format (sequential):

    5 42
    Turkey                 AAGCTNGGGCATTTCAGGGTGAGCCCGGGCAATACAGGGTAT
    Salmo_schiefermuelleri AAGCCTTGGCAGTGCAGGGTGAGCCGTGGCCGGGCACGGTAT
    H_sapiens              ACCGGTTGGCCGTTCAGGGTACAGGTTGGCCGTTCAGGGTAA
    Chimp                  AAACCCTTGCCGTTACGCTTAAACCGAGGCCGGGACACTCAT
    Gorilla                AAACCCTTGCCGGTACGCTTAAACCATTGCCGGTACGCTTAA

The primary difference in the relaxed PHYLIP format is the absence of the 10-character limit and of the need to blank-fill names to reach that length (although padding names so that the character matrix starts at the same position on every line can improve readability). This example uses underscores rather than spaces in the names and separates the names from the aligned character data with spaces; it is often good practice to avoid white space within taxon names and to separate the character data from the name when generating files. Like strict PHYLIP format files, relaxed PHYLIP format files can be interleaved and can include spaces and line breaks within the sequence data.

The programs that use distance data, like the neighbor program that implements the neighbor-joining method, use a simple distance matrix format that includes only the number of taxa, their names, and numerical values for the distances.

PHYLIP distance matrix:

        7
    Bovine      0.0000  1.6866  1.7198  1.6606  1.5243  1.6043  1.5905
    Mouse       1.6866  0.0000  1.5232  1.4841  1.4465  1.4389  1.4629
    Gibbon      1.7198  1.5232  0.0000  0.7115  0.5958  0.6179  0.5583
    Orang       1.6606  1.4841  0.7115  0.0000  0.4631  0.5061  0.4710
    Gorilla     1.5243  1.4465  0.5958  0.4631  0.0000  0.3484  0.3083
    Chimp       1.6043  1.4389  0.6179  0.5061  0.3484  0.0000  0.2692
    Human       1.5905  1.4629  0.5583  0.4710  0.3083  0.2692  0.0000

The number on the first line is the number of taxa, and the same limitations on taxon names apply. Note that this matrix is symmetric and the diagonal has values of 0 (since the distance between a taxon and itself is zero by definition).
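The distance matrix layout described above is simple enough to read with a few lines of code. The following Python sketch is not part of PHYLIP; the function names `parse_distance_matrix` and `is_valid_distance_matrix` are hypothetical, and the parser assumes strict 10-character name fields with each row's values following the name (optionally continued on later lines). It reads the example matrix and checks the two properties noted above: symmetry and a zero diagonal.

```python
# Minimal sketch, not part of PHYLIP: function names are hypothetical.
# Assumes strict PHYLIP names (first 10 columns of each row's first line).

EXAMPLE_MATRIX = "\n".join([
    "    7",
    "Bovine      0.0000  1.6866  1.7198  1.6606  1.5243  1.6043  1.5905",
    "Mouse       1.6866  0.0000  1.5232  1.4841  1.4465  1.4389  1.4629",
    "Gibbon      1.7198  1.5232  0.0000  0.7115  0.5958  0.6179  0.5583",
    "Orang       1.6606  1.4841  0.7115  0.0000  0.4631  0.5061  0.4710",
    "Gorilla     1.5243  1.4465  0.5958  0.4631  0.0000  0.3484  0.3083",
    "Chimp       1.6043  1.4389  0.6179  0.5061  0.3484  0.0000  0.2692",
    "Human       1.5905  1.4629  0.5583  0.4710  0.3083  0.2692  0.0000",
])


def parse_distance_matrix(text, name_len=10):
    """Return (names, rows) from a square PHYLIP distance matrix."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n = int(lines[0].split()[0])  # first line: number of taxa
    names, rows = [], []
    i = 1
    for _ in range(n):
        # The name occupies the first name_len columns, blank-filled.
        names.append(lines[i][:name_len].strip())
        values = [float(tok) for tok in lines[i][name_len:].split()]
        i += 1
        # A row may continue on following lines until n values are read.
        while len(values) < n:
            values.extend(float(tok) for tok in lines[i].split())
            i += 1
        if len(values) != n:
            raise ValueError(f"row for {names[-1]!r} has {len(values)} values")
        rows.append(values)
    return names, rows


def is_valid_distance_matrix(rows, tol=1e-9):
    """Check for a zero diagonal and symmetry within a tolerance."""
    for i, row in enumerate(rows):
        if abs(row[i]) > tol:
            return False
        for j in range(i):
            if abs(row[j] - rows[j][i]) > tol:
                return False
    return True


names, rows = parse_distance_matrix(EXAMPLE_MATRIX)
print(names)
print(is_valid_distance_matrix(rows))
```

Reading the name by column position rather than by splitting on whitespace is a deliberate choice here, because strict PHYLIP names may contain embedded spaces.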
Programs that use trees as input accept the trees in Newick format, an informal standard agreed to in 1986 by authors of seven major phylogeny packages. Output is written to files with names like outfile and outtree. Trees written to outtree are in the Newick format.
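As a rough illustration of what a Newick tree looks like to downstream code, the Python sketch below pulls the leaf (taxon) names out of a Newick string. It is a simplification under stated assumptions: the helper `leaf_names` is hypothetical, the tree strings are made-up illustrations rather than PHYLIP output, and quoted labels and bracketed comments are not handled.

```python
import re

# Minimal sketch: leaf_names is a hypothetical helper, not a PHYLIP function.
# A leaf label is taken to be a run of label characters that directly
# follows "(" or ",". Labels after ")" are internal nodes and are skipped.
_LEAF = re.compile(r"[(,]\s*([^\s():,;]+)")


def leaf_names(newick):
    """Return leaf labels in the order they appear in the Newick string."""
    return _LEAF.findall(newick)


# Illustrative trees only (topology and branch lengths are invented).
print(leaf_names("((Human,Chimp),Gorilla);"))
print(leaf_names("((Human:0.1,Chimp:0.1):0.2,Gorilla:0.3);"))
```

A full Newick reader needs a proper recursive parser; this pattern is only enough to list the taxa in simple, unquoted trees.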


Component programs


File format conversion

Many programs that convert among alignment formats will output data in PHYLIP or relaxed PHYLIP format. For example, conversion between the PHYLIP multiple sequence alignment format and multi-FASTA format can be done with Genozip (et al. (2021), "Genozip: a universal extensible genomic data compressor", Bioinformatics) using genocat --fasta or genocat --phylip. The PAUP* software package is especially useful for converting between the Nexus format and the PHYLIP format.
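For simple cases the conversion can also be sketched by hand. The Python example below is an illustration only, not how Genozip or PAUP* work internally: it reads a strict sequential PHYLIP alignment (10-character names, sequence data possibly split across lines) and writes FASTA. The function names `parse_strict_sequential` and `to_fasta` are hypothetical, and replacing spaces in names with underscores is an assumption made so that each FASTA header stays a single token.

```python
# Minimal sketch: function names are hypothetical, and the parser handles
# only the strict *sequential* layout (not interleaved files).

EXAMPLE_ALIGNMENT = "\n".join([
    "  5    42",
    "Turkey    AAGCTNGGGC ATTTCAGGGT",
    "          GAGCCCGGGC AATACAGGGT AT",
    "Salmo schiAAGCCTTGGC AGTGCAGGGT",
    "          GAGCCGTGGC CGGGCACGGT AT",
    "H. sapiensACCGGTTGGC CGTTCAGGGT",
    "          ACAGGTTGGC CGTTCAGGGT AA",
    "Chimp     AAACCCTTGC CGTTACGCTT",
    "          AAACCGAGGC CGGGACACTC AT",
    "Gorilla   AAACCCTTGC CGGTACGCTT",
    "          AAACCATTGC CGGTACGCTT AA",
])


def parse_strict_sequential(text, name_len=10):
    """Return a list of (name, sequence) pairs from sequential PHYLIP text."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    ntax, nchar = (int(tok) for tok in lines[0].split()[:2])
    records = []
    i = 1
    for _ in range(ntax):
        # Name: first name_len columns. Data follows immediately after.
        name = lines[i][:name_len].strip()
        seq = "".join(lines[i][name_len:].split())  # embedded spaces ignored
        i += 1
        # Keep reading lines until this taxon has nchar characters.
        while len(seq) < nchar:
            seq += "".join(lines[i].split())
            i += 1
        if len(seq) != nchar:
            raise ValueError(f"{name!r}: expected {nchar} characters, got {len(seq)}")
        records.append((name, seq))
    return records


def to_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA, wrapping sequence lines."""
    out = []
    for name, seq in records:
        out.append(">" + name.replace(" ", "_"))  # assumption: no spaces in IDs
        for k in range(0, len(seq), width):
            out.append(seq[k:k + width])
    return "\n".join(out) + "\n"


records = parse_strict_sequential(EXAMPLE_ALIGNMENT)
fasta = to_fasta(records)
print(fasta)
```

Interleaved input would need a different reading loop, since the name appears only in the first block and later blocks contribute to every taxon in turn.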


External links

* Phylogeny Programs List: a large list of phylogeny packages with details on each one (count at 366 as of 26 January 2011).