BioPerl is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications. It has played an integral role in the Human Genome Project.


Background

BioPerl is an active open source software project supported by the Open Bioinformatics Foundation. The first BioPerl code was written by Tim Hubbard and Jong Bhak at the MRC Centre, Cambridge, where the first genome sequencing was carried out by Fred Sanger. The MRC Centre was one of the hubs and birthplaces of modern bioinformatics, as it held a large quantity of DNA sequences and 3D protein structures. Hubbard was using the th_lib.pl Perl library, which contained many useful Perl subroutines for bioinformatics. Bhak, Hubbard's first PhD student, created jong_lib.pl and then merged the two subroutine libraries into Bio.pl. The name BioPerl was coined jointly by Bhak and Steven Brenner at the Centre for Protein Engineering (CPE).

In 1995, Brenner organized a BioPerl session at the Intelligent Systems for Molecular Biology conference, held in Cambridge. BioPerl gained some users in the following months, including Georg Fuellen, who organized a training course in Germany. Fuellen's colleagues and students greatly extended BioPerl, and it was further expanded by others, including Steve Chervitz, who was actively developing Perl code for his yeast genome database. The major expansion came when Cambridge student Ewan Birney joined the development team.

The first stable release was on 11 June 2002; the most recent stable (in terms of API) release is 1.7.2, from 7 September 2017. Developer releases are also produced periodically. The 1.7.x version series is considered the most stable (in terms of bugs) and is recommended for everyday use.

To take advantage of BioPerl, the user needs a basic understanding of the Perl programming language, including how to use Perl references, modules, objects and methods.
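
For readers new to those Perl features, the following minimal sketch shows a reference, a package used as a class, an object, and method calls in plain Perl. The `ToySeq` package and its methods are hypothetical names invented for illustration; they are not part of BioPerl, although BioPerl objects are constructed and called in a similar arrow style.

```perl
use strict;
use warnings;

# Hypothetical toy class (not part of BioPerl) illustrating objects and methods.
package ToySeq;

sub new {
    my ($class, %args) = @_;
    # An object is a reference (here, to a hash) blessed into a package.
    my $self = { id => $args{-id}, seq => $args{-seq} };
    return bless $self, $class;
}

# Accessor methods: each receives the object as its first argument.
sub id         { my ($self) = @_; return $self->{id} }
sub seq        { my ($self) = @_; return $self->{seq} }
sub seq_length { my ($self) = @_; return length $self->{seq} }

package main;

my $obj = ToySeq->new( -id => 'toy1', -seq => 'ATGC' );

# A reference to an array of objects, dereferenced with @{ ... }.
my $seqs = [$obj];
for my $s ( @{$seqs} ) {
    printf "%s has %d residues\n", $s->id, $s->seq_length;
}
```

The named-argument style with leading dashes (`-id`, `-seq`) is used here only because it resembles the constructor calls in the BioPerl examples in this article.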


Influence on the Human Genome Project

The Human Genome Project faced several challenges during its lifetime, and a few of them were solved when many of the genomics labs started to use Perl. One such problem was the process of analyzing all of the DNA sequences. Some labs built large monolithic systems around complex relational databases; these were slow to debug and implement, and were overtaken by new technologies. Other labs learned to build modular, loosely coupled systems whose parts could be swapped in and out as new technologies arose. Initial results across the labs were mixed. It was eventually found that many of the steps could be implemented as loosely coupled programs run from a Perl shell script.

Another problem that was fixed was the interchange of data. Each lab usually ran different programs from its scripts, so comparing results required several conversions. To fix this, the labs collectively started using a superset of the data: one script converted from the superset to each lab's set, and another converted back. This minimized the number of scripts needed and simplified data exchange with Perl.
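
The interchange scheme described above is a hub-and-spoke design: with a shared superset record, each lab needs only one converter into the superset and one out of it, rather than a converter for every pair of labs. A minimal sketch in plain Perl follows. The lab names, field names, and record layouts are hypothetical and chosen only to illustrate the idea; they are not taken from any actual Human Genome Project software.

```perl
use strict;
use warnings;

# Hypothetical per-lab converters into a common "superset" record.
# The superset here is a hash reference with keys id, seq, and source.
my %to_superset = (
    lab_a => sub {    # lab_a stores "id<TAB>sequence"
        my ($record) = @_;
        my ( $id, $seq ) = split /\t/, $record;
        return { id => $id, seq => uc $seq, source => 'lab_a' };
    },
    lab_b => sub {    # lab_b stores "sequence,id"
        my ($record) = @_;
        my ( $seq, $id ) = split /,/, $record;
        return { id => $id, seq => uc $seq, source => 'lab_b' };
    },
);

# Hypothetical converters from the superset back to each lab's layout.
my %from_superset = (
    lab_a => sub { my ($s) = @_; return join "\t", $s->{id},  $s->{seq} },
    lab_b => sub { my ($s) = @_; return join ',',  $s->{seq}, $s->{id} },
);

# Convert between any two labs by passing through the superset.
sub convert {
    my ( $record, $from, $to ) = @_;
    my $in  = $to_superset{$from} or die "unknown source format: $from\n";
    my $out = $from_superset{$to} or die "unknown target format: $to\n";
    return $out->( $in->($record) );
}

print convert( "clone42\tacgt", 'lab_a', 'lab_b' ), "\n";
```

With N labs, this arrangement needs 2N converters, whereas direct pairwise conversion would need N(N-1) of them.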


Features and examples

BioPerl provides software modules for many of the typical tasks of bioinformatics programming. These include:

* Accessing nucleotide and peptide sequence data from local and remote databases. Example of accessing GenBank to retrieve a sequence:
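use Bio::DB::GenBank;

# Accession number to fetch, taken from the command line
my $accession = shift or die "usage: $0 accession\n";

my $db_obj  = Bio::DB::GenBank->new;
my $seq_obj = $db_obj->get_Seq_by_acc($accession);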
* Transforming formats of database/file records. Example code for transforming formats:
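use strict;
use warnings;
use Bio::SeqIO;

# Usage: all2y.pl informat outfile outfileformat < infile
my $usage     = "all2y.pl informat outfile outfileformat\n";
my $informat  = shift or die $usage;
my $outfile   = shift or die $usage;
my $outformat = shift or die $usage;

# Read sequences from standard input and write them to $outfile
my $seqin  = Bio::SeqIO->new( -fh   => \*STDIN,     -format => $informat );
my $seqout = Bio::SeqIO->new( -file => ">$outfile", -format => $outformat );

# Copy each record, converting it to the output format
while ( my $inseq = $seqin->next_seq ) {
    $seqout->write_seq($inseq);
}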

* Manipulating individual sequences. Example of gathering statistics for a given sequence:
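use Bio::Tools::SeqStats;

# $seqobj is an existing sequence object, e.g. one read with Bio::SeqIO
my $seq_stats = Bio::Tools::SeqStats->new( -seq => $seqobj );

# Reference to a two-element array: lower and upper bounds of the molecular weight
my $weight = $seq_stats->get_mol_wt();
# Reference to a hash of monomer counts
my $monomer_ref = $seq_stats->count_monomers();

# For nucleic acid sequences only: reference to a hash of codon counts
my $codon_ref = $seq_stats->count_codons();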
* Searching for similar sequences
* Creating and manipulating sequence alignments
* Searching for genes and other structures on genomic DNA
* Developing machine-readable sequence annotations
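
As a further small illustration of manipulating an individual sequence, the sketch below builds a `Bio::Seq` object in memory and derives its reverse complement and protein translation. It assumes BioPerl is installed; the sequence and identifier are made-up illustrative values, not data from any database.

```perl
use strict;
use warnings;
use Bio::Seq;

# Illustrative in-memory DNA sequence (made-up data).
my $seq = Bio::Seq->new(
    -id       => 'example1',
    -seq      => 'ATGAAATAG',
    -alphabet => 'dna',
);

my $revcom  = $seq->revcom;       # new object holding the reverse complement
my $protein = $seq->translate;    # new object holding the translated protein

print "length:             ", $seq->length,  "\n";
print "reverse complement: ", $revcom->seq,  "\n";
print "translation:        ", $protein->seq, "\n";
```

Both `revcom` and `translate` return new sequence objects and leave the original untouched, so the calls can be chained or repeated safely.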


Usage

In addition to being used directly by end users, BioPerl has also provided the base for a wide variety of bioinformatic tools, including, amongst others:

* SynBrowse
* GeneComber
* TFBS
* MIMOX
* BioParser
* Degenerate primer design
* Querying the public databases
* Current Comparative Table

New tools and algorithms from external developers are often integrated directly into BioPerl itself:

* Dealing with phylogenetic trees and nested taxa
* FPC Web tools


Advantages

BioPerl was one of the first biological module repositories, which increased its usability. Its modules are very easy to install, and it has a flexible global repository. BioPerl also uses good test modules for a large variety of processes.


Disadvantages

There are many ways to use BioPerl, from simple scripting to very complex object programming. This flexibility can make BioPerl code unclear and sometimes hard to understand. Also, although BioPerl has many modules, some do not always work the way they are intended.


Related libraries in other programming languages

Several related bioinformatics libraries implemented in other programming languages exist as part of the Open Bioinformatics Foundation, including:

* Biopython
* BioJava
* BioRuby
* BioPHP
* BioJS
* Bioconductor

