
Biodiversity informatics is the application of informatics techniques to biodiversity information, such as taxonomy, biogeography or ecology. It is defined as the application of information technology to the management, algorithmic exploration, analysis and interpretation of primary data regarding life, particularly at the species level of organization. Modern computer techniques can yield new ways to view and analyze existing information, as well as predict future situations (see niche modelling). The term biodiversity informatics was coined only around 1992, but with rapidly increasing data sets it has become useful in numerous studies and applications, such as the construction of taxonomic databases or geographic information systems. Biodiversity informatics contrasts with "bioinformatics", which is often used synonymously with the computerized handling of data in the specialized area of molecular biology.


Overview

Biodiversity informatics (distinct from, but linked to, bioinformatics) is the application of information technology methods to the problems of organizing, accessing, visualizing and analyzing primary biodiversity data. Primary biodiversity data is composed of names, observations and records of specimens, and genetic and morphological data associated with a specimen. Biodiversity informatics may also have to cope with managing information from unnamed taxa, such as that produced by environmental sampling and sequencing of mixed-field samples. The term biodiversity informatics is also used to cover the computational problems specific to the names of biological entities. These include the development of algorithms to cope with variant representations of identifiers such as species names and authorities; the multiple classification schemes within which these entities may reside according to the preferences of different workers in the field; and the syntax and semantics by which the content in taxonomic databases can be made machine queryable and interoperable for biodiversity informatics purposes.


History of the discipline

Biodiversity informatics can be considered to have commenced with the construction of the first computerized taxonomic databases in the early 1970s. It progressed through the subsequent development of distributed search tools towards the late 1990s, including the Species Analyst from the University of Kansas, the North American Biodiversity Information Network (NABIN), CONABIO in Mexico, INBio in Costa Rica, and others; the establishment of the Global Biodiversity Information Facility in 2001; and the parallel development of a variety of niche modelling and other tools to operate on digitized biodiversity data from the mid-1980s onwards. In September 2000, the U.S. journal Science devoted a special issue to "Bioinformatics for Biodiversity", the journal Biodiversity Informatics commenced publication in 2004, and several international conferences through the 2000s have brought together biodiversity informatics practitioners, including the London e-Biosphere conference in June 2009. A supplement to the journal BMC Bioinformatics (Volume 10 Suppl 14), published in November 2009, also deals with biodiversity informatics.


History of the term

According to correspondence reproduced by Walter Berendsohn, the term "Biodiversity Informatics" was coined by John Whiting in 1992 to cover the activities of an entity known as the Canadian Biodiversity Informatics Consortium, a group involved with fusing basic biodiversity information with environmental economics and geospatial information in the form of GPS and GIS. Subsequently, the term appears to have lost any obligate connection with the GPS/GIS world and to have become associated with the computerized management of any aspect of biodiversity information.


Digital taxonomy (systematics)


Global list of all species

One major goal for biodiversity informatics is the creation of a complete master list of currently recognised species of the world. This goal has been achieved to a large extent by the Catalogue of Life project, which lists more than 2 million species in its 2022 Annual Checklist. A similar effort for fossil taxa, the Paleobiology Database, documents more than 100,000 names for fossil species, out of an unknown total number.


Genus and species scientific names as unique identifiers

Application of the Linnaean system of binomial nomenclature for species, and uninomials for genera and higher ranks, has led to many advantages but also to problems with homonyms (the same name being used for multiple taxa, either inadvertently or legitimately across multiple kingdoms), synonyms (multiple names for the same taxon), and variant representations of the same name due to orthographic differences, minor spelling errors, variation in the manner of citation of author names and dates, and more. In addition, names can change through time on account of changing taxonomic opinions (for example, the correct generic placement of a species, or the elevation of a subspecies to species rank or vice versa), and the circumscription of a taxon can change according to different authors' taxonomic concepts. One proposed solution to this problem is the usage of Life Science Identifiers (LSIDs) for machine-machine communication purposes, although there are both proponents and opponents of this approach.
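
The problem of variant name representations can be made concrete with a small sketch. The Python below is an illustration only, not the method of any particular names service: the functions `canonical_name` and `best_match`, the 0.85 similarity threshold, and the rule "genus is the first word, epithets are the lowercase words that follow" are all assumptions chosen for brevity. It strips authorship and dates, then uses the standard library's `difflib.SequenceMatcher` to tolerate minor spelling errors.

```python
import re
from difflib import SequenceMatcher


def canonical_name(raw: str) -> str:
    """Reduce a name string to a lowercase 'genus epithet ...' form.

    Naive assumption: the first word is the genus and the lowercase
    words directly after it are epithets; parenthesised text, years
    and capitalised author names are discarded.
    """
    cleaned = re.sub(r"\([^)]*\)", " ", raw)      # drop "(Linnaeus, 1758)" etc.
    cleaned = re.sub(r"\b\d{4}\b", " ", cleaned)  # drop four-digit years
    tokens = re.findall(r"[A-Za-z-]+", cleaned)
    if not tokens:
        return ""
    parts = [tokens[0].lower()]
    for token in tokens[1:]:
        if not token[0].islower():
            break  # first capitalised token is treated as an author name
        parts.append(token)
    return " ".join(parts)


def best_match(raw: str, reference_names, threshold: float = 0.85):
    """Return (reference name or None, similarity score) for a raw name."""
    query = canonical_name(raw)
    best, best_score = None, 0.0
    for reference in reference_names:
        score = SequenceMatcher(None, query, canonical_name(reference)).ratio()
        if score > best_score:
            best, best_score = reference, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


if __name__ == "__main__":
    checklist = ["Homo sapiens", "Pan troglodytes"]
    print(best_match("Homo sapeins Linnaeus, 1758", checklist))
```

A string-similarity approach like this only addresses spelling and citation variants. It cannot resolve homonyms or synonyms, which require taxonomic knowledge, and the lowercase-epithet rule would misread author names that begin with a lowercase particle (for example "de" or "von").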


A consensus classification of organisms

Organisms can be classified in a multitude of ways (see main page Biological classification), which can create design problems for biodiversity informatics systems aimed at incorporating either a single or multiple classifications to suit the needs of users, or to guide them towards a single "preferred" system. Whether a single consensus classification system can ever be achieved is probably an open question; however, the Catalogue of Life has commissioned activity in this area, which has been succeeded by a published system proposed in 2015 by M. Ruggiero and co-workers.


Biodiversity Maps

Biodiversity maps provide a cartographic representation of spatial biodiversity data. This data can be used in conjunction with species checklists to help with biodiversity conservation efforts. Biodiversity maps can help reveal patterns of species distribution and range changes, which may reflect biodiversity loss, habitat degradation, or changes in species composition. Combined with urban development data, maps can inform land management by modeling scenarios which might impact biodiversity. Biodiversity maps can be produced in a variety of ways: traditionally, range maps were hand-drawn based on literature reports, but increasingly large-scale data are used, e.g. from citizen science projects (e.g. iNaturalist) and digitized museum collections (e.g. VertNet). GIS tools such as ArcGIS or R packages such as dismo can specifically aid in species distribution modeling (ecological niche modeling) and even predict impacts of ecological change on biodiversity. GBIF, OBIS, and IUCN are large web-based repositories of species spatial-temporal data that source many existing biodiversity maps.
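
One of the simplest map products derived from occurrence data is a gridded species-richness count. The following Python sketch assumes occurrence records have already been obtained as `(species, latitude, longitude)` tuples; the function name `richness_grid`, the record layout and the one-degree default cell size are illustrative assumptions, and a real workflow would normally use a GIS or a dedicated package rather than plain dictionaries.

```python
import math
from collections import defaultdict


def richness_grid(records, cell_size=1.0):
    """Count distinct species per latitude/longitude grid cell.

    records   -- iterable of (species, latitude, longitude) tuples
    cell_size -- cell edge length in decimal degrees (assumed default: 1.0)

    Returns a dict mapping (lat_index, lon_index) to the number of
    distinct species recorded in that cell.
    """
    cells = defaultdict(set)
    for species, lat, lon in records:
        # floor() keeps negative coordinates in the correct cell
        key = (math.floor(lat / cell_size), math.floor(lon / cell_size))
        cells[key].add(species)
    return {key: len(names) for key, names in cells.items()}


if __name__ == "__main__":
    sample = [
        ("Quercus robur", 48.2, 16.4),
        ("Fagus sylvatica", 48.7, 16.1),
        ("Quercus robur", 48.5, 16.9),  # repeat record, same cell
        ("Quercus robur", 51.5, -0.1),
    ]
    for cell, count in sorted(richness_grid(sample).items()):
        print(cell, count)
```

Equal-angle cells are used here for simplicity; their ground area shrinks towards the poles, so richness values from different latitudes are not directly comparable without an equal-area grid.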


Mobilizing primary biodiversity information

"Primary" biodiversity information can be considered the basic data on the occurrence and diversity of species (or indeed, any recognizable taxa), commonly in association with information regarding their distribution in space, time, or both. Such information may be in the form of retained specimens and associated information, for example as assembled in the natural history collections of museums and herbaria, or as observational records, for example from formal faunal or floristic surveys undertaken by professional biologists and students, or from amateur and other planned or unplanned observations, including those increasingly coming under the scope of citizen science. Providing online, coherent digital access to this vast collection of disparate primary data is a core biodiversity informatics function that is at the heart of regional and global biodiversity data networks, examples of the latter including OBIS and GBIF.

As a secondary source of biodiversity data, relevant scientific literature can be parsed either by humans or (potentially) by specialized information retrieval algorithms to extract the primary biodiversity information reported therein, sometimes in aggregated or summary form but frequently as primary observations in narrative or tabular form. Elements of such activity (such as extracting key taxonomic identifiers, keywording / index terms, etc.) have been practiced for many years at a higher level by selected academic databases and search engines. However, for maximum biodiversity informatics value, the actual primary occurrence data should ideally be retrieved and then made available in a standardized form or forms; for example, both the Plazi and INOTAXA projects are transforming taxonomic literature into XML formats that can then be read by client applications, the former using TaxonX-XML and the latter using the taXMLit format. The Biodiversity Heritage Library is also making significant progress in its aim to digitize substantial portions of the out-of-copyright taxonomic literature, which is then subjected to optical character recognition (OCR) so as to be amenable to further processing using biodiversity informatics tools.
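
As a rough illustration of extracting taxonomic identifiers from digitized text, the Python sketch below flags candidate Latin binomials with a regular expression. It is a deliberately naive heuristic, not the approach used by Plazi, INOTAXA or the Biodiversity Heritage Library: the pattern, the `STOPWORDS` set and the function name `candidate_binomials` are assumptions made for this example, and the output would still need checking against a names database.

```python
import re

# Capitalised word followed by a lowercase word of three or more letters.
BINOMIAL = re.compile(r"\b([A-Z][a-z]{2,})\s+([a-z]{3,})\b")

# Hypothetical stoplist of common sentence-initial English words that
# would otherwise be mistaken for a genus name.
STOPWORDS = {"The", "This", "These", "Those", "They", "Their", "There",
             "Some", "Many", "Most", "Each", "Such", "Specimens"}


def candidate_binomials(text: str):
    """Return 'Genus epithet' strings that look like Latin binomials."""
    found = []
    for genus, epithet in BINOMIAL.findall(text):
        if genus not in STOPWORDS:
            found.append(f"{genus} {epithet}")
    return found


if __name__ == "__main__":
    page = ("The specimens of Quercus robur were collected near Vienna. "
            "Homo sapiens was also recorded.")
    print(candidate_binomials(page))
```

A pattern like this produces many false positives on ordinary prose and misses abbreviated genera (for example "Q. robur") and OCR-damaged names, which is one reason the source describes algorithmic extraction as only potentially feasible.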


Standards and protocols

In common with other data-related disciplines, biodiversity informatics benefits from the adoption of appropriate standards and protocols in order to support machine-machine transmission and interoperability of information within its particular domain. Examples of relevant standards include the Darwin Core XML schema for specimen- and observation-based biodiversity data, developed from 1998 onwards, plus extensions of the same; the Taxonomic Concept Transfer Schema; and standards for Structured Descriptive Data and Access to Biological Collection Data (ABCD). Data retrieval and transfer protocols include DiGIR (now mostly superseded) and TAPIR (TDWG Access Protocol for Information Retrieval). Many of these standards and protocols are currently maintained, and their development overseen, by Biodiversity Information Standards (TDWG).
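
To show what a shared vocabulary buys in practice, the Python sketch below checks occurrence records keyed by Darwin Core term names and writes the valid ones to delimited text. Note the substitution: the text above refers to the Darwin Core XML schema, whereas this example uses a flat CSV layout purely for brevity. The choice of six terms as "required", the functions `validate` and `to_csv`, and the sample values are assumptions for illustration, not part of the standard.

```python
import csv
import io

# Darwin Core term names; treating exactly these six as mandatory is an
# assumption made for this sketch.
DWC_FIELDS = [
    "occurrenceID",
    "basisOfRecord",
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
]


def validate(record: dict) -> list:
    """Return a list of problems found in one record (empty if none)."""
    problems = [f"missing {field}" for field in DWC_FIELDS
                if record.get(field) in (None, "")]
    try:
        lat = float(record["decimalLatitude"])
        lon = float(record["decimalLongitude"])
    except (KeyError, TypeError, ValueError):
        problems.append("coordinates not numeric")
    else:
        if not -90 <= lat <= 90:
            problems.append("latitude out of range")
        if not -180 <= lon <= 180:
            problems.append("longitude out of range")
    return problems


def to_csv(records) -> str:
    """Serialise the valid records as CSV with Darwin Core column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=DWC_FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        if not validate(record):
            writer.writerow({field: record[field] for field in DWC_FIELDS})
    return buffer.getvalue()


if __name__ == "__main__":
    example = {
        "occurrenceID": "urn:example:occurrence:1",  # hypothetical identifier
        "basisOfRecord": "PreservedSpecimen",
        "scientificName": "Quercus robur",
        "eventDate": "2009-06-01",
        "decimalLatitude": 51.5,
        "decimalLongitude": -0.1,
    }
    print(to_csv([example]))
```

Because every provider uses the same column names, a consumer can merge files from different institutions without per-source mapping code, which is the interoperability benefit the paragraph above describes.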


Current activities

At the 2009 e-Biosphere conference in the U.K., the following themes were adopted, which indicate the broad range of current biodiversity informatics activities and how they might be categorized:
* Application: Conservation / Agriculture / Fisheries / Industry / Forestry
* Application: Invasive Alien Species
* Application: Systematic and Evolutionary Biology
* Application: Taxonomy and Identification Systems
* New Tools, Services and Standards for Data Management and Access
** New Modeling Tools
** New Tools for Data Integration
** New Approaches to Biodiversity Infrastructure
** New Approaches to Species Identification
** New Approaches to Mapping Biodiversity
* National and Regional Biodiversity Databases and Networks

A post-conference workshop of key persons with current significant biodiversity informatics roles also resulted in a Workshop Resolution that stressed, among other aspects, the need to create durable, global registries for the resources that are basic to biodiversity informatics (e.g., repositories, collections); complete the construction of a solid taxonomic infrastructure; and create ontologies for biodiversity data.


Example projects

Global:
* The Global Biodiversity Information Facility (GBIF), and the Ocean Biogeographic Information System (OBIS) (for marine species)
* The Species 2000, ITIS (Integrated Taxonomic Information System), and Catalogue of Life projects
* Global Names
* EOL, The Encyclopedia of Life project
* The Consortium for the Barcode of Life project
* The Map of Life project
* The Reptile Database project
* The AmphibiaWeb project
* The uBio Universal Biological Indexer and Organizer, from the Woods Hole Marine Biological Laboratory
* The Index to Organism Names (ION) from Clarivate Analytics, providing access to scientific names of taxa from numerous journals as indexed in the Zoological Record
* The Interim Register of Marine and Nonmarine Genera (IRMNG)
* ZooBank, the registry for nomenclatural acts and relevant systematic literature in zoology
* The Index Nominum Genericorum, compilation of generic names published for organisms covered by the International Code of Botanical Nomenclature, maintained at the Smithsonian Institution in the U.S.A.
* The International Plant Names Index
* MycoBank, documenting new names and combinations for fungi
* The List of Prokaryotic names with Standing in Nomenclature (LPSN) - Official register of valid names for bacteria and archaea, as governed by the International Code of Nomenclature of Bacteria
* The Biodiversity Heritage Library project - digitising biodiversity literature
* Wikispecies, open source (community-editable) compilation of taxonomic information, companion project to Wikipedia
* TaxonConcept.org, a Linked Data project that connects disparate species databases
* Instituto de Ciencias Naturales. Universidad Nacional de Colombia. Virtual Collections and Biodiversity Informatics Unit
* ANTABIF. The Antarctic Biodiversity Information Facility gives free and open access to Antarctic Biodiversity data, in the spirit of the Antarctic Treaty.
* Genesys, database of plant genetic resources maintained in national, regional and international gene banks
* VertNet, Access to vertebrate primary occurrence data from data sets worldwide.

Regional / national projects:
* Fauna Europaea
* Atlas of Living Australia
* Pan-European Species directories Infrastructure (PESI)
* Symbiota
* iDigBio, Integrated Digitized Biocollections (USA)
* i4Life project
* Sistema de Información sobre Biodiversidad de Colombia
* India Biodiversity Portal (IBP)
* Bhutan Biodiversity Portal (BBP)
* Weed Identification and Knowledge in the Western Indian Ocean (WIKWIO)
* LifeWatch is proposed by ESFRI as a pan-European research (e-)infrastructure to support Biodiversity research and policy-making.
* Vermont Atlas of Life

A listing of over 600 current biodiversity informatics related activities can be found at the TDWG "Biodiversity Information Projects of the World" database.


See also

* Web-based taxonomy
* List of biodiversity databases


External links


* Biodiversity Informatics (journal)