
Data publishing (also data publication) is the act of releasing research data in published form for use by others. It is the practice of preparing data or data sets for public use, making them available to everyone to use as they wish. This practice is an integral part of the open science movement. There is a large and multidisciplinary consensus on the benefits resulting from this practice. The main goal is to elevate data to first-class research outputs. A number of initiatives are underway, and there are points of consensus as well as issues still in contention.

There are several distinct ways to make research data available, including:

* publishing data as supplemental material associated with a research article, typically with the data files hosted by the publisher of the article
* hosting data on a publicly available website, with files available for download
* hosting data in a repository that has been developed to support data publication, e.g. figshare, Dryad, Dataverse, Zenodo. A large number of general and specialty (such as by research topic) data repositories exist. For example, the UK Data Service enables users to deposit data collections and re-share these for research purposes.
* publishing a data paper about the dataset, which may be published as a preprint, in a regular journal, or in a data journal that is dedicated to supporting data papers. The data may be hosted by the journal or hosted separately in a data repository.

Publishing data allows researchers both to make their data available for others to use and to have datasets cited similarly to other research publication types (such as articles or books), thereby enabling producers of datasets to gain academic credit for their work. The motivations for publishing data range from a desire to make research more accessible, to enabling citability of datasets, to research funder or publisher mandates that require open data publishing. The UK Data Service is one key organisation working with others to raise the importance of citing data correctly and helping researchers to do so.

Solutions to preserve privacy within data publishing have been proposed, including privacy protection algorithms, data "masking" methods, and regional privacy level calculation algorithms.
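
As a rough illustration of what a data "masking" step might look like before a dataset is published, the sketch below replaces a direct identifier with a salted hash and coarsens two quasi-identifiers. The field names (`name`, `age`, `postcode`, `response`), the salt, and the helper functions are hypothetical and chosen only for this example; real releases generally need a proper disclosure-risk assessment, and a salted hash alone should not be assumed to guarantee anonymity.

```python
import hashlib


def pseudonymise(value: str, salt: str) -> str:
    """Replace an identifier with a short salted SHA-256 digest."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return digest[:10]


def age_band(age: int, width: int = 10) -> str:
    """Generalise an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def mask_record(record: dict, salt: str) -> dict:
    """Return a copy of the record that is safer to publish (illustrative only)."""
    return {
        "participant": pseudonymise(record["name"], salt),
        "age_band": age_band(record["age"]),
        # Assumption: postcodes look like "AB1 2CD"; keep only the first part.
        "region": record["postcode"].split()[0],
        "response": record["response"],
    }


raw = {"name": "Jane Doe", "age": 37, "postcode": "AB1 2CD", "response": "yes"}
masked = mask_record(raw, salt="demo-salt")
print(masked["age_band"], masked["region"])
```

The salt would normally be kept private by the data producer, so that the pseudonyms cannot be recomputed from a list of known names.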


Methods for publishing data


Data files as supplementary material

A large number of journals and publishers support supplementary material being attached to research articles, including datasets. Though historically such material might have been distributed only by request or on microform to libraries, journals today typically host such material online. Supplementary material is available to subscribers to the journal or, if the article or journal is open access, to everyone.


Data repositories

There are a large number of data repositories, on both general and specialized topics. Many repositories are disciplinary repositories, focused on a particular research discipline; the UK Data Service, for example, is a trusted digital repository of social, economic and humanities data. Repositories may be free for researchers to upload their data or may charge a one-time or ongoing fee for hosting the data. These repositories offer a publicly accessible web interface for searching and browsing hosted datasets, and may include additional features such as a digital object identifier (DOI) for permanent citation of the data, and links to associated published papers and code.


Data papers

Data papers or data articles are “scholarly publication of a searchable metadata document describing a particular on-line accessible dataset, or a group of datasets, published in accordance to the standard academic practices”. Their final aim is to provide “information on the what, where, why, how and who of the data”. The intent of a data paper is to offer descriptive information on the related dataset(s), focusing on data collection, distinguishing features, access and potential reuse rather than on data processing and analysis. Because data papers are considered academic publications no different from other types of papers, they allow scientists sharing data to receive credit in a currency recognizable within the academic system, thus "making data sharing count". This not only provides an additional incentive to share data but also, through the peer review process, increases the quality of metadata and thus the reusability of the shared data. Data papers thus represent the scholarly communication approach to data sharing. Despite their potential, data papers are not a complete solution to all data sharing and reuse issues and, in some cases, they are considered to induce false expectations in the research community.


Data journals

Data papers are supported by a rich array of data journals, some of which are "pure", i.e. dedicated to publishing data papers only, while others (the majority) are "mixed", i.e. they publish a number of article types including data papers. A comprehensive survey on data journals is available. A non-exhaustive list of data journals has been compiled by staff at the University of Edinburgh. Examples of "pure" data journals are: ''Earth System Science Data'', ''Journal of Open Archaeology Data'', ''Open Health Data'', ''Polar Data Journal'', and ''Scientific Data''. Examples of "mixed" journals publishing data papers are: ''Biodiversity Data Journal'', ''F1000Research'', ''GigaScience'', ''GigaByte'', ''PLOS ONE'', and ''SpringerPlus''.


Data citation

Data citation is the provision of accurate, consistent and standardised referencing for datasets, just as bibliographic citations are provided for other published sources such as research articles or monographs. Typically the well-established Digital Object Identifier (DOI) approach is used, with DOIs taking users to a website that contains the metadata on the dataset and the dataset itself.
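
To make the idea concrete, the following sketch assembles a citation string for a dataset from a handful of descriptive metadata fields and turns its DOI into a resolvable link through the `https://doi.org/` resolver. The `DatasetRecord` class, its fields, the citation layout, and the DOI shown are assumptions made for illustration; they are not a prescribed standard, and the DOI is a placeholder rather than a real record.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class DatasetRecord:
    """Minimal descriptive metadata for a published dataset (illustrative fields)."""
    creators: Tuple[str, ...]
    year: int
    title: str
    publisher: str
    doi: str
    version: Optional[str] = None


def doi_url(doi: str) -> str:
    """Return the resolver URL for a DOI, tolerating a leading 'doi:' prefix."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    return f"https://doi.org/{cleaned}"


def format_citation(record: DatasetRecord) -> str:
    """Build a human-readable citation string from the record."""
    authors = "; ".join(record.creators)
    parts = [f"{authors} ({record.year})", record.title]
    if record.version:
        parts.append(f"Version {record.version}")
    parts.append(record.publisher)
    return ". ".join(parts) + ". " + doi_url(record.doi)


example = DatasetRecord(
    creators=("Doe, J.", "Roe, R."),
    year=2020,
    title="Example survey responses",
    publisher="Example Repository",
    doi="10.1234/example.5678",  # placeholder DOI, not a real record
    version="1.0",
)
print(format_citation(example))
```

Keeping the DOI as a separate field, rather than embedding a full URL in the metadata, lets the same record be rendered in whatever citation style a journal or repository asks for.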


History of development

A 2011 paper reported an inability to determine how often data citation happened in the social sciences. Papers from 2012 and 2013 reported that data citation was becoming more common but that the practice was not standardised. In 2014 FORCE11 published the Joint Declaration of Data Citation Principles covering the purpose, function and attributes of data citation. In October 2018 CrossRef expressed its support for cataloging datasets and recommending their citation. A popular data-oriented journal reported in April 2019 that it would now use data citations. A June 2019 paper suggested that increased data citation will make the practice more valuable for everyone by encouraging data sharing and also by increasing the prestige of people who share.

Data citation is an emerging topic in computer science and it has been defined as a computational problem. Indeed, citing data poses significant challenges to computer scientists, and the main problems to address are related to:

* the use of heterogeneous data models and formats, e.g., relational databases, Comma-Separated Values (CSV), Extensible Markup Language (XML), Resource Description Framework (RDF);
* the transience of data;
* the necessity to cite data at different levels of coarseness, i.e., deep citations;
* the necessity to automatically generate citations to data with variable granularity.
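
Two of these problems, transience and variable granularity, can be illustrated with a small sketch. It generates a citation for either a whole tabular dataset or a selected subset of its rows, and records an access date and a content fingerprint so that a later reader can tell whether the cited rows have changed. This is not the method of any particular citation system; the function names, the row layout, and the DOI are hypothetical.

```python
import hashlib
import json
from datetime import date


def fingerprint(rows):
    """Hash a canonical JSON encoding of the rows so later changes are detectable."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def cite_subset(title, doi, rows, criteria, accessed):
    """Cite only the rows matching `criteria` (a dict of column -> required value).

    An empty `criteria` dict cites the whole dataset.
    """
    subset = [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]
    selection = ", ".join(f"{k}={v}" for k, v in sorted(criteria.items())) or "all rows"
    return (
        f"{title}. Subset: {selection} ({len(subset)} rows). "
        f"Accessed {accessed.isoformat()}. Fingerprint {fingerprint(subset)}. "
        f"https://doi.org/{doi}"
    )


rows = [
    {"station": "A", "year": 2019, "temp_c": 11.2},
    {"station": "A", "year": 2020, "temp_c": 11.9},
    {"station": "B", "year": 2020, "temp_c": 9.4},
]
title = "Example station temperatures"
doi = "10.1234/example.temps"  # placeholder DOI, not a real record

whole = cite_subset(title, doi, rows, {}, date(2021, 1, 15))
deep = cite_subset(title, doi, rows, {"station": "A"}, date(2021, 1, 15))
print(whole)
print(deep)
```

The selection criteria play the role of the "deep" part of the citation: the DOI identifies the dataset as a whole, while the criteria and fingerprint pin down which portion, in which state, was actually used.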


See also

* Data archiving
* Disciplinary repository
* Open science data
* Registry of Research Data Repositories

