Data sharing (Science)

Data sharing is the practice of making data used for scholarly research available to other investigators. Many funding agencies, institutions, and publication venues have policies regarding data sharing because transparency and openness are considered by many to be part of the scientific method.

A number of funding agencies and science journals require authors of peer-reviewed papers to share any supplemental information (raw data, statistical methods, or source code) necessary to understand, develop, or reproduce published research. A great deal of scientific research is not subject to data sharing requirements, and many of these policies have liberal exceptions. In the absence of any binding requirement, data sharing is at the discretion of the scientists themselves. In addition, in certain situations governments and institutions prohibit or severely limit data sharing to protect proprietary interests, national security, and subject, patient, or victim confidentiality. Data sharing may also be restricted to protect institutions and scientists from the use of data for political purposes.

Data and methods may be requested from an author years after publication. In order to encourage data sharing and prevent the loss or corruption of data, a number of funding agencies and journals have established policies on data archiving. Access to publicly archived data is a recent development in the history of science, made possible by technological advances in communications and information technology. Taking full advantage of modern rapid communication may require consensual agreement on the criteria underlying mutual recognition of respective contributions. Models recognized for improving the timely sharing of data for a more effective response to emergent infectious disease threats include the data sharing mechanism introduced by the GISAID Initiative.

Despite policies on data sharing and archiving, data withholding still happens. Authors may fail to archive data, or may archive only a portion of it. Failure to archive data is not, by itself, data withholding; withholding occurs when a researcher requests additional information and the author refuses to provide it. Authors who withhold data in this way risk losing the trust of the scientific community. A 2022 study identified about 3,500 research papers that contained statements that the data were available but, after requesting and otherwise seeking the data, found that they were unavailable for 94% of those papers.

The term "data sharing" may also refer to the sharing of personal information on a social media platform.


U.S. government policies


Federal law

On August 9, 2007, President Bush signed the America COMPETES Act (the "America Creating Opportunities to Meaningfully Promote Excellence in Technology, Education, and Science Act"), which requires civilian federal agencies to provide guidelines, policies, and procedures to facilitate and optimize the open exchange of data and research among agencies, the public, and policymakers. See Section 1009 of the America COMPETES Act.


NIH data sharing policy

The NIH Final Statement on Sharing Research Data says:


NSF Policy from Grant General Conditions


Office of Research Integrity

Allegations of misconduct in medical research carry severe consequences. The United States Department of Health and Human Services established an office to oversee investigations of allegations of misconduct, including data withholding. The office's website defines its mission:


Ideals in data sharing

Some research organizations feel particularly strongly about data sharing. Stanford University's WaveLab follows a philosophy of reproducible research that calls for disclosing all algorithms and source code necessary to reproduce the research. In a paper titled "WaveLab and Reproducible Research," the authors describe some of the problems they encountered when trying to reproduce their own research after a period of time. In many cases, this was so difficult that they gave up. These experiences convinced them of the importance of disclosing source code. The philosophy is described as follows:

"The idea is: An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures."

The Data Observation Network for Earth (DataONE) and Data Conservancy are projects supported by the National Science Foundation to encourage and facilitate data sharing among research scientists and to better support meta-analysis. In the environmental sciences, the research community is recognizing that major scientific advances involving the integration of knowledge within and across fields will require researchers to overcome not only the technological barriers to data sharing but also the historically entrenched institutional and sociological barriers. Dr. Richard J. Hodes, director of the National Institute on Aging, has stated that "the old model in which researchers jealously guarded their data is no longer applicable".

The Alliance for Taxpayer Access is a group of organizations that support open access to government-sponsored research. The group has published a "Statement of Principles" explaining why its members believe open access is important, and it also lists a number of international public access policies.

Open access matters nowhere more than in the timely communication of essential information needed to respond effectively to health emergencies. While public-domain archives have been embraced for depositing data, mainly after formal publication, they have failed to encourage rapid data sharing during health emergencies such as the Ebola and Zika outbreaks. More clearly defined principles, such as those adopted by the GISAID Initiative to counter emergent threats from influenza, are required to recognize the interests of those generating the data while permitting free, unencumbered access to and use of the data before publication, for research and practical application.


International policies


* Australia
* Europe: Commission of European Communities
* Germany
* United Kingdom
* 'Omic Data Sharing: a list of policies of major science funders
* FAIRsharing.org Catalogue of Data Policies
* India: National Data Sharing and Accessibility Policy, Government of India


Data sharing problems in academia


Genetics

Withholding of data has become so commonplace in genetics that researchers at Massachusetts General Hospital published a journal article on the subject. The study found that "Because they were denied access to data, 28% of geneticists reported that they had been unable to confirm published research."


Psychology

A 2006 study found that, of 141 authors of empirical articles published by the American Psychological Association (APA), 103 (73%) did not respond with their data over a six-month period. A follow-up study published in 2015 found that 246 of 394 contacted authors of papers in APA journals (62%) did not share their data upon request.


Archaeology

A 2018 study examined a random sample of 48 articles published from February to May 2017 in the Journal of Archaeological Science and found openly available raw data for 18 papers (53%), with compositional and dating data being the most frequently shared types. The same study also emailed the authors of articles on experiments with stone artifacts published between 2009 and 2015 to request data relating to those publications. The researchers contacted the authors of 23 articles and received 15 replies, a 70% response rate. Five of the responses included data files, giving an overall sharing rate of 20%.


Scientists in training

A study of scientists in training indicated that many had already experienced data withholding. This finding has raised concern that the next generation of scientists will not abide by established data-sharing practices.


Differing approaches in different fields

Requirements for data sharing are more commonly imposed by institutions, funding agencies, and publication venues in the medical and biological sciences than in the physical sciences. Requirements vary widely regarding whether data must be shared at all, with whom the data must be shared, and who must bear the expense of data sharing. Funding agencies such as the NIH and NSF tend to require greater sharing of data, but even these requirements tend to acknowledge concerns about patient confidentiality, the costs incurred in sharing data, and the legitimacy of the request. Private interests and public agencies with national security interests (such as defense and law enforcement) often discourage sharing of data and methods through non-disclosure agreements.

Data sharing poses specific challenges in participatory monitoring initiatives, for example where forest communities collect data on local social and environmental conditions. In this case, a rights-based approach to developing data-sharing protocols can be based on the principles of free, prior and informed consent, and can prioritise protecting the rights of those who generated the data and/or those potentially affected by data sharing (D Sabogal. 2015. Data sharing in community-based forest monitoring: lessons from Guyana. Global Canopy Programme. http://forestcompass.org/how/resources/data-sharing-community-based-forest-monitoring-lessons-guyana).


See also

* Data archive
* Data dissemination
* Data privacy
* Data publishing
* Data citation
* FAIR data
* File sharing
* Information sharing
* Open data
* Registry of Research Data Repositories




Literature

— discusses the international exchange of data in the natural sciences.


External links

* "The Selfish Gene: Data Sharing and Withholding in Academic Genetics" by Eric Campbell and David Blumenthal, published May 31, 2002.

― American Psychological Association
* The Public Domain of Digital Research Data
* WaveLab and Reproducible Research, by Jonathan B. Buckheit and David L. Donoho of Stanford University
* The Role of Data and Program Code Archives in the Future of Economic Research, published by the Federal Reserve Bank of St. Louis
* Ecological Society of America data sharing and archiving initiative (archived 2008-03-10 at https://web.archive.org/web/20080310171037/http://www.esa.org/science_resources/datasharing.php)
* FAIRsharing.org: a website on data sharing and data policies in biology
* UK Data Archive: Manage and Share data
* Data Management Plan Resources and Examples, Inter-university Consortium for Political and Social Research
* DataONE
* Scientific Data (Nature): an open-access, online-only publication for descriptions of scientifically valuable datasets