COVID-19 datasets are public databases for sharing case data and medical information related to the COVID-19 pandemic.
Aggregate statistics
United States
Volunteer/non-government
U.S. Department of Health & Human Services
Global
* Johns Hopkins Coronavirus Resource Center: global aggregated data including cases, testing, contact tracing, and vaccine development.
* World Health Organization (WHO) Coronavirus Disease Dashboard: a database of confirmed cases and deaths reported globally and broken down by region. This database is part of the WHO Health Data Platform.
* COVID-19 Africa Open Data Project: a volunteer-run database and dashboard reporting region-, country-, and district-level case counts, deaths, healthcare worker infections, healthcare services and urgent needs.
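Dashboards of this kind generally publish per-location records that are then rolled up by region. The following is a minimal sketch of such a roll-up, assuming hypothetical records with `region`, `country`, `cases`, and `deaths` fields; the field names and figures are illustrative only and are not taken from any of the sources listed above.

```python
from collections import defaultdict

# Hypothetical, made-up records; real dashboards define their own schemas.
records = [
    {"region": "Africa", "country": "A", "cases": 120, "deaths": 3},
    {"region": "Africa", "country": "B", "cases": 80, "deaths": 1},
    {"region": "Europe", "country": "C", "cases": 200, "deaths": 10},
]


def aggregate_by_region(rows):
    """Sum confirmed cases and deaths for each region."""
    totals = defaultdict(lambda: {"cases": 0, "deaths": 0})
    for row in rows:
        totals[row["region"]]["cases"] += row["cases"]
        totals[row["region"]]["deaths"] += row["deaths"]
    return dict(totals)


print(aggregate_by_region(records))
```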
Data hubs
* Health Data Research UK provides a searchable registry of health data resources from the United Kingdom, including COVID-19 related datasets.
* NIH Open Access Datasets: The National Institutes of Health provides open-access data and computational resources related to COVID-19.
* COVID-19 Open Research Dataset (CORD-19): The Semantic Scholar project of the Allen Institute for AI hosts CORD-19, a public dataset of academic articles about COVID-19 and related research. The dataset is updated daily and includes both peer-reviewed articles and preprints. CORD-19 was originally released on March 16, 2020, by researchers and leaders from the Allen Institute for AI, Chan Zuckerberg Initiative, Georgetown University's Center for Security and Emerging Technology, Microsoft, and the National Library of Medicine. The dataset is created through the use of text mining of the current research literature.
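As a rough illustration of what text mining over a literature corpus involves, the sketch below counts term frequencies across a few made-up abstracts. It is not the CORD-19 pipeline; the sample texts, the stop-word list, and the function name are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "for", "with", "is"}

# Made-up abstracts standing in for article text.
abstracts = [
    "Transmission of the virus in hospital settings.",
    "Vaccine candidates for the virus and immune response.",
]


def term_frequencies(texts):
    """Count how often each non-stop-word token appears across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z0-9-]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


freqs = term_frequencies(abstracts)
print(freqs.most_common(3))
```

Real pipelines usually go further, for example by extracting named entities or linking articles by citation, but the tokenise-filter-count pattern is a common starting point.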
Topic-specific and special-interest resources
Genomics
* Consensus genome data for SARS-CoV-2 is available through GISAID for registered users and included in an interactive phylogenetic tree dashboard on Nextstrain, an open-source pathogen genome data project.
Imaging (Radiology)
* Characteristic imaging features on chest radiographs and computed tomography (CT) of people who are symptomatic include asymmetric peripheral ground-glass opacities without pleural effusions.
The University of Montreal and Mila created the "COVID-19 Image Data Collection" in March 2020, a public data repository of chest imaging.
The Medical Imaging Databank in Valencian Region released a large dataset of chest imaging from Spain.
The Italian Radiological Society is compiling an international online database of imaging findings for confirmed cases.
Online radiology case-sharing platforms such as Eurorad and Radiopaedia are also used to share COVID-19 case data and imaging.