S2CID
   HOME

TheInfoList



OR:

Semantic Scholar is an artificial intelligence–powered research tool for scientific literature developed at the Allen Institute for AI and publicly released in November 2015. It uses advances in
natural language processing Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to pro ...
to provide summaries for scholarly papers. The Semantic Scholar team is actively researching the use of artificial-intelligence in
natural language processing Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to pro ...
, machine learning, Human-Computer interaction, and
information retrieval Information retrieval (IR) in computing and information science is the process of obtaining information system resources that are relevant to an information need from a collection of those resources. Searches can be based on full-text or other co ...
. Semantic Scholar began as a database surrounding the topics of computer science, geoscience, and neuroscience. However, in 2017 the system began including
biomedical literature Medical literature is the scientific literature of medicine: articles in journals and texts in books devoted to the field of medicine. Many references to the medical literature include the health care literature generally, including that of denti ...
in its corpus. As of September 2022, they now include over 200 million publications from all fields of science.


Technology

Semantic Scholar provides a one-sentence summary of scientific literature. One of its aims was to address the challenge of reading numerous titles and lengthy abstracts on mobile devices. It also seeks to ensure that the three million scientific papers published yearly reach readers, since it is estimated that only half of this literature are ever read. Artificial intelligence is used to capture the essence of a paper, generating it through an "abstractive" technique. The project uses a combination of machine learning,
natural language processing Natural language processing (NLP) is an interdisciplinary subfield of linguistics, computer science, and artificial intelligence concerned with the interactions between computers and human language, in particular how to program computers to pro ...
, and machine vision to add a layer of semantic analysis to the traditional methods of citation analysis, and to extract relevant figures, tables, entities, and venues from papers. In contrast with Google Scholar and PubMed, Semantic Scholar is designed to highlight the most important and influential elements of a paper. The AI technology is designed to identify hidden connections and links between research topics. Like the previously cited search engines, Semantic Scholar also exploits graph structures, which include the Microsoft Academic Knowledge Graph, Springer Nature's SciGraph, and the Semantic Scholar Corpus. Each paper hosted by Semantic Scholar is assigned a unique
identifier An identifier is a name that identifies (that is, labels the identity of) either a unique object or a unique ''class'' of objects, where the "object" or class may be an idea, physical countable object (or class thereof), or physical noncountable ...
called the Semantic Scholar Corpus ID (abbreviated S2CID). The following entry is an example: :: Semantic Scholar is free to use and unlike similar search engines (i.e. Google Scholar) does not search for material that is behind a
paywall A paywall is a method of restricting access to content, with a purchase or a paid subscription, especially news. Beginning in the mid-2010s, newspapers started implementing paywalls on their websites as a way to increase revenue after years of ...
. One study compared the search abilities of Semantic Scholar through a systematic approach, and found the search engine to be 98.88% accurate when attempting to uncover the data. The same study examined other Semantic Scholar functions, including tools to survey
metadata Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive metadata – the descriptive ...
as well as several citation tools.


Number of users and publications

As of January 2018, following a 2017 project that added biomedical papers and topic summaries, the Semantic Scholar corpus included more than 40 million papers from computer science and biomedicine. In March 2018, Doug Raymond, who developed machine learning initiatives for the Amazon Alexa platform, was hired to lead the Semantic Scholar project. As of August 2019, the number of included papers metadata (not the actual PDFs) had grown to more than 173 million after the addition of the Microsoft Academic Graph records. In 2020, a partnership between Semantic Scholar and the University of Chicago Press Journals made all articles published under the University of Chicago Press available in the Semantic Scholar corpus. At the end of 2020, Semantic Scholar had indexed 190 million papers. In 2020, users of Semantic Scholar reached seven million a month.


See also

* * * *
List of academic databases and search engines This article contains a representative list of notable databases and search engines useful in an academic setting for finding and accessing articles in academic journals, institutional repositories, archives, or other collections of scientific and ...
*


References


External links

* {{Authority control Bibliographic databases in computer science Scholarly search services Applications of artificial intelligence