Google Scholar is a freely accessible web search engine that indexes the full text or metadata of scholarly literature across an array of publishing formats and disciplines. Released in beta in November 2004, the Google Scholar index includes peer-reviewed online academic journals and books, conference papers, theses and dissertations, preprints, abstracts, technical reports, and other scholarly literature, including court opinions and patents.
Google Scholar uses a web crawler, or web robot, to identify files for inclusion in the search results. For content to be indexed in Google Scholar, it must meet certain specified criteria. A 2014 statistical estimate published in PLOS One, using a mark and recapture method, put Google Scholar's coverage at approximately 80–90% of all articles published in English, or about 100 million documents (''Trend Watch'' (2014), Nature 509(7501), 405, discussing Madian Khabsa and C Lee Giles (2014), ''The Number of Scholarly Documents on the Public Web'', PLOS One 9, e93949). This estimate also determined how many documents were freely available on the internet. Google Scholar has been criticized for not vetting journals and for including predatory journals in its index.
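The mark and recapture idea mentioned above can be illustrated with the Lincoln-Petersen estimator, the simplest formula of that family. The sketch below is an illustration under stated assumptions, not the cited study's actual procedure: the function name and all counts are hypothetical. Two independent indexes each "capture" part of an unknown population of documents; the size of their overlap gives an estimate of the total, and from that the coverage of either index.

```python
def lincoln_petersen(n_first: int, n_second: int, n_both: int) -> float:
    """Estimate total population size from two independent samples.

    n_first  -- items found by the first index (the "marked" set)
    n_second -- items found by the second index (the "recapture" sample)
    n_both   -- items found by both indexes
    """
    if n_both <= 0:
        raise ValueError("the two samples must overlap to form an estimate")
    return n_first * n_second / n_both


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
index_a = 900   # documents found in index A
index_b = 500   # documents found in index B
overlap = 450   # documents found in both

total = lincoln_petersen(index_a, index_b, overlap)
coverage_a = index_a / total

print(f"estimated total documents: {total:.0f}")
print(f"estimated coverage of index A: {coverage_a:.0%}")
```

The estimate is only as good as the assumption that the two indexes sample documents independently, which is why such figures are usually reported as ranges.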
The University of Michigan Library and other libraries whose collections Google scanned for Google Books and Google Scholar retained copies of the scans and have used them to create the HathiTrust Digital Library.
History
Google Scholar arose out of a discussion between Alex Verstak and Anurag Acharya, both of whom were then working on building Google's main web index. Their goal was to "make the world's problem solvers 10% more efficient" (Steven Levy (2015), "The gentleman who made Scholar", "Back channel" on Medium) by allowing easier and more accurate access to scientific knowledge. This goal is reflected in Google Scholar's advertising slogan "Stand on the shoulders of giants", which was taken from an idea attributed to Bernard of Chartres and quoted by Isaac Newton. The slogan is a nod to the scholars who have contributed to their fields over the centuries, providing the foundation for new intellectual achievements. One of the original sources for the texts in Google Scholar is the University of Michigan's print collection.
Scholar has gained a range of features over time. In 2006, a citation importing feature was implemented supporting bibliography managers, such as RefWorks, RefMan, EndNote, and BibTeX. In 2007, Acharya announced that Google Scholar had started a program to digitize and host journal articles in agreement with their publishers, an effort separate from Google Books, whose scans of older journals do not include the metadata required for identifying specific articles in specific issues. In 2011, Google removed Scholar from the toolbars on its search pages, making it both less easily accessible and less discoverable for users not already aware of its existence. Around this period, sites with similar features such as CiteSeer, Scirus, and Microsoft Windows Live Academic search were developed. Some of these are now defunct; in 2016, Microsoft launched a new competitor, Microsoft Academic.
A major enhancement was rolled out in 2012, with the possibility for individual scholars to create personal "Scholar Citations profiles" (Alex Verstak, "Fresh Look of Scholar Profiles", Google Scholar Blog, August 21, 2014). A feature introduced in November 2013 allows logged-in users to save search results into the "Google Scholar library", a personal collection which the user can search separately and organize by tags. The "metrics" button reveals the top journals in a field of interest, and the articles generating these journals' impact can also be accessed. A metrics feature now supports viewing the impact of whole fields of science, as well as academic journals.
Features and specifications
Google Scholar allows users to search for digital or physical copies of articles, whether online or in libraries. It indexes "full-text journal articles, technical reports, preprints, theses, books, and other documents, including selected Web pages that are deemed to be 'scholarly.'" Because many of Google Scholar's search results link to commercial journal articles, most people will be able to access only an abstract and the citation details of an article, and have to pay a fee to access the entire article. The most relevant results for the searched keywords will be listed first, in order of the author's ranking, the number of references that are linked to it and their relevance to other scholarly literature, and the ranking of the publication in which the article appears.
Groups and access to literature
Using its "group of" feature, Google Scholar shows the available links to journal articles. In the 2005 version, this feature provided a link to both subscription-access versions of an article and to free full-text versions of articles; for most of 2006, it provided links to only the publishers' versions. Since December 2006, it has provided links to both published versions and major open access repositories, including all those posted on individual faculty web pages and other unstructured sources identified by similarity. On the other hand, Google Scholar does not allow users to filter explicitly between toll access and open access resources, a feature offered by Unpaywall and the tools which embed its data, such as Web of Science, Scopus and Unpaywall Journals, used by libraries to calculate the real costs and value of their collections.
Citation analysis and tools
Through its "cited by" feature, Google Scholar provides access to abstracts of articles that have cited the article being viewed. It is this feature in particular that provides the citation indexing previously only found in CiteSeer, Scopus, and Web of Science. Google Scholar also provides links so that citations can be either copied in various formats or imported into user-chosen reference managers such as Zotero.
"Scholar Citations profiles" are public author profiles that are editable by authors themselves. Individuals, logging on through a Google account with a bona fide address usually linked to an academic institution, can now create their own page giving their fields of interest and citations. Google Scholar automatically calculates and displays the individual's total citation count, h-index, and i10-index. According to Google, "three-quarters of Scholar search results pages ... show links to the authors' public profiles" as of August 2014.
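The profile metrics named above have simple standard definitions: the h-index is the largest h such that h of an author's papers have at least h citations each, and the i10-index is the number of papers with at least 10 citations. The following minimal sketch computes both from a list of per-paper citation counts. The function names and the sample counts are hypothetical, and the sketch makes no claim about how Google Scholar implements the calculation internally.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def i10_index(citations: list[int]) -> int:
    """Number of papers with at least 10 citations."""
    return sum(1 for count in citations if count >= 10)


# Hypothetical per-paper citation counts for one author profile.
paper_citations = [48, 22, 15, 10, 9, 4, 4, 1, 0]

print("total citations:", sum(paper_citations))
print("h-index:", h_index(paper_citations))
print("i10-index:", i10_index(paper_citations))
```

Because both metrics depend only on citation counts, any error or manipulation in those counts carries straight through to the displayed values.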
Related articles
Through its "Related articles" feature, Google Scholar presents a list of closely related articles, ranked primarily by how similar these articles are to the original result, but also taking into account the relevance of each paper.
US legal case database
Google Scholar's legal database of US cases is extensive. Users can search and read published opinions of US state appellate and supreme court cases since 1950, US federal district, appellate, tax, and bankruptcy courts since 1923, and US Supreme Court cases since 1791. Google Scholar embeds clickable citation links within each case, and the "How Cited" tab allows lawyers to research prior case law and the subsequent citations to the court decision.
Ranking algorithm
While most academic databases and search engines allow users to select one factor (e.g. relevance, citation counts, or publication date) to rank results, Google Scholar ranks results with a combined ranking algorithm in a "way researchers do, weighing the full text of each article, the author, the publication in which the article appears, and how often the piece has been cited in other scholarly literature". Research has shown that Google Scholar puts high weight especially on citation counts, as well as words included in a document's title (Jöran Beel and Bela Gipp, "Google Scholar's Ranking Algorithm: An Introductory Overview", in Birger Larsen and Jacqueline Leta, editors, Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI'09), vol. 1, pp. 230–41, Rio de Janeiro, July 2009, International Society for Scientometrics and Informetrics). In searches by author or year, the first search results are often highly cited articles, as the number of citations is the dominant factor, whereas in keyword searches the number of citations is probably the factor with the most weight, but other factors also play a part.
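To make the contrast between single-factor sorting and a combined score concrete, the toy sketch below ranks a few hypothetical records both ways. Everything in it is an assumption for illustration: Google does not publish its ranking formula, so the `Paper` fields, the weights, and the logarithmic damping of citation counts are invented here and are not Google Scholar's algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    citations: int
    text_match: float   # hypothetical 0..1 score for full-text relevance
    title_match: float  # hypothetical 0..1 score for query words in the title


def combined_score(p: Paper) -> float:
    """Illustrative weighted sum; the weights are made up for this example."""
    citation_signal = math.log1p(p.citations)  # damp very large counts
    return 0.5 * citation_signal + 2.0 * p.title_match + 1.0 * p.text_match


papers = [
    Paper("Old classic", citations=5000, text_match=0.2, title_match=0.0),
    Paper("Exact topic, well cited", citations=300, text_match=0.9, title_match=1.0),
    Paper("Exact topic, brand new", citations=0, text_match=1.0, title_match=1.0),
]

# Single-factor ranking: sort by citation count alone.
by_citations = sorted(papers, key=lambda p: p.citations, reverse=True)
# Combined ranking: sort by the illustrative weighted score.
by_combined = sorted(papers, key=combined_score, reverse=True)

print("single factor:", [p.title for p in by_citations])
print("combined:     ", [p.title for p in by_combined])
```

With these made-up weights the heavily cited but off-topic record no longer comes first, while the uncited new paper still ranks last, which is the kind of behavior a strong weight on citation counts would produce.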
Limitations and criticism
Some searchers found Google Scholar to be of comparable quality and utility to subscription-based databases when looking at citations of articles in some specific journals. The reviews recognize that its "cited by" feature in particular poses serious competition to Scopus and Web of Science. A study looking at the biomedical field found citation information in Google Scholar to be "sometimes inadequate, and less often updated". The coverage of Google Scholar may vary by discipline compared to other general databases. Google Scholar strives to include as many journals as possible, including predatory journals, which may lack academic rigor. Specialists on predatory journals say that these kinds of journals "have polluted the global scientific record with pseudo-science" that "Google Scholar dutifully and perhaps blindly includes in its central index."
Google Scholar does not publish a list of journals crawled or publishers included, and the frequency of its updates is uncertain. Bibliometric evidence suggests Google Scholar's coverage of the sciences and social sciences is competitive with other academic databases; as of 2017, Scholar's coverage of the arts and humanities had not been investigated empirically and Scholar's utility for disciplines in these fields remained ambiguous. Especially early on, some publishers did not allow Scholar to crawl their journals. Elsevier journals have been included since mid-2007, when Elsevier began to make most of its ScienceDirect content available to Google Scholar and Google's web search. However, a 2014 study estimates that Google Scholar can find almost 90% (approximately 100 million) of all scholarly documents on the Web written in English. Large-scale longitudinal studies have found that between 40 and 60 percent of scientific articles are available in full text via Google Scholar links.
Google Scholar puts high weight on citation counts in its ranking algorithm and has therefore been criticized for strengthening the Matthew effect: as highly cited papers appear in top positions they gain more citations, while new papers hardly appear in top positions and therefore get less attention from the users of Google Scholar and hence fewer citations. The Google Scholar effect is a phenomenon in which some researchers pick and cite works appearing in the top results on Google Scholar regardless of their contribution to the citing publication, because they automatically assume these works' credibility and believe that editors, reviewers, and readers expect to see these citations. Google Scholar has problems identifying publications on the arXiv preprint server correctly. Punctuation characters in titles produce wrong search results, and authors are assigned to wrong papers, which leads to erroneous additional search results. Some search results are even given without any comprehensible reason.
Google Scholar is vulnerable to spam ("On the Robustness of Google Scholar against Spam"; "Scholarly Open Access – Did A Romanian Researcher Successfully Game Google Scholar to Raise his Citation Count?"). Researchers from the University of California, Berkeley and Otto-von-Guericke University Magdeburg demonstrated that citation counts on Google Scholar can be manipulated and that complete nonsense articles created with SCIgen were indexed within Google Scholar. These researchers concluded that citation counts from Google Scholar should be used with care, especially when used to calculate performance metrics such as the h-index or impact factor, which is in itself a poor predictor of article quality. Google Scholar started computing an h-index in 2012 with the advent of individual Scholar pages. Several downstream packages like ''Harzing's Publish or Perish'' also use its data. The practicality of manipulating h-index calculators by spoofing Google Scholar was demonstrated in 2010 by Cyril Labbe from Joseph Fourier University, who managed to rank "Ike Antkare" ahead of Albert Einstein by means of a large set of SCIgen-produced documents citing each other (effectively an academic link farm).
As of 2010, Google Scholar was not able to shepardize case law, as Lexis could.
Unlike other indexes of academic work such as Scopus and Web of Science, Google Scholar does not maintain an Application Programming Interface that may be used to automate data retrieval. Use of web scrapers to obtain the contents of search results is also severely restricted by the implementation of CAPTCHAs. Google Scholar does not display or export Digital Object Identifiers (DOIs), a ''de facto'' standard implemented by all major academic publishers to uniquely identify and refer to individual pieces of academic work.
Search engine optimization for Google Scholar
Search engine optimization (SEO) for traditional web search engines such as Google has been popular for many years. For several years, SEO has also been applied to academic search engines such as Google Scholar. SEO for academic articles is also called "academic search engine optimization" (ASEO) and defined as "the creation, publication, and modification of scholarly literature in a way that makes it easier for academic search engines to both crawl it and index it". ASEO has been adopted by several organizations, among them Elsevier, OpenScience, Mendeley, and SAGE Publishing, to optimize their articles' rankings in Google Scholar. ASEO also has drawbacks.
See also
* Bibliometrics
* List of academic databases and search engines
* Open-access repository
Further reading
* Jensenius, F., Htun, M., Samuels, D., Singer, D., Lawrence, A., & Chwe, M. (2018). The Benefits and Pitfalls of Google Scholar. ''PS: Political Science & Politics'', 51(4), 820–824.
External links
* Google Scholar
* Google Scholar Blog