Citation analysis is the examination of the frequency, patterns, and graphs of citations in documents. It uses the directed graph of citations (links from one document to another) to reveal properties of the documents. A typical aim is to identify the most important documents in a collection. A classic example is that of the citations between academic articles and books. For another example, judges support their judgements by referring back to judgements made in earlier cases (see citation analysis in a legal context). An additional example is provided by patents, which contain prior art: citations of earlier patents relevant to the current claim.

Documents can be associated with many other features in addition to citations, such as authors, publishers, journals, and their actual texts. The general analysis of collections of documents is known as bibliometrics, and citation analysis is a key part of that field. For example, bibliographic coupling and co-citation are association measures based on citation analysis (shared references and shared citations, respectively). The citations in a collection of documents can also be represented in forms such as a citation graph, as pointed out by Derek J. de Solla Price in his 1965 article "Networks of Scientific Papers". This means that citation analysis draws on aspects of social network analysis and network science.

An early example of automated citation indexing was CiteSeer, which was used for citations between academic papers, while Web of Science is an example of a modern system that covers more than just academic books and articles, reflecting a wider range of information sources. Today, automated citation indexing has changed the nature of citation analysis research, allowing millions of citations to be analyzed for large-scale patterns and knowledge discovery.

Citation analysis tools can be used to compute various impact measures for scholars based on data from citation indices. These have various applications, from the identification of expert referees to review papers and grant proposals, to providing transparent data in support of academic merit review, tenure, and promotion decisions. This competition for limited resources may lead to ethically questionable behavior to increase citations.

A great deal of criticism has been made of the practice of naively using citation analyses to compare the impact of different scholarly articles without taking into account other factors that may affect citation patterns. Among these criticisms, a recurrent one focuses on "field-dependent factors", which refers to the fact that citation practices vary from one area of science to another, and even between fields of research within a discipline.
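
The graph view described above can be made concrete with a small sketch. The example below is only illustrative: the five-document collection and the function names are hypothetical, and the citation graph is assumed to be stored as a plain dictionary that maps each document to the list of documents it cites. It computes citation counts (in-degree), bibliographic coupling (shared references), and co-citation (shared citing documents).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy collection: each key cites the documents in its list.
citations = {
    "A": ["C", "D"],
    "B": ["C", "D", "E"],
    "C": ["E"],
    "D": [],
    "E": [],
}


def citation_counts(graph):
    """Return the in-degree of each document: how many documents cite it."""
    counts = {doc: 0 for doc in graph}
    for cited_list in graph.values():
        for cited in set(cited_list):  # ignore duplicate references
            counts[cited] = counts.get(cited, 0) + 1
    return counts


def bibliographic_coupling(graph, a, b):
    """Return the number of references that documents a and b share."""
    return len(set(graph.get(a, [])) & set(graph.get(b, [])))


def cocitation_counts(graph):
    """For each unordered pair of documents, count the documents citing both."""
    pairs = defaultdict(int)
    for cited_list in graph.values():
        for x, y in combinations(sorted(set(cited_list)), 2):
            pairs[(x, y)] += 1
    return dict(pairs)


if __name__ == "__main__":
    print(citation_counts(citations))
    print(bibliographic_coupling(citations, "A", "B"))
    print(cocitation_counts(citations))
```

In this toy collection, A and B are bibliographically coupled because they share the references C and D, while C and D are co-cited because both A and B cite them. Real citation indexes work at a far larger scale and must first resolve which reference strings point to the same document, which this sketch does not attempt.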


Overview

While citation indexes were originally designed for information retrieval, they are increasingly used for bibliometrics and other studies involving research evaluation. Citation data is also the basis of the popular journal impact factor.

There is a large body of literature on citation analysis, sometimes called scientometrics, a term invented by Vasily Nalimov, or more specifically bibliometrics. The field blossomed with the advent of the Science Citation Index, which now covers source literature from 1900 on. The leading journals of the field are Scientometrics, Informetrics, and the Journal of the Association for Information Science and Technology. ASIST also hosts an electronic mailing list called SIGMETRICS at ASIST. This method is undergoing a resurgence based on the wide dissemination of the Web of Science and Scopus subscription databases in many universities, and the universally available free citation tools such as CiteBase, CiteSeerX, Google Scholar, and the former Windows Live Academic (later relaunched with extra features as Microsoft Academic).

Methods of citation analysis research include qualitative, quantitative and computational approaches. The main foci of such scientometric studies have included productivity comparisons, institutional research rankings, journal rankings, establishing faculty productivity and tenure standards, assessing the influence of top scholarly articles, tracing the development trajectory of a science or technology field, and developing profiles of top authors and institutions in terms of research performance.
Legal citation analysis is a citation analysis technique for analyzing legal documents. It facilitates the understanding of inter-related regulatory compliance documents by exploring the citations that connect provisions to other provisions within the same document or between different documents. Legal citation analysis uses a citation graph extracted from a regulatory document, which could supplement e-discovery, a process that leverages technological innovations in big data analytics.


History

In a 1965 paper, Derek J. de Solla Price described the inherent linking characteristic of the Science Citation Index (SCI) as "Networks of Scientific Papers". The links between citing and cited papers became dynamic when the SCI began to be published online. The Social Sciences Citation Index became one of the first databases to be mounted on the Dialog system in 1972. With the advent of the CD-ROM edition, linking became even easier and enabled the use of bibliographic coupling for finding related records. In 1973, Henry Small published his classic work on co-citation analysis, which became a self-organizing classification system that led to document clustering experiments and eventually an "Atlas of Science", later called "Research Reviews".

The inherent topological and graphical nature of the worldwide citation network, which is an inherent property of the scientific literature, was described by Ralph Garner (Drexel University) in 1965.

The use of citation counts to rank journals was a technique used in the early part of the nineteenth century, but the systematic ongoing measurement of these counts for scientific journals was initiated by Eugene Garfield at the Institute for Scientific Information, who also pioneered the use of these counts to rank authors and papers. In a landmark paper of 1965, he and Irving Sher showed the correlation between citation frequency and eminence by demonstrating that Nobel Prize winners published five times the average number of papers, while their work was cited 30 to 50 times the average. In a long series of essays on the Nobel and other prizes, Garfield reported this phenomenon. The usual summary measure is known as the impact factor: the number of citations in a given year to a journal's articles from the previous two years, divided by the number of articles published in those years. It is widely used, for both appropriate and inappropriate purposes; in particular, the use of this measure alone for ranking authors and papers is quite controversial.

In an early study in 1964 of the use of citation analysis in writing the history of DNA, Garfield and Sher demonstrated the potential for generating historiographs, topological maps of the most important steps in the history of scientific topics. This work was later automated by E. Garfield, A. I. Pudovkin of the Institute of Marine Biology, Russian Academy of Sciences, and V. S. Istomin of the Center for Teaching, Learning, and Technology, Washington State University, and led to the creation of the HistCite software around 2002.

Automatic citation indexing was introduced in 1998 by Lee Giles, Steve Lawrence and Kurt Bollacker and enabled automatic algorithmic extraction and grouping of citations for any digital academic and scientific document. Where previous citation extraction was a manual process, citation measures could now scale up and be computed for any scholarly and scientific field and document venue, not just those selected by organizations such as ISI. This led to the creation of new systems for public and automated citation indexing, the first being CiteSeer (now CiteSeerX), soon followed by Cora, which focused primarily on the fields of computer science and information science. These were later followed by large-scale academic domain citation systems such as Google Scholar and Microsoft Academic. Such autonomous citation indexing is not yet perfect in citation extraction or citation clustering, with an error rate estimated by some at 10%, though a careful statistical sampling has yet to be done. This has resulted in place names such as Ann Arbor, Milton Keynes, and Walton Hall being credited as authors with extensive academic output. SCI claims to create automatic citation indexing through purely programmatic methods. Even the older records have a similar magnitude of error.
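
The impact factor arithmetic described above can be sketched in a few lines. This is a minimal illustration rather than the official calculation: the function name, the dictionary layout, and the sample figures are all hypothetical, and the rules for deciding which items count as citable articles are not modelled.

```python
def impact_factor(year, cites_by_pub_year, articles_by_year):
    """Two-year impact factor sketch for a single journal.

    cites_by_pub_year[y] is the number of citations received during `year`
    to items the journal published in year y (hypothetical input layout).
    articles_by_year[y] is the number of articles published in year y.
    """
    window = (year - 1, year - 2)  # the two preceding publication years
    cites = sum(cites_by_pub_year.get(y, 0) for y in window)
    articles = sum(articles_by_year.get(y, 0) for y in window)
    if articles == 0:
        raise ValueError("no articles published in the two-year window")
    return cites / articles


# Hypothetical figures for one journal, evaluated for the year 2020.
cites_2020 = {2019: 150, 2018: 250}
articles = {2019: 80, 2018: 120}

if __name__ == "__main__":
    print(impact_factor(2020, cites_2020, articles))  # 400 citations / 200 articles
```

Because the result is a mean over a whole journal, it says little about any individual paper or author, which is one reason that using it alone to rank authors and papers is controversial.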


Citation impact


Citation analysis for legal documents

Citation analysis for legal documents is an approach to facilitate the understanding and analysis of inter-related regulatory compliance documents by exploring the citations that connect provisions to other provisions within the same document or between different documents. Citation analysis uses a citation graph extracted from a regulatory document, which could supplement e-discovery, a process that leverages technological innovations in big data analytics.
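
As a rough illustration of extracting a citation graph from a regulatory document, the sketch below assumes a hypothetical document whose provisions refer to one another with the phrase "Section N". The sample provisions, the regular expression, and the function names are all assumptions made for this example; real legal citation formats are far more varied.

```python
import re

# Hypothetical provisions of a single regulatory document, keyed by section number.
provisions = {
    "1": "Definitions used in Section 2 and Section 3 apply here.",
    "2": "Records must be retained as described in Section 3.",
    "3": "Retention periods are set by the regulator.",
}

# Assumed reference format: the word "Section" followed by a number.
SECTION_REF = re.compile(r"\bSection\s+(\d+)\b")


def build_citation_graph(texts):
    """Map each provision to the other provisions it cites, in order of appearance."""
    graph = {}
    for section_id, text in texts.items():
        targets = []
        for ref in SECTION_REF.findall(text):
            # Keep only references to known provisions, skipping self-references
            # and repeated mentions of the same target.
            if ref != section_id and ref in texts and ref not in targets:
                targets.append(ref)
        graph[section_id] = targets
    return graph


def cited_by(graph):
    """Reverse the graph: map each provision to the provisions that cite it."""
    reverse = {node: [] for node in graph}
    for source, targets in graph.items():
        for target in targets:
            reverse[target].append(source)
    return reverse


if __name__ == "__main__":
    graph = build_citation_graph(provisions)
    print(graph)
    print(cited_by(graph))
```

The reversed graph shows which provisions depend on a given provision, which is the kind of question a reader of inter-related compliance documents might want answered.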


Controversies

* E-publishing: due to the unprecedented growth of electronic resource (e-resource) availability, one of the questions currently being explored is, "how often are e-resources being cited in my field?" For instance, there are claims that online access to computer science literature leads to higher citation rates; however, humanities articles may suffer if not in print.
* Self-citations: authors have been criticized for gaming the system by citing themselves excessively to accumulate citations. For instance, it has been found that men tend to cite themselves more often than women.
* Citation pollution: the infiltration of retracted or fake research into legitimate research through citation, which undermines the validity of the citing research. Due to various factors, including the publication race and the concerning rise in unscrupulous business practices related to so-called predatory or deceptive publishers, research quality in general is facing different types of threats.


See also

* Google economy
* Journalology
* Main path analysis
* San Francisco Declaration on Research Assessment

