Citation analysis is the examination of the frequency, patterns, and graphs of citations in documents. It uses the directed graph of citations (links from one document to another) to reveal properties of the documents. A typical aim is to identify the most important documents in a collection. A classic example is the citations between academic articles and books. As another example, judges support their judgements by referring back to judgements made in earlier cases (see citation analysis in a legal context). A further example is provided by patents, which contain prior art: citations of earlier patents relevant to the current claim.
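As a rough illustration of using the directed citation graph to find the most important documents in a collection, the sketch below ranks documents by how often they are cited (their in-degree). The document names and the `rank_by_citation_count` helper are hypothetical, and raw citation counts are only one of many possible importance measures.

```python
from collections import Counter

# Hypothetical toy data: each pair is (citing document, cited document).
citations = [
    ("paper_B", "paper_A"),
    ("paper_C", "paper_A"),
    ("paper_C", "paper_B"),
    ("paper_D", "paper_A"),
    ("paper_D", "paper_C"),
]

def rank_by_citation_count(edges):
    """Return (document, times cited) pairs, most cited first."""
    in_degree = Counter(cited for _citing, cited in edges)
    # Sort by count descending, then by name for a stable order on ties.
    return sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))

ranking = rank_by_citation_count(citations)
for document, count in ranking:
    print(document, count)
```

Documents that are never cited do not appear in the ranking at all; a fuller treatment would list them with a count of zero.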
Documents can be associated with many other features in addition to citations, such as authors, publishers, and journals, as well as their actual texts. The general analysis of collections of documents is known as bibliometrics, and citation analysis is a key part of that field. For example, bibliographic coupling and co-citation are association measures based on citation analysis (shared citations or shared references). The citations in a collection of documents can also be represented in forms such as a citation graph, as pointed out by Derek J. de Solla Price in his 1965 article "Networks of Scientific Papers". This means that citation analysis draws on aspects of social network analysis and network science.
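The two association measures mentioned above can be sketched as follows, assuming the reference list of each document is already available as a set. Bibliographic coupling counts the references two citing documents share, while co-citation counts how often two cited documents appear together in the same reference list. All document names and both helper functions are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each document maps to the set of documents it cites.
references = {
    "doc_1": {"ref_A", "ref_B", "ref_C"},
    "doc_2": {"ref_B", "ref_C"},
    "doc_3": {"ref_C", "ref_D"},
}

def bibliographic_coupling(refs):
    """Count shared references for every pair of citing documents."""
    strength = {}
    for first, second in combinations(sorted(refs), 2):
        shared = len(refs[first] & refs[second])
        if shared:
            strength[(first, second)] = shared
    return strength

def co_citation(refs):
    """Count how often each pair of cited documents shares a reference list."""
    counts = Counter()
    for cited in refs.values():
        counts.update(combinations(sorted(cited), 2))
    return counts

coupling = bibliographic_coupling(references)
cocited = co_citation(references)
```

Here `doc_1` and `doc_2` share two references, so their coupling strength is 2, and `ref_B` and `ref_C` are cited together by two documents, so their co-citation count is 2.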
An early example of automated citation indexing was CiteSeer, which was used for citations between academic papers, while Web of Science is an example of a modern system that covers more than just academic books and articles, reflecting a wider range of information sources. Today, automated citation indexing has changed the nature of citation analysis research, allowing millions of citations to be analyzed for large-scale patterns and knowledge discovery. Citation analysis tools can be used to compute various impact measures for scholars based on data from citation indices. These have various applications, from the identification of expert referees to review papers and grant proposals, to providing transparent data in support of academic merit review, tenure, and promotion decisions. Because such decisions involve competition for limited resources, they may encourage ethically questionable behavior to increase citations.
The practice of naively using citation analyses to compare the impact of different scholarly articles, without taking into account other factors that may affect citation patterns, has drawn a great deal of criticism. A recurrent criticism focuses on "field-dependent factors": citation practices vary from one area of science to another, and even between fields of research within a discipline.
Overview
While citation indexes were originally designed for information retrieval, they are increasingly used for bibliometrics and other studies involving research evaluation. Citation data is also the basis of the popular journal impact factor.
There is a large body of literature on citation analysis, sometimes called scientometrics, a term invented by Vasily Nalimov, or more specifically bibliometrics. The field blossomed with the advent of the Science Citation Index, which now covers source literature from 1900 on. The leading journals of the field are ''Scientometrics,'' ''Informetrics,'' and the ''Journal of the Association for Information Science and Technology''. ASIST also hosts an electronic mailing list called SIGMETRICS at ASIST. Citation analysis is undergoing a resurgence based on the wide dissemination of the Web of Science and Scopus subscription databases in many universities, and on universally available free citation tools such as CiteBase, CiteSeerX, Google Scholar, and the former Windows Live Academic (now available with extra features as Microsoft Academic). Methods of citation analysis research include qualitative, quantitative and computational approaches. The main foci of such scientometric studies have included productivity comparisons, institutional research rankings, journal rankings, establishing faculty productivity and tenure standards, assessing the influence of top scholarly articles, tracing the development trajectory of a science or technology field, and developing profiles of top authors and institutions in terms of research performance.
Legal citation analysis is a citation analysis technique for analyzing legal documents to facilitate the understanding of inter-related regulatory compliance documents by exploring the citations that connect provisions to other provisions within the same document or between different documents. Legal citation analysis uses a citation graph extracted from a regulatory document, which could supplement E-discovery, a process that leverages technological innovations in big data analytics.
[ by Cat Casey and Alejandra Perez]
History
In a 1965 paper, Derek J. de Solla Price described the inherent linking characteristic of the Science Citation Index (SCI) as "Networks of Scientific Papers". The links between citing and cited papers became dynamic when the SCI began to be published online. The Social Sciences Citation Index became one of the first databases to be mounted on the Dialog system in 1972. With the advent of the CD-ROM edition, linking became even easier and enabled the use of bibliographic coupling for finding related records. In 1973, Henry Small published his classic work on co-citation analysis, which became a self-organizing classification system that led to document clustering experiments and eventually an "Atlas of Science", later called "Research Reviews".
The inherent topological and graphical nature of the worldwide citation network, which is an inherent property of the scientific literature, was described by Ralph Garner (Drexel University) in 1965.
The use of citation counts to rank journals was a technique used in the early part of the nineteenth century, but the systematic ongoing measurement of these counts for scientific journals was initiated by Eugene Garfield at the Institute for Scientific Information, who also pioneered the use of these counts to rank authors and papers. In a landmark paper of 1965, he and Irving Sher showed the correlation between citation frequency and eminence by demonstrating that Nobel Prize winners published five times the average number of papers, while their work was cited 30 to 50 times the average. Garfield reported this phenomenon in a long series of essays on the Nobel and other prizes. The usual summary measure is known as the impact factor: the number of citations received in a given year by articles a journal published in the previous two years, divided by the number of articles published in those years. It is widely used, for both appropriate and inappropriate purposes; in particular, the use of this measure alone for ranking authors and papers is quite controversial.
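The impact factor arithmetic described above can be illustrated with a short sketch. The function name and the journal figures are hypothetical, and the official calculation applies rules about which items count as citable articles that are not modeled here.

```python
def two_year_impact_factor(citations_received, articles_published):
    """Citations this year to the previous two years' articles, per article."""
    if articles_published <= 0:
        raise ValueError("articles_published must be positive")
    return citations_received / articles_published

# Hypothetical journal: 120 and 80 articles in the two previous years,
# cited 500 times in total during the current year.
impact_factor = two_year_impact_factor(
    citations_received=500,
    articles_published=120 + 80,
)
print(impact_factor)
```

For these made-up figures the result is 500 / 200 = 2.5 citations per article.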
In an early study in 1964 of the use of citation analysis in writing the history of DNA, Garfield and Sher demonstrated the potential for generating historiographs, topological maps of the most important steps in the history of scientific topics. This work was later automated by E. Garfield, A. I. Pudovkin of the Institute of Marine Biology, Russian Academy of Sciences, and V. S. Istomin of the Center for Teaching, Learning, and Technology, Washington State University, and led to the creation of the HistCite software around 2002.
Automatic citation indexing was introduced in 1998 by Lee Giles, Steve Lawrence and Kurt Bollacker and enabled automatic algorithmic extraction and grouping of citations for any digital academic and scientific document. Where previous citation extraction was a manual process, citation measures could now scale up and be computed for any scholarly and scientific field and document venue, not just those selected by organizations such as ISI. This led to the creation of new systems for public and automated citation indexing, the first being CiteSeer (now CiteSeerX), soon followed by Cora, which focused primarily on the field of computer science and information science. These were later followed by large-scale academic domain citation systems such as Google Scholar and Microsoft Academic. Such autonomous citation indexing is not yet perfect in citation extraction or citation clustering, with an error rate estimated by some at 10%, though a careful statistical sampling has yet to be done. This has resulted in such "authors" as Ann Arbor, Milton Keynes, and Walton Hall (in fact place names) being credited with extensive academic output. SCI claims to create automatic citation indexing through purely programmatic methods. Even the older records have a similar magnitude of error.
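To give a feel for what automatic grouping of citations involves, here is a deliberately crude sketch that groups citation strings by a normalized key. It is not the method used by CiteSeer or any other system named above; the `normalize` and `group_citations` helpers and the sample strings are assumptions made for illustration only.

```python
import re
from collections import defaultdict

def normalize(citation):
    """Reduce a citation string to a crude key: lowercase words, no punctuation."""
    words = re.findall(r"[a-z0-9]+", citation.lower())
    return " ".join(words)

def group_citations(citation_strings):
    """Group citation strings whose normalized keys are identical."""
    groups = defaultdict(list)
    for text in citation_strings:
        groups[normalize(text)].append(text)
    return list(groups.values())

# Hypothetical raw strings as they might be extracted from reference lists.
raw = [
    "Doe, J. (2001). An Example Title.",
    "DOE J 2001 an example title",
    "Roe, R. (1999). Another Example Title.",
]
groups = group_citations(raw)
```

Exact matching on a normalized key merges the first two strings but would miss variants with reordered fields, abbreviated titles, or misspellings, which hints at why extraction and clustering errors of the kind described above arise.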
Citation impact
Citation analysis for legal documents
Citation analysis for legal documents is an approach to facilitate the understanding and analysis of inter-related regulatory compliance documents by exploration of the citations that connect provisions to other provisions within the same document or between different documents. Citation analysis uses a citation graph extracted from a regulatory document, which could supplement E-discovery, a process that leverages technological innovations in big data analytics.
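A minimal sketch of extracting a citation graph from a regulatory document follows, assuming provisions are already split by number and that cross-references take the simple form "Section N". The sample text, the `SECTION_REF` pattern, and the `extract_citation_graph` helper are all hypothetical; real regulations use far more varied reference styles.

```python
import re

# Hypothetical regulatory text: provision number mapped to its wording.
provisions = {
    "1": "Definitions used in this regulation.",
    "2": "A covered entity must comply with Section 1 and Section 3.",
    "3": "Exceptions to Section 2 are listed here.",
}

# Assumed cross-reference pattern; real documents use many other forms.
SECTION_REF = re.compile(r"Section (\d+)")

def extract_citation_graph(text_by_provision):
    """Map each provision to the sorted list of provisions it cites."""
    graph = {}
    for number, text in text_by_provision.items():
        cited = set(SECTION_REF.findall(text))
        cited.discard(number)  # ignore a provision citing itself
        graph[number] = sorted(cited)
    return graph

graph = extract_citation_graph(provisions)
```

The resulting adjacency mapping connects provision 2 to provisions 1 and 3, and provision 3 back to provision 2, which is the kind of structure that can then be explored to follow obligations across provisions.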
Controversies
* ''E-publishing'': due to the unprecedented growth of electronic resource (e-resource) availability, one of the questions currently being explored is, "how often are e-resources being cited in my field?" For instance, there are claims that online access to computer science literature leads to higher citation rates; however, humanities articles may suffer if not in print.
* ''Self-citations'': authors have been criticized for gaming the system by citing themselves excessively to accumulate citations. For instance, it has been found that men tend to cite themselves more often than women.
*Citation pollution: the citation of retracted or fake research in legitimate research, which undermines the validity of the citing work. It stems from various factors, including the publication race and the concerning rise in unscrupulous business practices related to so-called predatory or deceptive publishers; research quality in general is facing different types of threats.
See also
* Google economy
* Journalology
* Main path analysis
* San Francisco Declaration on Research Assessment