Citation analysis is the examination of the frequency, patterns, and graphs of citations in documents. It uses the directed graph of citations (links from one document to another document) to reveal properties of the documents. A typical aim would be to identify the most important documents in a collection. A classic example is that of the citations between academic articles and books. For another example, judges support their judgements by referring back to judgements made in earlier cases (see citation analysis in a legal context). An additional example is provided by patents, which contain prior art: citations of earlier patents relevant to the current claim. Documents can be associated with many other features in addition to citations, such as authors, publishers, journals, and their actual texts. The general analysis of collections of documents is known as bibliometrics, and citation analysis is a key part of that field. For example, bibliographic coupling and co-citation are association measures based on citation analysis (shared references and shared citations, respectively). The citations in a collection of documents can also be represented in forms such as a citation graph, as pointed out by Derek J. de Solla Price in his 1965 article "Networks of Scientific Papers". This means that citation analysis draws on aspects of social network analysis and network science.

An early example of automated citation indexing was CiteSeer, which was used for citations between academic papers, while Web of Science is an example of a modern system that covers more than just academic books and articles, reflecting a wider range of information sources. Today, automated citation indexing has changed the nature of citation analysis research, allowing millions of citations to be analyzed for large-scale patterns and knowledge discovery. Citation analysis tools can be used to compute various impact measures for scholars based on data from citation indices. These have various applications, from identifying expert referees to review papers and grant proposals, to providing transparent data in support of academic merit review, tenure, and promotion decisions. Competition for these limited resources may lead to ethically questionable behavior aimed at increasing citations.

Much criticism has been directed at the practice of naively using citation analyses to compare the impact of different scholarly articles without taking into account other factors that may affect citation patterns. A recurrent criticism focuses on "field-dependent factors": citation practices vary from one area of science to another, and even between fields of research within a discipline.
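The directed-graph view of citations can be made concrete with a small sketch. It assumes a hypothetical five-document collection stored as an adjacency mapping (document ID to the set of documents it cites); `citation_counts`, `bibliographic_coupling`, and `co_citation` are illustrative names, not functions from any citation-analysis library. Citation counts are in-degrees, bibliographic coupling counts the references two documents share, and co-citation counts how many documents cite a given pair together. Raw in-degree is only the crudest importance measure and ignores the field-dependent factors just described.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy collection: document ID -> set of documents it cites
# (the outgoing edges of each node in the citation graph).
CITES = {
    "A": {"C", "D"},
    "B": {"C", "D", "E"},
    "C": {"E"},
    "D": {"E"},
    "E": set(),
}


def citation_counts(cites):
    """In-degree of each document: how many documents in the collection cite it."""
    counts = {doc: 0 for doc in cites}
    for refs in cites.values():
        for cited in refs:
            counts[cited] = counts.get(cited, 0) + 1
    return counts


def bibliographic_coupling(cites, a, b):
    """Coupling strength of a and b: the number of references they share."""
    return len(cites[a] & cites[b])


def co_citation(cites):
    """Co-citation strength: for each pair, how many documents cite both."""
    strength = defaultdict(int)
    for refs in cites.values():
        for x, y in combinations(sorted(refs), 2):
            strength[(x, y)] += 1
    return dict(strength)


if __name__ == "__main__":
    counts = citation_counts(CITES)
    print(max(counts, key=counts.get))               # E (cited 3 times)
    print(bibliographic_coupling(CITES, "A", "B"))   # 2 (both cite C and D)
    print(co_citation(CITES)[("C", "D")])            # 2 (cited together by A and B)
```

Real citation indices hold millions of edges, so the same quantities would normally be computed with sparse matrices or a graph library; plain dictionaries are used here only for readability.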


Overview

While citation indexes were originally designed for information retrieval, they are increasingly used for bibliometrics and other studies involving research evaluation. Citation data is also the basis of the popular journal impact factor. There is a large body of literature on citation analysis, sometimes called scientometrics (a term coined by Vasily Nalimov) or, more specifically, bibliometrics. The field blossomed with the advent of the Science Citation Index, which now covers source literature from 1900 on. The leading journals of the field are Scientometrics, the Journal of Informetrics, and the Journal of the Association for Information Science and Technology. ASIS&T also hosts an electronic mailing list called SIGMETRICS.

This method is undergoing a resurgence based on the wide availability of the Web of Science and Scopus subscription databases in many universities, and of free citation tools such as CiteBase, CiteSeerX, Google Scholar, and the former Windows Live Academic (later relaunched with extra features as Microsoft Academic). Methods of citation analysis research include qualitative, quantitative, and computational approaches. The main foci of such scientometric studies have included productivity comparisons, institutional research rankings, journal rankings, establishing faculty productivity and tenure standards, assessing the influence of top scholarly articles, tracing the development trajectory of a science or technology field, and developing profiles of top authors and institutions in terms of research performance.

Legal citation analysis is a citation analysis technique for analyzing legal documents to facilitate the understanding of inter-related regulatory compliance documents by exploring the citations that connect provisions to other provisions within the same document or between different documents. It uses a citation graph extracted from a regulatory document, which could supplement e-discovery, a process that leverages technological innovations in big data analytics.


History

In a 1965 paper, Derek J. de Solla Price described the inherent linking characteristic of the Science Citation Index (SCI) as "Networks of Scientific Papers". The links between citing and cited papers became dynamic when the SCI began to be published online. The Social Sciences Citation Index became one of the first databases to be mounted on the Dialog system in 1972. With the advent of the CD-ROM edition, linking became even easier and enabled the use of bibliographic coupling for finding related records. In 1973, Henry Small published his classic work on co-citation analysis, which became a self-organizing classification system that led to document clustering experiments and eventually an "Atlas of Science", later called "Research Reviews".

The topological and graphical nature of the worldwide citation network, an inherent property of the scientific literature, was described by Ralph Garner (Drexel University) in 1965. The use of citation counts to rank journals was a technique used in the early part of the nineteenth century, but the systematic ongoing measurement of these counts for scientific journals was initiated by Eugene Garfield at the Institute for Scientific Information, who also pioneered the use of these counts to rank authors and papers. In a landmark paper of 1965, he and Irving Sher showed the correlation between citation frequency and eminence by demonstrating that Nobel Prize winners published five times the average number of papers, while their work was cited 30 to 50 times the average. Garfield reported this phenomenon in a long series of essays on the Nobel and other prizes. The usual summary measure is known as the impact factor: the number of citations received in a given year by items a journal published in the previous two years, divided by the number of citable items it published in those two years. It is widely used for both appropriate and inappropriate purposes; in particular, using this measure alone to rank authors and papers is quite controversial.

In an early study in 1964 of the use of citation analysis in writing the history of DNA, Garfield and Sher demonstrated the potential for generating historiographs, topological maps of the most important steps in the history of scientific topics. This work was later automated by E. Garfield, A. I. Pudovkin of the Institute of Marine Biology, Russian Academy of Sciences, and V. S. Istomin of the Center for Teaching, Learning, and Technology, Washington State University, and led to the creation of the HistCite software around 2002.

Automatic citation indexing was introduced in 1998 by Lee Giles, Steve Lawrence, and Kurt Bollacker and enabled automatic algorithmic extraction and grouping of citations for any digital academic and scientific document. Where citation extraction had previously been a manual process, citation measures could now scale up and be computed for any scholarly and scientific field and document venue, not just those selected by organizations such as ISI. This led to the creation of new systems for public and automated citation indexing, the first being CiteSeer (now CiteSeerX), soon followed by Cora, which focused primarily on computer science and information science. These were later followed by large-scale academic citation systems such as Google Scholar and Microsoft Academic. Such autonomous citation indexing is not yet perfect in citation extraction or citation clustering, with an error rate estimated by some at 10%, though a careful statistical sampling has yet to be done. This has resulted in place names such as Ann Arbor, Milton Keynes, and Walton Hall being credited as authors with extensive academic output. SCI claims to create automatic citation indexing through purely programmatic methods. Even the older records have a similar magnitude of error.
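The impact factor definition above reduces to a single ratio. The sketch below assumes the yearly counts are already available as dictionaries; the function name and the journal figures are hypothetical. It deliberately ignores the editorial rules Clarivate applies when deciding which items count as "citable", so it illustrates the arithmetic rather than reproducing any official figure.

```python
def two_year_impact_factor(citations, citable_items, year):
    """Simplified two-year impact factor for `year`.

    citations: publication year -> citations received during `year` by the
        journal's items published in that publication year.
    citable_items: publication year -> number of citable items published.
    """
    window = (year - 1, year - 2)  # only the two preceding publication years count
    cites = sum(citations.get(y, 0) for y in window)
    items = sum(citable_items.get(y, 0) for y in window)
    if items == 0:
        raise ValueError("no citable items published in the two-year window")
    return cites / items


if __name__ == "__main__":
    # Hypothetical journal: citations received in 2023, keyed by publication year.
    cites_2023 = {2022: 150, 2021: 210, 2020: 400}  # 2020 falls outside the window
    items = {2022: 100, 2021: 80}
    print(two_year_impact_factor(cites_2023, items, 2023))  # 2.0
```

Because the result is a mean over a skewed citation distribution, a handful of highly cited papers can dominate it, which is one reason the text above cautions against using it alone to rank authors and papers.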


Citation impact


Citation analysis for legal documents

Citation analysis for legal documents is an approach to facilitate the understanding and analysis of inter-related regulatory compliance documents by exploring the citations that connect provisions to other provisions within the same document or between different documents. It uses a citation graph extracted from a regulatory document, which could supplement e-discovery, a process that leverages technological innovations in big data analytics.
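To make the legal use case concrete, the sketch below builds a provision-level citation graph from the text of a toy regulation and then lists every provision that a given provision depends on, directly or indirectly. The cross-reference pattern (a section sign followed by a number), the sample provisions, and the helper names `build_citation_graph` and `dependencies` are all hypothetical; real regulatory texts use far more varied citation styles, and a practical extractor would need a much richer grammar.

```python
import re
from collections import deque

# Hypothetical regulation: provision ID -> provision text.
PROVISIONS = {
    "1": "Definitions apply throughout this part.",
    "2": "A covered entity must comply with § 3 and § 4.",
    "3": "Records must be retained as described in § 5.",
    "4": "Terms used here have the meanings given in § 1.",
    "5": "Retention periods are set out in this section.",
}

# Assumed cross-reference syntax: a section sign followed by a provision number.
REF = re.compile(r"§\s*(\d+)")


def build_citation_graph(provisions):
    """Map each provision to the provisions it cites, dropping unknown IDs and self-references."""
    return {
        pid: {ref for ref in REF.findall(text) if ref in provisions and ref != pid}
        for pid, text in provisions.items()
    }


def dependencies(graph, start):
    """All provisions reachable from `start` by following citations (breadth-first)."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in graph[queue.popleft()]:
            if nxt != start and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    graph = build_citation_graph(PROVISIONS)
    print(sorted(dependencies(graph, "2")))  # ['1', '3', '4', '5']
```

The traversal answers a typical compliance question ("what else must be read to apply this provision?"); reversing the edges would instead show which provisions are affected when a given one changes.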


Controversies

* E-publishing: due to the unprecedented growth of electronic resource (e-resource) availability, one of the questions currently being explored is, "How often are e-resources being cited in my field?" For instance, there are claims that online access to computer science literature leads to higher citation rates; however, humanities articles may suffer if not in print.
* Self-citations: authors have been criticized for gaming the system by accumulating citations through excessive self-citation. For instance, it has been found that men tend to cite themselves more often than women.
* Citation pollution: retracted or fake research being cited in legitimate research, undermining the validity of the citing work. Because of various factors, including the publication race and the concerning rise in unscrupulous business practices associated with so-called predatory or deceptive publishers, research quality in general is facing different types of threats.
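Both the self-citation and citation-pollution concerns can be checked mechanically once citation data is available. The sketch below assumes a hypothetical record format (paper ID mapped to an author set and a reference set) and treats a citation as a self-citation whenever the citing and cited papers share at least one author name, which is one common operational definition among several. Exact name matching is a simplification; real data needs author-name disambiguation. The helpers `self_citation_rate` and `citing_retracted` are illustrative, not part of any existing tool.

```python
# Hypothetical records: paper ID -> (author names, IDs of papers it cites).
PAPERS = {
    "p1": ({"Lee", "Park"}, set()),
    "p2": ({"Lee"}, {"p1"}),
    "p3": ({"Singh"}, {"p1", "p2"}),
    "p4": ({"Lee", "Singh"}, {"p1", "p2", "p3"}),
}


def self_citation_rate(papers):
    """Share of in-collection citations whose citing and cited papers share an author."""
    total = self_cites = 0
    for authors, refs in papers.values():
        for ref in refs:
            if ref not in papers:
                continue  # skip references we have no author data for
            total += 1
            if authors & papers[ref][0]:
                self_cites += 1
    return self_cites / total if total else 0.0


def citing_retracted(papers, retracted):
    """IDs of papers that cite at least one paper in the `retracted` set."""
    return {pid for pid, (_, refs) in papers.items() if refs & retracted}


if __name__ == "__main__":
    print(round(self_citation_rate(PAPERS), 3))      # 0.667 (4 of 6 citations)
    print(sorted(citing_retracted(PAPERS, {"p2"})))  # ['p3', 'p4']
```

A rate computed this way says nothing about whether a given self-citation is justified; it only flags where closer reading may be warranted.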


See also

* Google economy
* Journalology
* Main path analysis
* San Francisco Declaration on Research Assessment

