CiteSeerX (originally called CiteSeer) is a public search engine and digital library for scientific and academic papers, primarily in the fields of computer and information science. Many consider it to be the first academic paper search engine and the first automated citation indexing system. CiteSeer holds a patent on this topic, and is considered a predecessor of academic search tools such as Google Scholar and Microsoft Academic Search. CiteSeer-like engines and archives usually only harvest documents from publicly available websites and do not crawl publisher websites. For this reason, authors whose documents are freely available are more likely to be represented in the index.
CiteSeer's goal is to improve the dissemination of, and access to, academic and scientific literature. As a non-profit service that can be freely used by anyone, it has been considered part of the open access movement, which is attempting to change academic and scientific publishing to allow greater access to scientific literature. CiteSeer freely provided Open Archives Initiative metadata for all indexed documents and, where possible, linked indexed documents to other sources of metadata such as DBLP and the ACM Portal. To promote open data, CiteSeerX shares its data for non-commercial purposes under a Creative Commons license.
The name can be read in at least two ways. As a pun, a 'sightseer' is a tourist who looks at the sights, so a 'cite seer' would be a researcher who looks at cited papers. Alternatively, a 'seer' is a prophet, so a 'cite seer' would be a prophet of citations. CiteSeer changed its name to ResearchIndex at one point and then changed it back.
CITESEER AND CITESEER.IST
CiteSeer was created by researchers Lee Giles, Kurt Bollacker and Steve Lawrence in 1997 while they were at the NEC Research Institute (now NEC Labs), Princeton, New Jersey, USA. CiteSeer's goal was to actively crawl and harvest academic and scientific documents on the web and use autonomous citation indexing to permit querying by citation or by document, ranking them by citation impact. At one point, it was called ResearchIndex.
CiteSeer became public in 1998 and had many new features unavailable in academic search engines at that time. These included:
* Autonomous Citation Indexing automatically created a citation index that could be used for literature search and evaluation.
* Citation statistics and related documents were computed for all articles cited in the database, not just the indexed articles.
* Reference linking allowed browsing of the database using citation links.
* Citation context showed the context of citations to a given paper, allowing a researcher to quickly and easily see what other researchers had to say about an article of interest.
* Related documents were shown using citation-based and word-based measures, and an active, continuously updated bibliography was shown for each document.
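The citation index, citation statistics and citation context features listed above can be illustrated with a minimal sketch. It is not CiteSeer's implementation: the toy corpus, the paper keys and the function names `build_citation_index` and `rank_by_citations` are all hypothetical, and the sketch assumes that references have already been extracted and matched to keys.

```python
from collections import defaultdict

# Hypothetical toy corpus. Each indexed paper maps the key of every work it
# cites to the sentence in which that citation appears (its context).
indexed_papers = {
    "paper_a": {"cites": {"paper_b": "We extend the method of [1].",
                          "book_x": "See [2] for background."}},
    "paper_b": {"cites": {"book_x": "Our notation follows [1]."}},
    "paper_c": {"cites": {"paper_b": "Unlike [3], we do not assume labels.",
                          "book_x": "A survey appears in [4]."}},
}


def build_citation_index(papers):
    """Invert the reference lists: cited key -> [(citing key, context), ...]."""
    cited_by = defaultdict(list)
    for citing_id, record in papers.items():
        for cited_id, context in record["cites"].items():
            cited_by[cited_id].append((citing_id, context))
    return dict(cited_by)


def rank_by_citations(cited_by):
    """Order cited works by citation count (descending), ties broken by key."""
    return sorted(cited_by, key=lambda key: (-len(cited_by[key]), key))


index = build_citation_index(indexed_papers)
ranking = rank_by_citations(index)

for key in ranking:
    print(key, len(index[key]), "citation(s)")
    for citing_id, context in index[key]:
        print("   ", citing_id, "->", context)
```

Note that `book_x` receives citation statistics even though it is not itself in the toy corpus, which mirrors the point that statistics were computed for all cited articles and not just the indexed ones.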
After NEC, in 2004 it was hosted as CiteSeer.IST on the World Wide Web at the College of Information Sciences and Technology, The Pennsylvania State University, and had over 700,000 documents. For enhanced access, performance and research, similar versions of CiteSeer were supported at universities such as the Massachusetts Institute of Technology, University of Zürich and the National University of Singapore. However, these versions of CiteSeer proved difficult to maintain and are no longer available. Because CiteSeer only indexes freely available papers on the web and does not have access to publisher metadata, it returns fewer citation counts than sites such as Google Scholar that have publisher metadata.
CiteSeer had not been comprehensively updated since 2005 due to limitations in its architecture design. It had a representative sampling of research documents in computer and information science, but its coverage was restricted to papers that were publicly available, usually at an author's homepage, or that were submitted by an author. To overcome some of these limitations, a modular and open source architecture for CiteSeer was designed: CiteSeerX.
CITESEERX
CiteSeerX replaced CiteSeer, and all queries to CiteSeer were redirected. CiteSeerX is a public search engine, digital library and repository for scientific and academic papers, primarily with a focus on computer and information science. More recently, however, it has been expanding into other scholarly domains such as economics and physics. Released in 2008, it was loosely based on the previous CiteSeer search engine and digital library and is built with a new open source infrastructure, SeerSuite, and new algorithms and their implementations. It was developed by researchers Dr. Isaac Councill and Dr. C. Lee Giles at the College of Information Sciences and Technology, Pennsylvania State University. It continues to support the goals outlined by CiteSeer: to actively crawl and harvest academic and scientific documents on the public web, to permit querying by citation, and to rank documents by citation impact. Lee Giles, Prasenjit Mitra, Susan Gauch, Min-Yen Kan, Pradeep Teregowda, Juan Pablo Fernández Ramírez, Pucktada Treeratpituk, Jian Wu, Douglas Jordan, Steve Carman, Jack Carroll, Jim Jansen, and Shuyi Zheng are or have been actively involved in its development. A table search feature has also been introduced. CiteSeerX has been funded by the National Science Foundation.
CiteSeerX continues to be rated as one of the world's top repositories and was rated number 1 in July 2010. It currently has over 6 million documents with nearly 6 million unique authors and 120 million citations.
CiteSeerX also shares its software, data, databases and metadata with other researchers.
AUTOMATED INFORMATION EXTRACTION
CiteSeerX uses automated information extraction tools, such as ParsCit, that are usually built on machine learning methods, to extract scholarly document metadata such as title, authors, abstract and citations. As a result, there are sometimes errors in authors and titles. Other academic search engines have similar errors.
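To illustrate why automated extraction can mislabel titles and authors, here is a minimal sketch of metadata extraction from a single reference string. It is a hand-written, rule-based stand-in and does not reproduce ParsCit or any machine learning method. The reference style, the pattern `REFERENCE_PATTERN`, the function `parse_reference` and the sample strings are all made up for illustration.

```python
import re

# Assumed reference style (hypothetical): "Authors (Year). Title. Venue."
REFERENCE_PATTERN = re.compile(
    r"^(?P<authors>.+?)\s+\((?P<year>\d{4})\)\.\s+"
    r"(?P<title>.+?)\.\s+(?P<venue>.+?)\.?$"
)


def parse_reference(text):
    """Return a dict of extracted fields, or None if the style does not match."""
    match = REFERENCE_PATTERN.match(text.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # Split the author string on commas and on the word "and".
    fields["authors"] = [
        name.strip()
        for name in re.split(r",\s*|\s+and\s+", fields["authors"])
        if name.strip()
    ]
    return fields


# A made-up reference in the assumed style parses cleanly.
good = parse_reference("A. Author and B. Writer (2001). A made-up title. Example Venue.")
print(good)

# An abbreviation inside the title ends the title field too early.
wrong = parse_reference("A. Author (2001). Vol. 2 of a made-up title. Example Venue.")
print(wrong["title"])

# A different reference style is not recognized at all.
print(parse_reference("Author A, Writer B. A made-up title, 2001"))
```

In the second call the title is cut off at the first period, and in the third nothing is extracted. Learned extractors are more robust than a fixed pattern, but they face the same ambiguity, which is one way errors in extracted titles and authors can arise.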
FOCUSED CRAWLING
CiteSeerX crawls publicly available scholarly documents, primarily from author webpages and other open resources, and does not have access to publisher metadata. As a result, citation counts in CiteSeerX are usually lower than those in Google Scholar and Microsoft Academic Search, which have access to publisher metadata.
USAGE
CiteSeerX has nearly 1 million users worldwide, based on unique IP addresses, and receives millions of hits daily. Annual downloads of document PDFs were nearly 200 million for 2015.
DATA
CiteSeerX data is regularly shared under a Creative Commons BY-NC-SA license with researchers worldwide and has been used in many experiments and competitions.
OTHER SEERSUITE-BASED SEARCH ENGINES