SCIgen is a paper generator that uses a context-free grammar to randomly generate nonsense in the form of computer science research papers. Its original data source was a collection of computer science papers downloaded from CiteSeer. All elements of the papers are formed, including graphs, diagrams, and citations. Created by scientists at the Massachusetts Institute of Technology, its stated aim is "to maximize amusement, rather than coherence." Originally created in 2005 to expose the lack of scrutiny of submissions to conferences, the generator subsequently came to be used, primarily by Chinese academics, to create large numbers of fraudulent conference submissions, leading to the retraction of 122 SCIgen-generated papers and the creation of detection software to combat its use.
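The general idea of random generation from a context-free grammar can be illustrated with a minimal sketch. The toy grammar, the function names `expand` and `generate`, and the depth limit below are all assumptions made for illustration; they are not taken from SCIgen's actual grammar or source code.

```python
import random

# Hypothetical toy grammar (not SCIgen's): each nonterminal (upper-case key)
# maps to a list of alternative productions. A production is a sequence of
# terminals (plain words) and nonterminals.
GRAMMAR = {
    "SENTENCE": [
        ["We", "VERB", "that", "NOUN_PHRASE", "can be made", "ADJ", "."],
        ["Our", "NOUN", "is based on", "NOUN_PHRASE", "."],
    ],
    "NOUN_PHRASE": [
        ["the", "ADJ", "NOUN"],
        ["the", "NOUN", "of", "NOUN_PHRASE"],  # recursive alternative
    ],
    "VERB": [["argue"], ["confirm"], ["disprove"]],
    "ADJ": [["scalable"], ["amphibious"], ["client-server"]],
    "NOUN": [["methodology"], ["algorithm"], ["framework"]],
}


def expand(symbol, rng, depth=0, max_depth=8):
    """Recursively rewrite a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:
        return [symbol]  # terminals are emitted unchanged
    options = GRAMMAR[symbol]
    if depth >= max_depth:
        # Past the depth limit, take the shortest production so that
        # recursive rules are guaranteed to terminate.
        options = [min(options, key=len)]
    production = rng.choice(options)
    words = []
    for part in production:
        words.extend(expand(part, rng, depth + 1, max_depth))
    return words


def generate(seed=None):
    """Generate one random sentence; a fixed seed makes it reproducible."""
    rng = random.Random(seed)
    text = " ".join(expand("SENTENCE", rng))
    return text.replace(" .", ".")


if __name__ == "__main__":
    for i in range(3):
        print(generate(seed=i))
```

Every sentence produced this way is grammatical by construction, because it is derived only through the production rules, yet nothing constrains its meaning. A full paper generator would need a far larger grammar covering titles, sections, figures, and references.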
Sample output
Opening abstract of ''Rooter: A Methodology for the Typical Unification of Access Points and Redundancy'':
Prominent results
In 2005, a paper generated by SCIgen, ''Rooter: A Methodology for the Typical Unification of Access Points and Redundancy'', was accepted as a non-reviewed paper to the 2005 World Multiconference on Systemics, Cybernetics and Informatics (WMSCI), and the authors were invited to speak. The authors of SCIgen described their hoax on their website, and it soon received great publicity when picked up by Slashdot. WMSCI withdrew their invitation, but the SCIgen team went anyway, renting space in the hotel separately from the conference and delivering a series of randomly generated talks on their own "track". The organizer of these WMSCI conferences is Professor Nagib Callaos. From 2000 until 2005, the WMSCI was also sponsored by the Institute of Electrical and Electronics Engineers. The IEEE stopped granting sponsorship to Callaos from 2006 to 2008.
Submitting the paper was a deliberate attempt to embarrass WMSCI, which the authors claim accepts low-quality papers and sends unsolicited requests for submissions in bulk to academics. As the SCIgen website states:
Computing writer Stan Kelly-Bootle noted in ''ACM Queue'' that many sentences in the "Rooter" paper were individually plausible, which he regarded as posing a problem for automated detection of hoax articles. He suggested that even human readers might be taken in by the effective use of jargon ("The pun on root/router is par for MIT-graduate humor, and at least one occurrence of methodology is mandatory") and attribute the paper's apparent incoherence to their own limited knowledge. His conclusion was that "a reliable gibberish filter requires a careful holistic review by several peer domain experts".
Schlangemann
The pseudonym "Herbert Schlangemann" was used to publish fake scientific articles in international conferences that claimed to practice peer review. The name is taken from the Swedish short film ''Der Schlangemann''.
* In 2008, in response to a series of Call-for-Paper e-mails, SCIgen was used to generate a false scientific paper titled ''Towards the Simulation of E-Commerce'', using "Herbert Schlangemann" as the author. The article was accepted at the ''2008 International Conference on Computer Science and Software Engineering (CSSE 2008)'', co-sponsored by the IEEE, to be held in Wuhan, China, and the author was invited to be a session chair on the grounds of his fictional curriculum vitae. The official review comment: "This paper presents cooperative technology and classical Communication. In conclusion, the result shows that though the much-touted amphibious algorithm for the refinement of randomized algorithms is impossible, the well-known client-server algorithm for the analysis of voice-over-IP by Kumar and Raman runs in _(n) time. The authors can clearly identify important features of visualization of DHTs and analyze them insightfully. It is recommended that the authors should develop ideas more cogently, organizes them more logically, and connects them with clear transitions." The paper was available for a short time in the IEEE Xplore Database, but was then removed. The entire story is described in the official "Herbert Schlangemann" blog, and it also received attention in Slashdot and the German-language technology-news site Heise Online.
* In 2009, a similar incident occurred: Herbert Schlangemann's latest fake paper, ''PlusPug: A Methodology for the Improvement of Local-Area Networks'', was accepted for oral presentation at the ''2009 International Conference on e-Business and Information System Security (EBISS 2009)'', also co-sponsored by the IEEE and again to be held in Wuhan, China.
In all cases, the published papers were withdrawn from the conferences' proceedings, and the conference organizing committee as well as the names of the keynote speakers were removed from their websites.
List of works with notable acceptance
In conferences
* Rob Thomas: ''Rooter: A Methodology for the Typical Unification of Access Points and Redundancy'', 2005 for WMSCI (see above)
* Mathias Uslar's paper was accepted to the IPSI-BG conference.
* Professor Genco Gulan published a paper in the 3rd International Symposium of Interactive Media Design.
* A 2013 scientometrics paper demonstrated that at least 85 SCIgen papers had been published by IEEE and Springer. According to this research, over 120 SCIgen papers were removed.
In journals
* Students at Iran's Sharif University of Technology published a paper in Elsevier's ''Journal of Applied Mathematics and Computation''. The students wrote under the surname "MosallahNejad", which translates literally from Persian (in spite of not being a traditional Persian name) as "from an Armed Breed". The paper was subsequently removed when the publishers were informed that it was a joke paper.
* Mikhail Gelfand published a translation of the "Rooter" article in the Russian-language ''Journal of Scientific Publications of Aspirants and Doctorants'' in August 2008. Gelfand was protesting against the journal, which was apparently not peer reviewed and which charged Russian PhD candidates 4000 rubles to publish in an "accredited" scientific journal. The accreditation was revoked two weeks later. (See Dissernet for related information.)
* Springer Science+Business Media and IEEE were also the subject of similar pranks.
Spoofing Google Scholar and ''h''-index calculators
Refereeing performed on behalf of the Institute of Electrical and Electronics Engineers has also been subject to criticism after fake papers were discovered in conference publications, most notably by Labbé and a researcher using the pseudonym of Schlangemann.
In a 2010 paper, Cyril Labbé from Grenoble University demonstrated the vulnerability of ''h''-index calculations based on Google Scholar output by feeding it a large set of SCIgen-generated documents that cited each other, effectively an academic link farm. Using this method, the author managed, for instance, to rank "Ike Antkare" ahead of Albert Einstein.
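A short sketch shows why a set of mutually citing documents inflates an ''h''-index. It uses the standard definition (the largest ''h'' such that ''h'' papers each have at least ''h'' citations). The function names `h_index` and `link_farm_citations` and the paper counts are hypothetical; this is not Labbé's actual procedure or Google Scholar's computation.

```python
def h_index(citation_counts):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def link_farm_citations(n_papers):
    """Citation counts when each of n_papers cites every other paper once.

    Illustrative assumption: a perfectly connected link farm, so every
    paper receives n_papers - 1 citations.
    """
    return [n_papers - 1] * n_papers


if __name__ == "__main__":
    # An ordinary (made-up) publication record.
    print(h_index([10, 8, 5, 4, 3]))          # 4
    # A fabricated author whose 100 generated papers all cite each other.
    print(h_index(link_farm_citations(100)))  # 99
```

Because the index counts citations without assessing where they come from, the fabricated author's score grows linearly with the number of generated documents.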
2013 retractions
In 2013, more than 120 published conference papers created by SCIgen were retracted by Springer and the IEEE. Unlike previous submissions, which were intended to be pranks, these submissions were largely made by Chinese academics who were using SCIgen papers to boost their publication records.
SciDetect
In 2015, SciDetect was released by ''Springer''. This software, developed by Cyril Labbé, is designed to automatically detect papers generated by SCIgen.
2021 report
In 2021, a study was published on 243 SCIgen papers that had appeared in the academic literature. The authors found that SCIgen papers made up 75 per million papers (<0.01%) in information science, and that only a small fraction of the detected papers had been dealt with.
External links
* Copy of the fake paper: Towards the Simulation of E-Commerce by Herbert Schlangemann
* SCIgen - An Automatic CS Paper Generator
* SCIgen detection website