WebCite was an on-demand archive site, designed to digitally preserve scientifically and educationally important material on the web by taking snapshots of Internet contents as they existed at the time when a blogger or a scholar cited or quoted from them. The preservation service enabled verifiability of claims supported by the cited sources even when the original web pages had been revised, removed, or had disappeared for other reasons, an effect known as link rot.
Service features
WebCite allowed for preservation of all types of web content, including HTML web pages, PDF files, style sheets, JavaScript and digital images. It also archived metadata about the collected resources such as access time, MIME type, and content length.
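As an illustration of the kind of metadata listed above, the following is a minimal sketch of a per-snapshot record. It is not WebCite's actual implementation or schema: the names `SnapshotRecord` and `make_snapshot_record` are hypothetical, and the MIME type is assumed to come from an HTTP `Content-Type` header.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class SnapshotRecord:
    """Hypothetical record holding the metadata fields named in the text."""
    url: str
    access_time: datetime   # when the resource was fetched
    mime_type: str          # e.g. "text/html", "application/pdf"
    content_length: int     # size of the stored body in bytes


def make_snapshot_record(
    url: str,
    body: bytes,
    content_type_header: str,
    accessed_at: Optional[datetime] = None,
) -> SnapshotRecord:
    """Build a metadata record for one archived resource.

    Assumption: the MIME type is the part of the Content-Type header
    before any parameters such as "; charset=utf-8".
    """
    mime_type = content_type_header.split(";", 1)[0].strip().lower()
    if accessed_at is None:
        accessed_at = datetime.now(timezone.utc)
    return SnapshotRecord(
        url=url,
        access_time=accessed_at,
        mime_type=mime_type,
        content_length=len(body),
    )
```

A caller would pass the fetched bytes and the response header, for example `make_snapshot_record("http://example.org/", body, "text/html; charset=utf-8")`.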
WebCite was a non-profit consortium supported by publishers and editors, and it could be used by individuals without charge. It was one of the first services to offer on-demand archiving of pages, a feature later adopted by many other archiving services, such as archive.today and the Wayback Machine. It did not do web page crawling.
History
Conceived in 1997 by Gunther Eysenbach, WebCite was publicly described the following year when an article on Internet quality control declared that such a service could also measure the citation impact of web pages. In the next year, a pilot service was set up at the address webcite.net. Although it seemed that the need for WebCite decreased when Google's short-term copies of web pages began to be offered by Google Cache and the Internet Archive expanded its crawling (which started in 1996), WebCite was the only one allowing "on-demand" archiving by users. WebCite also offered interfaces to scholarly journals and publishers to automate the archiving of cited links. By 2008, over 200 journals had begun routinely using WebCite.
WebCite was formerly a member of the International Internet Preservation Consortium.
In response to a 2012 message on Twitter relating to WebCite's former membership of the consortium, Eysenbach commented that "WebCite has no funding, and IIPC charges €4000 per year in annual membership fees."
WebCite "feeds its content" to other digital preservation projects, including the Internet Archive.
Lawrence Lessig, an American academic who writes extensively on copyright and technology, used WebCite in his amicus brief in the Supreme Court of the United States case of MGM Studios, Inc. v. Grokster, Ltd.
Sometime between July 9 and 17, 2019, WebCite stopped accepting new archiving requests. In a further outage, as of October 29, 2021, all previously archived content is no longer available, and only the home page still works.
Fundraising
WebCite ran a fund-raising campaign using FundRazr from January 2013 with a target of $22,500, a sum which its operators stated was needed to maintain and modernize the service beyond the end of 2013. This included relocating the service to Amazon EC2 cloud hosting and legal support. It remained undecided whether WebCite would continue as a non-profit or as a for-profit entity.
Business model
The term "WebCite" is a registered trademark.
WebCite did not charge individual users, journal editors and publishers any fee to use its service. WebCite earned revenue from publishers who wanted to "have their publications analyzed and cited webreferences archived".
Early support was from the University of Toronto.
Copyright issues
WebCite maintained the legal position that its archiving activities were allowed by the copyright doctrines of fair use and implied license.
To support the fair use argument, WebCite noted that its archived copies were transformative, socially valuable for academic research, and not harmful to the market value of any copyrighted work.
WebCite argued that caching and archiving web pages was not considered a copyright infringement when the archiver offers the copyright owner an opportunity to "opt-out" of the archive system, thus creating an implied license.
To that end, WebCite would not archive in violation of Web site "do-not-cache" and "no-archive" metadata, as well as robot exclusion standards, the absence of which creates an "implied license" for web archive services to preserve the content.
In a similar case involving Google's web caching activities, on January 19, 2006, the United States District Court for the District of Nevada agreed with that argument in the case of Field v. Google (CV-S-04-0413-RCJ-LRL), holding that fair use and an "implied license" meant that Google's caching of Web pages did not constitute copyright violation. The "implied license" referred to general Internet standards.
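The opt-out signals discussed in this section (robot exclusion rules and "no-archive" page metadata) can be checked mechanically before a page is archived. The following is a minimal sketch, not WebCite's actual logic. It assumes that the robots.txt text and the page HTML have already been fetched, that "no-archive" metadata takes the form of a `<meta name="robots" content="noarchive">` tag, and that the archiver identifies itself with the hypothetical user agent `ExampleArchiver`.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def may_archive(url: str, robots_txt: str, html: str,
                user_agent: str = "ExampleArchiver") -> bool:
    """Return True only if neither opt-out signal forbids archiving.

    `user_agent` is a hypothetical name used for illustration.
    """
    # 1. Robot exclusion standard: honour Disallow rules in robots.txt.
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    if not rules.can_fetch(user_agent, url):
        return False

    # 2. Page-level metadata: honour a "noarchive" robots directive.
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noarchive" not in meta.directives
```

Under this sketch, a page with no robots.txt restriction and no `noarchive` tag is treated as archivable, which mirrors the "implied license" reasoning: the absence of an opt-out signal is read as permission.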
DMCA requests
According to its policy, after receiving legitimate DMCA requests from the copyright holders, WebCite would remove saved pages from public access, as the archived pages are still under the safe harbor of being citations. The pages were removed to a "dark archive" and in cases of legal controversies or evidence requests, there was pay-per-view access of "$200 (up to 5 snapshots) plus $100 for each further 10 snapshots" to the copyrighted content.
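The quoted fee schedule can be worked through as simple arithmetic. The sketch below uses the hypothetical function name `dark_archive_fee` and assumes one reading of the quoted wording: that any started block of up to 10 snapshots beyond the first 5 is charged a full $100. The source does not say how partial blocks were billed.

```python
import math


def dark_archive_fee(snapshots: int) -> int:
    """Fee in US dollars for pay-per-view access to `snapshots` snapshots.

    Assumption: a partially used block of 10 further snapshots is
    billed as a whole block.
    """
    if snapshots < 1:
        raise ValueError("at least one snapshot must be requested")
    further = max(0, snapshots - 5)
    return 200 + 100 * math.ceil(further / 10)
```

Under that assumption, a request for 5 snapshots would cost $200, while requests for anywhere from 6 to 15 snapshots would cost $300.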
See also
* Archive.today
* Perma.cc
* Wayback Machine