A search engine cache is a cache of web pages that shows each page as it was when it was indexed by a web crawler. Cached versions of web pages can be used to view the contents of a page when the live version cannot be reached, has been altered or has been taken down.
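The fallback behaviour described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory `SnapshotCache` class and a caller-supplied `fetch_live` function; neither name comes from any real search engine, and actual services store and serve their copies in ways not described here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Snapshot:
    """A copy of a page as it looked when the crawler indexed it."""
    url: str
    content: str
    indexed_at: float  # Unix timestamp of the crawl


class SnapshotCache:
    """Hypothetical in-memory store of crawl-time page copies."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, Snapshot] = {}

    def store(self, url: str, content: str, indexed_at: float) -> None:
        # A newer crawl overwrites the previous snapshot for the same URL.
        self._snapshots[url] = Snapshot(url, content, indexed_at)

    def get(self, url: str) -> Optional[Snapshot]:
        return self._snapshots.get(url)


def view_page(
    url: str,
    fetch_live: Callable[[str], str],
    cache: SnapshotCache,
) -> Tuple[str, str]:
    """Return (content, source), falling back to the cache if the live fetch fails."""
    try:
        return fetch_live(url), "live"
    except OSError:
        # Network failures (ConnectionError, TimeoutError, ...) derive from OSError.
        snapshot = cache.get(url)
        if snapshot is None:
            raise  # no cached copy either, so surface the original error
        return snapshot.content, "cached"
```

In this sketch the cached copy is only consulted when the live fetch fails; a reader who wants to see how a page looked before it was altered would call `cache.get(url)` directly.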
A web crawler collects the contents of a web page, which is then indexed by a web search engine. The search engine might make the copy accessible to users. A crawler that obeys the restrictions set by the site webmaster in robots.txt or in meta tags may not make a cached copy available to search engine users if instructed not to.
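As a rough sketch of how a well-behaved crawler might honor these restrictions, the example below uses Python's standard `urllib.robotparser` to check a robots.txt file and `html.parser` to look for a robots meta tag. It assumes the widely recognized `noarchive` meta directive as the signal not to show a cached copy; the function names `may_crawl` and `may_show_cached_copy` and the `ExampleBot` user agent are hypothetical, and real search engines may apply additional rules.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, noarchive".
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a robots.txt body (already downloaded) for permission to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def may_show_cached_copy(html: str) -> bool:
    """Return False if the page asks not to be archived (assumed: 'noarchive')."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" not in parser.directives


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    print(may_crawl(robots, "ExampleBot", "https://example.com/private/a.html"))
    print(may_show_cached_copy('<meta name="robots" content="noarchive">'))
```

The two checks are independent: robots.txt governs whether the page is fetched at all, while the meta tag is only visible after the page has been fetched and parsed.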
Search engine caches can be used for crime investigation, legal proceedings and journalism.
They may not be fully protected by the usual laws that protect technology providers from copyright infringement claims.
Google Cache
Google retired its web caching service in 2024. The service was designed for websites that might show up in a Google search result but are temporarily offline. As a "cache", it was not designed for archival purposes, and cached copies expired. Google said that the Internet as of 2024 is much more reliable than it was "way back" in earlier days, and that its cache service is therefore no longer important to maintain. Google pointed to the Wayback Machine as a better alternative, and suggested that it might work with the Wayback Machine in the future.
In September 2024, Google and the Internet Archive announced a collaboration providing links to the Wayback Machine from within Google Search.
Bing
Bing Search, following Google Cache's lead, also retired its web caching service in 2024. Microsoft explained, "the internet has evolved for better reliability, and many pages aren't optimized for cache viewing."