archive.today (formerly archive.is) is an archive site that stores snapshots of web pages. Like WebCite, it retrieves one page at a time, each smaller than 50 MB, but it supports JavaScript-heavy sites such as Google Maps and progressive web applications such as Twitter. Archive.today simultaneously records two different snapshots of a web page. One is a "Webpage" snapshot, which keeps any functional live links present in the original. The other is a "Screenshot", which provides a static, non-interactive image of the page.


Features





Functionality


Archive.today can capture individual pages in response to explicit user requests. Since its launch, it has supported crawling pages whose URLs contain the now-deprecated hash-bang fragment (#!). Archive.today records only text and images, excluding video, XML, RTF, spreadsheets (xls or ods) and other non-static content. It keeps a history of saved snapshots and, if an address has already been archived, asks the user to confirm before adding a new snapshot of it instead of archiving it immediately.

Pages are captured at a browser width of 1,024 pixels. CSS is converted to inline CSS, which removes responsive web design and selectors such as :hover and :active. Content generated by JavaScript during the crawl appears in a frozen state. HTML class names are preserved in an old-class attribute. When text is selected, a JavaScript applet generates a URL fragment, shown in the browser's address bar, that automatically highlights that portion of the text when the URL is visited again.

Web pages cannot be copied from archive.is to web.archive.org as a second-level backup, because archive.is places an exclusion on the Wayback Machine and does not save its snapshots in WARC format. The reverse, from web.archive.org to archive.is, is possible, but the copy usually takes longer than a direct capture. Some websites are retroactively removed from the Internet Archive's listings, or blocked from being saved, because of their robots.txt file; archive.today does not honor robots.txt. Once a web page is archived, it cannot be deleted directly by any Internet user.

The search toolbar supports advanced keyword operators, including a wildcard character. A pair of quotation marks restricts the search to an exact sequence of keywords in the title or body of the page, whereas the insite operator restricts it to a specific Internet domain. When a dynamic list is saved, the archive.today search box shows only a result that links to the previous and following sections of the list (for example, 20 links per page); the other saved pages are filtered out, although they can sometimes be found through one of their occurrences. The search feature is backed by Google Custom Search; if it returns no results, archive.is falls back to Yandex Search.

While a page is loading, a list of URLs for the individual page elements is shown, along with their content sizes, HTTP statuses and MIME types. This list can only be viewed during the crawling process. Archived pages can be downloaded as ZIP files, except for pages archived since 29 November 2019, when archive.today switched its browser engine from PhantomJS to Chromium.
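The class-preservation behavior described above can be illustrated with a small Python sketch. It is not archive.today's code: `OldClassRewriter` and `rewrite_classes` are hypothetical names, built on the standard-library `html.parser` module, that re-emit markup with every `class` attribute renamed to `old-class`. The sketch does not inline CSS, which the real service also does.

```python
from html import escape
from html.parser import HTMLParser

RAW_TEXT_TAGS = {"script", "style"}  # contents must not be HTML-escaped


class OldClassRewriter(HTMLParser):
    """Re-emit HTML with every class attribute renamed to old-class.

    Hypothetical sketch: it does not inline CSS, and it drops processing
    instructions and CDATA sections.
    """

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: list[str] = []
        self._in_raw_text = False

    def _render_attrs(self, attrs) -> str:
        parts = []
        for name, value in attrs:
            if name == "class":
                name = "old-class"  # keep the names, but stylesheets no longer match them
            if value is None:
                parts.append(f" {name}")  # boolean attribute such as "hidden"
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._render_attrs(attrs)}>")
        self._in_raw_text = tag in RAW_TEXT_TAGS

    def handle_startendtag(self, tag, attrs):
        self.out.append(f"<{tag}{self._render_attrs(attrs)}/>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        self._in_raw_text = False

    def handle_data(self, data):
        # Text nodes were unescaped by the parser, so escape them again on output.
        self.out.append(data if self._in_raw_text else escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def rewrite_classes(html_text: str) -> str:
    """Return html_text with class attributes moved to old-class."""
    parser = OldClassRewriter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, `rewrite_classes('<p class="intro">Hi</p>')` returns `'<p old-class="intro">Hi</p>'`. Renaming the attribute rather than deleting it keeps the original class names available to readers of the snapshot while ensuring that no stylesheet selector can match them.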
Since July 2013, archive.today has supported the Memento Project application programming interface (API).
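Memento (RFC 7089) lets a client ask a TimeGate for the snapshot closest to a given date by sending an `Accept-Datetime` header, and lists every snapshot of a URL in a TimeMap serialized as `application/link-format`. The Python sketch below handles only those two formats and performs no network I/O. The `TIMEMAP` prefix is an assumption about archive.today's URL layout, and `parse_timemap` is a simplified, hypothetical parser that ignores edge cases of the link-format grammar.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Assumed URL layout for archive.today's TimeMap endpoint; verify before use.
TIMEMAP = "https://archive.today/timemap/"

LINK_RE = re.compile(r"<([^>]+)>([^<]*)")  # <uri> followed by its ;-separated params
PARAM_RE = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def timemap_url(original: str) -> str:
    """Build a TimeMap URL, assuming the /timemap/<original-url> layout."""
    return TIMEMAP + original


def accept_datetime(dt: datetime) -> str:
    """Format a datetime as an RFC 7089 Accept-Datetime header value (HTTP date, GMT)."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def parse_timemap(body: str) -> list[tuple[datetime, str]]:
    """Return (datetime, uri) pairs for every memento in a link-format TimeMap, oldest first."""
    mementos = []
    for uri, params_text in LINK_RE.findall(body):
        params = {name: quoted or bare for name, quoted, bare in PARAM_RE.findall(params_text)}
        rels = params.get("rel", "").split()  # e.g. "first memento" -> ["first", "memento"]
        if "memento" in rels and "datetime" in params:
            mementos.append((parsedate_to_datetime(params["datetime"]), uri))
    return sorted(mementos)
```

A client would send `accept_datetime(...)` as the `Accept-Datetime` header of a TimeGate request, or fetch `timemap_url(...)` and pass the response body to `parse_timemap`. Entries whose `rel` is only `original` or `self` are skipped because they describe the live resource and the TimeMap itself rather than archived snapshots.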


History


Archive.today was founded in 2012. The site originally branded itself as archive.today, but in May 2015 it changed its primary mirror to archive.is. In January 2019, it began to deprecate the archive.is domain in favor of the archive.today mirror.


Worldwide availability





Australia


In March 2019, the site was blocked for six months by several Australian internet providers in the aftermath of the Christchurch mosque shootings in an attempt to limit distribution of the footage of the attack.


China


According to GreatFire.org, archive.today has been blocked in China since March 2016, archive.li since September 2017, and archive.fo since July 2018.


Finland


On 21 July 2015, the operators blocked access to the service from all Finnish IP addresses, stating on Twitter that they had done so to avoid escalating an alleged dispute with the Finnish government.


Russia


In Russia, only HTTP access is possible; HTTPS connections are blocked.


Worldwide


Archive.today currently blocks requests from Cloudflare's recursive DNS resolver, 1.1.1.1. It insists that recursive DNS resolvers pass along the approximate location of the user making the DNS lookup (through the EDNS Client Subnet extension), which Cloudflare deliberately omits for privacy reasons. As a result, the archive.today DNS servers intentionally return invalid responses when queried by Cloudflare's resolver. https://news.ycombinator.com/item?id=19828702

Since late 2018, archive.today has also imposed a data cap, presumably to help protect against denial-of-service attacks. Individual users can archive and/or retrieve only about 10 to 20 megabytes of data per day. Once that limit is reached, the web server stops responding to the user's IP address.
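To make the reported per-IP data cap concrete, here is a hypothetical Python sketch of a daily byte quota keyed by client IP address. The 15 MB limit is an assumed midpoint of the 10 to 20 MB range reported above, and the policy of letting the request that crosses the limit complete before blocking is also an assumption. Nothing here describes archive.today's actual implementation.

```python
from collections import defaultdict
from datetime import date
from typing import Optional

# Assumed figure: midpoint of the roughly 10 to 20 MB/day reported for archive.today.
DAILY_LIMIT_BYTES = 15 * 1024 * 1024


class DailyByteQuota:
    """Track bytes served per client IP per calendar day (hypothetical sketch)."""

    def __init__(self, limit_bytes: int = DAILY_LIMIT_BYTES) -> None:
        self.limit = limit_bytes
        self.usage: defaultdict[tuple[str, date], int] = defaultdict(int)  # (ip, day) -> bytes

    def allow(self, ip: str, nbytes: int, today: Optional[date] = None) -> bool:
        """Record a transfer of nbytes; return False once the IP's daily quota is used up."""
        key = (ip, today or date.today())
        if self.usage[key] >= self.limit:
            return False  # quota exhausted: the server would simply stop responding
        self.usage[key] += nbytes  # the request that crosses the limit still completes
        return True
```

Because the counter is keyed by `(ip, day)`, a new calendar day automatically starts from zero. A long-running version would also need to discard counters from previous days, which this sketch never does.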


See also


* Digital preservation
* List of Web archiving initiatives
* Internet Archive
* Link rot
* Perma.cc
* Wayback Machine
* Web archiving
* WebCite

