Soft 404

In computer network communications, HTTP 404 (also called 404 not found, 404, 404 error, page not found, or file not found) is a Hypertext Transfer Protocol (HTTP) standard response code indicating that the browser was able to communicate with a given server, but the server could not find what was requested. The error may also be used when a server does not wish to disclose whether it has the requested information. The website hosting server will typically generate a "404 Not Found" web page when a user attempts to follow a broken or dead link; hence the 404 error is one of the most recognizable errors encountered on the World Wide Web.


Overview

When communicating via HTTP, a server is required to respond to a request, such as a web browser request for a web page, with a numeric response code and an optional, mandatory, or disallowed (based upon the status code) message. In code 404, the first digit indicates a client error, such as a mistyped Uniform Resource Locator (URL). The following two digits indicate the specific error encountered. HTTP's use of three-digit codes is similar to the use of such codes in earlier protocols such as FTP and NNTP.

At the HTTP level, a 404 response code is followed by a human-readable "reason phrase". The HTTP specification suggests the phrase "Not Found", and many web servers by default issue an HTML page that includes both the 404 code and the "Not Found" phrase.

A 404 error is often returned when pages have been moved or deleted. In the first case, it is better to employ URL mapping or URL redirection by returning a 301 Moved Permanently response, which can be configured in most server configuration files or through URL rewriting; in the second case, a 410 Gone should be returned. Because these two options require special server configuration, most websites do not make use of them.

404 errors should not be confused with DNS errors, which appear when the given URL refers to a server name that does not exist. A 404 error indicates that the server itself was found, but that the server was not able to retrieve the requested page.
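
The choice between 301, 410 and 404 described above can be sketched in a few lines of Python. This is only an illustration: the routing tables PAGES, MOVED and GONE and the function names are hypothetical, and a real server would normally express the same rules in its configuration files.

```python
from http import HTTPStatus

# Hypothetical routing tables, for illustration only.
PAGES = {"/": "<h1>Home</h1>", "/new-page": "<h1>New page</h1>"}
MOVED = {"/old-page": "/new-page"}   # pages that moved permanently
GONE = {"/retired-page"}             # pages that were deliberately removed


def respond(path):
    """Return (status, headers, body) for a requested path."""
    if path in PAGES:
        return HTTPStatus.OK, {}, PAGES[path]
    if path in MOVED:
        # Moved page: tell the client where the resource now lives.
        return HTTPStatus.MOVED_PERMANENTLY, {"Location": MOVED[path]}, ""
    if path in GONE:
        # Deleted page: state that the removal is intentional.
        return HTTPStatus.GONE, {}, HTTPStatus.GONE.phrase
    # Anything else is unknown to the server.
    return HTTPStatus.NOT_FOUND, {}, HTTPStatus.NOT_FOUND.phrase


def status_class(status):
    """Return the first digit of a status code (4 means client error)."""
    return int(status) // 100
```

Here respond("/old-page") yields a 301 with a Location header, respond("/retired-page") yields a 410, and any unknown path falls through to 404, whose class digit is 4.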


Soft 404 errors

Some websites report a "not found" error by returning a standard web page with a "200 OK" response code, falsely reporting that the page loaded properly; this is known as a soft 404. The term "soft 404" was introduced in 2004 by Ziv Bar-Yossef et al. Soft 404s are problematic for automated methods of discovering whether a link is broken. Some search engines, like Yahoo and Google, use automated processes to detect soft 404s.

Soft 404s can occur as a result of configuration errors when using certain HTTP server software, for example with the Apache HTTP Server, when the ErrorDocument 404 directive (for instance in a .htaccess file) is given a full URL (e.g. http://example.com/error.html) rather than a local path (/error.html). With a full URL the server redirects the client to the error page, which is then served normally, so the original 404 status is lost. This can also be done on purpose to force some browsers (like Internet Explorer) to display a customized 404 error message rather than replacing what is served with a browser-specific "friendly" error message (in Internet Explorer, this behavior is triggered when a 404 is served and the received HTML is shorter than a certain length, and can be manually disabled by the user).

There are also "soft 3XX" errors, where content is returned with a status 200 but comes from a redirected page, such as when missing pages are redirected to the domain root or home page.
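
One simple heuristic for spotting a site that serves soft 404s is to request a URL that almost certainly does not exist and see whether the answer is still a success code. The Python sketch below illustrates that idea only; it is not claimed to be the method used by any search engine. The fetch parameter is an assumption of this sketch: a caller-supplied function that takes a URL and returns the final integer status code.

```python
import uuid
from urllib.parse import urljoin


def random_probe_url(base_url):
    """Build a URL on the same host that almost certainly does not exist."""
    return urljoin(base_url, "/" + uuid.uuid4().hex + ".html")


def serves_soft_404(base_url, fetch):
    """Return True if a nonexistent URL is answered with a 2xx status.

    `fetch` is a hypothetical caller-supplied function: it takes a URL
    and returns the final integer HTTP status code for that URL.
    """
    status = fetch(random_probe_url(base_url))
    return 200 <= status < 300
```

A site that answers the random probe with 200 is likely to answer genuinely missing pages the same way, so a link checker would need a content-based comparison rather than the status code alone.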


Proxy servers

Some proxy servers generate a 404 error when a 500-range error code would be more correct. If the proxy server is unable to satisfy a request for a page because of a problem with the remote host (such as hostname resolution failures or refused TCP connections), this should be reported as a 5xx server error (for example 502 Bad Gateway or 504 Gateway Timeout), but some proxies deliver a 404 instead. This can confuse programs that expect and act on specific responses, as they can no longer easily distinguish between an absent web server and a missing web page on a web server that is present.
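
As a rough illustration, a proxy written in Python could translate upstream connection failures into 5xx codes instead of 404. The function name and the particular mapping (502 for resolution and refused-connection failures, 504 for timeouts) are assumptions of this sketch, not a description of any specific proxy product.

```python
import socket
from http import HTTPStatus


def status_for_upstream_failure(exc):
    """Map an exception raised while contacting the remote host to a 5xx code."""
    if isinstance(exc, (TimeoutError, socket.timeout)):
        # The remote host did not answer in time.
        return HTTPStatus.GATEWAY_TIMEOUT
    if isinstance(exc, (socket.gaierror, ConnectionRefusedError)):
        # The host name did not resolve, or the TCP connection was refused.
        return HTTPStatus.BAD_GATEWAY
    # Any other failure is still the proxy's problem, not a missing page.
    return HTTPStatus.INTERNAL_SERVER_ERROR
```

With this mapping a client can still tell an unreachable server (5xx from the proxy) apart from a missing page on a reachable server (404 passed through from the origin).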


Intentional 404s

In July 2004, the UK telecom provider BT Group deployed the Cleanfeed content blocking system, which returns a 404 error to any request for content identified as potentially illegal by the Internet Watch Foundation. Other ISPs return an HTTP 403 "forbidden" error in the same circumstances. The practice of employing fake 404 errors as a means to conceal censorship has also been reported in Thailand and Tunisia. In Tunisia, where censorship was severe before the 2011 revolution, people became aware of the nature of the fake 404 errors and created an imaginary character named "Ammar 404" who represents "the invisible censor".


Microsoft Internet Server 404 substatus error codes

Microsoft's web server software, Internet Information Services (IIS), returns a set of substatus codes with its 404 responses. The substatus codes take the form of decimal numbers appended to the 404 status code. The substatus codes are not officially recognized by IANA and are not returned by non-Microsoft servers.


Substatus codes

Microsoft's IIS 7.0, IIS 7.5, and IIS 8.0 servers define the following HTTP substatus codes to indicate a more specific cause of a 404 error:
* 404.0 – Not found.
* 404.1 – Site Not Found.
* 404.2 – ISAPI or CGI restriction.
* 404.3 – MIME type restriction.
* 404.4 – No handler configured.
* 404.5 – Denied by request filtering configuration.
* 404.6 – Verb denied.
* 404.7 – File extension denied.
* 404.8 – Hidden namespace.
* 404.9 – File attribute hidden.
* 404.10 – Request header too long.
* 404.11 – Request contains double escape sequence.
* 404.12 – Request contains high-bit characters.
* 404.13 – Content length too large.
* 404.14 – Request URL too long.
* 404.15 – Query string too long.
* 404.16 – DAV request sent to the static file handler.
* 404.17 – Dynamic content mapped to the static file handler via a wildcard MIME mapping.
* 404.18 – Query string sequence denied.
* 404.19 – Denied by filtering rule.
* 404.20 – Too Many URL Segments.
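
Because the substatus is a separate integer after the dot, a value such as "404.10" must not be read as a floating-point number, or it would collapse into 404.1. The Python sketch below shows one way to split such a string; the function name and the small lookup table (copied from a few entries of the list above) are illustrative assumptions.

```python
# A few entries from the list above, keyed by (status, substatus).
SUBSTATUS_TEXT = {
    (404, 0): "Not found",
    (404, 1): "Site Not Found",
    (404, 10): "Request header too long",
    (404, 13): "Content length too large",
}


def parse_iis_status(text):
    """Split a string such as '404.13' into (status, substatus) integers."""
    status, _, substatus = text.strip().partition(".")
    # A bare '404' carries no substatus; treat it as substatus 0.
    return int(status), (int(substatus) if substatus else 0)
```

Parsing "404.1" and "404.10" this way gives (404, 1) and (404, 10), which stay distinct keys in the lookup table.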


Custom error pages

Web servers can typically be configured to display a customized 404 error page, including a more natural description, the parent site's branding, and sometimes a site map, a search form or a 404-page widget. The protocol-level phrase, which is hidden from the user, is rarely customized. Internet Explorer, however, will not display custom pages unless they are larger than 512 bytes, opting instead to display a "friendly" error page. Google Chrome included similar functionality, where the 404 is replaced with alternative suggestions generated by Google algorithms if the page is under 512 bytes in size. Another problem is that if the site does not provide a favicon and a separate custom 404 page exists, extra traffic and longer loading times will be generated on every page view, because each failed favicon request is answered with the custom 404 page.

Many organizations use 404 error pages as an opportunity to inject humor into what may otherwise be a serious website. For example, Metro UK shows a polar bear on a skateboard, and the web development agency Left Logic has a simple drawing program. During the 2015 UK general election campaign, the main political parties all used their 404 pages to either take aim at political opponents or show relevant policies to potential supporters. In Europe, the NotFound project, created by multiple European organizations including Missing Children Europe and Child Focus, encourages site operators to add a snippet of code to serve customized 404 error pages which provide data about missing children.

While many websites send additional information in a 404 error message, such as a link to the homepage of a website or a search box, some also endeavor to find the correct web page the user wanted. Extensions are available for some content management systems (CMSs) to do this.
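
One workaround sometimes used for the 512-byte threshold is to pad a short custom error page with an invisible HTML comment. The Python sketch below illustrates the idea; the function name, the 513-byte default and the filler character are assumptions made for the example, not a documented browser requirement beyond the threshold stated above.

```python
MIN_BYTES = 513  # just over the 512-byte threshold described above


def pad_error_page(html, min_bytes=MIN_BYTES, encoding="utf-8"):
    """Return the page as bytes, padded with an HTML comment if too short."""
    body = html.encode(encoding)
    shortfall = min_bytes - len(body)
    if shortfall <= 0:
        return body  # already long enough, leave untouched
    # "<!--" plus "-->" is 7 bytes; fill the remainder with a neutral character.
    filler = "<!--" + "x" * max(shortfall - 7, 0) + "-->"
    return body + filler.encode(encoding)
```

The comment is not rendered, so the visible page is unchanged while the response body grows past the threshold.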


Tracking 404 errors

A number of tools exist that crawl through a website to find pages that return 404 status codes. These tools can be helpful in finding broken links that exist within a particular website. The limitation of these tools is that they only find links within one particular website, and ignore 404s resulting from links on other websites. As a result, these tools miss out on 83% of the 404s on websites. One way around this is to find 404 errors by analyzing external links.

One of the most effective ways to discover 404 errors is by using Google Search Console, Google Analytics or crawling software. Another common method is tracking traffic to 404 pages using log file analysis, which can be useful to understand which 404s users actually reached on the site. A further method of tracking traffic to 404 pages is using JavaScript-based traffic tracking tools.
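
Log file analysis of this kind can be sketched in Python as follows. The sketch assumes access logs in the widely used "combined" format, where the request line, status code, response size and referrer appear in that order; the regular expression and function name are illustrative and would need adjusting for other log layouts.

```python
import re
from collections import Counter

# Matches: "METHOD /path PROTOCOL" STATUS SIZE "REFERRER" (referrer optional)
LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referrer>[^"]*)")?'
)


def count_404s(lines):
    """Count 404 responses per (path, referrer) pair in access-log lines."""
    hits = Counter()
    for line in lines:
        match = LINE.search(line)
        if match and match.group("status") == "404":
            hits[(match.group("path"), match.group("referrer") or "-")] += 1
    return hits
```

Grouping by referrer as well as path shows which 404s come from links on other websites, the case that site-local crawlers cannot see.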


See also

* Blue screen of death
* Funky caching
* Link rot
* List of HTTP status codes


External links


* A More Useful 404
* 404 Not Found of the Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content specification, at the Internet Engineering Task Force
* ErrorDocument Directive – instructions on custom error pages for the Apache 2.0 web server
* 404: Not Found – an award-winning song about the error code