HOME

TheInfoList



OR:

A web server is computer
software Software is a set of computer programs and associated documentation and data. This is in contrast to hardware, from which the system is built and which actually performs the work. At the lowest programming level, executable code consists o ...
and underlying hardware that accepts requests via
HTTP The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
(the
network protocol A communication protocol is a system of rules that allows two or more entities of a communications system to transmit information via any kind of variation of a physical quantity. The protocol defines the rules, syntax, semantics and synchroniza ...
created to distribute
web content Web content is the text, visual or audio content that is made available online and user encountered as part of the online usage and experience on websites. It may include text, images, sounds and audio, online videos, among other items placed wi ...
) or its secure variant
HTTPS Hypertext Transfer Protocol Secure (HTTPS) is an extension of the Hypertext Transfer Protocol (HTTP). It is used for secure communication over a computer network, and is widely used on the Internet. In HTTPS, the communication protocol is enc ...
. A user agent, commonly a
web browser A web browser is application software for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Browsers are used on ...
or
web crawler A Web crawler, sometimes called a spider or spiderbot and often shortened to crawler, is an Internet bot that systematically browses the World Wide Web and that is typically operated by search engines for the purpose of Web indexing (''web spid ...
, initiates communication by making a request for a web page or other
resource Resource refers to all the materials available in our environment which are technologically accessible, economically feasible and culturally sustainable and help us to satisfy our needs and wants. Resources can broadly be classified upon their ...
using HTTP, and the
server Server may refer to: Computing * Server (computing), a computer program or a device that provides functionality for other programs or devices, called clients Role * Waiting staff, those who work at a restaurant or a bar attending customers and s ...
responds with the content of that resource or an
error message An error message is information displayed when an unforeseen occurs, usually on a computer or other device. On modern operating systems with graphical user interfaces, error messages are often displayed using dialog boxes. Error messages are used ...
. A web server can also accept and store resources sent from the user agent if configured to do so. The hardware used to run a web server can vary according to the volume of requests that it needs to handle. At the low end of the range are
embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' ...
s, such as a router that runs a small web server as its configuration interface. A high-traffic
Internet The Internet (or internet) is the global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP) to communicate between networks and devices. It is a '' network of networks'' that consists of private, pub ...
website A website (also written as a web site) is a collection of web pages and related content that is identified by a common domain name and published on at least one web server. Examples of notable websites are Google, Facebook, Amazon, and Wikipe ...
might handle requests with hundreds of servers that run on racks of high-speed computers. A resource sent from a web server can be a preexisting
file File or filing may refer to: Mechanical tools and processes * File (tool), a tool used to ''remove'' fine amounts of material from a workpiece **Filing (metalworking), a material removal process in manufacturing ** Nail file, a tool used to gent ...
(
static content A static web page (sometimes called a flat page or a stationary page) is a web page that is delivered to the user's web browser exactly as stored, in contrast to dynamic web pages which are generated by a web application. Consequently, a static ...
) available to the web server, or it can be generated at the time of the request (
dynamic content A server-side dynamic web page is a web page whose construction is controlled by an application server processing server-side scripts. In server-side scripting, parameters determine how the assembly of every new web page proceeds, and includin ...
) by another
program Program, programme, programmer, or programming may refer to: Business and management * Program management, the process of managing several related projects * Time management * Program, a part of planning Arts and entertainment Audio * Programm ...
that communicates with the server software. The former usually can be served faster and can be more easily cached for repeated requests, while the latter supports a broader range of applications. Technologies such as
REST Rest or REST may refer to: Relief from activity * Sleep ** Bed rest * Kneeling * Lying (position) * Sitting * Squatting position Structural support * Structural support ** Rest (cue sports) ** Armrest ** Headrest ** Footrest Arts and entert ...
and
SOAP Soap is a salt of a fatty acid used in a variety of cleansing and lubricating products. In a domestic setting, soaps are surfactants usually used for washing, bathing, and other types of housekeeping. In industrial settings, soaps are used ...
, which use HTTP as a basis for general computer-to-computer communication, as well as support for
WebDAV WebDAV (Web Distributed Authoring and Versioning) is a set of extensions to the Hypertext Transfer Protocol (HTTP), which allows user agents to collaboratively author contents ''directly'' in an HTTP web server by providing facilities for conc ...
extensions, have extended the application of web servers well beyond their original purpose of serving human-readable pages.


History

This is a very brief history of ''web server programs'', so some information necessarily overlaps with the histories of the ''web browsers'', the ''World Wide Web'' and the ''Internet''; therefore, for the sake of clearness and understandability, some key historical information below reported may be similar to that found also in one or more of the above-mentioned history articles.


Initial WWW project (1989-1991)

In March 1989,
Sir Tim Berners-Lee Sir Timothy John Berners-Lee (born 8 June 1955), also known as TimBL, is an English computer scientist best known as the inventor of the World Wide Web. He is a Professorial Fellow of Computer Science at the University of Oxford and a profess ...
proposed a new project to his employer
CERN The European Organization for Nuclear Research, known as CERN (; ; ), is an intergovernmental organization that operates the largest particle physics laboratory in the world. Established in 1954, it is based in a northwestern suburb of Gene ...
, with the goal of easing the exchange of information between scientists by using a
hypertext Hypertext is text displayed on a computer display or other electronic devices with references (hyperlinks) to other text that the reader can immediately access. Hypertext documents are interconnected by hyperlinks, which are typically ac ...
system. The proposal titled ''"HyperText and CERN"'', asked for comments and it was read by several people. In October 1990 the proposal was reformulated and enriched (having as co-author
Robert Cailliau Robert Cailliau (, born 26 January 1947) is a Belgian informatics engineer, computer scientist and author who proposed the first (pre-www) hypertext system for CERN in 1987 and collaborated with Tim Berners-Lee on the World Wide Web (jointly wi ...
), and finally, it was approved. Between late 1990 and early 1991 the project resulted in Berners-Lee and his developers writing and testing several software libraries along with three programs, which initially ran on NeXTSTEP OS installed on
NeXT Next may refer to: Arts and entertainment Film * ''Next'' (1990 film), an animated short about William Shakespeare * ''Next'' (2007 film), a sci-fi film starring Nicolas Cage * '' Next: A Primer on Urban Painting'', a 2005 documentary film Lit ...
workstations: * a graphical
web browser A web browser is application software for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Browsers are used on ...
, called
WorldWideWeb WorldWideWeb (later renamed Nexus to avoid confusion between the software and the World Wide Web) is the first web browser and web page editor. It was discontinued in 1994. It was the first WYSIWYG HTML editor. The source code was released int ...
; * a portable line mode web browser; * a web server, later known as CERN httpd. Those early browsers retrieved web pages from web server(s) using a new basic communication protocol that was named HTTP 0.9. In August 1991 Tim Berner-Lee announced the ''birth of WWW technology'' and encouraged scientists to adopt and develop it. Soon after, those programs, along with their
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the w ...
, were made available to people interested in their usage. In practice CERN informally allowed other people, including developers, etc., to play with and maybe further develop what it has been made till that moment. This was the official birth of CERN httpd. Since then Berner-Lee started promoting the adoption and the usage of those programs along with their
porting In software engineering, porting is the process of adapting software for the purpose of achieving some form of execution in a computing environment that is different from the one that a given program (meant for such execution) was originally desi ...
to other OSs.


Fast and wild development (1991-1995)

In December 1991 the was installed at SLAC (U.S.A.). This was a very important event because it started trans-continental web communications between web browsers and web servers. In 1991-1993 CERN web server program continued to be actively developed by the www group, meanwhile, thanks to the availability of its source code and the public specifications of the HTTP protocol, many other implementations of web servers started to be developed. In April 1993 CERN issued a public official statement stating that the three components of Web software (the basic line-mode client, the web server and the library of common code), along with their
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the w ...
, were put in the
public domain The public domain (PD) consists of all the creative work to which no exclusive intellectual property rights apply. Those rights may have expired, been forfeited, expressly waived, or may be inapplicable. Because those rights have expired, ...
. This statement freed web server developers from any possible legal issue about the development of ''derivative work'' based on that source code (a threat that in practice never existed). At the beginning of 1994, the most notable among new web servers was
NCSA httpd NCSA HTTPd is an early, now discontinued, web server originally developed at the NCSA at the University of Illinois at Urbana–Champaign by Robert McCool and others. First released in 1993, it was among the earliest web servers developed, follo ...
which ran on a variety of
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and ot ...
-based OSs and could serve ''dynamically generated content'' by implementing the POST HTTP method and the CGI to communicate with external programs. These capabilities, along with the multimedia features of NCSA's Mosaic browser (also able to manage
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript ...
FORM Form is the shape, visual appearance, or configuration of an object. In a wider sense, the form is the way something happens. Form also refers to: *Form (document), a document (printed or electronic) with spaces in which to write or enter data * ...
s in order to send data to a web server) highlighted the potential of web technology for publishing and
distributed computing A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. Distributed computing is a field of computer sc ...
applications. In the second half of 1994, the development of NCSA httpd stalled to the point that a group of external software developers,
webmaster A webmaster is a person responsible for maintaining one or more websites. The title may refer to web architects, web developers, site authors, website administrators, website owners, website coordinators, or website publishers. The duties of ...
s and other professional figures interested in that server, started to write and collect patches thanks to the NCSA httpd source code being available to the public domain. At the beginning of 1995 those patches were all applied to the last release of NCSA source code and, after several tests, the
Apache HTTP server The Apache HTTP Server ( ) is a free and open-source cross-platform web server software, released under the terms of Apache License 2.0. Apache is developed and maintained by an open community of developers under the auspices of the Apache Sof ...
project was started. At the end of 1994 a new commercial web server, named Netsite, was released with specific features. It was the first one of many other similar products that were developed first by
Netscape Netscape Communications Corporation (originally Mosaic Communications Corporation) was an American independent computer services company with headquarters in Mountain View, California and then Dulles, Virginia. Its Netscape web browser was onc ...
, then also by
Sun Microsystems Sun Microsystems, Inc. (Sun for short) was an American technology company that sold computers, computer components, software, and information technology services and created the Java programming language, the Solaris operating system, ZFS, th ...
, and finally by
Oracle Corporation Oracle Corporation is an American multinational computer technology corporation headquartered in Austin, Texas. In 2020, Oracle was the third-largest software company in the world by revenue and market capitalization. The company sells da ...
. In mid-1995 the first version of IIS was released, for Windows NT OS, by
Microsoft Microsoft Corporation is an American multinational technology corporation producing computer software, consumer electronics, personal computers, and related services headquartered at the Microsoft Redmond campus located in Redmond, Washingt ...
. This was a notable event because marked the entry, in the field of World Wide Web technologies, of a very important commercial developer and vendor that has played and still is playing a key role on both sides (client and server) of the web. In the second half of 1995 CERN and NCSA web servers started to decline (in global percentage usage) because of the widespread adoption of new web servers which had a much faster development cycle along with more features, more fixes applied, and more performances than the previous ones.


Explosive growth and competition (1996-2014)

At the end of 1996 there were already over fifty known (different) web server software programs that were available to everybody who wanted to own an Internet
domain name A domain name is a string that identifies a realm of administrative autonomy, authority or control within the Internet. Domain names are often used to identify services provided through the Internet, such as websites, email services and more. As ...
and/or to host websites. Many of them lived only shortly and were replaced by other web servers. The publication of RFCs about protocol versions HTTP/1.0 (1996) and HTTP/1.1 (1997, 1999), forced most web servers to comply (not always completely) with those standards. The use of TCP/IP persistent connections (HTTP/1.1) required web servers both to increase a lot the maximum number of concurrent connections allowed and to improve their level of scalability. Between 1996 and 1999
Netscape Enterprise Server Oracle iPlanet Web Server (OiWS) is a web server designed for medium and large business applications. Previous versions were marketed as Netscape Enterprise Server, iPlanet Web Server, Sun ONE Web Server, and Sun Java System Web Server. Oracle ...
and Microsoft's IIS emerged among the leading commercial options whereas among the freely available and
open-source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...
programs Apache HTTP Server held the lead as the preferred server (because of its reliability and its many features). In those years there was also another commercial, highly innovative and thus notable web server called
Zeus Zeus or , , ; grc, Δῐός, ''Diós'', label=genitive Boeotian Aeolic and Laconian grc-dor, Δεύς, Deús ; grc, Δέος, ''Déos'', label=genitive el, Δίας, ''Días'' () is the sky and thunder god in ancient Greek religion, ...
(''now discontinued'') that was known as one of the fastest and most scalable web servers available on market, at least till the first decade of 2000s, despite its low percentage of usage. Apache resulted in the most used web server from mid-1996 to the end of 2015 when, after a few years of decline, it was surpassed initially by IIS and then by Nginx. Afterward IIS dropped to much lower percentages of usage than Apache (see also
market share Market share is the percentage of the total revenue or sales in a market that a company's business makes up. For example, if there are 50,000 units sold per year in a given industry, a company whose sales were 5,000 of those units would have a ...
). From 2005-2006 Apache started to improve its speed and its scalability level by introducing new performance features (e.g. event MPM and new content cache). As those new performance improvements initially were marked as experimental, they were not enabled by its users for a long time and so Apache suffered, even more, the competition of commercial servers and, above all, of other open-source servers which meanwhile had already achieved far superior performances (mostly when serving static content) since the beginning of their development and at the time of the Apache decline were able to offer also a long enough list of well tested advanced features. In fact, a few years after 2000 started, not only other commercial and highly competitive web servers, e.g.
LiteSpeed Litespeed is a U.S. bicycle manufacturer founded in 1986 in Ooltewah, Tennessee by David Lynskey. Litespeed makes titanium and carbon fiber frame road racing bicycles and mountain bikes. Titanium bicycle frames are famed for their ride quality ...
, but also many other open-source programs, often of excellent quality and very high performances, among which should be noted
Hiawatha Hiawatha ( , also : ), also known as Ayenwathaaa or Aiionwatha, was a precolonial Native American leader and co-founder of the Iroquois Confederacy. He was a leader of the Onondaga people, the Mohawk people, or both. According to some account ...
, Cherokee HTTP server,
Lighttpd lighttpd (pronounced "lighty") is an open-source web server optimized for speed-critical environments while remaining standards-compliant, secure and flexible. It was originally written by Jan Kneschke as a proof-of-concept of the c10k problem � ...
,
Nginx Nginx (pronounced "engine x" ) is a web server that can also be used as a reverse proxy, load balancer, mail proxy and HTTP cache. The software was created by Igor Sysoev and publicly released in 2004. Nginx is free and open-source software, ...
and other derived/related products also available with commercial support, emerged. Around 2007-2008 most popular web browsers increased their previous default limit of 2 persistent connections per host-domain (a limit recommended by RFC-2616) to 4, 6 or 8 persistent connections per host-domain, in order to speed up the retrieval of heavy web pages with lots of images, and to mitigate the problem of the shortage of persistent connections dedicated to dynamic objects used for bi-directional notifications of events in web pages. Within a year, these changes, on average, nearly tripled the maximum number of persistent connections that web servers had to manage. This trend (of increasing the number of persistent connections) definitely gave a strong impetus to the adoption of reverse proxies in front of slower web servers and it gave also one more chance to the emerging new web servers that could show all their speed and their capability to handle very high numbers of concurrent connections without requiring too many hardware resources (expensive computers with lots of CPUs, RAM and fast disks).


New challenges (2015 and later years)

In 2015, RFCs published new protocol version TTP/2 and as the implementation of new specifications was not trivial at all, a dilemma arose among developers of less popular web servers (e.g. with a percentage of usage lower than 1% .. 2%), about adding or not adding support for that new protocol version. In fact supporting HTTP/2 often required radical changes to their internal implementation due to many factors (practically always required encrypted connections, capability to distinguish between HTTP/1.x and HTTP/2 connections on the same TCP port, binary representation of HTTP messages, message priority, compression of HTTP headers, use of streams also known as TCP/IP sub-connections and related flow-control, etc.) and so a few developers of those web servers opted for not supporting new HTTP/2 version (at least in the near future) also because of these main reasons: * protocols HTTP/1.x would have been supported anyway by browsers for a very long time (maybe forever) so that there would be no incompatibility between clients and servers in next future; * implementing HTTP/2 was considered a task of overwhelming complexity that could open the door to a whole new class of bugs that till 2015 did not exist and so it would have required notable investments in developing and testing the implementation of the new protocol; * adding HTTP/2 support could always be done in future in case the efforts would be justified. Instead, developers of most popular web servers, rushed to offer the availability of new protocol, not only because they had the work force and the time to do so, but also because usually their previous implementation of
SPDY SPDY (pronounced "speedy") is an obsolete open-specification communication protocol developed for transporting web content. SPDY became the basis for HTTP/2 specification. However, HTTP/2 diverged from SPDY and eventually HTTP/2 subsumed all use ...
protocol could be reused as a starting point and because most used web browsers implemented it very quickly for the same reason. Another reason that prompted those developers to act quickly was that webmasters felt the pressure of the ever increasing
web traffic Web traffic is the data sent and received by visitors to a website. Since the mid-1990s, web traffic has been the largest portion of Internet traffic. Sites monitor the incoming and outgoing traffic to see which parts or pages of their site are ...
and they really wanted to install and to try - as soon as possible - something that could drastically lower the number of TCP/IP connections and speedup accesses to hosted websites. In 2020–2021 the HTTP/2 dynamics about its implementation (by top web servers and popular web browsers) were partly replicated after the publication of advanced drafts of future RFC about
HTTP/3 HTTP/3 is the third major version of the Hypertext Transfer Protocol used to exchange information on the World Wide Web, complementing the widely-deployed HTTP/1.1 and HTTP/2. Unlike previous versions which relied on the well-established TCP ( ...
protocol.


Technical overview

The following technical overview should be considered only as an attempt to give a few very ''limited examples'' about ''some'' features that may be implemented in a web server and ''some'' of the tasks that it may perform in order to have a sufficiently wide scenario about the topic. A web server program plays the role of a server in a
client–server model The client–server model is a distributed application structure that partitions tasks or workloads between the providers of a resource or service, called servers, and service requesters, called clients. Often clients and servers communicate ove ...
by implementing one or more versions of HTTP protocol, often including the HTTPS secure variant and other features and extensions that are considered useful for its planned usage. The complexity and the efficiency of a web server program may vary a lot depending on (e.g.): * common features implemented; * common tasks performed; *
performances A performance is an act of staging or presenting a play, concert, or other form of entertainment. It is also defined as the action or process of carrying out or accomplishing an action, task, or function. Management science In the work place ...
and scalability level aimed as a goal; *
software Software is a set of computer programs and associated documentation and data. This is in contrast to hardware, from which the system is built and which actually performs the work. At the lowest programming level, executable code consists o ...
model and techniques adopted to achieve wished performance and scalability level; * target hardware and category of usage, e.g. embedded system, low-medium traffic web server, high traffic
Internet The Internet (or internet) is the global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP) to communicate between networks and devices. It is a '' network of networks'' that consists of private, pub ...
web server.


Common features

Although web server programs differ in how they are implemented, most of them offer the following common features. These are basic features that most web servers usually have. * Static content serving: to be able to serve static content (web files) to clients via HTTP protocol. *
HTTP The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
: support for one or more versions of HTTP protocol in order to send versions of HTTP responses compatible with versions of client HTTP requests, e.g. HTTP/1.0, HTTP/1.1 (eventually also with
encrypted In cryptography, encryption is the process of encoding information. This process converts the original representation of the information, known as plaintext, into an alternative form known as ciphertext. Ideally, only authorized parties can deci ...
connections
HTTPS Hypertext Transfer Protocol Secure (HTTPS) is an extension of the Hypertext Transfer Protocol (HTTP). It is used for secure communication over a computer network, and is widely used on the Internet. In HTTPS, the communication protocol is enc ...
), plus, if available,
HTTP/2 HTTP/2 (originally named HTTP/2.0) is a major revision of the HTTP network protocol used by the World Wide Web. It was derived from the earlier experimental SPDY protocol, originally developed by Google. HTTP/2 was developed by the HTTP Working ...
,
HTTP/3 HTTP/3 is the third major version of the Hypertext Transfer Protocol used to exchange information on the World Wide Web, complementing the widely-deployed HTTP/1.1 and HTTP/2. Unlike previous versions which relied on the well-established TCP ( ...
. *
Logging Logging is the process of cutting, processing, and moving trees to a location for transport. It may include skidding, on-site processing, and loading of trees or logs onto trucks or skeleton cars. Logging is the beginning of a supply chain ...
: usually web servers have also the capability of logging some information, about client requests and server responses, to
log files In computing, logging is the act of keeping a log of events that occur in a computer system, such as problems, errors or just information on current operations. These events may occur in the operating system or in other software. A message or l ...
for security and statistical purposes. A few other more advanced and popular features (''only a very short selection'') are the following ones. * Dynamic content serving: to be able to serve dynamic content (generated on the fly) to clients via HTTP protocol. *
Virtual hosting Virtual hosting is a method for hosting multiple domain names (with separate handling of each name) on a single server (or pool of servers). This allows one server to share its resources, such as memory and processor cycles, without requiring al ...
: to be able to serve many websites (
domain name A domain name is a string that identifies a realm of administrative autonomy, authority or control within the Internet. Domain names are often used to identify services provided through the Internet, such as websites, email services and more. As ...
s) using only one
IP address An Internet Protocol address (IP address) is a numerical label such as that is connected to a computer network that uses the Internet Protocol for communication.. Updated by . An IP address serves two main functions: network interface ident ...
. *
Authorization Authorization or authorisation (see spelling differences) is the function of specifying access rights/privileges to resources, which is related to general information security and computer security, and to access control in particular. More for ...
: to be able to allow, to forbid or to authorize access to portions of website paths (web resources). * Content cache: to be able to cache static and/or dynamic content in order to speed up server responses; *
Large file support Large-file support (LFS) is the term frequently applied to the ability to create files larger than either 2 or 4  GiB on 32-bit filesystems. Details Traditionally, many operating systems and their underlying file system implementations used ...
: to be able to serve files whose size is greater than 2 GB on 32 bit OS. *
Bandwidth throttling Bandwidth throttling consists in the intentional limitation of the communication speed (bytes or kilobytes per second) of the ingoing (received) data and/or in the limitation of the speed of outgoing (sent) data in a network node or in a network ...
: to limit the speed of content responses in order to not saturate the network and to be able to serve more clients; *
Rewrite engine In web applications, a rewrite engine is a software component that performs rewriting on URLs (Uniform Resource Locators), modifying their appearance. This modification is called URL rewriting. It is a way of implementing URL mapping or routing ...
: to map parts of
clean URL Clean URLs, also sometimes referred to as RESTful URLs, user-friendly URLs, pretty URLs or search engine-friendly URLs, are URLs intended to improve the usability and accessibility of a website or web service by being immediately and intuitively m ...
s (found in client requests) to their real names. *
Custom error page In computer network communications, the HTTP 404, 404 not found, 404, 404 error, page not found or file not found error message is a hypertext transfer protocol (HTTP) standard response code, to indicate that the browser was able to commun ...
s: support for customized HTTP error messages.


Common tasks

A web server program, when it is running, usually performs several general tasks, (e.g.): * starts, optionally reads and applies settings found in its
configuration file In computing, configuration files (commonly known simply as config files) are files used to configure the parameters and initial settings for some computer programs. They are used for user applications, server processes and operating system se ...
(s) or elsewhere, optionally opens log file, starts listening to client connections / requests; * optionally tries to adapt its general behavior according to its settings and its current operating conditions; * manages client connection(s) (accepting new ones or closing the existing ones as required); * receives client requests (by reading HTTP messages): ** reads and verify each HTTP request message; ** usually performs
URL normalization URI normalization is the process by which URIs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URI into a normalized URI so it is possible to determine if two syntactically differen ...
; ** usually performs
URL mapping A web framework (WF) or web application framework (WAF) is a software framework that is designed to support the development of web applications including web services, web resources, and web APIs. Web frameworks provide a standard way to build and ...
(which may default to URL path translation); ** usually performs URL path translation along with various security checks; * executes or refuses requested HTTP method: ** optionally manages URL authorizations; ** optionally manages
URL redirection URL redirection, also called URL forwarding, is a World Wide Web technique for making a web page available under more than one URL address. When a web browser attempts to open a URL that has been redirected, a page with a different URL is opened. ...
s; ** optionally manages requests for static resources (file contents): *** optionally manages directory index files; *** optionally manages regular files; ** optionally manages requests for dynamic resources: *** optionally manages directory listings; *** optionally manages program or module processing, checking the availability, the start and eventually the stop of the execution of external programs used to generate dynamic content; *** optionally manages the communications with external programs / internal modules used to generate dynamic content; * replies to client requests sending proper HTTP responses (e.g. requested resources or error messages) eventually verifying or adding
HTTP headers The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
to those sent by dynamic programs / modules; * optionally logs (partially or totally) client requests and/or its responses to an external user log file or to a system log file by
syslog In computing, syslog is a standard for message logging. It allows separation of the software that generates messages, the system that stores them, and the software that reports and analyzes them. Each message is labeled with a facility code, i ...
, usually using
common log format For computer log management, the Common Log Format, also known as the NCSA Common log format, (after NCSA HTTPd) is a standardized text file format used by web servers when generating server log files. Because the format is standardized, the fi ...
; * optionally logs process messages about detected anomalies or other notable events (e.g. in client requests or in its internal functioning) using syslog or some other system facilities; these log messages usually have a debug, warning, error, alert level which can be filtered (not logged) depending on some settings, see also severity level; * optionally generates statistics about web traffic managed and/or its performances; * other custom tasks.


Read request message

Web server programs are able: * to read an HTTP request message; * to interpret it; * to verify its syntax; * to identify known
HTTP headers The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
and to extract their values from them. Once an HTTP request message has been decoded and verified, its values can be used to determine whether that request can be satisfied or not. This requires many other steps, including security checks.


URL normalization

Web server programs usually perform some type of
URL normalization URI normalization is the process by which URIs are modified and standardized in a consistent manner. The goal of the normalization process is to transform a URI into a normalized URI so it is possible to determine if two syntactically differen ...
(
URL A Uniform Resource Locator (URL), colloquially termed as a web address, is a reference to a web resource that specifies its location on a computer network and a mechanism for retrieving it. A URL is a specific type of Uniform Resource Identifi ...
found in most HTTP request messages) in order: * to make resource path always a clean uniform path from root directory of website; * to lower security risks (e.g. by intercepting more easily attempts to access static resources outside the root directory of the website or to access to portions of path below website root directory that are forbidden or which require authorization); * to make path of web resources more recognizable by human beings and web log analysis programs (also known as log analyzers / statistical applications). The term ''URL normalization'' refers to the process of modifying and standardizing a URL in a consistent manner. There are several types of normalization that may be performed, including the conversion of the scheme and host to lowercase. Among the most important normalizations are the removal of "." and ".." path segments and adding trailing slashes to a non-empty path component.


URL mapping

"URL mapping is the process by which a URL is analyzed to figure out what resource it is referring to, so that that resource can be returned to the requesting client. This process is performed with every request that is made to a web server, with some of the requests being served with a file, such as an HTML document, or a gif image, others with the results of running a CGI program, and others by some other process, such as a built-in module handler, a PHP document, or a Java servlet." In practice, web server programs that implement advanced features, beyond the simple ''static content serving'' (e.g. URL rewrite engine, dynamic content serving), usually have to figure out how that URL has to be handled, e.g.: * as a
URL redirection URL redirection, also called URL forwarding, is a World Wide Web technique for making a web page available under more than one URL address. When a web browser attempts to open a URL that has been redirected, a page with a different URL is opened. ...
, a redirection to another URL; * as a ''static request'' of
file File or filing may refer to: Mechanical tools and processes * File (tool), a tool used to ''remove'' fine amounts of material from a workpiece **Filing (metalworking), a material removal process in manufacturing ** Nail file, a tool used to gent ...
content; * as a ''dynamic request'' of: **
directory Directory may refer to: * Directory (computing), or folder, a file system structure in which to store computer files * Directory (OpenVMS command) * Directory service, a software application for organizing information about a computer network's u ...
listing of files or other sub-directories contained in that directory; ** other types of dynamic request in order to identify the program / module processor able to handle that kind of URL path and to pass to it other URL parts, i.e. usually path-info and
query string A query string is a part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML, cho ...
variables. One or more configuration files of web server may specify the mapping of parts of URL path (e.g. initial parts of
file path A path is a string of characters used to uniquely identify a location in a directory structure. It is composed by following the directory tree hierarchy in which components, separated by a delimiting character, represent each directory. The d ...
,
filename extension A filename extension, file name extension or file extension is a suffix to the name of a computer file (e.g., .txt, .docx, .md). The extension indicates a characteristic of the file contents or its intended use. A filename extension is typically d ...
and other path components) to a specific URL handler (file, directory, external program or internal module). When a web server implements one or more of the above-mentioned advanced features then the path part of a valid URL may not always match an existing file system path under website directory tree (a file or a directory in
file system In computing, file system or filesystem (often abbreviated to fs) is a method and data structure that the operating system uses to control how data is stored and retrieved. Without a file system, data placed in a storage medium would be one lar ...
) because it can refer to a virtual name of an internal or external module processor for dynamic requests.


URL path translation to file system

Web server programs are able to translate an URL path (all or part of it), that refers to a physical file system path, to an absolute path under the target website's root directory. Website's root directory may be specified by a configuration file or by some internal rule of the web server by using the name of the website which is the
host A host is a person responsible for guests at an event or for providing hospitality during it. Host may also refer to: Places * Host, Pennsylvania, a village in Berks County People *Jim Host (born 1937), American businessman *Michel Host ...
part of the URL found in HTTP client request. Path translation to file system is done for the following types of web resources: * a local, usually non-executable, file (static request for file content); * a local directory (dynamic request: directory listing generated on the fly); * a program name (dynamic requests that is executed using CGI or SCGI interface and whose output is read by web server and resent to client who made the HTTP request). The web server appends the path found in requested URL (HTTP request message) and appends it to the path of the (Host) website root directory. On an
Apache server The Apache HTTP Server ( ) is a free and open-source cross-platform web server software, released under the terms of Apache License 2.0. Apache is developed and maintained by an open community of developers under the auspices of the Apache Sof ...
, this is commonly /home/www/website (on
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and ot ...
machines, usually it is: /var/www/website). See the following examples of how it may result. URL path translation for a static file request Example of a ''static request'' of an existing file specified by the following URL: http://www.example.com/path/file.html The client's
user agent In computing, a user agent is any software, acting on behalf of a user, which "retrieves, renders and facilitates end-user interaction with Web content". A user agent is therefore a special kind of software agent. Some prominent examples of us ...
connects to www.example.com and then sends the following
HTTP The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
/1.1 request: GET /path/file.html HTTP/1.1 Host: www.example.com Connection: keep-alive The result is the local file system resource: /home/www/www.example.com/path/file.html The web server then reads the
file File or filing may refer to: Mechanical tools and processes * File (tool), a tool used to ''remove'' fine amounts of material from a workpiece **Filing (metalworking), a material removal process in manufacturing ** Nail file, a tool used to gent ...
, if it exists, and sends a response to the client's web browser. The response will describe the content of the file and contain the file itself or an error message will return saying that the file does not exist or its access is forbidden. URL path translation for a directory request (without a static index file) Example of an implicit ''dynamic request'' of an existing directory specified by the following URL: http://www.example.com/directory1/directory2/ The client's
user agent In computing, a user agent is any software, acting on behalf of a user, which "retrieves, renders and facilitates end-user interaction with Web content". A user agent is therefore a special kind of software agent. Some prominent examples of us ...
connects to www.example.com and then sends the following
HTTP The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
/1.1 request: GET /directory1/directory2 HTTP/1.1 Host: www.example.com Connection: keep-alive The result is the local directory path: /home/www/www.example.com/directory1/directory2/ The web server then verifies the existence of the
directory Directory may refer to: * Directory (computing), or folder, a file system structure in which to store computer files * Directory (OpenVMS command) * Directory service, a software application for organizing information about a computer network's u ...
and if it exists and it can be accessed then tries to find out an index file (which in this case does not exist) and so it passes the request to an internal module or a program dedicated to directory listings and finally reads data output and sends a response to the client's web browser. The response will describe the content of the directory (list of contained subdirectories and files) or an error message will return saying that the directory does not exist or its access is forbidden. URL path translation for a dynamic program request For a ''dynamic request'' the URL path specified by the client should refer to an existing external program (usually an executable file with a CGI) used by the web server to generate dynamic content. Example of a ''dynamic request'' using a program file to generate output: http://www.example.com/cgi-bin/forum.php?action=view&orderby=thread&date=2021-10-15 The client's
user agent In computing, a user agent is any software, acting on behalf of a user, which "retrieves, renders and facilitates end-user interaction with Web content". A user agent is therefore a special kind of software agent. Some prominent examples of us ...
connects to www.example.com and then sends the following
HTTP The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
/1.1 request: GET /cgi-bin/forum.php?action=view&ordeby=thread&date=2021-10-15 HTTP/1.1 Host: www.example.com Connection: keep-alive The result is the local file path of the program (in this example, a
PHP PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group. ...
program): /home/www/www.example.com/cgi-bin/forum.php The web server executes that program, passing in the path-info and the
query string A query string is a part of a uniform resource locator (URL) that assigns values to specified parameters. A query string commonly includes fields added to a base URL by a Web browser or other client application, for example as part of an HTML, cho ...
action=view&orderby=thread&date=2021-10-15 so that the program has the info it needs to run. (In this case, it will return an HTML document containing a view of forum entries ordered by thread from October 15th, 2021). In addition to this, the web server reads data sent from the external program and resends that data to the client that made the request.


Manage request message

Once a request has been read, interpreted, and verified, it has to be managed depending on its method, its URL, and its parameters, which may include values of HTTP headers. In practice, the web server has to handle the request by using one of these response paths: * if something in request was not acceptable (in status line or message headers), web server already sent an error response; * if request has a method (e.g. OPTIONS) that can be satisfied by general code of web server then a successful response is sent; * if URL requires authorization then an authorization error message is sent; * if URL maps to a redirection then a redirect message is sent; * if URL maps to a dynamic resource (a virtual path or a directory listing) then its handler (an internal module or an external program) is called and request parameters (query string and path info) are passed to it in order to allow it to reply to that request; * if URL maps to a static resource (usually a file on file system) then the internal static handler is called to send that file; * if request method is not known or if there is some other unacceptable condition (e.g. resource not found, internal server error, etc.) then an error response is sent.


Serve static content

If a web server program is capable of serving static content and it has been configured to do so, then it is able to send file content whenever a request message has a valid URL path matching (after URL mapping, URL translation and URL redirection) that of an existing file under the root directory of a website and file has attributes which match those required by internal rules of web server program. That kind of content is called ''static'' because usually it is not changed by the web server when it is sent to clients and because it remains the same until it is modified (file modification) by some program. NOTE: when serving static content only, a web server program usually does not change file contents of served websites (as they are only read and never written) and so it suffices to support only these
HTTP method The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
s: * OPTIONS * HEAD * GET Response of static file content can be sped up by a file cache.


= Directory index files

= If a web server program receives a client request message with an URL whose path matches one of an existing ''directory'' and that directory is accessible and serving directory index file(s) is enabled then a web server program may try to serve the first of known (or configured) static index file names (a
regular file The seven standard Unix file types are ''regular'', ''directory'', ''symbolic link'', ''FIFO special'', ''block special'', ''character special'', and ''socket'' as defined by POSIX. Different OS-specific implementations allow more types than what PO ...
) found in that directory; if no index file is found or other conditions are not met then an error message is returned. Most used names for static index files are: index.html, index.htm and Default.htm.


= Regular files

= If a web server program receives a client request message with an URL whose path matches the file name of an existing ''file'' and that file is accessible by web server program and its attributes match internal rules of web server program, then web server program can send that file to client. Usually, for security reasons, most web server programs are pre-configured to serve only
regular file The seven standard Unix file types are ''regular'', ''directory'', ''symbolic link'', ''FIFO special'', ''block special'', ''character special'', and ''socket'' as defined by POSIX. Different OS-specific implementations allow more types than what PO ...
s or to avoid to use ''special file types'' like
device file In Unix-like operating systems, a device file or special file is an interface to a device driver that appears in a file system as if it were an ordinary file. There are also special files in DOS, OS/2, and Windows. These special files allow a ...
s, along with
symbolic link In computing, a symbolic link (also symlink or soft link) is a file whose purpose is to point to a file or directory (called the "target") by specifying a path thereto. Symbolic links are supported by POSIX and by most Unix-like operating syst ...
s or
hard link In computing, a hard link is a directory entry (in a directory-based file system) that associates a name with a file. Thus, each file must have at least one hard link. Creating additional hard links for a file makes the contents of that file acc ...
s to them. The aim is to avoid undesirable side effects when serving static web resources.


Serve dynamic content

If a web server program is capable of serving dynamic content and it has been configured to do so, then it is able to communicate with the proper internal module or external program (associated with the requested URL path) in order to pass to it parameters of client request; after that, web server program reads from it its data response (that it has generated, often on the fly) and then it resends it to the client program who made the request. NOTE: when serving static and dynamic content, a web server program usually has to support also the following HTTP method in order to be able to safely receive data from client(s) and so to be able to host also websites with interactive form(s) that may send large data sets (e.g. lots of
data entry Data entry is the process of digitizing data by entering it into a computer system for organization and management purposes. It is a person-based process and is "one of the important basic" tasks needed when no machine-readable version of the inf ...
or file uploads) to web server / external programs / modules: * POST In order to be able to communicate with its internal modules and/or external programs, a web server program must have implemented one or more of the many available gateway interface(s) (see also Web Server Gateway Interfaces used for dynamic content). The three standard and historical gateway interfaces are the following ones. ; CGI : An external CGI program is run by web server program for each dynamic request, then web server program reads from it the generated data response and then resends it to client. ;
SCGI The Simple Common Gateway Interface (SCGI) is a protocol for applications to interface with HTTP servers, as an alternative to the CGI protocol. It is similar to FastCGI but is designed to be easier to parse. Unlike CGI, it permits a long-running ...
: An external SCGI program (it usually is a process) is started once by web server program or by some other program / process and then it waits for network connections; every time there is a new request for it, web server program makes a new network connection to it in order to send request parameters and to read its data response, then network connection is closed. ;
FastCGI FastCGI is a binary protocol for interfacing interactive programs with a web server. It is a variation on the earlier Common Gateway Interface (CGI). FastCGI's main aim is to reduce the overhead related to interfacing between web server and CGI p ...
: An external FastCGI program (it usually is a process) is started once by web server program or by some other program / process and then it waits for a network connection which is established permanently by web server; through that connection are sent the request parameters and read data responses.


= Directory listings

= A web server program may be capable to manage the dynamic generation (on the fly) of a directory index list of files and sub-directories. If a web server program is configured to do so and a requested URL path matches an existing directory and its access is allowed and no static index file is found under that directory then a web page (usually in HTML format), containing the list of files and/or subdirectories of above mentioned directory, is ''dynamically generated'' (on the fly). If it cannot be generated an error is returned. Some web server programs allow the customization of directory listings by allowing the usage of a web page template (an HTML document containing placeholders, e.g. $(FILE_NAME), $(FILE_SIZE), etc., that are replaced with the field values of each file entry found in directory by web server), e.g. index.tpl or the usage of HTML and embedded source code that is interpreted and executed on the fly, e.g. index.asp, and / or by supporting the usage of dynamic index programs such as CGIs, SCGIs, FGCIs, e.g. index.cgi, index.php, index.fcgi. Usage of dynamically generated ''directory listings'' is usually avoided or limited to a few selected directories of a website because that generation takes much more OS resources than sending a static index page. The main usage of ''directory listings'' is to allow the download of files (usually when their names, sizes, modification date-times or
file attribute File attributes are a type of meta-data that describe and may modify how files and/or directories in a filesystem behave. Typical file attributes may, for example, indicate or specify whether a file is visible, modifiable, compressed, or encrypted. ...
s may change randomly / frequently) ''as they are, without requiring to provide further information to requesting user''.


= Program or module processing

= An external program or an internal module (''processing unit'') can execute some sort of application function that may be used to get data from or to store data to one or more data repositories, e.g.: * files (file system); *
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases span ...
s (DBs); * other sources located in local computer or in other computers. A ''processing unit'' can return any kind of web content, also by using data retrieved from a data repository, e.g.: * a document (e.g.
HTML The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScript ...
,
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
, etc.); * an image; * a video; * structured data, e.g. that may be used to update one or more values displayed by a dynamic page (
DHTML Dynamic HTML, or DHTML, is a term which was used by some browser vendors to describe the combination of HTML, style sheets and client-side scripts (JavaScript, VBScript, or any other supported scripts) that enabled the creation of interactive ...
) of a
web interface In the industrial design field of human–computer interaction, a user interface (UI) is the space where interactions between humans and machines occur. The goal of this interaction is to allow effective operation and control of the machine fr ...
and that maybe was requested by an
XMLHttpRequest XMLHttpRequest (XHR) is an API in the form of an object whose methods transfer data between a web browser and a web server. The object is provided by the browser's JavaScript environment. Particularly, retrieval of data from XHR for the purpose ...
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
(see also: dynamic page). In practice whenever there is content that may vary, depending on one or more parameters contained in client request or in configuration settings, then, usually, it is generated dynamically.


Send response message

Web server programs are able to send response messages as replies to client request messages. An error response message may be sent because a request message could not be successfully read or decoded or analyzed or executed. NOTE: the following sections are reported only as examples to help to understand what a web server, more or less, does; these sections are by any means neither exhaustive nor complete.


Error message

A web server program may reply to a client request message with many kinds of error messages, anyway these errors are divided mainly in two categories: * HTTP client errors, due to the type of request message or to the availability of requested web resource; * HTTP server errors, due to internal server errors. When an error response / message is received by a client browser, then if it is related to the main user request (e.g. an URL of a web resource such as a web page) then usually that error message is shown in some browser window / message.


URL authorization

A web server program may be able to verify whether the requested URL path: * can be freely accessed by everybody; * requires a user authentication (request of user credentials, e.g. such as
user name A user is a person who utilizes a computer or network service. A user often has a user account and is identified to the system by a username (or user name). Other terms for username include login name, screenname (or screen name), accou ...
and
password A password, sometimes called a passcode (for example in Apple devices), is secret data, typically a string of characters, usually used to confirm a user's identity. Traditionally, passwords were expected to be memorized, but the large number of ...
); * access is forbidden to some or all kind of users. If the authorization / access rights feature has been implemented and enabled and access to web resource is not granted, then, depending on the required access rights, a web server program: * can deny access by sending a specific error message (e.g. access forbidden); * may deny access by sending a specific error message (e.g. access
unauthorized Authorization or authorisation (see spelling differences) is the function of specifying access rights/privileges to resources, which is related to general information security and computer security, and to access control in particular. More for ...
) that usually forces the client browser to ask human user to provide required user credentials; if authentication credentials are provided then web server program verifies and accepts or rejects them.


URL redirection

A web server program ''may'' have the capability of doing URL redirections to new URLs (new locations) which consists in replying to a client request message with a response message containing a new URL suited to access a valid or an existing web resource (client should redo the request with the new URL). URL redirection of location is used: * to fix a directory name by adding a final slash '/'; * to give a new URL for a no more existing URL path to a new path where that kind of web resource can be found. * to give a new URL to another domain when current domain has too much load. Example 1: a URL path points to a directory name but it does not have a final slash '/' so web server sends a redirect to client in order to instruct it to redo the request with the fixed path name. From:
  /directory1/directory2
To:
  /directory1/directory2/ Example 2: a whole set of documents has been moved inside website in order to reorganize their file system paths. From:
  /directory1/directory2/2021-10-08/
To:
  /directory1/directory2/2021/10/08/ Example 3: a whole set of documents has been moved to a new website and now it is mandatory to use secure HTTPS connections to access them. From:
  http://www.example.com/directory1/directory2/2021-10-08/
To:
  https://docs.example.com/directory1/2021-10-08/ Above examples are only a few of the possible kind of redirections.


Successful message

A web server program is able to reply to a valid client request message with a successful message, optionally containing requested web resource data. If web resource data is sent back to client, then it can be static content or dynamic content depending on how it has been retrieved (from a file or from the output of some program / module).


Content cache

In order to speed up web server responses by lowering average HTTP response times and hardware resources used, many popular web servers implement one or more content
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache Count ...
s, each one specialized in a content category. Content is usually cached by its origin, e.g.: * static content: ** file cache; * dynamic content: ** dynamic cache (module / program output).


File cache

Historically, static contents found in
file File or filing may refer to: Mechanical tools and processes * File (tool), a tool used to ''remove'' fine amounts of material from a workpiece **Filing (metalworking), a material removal process in manufacturing ** Nail file, a tool used to gent ...
s which had to be accessed frequently, randomly and quickly, have been stored mostly on electro-mechanical disks since mid-late 1960s / 1970s; regrettably reads from and writes to those kind of devices have always been considered very slow operations when compared to
RAM Ram, ram, or RAM may refer to: Animals * A male sheep * Ram cichlid, a freshwater tropical fish People * Ram (given name) * Ram (surname) * Ram (director) (Ramsubramaniam), an Indian Tamil film director * RAM (musician) (born 1974), Dutch * Raj ...
speed and so, since early OSs, first disk caches and then also OS file
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache Count ...
sub-systems were developed to speed up I/O operations of frequently accessed data / files. Even with the aid of an OS file cache, the relative / occasional slowness of I/O operations involving directories and files stored on disks became soon a
bottleneck Bottleneck literally refers to the narrowed portion (neck) of a bottle near its opening, which limit the rate of outflow, and may describe any object of a similar shape. The literal neck of a bottle was originally used to play what is now known as ...
in the increase of
performances A performance is an act of staging or presenting a play, concert, or other form of entertainment. It is also defined as the action or process of carrying out or accomplishing an action, task, or function. Management science In the work place ...
expected from top level web servers, specially since mid-late 1990s, when web Internet traffic started to grow exponentially along with the constant increase of speed of Internet / network lines. The problem about how to further efficiently speed-up the serving of static files, thus increasing the maximum number of requests/responses per second ( RPS), started to be studied / researched since mid 1990s, with the aim to propose useful cache models that could be implemented in web server programs. In practice, nowadays, many popular / high performance web server programs include their own '' userland'' file cache, tailored for a web server usage and using their specific implementation and parameters. The wide spread adoption of
RAID Raid, RAID or Raids may refer to: Attack * Raid (military), a sudden attack behind the enemy's lines without the intention of holding ground * Corporate raid, a type of hostile takeover in business * Panty raid, a prankish raid by male college ...
and/or fast
solid-state drive A solid-state drive (SSD) is a solid-state storage device that uses integrated circuit assemblies to store data Persistence (computer science), persistently, typically using flash memory, and functioning as secondary storage in the Computer ...
s (storage hardware with very high I/O speed) has slightly reduced but of course not eliminated the advantage of having a file cache incorporated in a web server.


Dynamic cache

Dynamic content, output by an internal module or an external program, may not always change very frequently (given a unique URL with keys / parameters) and so, maybe for a while (e.g. from 1 second to several hours or more), the resulting output can be cached in RAM or even on a fast disk. The typical usage of a dynamic cache is when a website has
dynamic web page A server-side dynamic web page is a web page whose construction is controlled by an application server processing server-side scripts. In server-side scripting, parameters determine how the assembly of every new web page proceeds, and includin ...
s about news, weather, images, maps, etc. that do not change frequently (e.g. every ''n'' minutes) and that are accessed by a huge number of clients per minute / hour; in those cases it is useful to return cached content too (without calling the internal module or the external program) because clients often do not have an updated copy of the requested content in their browser caches. Anyway, in most cases those kind of caches are implemented by external servers (e.g.
reverse proxy In computer networks, a reverse proxy is the application that sits in front of back-end applications and forwards client (e.g. browser) requests to those applications. Reverse proxies help increase scalability, performance, resilience and securi ...
) or by storing dynamic data output in separate computers, managed by specific applications (e.g.
memcached Memcached (pronounced variously ''mem-cash-dee'' or ''mem-cashed'') is a general-purpose distributed memory-caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of ...
), in order to not compete for hardware resources (CPU, RAM, disks) with web server(s).


Kernel-mode and user-mode web servers

A web server software can be either incorporated into the OS and executed in
kernel Kernel may refer to: Computing * Kernel (operating system), the central component of most operating systems * Kernel (image processing), a matrix used for image convolution * Compute kernel, in GPGPU programming * Kernel method, in machine learni ...
space, or it can be executed in
user space A modern computer operating system usually segregates virtual memory into user space and kernel space. Primarily, this separation serves to provide memory protection and hardware protection from malicious or errant software behaviour. Kernel ...
(like other regular applications). Web servers that run in
kernel mode In computer science, hierarchical protection domains, often called protection rings, are mechanisms to protect data and functionality from faults (by improving fault tolerance) and malicious behavior (by providing computer security). Comput ...
(usually called kernel space web servers) can have direct access to kernel resources and so they can be, in theory, faster than those running in user mode; anyway there are disadvantages in running a web server in kernel mode, e.g.: difficulties in developing (
debugging In computer programming and software development, debugging is the process of finding and resolving '' bugs'' (defects or problems that prevent correct operation) within computer programs, software, or systems. Debugging tactics can involve i ...
) software whereas run-time critical errors may lead to serious problems in OS kernel. Web servers that run in
user-mode In computer science, hierarchical protection domains, often called protection rings, are mechanisms to protect data and functionality from faults (by improving fault tolerance) and malicious behavior (by providing computer security). Compute ...
have to ask the system for permission to use more memory or more
CPU A central processing unit (CPU), also called a central processor, main processor or just processor, is the electronic circuitry that executes instructions comprising a computer program. The CPU performs basic arithmetic, logic, controlling, and ...
resources. Not only do these requests to the kernel take time, but they might not always be satisfied because the system reserves resources for its own usage and has the responsibility to share hardware resources with all the other running applications. Executing in user mode can also mean using more buffer/data copies (between user-space and kernel-space) which can lead to a decrease in the performance of a user-mode web server. Nowadays almost all web server software is executed in user mode (because many of the aforementioned small disadvantages have been overcome by faster hardware, new OS versions, much faster OS
system calls In computing, a system call (commonly abbreviated to syscall) is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services (for example, acc ...
and new optimized web server software). See also
comparison of web server software Web server software allows computers to act as web servers. The first web servers supported only static files, such as HTML (and images), but now they commonly allow embedding of server side applications. Some web application frameworks include s ...
to discover which of them run in kernel mode or in user mode (also referred as kernel space or user space).


Performances

To improve the user experience (on client / browser side), a web server should reply quickly (as soon as possible) to client requests; unless content response is throttled (by configuration) for some type of files (e.g. big or huge files), also returned data content should be sent as fast as possible (high transfer speed). In other words, a web server should always be very responsive, even under high load of web traffic, in order to keep total user's wait (sum of browser time + network time + web server response time) for a response as low as possible.


Performance metrics

For web server software, main key performance metrics (measured under vary operating conditions) usually are at least the following ones (i.e.): * (, similar to , depending on HTTP version and configuration, type of HTTP requests and other operating conditions); * (), is the number of connections per second accepted by web server (useful when using HTTP/1.0 or HTTP/1.1 with a very low limit of requests / responses per connection, i.e. 1 .. 20); * + response time for each new client request; usually benchmark tool shows how many requests have been satisfied within a scale of time laps (e.g. within 1ms, 3ms, 5ms, 10ms, 20ms, 30ms, 40ms) and / or the shortest, the average and the longest response time; * , in bytes per second. Among the operating conditions, the (1 .. ''n'') of used during a test is an important parameter because it allows to correlate the supported by web server with results of the tested performance metrics.


Software efficiency

The specific web server
software design Software design is the process by which an agent creates a specification of a software artifact intended to accomplish goals, using a set of primitive components and subject to constraints. Software design may refer to either "all the activity ...
and model adopted (e.g.): * single
process A process is a series or set of activities that interact to produce a result; it may occur once-only or be recurrent or periodic. Things called a process include: Business and management *Business process, activities that produce a specific se ...
or multi-process; * single thread (no thread) or multi-thread for each process; * usage of
coroutines Coroutines are computer program components that generalize subroutines for non-preemptive multitasking, by allowing execution to be suspended and resumed. Coroutines are well-suited for implementing familiar program components such as cooperativ ...
or not; ... and other programming techniques, such as (e.g.): *
zero copy 0 (zero) is a number representing an empty quantity. In place-value notation such as the Hindu–Arabic numeral system, 0 also serves as a placeholder numerical digit, which works by multiplying digits to the left of 0 by the radix, usuall ...
; * minimization of possible CPU cache misses; * minimization of possible CPU branch mispredictions in critical paths for speed; * minimization of the number of
system call In computing, a system call (commonly abbreviated to syscall) is the programmatic way in which a computer program requests a service from the operating system on which it is executed. This may include hardware-related services (for example, ac ...
s used to perform a certain function / task; * other tricks; ... used to implement a web server program, can bias a lot the performances and in particular the
scalability Scalability is the property of a system to handle a growing amount of work by adding resources to the system. In an economic context, a scalable business model implies that a company can increase sales given increased resources. For example, a ...
level that can be achieved under heavy load or when using high end hardware (many CPUs, disks and lots of RAM). In practice some web server software models may require more OS resources (specially more CPUs and more RAM) than others to be able to work well and so to achieve target performances.


Operating conditions

There are many operating conditions that can affect the performances of a web server; performance values may vary depending on (i.e.): * the settings of web server (including the fact that log file is or is not enabled, etc.); * the HTTP version used by client requests; * the average HTTP request type (method, length of HTTP headers and optional body); * whether the requested content is static or dynamic; * whether the content is cached or not cached (by server and/or by client); * whether the content is compressed on the fly (when transferred), pre-compressed (i.e. when a file resource is stored on disk already compressed so that web server can send that file directly to the network with the only indication that its content is compressed) or not compressed at all; * whether the connections are or are not encrypted; * the average
network speed In computer networking, wire speed or wirespeed refers to the hypothetical peak physical layer net bit rate (useful information rate) of a cable (consisting of fiber-optical wires or copper wires) combined with a certain digital communication devic ...
between web server and its clients; * the number of active TCP connections; * the number of active processes managed by web server (including external CGI, SCGI, FCGI programs); * the hardware and
software Software is a set of computer programs and associated documentation and data. This is in contrast to hardware, from which the system is built and which actually performs the work. At the lowest programming level, executable code consists o ...
limitations or settings of the OS of the computer(s) on which the web server runs; * other minor conditions.


Benchmarking

Performances of a web server are typically benchmarked by using one or more of the available automated load testing tools.


Load limits

A web server (program installation) usually has pre-defined load limits for each combination of operating conditions, also because it is limited by OS resources and because it can handle only a limited number of concurrent client connections (usually between 2 and several tens of thousands for each active web server process, see also the
C10k problem The C10k problem is the problem of optimizing network sockets to handle a large number of clients at the same time. The name C10k is a numeronym for concurrently handling ten thousand connections. Handling many concurrent connections is a differe ...
and the C10M problem). When a web server is near to or over its load limits, it gets overloaded and so it may become unresponsive.


Causes of overload

At any time web servers can be overloaded due to one or more of the following causes (e.g.). * Excess legitimate web traffic. Thousands or even millions of clients connecting to the website in a short amount of time, e.g.,
Slashdot effect The Slashdot effect, also known as slashdotting, occurs when a popular website links to a smaller website, causing a massive increase in traffic. This overloads the smaller site, causing it to slow down or even temporarily become unavailable. Thi ...
. *
Distributed Denial of Service In computing, a denial-of-service attack (DoS attack) is a cyber-attack in which the perpetrator seeks to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting services of a host conne ...
attacks. A denial-of-service attack (DoS attack) or distributed denial-of-service attack (DDoS attack) is an attempt to make a computer or network resource unavailable to its intended users. *
Computer worm A computer worm is a standalone malware computer program that replicates itself in order to spread to other computers. It often uses a computer network to spread itself, relying on security failures on the target computer to access it. It w ...
s that sometimes cause abnormal traffic because of millions of infected computers (not coordinated among them). * XSS worms can cause high traffic because of millions of infected browsers or web servers. *
Internet bot An Internet bot, web robot, robot or simply bot, is a software application that runs automated tasks ( scripts) over the Internet, usually with the intent to imitate human activity on the Internet, such as messaging, on a large scale. An Internet ...
s Traffic not filtered/limited on large websites with very few network resources (e.g.
bandwidth Bandwidth commonly refers to: * Bandwidth (signal processing) or ''analog bandwidth'', ''frequency bandwidth'', or ''radio bandwidth'', a measure of the width of a frequency range * Bandwidth (computing), the rate of data transfer, bit rate or thr ...
) and/or hardware resources (CPUs, RAM, disks). *
Internet The Internet (or internet) is the global system of interconnected computer networks that uses the Internet protocol suite (TCP/IP) to communicate between networks and devices. It is a '' network of networks'' that consists of private, pub ...
(network) slowdowns (e.g. due to packet losses) so that client requests are served more slowly and the number of connections increases so much that server limits are reached. * Web servers, serving dynamic content, waiting for slow responses coming from back-end computer(s) (e.g.
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases span ...
s), maybe because of too many queries mixed with too many inserts or updates of DB data; in these cases web servers have to wait for back-end data responses before replying to HTTP clients but during these waits too many new client connections / requests arrive and so they become overloaded. * Web servers ( computers) partial unavailability. This can happen because of required or urgent maintenance or upgrade, hardware or software failures such as back-end (e.g.
database In computing, a database is an organized collection of data stored and accessed electronically. Small databases can be stored on a file system, while large databases are hosted on computer clusters or cloud storage. The design of databases span ...
) failures; in these cases the remaining web servers may get too much traffic and become overloaded.


Symptoms of overload

The symptoms of an overloaded web server are usually the following ones (e.g.). * Requests are served with (possibly long) delays (from 1 second to a few hundred seconds). * The web server returns an HTTP error code, such as 500, 502, 503, 504, 408, or even an intermittent 404. * The web server refuses or resets (interrupts) TCP connections before it returns any content. * In very rare cases, the web server returns only a part of the requested content. This behavior can be considered a bug, even if it usually arises as a symptom of overload.


Anti-overload techniques

To partially overcome above average load limits and to prevent overload, most popular websites use common techniques like the following ones (e.g.). * Tuning OS parameters for hardware capabilities and usage. * Tuning web server(s) parameters to improve their security and performances. * Deploying techniques (not only for static contents but, whenever possible, for dynamic contents too). * Managing network traffic, by using: **
Firewall Firewall may refer to: * Firewall (computing), a technological barrier designed to prevent unauthorized or unwanted communications between computer networks or hosts * Firewall (construction), a barrier inside a building, designed to limit the spre ...
s to block unwanted traffic coming from bad IP sources or having bad patterns; ** HTTP traffic managers to drop, redirect or rewrite requests having bad
HTTP The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
patterns; **
Bandwidth management Bandwidth management is the process of measuring and controlling the communications (traffic, packets) on a network link, to avoid filling the link to capacity or overfilling the link,https://www.internetsociety.org/wp-content/uploads/2017/08/BWro ...
and
traffic shaping Traffic shaping is a bandwidth management technique used on computer networks which delays some or all datagrams to bring them into compliance with a desired ''traffic profile''. Traffic shaping is used to optimize or guarantee performance, impro ...
, in order to smooth down peaks in network usage. * Using different
domain name A domain name is a string that identifies a realm of administrative autonomy, authority or control within the Internet. Domain names are often used to identify services provided through the Internet, such as websites, email services and more. As ...
s,
IP address An Internet Protocol address (IP address) is a numerical label such as that is connected to a computer network that uses the Internet Protocol for communication.. Updated by . An IP address serves two main functions: network interface ident ...
es and computers to serve different kinds (static and dynamic) of content; the aim is to separate big or huge files (download.*) (that domain might be replaced also by a CDN) from small and medium-sized files (static.*) and from main dynamic site (maybe where some contents are stored in a backend database) (www.*); the idea is to be able to efficiently serve big or huge (over 10 – 1000 MB) files (maybe throttling downloads) and to fully
cache Cache, caching, or caché may refer to: Places United States * Cache, Idaho, an unincorporated community * Cache, Illinois, an unincorporated community * Cache, Oklahoma, a city in Comanche County * Cache, Utah, Cache County, Utah * Cache Count ...
small and medium-sized files, without affecting performances of dynamic site under heavy load, by using different settings for each (group) of web server computers, e.g.: ** https://download.example.com ** https://static.example.com ** https://www.example.com * Using many web servers (computers) that are grouped together behind a
load balancer In computing, load balancing is the process of distributing a set of tasks over a set of resources (computing units), with the aim of making their overall processing more efficient. Load balancing can optimize the response time and avoid unevenl ...
so that they act or are seen as one big web server. * Adding more hardware resources (i.e.
RAM Ram, ram, or RAM may refer to: Animals * A male sheep * Ram cichlid, a freshwater tropical fish People * Ram (given name) * Ram (surname) * Ram (director) (Ramsubramaniam), an Indian Tamil film director * RAM (musician) (born 1974), Dutch * Raj ...
, fast disks) to each computer. * Using more efficient computer programs for web servers (see also: software efficiency). * Using the most efficient to process dynamic requests (spawning one or more external programs every time a dynamic page is retrieved, kills performances). * Using other programming techniques and
workaround A workaround is a bypass of a recognized problem or limitation in a system or policy. A workaround is typically a temporary fix that implies that a genuine solution to the problem is needed. But workarounds are frequently as creative as true solut ...
s, especially if dynamic content is involved, to speed up the HTTP responses (i.e. by avoiding dynamic calls to retrieve objects, such as style sheets, images and scripts), that never change or change very rarely, by copying that content to static files once and then keeping them synchronized with dynamic content). * Using latest efficient versions of
HTTP The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, w ...
(e.g. beyond using common HTTP/1.1 also by enabling
HTTP/2 HTTP/2 (originally named HTTP/2.0) is a major revision of the HTTP network protocol used by the World Wide Web. It was derived from the earlier experimental SPDY protocol, originally developed by Google. HTTP/2 was developed by the HTTP Working ...
and maybe
HTTP/3 HTTP/3 is the third major version of the Hypertext Transfer Protocol used to exchange information on the World Wide Web, complementing the widely-deployed HTTP/1.1 and HTTP/2. Unlike previous versions which relied on the well-established TCP ( ...
too, whenever available web server software has reliable support for the latter two protocols) in order to reduce a lot the number of TCP/IP connections started by each client and the size of data exchanged (because of more compact HTTP headers representation and maybe data compression). Caveats about using HTTP/2 and HTTP/3 protocols .


Market share

Below are the latest statistics of the market share of all sites of the top web servers on the Internet by
Netcraft Netcraft is an Internet services company based in Bath, Somerset, England. The company provides cybercrime disruption services across a range of industries. History Netcraft was founded by Mike Prettejohn. The company provides web server and ...
. NOTE: (*) percentage rounded to integer number, because its decimal values are not publicly reported by source page (only its rounded value is reported in graph).


See also

*
Server (computing) In computing, a server is a piece of computer hardware or software (computer program) that provides functionality for other programs or devices, called " clients". This architecture is called the client–server model. Servers can provide vario ...
*
Application server An application server is a server that hosts applications or software that delivers a business application through a communication protocol. An application server framework is a service layer model. It includes software components available to a ...
*
Comparison of web server software Web server software allows computers to act as web servers. The first web servers supported only static files, such as HTML (and images), but now they commonly allow embedding of server side applications. Some web application frameworks include s ...
*
HTTP server An HTTP server is a computer (software) program (or even a software component included in an other program) that plays the role of a server in a client–server model by implementing the ''server part'' of the HTTP and/or HTTPS network protoco ...
(core part of a web server program that serves HTTP requests) *
HTTP compression HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization. HTTP data is compressed before it is sent from the server: compliant browsers will announce what methods ar ...
*
Web application A web application (or web app) is application software that is accessed using a web browser. Web applications are delivered on the World Wide Web to users with an active network connection. History In earlier computing models like client-serv ...
* Open source web application * List of AMP packages *
Variant object Variant objects in the context of HTTP are objects served by an Origin Content Server in a type of transmitted data variation (i.e. uncompressed, compressed, different languages, etc.). HTTP/1.1 (1997–1999) introduces Content/Accept headers. ...
*
Virtual hosting Virtual hosting is a method for hosting multiple domain names (with separate handling of each name) on a single server (or pool of servers). This allows one server to share its resources, such as memory and processor cycles, without requiring al ...
*
Web hosting service A web hosting service is a type of Internet hosting service that hosts websites for clients, i.e. it offers the facilities required for them to create and maintain a site and makes it accessible on the World Wide Web. Companies providing web h ...
*
Web container A web container (also known as a servlet container; and compare "webcontainer" ) is the component of a web server that interacts with Jakarta Servlets. A web container is responsible for managing the lifecycle of servlets, mapping a URL to a par ...
*
Web proxy In computer networking, a proxy server is a server application that acts as an intermediary between a client requesting a resource and the server providing that resource. Instead of connecting directly to a server that can fulfill a request ...
* Web service Standard Web Server Gateway Interfaces used for
dynamic content A server-side dynamic web page is a web page whose construction is controlled by an application server processing server-side scripts. In server-side scripting, parameters determine how the assembly of every new web page proceeds, and includin ...
s: * CGI Common Gateway Interface *
SCGI The Simple Common Gateway Interface (SCGI) is a protocol for applications to interface with HTTP servers, as an alternative to the CGI protocol. It is similar to FastCGI but is designed to be easier to parse. Unlike CGI, it permits a long-running ...
Simple Common Gateway Interface *
FastCGI FastCGI is a binary protocol for interfacing interactive programs with a web server. It is a variation on the earlier Common Gateway Interface (CGI). FastCGI's main aim is to reduce the overhead related to interfacing between web server and CGI p ...
Fast Common Gateway Interface A few other Web Server Interfaces (server or
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
specific) used for dynamic contents: * SSI Server Side Includes, rarely used, static HTML documents containing SSI directives are interpreted by server software to include small dynamic data on the fly when pages are served, e.g. date and time, other static file contents, etc. * SAPI Server Application Programming Interface: **
ISAPI The Internet Server Application Programming Interface (ISAPI) is an N-tier API of Internet Information Services (IIS), Microsoft's collection of Windows-based web server services. The most prominent application of IIS and ISAPI is Microsoft's web ...
Internet Server Application Programming Interface ** NSAPI Netscape Server Application Programming Interface *
PSGI Plack is a Perl web application programming framework inspired by Rack for Ruby and WSGI for Python, and it is the project behind the PSGI specification used by other frameworks such as Catalyst and Dancer. Plack allows for testing of Perl we ...
Perl Web Server Gateway Interface *
WSGI The Web Server Gateway Interface (WSGI, pronounced ''whiskey'' or ) is a simple calling convention for web servers to forward requests to web applications or frameworks written in the Python programming language. The current version of WSGI, ve ...
Python Web Server Gateway Interface *
Rack Rack or racks may refer to: Storage and installation * Amp rack, short for amplifier rack, a piece of furniture in which amplifiers are mounted * Bicycle rack, a frame for storing bicycles when not in use * Bustle rack, a type of storage bin ...
Rack Web Server Gateway Interface *
JSGI JSGI, or JavaScript Gateway Interface, is an interface between web servers and JavaScript-based web applications and frameworks. It was inspired by the Rack for Ruby and WSGI for Python and was one of the inspirations of PSGI for Perl. is a re ...
JavaScript Web Server Gateway Interface *
Java Servlet A Jakarta Servlet (formerly Java Servlet) is a Java software component that extends the capabilities of a server. Although servlets can respond to many types of requests, they most commonly implement web containers for hosting web application ...
,
JavaServer Pages Jakarta Server Pages (JSP; formerly JavaServer Pages) is a collection of technologies that helps software developers create dynamically generated web pages based on HTML, XML, SOAP, or other document types. Released in 1999 by Sun Microsystems, ...
*
Active Server Pages Active Server Pages (ASP) is Microsoft's first server-side scripting language and engine for dynamic web pages. It was first released in December 1996, before being superseded in January 2002 by ASP.NET. History Initially released as an ad ...
,
ASP.NET ASP.NET is an open-source, server-side web-application framework designed for web development to produce dynamic web pages. It was developed by Microsoft to allow programmers to build dynamic web sites, applications and services. The name sta ...


References


External links


Mozilla: what is a web server?

Netcraft: news about web server survey
{{DEFAULTSORT:Web Server Servers (computing) Web server software Website management English inventions