HTTP Compression
   HOME

TheInfoList



OR:

HTTP compression is a capability that can be built into
web server A web server is computer software and underlying hardware that accepts requests via HTTP (the network protocol created to distribute web content) or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiate ...
s and
web client A web browser is application software for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Browsers are used on ...
s to improve transfer speed and bandwidth utilization. HTTP data is compressed before it is sent from the server: compliant browsers will announce what methods are supported to the server before downloading the correct format; browsers that do not support compliant compression method will download uncompressed data. The most common compression schemes include
gzip gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and in ...
and
Brotli Brotli is a lossless data compression algorithm developed by Google. It uses a combination of the general-purpose LZ77 lossless compression algorithm, Huffman coding and 2nd-order context modelling. Brotli is primarily used by web servers and ...
; a full list of available schemes is maintained by the IANA. There are two different ways compression can be done in HTTP. At a lower level, a Transfer-Encoding header field may indicate the payload of an HTTP message is compressed. At a higher level, a Content-Encoding header field may indicate that a resource being transferred, cached, or otherwise referenced is compressed. Compression using Content-Encoding is more widely supported than Transfer-Encoding, and some browsers do not advertise support for Transfer-Encoding compression to avoid triggering bugs in servers.


Compression scheme negotiation

The negotiation is done in two steps, described in RFC 2616: 1. The
web client A web browser is application software for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's screen. Browsers are used on ...
advertises which compression schemes it supports by including a list of tokens in the
HTTP request The Hypertext Transfer Protocol (HTTP) is an application layer protocol in the Internet protocol suite model for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web, ...
. For ''Content-Encoding'', the list is in a field called ''Accept-Encoding''; for ''Transfer-Encoding'', the field is called ''TE''. GET /encrypted-area HTTP/1.1 Host: www.example.com Accept-Encoding: gzip, deflate 2. If the server supports one or more compression schemes, the outgoing data may be compressed by one or more methods supported by both parties. If this is the case, the server will add a ''Content-Encoding'' or ''Transfer-Encoding'' field in the HTTP response with the used schemes, separated by commas. HTTP/1.1 200 OK Date: mon, 26 June 2016 22:38:34 GMT Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux) Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT Accept-Ranges: bytes Content-Length: 438 Connection: close Content-Type: text/html; charset=UTF-8 Content-Encoding: gzip The
web server A web server is computer software and underlying hardware that accepts requests via HTTP (the network protocol created to distribute web content) or its secure variant HTTPS. A user agent, commonly a web browser or web crawler, initiate ...
is by no means obligated to use any compression method â€“ this depends on the internal settings of the web server and also may depend on the internal architecture of the website in question.


Content-Encoding tokens

The official list of tokens available to servers and client is maintained by IANA, and it includes: *br –
Brotli Brotli is a lossless data compression algorithm developed by Google. It uses a combination of the general-purpose LZ77 lossless compression algorithm, Huffman coding and 2nd-order context modelling. Brotli is primarily used by web servers and ...
, a compression algorithm specifically designed for HTTP content encoding, defined in RFC 7932 and implemented in all modern major browsers. *
compress compress is a Unix shell compression program based on the LZW compression algorithm. Compared to more modern compression utilities such as gzip and bzip2, compress performs faster and with less memory usage, at the cost of a significantly lo ...
– UNIX "compress" program method (historic; deprecated in most applications and replaced by gzip or deflate) *deflate – compression based on the deflate algorithm (described in RFC 1951), a combination of the
LZ77 LZ77 and LZ78 are the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as LZ1 and LZ2 respectively. These two algorithms form the basis for many variations includin ...
algorithm and Huffman coding, wrapped inside the
zlib zlib ( or "zeta-lib", ) is a software library used for data compression. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. zlib is also ...
data format (RFC 1950); *exi – W3C
Efficient XML Interchange Efficient XML Interchange (EXI) is a binary XML format for exchange of data on a computer network. It was developed by the W3C's Efficient Extensible Interchange Working Group and is one of the most prominent efforts to encode XML documents in a b ...
*
gzip gzip is a file format and a software application used for file compression and decompression. The program was created by Jean-loup Gailly and Mark Adler as a free software replacement for the compress program used in early Unix systems, and in ...
– GNU zip format (described in RFC 1952). Uses the deflate algorithm for compression, but the data format and the checksum algorithm differ from the "deflate" content-encoding. This method is the most broadly supported as of March 2011. *
identity Identity may refer to: * Identity document * Identity (philosophy) * Identity (social science) * Identity (mathematics) Arts and entertainment Film and television * ''Identity'' (1987 film), an Iranian film * ''Identity'' (2003 film), ...
– No transformation is used. This is the default value for content coding. * pack200-gzip – Network Transfer Format for Java Archives *
zstd Zstandard, commonly known by the name of its reference implementation zstd, is a lossless data compression algorithm developed by Yann Collet at Facebook. ''Zstd'' is the reference implementation in C. Version 1 of this implementation was re ...
– Zstandard compression, defined in RFC 8478 In addition to these, a number of unofficial or non-standardized tokens are used in the wild by either servers or clients: * bzip2 – compression based on the free bzip2 format, supported by
lighttpd lighttpd (pronounced "lighty") is an open-source web server optimized for speed-critical environments while remaining standards-compliant, secure and flexible. It was originally written by Jan Kneschke as a proof-of-concept of the c10k problem †...
* lzma – compression based on (raw) LZMA is available in Opera 20, and in elinks via a compile-time option *peerdist – Microsoft Peer Content Caching and Retrieval *
rsync rsync is a utility for efficiently transferring and synchronizing files between a computer and a storage drive and across networked computers by comparing the modification times and sizes of files. It is commonly found on Unix-like operat ...
- delta encoding in HTTP, implemented by a pair of ''rproxy'' proxies. *xpress - Microsoft compression protocol used by Windows 8 and later for Windows Store application updates.
LZ77 LZ77 and LZ78 are the two lossless data compression algorithms published in papers by Abraham Lempel and Jacob Ziv in 1977 and 1978. They are also known as LZ1 and LZ2 respectively. These two algorithms form the basis for many variations includin ...
-based compression optionally using a Huffman encoding. * xz - LZMA2-based content compression, supported by a non-official Firefox patch; and fully implemented in mget since 2013-12-31.


Servers that support HTTP compression

*
SAP NetWeaver SAP NetWeaver is a software stack for many of SAP SE's applications. The SAP NetWeaver Application Server, sometimes referred to as WebAS, is the runtime environment for the SAP applications and all of the mySAP Business Suite runs on SAP WebAS: s ...
*
Microsoft IIS Internet Information Services (IIS-pronounced 2S, formerly Internet Information Server) is an extensible web server software created by Microsoft for use with the Windows NT family. IIS supports HTTP, HTTP/2, HTTPS, FTP, FTPS, SMTP and NNTP. ...
: built-in or using third-party module *
Apache HTTP Server The Apache HTTP Server ( ) is a free and open-source cross-platform web server software, released under the terms of Apache License 2.0. Apache is developed and maintained by an open community of developers under the auspices of the Apache So ...
, via '
mod_deflate
'' (despite its name, only supporting gzip), and '

'' * Hiawatha HTTP server: serves pre-compressed files *
Cherokee HTTP server Cherokee is an open-source cross-platform web server that runs on Linux, BSD, BSD variants, Solaris (operating system), Solaris, , and Windows. It is a lightweight, high-performance web server/reverse proxy licensed under the GNU General Public ...
, On the fly gzip and deflate compressions *
Oracle iPlanet Web Server Oracle iPlanet Web Server (OiWS) is a web server designed for medium and large business applications. Previous versions were marketed as Netscape Enterprise Server, iPlanet Web Server, Sun ONE Web Server, and Sun Java System Web Server. Oracle ...
*
Zeus Web Server Zeus Web Server is a discontinued proprietary high-performance web server for Unix and Unix-like platforms (including Solaris, FreeBSD, HP-UX and Linux). It was developed by Zeus Technology, a software company located in Cambridge, England that ...
*
lighttpd lighttpd (pronounced "lighty") is an open-source web server optimized for speed-critical environments while remaining standards-compliant, secure and flexible. It was originally written by Jan Kneschke as a proof-of-concept of the c10k problem †...
*
nginx Nginx (pronounced "engine x" ) is a web server that can also be used as a reverse proxy, load balancer, mail proxy and HTTP cache. The software was created by Igor Sysoev and publicly released in 2004. Nginx is free and open-source software ...
– built-in *Applications based on
Tornado A tornado is a violently rotating column of air that is in contact with both the surface of the Earth and a cumulonimbus cloud or, in rare cases, the base of a cumulus cloud. It is often referred to as a twister, whirlwind or cyclone, altho ...
, if "compress_response" is set to True in the application settings (for versions prior to 4.0, set "gzip" to True) * Jetty Server – built-into default static content serving and available via servlet filter configurations *
GeoServer In computing, GeoServer is an open-source server written in Java that allows users to share, process and edit geospatial data. Designed for interoperability, it publishes data from any major spatial data source using open standards. GeoServer h ...
*
Apache Tomcat Apache Tomcat (called "Tomcat" for short) is a free and open-source implementation of the Jakarta Servlet, Jakarta Expression Language, and WebSocket technologies. It provides a "pure Java" HTTP web server environment in which Java code can also ...
*
IBM Websphere IBM WebSphere refers to a brand of proprietary computer software products in the genre of enterprise software known as "application and integration middleware". These software products are used by end-users to create and integrate applications wi ...
*
AOLserver AOLserver is AOL's Open-source software, open source web server. AOLserver is Thread_(computing)#Single-threaded_vs_multithreaded_programs, multithreaded, Tcl-enabled, and used for large scale, dynamic web sites. AOLserver is distributed under ...
*
Ruby A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sa ...
Rack Rack or racks may refer to: Storage and installation * Amp rack, short for amplifier rack, a piece of furniture in which amplifiers are mounted * Bicycle rack, a frame for storing bicycles when not in use * Bustle rack, a type of storage bi ...
, via the Rack::Deflater middleware *
HAProxy HAProxy is a free and open source software that provides a high availability load balancer and reverse proxy for TCP and HTTP-based applications that spreads requests across multiple servers. It is written in C and has a reputation for being fa ...
*
Varnish Varnish is a clear transparent hard protective coating or film. It is not a stain. It usually has a yellowish shade from the manufacturing process and materials used, but it may also be pigmented as desired, and is sold commercially in various ...
– built-in. Works also with ESI
Armeria
– Serving pre-compressed files *
NaviServer NaviServer is a high performance web server written in C (programming language), C and Tcl. It can be easily extended in either language to create web sites and services; there are over 35 modules available (including database integration or prot ...
- built-in, dynamic and static compression Many
content delivery network A content delivery network, or content distribution network (CDN), is a geographically distributed network of proxy servers and their data centers. The goal is to provide high availability and performance by distributing the service spatially re ...
s also implement HTTP compression to improve speedy delivery of resources to end users. The compression in HTTP can also be achieved by using the functionality of
server-side scripting Server-side scripting is a technique used in web development which involves employing scripts on a web server which produces a response customized for each user's (client's) request to the website. The alternative is for the web server itself ...
languages like
PHP PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group. ...
, or programming languages like
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
. Various online tools exist to verify a working implementation of HTTP compression. These online tools usually request multiple variants of a URL, each with different request headers (with varying Accept-Encoding content). HTTP compression is considered to be implemented correctly when the server returns a document in a compressed format. By comparing the sizes of the returned documents, the effective compression ratio can be calculated (even between different compression algorithms).


Problems preventing the use of HTTP compression

A 2009 article by Google engineers Arvind Jain and Jason Glasgow states that more than 99 person-years are wasted daily due to increase in page load time when users do not receive compressed content. This occurs when anti-virus software interferes with connections to force them to be uncompressed, where proxies are used (with overcautious web browsers), where servers are misconfigured, and where browser bugs stop compression being used. Internet Explorer 6, which drops to HTTP 1.0 (without features like compression or pipelining) when behind a proxy â€“ a common configuration in corporate environments â€“ was the mainstream browser most prone to failing back to uncompressed HTTP. Another problem found while deploying HTTP compression on large scale is due to the deflate encoding definition: while HTTP 1.1 defines the deflate encoding as data compressed with deflate (RFC 1951) inside a
zlib zlib ( or "zeta-lib", ) is a software library used for data compression. zlib was written by Jean-loup Gailly and Mark Adler and is an abstraction of the DEFLATE compression algorithm used in their gzip file compression program. zlib is also ...
formatted stream (RFC 1950), Microsoft server and client products historically implemented it as a "raw" deflated stream, making its deployment unreliable. For this reason, some software, including the Apache HTTP Server, only implement gzip encoding.


Security implications

Compression allows a form of
chosen plaintext A chosen-plaintext attack (CPA) is an attack model for cryptanalysis which presumes that the attacker can obtain the ciphertexts for arbitrary plaintexts.Ross Anderson, ''Security Engineering: A Guide to Building Dependable Distributed Systems'' ...
attack to be performed: if an attacker can inject any chosen content into the page, they can know whether the page contains their given content by observing the size increase of the encrypted stream. If the increase is smaller than expected for random injections, it means that the compressor has found a repeat in the text, i.e. the injected content overlaps the secret information. This is the idea behind CRIME. In 2012, a general attack against the use of data compression, called
CRIME In ordinary language, a crime is an unlawful act punishable by a State (polity), state or other authority. The term ''crime'' does not, in modern criminal law, have any simple and universally accepted definition,Farmer, Lindsay: "Crime, definit ...
, was announced. While the CRIME attack could work effectively against a large number of protocols, including but not limited to TLS, and application-layer protocols such as SPDY or HTTP, only exploits against TLS and SPDY were demonstrated and largely mitigated in browsers and servers. The CRIME exploit against HTTP compression has not been mitigated at all, even though the authors of CRIME have warned that this vulnerability might be even more widespread than SPDY and TLS compression combined. In 2013, a new instance of the CRIME attack against HTTP compression, dubbed BREACH, was published. A BREACH attack can extract login tokens, email addresses or other sensitive information from TLS encrypted web traffic in as little as 30 seconds (depending on the number of bytes to be extracted), provided the attacker tricks the victim into visiting a malicious web link. All versions of TLS and SSL are at risk from BREACH regardless of the encryption algorithm or cipher used. Unlike previous instances of
CRIME In ordinary language, a crime is an unlawful act punishable by a State (polity), state or other authority. The term ''crime'' does not, in modern criminal law, have any simple and universally accepted definition,Farmer, Lindsay: "Crime, definit ...
, which can be successfully defended against by turning off TLS compression or SPDY header compression, BREACH exploits HTTP compression which cannot realistically be turned off, as virtually all web servers rely upon it to improve data transmission speeds for users. As of 2016, the TIME attack and the HEIST attack are now public knowledge.


References


External links

*RFC 2616: Hypertext Transfer Protocol – HTTP/1.1
HTTP Content-Coding Values
by Internet Assigned Numbers Authority
Compression with lighttpd
*
Using HTTP Compression
{{Webarchive, url=https://web.archive.org/web/20160314155152/http://www.serverwatch.com/tutorials/article.php/3514866 , date=2016-03-14 by Martin Brown of Server Watch
Using HTTP Compression in PHPDynamic and static HTTP compression with Apache httpd
Web development Lossless compression algorithms Hypertext Transfer Protocol de:Hypertext Transfer Protocol#HTTP-Kompression