The data URI scheme is a uniform resource identifier (URI) scheme that provides a way to include data in-line in Web pages as if they were external resources. It is a form of file literal or
here document
In computing, a here document (here-document, here-text, heredoc, hereis, here-string or here-script) is a file literal or input stream literal: it is a section of a source code file that is treated as if it were a separate file. The term is also ...
. This technique allows normally separate elements such as images and style sheets to be fetched in a single Hypertext Transfer Protocol (HTTP) request, which may be more efficient than multiple HTTP requests, and used by several browser extensions to package images as well as other multimedia contents in a single HTML file for page saving. , data URIs are fully supported by most major browsers, and partially supported in
Internet Explorer
Internet Explorer (formerly Microsoft Internet Explorer and Windows Internet Explorer, commonly abbreviated IE or MSIE) is a series of graphical user interface, graphical web browsers developed by Microsoft which was used in the Microsoft Wind ...
.
Syntax
The syntax of data URIs is defined in Request for Comments (RFC) 2397, published in August 1998, and follows the URI scheme syntax. A data URI consists of:
data: '<media type>'';base64],''<data>''
* The scheme, data. It is followed by a colon (:).
* An optional media type. The media type part may include one or more parameters, in the format attribute=value, separated by semicolons (;) . A common media type parameter is charset, specifying the character set of the media type, where the value is from the IANA list of
character set
Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values that ...
names. If one is not specified, the
media type
A media type (also known as a MIME type) is a two-part identifier for file formats and format contents transmitted on the Internet. The Internet Assigned Numbers Authority (IANA) is the official authority for the standardization and publication o ...
of the data URI is assumed to be text/plain;charset=US-ASCII.
* An optional base64 extension base64, separated from the preceding part by a semicolon. When present, this indicates that the data content of the URI is
binary data
Binary data is data whose unit can take on only two possible states. These are often labelled as 0 and 1 in accordance with the binary numeral system and Boolean algebra.
Binary data occurs in many different technical and scientific fields, wher ...
, encoded in ASCII format using the
Base64
In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in sequences of 24 bits that can be represented by four 6-bit Base64 digits.
Common to all bina ...
scheme for
binary-to-text encoding
A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the channel does not allow binary dat ...
. The base64 extension is distinguished from any media type parameters by virtue of not having a =value component and by coming after any media type parameters. Since Base64 encoded data is approximately 33% larger than original data, it is recommended to use Base64 data URIs only if the server supports
HTTP compression
HTTP compression is a capability that can be built into web servers and web clients to improve transfer speed and bandwidth utilization.
HTTP data is compressed before it is sent from the server: compliant browsers will announce what methods are ...
or embedded files are smaller than 1KB.
* The data, separated from the preceding part by a comma (,). The data is a sequence of zero or more
octets
Octet may refer to:
Music
* Octet (music), ensemble consisting of eight instruments or voices, or composition written for such an ensemble
** String octet, a piece of music written for eight string instruments
*** Octet (Mendelssohn), 1825 compos ...
represented as characters. The comma is required in a data URI, even when the data part has zero length. The characters permitted within the data part include ASCII upper and lowercase letters, digits, and many ASCII punctuation and special characters. Note that this may include characters, such as colon, semicolon, and comma which are delimiters in the URI components preceding the data part. Other octets must be
percent-encoded
Percent-encoding, also known as URL encoding, is a method to encode arbitrary data in a Uniform Resource Identifier (URI) using only the limited US-ASCII characters legal within a URI. Although it is known as ''URL encoding'', it is also used m ...
. If the data is Base64-encoded, then the data part may contain only valid Base64 characters. Note that Base64-encoded data: URIs use the standard Base64 character set (with '+' and '/' as characters 62 and 63) rather than the so-called " URL-safe Base64" character set.
Examples of data URIs showing most of the features are:
:data:text/vnd-example+xyz;foo=bar;base64,R0lGODdh
:data:text/plain;charset=UTF-8;page=21,the%20data:1234,5678 (outputs: "the data:1234,5678")
The minimal data URI is data:,, consisting of the
scheme, no media-type, and zero-length data.
Thus, within the overall URI syntax, a data URI consists of a scheme and a path, with no authority part, query string, or fragment. The optional media type, the optional base64 indicator, and the data are all parts of the
URI path.
Examples of use
HTML
An
HTML
The HyperText Markup Language or HTML is the standard markup language for documents designed to be displayed in a web browser. It can be assisted by technologies such as Cascading Style Sheets (CSS) and scripting languages such as JavaScri ...
fragment embedding a picture of a small red dot:
In this example, the lines are broken for formatting purposes. In actual URIs,
including data URIs, control characters (ASCII 0 to 31, and 127) and spaces (ASCII 32) are "excluded characters". This means that
whitespace character
In computer programming, whitespace is any character or series of characters that represent horizontal or vertical space in typography. When rendered, a whitespace character does not correspond to a visible mark, but typically does occupy an area ...
s are not permitted in data URIs. However, in the context of HTML 4 and HTML 5, linefeeds within an element attribute value (such as the "src" above) are ignored. So the data URI above would be processed ignoring the linefeeds, giving the correct result. But note that this is an HTML feature, not a data URI feature, and in other contexts, it is not possible to rely on whitespace within the URI being ignored.
CSS
A Cascading Style Sheets (CSS) rule that includes a background image:
ul.checklist li.complete
In this example, the \ + line terminators
are a feature of CSS, indicating continuation on the next line. These would be removed by the CSS stylesheet processor, and the data URI would be reconstituted without whitespace, making it correct, since whitespace is not allowed within the data component of a data:
URI.
JavaScript
A
JavaScript
JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of Website, websites use JavaScript on the Client (computing), client side ...
statement that opens an embedded subwindow, as for a footnote link:
window.open('data:text/html;charset=utf-8,' +
encodeURIComponent( // Escape for URL formatting
''+
''+
'Embedded Window'+
'
42
'+
''
)
);
SVG
A
Scalable Vector Graphic
Scalable Vector Graphics (SVG) is an XML-based vector image format for defining two-dimensional graphics, having support for interactivity and animation. The SVG specification is an open standard developed by the World Wide Web Consortium sinc ...
image containing an embedded JPEG image encoded in Base64:
Malware and phishing
The data URI can be utilized to construct attack pages that attempt to obtain usernames and passwords from unsuspecting web users. It can also be used to get around
cross-site scripting
Cross-site scripting (XSS) is a type of security vulnerability that can be found in some web applications. XSS attacks enable attackers to inject client-side scripts into web pages viewed by other users. A cross-site scripting vulnerability may ...
(XSS) restrictions, embedding the attack payload fully inside the address bar, and hosted via URL shortening services rather than needing a full website that is controlled by a third party. As a result, some browsers now block webpages from navigating to data URIs.