
Content sniffing, also known as media type sniffing or MIME sniffing, is the practice of inspecting the content of a byte stream to attempt to deduce the file format of the data within it. Content sniffing is generally used to compensate for a lack of accurate metadata that would otherwise be required to enable the file to be interpreted correctly. Content sniffing techniques tend to combine several methods that rely on the redundancy found in most file formats: looking for file signatures and magic numbers, and heuristics including searching for well-known representative substrings, the use of byte frequency and n-gram tables, and Bayesian inference.

MIME (Multipurpose Internet Mail Extensions) sniffing was, and still is, used by some web browsers, notably Microsoft's Internet Explorer, in an attempt to help websites that do not correctly signal the MIME type of their content to display properly. However, doing this opens up a serious security vulnerability: by confusing the MIME sniffing algorithm, an attacker can manipulate the browser into interpreting data in a way that allows operations not expected by either the site operator or the user, such as cross-site scripting. Moreover, by making sites that do not correctly assign MIME types to content appear to work correctly in those browsers, it fails to encourage the correct labeling of material, which in turn makes content sniffing necessary for these sites to work, creating a vicious circle of incompatibility with web standards and security best practices.

A specification exists for media type sniffing in HTML5, which attempts to balance the requirements of security with the need for backward compatibility with web content that has missing or incorrect MIME-type data. It attempts to provide a precise specification that can be used across implementations to implement a single well-defined and deterministic set of behaviors.

The UNIX file command can be viewed as a content sniffing application.
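The combination of signature matching and fallback heuristics described above can be sketched in a few lines of Python. This is only an illustration, not the algorithm of any browser or of the HTML5 specification: the signature table lists a handful of well-known magic numbers, and the names SIGNATURES, looks_like_text and sniff, the 512-byte window and the 5% threshold are all assumptions made for this example.

```python
# Illustrative signature table (an assumption for this sketch, far from complete).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"%PDF-", "application/pdf"),
    (b"PK\x03\x04", "application/zip"),
]

# Control bytes that commonly appear in plain text (tab, LF, FF, CR, ESC).
TEXT_CONTROL_BYTES = {9, 10, 12, 13, 27}


def looks_like_text(data: bytes) -> bool:
    """Crude byte-frequency heuristic: no NUL bytes, few control bytes."""
    if not data or b"\x00" in data:
        return False
    control = sum(1 for b in data if b < 32 and b not in TEXT_CONTROL_BYTES)
    return control / len(data) < 0.05  # threshold chosen arbitrarily


def sniff(data: bytes) -> str:
    """Guess a media type from the first bytes of a stream."""
    head = data[:512]
    # Step 1: file signatures / magic numbers at the start of the stream.
    for magic, media_type in SIGNATURES:
        if head.startswith(magic):
            return media_type
    # Step 2: a well-known representative substring near the start.
    stripped = head.lstrip(b" \t\r\n").lower()
    if stripped.startswith((b"<!doctype html", b"<html")):
        return "text/html"
    # Step 3: fall back to the text-versus-binary heuristic.
    if looks_like_text(head):
        return "text/plain"
    return "application/octet-stream"


print(sniff(b"\x89PNG\r\n\x1a\n" + bytes(8)))        # image/png
print(sniff(b"<html><script>alert(1)</script>"))     # text/html
print(sniff(b"plain notes\n"))                       # text/plain
```

The second call hints at the security problem: a consumer that trusts the sniffed result over a declared type such as text/plain would treat an uploaded "text" file as HTML and might execute the script it contains.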


Charset sniffing

Numerous web browsers use a more limited form of content sniffing to attempt to determine the character encoding of text files for which the MIME type is already known. This technique is known as charset sniffing or codepage sniffing and, for certain encodings, may also be used to bypass security restrictions. For instance, Internet Explorer 7 may be tricked into running JScript in circumvention of its policy by allowing the browser to guess that an HTML file was encoded in UTF-7. This bug is worsened by a feature of the UTF-7 encoding that permits multiple encodings of the same text and, specifically, alternative representations of ASCII characters.

Most encodings do not allow evasive representations of ASCII characters, so charset sniffing is less dangerous in general: because of the historical accident that scripting and markup languages are ASCII-centric, characters outside the ASCII repertoire are more difficult to use to circumvent security boundaries, and misinterpretations of character sets tend to produce results no worse than the display of mojibake.
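The difference between the UTF-7 case and ordinary mojibake can be seen with Python's built-in codecs. This is a minimal sketch: the payload string and variable names are made up for illustration, and it shows only how the bytes decode, not how any particular browser chooses an encoding.

```python
# Hypothetical payload: pure ASCII bytes with no "<" or ">" in them.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"

# A filter that reads the bytes as ASCII sees nothing that looks like markup.
as_ascii = payload.decode("ascii")
print("<" in as_ascii)   # False

# A consumer that guesses UTF-7 recovers an ordinary script element.
as_utf7 = payload.decode("utf-7")
print(as_utf7)           # <script>alert(1)</script>

# UTF-7 also allows an alternative spelling of a plain ASCII letter.
print(b"+AGE-".decode("utf-7") == b"a".decode("utf-7"))   # True

# By contrast, a wrong guess between most other encodings only garbles
# the non-ASCII characters (mojibake); the ASCII part is unchanged.
mojibake = "café".encode("utf-8").decode("latin-1")
print(mojibake)          # cafÃ©
```

In the last example the ASCII letters survive the wrong guess unchanged, which is why such misinterpretations rarely cross a security boundary, whereas under UTF-7 the markup-significant characters themselves are hidden from any filter that reads the bytes as ASCII.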


See also

* Browser sniffing
* X-Content-Type-Options header


External links


* MIME Sniffing Standard
* David Risney, "Mime-sniffing", http://deletethis.net/dave/?q=mime-sniffing (accessed 2012-07-14)