Content sniffing

Content sniffing, also known as media type sniffing or MIME sniffing, is the practice of inspecting the content of a byte stream to attempt to deduce the file format of the data within it. Content sniffing is generally used to compensate for a lack of accurate metadata that would otherwise be required to enable the file to be interpreted correctly. Content sniffing tends to combine several techniques that rely on the redundancy found in most file formats: looking for file signatures and magic numbers, and heuristics including searching for well-known representative substrings, the use of byte frequency and n-gram tables, and Bayesian inference.

MIME (Multipurpose Internet Mail Extensions) sniffing was, and still is, used by some web browsers, notably Microsoft's Internet Explorer, in an attempt to help web sites that do not correctly signal the MIME type of the web content they serve. However, doing this opens up a serious security vulnerability: by confusing the MIME sniffing algorithm, an attacker can manipulate the browser into interpreting data in a way that allows operations that neither the site operator nor the user expects, such as cross-site scripting. Moreover, by making sites that do not correctly assign MIME types to content appear to work correctly in those browsers, it fails to encourage the correct labeling of material, which in turn makes content sniffing necessary for these sites to work, creating a vicious circle of incompatibility with web standards and security best practices.

A specification exists for media type sniffing in HTML5, which attempts to balance the requirements of security with the need for backward compatibility with web content that has missing or incorrect MIME-type data. It attempts to provide a precise specification that can be used across implementations to implement a single well-defined and deterministic set of behaviors.

The UNIX file command can be viewed as a content sniffing application.
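The combination of signature matching and substring heuristics described above can be illustrated with a minimal Python sketch. It is not the algorithm of any browser or of the HTML5 specification; the function name `sniff_media_type`, the signature table, the HTML hints and the 512-byte window are all assumptions chosen for illustration.

```python
# Hypothetical signature table: (leading bytes, media type).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
]

# Hypothetical "well-known representative substrings" for HTML.
HTML_HINTS = (b"<!doctype html", b"<html", b"<script")

# Control bytes that rarely appear in plain text (assumed set).
BINARY_BYTES = (
    frozenset(range(0x00, 0x09))
    | {0x0B}
    | frozenset(range(0x0E, 0x1B))
    | frozenset(range(0x1C, 0x20))
)


def sniff_media_type(data: bytes) -> str:
    """Guess a media type from the content alone (illustrative only)."""
    # 1. File signatures / magic numbers at the start of the stream.
    for magic, media_type in SIGNATURES:
        if data.startswith(magic):
            return media_type

    # 2. Substring heuristic on the first bytes, ignoring case and
    #    leading ASCII whitespace.
    head = data[:512]
    lowered = head.lstrip().lower()
    if any(lowered.startswith(hint) for hint in HTML_HINTS):
        return "text/html"

    # 3. Fallback: text if no binary-looking control bytes are present.
    if head and not any(byte in BINARY_BYTES for byte in head):
        return "text/plain"
    return "application/octet-stream"
```

Under these assumptions, a body that a server labels as plain text but that begins with `<script>` would be classified as `text/html` by the sniffer, which is the kind of confusion that makes sniffing a cross-site scripting risk when the declared type is overridden.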


Charset sniffing

Numerous web browsers use a more limited form of content sniffing to attempt to determine the character encoding of text files for which the MIME type is already known. This technique is known as charset sniffing or codepage sniffing and, for certain encodings, may also be used to bypass security restrictions. For instance, Internet Explorer 7 may be tricked into running JScript in circumvention of its policy by allowing the browser to guess that an HTML file was encoded in UTF-7. This bug is worsened by a feature of the UTF-7 encoding that permits multiple encodings of the same text and, specifically, alternative representations of ASCII characters.

Most encodings do not allow evasive representations of ASCII characters, so charset sniffing is less dangerous in general: due to the historical accident of the ASCII-centric nature of scripting and markup languages, characters outside the ASCII repertoire are more difficult to use to circumvent security boundaries, and misinterpretations of character sets tend to produce results no worse than the display of mojibake.
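The UTF-7 problem can be illustrated with Python's built-in `utf-7` codec. The sketch below is an assumption-laden illustration, not a reproduction of Internet Explorer's behavior: the `payload` bytes and the naive filter `looks_safe` are hypothetical, and stand in for any check that inspects the raw bytes as ASCII before a consumer decodes them as UTF-7.

```python
# Hypothetical payload: pure ASCII bytes with no angle brackets.
payload = b"+ADw-script+AD4-alert(1)+ADw-/script+AD4-"


def looks_safe(raw: bytes) -> bool:
    """Naive, hypothetical filter that only looks for literal markup bytes."""
    return b"<" not in raw and b">" not in raw


# Read as ASCII, the payload is harmless-looking text.
as_ascii = payload.decode("ascii")

# Read as UTF-7, the shifted sequences decode to angle brackets.
as_utf7 = payload.decode("utf-7")

# UTF-7 also permits alternative spellings of plain ASCII characters:
# both byte strings below decode to the same letter.
direct = b"a".decode("utf-7")
shifted = b"+AGE-".decode("utf-7")
```

Here the filter accepts the payload because the bytes contain no `<` or `>`, yet a consumer that guesses UTF-7 sees a script element. Encodings without such alternative representations of ASCII do not offer this kind of evasion.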


See also

* Browser sniffing
* X-Content-Type-Options header


External links


* MIME Sniffing Standard
* David Risney, "Mime-sniffing", http://deletethis.net/dave/?q=mime-sniffing (accessed 2012-07-14)