Billion Laughs
   HOME

TheInfoList



OR:

In
computer security Computer security, cybersecurity (cyber security), or information technology security (IT security) is the protection of computer systems and networks from attack by malicious actors that may result in unauthorized information disclosure, the ...
, a billion laughs attack is a type of denial-of-service (DoS) attack which is aimed at
parser Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lati ...
s of
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable ...
documents. It is also referred to as an XML bomb or as an exponential entity expansion attack.


Details

The example attack consists of defining 10 entities, each defined as consisting of 10 of the previous entity, with the document consisting of a single instance of the largest entity, which expands to one
billion Billion is a word for a large number, and it has two distinct definitions: *1,000,000,000, i.e. one thousand million, or (ten to the ninth power), as defined on the short scale. This is its only current meaning in English. * 1,000,000,000,000, i.e ...
copies of the first entity. In the most frequently cited example, the first entity is the string "
lol LOL, or lol, is an initialism for laughing out loud and a popular element of Internet slang. It was first used almost exclusively on Usenet, but has since become widespread in other forms of computer-mediated communication and even face-to ...
", hence the name "billion laughs". At the time this vulnerability was first reported, the
computer memory In computing, memory is a device or system that is used to store information for immediate use in a computer or related computer hardware and digital electronic devices. The term ''memory'' is often synonymous with the term ''primary storage ...
used by a billion instances of the string "lol" would likely exceed that available to the process parsing the XML. While the original form of the attack was aimed specifically at XML parsers, the term may be applicable to similar subjects as well. The problem was first reported as early as 2002, but began to be widely addressed in 2008. Defenses against this kind of attack include capping the memory allocated in an individual parser if loss of the document is acceptable, or treating entities symbolically and expanding them lazily only when (and to the extent) their content is to be used.


Code example

]> &lol9; When an XML parser loads this document, it sees that it includes one root element, "lolz", that contains the text "&lol9;". However, "&lol9;" is a defined entity that expands to a string containing ten "&lol8;" strings. Each "&lol8;" string is a defined entity that expands to ten "&lol7;" strings, and so on. After all the entity expansions have been processed, this small (< 1 KB) block of XML will actually contain 109 = a billion "lol"s, taking up almost 3
gigabyte The gigabyte () is a multiple of the unit byte for digital information. The prefix ''giga'' means 109 in the International System of Units (SI). Therefore, one gigabyte is one billion bytes. The unit symbol for the gigabyte is GB. This defini ...
s of memory.


Variations

The billion laughs attack described above can take an
exponential Exponential may refer to any of several mathematical topics related to exponentiation, including: *Exponential function, also: **Matrix exponential, the matrix analogue to the above * Exponential decay, decrease at a rate proportional to value *Exp ...
amount of space or time. The quadratic blowup variation causes
quadratic growth In mathematics, a function or sequence is said to exhibit quadratic growth when its values are proportional to the square of the function argument or sequence position. "Quadratic growth" often means more generally "quadratic growth in the limit" ...
in resource requirements by simply repeating a large entity over and over again, to avoid countermeasures that detect heavily nested entities. (See
computational complexity theory In theoretical computer science and mathematics, computational complexity theory focuses on classifying computational problems according to their resource usage, and relating these classes to each other. A computational problem is a task solved by ...
for comparisons of different growth classes.) A "billion laughs" attack could exist for any file format that can contain macro expansions, for example this
YAML YAML ( and ) (''see '') is a human-readable data-serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. YAML targets many of the same communications applications as Exte ...
bomb: a: &a lol","lol","lol","lol","lol","lol","lol","lol","lol"b: &b a,*a,*a,*a,*a,*a,*a,*a,*ac: &c b,*b,*b,*b,*b,*b,*b,*b,*bd: &d c,*c,*c,*c,*c,*c,*c,*c,*ce: &e d,*d,*d,*d,*d,*d,*d,*d,*df: &f e,*e,*e,*e,*e,*e,*e,*e,*eg: &g f,*f,*f,*f,*f,*f,*f,*f,*fh: &h g,*g,*g,*g,*g,*g,*g,*g,*gi: &i h,*h,*h,*h,*h,*h,*h,*h,*h This crashed earlier versions of Go because the Go YAML processor (contrary to the YAML spec) expands references as if they were macros. The Go YAML processor was modified to fail parsing if the result object becomes too large. Enterprise software like
Kubernetes Kubernetes (, commonly stylized as K8s) is an open-source container orchestration system for automating software deployment, scaling, and management. Google originally designed Kubernetes, but the Cloud Native Computing Foundation now maintains ...
has been affected by this attack through its YAML parser. For this reason, either a parser with intentionally limited capabilities is preferred (like StrictYAML) or file formats that do not allow references are often preferred for data arriving from untrusted sources.


See also

*
Fork bomb In computing, a fork bomb (also called rabbit virus or wabbit) is a denial-of-service attack In computing, a denial-of-service attack (DoS attack) is a cyber-attack in which the perpetrator seeks to make a machine or network resource unav ...
: a similar method to exhaust a system's resources through
recursion Recursion (adjective: ''recursive'') occurs when a thing is defined in terms of itself or of its type. Recursion is used in a variety of disciplines ranging from linguistics to logic. The most common application of recursion is in mathematics ...
*
Zip bomb In computing, a zip bomb, also known as a decompression bomb or zip of death, is a malicious archive file designed to crash or render useless the program or system reading it. It is often employed to disable antivirus software, in order to creat ...
: a similar attack utilizing zip archives * XML external entity attack: an XML attack to return arbitrary server files *
Document type definition A document type definition (DTD) is a set of ''markup declarations'' that define a ''document type'' for an SGML-family markup language ( GML, SGML, XML, HTML). A DTD defines the valid building blocks of an XML document. It defines the document ...
: a template for validating XML files


References

{{Reflist Algorithmic complexity attacks Denial-of-service attacks XML