ReDoS
   HOME

TheInfoList



OR:

A regular expression denial of service (ReDoS) is an algorithmic complexity attack that produces a
denial-of-service In computing, a denial-of-service attack (DoS attack) is a cyber-attack in which the perpetrator seeks to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting services of a host conne ...
by providing a
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
and/or an input that takes a long time to evaluate. The attack exploits the fact that many regular expression implementations have super-linear
worst-case complexity In computer science (specifically computational complexity theory), the worst-case complexity measures the System resource, resources (e.g. running time, Computer memory, memory) that an algorithm requires given an input of arbitrary size (commonl ...
; on certain regex-input pairs, the time taken can grow polynomially or exponentially in relation to the input size. An attacker can thus cause a program to spend substantial time by providing a specially crafted regular expression and/or input. The program will then slow down or become unresponsive.


Description

Regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
("regex") matching can be done by building a
finite-state automaton A finite-state machine (FSM) or finite-state automaton (FSA, plural: ''automata''), finite automaton, or simply a state machine, is a mathematical model of computation. It is an abstract machine that can be in exactly one of a finite number o ...
. Regex can be easily converted to nondeterministic automata (NFAs), in which for each state and input symbol, there may be several possible next states. After building the automaton, several possibilities exist: * the engine may convert it to a deterministic finite-state automaton (DFA) and run the input through the result; * the engine may try one by one all the possible paths until a match is found or all the paths are tried and fail ("backtracking"). * the engine may consider all possible paths through the nondeterministic automaton in parallel; * the engine may convert the nondeterministic automaton to a DFA lazily (''i.e.'', on the fly, during the match). Of the above algorithms, the first two are problematic. The first is problematic because a deterministic automaton could have up to 2^m states where m is the number of states in the nondeterministic automaton; thus, the conversion from NFA to DFA may take
exponential time In computer science, the time complexity is the computational complexity that describes the amount of computer time it takes to run an algorithm. Time complexity is commonly estimated by counting the number of elementary operations performed by t ...
. The second is problematic because a nondeterministic automaton could have an exponential number of paths of length n, so that walking through an input of length n will also take exponential time. The last two algorithms, however, do not exhibit pathological behavior. Note that for non-pathological regular expressions, the problematic algorithms are usually fast, and in practice, one can expect them to " compile" a regex in O(m) time and match it in O(n) time; instead, simulation of an NFA and lazy computation of the DFA have O(m2n) worst-case complexity. Regex denial of service occurs when these expectations are applied to a regex provided by the user, and malicious regular expressions provided by the user trigger the worst-case complexity of the regex matcher. While regex algorithms can be written in an efficient way, most regex engines in existence extend the regex languages with additional constructs that cannot always be solved efficiently. Such extended patterns essentially force the implementation of regex in most
programming language A programming language is a system of notation for writing computer programs. Most programming languages are text-based formal languages, but they may also be graphical. They are a kind of computer language. The description of a programming ...
s to use backtracking.


Examples


Exponential backtracking

The most severe type of problem happens with backtracking regular expression matches, where some patterns have a runtime that is exponential in the length of the input string. For strings of n characters, the runtime is O(2^n). This happens when a regular expression has three properties: * the regular expression applies repetition (+, *) to a subexpression; * the subexpression can match the same input in multiple ways, or the subexpression can match an input string which is a prefix of a longer possible match; * and after the repeated subexpression, there is an expression that matches something which the subexpression does not match. The second condition is best explained with two examples: * in (a, a)+$, repetition is applied to the subexpression a, a, which can match a in two ways on each side of the alternation. * in (a+)*$, repetition is applied to the subexpression a+, which can match a or aa, etc. In both of these examples we used $ to match the end of the string, satisfying the third condition, but it is also possible to use another character for this. For example (a, aa)*c has the same problematic structure. All three of the above regular expressions will exhibit exponential runtime when applied to strings of the form a...a!. For example, if you try to match them against aaaaaaaaaaaaaaaaaaaaaaaa! on a backtracking expression engine, it will take a significantly long time to complete, and the runtime will approximately double for each extra a before the !. It is also possible to have backtracking which is polynomial time O(n^x), instead of exponential. This can also cause problems for long enough inputs, though less attention has been paid to this problem as malicious input must be much longer to have a significant effect. An example of such a pattern is "a*b?a*x", when the input is an arbitrarily long sequence of "a"s.


Vulnerable regexes in online repositories

So-called "evil" or vulnerable regexes have been found in online regular expression repositories. Note that it is enough to find a vulnerable ''sub''expression in order to attack the full regex:
RegExLib, id=1757 (email validation)
- see part
^( -zA-Z0-9(@) -z0-9 ( -z, ( -z.] -z)$
OWASP Validation Regex Repository
Java Classname - see part
^ -Z -z+$ These two examples are also susceptible to the input aaaaaaaaaaaaaaaaaaaaaaaa!.


Attacks

If the regex itself is affected by user input, such as a web service permitting clients to provide a search pattern, then an attacker can inject a malicious regex to consume the server's resources. Therefore, in most cases, regular expression denial of service can be avoided by removing the possibility for the user to execute arbitrary patterns on the server. In this case, web applications and databases are the main vulnerable applications. Alternatively, a malicious page could hang the user's web browser or cause it to use arbitrary amounts of memory. However, if a vulnerable regex exists on the server-side already, then an attacker may instead be able to provide an input that triggers its worst-case behavior. In this case, e-mail scanners and
intrusion detection system An intrusion detection system (IDS; also intrusion prevention system or IPS) is a device or software application that monitors a network or systems for malicious activity or policy violations. Any intrusion activity or violation is typically rep ...
s could also be vulnerable. Fortunately, in most cases, the problematic regular expressions can be rewritten as "non-evil" patterns. For example, (.*a)+ can be rewritten to ( aa)+. In the case of a web application, the programmer may use the same regular expression to validate input on both the client and the server side of the system. An attacker could inspect the client code, looking for evil regular expressions, and send crafted input directly to the web server in order to hang it.


See also

*
Denial-of-service attack In computing, a denial-of-service attack (DoS attack) is a cyber-attack in which the perpetrator seeks to make a machine or network resource unavailable to its intended users by temporarily or indefinitely disrupting services of a host conn ...
* Cyberwarfare * Low Orbit Ion Cannon *
High Orbit Ion Cannon High Orbit Ion Cannon (HOIC) is an open-source network stress testing and denial-of-service attack application designed to attack as many as 256 URLs at the same time. It was designed to replace the Low Orbit Ion Cannon which was developed by ...


References

{{Reflist


External links

* Examples of ReDoS in open source applications: *
ReDoS in DataVault
*
ReDoS in EntLib
*
ReDoS in NASD CORE.NET Terelik
* Some benchmarks for ReDoS ** Achim Hoffman (2010).
ReDoS - benchmark for regular expression DoS in JavaScript
. Retrieved 2010-04-19. ** Richard M. Smith (2010).
Regular expression denial of service (ReDoS) attack test results
. Retrieved 2010-04-19. * Paper on implementing regular expressions not vulnerable to certain classes of ReDoS ** Russ Cox (2007).

. Retrieved 2011-04-20. * Tools for detecting ReDoS vulnerabilities ** H. Thielecke, A. Rathnayake (2013).
Regular expression denial of service (ReDoS) static analysis
. Retrieved 2013-05-30. ** B. van der Merwe, N Weideman (2017).
Regex Static Analysis
. Retrieved 2017-08-12. Algorithmic complexity attacks Denial-of-service attacks Pattern matching Regular expressions