HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, the thundering herd problem occurs when a large number of processes or threads waiting for an event are awoken when that event occurs, but only one process is able to handle the event. When the processes wake up, they will each try to handle the event, but only one will win. All processes will compete for resources, possibly freezing the computer, until the herd is calmed down again.


Mitigation

The Linux kernel serializes responses for requests to a single file descriptor, so only one thread or process is woken up. For epoll() in version 4.5 of the Linux kernel, the EPOLLEXCLUSIVE flag was added. Thus several epoll sets (different threads or different processes) may wait on the same resource and only one set will be woken up. For certain workloads this flag can give significant processing time reduction. Similarly in Microsoft Windows, I/O completion ports can mitigate the thundering herd problem, as they can be configured such that only one of the threads waiting on the completion port is woken up when an event occurs.{{Cite web, url=https://xania.org/200807/iocp, title=IO Completion Ports — Matt Godbolt's blog , website=xania.org, access-date=2019-01-23 In systems that rely on a backoff mechanism (e.g.
exponential backoff Exponential backoff is an algorithm that uses feedback to multiplicatively decrease the rate of some process, in order to gradually find an acceptable rate. These algorithms find usage in a wide range of systems and processes, with radio network ...
), the clients will retry failed calls by waiting a specific amount of time between consecutive retries. In order to avoid the thundering herd problem,
jitter In electronics and telecommunications, jitter is the deviation from true periodicity of a presumably periodic signal, often in relation to a reference clock signal. In clock recovery applications it is called timing jitter. Jitter is a significa ...
can be purposefully introduced in order to break the synchronization across the clients, thereby avoiding collisions. In this approach, randomness is added to the wait intervals between retries, so that clients are no longer synchronized.


See also

*
Process management (computing) A process is a program in execution, and an integral part of any modern-day operating system (OS). The OS must allocate resources to processes, enable processes to share and exchange information, protect the resources of each process from other pro ...
*
Lock convoy In computer science, a lock convoy is a performance problem that can occur when using locks for concurrency control in a multithreaded application. A lock convoy occurs when multiple threads of equal priority contend repeatedly for the same loc ...
*
Sleeping barber problem In computer science, the sleeping barber problem is a classic inter-process communication and synchronization problem that illustrates the complexities that arise when there are multiple operating system processes. The problem was originally pro ...
*
TCP global synchronization TCP global synchronization in computer networks can happen to TCP/ IP flows during periods of congestion because each sender will reduce their transmission rate at the same time when packet loss occurs. Routers on the Internet normally have packe ...
*
Cache stampede A cache stampede is a type of cascading failure that can occur when massively parallel computing systems with caching mechanisms come under very high load. This behaviour is sometimes also called dog-piling. To understand how cache stampedes occ ...


References


External links


A discussion of this observation on Linux

Better Retries with Exponential Backoff and Jitter
Concurrency control