HOME

TheInfoList



OR:

In
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ef ...
s, an interrupt storm is an event during which a processor receives an inordinate number of
interrupt In digital computers, an interrupt (sometimes referred to as a trap) is a request for the processor to ''interrupt'' currently executing code (when permitted), so that the event can be processed in a timely manner. If the request is accepted, ...
s that consume the majority of the processor's time. Interrupt storms are typically caused by hardware devices that do not support interrupt rate limiting.


Background

Because
interrupt In digital computers, an interrupt (sometimes referred to as a trap) is a request for the processor to ''interrupt'' currently executing code (when permitted), so that the event can be processed in a timely manner. If the request is accepted, ...
processing is typically a non- preemptible task in
time-sharing In computing, time-sharing is the sharing of a computing resource among many users at the same time by means of multiprogramming and multi-tasking.DEC Timesharing (1965), by Peter Clark, The DEC Professional, Volume 1, Number 1 Its emergence ...
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ef ...
s, an interrupt storm will cause sluggish response to user input, or even appear to freeze the system completely. This state is commonly known as '' live lock''. In such a state, the system is spending most of its resources processing interrupts instead of completing other work. To the end-user, it does not appear to be processing anything at all as there is often no output. An interrupt storm is sometimes mistaken for thrashing, since they both have similar symptoms (unresponsive or sluggish response to user input, little or no output). Common causes include: misconfigured or faulty hardware, faulty device drivers, flaws in the operating system, or metastability in one or more components. The latter condition rarely occurs outside of prototype or amateur-built hardware. Most modern hardware and operating systems have methods for mitigating the effect of an interrupt storm. For example, most
Ethernet Ethernet () is a family of wired computer networking technologies commonly used in local area networks (LAN), metropolitan area networks (MAN) and wide area networks (WAN). It was commercially introduced in 1980 and first standardized in 1 ...
controllers implement interrupt "rate limiting", which causes the controller to wait a programmable amount of time between each interrupt it generates. When not present within the device, similar functionality is usually written into the device driver, and/or the operating system itself. The most common cause is when a device "behind" another signals an interrupt to an APIC (Advanced Programmable Interrupt Controller). Most computer peripherals generate interrupts through an APIC as the number of interrupts is most always less (typically 15 for the modern PC) than the number of devices. The OS must then query each driver registered to that interrupt to ask if the interrupt originated from its hardware. Faulty drivers may always claim "yes", causing the OS to not query other drivers registered to that interrupt (only one interrupt can be processed at a time). The device which originally requested the interrupt therefore does not get its interrupt serviced, so a new interrupt is generated (or is not cleared) and the processor becomes swamped with continuous interrupt signals. Any operating system can live lock under an interrupt storm caused by such a fault. A
kernel Kernel may refer to: Computing * Kernel (operating system), the central component of most operating systems * Kernel (image processing), a matrix used for image convolution * Compute kernel, in GPGPU programming * Kernel method, in machine learn ...
debugger A debugger or debugging tool is a computer program used to test and debug other programs (the "target" program). The main use of a debugger is to run the target program under controlled conditions that permit the programmer to track its executi ...
can usually break the storm by unloading the faulty driver, allowing the driver "underneath" the faulty one to clear the interrupt, if user input is still possible. As drivers are most often implemented by a 3rd party, most operating systems also have a
polling Poll, polled, or polling may refer to: Figurative head counts * Poll, a formal election ** Election verification exit poll, a survey taken to verify election counts ** Polling, voting to make decisions or determine opinions ** Polling places o ...
mode that queries for pending interrupts at fixed intervals or in a round-robin fashion. This mode can be set globally, on a per-driver, per-interrupt basis, or dynamically if the OS detects a fault condition or excessive interrupt generation. A polling mode may be enabled dynamically when the number of interrupts or the resource use caused by an interrupt, passes certain thresholds. When these thresholds are no longer exceeded, an OS may then change the interrupting driver, interrupt, or interrupt handling globally, from an interrupt mode to a polling mode. Interrupt rate limiting in hardware usually negates the use of a polling mode, but can still happen during normal operation during intense I/O if the processor is unable switch contexts quickly enough to keep pace.


History

Perhaps the first interrupt storm occurred during the Apollo 11's lunar descent in 1969.


Considerations

Interrupt rate limiting must be carefully configured for optimum results. For example, an
Ethernet Ethernet () is a family of wired computer networking technologies commonly used in local area networks (LAN), metropolitan area networks (MAN) and wide area networks (WAN). It was commercially introduced in 1980 and first standardized in 1 ...
controller with interrupt rate limiting will buffer the packets it receives from the network in between each interrupt. If the rate is set too low, the controller's buffer will overflow, and packets will be dropped. The rate must take into account how fast the buffer may fill between interrupts, and the
interrupt latency In computing, interrupt latency refers to the delay between the start of an Interrupt Request (IRQ) and the start of the respective Interrupt Service Routine (ISR). For many operating systems, devices are serviced as soon as the device's interrup ...
between the interrupt and the transfer of the buffer to the system.


Interrupt mitigating

There are hardware-based and software-based approaches to the problem. For example,
FreeBSD FreeBSD is a free and open-source Unix-like operating system descended from the Berkeley Software Distribution (BSD), which was based on Research Unix. The first version of FreeBSD was released in 1993. In 2005, FreeBSD was the most popular ...
detects interrupt storms and masks problematic interrupts for some time in response. The system used by NAPI is an example of the hardware-based approach: the system (driver) starts in interrupt enabled state, and the
Interrupt handler In computer systems programming, an interrupt handler, also known as an interrupt service routine or ISR, is a special block of code associated with a specific interrupt condition. Interrupt handlers are initiated by hardware interrupts, softw ...
then disables the interrupt and lets a thread/task handle the event(s) and then task polls the device, processing some number of events and enabling the interrupt. Another interesting approach using hardware support is one where the device generates interrupt when the event queue state changes from "empty" to "not empty". Then, if there are no free DMA descriptors at the RX FIFO tail, the device drops the event. The event is then added to the tail and the FIFO entry is marked as occupied. If at that point entry (tail−1) is free (cleared), an interrupt will be generated (level interrupt) and the tail pointer will be incremented. If the hardware requires the interrupt be acknowledged, the CPU (interrupt handler) will do that, handle the valid DMA descriptors at the head, and return from the interrupt.


See also

* Broadcast radiation *
Inter-processor interrupt An inter-processor interrupt (IPI), also known as a ''shoulder tap'', is a special type of interrupt by which one processor may interrupt another processor in a multiprocessor system if the interrupting processor requires action from the other pr ...
(IPI) *
Non-maskable interrupt In computing, a non-maskable interrupt (NMI) is a hardware interrupt that standard interrupt-masking techniques in the system cannot ignore. It typically occurs to signal attention for non-recoverable hardware errors. Some NMIs may be masked, but ...
(NMI) * Programmable Interrupt Controller (PIC)


References

{{DEFAULTSORT:Interrupt Storm Interrupts Software anomalies