In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to practical disciplines (includi ...
, deadlock prevention algorithms are used in
concurrent programming
Concurrent means happening at the same time. Concurrency, concurrent, or concurrence may refer to:
Law
* Concurrence, in jurisprudence, the need to prove both ''actus reus'' and ''mens rea''
* Concurring opinion (also called a "concurrence"), ...
when multiple processes must acquire more than one
shared resource
In computing, a shared resource, or network share, is a computer resource made available from one host to other hosts on a computer network.
It is a device or piece of information on a computer that can be remotely accessed from another compu ...
. If two or more concurrent processes obtain multiple resources indiscriminately, a situation can occur where each process has a resource needed by another process. As a result, none of the processes can obtain all the resources it needs, so all processes are blocked from further execution. This situation is called a
deadlock
In concurrent computing, deadlock is any situation in which no member of some group of entities can proceed because each waits for another member, including itself, to take action, such as sending a message or, more commonly, releasing a loc ...
. A deadlock prevention
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific problems or to perform a computation. Algorithms are used as specifications for performing ...
organizes resource usage by each process to ensure that at least one process is always able to get all the resources it needs. One such example of deadlock algorithm is Banker's algorithm.
Overview
Distributed deadlock
Distributed deadlocks can occur in
distributed systems
A distributed system is a system whose components are located on different networked computers, which communicate and coordinate their actions by passing messages to one another from any system. Distributed computing is a field of computer sci ...
when
distributed transaction A distributed transaction is a database transaction in which two or more network hosts are involved. Usually, hosts provide transactional resources, while the transaction manager is responsible for creating and managing a global transaction that enc ...
s or
concurrency control
In information technology and computer science, especially in the fields of computer programming, operating systems, multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while ...
is being used. Distributed deadlocks can be detected either by constructing a global
wait-for graph
A wait-for graph in computer science is a directed graph used for deadlock detection in operating systems and relational database systems.
In computer science, a system that allows concurrent operation of multiple processes and locking of resour ...
, from local wait-for graphs at a deadlock detector or by a
distributed algorithm A distributed algorithm is an algorithm designed to run on computer hardware constructed from interconnected processors. Distributed algorithms are used in different application areas of distributed computing, such as telecommunications, scientific ...
like edge chasing.
''Phantom deadlocks'' are deadlocks that are detected in a distributed system due to system internal delays but no longer actually exist at the time of detection.
Deadlock prevention
There are many different ways to increase parallelism where recursive locks would otherwise cause deadlocks. But there is a price. And that price is either performance/overhead, allow data corruption, or both.
Some examples include: lock hierarchies,
lock reference-counting and preemption (either using versioning or allowing data corruption when preemption occurs); Wait-For-Graph (WFG
algorithms, which track all cycles that cause deadlocks (including temporary deadlocks); and heuristics algorithms which don't necessarily increase parallelism in 100% of the places that deadlocks are possible, but instead compromise by solving them in enough places that performance/overhead vs parallelism is acceptable.
Consider a "when two trains approach each other at a crossing" situation. Just-in-time prevention works like having a person standing at the crossing (the crossing guard) with a switch that will let only one train onto "super tracks" which runs above and over the other waiting train(s).
*For non-recursive locks, a lock may be entered only once (where a single thread entering twice without unlocking will cause a deadlock, or throw an exception to enforce circular wait prevention).
*For recursive locks, only one thread is allowed to pass through a lock. If any other threads enter the lock, they must wait until the initial thread that passed through completes n number of times it has entered.
So the issue with the first one is that it does no deadlock prevention at all. The second does not do distributed deadlock prevention. But the second one is redefined to prevent a deadlock scenario the first one does not address.
Recursively, only one thread is allowed to pass through a lock. If other threads enter the lock, they must wait until the initial thread that passed through completes n number of times. But if the number of threads that enter locking equal the number that are locked, assign one thread as the super-thread, and only allow it to run (tracking the number of times it enters/exits locking) until it completes.
After a super-thread is finished, the condition changes back to using the logic from the recursive lock, and the exiting super-thread
#sets itself as not being a super-thread
#notifies the locker that other locked, waiting threads need to re-check this condition
If a deadlock scenario exists, set a new super-thread and follow that logic. Otherwise, resume regular locking.
Issues not addressed above
A lot of confusion revolves around the
halting problem
In computability theory, the halting problem is the problem of determining, from a description of an arbitrary computer program and an input, whether the program will finish running, or continue to run forever. Alan Turing proved in 1936 that a ...
. But this logic does not solve the halting problem because the conditions in which locking occurs are known, giving a specific solution (instead of the otherwise required general solution that the halting problem requires). Still, this locker prevents all deadlocked only considering locks using this logic. But if it is used with other locking mechanisms, a lock that is started never unlocks (exception thrown jumping out without unlocking, looping indefinitely within a lock, or coding error forgetting to call unlock), deadlocking is very possible. To increase the condition to include these would require solving the halting issue, since one would be dealing with conditions that one knows nothing about and is unable to change.
Another issue is it does not address the temporary deadlocking issue (not really a deadlock, but a performance killer), where two or more threads lock on each other while another unrelated thread is running. These temporary deadlocks could have a thread running exclusively within them, increasing parallelism. But because of how the distributed deadlock detection works for all locks, and not subsets therein, the unrelated running thread must complete before performing the super-thread logic to remove the temporary deadlock.
One can see the temporary live-lock scenario in the above. If another unrelated running thread begins before the first unrelated thread exits, another duration of temporary deadlocking will occur. If this happens continuously (extremely rare), the temporary deadlock can be extended until right before the program exits, when the other unrelated threads are guaranteed to finish (because of the guarantee that one thread will always run to completion).
Further expansion
This can be further expanded to involve additional logic to increase parallelism where temporary deadlocks might otherwise occur. But for each step of adding more logic, we add more overhead.
A couple of examples include: expanding distributed super-thread locking mechanism to consider each subset of existing locks; Wait-For-Graph (WFG
algorithms, which track all cycles that cause deadlocks (including temporary deadlocks); and heuristics algorithms which don't necessarily increase parallelism in 100% of the places that temporary deadlocks are possible, but instead compromise by solving them in enough places that performance/overhead vs parallelism is acceptable (e.g. for each processor available, work towards finding deadlock cycles less than the number of processors + 1 deep).
References
{{reflist
Distributed computing