Interval scheduling is a class of problems in
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, particularly in the area of
algorithm
In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
design. The problems consider a set of tasks. Each task is represented by an ''interval'' describing the time in which it needs to be processed by some machine (or, equivalently, scheduled on some resource). For instance, task A might run from 2:00 to 5:00, task B might run from 4:00 to 10:00 and task C might run from 9:00 to 11:00. A subset of intervals is ''compatible'' if no two intervals overlap on the machine/resource. For example, the subset is compatible, as is the subset ; but neither nor are compatible subsets, because the corresponding intervals within each subset overlap.
The ''interval scheduling maximization problem'' (ISMP) is to find a largest compatible set, i.e., a set of non-overlapping intervals of maximum size. The goal here is to execute as many tasks as possible, that is, to maximize the
throughput
Network throughput (or just throughput, when in context) refers to the rate of message delivery over a communication channel, such as Ethernet or packet radio, in a communication network. The data that these messages contain may be delivered ov ...
. It is equivalent to finding a
maximum independent set
In graph theory, an independent set, stable set, coclique or anticlique is a set of vertices in a graph, no two of which are adjacent. That is, it is a set S of vertices such that for every two vertices in S, there is no edge connecting the two ...
in an
interval graph
In graph theory, an interval graph is an undirected graph formed from a set of intervals on the real line,
with a vertex for each interval and an edge between vertices whose intervals intersect. It is the intersection graph of the intervals.
Int ...
.
A generalization of the problem considers
machines/resources.
Here the goal is to find
compatible subsets whose union is the largest.
In an upgraded version of the problem, the intervals are partitioned into groups. A subset of intervals is ''compatible'' if no two intervals overlap, and moreover, no two intervals belong to the same group (i.e., the subset contains at most a single representative of each group). Each group of intervals corresponds to a single task, and represents several alternative intervals in which it can be executed.
The ''group interval scheduling decision problem'' (GISDP) is to decide whether there exists a compatible set in which all groups are represented. The goal here is to execute a single representative task from each group. GISDPk is a restricted version of GISDP in which the number of intervals in each group is at most ''k''.
The ''group interval scheduling maximization problem'' (GISMP) is to find a largest compatible set - a set of non-overlapping representatives of maximum size. The goal here is to execute a representative task from as many groups as possible. GISMPk is a restricted version of GISMP in which the number of intervals in each group is at most ''k''. This problem is often called JISPk, where J stands for
Job
Work or labor (or labour in British English) is intentional activity people perform to support the needs and wants of themselves, others, or a wider community. In the context of economics, work can be viewed as the human activity that contr ...
.
GISMP is the most general problem; the other two problems can be seen as special cases of it:
* ISMP is the special case in which each task belongs to its own group (i.e. it is equal to GISMP1).
* GISDP is the problem of deciding whether the maximum exactly equals the number of groups.
All these problems can be generalized by adding a ''weight'' for each interval, representing the profit from executing the task in that interval. Then, the goal is to maximize the total weight.
All these problems are special cases of
single-machine scheduling, since they assume that all tasks must run on a single processor. Single-machine scheduling is a special case of
optimal job scheduling Optimal job scheduling is a class of optimization problems related to scheduling. The inputs to such problems are a list of ''jobs'' (also called ''processes'' or ''tasks'') and a list of ''machines'' (also called ''processors'' or ''workers''). The ...
.
Single-Interval Scheduling Maximization
Unweighted
Several algorithms, that may look promising at first sight, actually do not find the optimal solution:
* Selecting the intervals that start earliest is not an optimal solution, because if the earliest interval happens to be very long, accepting it would make us reject many other shorter requests.
* Selecting the shortest intervals or selecting intervals with the fewest conflicts is also not optimal.
The following
greedy algorithm
A greedy algorithm is any algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage. In many problems, a greedy strategy does not produce an optimal solution, but a greedy heuristic can yield locally ...
, called
Earliest deadline first scheduling
Earliest deadline first (EDF) or least time to go is a dynamic priority scheduling algorithm used in real-time operating systems to place processes in a priority queue. Whenever a scheduling event occurs (task finishes, new task released, etc.) ...
, does find the optimal solution for unweighted single-interval scheduling:
# Select the interval, ''x'', with the earliest finishing time.
# Remove ''x'', and all intervals intersecting ''x'', from the set of candidate intervals.
# Repeat until the set of candidate intervals is empty.
Whenever we select an interval at step 1, we may have to remove many intervals in step 2. However, all these intervals necessarily cross the finishing time of ''x'', and thus they all cross each other (see figure). Hence, at most 1 of these intervals can be in the optimal solution. Hence, for every interval in the optimal solution, there is an interval in the greedy solution. This proves that the greedy algorithm indeed finds an optimal solution.
A more formal explanation is given by a
Charging argument
In computer science, a charging argument is used to compare the output of an optimization algorithm to an optimal solution. It is typically used to show that an algorithm produces optimal results by proving the existence of a particular injective ...
.
The greedy algorithm can be executed in time O(''n'' log ''n''), where ''n'' is the number of tasks, using a preprocessing step in which the tasks are sorted by their finishing times.
Weighted
When the intervals have weights, the problem is equivalent to finding a maximum-weight
independent set in an
interval graph
In graph theory, an interval graph is an undirected graph formed from a set of intervals on the real line,
with a vertex for each interval and an edge between vertices whose intervals intersect. It is the intersection graph of the intervals.
Int ...
. It can be solved in polynomial time.
To find the maximum weight of a single interval schedule in Θ(n) time, perform the following pseudocode:
// The vectors are already sorted from earliest to latest finish time.
int v umOfVectors+1 //list of interval vectors
int w umOfVectors+1 //w is the weight for v
int p umOfVectors+1 //p is the # of vectors that end before v begins.
int M umOfVectors+1
int finalSchedule umOfVectors
//v does not exist. The first interval vector is set to v
w 0; p 0; M 0;
for(int i = 1; i < numOfVector+1; i++)
//The maximum weight of the schedule is equal to M umOfVectors
int index = numOfVectors-1;
schedule(j)
Group Interval Scheduling Decision
NP-complete when some groups contain 3 or more intervals
GISDPk is NP-complete when
,
even when all intervals have the same length.
This can be shown by a reduction from the following version of the
Boolean satisfiability problem
In logic and computer science, the Boolean satisfiability problem (sometimes called propositional satisfiability problem and abbreviated SATISFIABILITY, SAT or B-SAT) is the problem of determining if there exists an interpretation that satisfie ...
, which was shown
to be
NP-complete
In computational complexity theory, a problem is NP-complete when:
# it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by tryi ...
likewise to the unrestricted version.
::Let
be a set of Boolean variables. Let
be a set of clauses over ''X'' such that (1) each clause in ''C'' has at most three literals and (2) each variable is restricted to appear once or twice positively and once negatively overall in ''C''. Decide whether there is an assignment to variables of ''X'' such that each clause in ''C'' has at least one true literal.
Given an instance of this satisfiability problem, construct the following instance of GISDP. All intervals have a length of 3, so it is sufficient to represent each interval by its starting time:
* For every variable
(for ), create a group with two intervals: one starting at
(representing the assignment
) and another starting at
(representing the assignment
).
* For every clause
(for ), create a group with the following intervals:
** For every variable
that appears positively for the ''first'' time in ''C'' an interval starting at
.
** For every variable
that appears positively for the ''second'' time in ''C'' an interval starting at
. Note that both these intervals intersect the interval
, associated with
.
** For every variable
that appears negatively - an interval starting at
. This interval intersects the interval
associated with
.
Note that there is no overlap between intervals in groups associated with different clauses. This is ensured since a variable appears at most twice positively and once negatively.
The constructed GISDP has a feasible solution (i.e. a scheduling in which each group is represented), if and only if the given set of boolean clauses has a satisfying assignment. Hence GISDP3 is NP-complete, and so is GISDPk for every
.
Polynomial when all groups contain at most 2 intervals
GISDP2 can be solved at polynomial time by the following reduction to the
2-satisfiability
In computer science, 2-satisfiability, 2-SAT or just 2SAT is a computational problem of assigning values to variables, each of which has two possible values, in order to satisfy a system of constraints on pairs of variables. It is a special case ...
problem:
* For every group ''i'' create two variables, representing its two intervals:
and
.
* For every group ''i'', create the clauses:
and
, which represent the assertion that exactly one of these two intervals should be selected.
* For every two intersecting intervals (i.e.
and
) create the clause:
, which represent the assertion that at most one of these two intervals should be selected.
This construction contains at most O(''n''
2) clauses (one for each intersection between intervals, plus two for each group). Each clause contains 2 literals. The satisfiability of such formulas can be decided in time linear in the number of clauses (see
2-SAT
In computer science, 2-satisfiability, 2-SAT or just 2SAT is a computational problem of assigning values to variables, each of which has two possible values, in order to satisfy a system of Constraint (mathematics), constraints on pairs of variabl ...
). Therefore, the GISDP2 can be solved in polynomial time.
Group Interval Scheduling Maximization
MaxSNP-complete when some groups contain 2 or more intervals
GISMPk is NP-complete even when
.
[ citing Kolen in personal communication]
Moreover, GISMPk is
MaxSNP In computational complexity theory, SNP (from Strict NP) is a complexity class containing a limited subset of NP based on its logical characterization in terms of graph-theoretical properties. It forms the basis for the definition of the class Max ...
-complete, i.e., it does not have a PTAS unless P=NP. This can be proved by showing an approximation-preserving reduction from MAX 3-SAT-3 to GISMP2.
[
]
Polynomial 2-approximation
The following greedy algorithm finds a solution that contains at least 1/2 of the optimal number of intervals:[
# Select the interval, ''x'', with the earliest finishing time.
# Remove ''x'', and all intervals intersecting ''x'', and all intervals in the same group of ''x'', from the set of candidate intervals.
# Continue until the set of candidate intervals is empty.
A formal explanation is given by a ]Charging argument
In computer science, a charging argument is used to compare the output of an optimization algorithm to an optimal solution. It is typically used to show that an algorithm produces optimal results by proving the existence of a particular injective ...
.
The approximation factor of 2 is tight. For example, in the following instance of GISMP2:
* Group #1:
* Group #2:
The greedy algorithm selects only 1 interval ..2from group #1, while an optimal scheduling is to select ..3from group #2 and then ..6from group #1.
A more general approximation algorithm attains a 2-factor approximation for the weighted case.
LP based approximation algorithms
Using the technique of Linear programming relaxation
In mathematics, the relaxation of a (mixed) integer linear program is the problem that arises by removing the integrality constraint of each variable.
For example, in a 0–1 integer program, all constraints are of the form
:x_i\in\.
The relax ...
, it is possible to approximate the optimal scheduling with slightly better approximation factors. The approximation ratio of the first such algorithm is asymptotically 2 when ''k'' is large, but when ''k=2'' the algorithm achieves an approximation ratio of 5/3.[ The approximation factor for arbitrary ''k'' was later improved to 1.582.]
Related problems
An interval scheduling problem can be described by an intersection graph
In graph theory, an intersection graph is a graph that represents the pattern of intersections of a family of sets. Any graph can be represented as an intersection graph, but some important special classes of graphs can be defined by the types o ...
, where each vertex is an interval, and there is an edge between two vertices if and only if their intervals overlap. In this representation, the interval scheduling problem is equivalent to finding the maximum independent set in this intersection graph. Finding a maximum independent set is NP-hard in general graphs, but it can be done in polynomial time in the special case of intersection graphs (ISMP).
A group-interval scheduling problem (GISMPk) can be described by a similar interval-intersection graph, with additional edges between each two intervals of the same group, i.e., this is the edge union of an interval graph and a graph consisting of n disjoint cliques of size ''k''.
Variations
An important class of scheduling algorithms is the class of dynamic priority algorithms. When none of the intervals overlap the optimum solution is trivial. The optimum for the non-weighted version can found with the earliest deadline first scheduling
Earliest deadline first (EDF) or least time to go is a dynamic priority scheduling algorithm used in real-time operating systems to place processes in a priority queue. Whenever a scheduling event occurs (task finishes, new task released, etc.) ...
. Weighted interval scheduling is a generalization where a value is assigned to each executed task and the goal is to maximize the total value. The solution need not be unique.
The interval scheduling problem is 1-dimensional – only the time dimension is relevant. The Maximum disjoint set
In computational geometry, a maximum disjoint set (MDS) is a largest set of non-overlapping geometric shapes selected from a given set of candidate shapes.
Every set of non-overlapping shapes is an independent set in the intersection graph of the ...
problem is a generalization to 2 or more dimensions. This generalization, too, is NP-complete.
Another variation is resource allocation, in which a set of intervals s are scheduled using resources ''k'' such that ''k'' is minimized. That is, all the intervals must be scheduled, but the objective is to minimize the usage of resources.
Another variation is when there are ''m'' processors instead of a single processor. I.e., ''m'' different tasks can run in parallel. See identical-machines scheduling Identical-machines scheduling is an optimization problem in computer science and operations research. We are given ''n'' jobs ''J''1, ''J''2, ..., ''Jn'' of varying processing times, which need to be scheduled on ''m'' identical machines, such that ...
.
Sources
{{Scheduling problems
Optimal scheduling