Marzullo's algorithm, invented by Keith Marzullo for his Ph.D. dissertation in 1984, is an agreement algorithm used to select sources for estimating accurate time from a number of noisy time sources. A refined version of it, renamed the "intersection algorithm", forms part of the modern Network Time Protocol.
Marzullo's algorithm is also used to compute the relaxed intersection of n boxes (or more generally n subsets of R^n), as required by several robust set estimation methods.
Purpose
Marzullo's algorithm is efficient in terms of time for producing an optimal value from a set of estimates with confidence intervals where the actual value may be outside the confidence interval for some sources. In this case the best estimate is taken to be the smallest interval consistent with the largest number of sources.
If we have the estimates 10 ± 2, 12 ± 1 and 11 ± 1 then these intervals are [8,12], [11,13] and [10,12], which intersect to form [11,12] or 11.5 ± 0.5 as consistent with all three values.
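The arithmetic of this first example can be checked with a minimal Python sketch. The helper names `to_interval` and `intersect_all` are chosen here for illustration and do not come from the original description; the sketch only covers the easy case in which all intervals share a common intersection.

```python
def to_interval(center, radius):
    """Turn an estimate c ± r into the closed interval (c - r, c + r)."""
    return (center - radius, center + radius)


def intersect_all(intervals):
    """Return the intersection common to all intervals, or None if it is empty."""
    low = max(lo for lo, _ in intervals)    # latest start
    high = min(hi for _, hi in intervals)   # earliest end
    return (low, high) if low <= high else None


estimates = [(10, 2), (12, 1), (11, 1)]
intervals = [to_interval(c, r) for c, r in estimates]
common = intersect_all(intervals)

# Express the common interval again in "center ± radius" form.
center = (common[0] + common[1]) / 2
radius = (common[1] - common[0]) / 2
print(intervals)        # [(8, 12), (11, 13), (10, 12)]
print(common)           # (11, 12)
print(center, radius)   # 11.5 0.5
```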
If instead the ranges are [8,12], [11,13] and [14,15] then there is no interval consistent with all these values but [11,12] is consistent with the largest number of sources, namely two of them.
Finally, if the ranges are [8,9], [8,12] and [10,12] then both the intervals [8,9] and [10,12] are consistent with the largest number of sources.
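The three examples can be verified by brute force. The sketch below is not Marzullo's algorithm itself: it is a quadratic-time check, with the hypothetical helper names `coverage` and `best_starts`, that relies on the observation that the maximum overlap is always reached at the start point of some interval.

```python
def coverage(point, intervals):
    """Count how many closed intervals contain the given point."""
    return sum(1 for lo, hi in intervals if lo <= point <= hi)


def best_starts(intervals):
    """Return (max overlap, sorted start points at which it is reached)."""
    counts = {lo: coverage(lo, intervals) for lo, _ in intervals}
    best = max(counts.values())
    return best, sorted(p for p, c in counts.items() if c == best)


print(best_starts([(8, 12), (11, 13), (10, 12)]))  # (3, [11])
print(best_starts([(8, 12), (11, 13), (14, 15)]))  # (2, [11])
print(best_starts([(8, 9), (8, 12), (10, 12)]))    # (2, [8, 10])
```

In the last case two different start points reach the same maximum, which corresponds to the tie between [8,9] and [10,12].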
This procedure determines an interval. If the desired result is a best value from that interval then a naive approach would be to take the center of the interval as the value, which is what was specified in the original Marzullo algorithm. A more sophisticated approach would recognize that this could be throwing away useful information from the confidence intervals of the sources and that a probabilistic model of the sources could return a value other than the center.
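To make the contrast concrete, the following sketch compares the center of the interval with one possible model-based value. The weighting scheme is purely an assumption made for illustration: it treats each source's radius as proportional to a standard deviation, takes the inverse-variance weighted mean of the source centers, and clamps the result to the chosen interval. Neither the function `weighted_center` nor this particular model is part of the original algorithm.

```python
def midpoint(interval):
    """Center of the interval, as in the original algorithm."""
    lo, hi = interval
    return (lo + hi) / 2


def weighted_center(estimates, interval):
    """Inverse-variance weighted mean of the centers, clamped to the interval.

    Assumption (not from the original algorithm): each radius r > 0 is
    proportional to the standard deviation of its source.
    """
    lo, hi = interval
    weights = [1 / (r * r) for _, r in estimates]
    mean = sum(w * c for w, (c, _) in zip(weights, estimates)) / sum(weights)
    return min(max(mean, lo), hi)


estimates = [(10, 2), (12, 1), (11, 1)]
best_interval = (11, 12)
print(midpoint(best_interval))                    # 11.5
print(weighted_center(estimates, best_interval))  # roughly 11.33
```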
Note that the computed value is probably better described as "optimistic" rather than "optimal". For example, consider three intervals [10,12], [11,13] and [11.99,13]. The algorithm described below computes [11.99,12] or 11.995 ± 0.005, which is a very precise value. If we suspect that one of the estimates might be incorrect, then at least two of the estimates must be correct. Under this condition, the best estimate is [11,13], since this is the largest interval every point of which lies within at least two of the estimates. The algorithm described below is easily parameterized with the maximum number of incorrect estimates.
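One way such a parameterization might look is sketched below. The function `tolerant_interval` and its parameter `max_faulty` are hypothetical names, and the sketch makes a simplifying assumption: it returns the span from the first to the last point covered by at least n − max_faulty sources, even if the sufficiently covered region is not contiguous.

```python
def tolerant_interval(intervals, max_faulty):
    """Span of the points covered by at least len(intervals) - max_faulty intervals."""
    needed = len(intervals) - max_faulty
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))   # start of an interval
        edges.append((hi, +1))   # end of an interval
    edges.sort()                 # at equal offsets, starts sort before ends

    count = 0
    low = high = None
    for offset, kind in edges:
        before = count
        count -= kind
        if count >= needed and low is None:
            low = offset         # first point with enough coverage
        if kind == +1 and before >= needed:
            high = offset        # latest end of a sufficiently covered stretch
    return None if low is None else (low, high)


sources = [(10, 12), (11, 13), (11.99, 13)]
print(tolerant_interval(sources, 0))  # (11.99, 12)
print(tolerant_interval(sources, 1))  # (11, 13)
```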
Method
Marzullo's algorithm begins by preparing a table of the sources, sorting it and then searching (efficiently) for the intersections of intervals. For each source there is a range [c−r,c+r] defined by c ± r. For each range the table will have two tuples of the form ⟨offset,type⟩. One tuple will represent the beginning of the range, marked with type −1 as ⟨c−r,−1⟩ and the other will represent the end with type +1 as ⟨c+r,+1⟩.
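As an illustration of the table layout, the sketch below builds the tuples for the estimates 10 ± 2, 12 ± 1 and 11 ± 1. The function name `build_table` and the use of Python tuples `(offset, type)` are choices made for this example.

```python
def build_table(estimates):
    """Build the list of (offset, type) tuples from (center, radius) estimates."""
    table = []
    for center, radius in estimates:
        table.append((center - radius, -1))  # beginning of the range
        table.append((center + radius, +1))  # end of the range
    return table


table = build_table([(10, 2), (12, 1), (11, 1)])
print(table)
# [(8, -1), (12, 1), (11, -1), (13, 1), (10, -1), (12, 1)]

# Python compares tuples element by element, so a plain sort orders by offset
# and places type -1 before type +1 when two offsets are equal.
print(sorted(table))
# [(8, -1), (10, -1), (11, -1), (12, 1), (12, 1), (13, 1)]
```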
The description of the algorithm uses the following variables: best (largest number of overlapping intervals found), cnt (current number of overlapping intervals), beststart and bestend (the beginning and end of best interval found so far), i (an index), and the table of tuples.
# Build the table of tuples.
# Sort the table by the offset. (If two tuples with the same offset but opposite types exist, indicating that one interval ends just as another begins, then a method of deciding which comes first is necessary. Such an occurrence can be considered an overlap with no duration, which can be found by the algorithm by putting type −1 before type +1. If such pathological overlaps are considered objectionable they can be avoided by putting type +1 before −1 in this case.)
# [initialize] best=0 cnt=0
# [loop] go through each tuple in the table in ascending order
:# [current number of overlapping intervals] cnt=cnt−type[i]
:# if cnt>best then best=cnt beststart=offset[i] bestend=offset[i+1]
:''commentary: the next tuple, at [i+1], will either be an end of an interval (type=+1), in which case it ends this best interval, or it will be a beginning of an interval (type=−1) and in the next step will replace best.''
:''ambiguity: what to do if best=cnt is unspecified. This is a tie for greatest overlap. The decision can either be made to take the smaller of bestend−beststart and offset[i+1]−offset[i], or just to take an arbitrary one of the two equally good entries. This decision is relevant only when type[i+1]=+1.''
# [end loop] return [beststart,bestend] as optimal interval. The number of ''false'' sources (ones which do not overlap the optimal interval returned) is the number of sources minus the value of best.
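The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: it assumes non-negative radii, sorts type −1 before type +1 at equal offsets (so zero-length overlaps count), and resolves ties by keeping the first best interval found. The function name `marzullo` and the returned triple are choices made for this example.

```python
def marzullo(estimates):
    """Return (best, beststart, bestend) for a list of (center, radius) pairs."""
    # Steps 1 and 2: build the table of (offset, type) tuples and sort it.
    table = []
    for c, r in estimates:
        table.append((c - r, -1))
        table.append((c + r, +1))
    table.sort()  # tuple ordering puts type -1 before +1 at equal offsets

    # Step 3: initialize.
    best = 0
    cnt = 0
    beststart = bestend = None

    # Step 4: the last tuple is always an end, so it can never raise cnt
    # above best; stopping one short keeps table[i + 1] valid.
    for i in range(len(table) - 1):
        offset, kind = table[i]
        cnt -= kind
        if cnt > best:  # strict comparison: on a tie the earlier interval wins
            best = cnt
            beststart = offset
            bestend = table[i + 1][0]

    # Step 5: len(estimates) - best sources are "false".
    return best, beststart, bestend


print(marzullo([(10, 2), (12, 1), (11, 1)]))       # (3, 11, 12)
print(marzullo([(10, 2), (12, 1), (14.5, 0.5)]))   # (2, 11, 12)
```

For the ranges [8,9], [8,12] and [10,12] this sketch returns the interval from 8 to 9, because the tie with [10,12] is broken in favour of the first entry.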
Efficiency
Marzullo's algorithm is efficient in both space and time. The asymptotic space usage is O(n), where n is the number of sources. In considering the asymptotic time requirement the algorithm can be considered to consist of building the table, sorting it and searching it. Sorting can be done in O(n log n) time, and this dominates the building and searching phases which can be performed in linear time. Therefore, the time efficiency of Marzullo's algorithm is O(n log n).
Once the table has been built and sorted it is possible to update the interval for one source (when new information is received) in linear time. Therefore, updating data for one source and finding the best interval can be done in O(n) time.
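A linear-time update might be sketched as follows, assuming the table is kept as a sorted Python list of `(offset, type)` tuples. The helper names `scan` and `update_source` are hypothetical. Removing the two old tuples and inserting the two new ones each cost at most O(n) on a list, and the search pass is a single linear scan, so no full re-sort is needed.

```python
import bisect


def scan(table):
    """Linear search pass over a sorted table; returns (best, start, end)."""
    best = cnt = 0
    beststart = bestend = None
    for i in range(len(table) - 1):
        offset, kind = table[i]
        cnt -= kind
        if cnt > best:
            best, beststart, bestend = cnt, offset, table[i + 1][0]
    return best, beststart, bestend


def update_source(table, old, new):
    """Replace one source's (center, radius) in place, keeping the table sorted."""
    old_c, old_r = old
    new_c, new_r = new
    table.remove((old_c - old_r, -1))          # O(n) search and shift
    table.remove((old_c + old_r, +1))
    bisect.insort(table, (new_c - new_r, -1))  # O(log n) search, O(n) shift
    bisect.insort(table, (new_c + new_r, +1))


# Sorted table for the ranges [8,12], [11,13] and [14,15].
table = sorted([(8, -1), (12, +1), (11, -1), (13, +1), (14.0, -1), (15.0, +1)])
print(scan(table))                           # (2, 11, 12)

update_source(table, (14.5, 0.5), (11, 1))   # third source now reports 11 ± 1
print(scan(table))                           # (3, 11, 12)
```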
External links
* Keith Marzullo, CSE, UCSD. http://www.cse.ucsd.edu/users/marzullo/