The overlap problem in mapping
Each clone in a DNA mapping project has a "fingerprint", ''i.e.'' a set of DNA fragment lengths inferred by (1) enzymatically digesting the clone, (2) separating the resulting fragments on a gel, and (3) estimating their lengths from their positions on the gel. For each pairwise clone comparison, one can count how many lengths in the two sets match. Cases with at least one match indicate that the clones ''might'' overlap, because matching lengths ''may'' represent the same DNA. However, the underlying sequences for each match are not known, so two fragments whose lengths match may still represent different sequences. In other words, matches do not conclusively indicate overlaps. The problem is instead one of using matches to classify overlap status probabilistically.
Mathematical scores in overlap assessment
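The match counting described for pairwise fingerprint comparison can be sketched in a few lines. This is only an illustration: the function name `count_matches`, the fixed absolute tolerance, and the fragment lengths are all hypothetical, and real mapping software may define a "match" differently.

```python
def count_matches(fingerprint_a, fingerprint_b, tolerance):
    """Count fragment lengths that pair up within +/- tolerance.

    Each fragment is used in at most one pair. Both fingerprints are
    sorted and scanned with two pointers (illustrative assumption, not
    the method of any particular mapping tool).
    """
    a = sorted(fingerprint_a)
    b = sorted(fingerprint_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance:
            # Lengths agree within tolerance: record a match and
            # consume both fragments.
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too short to match anything left in b
        else:
            j += 1  # b[j] is too short to match anything left in a
    return matches


# Hypothetical fingerprints (fragment lengths in arbitrary gel units).
clone_alpha = [1200, 1850, 2400, 3100, 4700]
clone_beta = [1195, 2404, 3600, 4695]
n_matches = count_matches(clone_alpha, clone_beta, tolerance=7)
```

A nonzero count here only says that the two clones ''might'' overlap; it carries no information about whether the matched fragments share the same sequence.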
Biologists have used a variety of means, often in combination, to discern clone overlaps in DNA mapping projects. While many are biological, ''i.e.'' looking for shared markers, others are essentially mathematical, usually adopting probabilistic or statistical approaches.
Sulston score exposition
The Sulston score is rooted in the concepts of
Mathematical refinement
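For orientation, the traditional score is commonly written as a binomial tail probability built on independent trials. The sketch below assumes that form: a clone with ''m'' fragments compared against a clone with ''n'' ≤ ''m'' fragments, a gel of length ''G'' with match tolerance ''t'', a per-fragment match probability ''p'' = 1 − (1 − 2''t''/''G'')<sup>''m''</sup>, and a score equal to the probability of at least ''h'' matches in ''n'' trials. None of these symbols are defined in this section, so the function names, parameter names, and numeric values are illustrative assumptions rather than a definitive implementation.

```python
from math import comb


def match_probability(m, tolerance, gel_length):
    """Assumed chance that one fragment matches any of m fragments.

    Uses p = 1 - (1 - 2t/G)**m, treating each comparison as an
    independent trial (the assumption questioned in the refinement).
    """
    return 1.0 - (1.0 - 2.0 * tolerance / gel_length) ** m


def sulston_score(m, n, h, tolerance, gel_length):
    """Binomial tail: probability of at least h matches out of n trials."""
    p = match_probability(m, tolerance, gel_length)
    return sum(
        comb(n, j) * p**j * (1.0 - p) ** (n - j)
        for j in range(h, n + 1)
    )


# Hypothetical comparison: 30 versus 25 fragments, 12 observed matches.
score = sulston_score(m=30, n=25, h=12, tolerance=7, gel_length=3300)
```

Smaller scores mean the observed number of matches is less likely to have arisen by chance under the independence assumption.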
In a 2005 paper, Michael Wendl gave an example showing that the assumption of independent trials is not valid. So, although the traditional Sulston score does indeed represent a probability distribution, it is not the distribution characteristic of the fingerprint problem. Wendl went on to give the general solution for this problem in terms of the Bell polynomials, showing that the traditional score overpredicts P-values by orders of magnitude. (P-values are very small in this problem, so the comparison is, for example, between probabilities on the order of 10×10<sup>−14</sup> and 10×10<sup>−12</sup>, the latter Sulston value being 2 orders of magnitude too high.) This solution provides a basis for determining when a problem has sufficient information content to be treated by the probabilistic approach, and it is also a general solution to the birthday problem of 2 types. A disadvantage of the exact solution is that its evaluation is computationally intensive and is, in fact, not feasible for comparing large clones. Some fast approximations for this problem have been proposed.
References
{{reflist}}
See also