Jewels Of Stringology
   HOME

TheInfoList



OR:

''Jewels of Stringology: Text Algorithms'' is a book on
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
s for
pattern matching In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually has to be exact: "either it will or will not be ...
in
strings String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
and related problems. It was written by
Maxime Crochemore Maxime Crochemore (born 1947) is a French computer scientist known for his numerous contributions to algorithms on strings. He is currently a professor at King's College London. Biography Crochemore earned his doctorate (PhD) in 1978 and his Doct ...
and
Wojciech Rytter Wojciech Rytter is a Polish computer scientist, a professor of computer science in the automata theory group at the University of Warsaw. His research focuses on the design and analysis of algorithms, and in particular on stringology, the study of ...
, and published by World Scientific in 2003.


Topics

The first topics of the book are two basic
string-searching algorithm In computer science, string-searching algorithms, sometimes called string-matching algorithms, are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger stri ...
s for finding exactly-matching
substring In formal language theory and computer science, a substring is a contiguous sequence of characters within a string. For instance, "''the best of''" is a substring of "''It was the best of times''". In contrast, "''Itwastimes''" is a subsequence ...
s, the
Knuth–Morris–Pratt algorithm In computer science, the Knuth–Morris–Pratt string-searching algorithm (or KMP algorithm) searches for occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs, the word itself embodies s ...
and the
Boyer–Moore string-search algorithm In computer science, the Boyer–Moore string-search algorithm is an efficient string-searching algorithm that is the standard benchmark for practical string-search literature. It was developed by Robert S. Boyer and J Strother Moore in 1977. T ...
. It then describes the
suffix tree In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particu ...
, an index for quickly looking up matching substrings, and two algorithms for constructing it. Other topics in the book include the construction of
deterministic finite automata In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automa ...
for pattern recognition, the discovery of repeated patterns in strings, constant-space string matching algorithms, and the
lossless compression Lossless compression is a class of data compression that allows the original data to be perfectly reconstructed from the compressed data with no loss of information. Lossless compression is possible because most real-world data exhibits statistic ...
of strings.
Approximate string matching In computer science, approximate string matching (often colloquially referred to as fuzzy string searching) is the technique of finding strings that match a pattern approximately (rather than exactly). The problem of approximate string matching i ...
is covered in several variations including
edit distance In computational linguistics and computer science, edit distance is a string metric, i.e. a way of quantifying how dissimilar two strings (e.g., words) are to one another, that is measured by counting the minimum number of operations required to tr ...
and the
longest common subsequence problem The longest common subsequence (LCS) problem is the problem of finding the longest subsequence common to all sequences in a set of sequences (often just two sequences). It differs from the longest common substring problem: unlike substrings, subs ...
. The book concludes with advanced topics including two-dimensional pattern matching, parallel algorithms for pattern matching, the shortest common superstring problem, parameterized pattern matching and
duplicate code In computer programming, duplicate code is a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a n ...
detection, and the
Rabin–Karp algorithm In computer science, the Rabin–Karp algorithm or Karp–Rabin algorithm is a string-searching algorithm created by that uses hashing to find an exact match of a pattern string in a text. It uses a rolling hash to quickly filter out positions o ...
.


Audience and reception

The book is written for an audience familiar with algorithm design and analysis, but not necessarily familiar with string algorithms. Reviewer Rolf Klein suggests that this target audience may be narrow, as he evaluates the book as being too difficult for many students, but not supplying as much depth for experts as the same authors' earlier book ''Text Algorithms'' (1994). Reviewer Shoshana Marcus writes that the algorithms chosen for inclusion in the book are "elegant yet fundamental" but have often been overlooked by more general algorithms textbooks. She writes that the book itself should become a valuable reference for researchers in this area, and that it could also be used to supplement undergraduate or graduate course material in algorithms. Reviewer
Ricardo Baeza-Yates Ricardo A. Baeza-Yates (born March 21, 1961) is a Chilean-Catalan computer scientist that currently is a Research Professor at the Institute for Experiential AI of Northeastern University in the Silicon Valley campus. He is also part-time professo ...
suggests that the book's omission of bit-level parallel programming techniques reflects its bias towards theoretical rather than practical methods, but nevertheless agrees with its suitability for graduate courses.


References

{{reflist, refs= {{citation , last = Baeza-Yates , first = Ricardo , author-link = Ricardo Baeza-Yates , journal =
Mathematical Reviews ''Mathematical Reviews'' is a journal published by the American Mathematical Society (AMS) that contains brief synopses, and in some cases evaluations, of many articles in mathematics, statistics, and theoretical computer science. The AMS also pu ...
, mr = 2012571 , title = none , year = 2005
{{citation , last = Klein , first = Rolf , journal =
zbMATH zbMATH Open, formerly Zentralblatt MATH, is a major reviewing service providing reviews and abstracts for articles in pure mathematics, pure and applied mathematics, produced by the Berlin office of FIZ Karlsruhe – Leibniz Institute for Informa ...
, title = none , zbl = 1078.68151
{{citation , last = Marcus , first = Shoshana , date = September 2015 , doi = 10.1145/2818936.2818940 , issue = 3 , journal =
ACM SIGACT News ACM SIGACT or SIGACT is the Association for Computing Machinery Special Interest Group on Algorithms and Computation Theory, whose purpose is support of research in theoretical computer science. It was founded in 1968 by Patrick C. Fischer. Publ ...
, pages = 11–14 , title = Review of ''Jewels of Stringology'' , url = https://mathcs.clarku.edu/~fgreen/SIGACTReviews/bookrev/46-3.pdf , volume = 46, s2cid = 29751366
Algorithms on strings Computer science books 2003 non-fiction books