Krauss Matching Wildcards Algorithm
   HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, information, and automation. Computer science spans Theoretical computer science, theoretical disciplines (such as algorithms, theory of computation, and information theory) to Applied science, ...
, the Krauss wildcard-matching algorithm is a
pattern matching In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually must be exact: "either it will or will not be a ...
algorithm. Based on the wildcard syntax in common use, e.g. in the
Microsoft Windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
command-line interface A command-line interface (CLI) is a means of interacting with software via command (computing), commands each formatted as a line of text. Command-line interfaces emerged in the mid-1960s, on computer terminals, as an interactive and more user ...
, the algorithm provides a non- recursive mechanism for matching patterns in software applications, based on syntax simpler than that typically offered by
regular expression A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of characters that specifies a match pattern in text. Usually such patterns are used by string-searching algorithms for "find" ...
s.


History

The algorithm is based on a history of development, correctness and performance testing, and programmer feedback that began with an unsuccessful search for a reliable non-recursive algorithm for matching wildcards. An initial algorithm, implemented in a single while loop, quickly prompted comments from software developers, leading to improvements. Ongoing comments and suggestions culminated in a revised algorithm still implemented in a single while loop but refined based on a collection of test cases and a performance profiler. The experience tuning the single while loop using the profiler prompted development of a two-loop strategy that achieved further performance gains, particularly in situations involving empty input strings or input containing no wildcard characters. The two-loop algorithm is available for use by the
open-source software Open-source software (OSS) is Software, computer software that is released under a Open-source license, license in which the copyright holder grants users the rights to use, study, change, and Software distribution, distribute the software an ...
development community, under the terms of the
Apache License The Apache License is a permissive free software license written by the Apache Software Foundation (ASF). It allows users to use the software for any purpose, to distribute it, to modify it, and to distribute modified versions of the software ...
v. 2.0, and is accompanied by test case code.


Usage

The algorithm made available under the Apache license is implemented in both pointer-based C++ and portable C++ (implemented without pointers). The test case code, also available under the Apache license, can be applied to any algorithm that provides the pattern matching operations below. The implementation as coded is unable to handle multibyte character sets and poses problems when the text being searched may contain multiple incompatible character sets.


Pattern matching operations

The algorithm supports three pattern matching operations: * A one-to-one match is performed between the pattern and the source to be checked for a match, with the exception of asterisk ( *) or question mark ( ?) characters in the pattern. * An asterisk ( *) character matches any sequence of zero or more characters. * A question mark ( ?) character matches any single character.


Examples

* ''*foo*'' matches any string containing "foo". * ''mini*'' matches any string that begins with "mini" (including the string "mini" itself). * ''???*'' matches any string of three or more letters.


Applications

The original algorithm has been ported to the DataFlex programming language by Larry Heiges for use with Data Access Worldwide code library. It has been posted on GitHub in modified form as part of a log file reader. The 2014 algorithm is part of the Unreal Model Viewer built into the Epic Games
Unreal Engine Unreal Engine (UE) is a 3D computer graphics game engine developed by Epic Games, first showcased in the 1998 first-person shooter video game '' Unreal''. Initially developed for PC first-person shooters, it has since been used in a variety of ...
game engine A game engine is a software framework primarily designed for the development of video games which generally includes relevant libraries and support programs such as a level editor. The "engine" terminology is akin to the term " software engine" u ...
.


See also

*
pattern matching In computer science, pattern matching is the act of checking a given sequence of tokens for the presence of the constituents of some pattern. In contrast to pattern recognition, the match usually must be exact: "either it will or will not be a ...
*
glob (programming) glob() () is a libc function for ''globbing'', which is the archetypal use of pattern matching against the names in a filesystem directory such that a name pattern is expanded into a list of names matching that pattern. Although ''globbing'' m ...
*
wildmat wildmat is a pattern matching library developed by Rich Salz. Based on the wildcard syntax already used in the Bourne shell, wildmat provides a uniform mechanism for matching patterns across applications with simpler syntax than that typicall ...


References

{{reflist Algorithms