HOME

TheInfoList



OR:

In
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
and the study of
combinatorics on words Combinatorics on words is a fairly new field of mathematics, branching from combinatorics, which focuses on the study of words and formal languages. The subject looks at letters or symbols, and the sequences they form. Combinatorics on words ...
, a partial word is a string that may contain a number of "do not know" or "do not care" symbols i.e. placeholders in the string where the symbol value is not known or not specified. More formally, a partial word is a
partial function In mathematics, a partial function from a set to a set is a function from a subset of (possibly itself) to . The subset , that is, the domain of viewed as a function, is called the domain of definition of . If equals , that is, if is de ...
u: \ \rightarrow A where A is some finite alphabet. If ''u''(''k'') is not defined for some k \in \ then the unknown element at place ''k'' in the string is called a "hole". In
regular expression A regular expression (shortened as regex or regexp; sometimes referred to as rational expression) is a sequence of characters that specifies a search pattern in text. Usually such patterns are used by string-searching algorithms for "find" or ...
s (following the
POSIX The Portable Operating System Interface (POSIX) is a family of standards specified by the IEEE Computer Society for maintaining compatibility between operating systems. POSIX defines both the system- and user-level application programming interf ...
standard) a hole is represented by the
metacharacter A metacharacter is a character that has a special meaning to a computer program, such as a shell interpreter or a regular expression (regex) engine. In POSIX extended regular expressions, there are 14 metacharacters that must be ''escaped'' (prec ...
".". For example, ''aab.ab.b'' is a partial word of length 8 over the alphabet ''A'' = in which the fourth and seventh characters are holes.


Algorithms

Several algorithms have been developed for the problem of "string matching with don't cares", in which the input is a long text and a shorter partial word and the goal is to find all strings in the text that match the given partial word.


Applications

Two partial words are said to be ''compatible'' when they have the same length and when every position that is a non-wildcard in both of them has the same character in both. If one forms an
undirected graph In discrete mathematics, and more specifically in graph theory, a graph is a structure amounting to a set of objects in which some pairs of the objects are in some sense "related". The objects correspond to mathematical abstractions called '' v ...
with a vertex for each partial word in a collection of partial words, and an edge for each compatible pair, then the
cliques A clique ( AusE, CanE, or ), in the social sciences, is a group of individuals who interact with one another and share similar interests. Interacting with cliques is part of normative social development regardless of gender, ethnicity, or popular ...
of this graph come from sets of partial words that all match at least one common string. This graph-theoretical interpretation of compatibility of partial words plays a key role in the proof of
hardness of approximation In computer science, hardness of approximation is a field that studies the algorithmic complexity of finding near-optimal solutions to optimization problems. Scope Hardness of approximation complements the study of approximation algorithms by prov ...
of the
clique problem In computer science, the clique problem is the computational problem of finding cliques (subsets of vertices, all adjacent to each other, also called complete subgraphs) in a graph. It has several different formulations depending on which cliq ...
, in which a collection of partial words representing successful runs of a
probabilistically checkable proof In computational complexity theory, a probabilistically checkable proof (PCP) is a type of proof that can be checked by a randomized algorithm using a bounded amount of randomness and reading a bounded number of bits of the proof. The algorithm i ...
verifier has a large clique if and only if there exists a valid proof of an underlying
NP-complete In computational complexity theory, a problem is NP-complete when: # it is a problem for which the correctness of each solution can be verified quickly (namely, in polynomial time) and a brute-force search algorithm can find a solution by tryi ...
problem. The faces (subcubes) of an n-dimensional
hypercube In geometry, a hypercube is an ''n''-dimensional analogue of a square () and a cube (). It is a closed, compact, convex figure whose 1- skeleton consists of groups of opposite parallel line segments aligned in each of the space's dimensions, ...
can be described by partial words of length n over a binary alphabet, whose symbols are the
Cartesian coordinates A Cartesian coordinate system (, ) in a plane is a coordinate system that specifies each point uniquely by a pair of numerical coordinates, which are the signed distances to the point from two fixed perpendicular oriented lines, measured in t ...
of the hypercube vertices (e.g., 0 or 1 for a
unit cube A unit cube, more formally a cube of side 1, is a cube whose sides are 1 unit long.. See in particulap. 671. The volume of a 3-dimensional unit cube is 1 cubic unit, and its total surface area is 6 square units.. Unit hypercube The term '' ...
). The dimension of a subcube, in this representation, equals the number of don't-care symbols it contains. The same representation may also be used to describe the
implicant In Boolean logic, the term implicant has either a generic or a particular meaning. In the generic use, it refers to the hypothesis of an implication ( implicant). In the particular use, a product term (i.e., a conjunction of literals) ''P'' is a ...
s of
Boolean function In mathematics, a Boolean function is a function whose arguments and result assume values from a two-element set (usually , or ). Alternative names are switching function, used especially in older computer science literature, and truth function ( ...
s.


Related concepts

Partial words may be generalized to
parameter word In the mathematical study of combinatorics on words, a parameter word is a string over a given alphabet having some number of wildcard characters. The set of strings matching a given parameter word is called a parameter set or combinatorial cube. P ...
s, in which some of the "do not know" symbols are marked as being equal to each other. A partial word is a special case of a parameter word in which each do not know symbol may be substituted by a character independently of all of the other ones.


References

{{reflist Combinatorics on words String (computer science)