HOME

TheInfoList



OR:

A syntactic predicate specifies the syntactic validity of applying a
production Production may refer to: Economics and business * Production (economics) * Production, the act of manufacturing goods * Production, in the outline of industrial organization, the act of making products (goods and services) * Production as a stati ...
in a
formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
and is analogous to a
semantic Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and comput ...
predicate Predicate or predication may refer to: * Predicate (grammar), in linguistics * Predication (philosophy) * several closely related uses in mathematics and formal logic: **Predicate (mathematical logic) **Propositional function **Finitary relation, o ...
that specifies the semantic validity of applying a production. It is a simple and effective means of dramatically improving the recognition strength of an
LL parser In computer science, an LL parser (Left-to-right, leftmost derivation) is a top-down parser for a restricted context-free language. It parses the input from Left to right, performing Leftmost derivation of the sentence. An LL parser is called a ...
by providing arbitrary lookahead. In their original implementation, syntactic predicates had the form “( α )?” and could only appear on the left edge of a production. The required syntactic condition α could be any valid context-free grammar fragment. More formally, a syntactic predicate is a form of production intersection, used in
parser Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from Lat ...
specifications or in
formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a language's alphabet that are valid according to the language's syntax. A grammar does not describe ...
s. In this sense, the term ''predicate'' has the meaning of a mathematical
indicator function In mathematics, an indicator function or a characteristic function of a subset of a set is a function that maps elements of the subset to one, and all other elements to zero. That is, if is a subset of some set , one has \mathbf_(x)=1 if x\i ...
. If ''p1'' and ''p2,'' are production rules, the
language Language is a structured system of communication. The structure of a language is its grammar and the free components are its vocabulary. Languages are the primary means by which humans communicate, and may be conveyed through a variety of met ...
generated by ''both'' ''p1'' ''and'' ''p2'' is their set intersection. As typically defined or implemented, syntactic predicates implicitly order the productions so that predicated productions specified earlier have higher precedence than predicated productions specified later within the same decision. This conveys an ability to disambiguate ambiguous productions because the programmer can simply specify which production should match.
Parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 200 ...
s (PEGs), invented by Bryan Ford, extend these simple predicates by allowing "not predicates" and permitting a predicate to appear anywhere within a production. Moreover, Ford invented packrat parsing to handle these grammars in linear time by employing memoization, at the cost of heap space. It is possible to support linear-time parsing of predicates as general as those allowed by PEGs, but reduce the memory cost associated with memoization by avoiding backtracking where some more efficient implementation of lookahead suffices. This approach is implemented by
ANTLR In computer-based language recognition, ANTLR (pronounced ''antler''), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), firs ...
version 3, which uses
Deterministic finite automata In the theory of computation, a branch of theoretical computer science, a deterministic finite automaton (DFA)—also known as deterministic finite acceptor (DFA), deterministic finite-state machine (DFSM), or deterministic finite-state automa ...
for lookahead; this may require testing a predicate in order to choose between transitions of the DFA (called "pred-LL(*)" parsing).


Overview


Terminology

The term ''syntactic predicate'' was coined by Parr & Quong and differentiates this form of predicate from
semantic predicate Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and compu ...
s (also discussed). Syntactic predicates have been called ''multi-step matching'', ''parse constraints'', and simply ''predicates'' in various literature. (See References section below.) This article uses the term ''syntactic predicate'' throughout for consistency and to distinguish them from
semantic predicate Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference, meaning, or truth. The term can be used to refer to subfields of several distinct disciplines, including philosophy, linguistics and compu ...
s.


Formal closure properties

Bar-Hillel ''et al.'' show that the intersection of two
regular language In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
s is also a regular language, which is to say that the regular languages are closed under intersection. The intersection of a
regular language In theoretical computer science and formal language theory, a regular language (also called a rational language) is a formal language that can be defined by a regular expression, in the strict sense in theoretical computer science (as opposed to ...
and a
context-free language In formal language theory, a context-free language (CFL) is a language generated by a context-free grammar (CFG). Context-free languages have many applications in programming languages, in particular, most arithmetic expressions are generated by ...
is also closed, and it has been known at least since Hartmanis that the intersection of two context-free languages is not necessarily a context-free language (and is thus not closed). This can be demonstrated easily using the canonical Type 1 language, L = \ : Let L_1 = \ (Type 2) Let L_2 = \ (Type 2) Let L_3 = L_1 \cap L_2 Given the
strings String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * ''Strings'' (1991 film), a Canadian anim ...
', ', and ', it is clear that the only string that belongs to both L1 and L2 (that is, the only one that produces a
non-empty In mathematics, the empty set is the unique set having no elements; its size or cardinality (count of elements in a set) is zero. Some axiomatic set theories ensure that the empty set exists by including an axiom of empty set, while in other t ...
intersection) is '.


Other considerations

In most formalisms that use syntactic predicates, the syntax of the predicate is
noncommutative In mathematics, a binary operation is commutative if changing the order of the operands does not change the result. It is a fundamental property of many binary operations, and many mathematical proofs depend on it. Most familiar as the name o ...
, which is to say that the operation of predication is ordered. For instance, using the above example, consider the following pseudo-grammar, where ''X ::= Y PRED Z'' is understood to mean: "''Y'' produces ''X''
if and only if In logic and related fields such as mathematics and philosophy, "if and only if" (shortened as "iff") is a biconditional logical connective between statements, where either both statements are true or both are false. The connective is bicondi ...
''Y'' also satisfies predicate ''Z''": S ::= a X X ::= Y PRED Z Y ::= a+ BNCN Z ::= ANBN c+ BNCN ::= b NCNc ANBN ::= a NBNb Given the string ', in the case where ''Y'' must be satisfied ''first'' (and assuming a greedy implementation), S will generate ''aX'' and ''X'' in turn will generate ', thereby generating '. In the case where ''Z'' must be satisfied first, ANBN will fail to generate ', and thus ' is not generated by the grammar. Moreover, if either ''Y'' or ''Z'' (or both) specify any action to be taken upon reduction (as would be the case in many parsers), the order that these productions match determines the order in which those side-effects occur. Formalisms that vary over time (such as
adaptive grammar An adaptive grammar is a formal grammar that explicitly provides mechanisms within the formalism to allow its own production rules to be manipulated. Overview John N. Shutt defines adaptive grammar as a grammatical formalism that allows rule set ...
s) may rely on these
side effects In medicine, a side effect is an effect, whether therapeutic or adverse, that is secondary to the one intended; although the term is predominantly employed to describe adverse effects, it can also apply to beneficial, but unintended, consequence ...
.


Examples of use


ANTLR

Parr & Quong give this example of a syntactic predicate: stat: (declaration)? declaration , expression ; which is intended to satisfy the following informally stated constraints of
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
: # If it looks like a declaration, it is; otherwise # if it looks like an expression, it is; otherwise # it is a syntax error. In the first production of rule stat, the syntactic predicate (declaration)? indicates that declaration is the syntactic context that must be present for the rest of that production to succeed. We can interpret the use of (declaration)? as "I am not sure if declaration will match; let me try it out and, if it does not match, I shall try the next alternative." Thus, when encountering a valid declaration, the rule declaration will be recognized twice—once as syntactic predicate and once during the actual parse to execute semantic actions. Of note in the above example is the fact that any code triggered by the acceptance of the ''declaration'' production will only occur if the predicate is satisfied.


Canonical examples

The language L = \ can be represented in various grammars and formalisms as follows:


= Parsing Expression Grammars

= S ← &(A !b) a+ B !c A ← a A? b B ← b B? c


= §-Calculus

= Using a ''bound'' predicate: S → B A → X 'c+' X → 'a' 'b' B → 'a+' Y Y → 'b' 'c' Using two ''free'' predicates: A → <'a+'>''a'' <'b+'>''b'' Ψ(''a'' ''b'')X <'c+'>''c'' Ψ(''b'' ''c'')Y X → 'a' 'b' Y → 'b' 'c'


= Conjunctive Grammars

= (Note: the following example actually generates L = \, but is included here because it is the example given by the inventor of conjunctive grammars.): S → AB&DC A → aA , ε B → bBc , ε C → cC , ε D → aDb , ε


= Perl 6 rules

= rule S rule A rule B


Parsers/formalisms using some form of syntactic predicate

Although by no means an exhaustive list, the following
parsers Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string of symbols, either in natural language, computer languages or data structures, conforming to the rules of a formal grammar. The term ''parsing'' comes from L ...
and
grammar In linguistics, the grammar of a natural language is its set of structure, structural constraints on speakers' or writers' composition of clause (linguistics), clauses, phrases, and words. The term can also refer to the study of such constraint ...
formalisms employ syntactic predicates: ;
ANTLR In computer-based language recognition, ANTLR (pronounced ''antler''), or ANother Tool for Language Recognition, is a parser generator that uses LL(*) for parsing. ANTLR is the successor to the Purdue Compiler Construction Tool Set (PCCTS), firs ...
(Parr & Quong) :As originally implemented, syntactic predicates sit on the leftmost edge of a production such that the
production Production may refer to: Economics and business * Production (economics) * Production, the act of manufacturing goods * Production, in the outline of industrial organization, the act of making products (goods and services) * Production as a stati ...
to the right of the predicate is attempted if and only if the syntactic predicate first accepts the next portion of the input stream. Although ordered, the predicates are checked first, with parsing of a clause continuing if and only if the predicate is satisfied, and semantic actions only occurring in non-predicates. ; Augmented Pattern Matcher (Balmas) :Balmas refers to syntactic predicates as "multi-step matching" in her paper on APM. As an APM parser parses, it can bind substrings to a variable, and later check this variable against other rules, continuing to parse if and only if that substring is acceptable to further rules. ;
Parsing expression grammar In computer science, a parsing expression grammar (PEG) is a type of analytic formal grammar, i.e. it describes a formal language in terms of a set of rules for recognizing strings in the language. The formalism was introduced by Bryan Ford in 200 ...
s (Ford) :Ford's PEGs have syntactic predicates expressed as the ''and-predicate'' and the ''not-predicate''. ; §-Calculus (Jackson) :In the §-Calculus, syntactic predicates are originally called simply ''predicates'', but are later divided into ''bound'' and ''free'' forms, each with different input properties. ;
Raku rules Raku may refer to: * Lake Raku, an artificial lake in Tallinn, Estonia * Raku ware, a type of pottery used in the Japanese tea ceremony * Raku, Nepal, a village in the Karnali Zone * ''RAkU'' (ballet), a ballet by Yuri Possokhov * Raku (programm ...
: Raku introduces a generalized tool for describing a grammar called ''rules'', which are an extension of
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offici ...
5's regular expression syntax. Predicates are introduced via a lookahead mechanism called ''before'', either with "" or "" (that is: "''not'' before"). Perl 5 also has such lookahead, but it can only encapsulate Perl 5's more limited regexp features. ; ProGrammar (NorKen Technologies) :ProGrammar's GDL (Grammar Definition Language) makes use of syntactic predicates in a form called ''parse constraints''. ATTENTION NEEDED: This link is no longer valid! ; Conjunctive and Boolean Grammars (Okhotin) :Conjunctive grammars, first introduced by Okhotin, introduce the explicit notion of
conjunction Conjunction may refer to: * Conjunction (grammar), a part of speech * Logical conjunction, a mathematical operator ** Conjunction introduction, a rule of inference of propositional logic * Conjunction (astronomy), in which two astronomical bodies ...
-as-predication. Later treatment of conjunctive and boolean grammars is the most thorough treatment of this formalism to date.


References


External links

*
Alexander Okhotin's Conjunctive Grammars Page

Alexander Okhotin's Boolean Grammars Page

The Packrat Parsing and Parsing Expression Grammars Page
{{DEFAULTSORT:Syntactic Predicate Parsing Formal languages