TheInfoList

In
computer science Computer science deals with the theoretical foundations of information, algorithms and the architectures of its computation as well as practical techniques for their application. Computer science is the study of , , and . Computer science ...
, the syntax of a
computer languageComputer language is a formal language In logic, mathematics, computer science, and linguistics, a formal language consists of string (computer science), words whose symbol (formal), letters are taken from an alphabet (computer science), alphabet ...
is the set of rules that defines the combinations of symbols that are considered to be correctly structured
statement Statement or statements may refer to: Common uses *Statement (computer science), the smallest standalone element of an imperative programming language *Statement (logic), declarative sentence that is either true or false *Statement, a Sentence_(lin ...
s or expressions in that language. This applies both to
programming language A programming language is a formal language In logic, mathematics, computer science, and linguistics, a formal language consists of string (computer science), words whose symbol (formal), letters are taken from an alphabet (computer science) ...

s, where the document represents
source code In computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes and development of both computer hardware , hardware and ...

, and to
markup language #REDIRECT Markup language In computer text processing, a markup language is a system for annotation, annotating a document in a way that is Syntax (logic), syntactically distinguishable from the text, meaning when the document is processed for di ...
s, where the document represents data. The syntax of a language defines its surface form.
Text-based In computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes and development of both computer hardware , hardware and sof ...
computer languages are based on sequences of
character Character(s) may refer to: Arts, entertainment, and media Literature * ''Character'' (novel), a 1936 Dutch novel by Ferdinand Bordewijk * ''Characters'' (Theophrastus), a classical Greek set of character sketches attributed to Theophrastus M ...
s, while
visual programming languages In computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes and development of both computer hardware , hardware and software ...
are based on the spatial layout and connections between symbols (which may be textual or graphical). Documents that are syntactically invalid are said to have a
syntax error In computer science Computer science deals with the theoretical foundations of information, algorithms and the architectures of its computation as well as practical techniques for their application. Computer science is the study of Algorit ...

. When designing the syntax of a language, a designer might start by writing down examples of both legal and illegal
string String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other objects Arts, entertainment, and media Films * Strings (1991 film), ''Strings'' (1991 fil ...
s, before trying to figure out the general rules from these examples. Syntax therefore refers to the ''form'' of the code, and is contrasted with
semantics Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference Reference is a relationship between objects in which one object designates, or acts as a means by which to connect to or link to, another ...
– the ''meaning''. In processing computer languages, semantic processing generally comes after syntactic processing; however, in some cases, semantic processing is necessary for complete syntactic analysis, and these are done together or concurrently. In a
compiler In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily ...

, the syntactic analysis comprises the frontend, while the semantic analysis comprises the backend (and middle end, if this phase is distinguished).

# Levels of syntax

Computer language syntax is generally distinguished into three levels: * Words – the lexical level, determining how characters form
token Token may refer to: Arts, entertainment, and media * Token, a game piece or counter, used in some games * The Tokens The Tokens were an American male doo-wop-style vocal group and record production company group from Brooklyn, New York B ...
s; * Phrases – the grammar level, narrowly speaking, determining how tokens form phrases; * Context – determining what objects or variables names refer to, if types are valid, etc. Distinguishing in this way yields modularity, allowing each level to be described and processed separately and often independently. First, a lexer turns the linear sequence of characters into a linear sequence of tokens; this is known as "
lexical analysis In computer science Computer science deals with the theoretical foundations of information, algorithms and the architectures of its computation as well as practical techniques for their application. Computer science is the study of , , ...
" or "lexing". Second, the parser turns the linear sequence of tokens into a hierarchical syntax tree; this is known as "
parsing Parsing, syntax analysis, or syntactic analysis is the process of analyzing a string String or strings may refer to: *String (structure), a long flexible structure made from threads twisted together, which is used to tie, bind, or hang other obje ...

" narrowly speaking. Thirdly, the contextual analysis resolves names and checks types. This modularity is sometimes possible, but in many real-world languages an earlier step depends on a later step – for example,
the lexer hackIn computer programming Computer programming is the process of designing and building an executable computer program to accomplish a specific computing result or to perform a specific task. Programming involves tasks such as: analysis, generatin ...
in C is because tokenization depends on context. Even in these cases, syntactical analysis is often seen as approximating this ideal model. The parsing stage itself can be divided into two parts: the
parse tree A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree (data structure), tree that represents the syntax, syntactic structure of a string (computer science), string according to some context-free grammar ...
, or "concrete syntax tree", which is determined by the grammar, but is generally far too detailed for practical use, and the
abstract syntax tree In computer science Computer science deals with the theoretical foundations of information, algorithms and the architectures of its computation as well as practical techniques for their application. Computer science is the study of Algor ...
(AST), which simplifies this into a usable form. The AST and contextual analysis steps can be considered a form of semantic analysis, as they are adding meaning and interpretation to the syntax, or alternatively as informal, manual implementations of syntactical rules that would be difficult or awkward to describe or implement formally. The levels generally correspond to levels in the
Chomsky hierarchy In formal language, formal language theory, computer science and linguistics, the Chomsky hierarchy (also referred to as the Chomsky–Schützenberger hierarchy) is a containment hierarchy of classes of formal grammars. This hierarchy of grammars ...

. Words are in a
regular language In theoretical computer science An artistic representation of a Turing machine. Turing machines are used to model general computing devices. Theoretical computer science (TCS) is a subset of general computer science that focuses on mathematica ...
, specified in the
lexical grammarIn computer science Computer science deals with the theoretical foundations of information, algorithms and the architectures of its computation as well as practical techniques for their application. Computer science is the study of Algorith ...
, which is a Type-3 grammar, generally given as
regular expression A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of Character (computing), characters that specifies a ''search pattern matching, pattern''. Usually such patterns are used by string-se ...
s. Phrases are in a
context-free languageIn formal language theory In mathematics, computer science, and linguistics, a formal language consists of string (computer science), words whose symbol (formal), letters are taken from an alphabet (computer science), alphabet and are well-formedn ...
(CFL), generally a
deterministic context-free languageIn formal language theory, deterministic context-free languages (DCFL) are a proper subset of context-free languages. They are the context-free languages that can be accepted by a deterministic pushdown automaton. DCFLs are always unambiguous, me ...
(DCFL), specified in a
phrase structure grammar The term phrase structure grammar was originally introduced by Noam Chomsky Avram Noam Chomsky (born December 7, 1928) is an American linguist Linguistics is the scientific study of language A language is a structured system ...
, which is a Type-2 grammar, generally given as production rules in
Backus–Naur form In computer science, Backus–Naur form or Backus normal form (BNF) is a metasyntax notation for context-free grammars, often used to describe the Syntax (programming languages), syntax of Formal language#Programming languages, languages used in c ...
(BNF). Phrase grammars are often specified in much more constrained grammars than full
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose Production (computer science), production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of Terminal and nonterm ...
s, in order to make them easier to parse; while the
LR parser In computer science Computer science deals with the theoretical foundations of information, algorithms and the architectures of its computation as well as practical techniques for their application. Computer science is the study of compu ...
can parse any DCFL in linear time, the simple
LALR parser In computer science Computer science deals with the theoretical foundations of information, algorithms and the architectures of its computation as well as practical techniques for their application. Computer science is the study of , , a ...
and even simpler
LL parser In computer science Computer science deals with the theoretical foundations of information, algorithms and the architectures of its computation as well as practical techniques for their application. Computer science is the study of compu ...
are more efficient, but can only parse grammars whose production rules are constrained. In principle, contextual structure can be described by a
context-sensitive grammar A context-sensitive grammar (CSG) is a formal grammar In formal language theory In mathematics, computer science, and linguistics, a formal language consists of string (computer science), words whose symbol (formal), letters are taken from an alp ...
, and automatically analyzed by means such as
attribute grammarAn attribute grammar is a formal way to define attributes for the productions of a formal grammar In formal language theory, a grammar (when the context is not given, often called a formal grammar for clarity) describes how to form strings from a l ...
s, though, in general, this step is done manually, via name resolution rules and
type checking In programming language A programming language is a formal language In logic, mathematics, computer science, and linguistics, a formal language consists of string (computer science), words whose symbol (formal), letters are taken from a ...
, and implemented via a
symbol table In computer science Computer science deals with the theoretical foundations of information, algorithms and the architectures of its computation as well as practical techniques for their application. Computer science is the study of , , ...
which stores names and types for each scope. Tools have been written that automatically generate a lexer from a lexical specification written in regular expressions and a parser from the phrase grammar written in BNF: this allows one to use
declarative programming In computer science, declarative programming is a programming paradigm—a style of building the structure and elements of computer programs—that expresses the logic of a computation without describing its control flow. Many languages that app ...
, rather than need to have procedural or functional programming. A notable example is the lex-
yacc Yacc (Yet Another Compiler-Compiler) is a computer program A computer program is a collection of instructions that can be executed by a computer to perform a specific task. A computer program is usually written by a computer programmer in a ...
pair. These automatically produce a ''concrete'' syntax tree; the parser writer must then manually write code describing how this is converted to an ''abstract'' syntax tree. Contextual analysis is also generally implemented manually. Despite the existence of these automatic tools, parsing is often implemented manually, for various reasons – perhaps the phrase structure is not context-free, or an alternative implementation improves performance or error-reporting, or allows the grammar to be changed more easily. Parsers are often written in functional languages, such as Haskell, or in scripting languages, such as
Python PYTHON was a Cold War contingency plan of the Government of the United Kingdom, British Government for the continuity of government in the event of Nuclear warfare, nuclear war. Background Following the report of the Strath Committee in 1955, the ...
or
Perl Perl is a family of two high-level High-level and low-level, as technical terms, are used to classify, describe and point to specific Objective (goal), goals of a systematic operation; and are applied in a wide range of contexts, such as, for ...
, or in C or
C++ C++ () is a general-purpose programming language In computer software, a general-purpose programming language is a programming language dedicated to a general-purpose, designed to be used for writing software in a wide variety of application ...

.

## Examples of errors

As an example, (add 1 1) is a syntactically valid Lisp program (assuming the 'add' function exists, else name resolution fails), adding 1 and 1. However, the following are invalid: (_ 1 1) lexical error: '_' is not valid (add 1 1 parsing error: missing closing ')' Note that the lexer is unable to identify the first error – all it knows is that, after producing the token LEFT_PAREN, '(' the remainder of the program is invalid, since no word rule begins with '_'. The second error is detected at the parsing stage: The parser has identified the "list" production rule due to the '(' token (as the only match), and thus can give an error message; in general it may be
ambiguous Ambiguity is a type of meaning in which a phrase, statement or resolution is not explicitly defined, making several interpretations plausible. A common aspect of ambiguity is uncertainty Uncertainty refers to Epistemology, epistemic sit ...
. Type errors and undeclared variable errors are sometimes considered to be syntax errors when they are detected at compile-time (which is usually the case when compiling strongly-typed languages), though it is common to classify these kinds of error as
semantic Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference Reference is a relationship between objects in which one object designates, or acts as a means by which to connect to or link to, another ...
/ref> As an example, the Python code 'a' + 1 contains a type error because it adds a string literal to an integer literal. Type errors of this kind can be detected at compile-time: They can be detected during parsing (phrase analysis) if the compiler uses separate rules that allow "integerLiteral + integerLiteral" but not "stringLiteral + integerLiteral", though it is more likely that the compiler will use a parsing rule that allows all expressions of the form "LiteralOrIdentifier + LiteralOrIdentifier" and then the error will be detected during contextual analysis (when type checking occurs). In some cases this validation is not done by the compiler, and these errors are only detected at runtime. In a dynamically typed language, where type can only be determined at runtime, many type errors can only be detected at runtime. For example, the Python code a + b is syntactically valid at the phrase level, but the correctness of the types of a and b can only be determined at runtime, as variables do not have types in Python, only values do. Whereas there is disagreement about whether a type error detected by the compiler should be called a syntax error (rather than a static semantic error), type errors which can only be detected at program execution time are always regarded as semantic rather than syntax errors.

# Syntax definition

The syntax of textual programming languages is usually defined using a combination of
regular expression A regular expression (shortened as regex or regexp; also referred to as rational expression) is a sequence of Character (computing), characters that specifies a ''search pattern matching, pattern''. Usually such patterns are used by string-se ...
s (for lexical structure) and
Backus–Naur form In computer science, Backus–Naur form or Backus normal form (BNF) is a metasyntax notation for context-free grammars, often used to describe the Syntax (programming languages), syntax of Formal language#Programming languages, languages used in c ...
(for
grammatical In linguistics Linguistics is the science, scientific study of language. It encompasses the analysis of every aspect of language, as well as the methods for studying and modeling them. The traditional areas of linguistic analysis include ...
structure) to inductively specify syntactic categories (nonterminals) and ''terminal'' symbols. Syntactic categories are defined by rules called ''productions'', which specify the values that belong to a particular syntactic category. Terminal symbols are the concrete characters or strings of characters (for example keywords such as ''define'', ''if'', ''let'', or ''void'') from which syntactically valid programs are constructed. A language can have different equivalent grammars, such as equivalent regular expressions (at the lexical levels), or different phrase rules which generate the same language. Using a broader category of grammars, such as LR grammars, can allow shorter or simpler grammars compared with more restricted categories, such as LL grammar, which may require longer grammars with more rules. Different but equivalent phrase grammars yield different parse trees, though the underlying language (set of valid documents) is the same.

## Example: Lisp S-expressions

Below is a simple grammar, defined using the notation of regular expressions and
Extended Backus–Naur form In computer science Computer science deals with the theoretical foundations of information, algorithms and the architectures of its computation as well as practical techniques for their application. Computer science is the study of Algor ...
. It describes the syntax of
S-expression In computer programming Computer programming is the process of designing and building an executable computer program to accomplish a specific computing result or to perform a particular task. Programming involves tasks such as analysis, gener ...
s, a data syntax of the programming language
Lisp Lisp (historically LISP) is a family of programming language A programming language is a formal language In logic, mathematics, computer science, and linguistics, a formal language consists of string (computer science), words whose symbo ...
, which defines productions for the syntactic categories ''expression'', ''atom'', ''number'', ''symbol'', and ''list'': expression = atom , list atom = number , symbol number = - 0'-'9' symbol = A'-'Z''A'-'Z''0'-'9'].* list = '(', expression*, ')' This grammar specifies the following: * an ''expression'' is either an ''atom'' or a ''list''; * an ''atom'' is either a ''number'' or a ''symbol''; * a ''number'' is an unbroken sequence of one or more decimal digits, optionally preceded by a plus or minus sign; * a ''symbol'' is a letter followed by zero or more of any characters (excluding whitespace); and * a ''list'' is a matched pair of parentheses, with zero or more ''expressions'' inside it. Here the decimal digits, upper- and lower-case characters, and parentheses are terminal symbols. The following are examples of well-formed token sequences in this grammar: '12345', '()', '(A B C232 (1))'

## Complex grammars

The grammar needed to specify a programming language can be classified by its position in the
Chomsky hierarchy In formal language, formal language theory, computer science and linguistics, the Chomsky hierarchy (also referred to as the Chomsky–Schützenberger hierarchy) is a containment hierarchy of classes of formal grammars. This hierarchy of grammars ...

. The phrase grammar of most programming languages can be specified using a Type-2 grammar, i.e., they are
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose Production (computer science), production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of Terminal and nonterm ...
s, though the overall syntax is context-sensitive (due to variable declarations and nested scopes), hence Type-1. However, there are exceptions, and for some languages the phrase grammar is Type-0 (Turing-complete). In some languages like Perl and Lisp the specification (or implementation) of the language allows constructs that execute during the parsing phase. Furthermore, these languages have constructs that allow the programmer to alter the behavior of the parser. This combination effectively blurs the distinction between parsing and execution, and makes syntax analysis an
undecidable problem In computability theory Computability theory, also known as recursion theory, is a branch of mathematical logic Mathematical logic is the study of formal logic within mathematics. Major subareas include model theory, proof theory, set the ...
in these languages, meaning that the parsing phase may not finish. For example, in Perl it is possible to execute code during parsing using a BEGIN statement, and Perl function prototypes may alter the syntactic interpretation, and possibly even the syntactic validity of the remaining code. Colloquially this is referred to as "only Perl can parse Perl" (because code must be executed during parsing, and can modify the grammar), or more strongly "even Perl cannot parse Perl" (because it is undecidable). Similarly, Lisp macros introduced by the defmacro syntax also execute during parsing, meaning that a Lisp compiler must have an entire Lisp run-time system present. In contrast, C macros are merely string replacements, and do not require code execution.

# Syntax versus semantics

The syntax of a language describes the form of a valid program, but does not provide any information about the meaning of the program or the results of executing that program. The meaning given to a combination of symbols is handled by semantics (either
formal Formal, formality, informal or informality imply the complying with, or not complying with, some set theory, set of requirements (substantial form, forms, in Ancient Greek). They may refer to: Dress code and events * Formal wear, attire for forma ...
or hard-coded in a
reference implementation In the software development process In software engineering, a software development process is the process of dividing software development work into smaller, parallel or sequential steps or subprocesses to improve Software design, design, So ...
). Not all syntactically correct programs are semantically correct. Many syntactically correct programs are nonetheless ill-formed, per the language's rules; and may (depending on the language specification and the soundness of the implementation) result in an error on translation or execution. In some cases, such programs may exhibit
undefined behavior In computer programming Computer programming is the process of designing and building an executable computer program to accomplish a specific computing result or to perform a particular task. Programming involves tasks such as analysis, generati ...
. Even when a program is well-defined within a language, it may still have a meaning that is not intended by the person who wrote it. Using
natural language In neuropsychology Neuropsychology is a branch of psychology. It is concerned with how a person's cognition and behavior are related to the brain and the rest of the nervous system. Professionals in this branch of psychology often focus on ...
as an example, it may not be possible to assign a meaning to a grammatically correct sentence or the sentence may be false: * "
Colorless green ideas sleep furiously ''Colorless green ideas sleep furiously'' is a sentence composed by Noam Chomsky in his 1957 book ''Syntactic Structures'' as an example of a sentence (linguistics), sentence that is grammatically well-formed, but semantically Nonsense, nonsensic ...
." is grammatically well formed but has no generally accepted meaning. * "John is a married bachelor." is grammatically well formed but expresses a meaning that cannot be true. The following C language fragment is syntactically correct, but performs an operation that is not semantically defined (because is a
null pointer In computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes and development of both computer hardware , hardware and softwar ...
, the operations and have no meaning): complex *p = NULL; complex abs_p = sqrt (p->real * p->real + p->im * p->im); As a simpler example, int x; printf("%d", x); is syntactically valid, but not semantically defined, as it uses an
uninitialized variableIn computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes and development of both computer hardware , hardware and software. ...
. Even though compilers for some programming languages (e.g., Java and C#) would detect uninitialized variable errors of this kind, they should be regarded as
semantic Semantics (from grc, σημαντικός ''sēmantikós'', "significant") is the study of reference Reference is a relationship between objects in which one object designates, or acts as a means by which to connect to or link to, another ...
errors rather than syntax errors.Issue of syntax or semantics?
/ref>

*
Naming convention (programming) In computer programming Computer programming is the process of designing and building an executable computer program to accomplish a specific computing result or to perform a specific task. Programming involves tasks such as: analysis, generati ...
*
Comparison of programming languages (syntax) This comparison of programming language A programming language is a formal language comprising a Instruction set architecture, set of instructions that produce various kinds of Input/output, output. Programming languages are used in computer p ...
To quickly compare syntax of various programming languages, take a look at the list of "Hello, World!" program examples: * Prolog syntax and semantics * Perl syntax * PHP syntax and semantics *
C syntax The syntax of the C programming language is the set of rules governing writing of software in the C language C (, as in the letter ''c'') is a general-purpose, procedural computer programming language A programming language is a formal ...
* C++ syntax *
Java syntax The syntax of Java (programming language), Java refers to syntax, the set of rules defining how a Java program is written and interpreted. The syntax is mostly derived from C (programming language), C and C++. Unlike in C++, in Java there are n ...
*
JavaScript syntax The Syntax (programming languages), syntax of JavaScript is the set of rules that define a correctly structured JavaScript program. The examples below make use of the log function of the console object present in most browsers for Standard stream ...
*
Python syntax and semantics The syntax In linguistics, syntax () is the set of rules, principles, and processes that govern the structure of Sentence (linguistics), sentences (sentence structure) in a given Natural language, language, usually including word order. The t ...
* Lua syntax * Haskell syntax

{{Reflist