In
computer programming
Computer programming or coding is the composition of sequences of instructions, called computer program, programs, that computers can follow to perform tasks. It involves designing and implementing algorithms, step-by-step specifications of proc ...
, a programming language specification (or standard or definition) is a
documentation
Documentation is any communicable material that is used to describe, explain or instruct regarding some attributes of an object, system or procedure, such as its parts, assembly, installation, maintenance, and use. As a form of knowledge managem ...
artifact that defines a
programming language
A programming language is a system of notation for writing computer programs.
Programming languages are described in terms of their Syntax (programming languages), syntax (form) and semantics (computer science), semantics (meaning), usually def ...
so that
user
Ancient Egyptian roles
* User (ancient Egyptian official), an ancient Egyptian nomarch (governor) of the Eighth Dynasty
* Useramen, an ancient Egyptian vizier also called "User"
Other uses
* User (computing), a person (or software) using an ...
s and
implementors can agree on what programs in that language mean. Specifications are typically detailed and formal, and primarily used by implementors, with users referring to them in case of ambiguity; the
C++ specification is frequently cited by users, for instance, due to the complexity. Related documentation includes a
programming language reference, which is intended expressly for users, and a programming language rationale, which explains why the specification is written as it is; these are typically more informal than a specification.
Standardization
Not all major programming languages have specifications, and languages can exist and be popular for decades without a specification. A language may have one or more implementations, whose behavior acts as a ''de facto'' standard, without this behavior being documented in a specification.
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language".
Perl was developed ...
(through
Perl 5) is a notable example of a language without a specification, while PHP was only specified in 2014, after being in use for 20 years. A language may be implemented and then specified, or specified and then implemented, or these may develop together, which is usual practice today. This is because implementations and specifications provide checks on each other: writing a specification requires precisely stating the behavior of an implementation, and implementation checks that a specification is possible, practical and consistent. Writing a specification before an implementation has largely been avoided since
ALGOL 68
ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language member of the ALGOL family that was conceived as a successor to the ALGOL 60 language, designed with the goal of a much wider scope of application and ...
(1968), due to unexpected difficulties in implementation when implementation is deferred. However, languages are still occasionally implemented and gain popularity without a formal specification: an implementation is essential for use, while a specification is desirable but not essential (informally, "code talks").
Forms
A programming language specification can take several forms, including:
* An explicit definition of the
syntax
In linguistics, syntax ( ) is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituenc ...
and
semantics
Semantics is the study of linguistic Meaning (philosophy), meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts. Part of this process involves the distinction betwee ...
of the language. While syntax is commonly specified using a
formal grammar
A formal grammar is a set of Terminal and nonterminal symbols, symbols and the Production (computer science), production rules for rewriting some of them into every possible string of a formal language over an Alphabet (formal languages), alphabe ...
, semantic definitions may be written in
natural language
A natural language or ordinary language is a language that occurs naturally in a human community by a process of use, repetition, and change. It can take different forms, typically either a spoken language or a sign language. Natural languages ...
(e.g., the approach taken for the
C language
C (''pronounced'' '' – like the letter c'') is a general-purpose programming language. It was created in the 1970s by Dennis Ritchie and remains very widely used and influential. By design, C's features cleanly reflect the capabilities o ...
), or a
formal semantics (e.g., the
Standard ML
Standard ML (SML) is a General-purpose programming language, general-purpose, High-level programming language, high-level, Modular programming, modular, Functional programming, functional programming language with compile-time type checking and t ...
and
Scheme specifications). A notable example is the C language, which gained popularity without a formal specification, instead being described as part of a book, ''
The C Programming Language
''The C Programming Language'' (sometimes termed ''K&R'', after its authors' initials) is a computer programming book written by Brian Kernighan and Dennis Ritchie, the latter of whom originally designed and implemented the C programming langu ...
'' (1978), and only much later being formally standardized in
ANSI C
ANSI C, ISO C, and Standard C are successive standards for the C programming language published by the American National Standards Institute (ANSI) and ISO/IEC JTC 1/SC 22/WG 14 of the International Organization for Standardization (ISO) and the ...
(1989).
* A description of the behavior of a
compiler
In computing, a compiler is a computer program that Translator (computing), translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primaril ...
(sometimes called "translator") for the language (e.g., the
C++ language and
Fortran). The syntax and semantics of the language has to be inferred from this description, which may be written in natural or a formal language.
* A
''model'' implementation, sometimes written in the language being specified (e.g.,
Prolog
Prolog is a logic programming language that has its origins in artificial intelligence, automated theorem proving, and computational linguistics.
Prolog has its roots in first-order logic, a formal logic. Unlike many other programming language ...
). The syntax and semantics of the language are explicit in the behavior of the model implementation.
Syntax
The
syntax
In linguistics, syntax ( ) is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure (constituenc ...
of a programming language represents the definition of acceptable words, i.e., formal parameters and rules upon which to decide whether a given code is valid in respect to the language. On that note, the language syntax usually consists of a combination of the following three construction components:
* A specific character set (non-empty, finite set of symbols)
*
Regular expressions
A regular expression (shortened as regex or regexp), sometimes referred to as rational expression, is a sequence of character (computing), characters that specifies a pattern matching, match pattern in string (computer science), text. Usually ...
describing its
lexemes (for alphabet-wise tokenisation)
* A
context-free grammar
In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules
can be applied to a nonterminal symbol regardless of its context.
In particular, in a context-free grammar, each production rule is of the fo ...
which describes how the lexemes may be combined in order to form a correct program
Syntax specification generally supposes a natural language description in order to provide modest comprehensibility. However, the formal representation of the above outlined components is usually part of the section as it favors the implementation and approval of the language and its concepts.
Semantics
Formulating a rigorous semantics of a large, complex, practical programming language is a daunting task even for experienced specialists, and the resulting specification can be difficult for anyone but experts to understand. The following are some of the ways in which programming language semantics can be described; all languages use at least one of these description methods, and some languages combine more than one.
*
Natural language
A natural language or ordinary language is a language that occurs naturally in a human community by a process of use, repetition, and change. It can take different forms, typically either a spoken language or a sign language. Natural languages ...
: Description by human natural language.
*
Formal semantics: Description by
mathematics
Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...
.
*
Reference implementation
In the software development process, a reference implementation (or, less frequently, sample implementation or model implementation) is a program that implements all requirements from a corresponding specification. The reference implementation ...
s: Description by
computer program
A computer program is a sequence or set of instructions in a programming language for a computer to Execution (computing), execute. It is one component of software, which also includes software documentation, documentation and other intangibl ...
.
*
Test suite
In software development, a test suite, less commonly known as a validation suite, is a collection of test cases that are intended to be used to test a software program to show that it has some specified set of behaviors. A test suite often conta ...
s: Description by examples of programs and their expected behaviors. While few language specifications start off in this form, the evolution of some language specifications has been influenced by the semantics of a test suite (e.g., in the past the specification of
Ada has been modified to match the behavior of the
Ada Conformity Assessment Test Suite
The Ada Conformity Assessment Test Suite (ACATS) is the test suite used for Ada processor conformity testing. A prior test suite was known as the Ada Compiler Validation Capability (ACVC).
ACVC era
The Ada Compiler Validation Capability test ...
).
Natural language
Most widely used languages are specified using natural language descriptions of their semantics. This description usually takes the form of a ''reference manual'' for the language. These manuals can run to hundreds of pages; e.g., the print version of ''The Java Language Specification, 3rd Ed.'' is 596 pages.
The imprecision of natural language as a vehicle for describing programming language semantics can lead to problems with interpreting the specification. For example, the semantics of
Java
Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
threads were specified in English, and it was later discovered that the specification did not provide adequate guidance for implementors.
[William Pugh. The Java Memory Model is Fatally Flawed. ''Concurrency: Practice and Experience'' 12(6):445-455, August 2000]
Formal semantics
Formal semantics are grounded in mathematics. As a result, they can be more precise and less ambiguous than semantics given in natural language. However, supplemental natural language descriptions of the semantics are often included to aid understanding of the formal definitions. For example, The
ISO
The International Organization for Standardization (ISO ; ; ) is an independent, non-governmental, international standard development organization composed of representatives from the national standards organizations of member countries.
Me ...
Standard for
Modula-2
Modula-2 is a structured, procedural programming language developed between 1977 and 1985/8 by Niklaus Wirth at ETH Zurich. It was created as the language for the operating system and application software of the Lilith personal workstation. It w ...
contains both a formal and a natural language definition on opposing pages.
Programming languages whose semantics are described formally can reap many benefits. For example:
* Formal semantics enable mathematical proofs of program correctness;
* Formal semantics facilitate the design of
type system
In computer programming, a type system is a logical system comprising a set of rules that assigns a property called a ''type'' (for example, integer, floating point, string) to every '' term'' (a word, phrase, or other set of symbols). Usu ...
s, and proofs about the soundness of those type systems;
* Formal semantics can establish unambiguous and uniform standards for implementations of a language.
Automatic tool support can help to realize some of these benefits. For example, an
automated theorem prover
Automated theorem proving (also known as ATP or automated deduction) is a subfield of automated reasoning and mathematical logic dealing with proving mathematical theorems by computer programs. Automated reasoning over mathematical proof was a ma ...
or theorem checker can increase a programmer's (or language designer's) confidence in the correctness of proofs about programs (or the language itself). The power and scalability of these tools varies widely: full
formal verification
In the context of hardware and software systems, formal verification is the act of proving or disproving the correctness of a system with respect to a certain formal specification or property, using formal methods of mathematics.
Formal ver ...
is computationally intensive, rarely scales beyond programs containing a few hundred lines and may require considerable manual assistance from a programmer; more lightweight tools such as
model checker
In computer science, model checking or property checking is a method for checking whether a finite-state model of a system meets a given specification (also known as correctness). This is typically associated with hardware or software system ...
s require fewer resources and have been used on programs containing tens of thousands of lines; many compilers apply static
type checks to any program they compile.
Reference implementation
A
reference implementation
In the software development process, a reference implementation (or, less frequently, sample implementation or model implementation) is a program that implements all requirements from a corresponding specification. The reference implementation ...
is a single implementation of a programming language that is designated as authoritative. The behavior of this implementation is held to define the proper behavior of a program written in the language. This approach has several attractive properties. First, it is precise, and requires no human interpretation: disputes as to the meaning of a program can be settled simply by executing the program on the reference implementation (provided that the implementation behaves deterministically for that program).
On the other hand, defining language semantics through a reference implementation also has several potential drawbacks. Chief among them is that it conflates limitations of the reference implementation with properties of the language. For example, if the reference implementation has a bug, then that bug must be considered to be an authoritative behavior. Another drawback is that programs written in this language may rely on quirks in the reference implementation, hindering portability across different implementations.
Nevertheless, several languages have successfully used the reference implementation approach. For example, the
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language".
Perl was developed ...
interpreter is considered to define the authoritative behavior of Perl programs. In the case of Perl, the
open-source model
Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use and view the source code, design documents, or content of the product. The open source model is a decentrali ...
of software distribution has contributed to the fact that nobody has ever produced another implementation of the language, so the issues involved in using a reference implementation to define the language semantics are moot.
Test suite
Defining the semantics of a programming language in terms of a
test suite
In software development, a test suite, less commonly known as a validation suite, is a collection of test cases that are intended to be used to test a software program to show that it has some specified set of behaviors. A test suite often conta ...
involves writing a number of example programs in the language, and then describing how those programs ought to behave—perhaps by writing down their correct outputs. The programs, plus their outputs, are called the "test suite" of the language. Any correct language implementation must then produce exactly the correct outputs on the test suite programs.
The chief advantage of this approach to semantic description is that it is easy to determine whether a language implementation passes a test suite. The user can simply execute all the programs in the test suite, and compare the outputs to the desired outputs. However, when used by itself, the test suite approach has major drawbacks as well. For example, users want to run their own programs, which are not part of the test suite; indeed, a language implementation that could ''only'' run the programs in its test suite would be largely useless. But a test suite does not, by itself, describe how the language implementation should behave on any program not in the test suite; determining that behavior requires some extrapolation on the implementor's part, and different implementors may disagree. In addition, a test suite cannot easily test behavior that is intended or allowed to be
nondeterministic.
Therefore, in common practice, test suites are used only in combination with one of the other language specification techniques, such as a natural language description or a reference implementation.
External links
Language specifications
A few examples of official or draft language specifications:
*Specifications written primarily in formal mathematics:
The Definition of Standard ML, revised edition– a formal definition in an
operational semantics
Operational semantics is a category of formal programming language semantics in which certain desired properties of a program, such as correctness, safety or security, are verified by constructing proofs from logical statements about its exec ...
style
Scheme R5RS– a formal definition in a
denotational semantics
In computer science, denotational semantics (initially known as mathematical semantics or Scott–Strachey semantics) is an approach of formalizing the meanings of programming languages by constructing mathematical objects (called ''denotations'' ...
style
*Specifications written primarily in natural language:
Algol 60 reportAda 95 reference manualJava language specification*Specifications via test suite:
Ruby's de facto community-driven specification
Notes
{{Reflist
Specification
A specification often refers to a set of documented requirements to be satisfied by a material, design, product, or service. A specification is often a type of technical standard.
There are different types of technical or engineering specificati ...