In
computer science
Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
, higher-order abstract syntax (abbreviated HOAS) is a technique for the representation of
abstract syntax tree
In computer science, an abstract syntax tree (AST), or just syntax tree, is a tree representation of the abstract syntactic structure of text (often source code) written in a formal language. Each node of the tree denotes a construct occurring ...
s for languages with variable
binders
Ring binders (loose leaf binders, looseleaf binders, or sometimes called files in Britain) are large folders that contain file folders or hole punched papers. These binders come in various sizes and can accommodate an array of paper sizes. The ...
.
Relation to first-order abstract syntax
An abstract syntax is ''abstract'' because it is represented by
mathematical object
A mathematical object is an abstract concept arising in mathematics.
In the usual language of mathematics, an ''object'' is anything that has been (or could be) formally defined, and with which one may do deductive reasoning and mathematical p ...
s that have certain structure by their very nature. For instance, in ''
first-order abstract syntax'' (''FOAS'') trees, as commonly used in
compiler
In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs that ...
s, the tree structure implies the subexpression relation, meaning that no parentheses are required to disambiguate programs (as they are, in the
concrete syntax
A parse tree or parsing tree or derivation tree or concrete syntax tree is an ordered, rooted tree that represents the syntactic structure of a string according to some context-free grammar. The term ''parse tree'' itself is used primarily in com ...
). HOAS exposes additional structure: the relationship between variables and their binding sites. In FOAS representations, a variable is typically represented with an identifier, with the relation between binding site and use being indicated by using the ''same'' identifier. With HOAS, there is no name for the variable; each use of the variable refers directly to the binding site.
There are a number of reasons why this technique is useful. First, it makes the binding structure of a program explicit: just as there is no need to explain operator precedence in a FOAS representation, there is no need to have the rules of binding and scope at hand to interpret a HOAS representation. Second, programs that are
alpha-equivalent (differing only in the names of bound variables) have identical representations in HOAS, which can make equivalence checking more efficient.
Implementation
One mathematical object that could be used to implement HOAS is a
graph
Graph may refer to:
Mathematics
*Graph (discrete mathematics), a structure made of vertices and edges
**Graph theory, the study of such graphs and their properties
*Graph (topology), a topological space resembling a graph in the sense of discre ...
where variables are associated with their binding sites via
edges. Another popular way to implement HOAS (in, for example, compilers) is with
de Bruijn indices.
Use in logic programming
The first programming language which directly supported
λ-bindings in syntax was the higher-order logic programming
language
λProlog.
The paper that introduced the term HOAS
used
λProlog code to illustrate it. Unfortunately, when one transfers the
term HOAS from the logic programming to the functional programming
setting, that term implies the identification of bindings in syntax
with functions over expressions. In this latter setting, HOAS has a
different and problematic sense. The term
λ-tree syntax has been introduced to
refer specifically to the style of representation available in the
logic programming setting.
[
]
While different in detail, the treatment of bindings in λProlog is similar
to their treatment in logical frameworks, elaborated in the next section.
Use in logical frameworks
In the domain of
logical framework In logic, a logical framework provides a means to define (or present) a logic as a signature in a higher-order type theory in such a way that provability of a formula in the original logic reduces to a type inhabitation problem in the framework typ ...
s, the term higher-order abstract syntax is usually used to refer to a specific representation that uses the binders of the
meta-language
In logic and linguistics, a metalanguage is a language used to describe another language, often called the ''object language''. Expressions in a metalanguage are often distinguished from those in the object language by the use of italics, quot ...
to encode the binding structure of the object language.
For instance, the logical framework
LF has a λ-construct, which has arrow
(→) type. As an example, consider we wanted to formalize a very primitive language with untyped expressions, a built-in set of variables, and a let construct (
let = in
), which allows to bind variables
var
with definition
exp
in expressions
exp'
.
In
Twelf Twelf is an implementation of the logical framework LF developed by Frank Pfenning and Carsten Schürmann at Carnegie Mellon University. It is used for logic programming and for the formalization of programming language theory.
Introduction
At i ...
syntax, we could do as follows:
Here,
exp
is the type of all expressions and
var
the type of all built-in variables (implemented perhaps as natural numbers, which is not shown). The constant
v
acts as a casting function and witnesses the fact that variables are expressions. Finally, the constant
let
represents let constructs of the form
let = in
: it accepts a variable, an expression (being bound by the variable), and another expression (that the variable is bound within).
The
canonical
The adjective canonical is applied in many contexts to mean "according to the canon" the standard, rule or primary source that is accepted as authoritative for the body of knowledge or literature in that context. In mathematics, "canonical example ...
HOAS representation of the same object language would be:
In this representation, object level variables do not appear explicitly. The constant
let
takes an expression (that is being bound) and a meta-level function
exp
→
exp
(the body of the let). This function is the ''higher-order'' part: an expression with a free variable is
represented as an expression with ''holes'' that are filled in by the meta-level function when applied. As a concrete example, we would construct the object level expression
(assuming the natural constructors for numbers and addition) using the HOAS signature above as
where
e
is Twelf's syntax for the function
.
This specific representation has advantages beyond the ones above: for one, by reusing the meta-level notion of binding, the encoding enjoys properties such as type-preserving ''substitution'' without the need to define/prove them. In this way using HOAS can drastically reduce the amount of
boilerplate code having to do with binding in an encoding.
Higher-order abstract syntax is generally only applicable when object language variables can be understood as variables in the mathematical sense (that is, as stand-ins for arbitrary members of some domain). This is often, but not always, the case: for instance, there are no advantages to be gained from a HOAS encoding of
dynamic scope as it appears in some dialects of
Lisp
A lisp is a speech impairment in which a person misarticulates sibilants (, , , , , , , ). These misarticulations often result in unclear speech.
Types
* A frontal lisp occurs when the tongue is placed anterior to the target. Interdental lisping ...
because dynamically scoped variables do not act like mathematical variables.
See also
*
Generalized algebraic data type In functional programming, a generalized algebraic data type (GADT, also first-class phantom type, guarded recursive datatype, or equality-qualified type) is a generalization of parametric algebraic data types.
Overview
In a GADT, the product co ...
*
Parametric higher-order abstract syntax (PHOAS)
References
Further reading
*
*
*
*
{{DEFAULTSORT:Higher-Order Abstract Syntax
Type theory
Logic programming
Dependently typed programming
Programming language theory