HOME

TheInfoList



OR:

In
computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming involves tasks such as anal ...
, an S-expression (or symbolic expression, abbreviated as sexpr or sexp) is an expression in a like-named notation for nested
list A ''list'' is any set of items in a row. List or lists may also refer to: People * List (surname) Organizations * List College, an undergraduate division of the Jewish Theological Seminary of America * SC Germania List, German rugby unio ...
(
tree In botany, a tree is a perennial plant with an elongated stem, or trunk, usually supporting branches and leaves. In some usages, the definition of a tree may be narrower, including only woody plants with secondary growth, plants that are ...
-structured) data. S-expressions were invented for and popularized by the programming language
Lisp A lisp is a speech impairment in which a person misarticulates sibilants (, , , , , , , ). These misarticulations often result in unclear speech. Types * A frontal lisp occurs when the tongue is placed anterior to the target. Interdental lispin ...
, which uses them for
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the ...
as well as data. In the usual parenthesized
syntax In linguistics, syntax () is the study of how words and morphemes combine to form larger units such as phrases and sentences. Central concerns of syntax include word order, grammatical relations, hierarchical sentence structure ( constituenc ...
of Lisp, an S-expression is classically definedJohn McCarthy (1960/2006)
Recursive functions of symbolic expressions
. Originally published in
Communications of the ACM ''Communications of the ACM'' is the monthly journal of the Association for Computing Machinery (ACM). It was established in 1958, with Saul Rosen as its first managing editor. It is sent to all ACM members. Articles are intended for readers wi ...
.
as # an atom of the form ''x'', or # an expression of the form (''x'' . ''y'') where ''x'' and ''y'' are S-expressions. This definition reflects LISP's representation of a list as a series of "cells", each one an
ordered pair In mathematics, an ordered pair (''a'', ''b'') is a pair of objects. The order in which the objects appear in the pair is significant: the ordered pair (''a'', ''b'') is different from the ordered pair (''b'', ''a'') unless ''a'' = ''b''. (In con ...
. In plain lists, ''y'' points to the next cell (if any), thus forming a
list A ''list'' is any set of items in a row. List or lists may also refer to: People * List (surname) Organizations * List College, an undergraduate division of the Jewish Theological Seminary of America * SC Germania List, German rugby unio ...
. The recursive clause of the definition means that both this representation and the S-expression notation can represent any
binary tree In computer science, a binary tree is a k-ary k = 2 tree data structure in which each node has at most two children, which are referred to as the ' and the '. A recursive definition using just set theory notions is that a (non-empty) binary t ...
. However, the representation can in principle allow circular references, in which cases the structure is not a tree at all, but a
cyclic graph In mathematics, a cyclic graph may mean a graph that contains a cycle, or a graph that is a cycle, with varying definitions of cycles. See: *Cycle (graph theory), a cycle in a graph * Forest (graph theory), an undirected graph with no cycles *Biconn ...
, and cannot be represented in classical S-expression notation unless a convention for cross-reference is provided (analogous to SQL
foreign key A foreign key is a set of attributes in a table that refers to the primary key of another table. The foreign key links these two tables. Another way to put it: In the context of relational databases, a foreign key is a set of attributes subject to ...
s,
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
IDREFs, etc.). Modern Lisp dialects such as
Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ''ANSI INCITS 226-1994 (S20018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived fr ...
and
Scheme A scheme is a systematic plan for the implementation of a certain idea. Scheme or schemer may refer to: Arts and entertainment * ''The Scheme'' (TV series), a BBC Scotland documentary series * The Scheme (band), an English pop band * ''The Schem ...
provide such syntax via ''datum labels'', with which objects can be marked, which can then recur elsewhere, indicating shared structure rather than object duplication, enabling the reader or printer to detect and thus trigger evaluation or display of cycles without infinitely recursing : #n=(''x'' ''y'' . #n#) The definition of an atom varies per context; in the original definition by John McCarthy, it was assumed that there existed "an infinite set of distinguishable atomic symbols" represented as "strings of capital Latin letters and digits with single embedded blanks" (a subset of character string and numeric
literal Literal may refer to: * Interpretation of legal concepts: ** Strict constructionism ** The plain meaning rule The plain meaning rule, also known as the literal rule, is one of three rules of statutory construction traditionally applied by ...
s). Most modern sexpr notations allow more general quoted strings (for example including punctuation or full
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, ...
), and use an abbreviated notation to represent lists with more than 2 members, so that : (''x'' ''y'' ''z'') stands for : (''x'' . (''y'' . (''z'' . NIL))) NIL is the special end-of-list
object Object may refer to: General meanings * Object (philosophy), a thing, being, or concept ** Object (abstract), an object which does not exist at any particular time or place ** Physical object, an identifiable collection of matter * Goal, an ...
(alternatively written (), which is the only representation in
Scheme A scheme is a systematic plan for the implementation of a certain idea. Scheme or schemer may refer to: Arts and entertainment * ''The Scheme'' (TV series), a BBC Scotland documentary series * The Scheme (band), an English pop band * ''The Schem ...
). In the Lisp family of programming languages, S-expressions are used to represent both source code and data. Other uses of S-expressions are in Lisp-derived languages such as DSSSL, and as mark-up in
communication protocol A communication protocol is a system of rules that allows two or more entities of a communications system to transmit information via any kind of variation of a physical quantity. The protocol defines the rules, syntax, semantics and synchroniza ...
s like
IMAP In computing, the Internet Message Access Protocol (IMAP) is an Internet standard protocol used by email clients to retrieve email messages from a mail server over a TCP/IP connection. IMAP is defined by . IMAP was designed with the goal of per ...
and John McCarthy's CBCL. It's also used as text representation of
WebAssembly WebAssembly (sometimes abbreviated Wasm) defines a portable binary-code format and a corresponding text format for executable programs as well as software interfaces for facilitating interactions between such programs and their host environmen ...
. The details of the syntax and supported
data type In computer science and computer programming, a data type (or simply type) is a set of possible values and a set of allowed operations on it. A data type tells the compiler or interpreter how the programmer intends to use the data. Most progra ...
s vary in the different languages, but the most common feature among these languages is the use of S-expressions and prefix notation.


Datatypes and syntax

There are many variants of the S-expression format, supporting a variety of different syntaxes for different datatypes. The most widely supported are: * ''Lists and pairs'': (1 () (2 . 3) (4)) * ''Symbols'': with-hyphen ?@!$ a\ symbol\ with\ spaces * ''Strings'': "Hello, world!" * ''Integers'': -9876543210 * ''Floating-point numbers'': -0.0 6.28318 6.022e23 The character # is often used to prefix extensions to the syntax, e.g. #x10 for hexadecimal integers, or #\C for characters.


Use in Lisp

When representing source code in Lisp, the first element of an S-expression is commonly an operator or function name and any remaining elements are treated as arguments. This is called "prefix notation" or "
Polish notation Polish notation (PN), also known as normal Polish notation (NPN), Łukasiewicz notation, Warsaw notation, Polish prefix notation or simply prefix notation, is a mathematical notation in which operators ''precede'' their operands, in contrast ...
". As an example, the Boolean expression written 4

(2 + 2)
in C, is represented as (= 4 (+ 2 2)) in Lisp's s-expr-based prefix notation. As noted above, the precise definition of "atom" varies across LISP-like languages. A quoted string can typically contain anything but a quote, while an unquoted identifier atom can typically contain anything but quotes, whitespace characters, parentheses, brackets, braces, backslashes, and semicolons. In either case, a prohibited character can typically be included by escaping it with a preceding backslash.
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, ...
support varies. The recursive case of the s-expr definition is traditionally implemented using cons cells. S-expressions were originally intended only for data to be manipulated by M-expressions, but the first implementation of Lisp was an interpreter of S-expression encodings of M-expressions, and Lisp programmers soon became accustomed to using S-expressions for both code and data. This means that Lisp is
homoiconic In computer programming, homoiconicity (from the Greek words ''homo-'' meaning "the same" and ''icon'' meaning "representation") is a property of some programming languages. A language is homoiconic if a program written in it can be manipulated ...
; that is, the primary representation of programs is also a data structure in a primitive type of the language itself.


Examples of data S-expressions

Nested lists can be written as S-expressions: ((milk juice) (honey marmalade)) is a two-element S-expression whose elements are also two-element S-expressions. The whitespace-separated notation used in Lisp (and this article) is typical. Line breaks (newline characters) usually qualify as separators. This is a simple
context-free grammar In formal language theory, a context-free grammar (CFG) is a formal grammar whose production rules are of the form :A\ \to\ \alpha with A a ''single'' nonterminal symbol, and \alpha a string of terminals and/or nonterminals (\alpha can be em ...
for a tiny subset of English written as an S-expression (Gazdar/Melish, Natural Language Processing in Lisp), where S=sentence, NP=Noun Phrase, VP=Verb Phrase, V=Verb: (((S) (NP VP)) ((VP) (V)) ((VP) (V NP)) ((V) died) ((V) employed) ((NP) nurses) ((NP) patients) ((NP) Medicenter) ((NP) "Dr Chan"))


Example of source code S-expressions

Program code can be written in S-expressions, usually using prefix notation. Example in
Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ''ANSI INCITS 226-1994 (S20018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived fr ...
: (defun factorial (x) (if (zerop x) 1 (* x (factorial (- x 1))))) S-expressions can be read in Lisp using the function READ. READ reads the textual representation of an S-expression and returns Lisp data. The function PRINT can be used to output an S-expression. The output then can be read with the function READ, when all printed data objects have a readable representation. Lisp has readable representations for numbers, strings, symbols, lists and many other data types. Program code can be formatted as pretty printed S-expressions using the function PPRINT (note: with two Ps, short for ''pretty''-print). Lisp programs are valid S-expressions, but not all S-expressions are valid Lisp programs. (1.0 + 3.1) is a valid S-expression, but not a valid Lisp program, since Lisp uses prefix notation and a floating point number (here 1.0) is not valid as an operation (the first element of the expression). An S-expression preceded by a single quotation mark, as in 'x, is
syntactic sugar In computer science, syntactic sugar is syntax within a programming language that is designed to make things easier to read or to express. It makes the language "sweeter" for human use: things can be expressed more clearly, more concisely, or in an ...
for a quoted S-expression, in this case (quote x).


Parsing

S-expressions are often compared to
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
: the key difference is that S-expressions have just one form of containment, the dotted pair, while XML tags can contain simple attributes, other tags, or CDATA, each using different syntax. For simple use cases, S-expressions are simpler than XML, but for more advanced use cases, XML has a query language so called
XPath XPath (XML Path Language) is an expression language designed to support the query or transformation of XML documents. It was defined by the World Wide Web Consortium (W3C) and can be used to compute values (e.g., strings, numbers, or Boolean v ...
, many tools and third party libraries to simplify the handling of XML data.


Standardization

Standards for some Lisp-derived programming languages include a specification for their S-expression syntax. These include
Common Lisp Common Lisp (CL) is a dialect of the Lisp programming language, published in ANSI standard document ''ANSI INCITS 226-1994 (S20018)'' (formerly ''X3.226-1994 (R1999)''). The Common Lisp HyperSpec, a hyperlinked HTML version, has been derived fr ...
(ANSI standard document ANSI INCITS 226-1994 (R2004)),
Scheme A scheme is a systematic plan for the implementation of a certain idea. Scheme or schemer may refer to: Arts and entertainment * ''The Scheme'' (TV series), a BBC Scotland documentary series * The Scheme (band), an English pop band * ''The Schem ...
(R5RS and R6RS), and ISLISP.


Rivest's variant

In May 1997, Ron Rivest submitted an
Internet Draft An Internet Draft (I-D) is a document published by the Internet Engineering Task Force (IETF) containing preliminary technical specifications, results of networking-related research, or other technical information. Often, Internet Drafts are int ...
to be considered for publication as an RFC. The draft defined a syntax based on Lisp S-expressions but intended for general-purpose data storage and exchange (similar to
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. T ...
) rather than specifically for programming. It was never approved as an RFC, but it has since been cited and used by other RFCs (e.g. RFC 2693) and several other publications. It was originally intended for use in
SPKI Simple public key infrastructure (SPKI, pronounced ''spoo-key'') was an attempt to overcome the complexity of traditional X.509 public key infrastructure. It was specified in two Internet Engineering Task Force (IETF) Request for Comments (RFC) ...
. Rivest's format defines an S-expression as being either an octet-string (a series of
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
s) or a finite list of other S-expressions. It describes three interchange formats for expressing this structure. One is the "advanced transport", which is very flexible in terms of formatting, and is syntactically similar to Lisp-style expressions, but they are not identical. The advanced transport, for example, allows octet-strings to be represented verbatim (the string's length followed by a colon and the entire raw string), a quoted form allowing escape characters,
hexadecimal In mathematics and computing, the hexadecimal (also base-16 or simply hex) numeral system is a positional numeral system that represents numbers using a radix (base) of 16. Unlike the decimal system representing numbers using 10 symbols, he ...
, Base64, or placed directly as a "token" if it meets certain conditions. (Rivest's tokens differ from Lisp tokens in that the former are just for convenience and aesthetics, and treated exactly like other strings, while the latter have specific syntactical meaning.) Rivest's draft defines a canonical representation "for digital signature purposes". It's intended to be compact, easier to parse, and unique for any abstract S-expression. It only allows verbatim strings, and prohibits whitespace as formatting outside strings. Finally there is the "basic transport representation", which is either the canonical form or the same encoded as Base64 and surrounded by braces, the latter intended to safely transport a canonically encoded S-expression in a system which might change spacing (e.g. an email system which has 80-character-wide lines and wraps anything longer than that). This format has not been widely adapted for use outside of SPKI (some of the users being GnuPG, libgcrypt, Nettle, and
GNU GNU () is an extensive collection of free software (383 packages as of January 2022), which can be used as an operating system or can be used in parts with other operating systems. The use of the completed GNU tools led to the family of operat ...
lsh). Rivest's S-expressions web page provides C source code for a parser and generator (available under the
MIT license The MIT License is a permissive free software license originating at the Massachusetts Institute of Technology (MIT) in the late 1980s. As a permissive license, it puts only very limited restriction on reuse and has, therefore, high license comp ...
), which could be adapted and embedded into other programs. In addition, there are no restrictions on independently implementing the format.


See also

* cons * CAR and CDR *
Fexpr In Lisp programming languages, a fexpr is a function whose operands are passed to it without being evaluated. When a fexpr is called, only the body of the fexpr is evaluated; no other evaluations take place except when explicitly initiated by the fe ...
*
Lambda calculus Lambda calculus (also written as ''λ''-calculus) is a formal system in mathematical logic for expressing computation based on function abstraction and application using variable binding and substitution. It is a universal model of computation th ...
* M-expression *
Canonical S-expressions A Canonical S-expression (or csexp) is a binary encoding form of a subset of general S-expression (or sexp). It was designed for use in SPKI to retain the power of S-expressions and ensure canonical form for applications such as digital signatures ...
* Comparison of data serialization formats


References


External links


sfsexp
the small, fast S-expression library for C/C++ on GitHub
minilisp
by Léon Bottou.
S-expressions on Rosettacode
has implementations of readers and writers in many languages. {{Lisp programming language Lisp (programming language) Data serialization formats