SNOBOL ("StriNg Oriented and symBOlic Language") is a series of programming languages developed between 1962 and 1967 at AT&T Bell Laboratories by David J. Farber, Ralph E. Griswold and Ivan P. Polonsky, culminating in SNOBOL4. It was one of a number of text-string-oriented languages developed during the 1950s and 1960s; others included COMIT and TRAC.
SNOBOL4 stands apart from most programming languages of its era by having patterns as a first-class data type (''i.e.'' a data type whose values can be manipulated in all ways permitted to any other data type in the programming language) and by providing operators for pattern concatenation and alternation. SNOBOL4 patterns are a type of object and admit various manipulations, much like later object-oriented languages such as JavaScript, whose patterns are known as regular expressions. In addition, SNOBOL4 strings generated during execution can be treated as programs and either interpreted or compiled and executed (as in the eval function of other languages).
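As a minimal sketch of what first-class patterns and run-time evaluation look like in practice, the fragment below stores patterns in variables, combines them with alternation (the | operator) and concatenation (juxtaposition), and then evaluates a string as an expression with EVAL. The variable names (VOWEL, DIGIT, TOKEN, PAIR, FOUND) and the sample strings are illustrative assumptions, not taken from any particular SNOBOL4 program.

```snobol
* Patterns are ordinary values: build them, name them, combine them.
          VOWEL = ANY('AEIOU')
          DIGIT = ANY('0123456789')
* Alternation: a TOKEN is either a vowel or a digit.
          TOKEN = VOWEL | DIGIT
* Concatenation: a PAIR is two tokens in a row.
          PAIR = TOKEN TOKEN
* Match the stored pattern; on success the matched text goes to FOUND.
          'X7A9' PAIR . FOUND                          :F(NONE)
          OUTPUT = 'Matched: ' FOUND
* A string built at run time can be evaluated as an expression.
          OUTPUT = 'Evaluated: ' EVAL('2 + 3 * 4')     :(END)
NONE      OUTPUT = 'No match'
END
```

Because PAIR is built from TOKEN, which is in turn built from VOWEL and DIGIT, the example also shows that a pattern can be used as a component of a larger pattern like any other value.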
SNOBOL4 was quite widely taught in larger US universities in the late 1960s and early 1970s and was widely used in the 1970s and 1980s as a text manipulation language in the humanities.
In the 1980s and 1990s its use faded as newer languages such as AWK and Perl made string manipulation by means of regular expressions fashionable. SNOBOL4 patterns subsume BNF grammars, which are equivalent to context-free grammars and more powerful than regular expressions. The "regular expressions" in current versions of AWK and Perl are in fact extensions of regular expressions in the traditional sense, but regular expressions, unlike SNOBOL4 patterns, are not recursive, which gives a distinct computational advantage to SNOBOL4 patterns. (Recursive expressions did appear in Perl 5.10, though, released in December 2007.)
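The practical meaning of recursion in patterns can be sketched with the language of strings consisting of some number of A's followed by the same number of B's, a standard example of a context-free language that no traditional regular expression can describe. In the fragment below, the pattern name ANBN and the test string are hypothetical choices made for illustration; the unary * operator defers evaluation of ANBN until match time, which is what allows the pattern to refer to itself.

```snobol
* ANBN refers to itself through the unevaluated expression *ANBN,
* which is only looked up when the match reaches that point.
          ANBN = 'AB' | 'A' *ANBN 'B'
          TEST = 'AAABBB'
* POS(0) and RPOS(0) force the pattern to span the whole string.
          TEST POS(0) ANBN RPOS(0)                     :S(YES)F(NO)
YES       OUTPUT = TEST ' has the form A^n B^n'        :(END)
NO        OUTPUT = TEST ' does not have that form'
END
```

The definition reads almost exactly like the corresponding BNF rule, with one alternative for the base case and one for the recursive case.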
The later SL5 (1977) and Icon (1978) languages were designed by Griswold to combine the backtracking of SNOBOL4 pattern matching with more standard ALGOL-like structuring.
Development
SNOBOL1
The initial SNOBOL language was created as a tool to be used by its authors to work with the symbolic manipulation of polynomials. It was written in assembly language for the IBM 7090. It had a simple syntax, only one datatype (the string), no functions, no declarations, and very little error control. However, despite its simplicity and its "personal" nature, its use began to spread to other groups. As a result, the authors decided to extend it and tidy it up.
SNOBOL2
SNOBOL2 did exist but it was a short-lived intermediate development version without user-defined functions and was never released.
SNOBOL3
SNOBOL was rewritten to add functions, both standard and user-defined, and the result was released as SNOBOL3. SNOBOL3 became quite popular and was rewritten by other programmers for computers other than the IBM 7090. As a result, several incompatible dialects arose.
SNOBOL4
As SNOBOL3 became more popular, the authors received more and more requests for extensions to the language. They also began to receive complaints about incompatibility and bugs in versions that they hadn't written. To address this and to take advantage of the new computers being introduced in the late 1960s, the decision was taken to develop SNOBOL4 with many extra datatypes and features but based on a virtual machine to allow improved portability across computers. The SNOBOL4 language translator was still written in assembly language. However, the macro features of the assembler were used to define the virtual machine instructions of the SNOBOL Implementation Language (SIL). This greatly improved the portability of the language: the virtual machine that hosted the translator could be ported relatively easily by recreating its virtual instructions on any machine that had a macro assembler or, indeed, a high-level language.
The machine-independent language SIL arose as a generalization of string manipulation macros by Douglas McIlroy, which were used extensively in the initial SNOBOL implementation. In 1969, McIlroy influenced the language again by insisting on addition of the table type to SNOBOL4.
SNOBOL4 features
SNOBOL is distinctive in format and programming style, which are radically different from contemporary procedural languages such as Fortran and ALGOL.
SNOBOL4 supports a number of built-in data types, such as integers and limited precision real numbers, strings, patterns, arrays, and tables (associative arrays), and also allows the programmer to define additional data types and new functions. SNOBOL4's programmer-defined data type facility was advanced at the time; it is similar to the records of the earlier COBOL and the later Pascal programming languages.
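A brief sketch of the programmer-defined data type facility and of tables follows. The type name POINT, its fields X and Y, and the variables P and T are hypothetical names chosen for illustration; the years stored in the table are those given elsewhere in this article.

```snobol
* Define a record-like type POINT with fields X and Y.
          DATA('POINT(X,Y)')
          P = POINT(3,4)
* Field names act as accessor functions.
          OUTPUT = 'X is ' X(P) ', Y is ' Y(P)
* A table is an associative array indexed by arbitrary values.
          T = TABLE()
          T<'SNOBOL4'> = 1967
          T<'ICON'> = 1978
          OUTPUT = 'Icon appeared in ' T<'ICON'>
END
```

The DATA call creates both a constructor (POINT) and one accessor function per field, which is the sense in which the facility resembles record declarations in COBOL and Pascal.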
All SNOBOL command lines are of the form
:''label subject pattern'' = ''object'' : ''transfer''
Each of the five elements is optional. In general, the ''subject'' is matched against the ''pattern''. If the ''object'' is present, any matched portion is replaced by the ''object'' via rules for replacement. The ''transfer'' can be an absolute branch or a conditional branch dependent upon the success or failure of the subject evaluation, the pattern evaluation, the pattern match, the object evaluation or the final assignment. It can also be a transfer to code created and compiled by the program itself during a run.
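To make the statement form concrete, the following sketch uses all five elements in a single line. The label LOOP and the variable TEXT are illustrative assumptions. The labelled statement matches a single blank in TEXT, replaces it with a hyphen, and branches back to itself for as long as the match succeeds.

```snobol
          TEXT = 'STRING ORIENTED SYMBOLIC LANGUAGE'
* label   subject pattern = object       transfer
LOOP      TEXT    ' '     = '-'          :S(LOOP)
          OUTPUT = TEXT
END
```

When no blank remains, the pattern match fails, the conditional transfer is not taken, and control falls through to the next statement, which prints the hyphenated string.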
A SNOBOL pattern can be very simple or extremely complex. A simple pattern is just a text string (e.g. "ABCD"), but a complex pattern may be a large structure describing, for example, the complete grammar of a computer language. It is possible to implement a language interpreter in SNOBOL almost directly from a Backus–Naur form expression of it, with few changes. Creating a macro assembler and an interpreter for a completely theoretical piece of hardware could take as little as a few hundred lines, with a new instruction being added with a single line.
Complex SNOBOL patterns can do things that would be impractical or impossible using the more primitive regular expressions used in most other pattern-matching languages. Some of this power derives from the so-called "SPITBOL extensions" (which have since been incorporated in essentially all modern implementations of the original SNOBOL4 language), although it is possible to achieve the same power without them. Part of this power comes from the side effects that can be produced during the pattern matching operation: numerous intermediate or tentative matching results can be saved, and user-written functions can be invoked during the match to perform nearly any desired processing and then influence the direction the interrupted match takes, or even change the pattern itself during the matching operation. Patterns can be saved like any other first-class data item, and can be concatenated, used within other patterns, and used to create very complex and sophisticated pattern expressions. It is possible to write, for example, a SNOBOL4 pattern which matches "a complete name and international postal mailing address", which is well beyond anything that is practical to even attempt using regular expressions.
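One of the simplest side effects available during matching is conditional assignment, which saves pieces of the matched text into variables. The sketch below splits a line of the form KEY=VALUE into its two parts; the variable names LINE, KEY and VAL, the label BAD, and the sample input are hypothetical and chosen only for illustration.

```snobol
          LINE = 'COLOR=BLUE'
* BREAK('=') matches up to, but not including, the first '='.
* REM matches the remainder of the subject.
* The . operator assigns each matched piece once the whole match succeeds.
          LINE BREAK('=') . KEY '=' REM . VAL          :F(BAD)
          OUTPUT = 'Key: ' KEY ', value: ' VAL         :(END)
BAD       OUTPUT = 'Line is not of the form KEY=VALUE'
END
```

The related $ operator performs the assignment immediately, even if the overall match later fails, which is one way tentative results can be captured during a match.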
SNOBOL4 pattern-matching uses a backtracking algorithm similar to that used in the logic programming language Prolog, which provides pattern-like constructs via DCGs. This algorithm makes it easier to use SNOBOL as a logic programming language than is the case for most languages.
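Backtracking can be observed in a very small example. In the sketch below (the variable FIRST and the label NO are illustrative names), the first alternative matches but leads to a dead end, so the matcher returns to the alternation and tries the second alternative.

```snobol
* Anchor matching at the start so only backtracking, not scanning, is at work.
          &ANCHOR = 1
* 'A' is tried first; 'C' then fails against 'B', so the matcher backs up
* and retries with the second alternative, 'AB'.
          'ABC' ('A' | 'AB') . FIRST 'C'               :F(NO)
          OUTPUT = 'First part matched: ' FIRST        :(END)
NO        OUTPUT = 'No match'
END
```

Because conditional assignment takes effect only after the whole match succeeds, FIRST receives the text of the alternative that finally worked rather than the one that was abandoned.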
SNOBOL stores variables, strings and data structures in a single garbage-collected heap.
Example programs
The "Hello, World!" program might be as follows...
          OUTPUT = "Hello, World!"
END
A simple program to ask for a user's name and then use it in an output sentence...
          OUTPUT = "What is your name?"
          Username = INPUT
          OUTPUT = "Thank you, " Username
END
To choose between three possible outputs...
          OUTPUT = "What is your name?"
          Username = INPUT
          Username "J"                                             :S(LOVE)
          Username "K"                                             :S(HATE)
MEH       OUTPUT = "Hi, " Username                                 :(END)
LOVE      OUTPUT = "How nice to meet you, " Username               :(END)
HATE      OUTPUT = "Oh. It's you, " Username
END
To continue requesting input until no more is forthcoming...
          OUTPUT = "This program will ask you for personal names"
          OUTPUT = "until you press return without giving it one"
          NameCount = 0                                            :(GETINPUT)
AGAIN     NameCount = NameCount + 1
          OUTPUT = "Name " NameCount ": " PersonalName
GETINPUT  OUTPUT = "Please give me name " NameCount + 1
          PersonalName = INPUT
          PersonalName LEN(1)                                      :S(AGAIN)
          OUTPUT = "Finished. " NameCount " names requested."
END
Implementations
The classic implementation was on the PDP-10; it has been used to study compilers, formal grammars, and artificial intelligence, especially machine translation and machine comprehension of natural languages. The original implementation was on an IBM 7090 at Bell Labs, Holmdel, N.J. SNOBOL4 was specifically designed for portability; the first implementation was started on an IBM 7094 in 1966 but completed on an IBM 360 in 1967. It was rapidly ported to many other platforms.
It is normally implemented as an interpreter because of the difficulty in implementing some of its very high-level features, but there is a compiler, the SPITBOL compiler, which provides nearly all the facilities that the interpreter provides.
The classic implementation on the PDP-10 was quite slow, and in 1972 James Gimpel of Bell Labs, Holmdel, N.J. designed a native implementation of SNOBOL4 for the PDP-10 that he named SITBOL. He used the design as the basis of a graduate class in string processing that he taught that year at Stevens Institute of Technology (which is why it was named SITBOL). Students were given sections to implement (in PDP-10 assembler) and the entire semester was focused on implementing SITBOL. It was over 80% complete by the end of the semester and was subsequently completed by Professor Gimpel and several students over the summer. SITBOL was a full-featured, high-performance SNOBOL4 interpreter.
The GNAT Ada compiler comes with a package (GNAT.Spitbol) that implements all of the SPITBOL string manipulation semantics. This can be called from within an Ada program.
The file editor for the Michigan Terminal System (MTS) provided pattern matching based on SNOBOL4 patterns.
Several implementations are currently available. Macro SNOBOL4 in C written by Phil Budne is a free, open source implementation, capable of running on almost any platform. Catspaw, Inc provided a commercial implementation of the SNOBOL4 language for many different computer platforms, including DOS, Macintosh, Sun, RS/6000, and others, and these implementations are now available free from Catspaw. Minnesota SNOBOL4, by Viktors Berstis, the closest PC implementation to the original IBM mainframe version (even including Fortran-like FORMAT statement support) is also free.
Although SNOBOL itself has no structured programming features, a SNOBOL preprocessor called Snostorm was designed and implemented during the 1970s by Fred G. Swartz for use under the Michigan Terminal System (MTS) at the University of Michigan ("SNOSTORM", ''MTS Volume 9: SNOBOL4 in MTS'', Computing Center, University of Michigan, June 1979, pages 99-120; retrieved 1 September 2014). Snostorm was used at the eight to fifteen sites that ran MTS. It was also available at University College London (UCL) between 1982 and 1984.
Snocone by Andrew Koenig adds block-structured constructs to the SNOBOL4 language. Snocone is a self-contained programming language, rather than a proper superset of SNOBOL4.
The SPITBOL implementation also introduced a number of features which, while not using traditional structured programming keywords, nevertheless can be used to provide many of the equivalent capabilities normally thought of as "structured programming", most notably nested if/then/else type constructs. These features have since been added to most recent SNOBOL4 implementations. After many years as a commercial product, in April 2009 SPITBOL was released as free software under the GNU General Public License.
Naming
According to Dave Farber, he, Griswold and Polonsky "finally arrived at the name Symbolic EXpression Interpreter SEXI."
Common backronyms of "SNOBOL" are 'String Oriented Symbolic Language' or (as a quasi-initialism) 'StriNg Oriented symBOlic Language'.
See also
* Icon (programming language)
* Snowball (programming language)
* Snostorm
* SPITBOL
* Unicon (programming language)
External links
* CSNOBOL4 is a free and open source BSD-licensed port of the original Bell Labs SNOBOL4 to systems with a C compiler, and includes SPITBOL and Blocks enhancements.
* Catspaw, Inc. offers implementations of and commercial support for SNOBOL4.
* Try It Online (Snobol4/CSNOBOL) is an online compiler that gives a brief taste of what SNOBOL4 is about.