literate programming
   HOME

TheInfoList



OR:

Literate programming is a
programming paradigm Programming paradigms are a way to classify programming languages based on their features. Languages can be classified into multiple paradigms. Some paradigms are concerned mainly with implications for the execution model of the language, suc ...
introduced in 1984 by
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sc ...
in which a
computer program A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components. A computer program ...
is given as an explanation of its logic in a
natural language In neuropsychology, linguistics, and philosophy of language, a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages ...
, such as English, interspersed (embedded) with snippets of macros and traditional
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the wo ...
, from which compilable source code can be generated. The approach is used in
scientific computing Computational science, also known as scientific computing or scientific computation (SC), is a field in mathematics that uses advanced computing capabilities to understand and solve complex problems. It is an area of science that spans many disc ...
and in
data science Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured and unstructured data, and apply knowledge from data across a br ...
routinely for
reproducible research Reproducibility, also known as replicability and repeatability, is a major principle underpinning the scientific method. For the findings of a study to be reproducible means that results obtained by an experiment or an observational study or in a ...
and
open access Open access (OA) is a set of principles and a range of practices through which research outputs are distributed online, free of access charges or other barriers. With open access strictly defined (according to the 2001 definition), or libre op ...
purposes. Literate programming tools are used by millions of programmers today. The literate programming paradigm, as conceived by Donald Knuth, represents a move away from writing computer programs in the manner and order imposed by the computer, and instead gives programmers macros to develop programs in the order demanded by the logic and flow of their thoughts. Literate programs are written as an exposition of logic in more
natural language In neuropsychology, linguistics, and philosophy of language, a natural language or ordinary language is any language that has evolved naturally in humans through use and repetition without conscious planning or premeditation. Natural languages ...
in which macros are used to hide abstractions and traditional
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the wo ...
, more like the text of an
essay An essay is, generally, a piece of writing that gives the author's own argument, but the definition is vague, overlapping with those of a letter, a paper, an article, a pamphlet, and a short story. Essays have been sub-classified as formal a ...
. Literate programming (LP) tools are used to obtain two representations from a source file: one understandable by a compiler or interpreter, the "tangled" code, and another for viewing as formatted documentation, which is said to be "woven" from the literate source.If one remembers that the first version of the tool was called WEB, the amusing literary reference hidden by Knuth in these names becomes obvious: "Oh, what a tangled web we weave when first we practise to deceive" –
Sir Walter Scott Sir Walter Scott, 1st Baronet (15 August 1771 – 21 September 1832), was a Scottish novelist, poet, playwright and historian. Many of his works remain classics of European and Scottish literature, notably the novels '' Ivanhoe'', '' Rob Roy' ...
, in Canto VI, Stanza 17 of '' Marmion'' (1808) an epic poem about the
Battle of Flodden The Battle of Flodden, Flodden Field, or occasionally Branxton, (Brainston Moor) was a battle fought on 9 September 1513 during the War of the League of Cambrai between the Kingdom of England and the Kingdom of Scotland, resulting in an English ...
in 1513. – the actual citation appeared as an epigraph in a May 1986 article by Jon Bentley and Donald Knuth in one of the classical Programming Pearls columns in Communications of the ACM, vol 29 num 5 on p.365
While the first generation of literate programming tools were
computer language A computer language is a formal language used to communicate with a computer. Types of computer languages include: * Construction language – all forms of communication by which a human can specify an executable problem solution to a compu ...
-specific, the later ones are
language-agnostic Language-agnostic programming or scripting (also called language-neutral, language-independent, or cross-language) is a software paradigm in which no particular language is promoted. In introductory instruction, the term refers to teaching princip ...
and exist beyond the individual programming languages.


History and philosophy

Literate programming was first introduced in 1984 by Donald Knuth, who intended it to create programs that were suitable literature for human beings. He implemented it at
Stanford University Stanford University, officially Leland Stanford Junior University, is a private research university in Stanford, California. The campus occupies , among the largest in the United States, and enrolls over 17,000 students. Stanford is consider ...
as a part of his research on
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
s and digital
typography Typography is the art and technique of arranging type to make written language legible, readable and appealing when displayed. The arrangement of type involves selecting typefaces, point sizes, line lengths, line-spacing ( leading), and ...
. The implementation was called " WEB" since he believed that it was one of the few three-letter words of English that had not yet been applied to computing. However, it resembles the complicated nature of software delicately pieced together from simple materials. The practice of literate programming has seen an important resurgence in the 2010s with the use of computational notebooks, especially in
data science Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured and unstructured data, and apply knowledge from data across a br ...
.


Concept

Literate programming is writing out the program logic in a human language with included (separated by a primitive markup) code snippets and macros. Macros in a literate source file are simply title-like or explanatory phrases in a human language that describe human abstractions created while solving the programming problem, and hiding chunks of code or lower-level macros. These macros are similar to the
algorithm In mathematics and computer science, an algorithm () is a finite sequence of rigorous instructions, typically used to solve a class of specific Computational problem, problems or to perform a computation. Algorithms are used as specificat ...
s in
pseudocode In computer science, pseudocode is a plain language description of the steps in an algorithm or another system. Pseudocode often uses structural conventions of a normal programming language, but is intended for human reading rather than machine re ...
typically used in teaching
computer science Computer science is the study of computation, automation, and information. Computer science spans theoretical disciplines (such as algorithms, theory of computation, information theory, and automation) to Applied science, practical discipli ...
. These arbitrary explanatory phrases become precise new operators, created on the fly by the programmer, forming a ''meta-language'' on top of the underlying programming language. A
preprocessor In computer science, a preprocessor (or precompiler) is a program that processes its input data to produce output that is used as input in another program. The output is said to be a preprocessed form of the input data, which is often used by so ...
is used to substitute arbitrary hierarchies, or rather "interconnected 'webs' of macros", to produce the compilable source code with one command ("tangle"), and documentation with another ("weave"). The preprocessor also provides an ability to write out the content of the macros and to add to already created macros in any place in the text of the literate program source file, thereby disposing of the need to keep in mind the restrictions imposed by traditional programming languages or to interrupt the flow of thought.


Advantages

According to Knuth, literate programming provides higher-quality programs, since it forces programmers to explicitly state the thoughts behind the program, making poorly thought-out design decisions more obvious. Knuth also claims that literate programming provides a first-rate documentation system, which is not an add-on, but is grown naturally in the process of exposition of one's thoughts during a program's creation. The resulting documentation allows the author to restart their own thought processes at any later time, and allows other programmers to understand the construction of the program more easily. This differs from traditional documentation, in which a programmer is presented with source code that follows a compiler-imposed order, and must decipher the thought process behind the program from the code and its associated comments. The meta-language capabilities of literate programming are also claimed to facilitate thinking, giving a higher "bird's eye view" of the code and increasing the number of concepts the mind can successfully retain and process. Applicability of the concept to programming on a large scale, that of commercial-grade programs, is proven by an edition of
TeX Tex may refer to: People and fictional characters * Tex (nickname), a list of people and fictional characters with the nickname * Joe Tex (1933–1982), stage name of American soul singer Joseph Arrington Jr. Entertainment * ''Tex'', the Italian ...
code as a literate program. Knuth also claims that literate programming can lead to easy porting of software to multiple environments, and even cites the implementation of TeX as an example.


Contrast with documentation generation

Literate programming is very often misunderstood to refer only to formatted documentation produced from a common file with both source code and comments – which is properly called
documentation generation A documentation generator is a programming tool that generates software documentation intended for programmers (API documentation) or end users (end-user guide), or both, from a set of source code files, and in some cases, binary files. Some genera ...
– or to voluminous commentaries included with code. This is the converse of literate programming: well-documented code or documentation extracted from code follows the structure of the code, with documentation embedded in the code; while in literate programming, code is embedded in documentation, with the code following the structure of the documentation. This misconception has led to claims that comment-extraction tools, such as the
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offici ...
Plain Old Documentation Plain Old Documentation (pod) is a lightweight markup language used to document the Perl programming language as well as Perl modules and programs. Design Pod is designed to be a simple, clean language with just enough syntax to be useful. It pur ...
or
Java Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's List ...
Javadoc Javadoc (originally cased JavaDoc) is a documentation generator created by Sun Microsystems for the Java language (now owned by Oracle Corporation) for generating API documentation in HTML format from Java source code. The HTML format is used for ...
systems, are "literate programming tools". However, because these tools do not implement the "web of abstract concepts" hiding behind the system of natural-language macros, or provide an ability to change the order of the source code from a machine-imposed sequence to one convenient to the human mind, they cannot properly be called literate programming tools in the sense intended by Knuth.


Critique

In 1986, Jon Bentley asked Knuth to demonstrate the concept of literate programming for his ''Programming Pearls'' column in the ''
Communications of the ACM ''Communications of the ACM'' is the monthly journal of the Association for Computing Machinery (ACM). It was established in 1958, with Saul Rosen as its first managing editor. It is sent to all ACM members. Articles are intended for readers with ...
'', by writing a program in WEB. Knuth sent him a program for a problem previously discussed in the column (that of sampling ''M'' random numbers in the range 1..''N''), and also asked for an "assignment". Bentley gave him the problem of finding the ''K'' most common words from a text file, for which Knuth wrote a WEB program that was published together with a review by
Douglas McIlroy Malcolm Douglas McIlroy (born 1932) is a mathematician, engineer, and programmer. As of 2019 he is an Adjunct Professor of Computer Science at Dartmouth College. McIlroy is best known for having originally proposed Unix pipelines and developed se ...
of Bell Labs. McIlroy praised the intricacy of Knuth's solution, his choice of a data structure (a variant of Frank M. Liang's
hash trie In computer science, hash trie can refer to: * Hash tree (persistent data structure), a trie used to map hash values to keys * A space-efficient implementation of a sparse trie, in which the descendants of each node may be interleaved in memory. ( ...
), and the presentation. He criticized some matters of style, such as the fact that the central idea was described late in the paper, the use of magic constants, and the absence of a diagram to accompany the explanation of the data structure. McIlroy, also used the review to critique the programming task itself, pointing out that in Unix (developed at Bell Labs), utilities for text processing ( tr,
sort Sort may refer to: * Sorting, any process of arranging items in sequence or in sets ** Sorting algorithm, any algorithm for arranging elements in lists ** Sort (Unix), a Unix utility which sorts the lines of a file ** Sort (C++), a function in the ...
,
uniq uniq is a utility command (computing), command on Unix, Plan 9 from Bell Labs, Plan 9, Inferno (operating system), Inferno, and Unix-like operating systems which, when fed a text file or Standard streams#Standard input (stdin), standard input, o ...
and
sed sed ("stream editor") is a Unix utility that parses and transforms text, using a simple, compact programming language. It was developed from 1973 to 1974 by Lee E. McMahon of Bell Labs, and is available today for most operating systems. sed w ...
) had been written previously that were "staples", and a solution that was easy to implement, debug and reuse could be obtained by combining these utilities in a six-line
shell script A shell script is a computer program designed to be run by a Unix shell, a command-line interpreter. The various dialects of shell scripts are considered to be scripting languages. Typical operations performed by shell scripts include file manip ...
. In response, Bentley wrote that: McIlroy later admitted that his critique was unfair, since he criticized Knuth's program on engineering grounds, while Knuth's purpose was only to demonstrate the literate programming technique. In 1987, ''
Communications of the ACM ''Communications of the ACM'' is the monthly journal of the Association for Computing Machinery (ACM). It was established in 1958, with Saul Rosen as its first managing editor. It is sent to all ACM members. Articles are intended for readers with ...
'' published a followup article which illustrated literate programming with a C program that combined artistic approach of Knuth with engineering approach of McIlroy, with a critique by John Gilbert.


Workflow

Implementing literate programming consists of two steps: # Weaving: Generating a comprehensive document about the program and its maintenance. # Tangling: Generating machine executable code Weaving and tangling are done on the same source so that they are consistent with each other.


Example

A classic example of literate programming is the literate implementation of the standard
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and ot ...
wc word counting program. Knuth presented a
CWEB Web is a computer programming system created by Donald E. Knuth as the first implementation of what he called "literate programming": the idea that one could create software as works of literature, by embedding source code inside descriptive t ...
version of this example in Chapter 12 of his ''Literate Programming'' book. The same example was later rewritten for the
noweb Noweb, stylised in lowercase as noweb, is a literate programming tool, created in 1989–1999 by Norman Ramsey, and designed to be simple, easily extensible and language independent. As in WEB and CWEB, the main components of Noweb are two prog ...
literate programming tool. This example provides a good illustration of the basic elements of literate programming.


Creation of macros

The following snippet of the wc literate program shows how arbitrary descriptive phrases in a natural language are used in a literate program to create macros, which act as new "operators" in the literate programming language, and hide chunks of code or other macros. The mark-up notation consists of double angle brackets ("<<...>>") that indicate macros, the "@" symbol which indicates the end of the code section in a noweb file. The "<<*>>" symbol stands for the "root", topmost node the literate programming tool will start expanding the web of macros from. Actually, writing out the expanded source code can be done from any section or subsection (i.e. a piece of code designated as "<>=", with the equal sign), so one literate program file can contain several files with machine source code. The purpose of wc is to count lines, words, and/or characters in a list of files. The number of lines in a file is ......../more explanations/ Here, then, is an overview of the file wc.c that is defined by the noweb program wc.nw: <<*>>= <
> <> <> <> <> @ We must include the standard I/O definitions, since we want to send formatted output to stdout and stderr. <
>= #include @ The unraveling of the chunks can be done in any place in the literate program text file, not necessarily in the order they are sequenced in the enclosing chunk, but as is demanded by the logic reflected in the explanatory text that envelops the whole program.


Program as a web

Macros are not the same as "section names" in standard documentation. Literate programming macros hide the real code behind themselves, and be used inside any low-level machine language operators, often inside logical operators such as "if", "while" or "case". This can be seen in the following wc literate program. The present chunk, which does the counting, was actually one of the simplest to write. We look at each character and change state if it begins or ends a word. <>= while (1) @ The macros stand for any chunk of code or other macros, and are more general than top-down or bottom-up "chunking", or than subsectioning. Donald Knuth said that when he realized this, he began to think of a program as a ''web'' of various parts.


Order of human logic, not that of the compiler

In a noweb literate program besides the free order of their exposition, the chunks behind macros, once introduced with "<<...>>=", can be grown later in any place in the file by simply writing "<>=" and adding more content to it, as the following snippet illustrates ("plus" is added by the document formatter for readability, and is not in the code). The grand totals must be initialized to zero at the beginning of the program. If we made these variables local to main, we would have to do this initialization explicitly; however, C globals are automatically zeroed. (Or rather,``statically zeroed.'' (Get it?) <>+= long tot_word_count, tot_line_count, tot_char_count; /* total number of words, lines, chars */ @


Record of the train of thought

The documentation for a literate program is produced as part of writing the program. Instead of comments provided as side notes to source code a literate program contains the explanation of concepts on each level, with lower level concepts deferred to their appropriate place, which allows for better communication of thought. The snippets of the literate wc above show how an explanation of the program and its source code are interwoven. Such exposition of ideas creates the flow of thought that is like a literary work. Knuth wrote a "novel" which explains the code of the interactive fiction game
Colossal Cave Adventure ''Colossal Cave Adventure'' (also known as ''Adventure'' or ''ADVENT'') is a text-based adventure game, released in 1976 by developer Will Crowther for the PDP-10 mainframe computer. It was expanded upon in 1977 by Don Woods. In the game, the ...
.


Remarkable examples

*
Axiom An axiom, postulate, or assumption is a statement that is taken to be true, to serve as a premise or starting point for further reasoning and arguments. The word comes from the Ancient Greek word (), meaning 'that which is thought worthy or f ...
, which is evolved from scratchpad, a computer algebra system developed by IBM. It is now being developed by Tim Daly, one of the developers of scratchpad, Axiom is totally written as a literate program.


Literate programming practices

The first published literate programming environment was WEB, introduced by Knuth in 1981 for his
TeX Tex may refer to: People and fictional characters * Tex (nickname), a list of people and fictional characters with the nickname * Joe Tex (1933–1982), stage name of American soul singer Joseph Arrington Jr. Entertainment * ''Tex'', the Italian ...
typesetting system; it uses Pascal as its underlying programming language and TeX for typesetting of the documentation. The complete commented TeX source code was published in Knuth's ''TeX: The program'', volume B of his 5-volume ''
Computers and Typesetting ''Computers and Typesetting'' is a 5-volume set of books by Donald Knuth published in 1986 describing the TeX and Metafont systems for digital typography. Knuth's computers and typesetting project was the result of his frustration with the lac ...
''. Knuth had privately used a literate programming system called DOC as early as 1979. He was inspired by the ideas of Pierre-Arnoul de Marneffe. The free
CWEB Web is a computer programming system created by Donald E. Knuth as the first implementation of what he called "literate programming": the idea that one could create software as works of literature, by embedding source code inside descriptive t ...
, written by Knuth and Silvio Levy, is WEB adapted for C and
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
, runs on most operating systems and can produce TeX and
PDF Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...
documentation. There are various other implementations of the literate programming concept (many of these don't have macros and hence violate the order of human logic principle, which makes them more of semi-literate tools): Other useful tools include: * The Leo text editor is an ''outlining'' editor which supports optional noweb and CWEB markup. The author of Leo mixes two different approaches: first, Leo is an outlining editor, which helps with management of large texts; second, Leo incorporates some of the ideas of literate programming, which in its pure form (i.e., the way it is used by Knuth Web tool or tools like "noweb") is possible only with some degree of inventiveness and the use of the editor in a way not exactly envisioned by its author (in modified @root nodes). However, this and other extensions (@file nodes) make outline programming and text management successful and easy and in some ways similar to literate programming. * The
Haskell Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
programming language has native support for semi-literate programming. The compiler/interpreter supports two file name extensions: .hs and .lhs; the latter stands for literate Haskell. :The literate scripts can be full LaTeX source text, at the same time it can be compiled, with no changes, because the interpreter only compiles the text in a code environment, for example: : % here text describing the function: \begin fact 0 = 1 fact (n+1) = (n+1) * fact n \end here more text :The code can be also marked in the Richard Bird style, starting each line with a greater than symbol and a space, preceding and ending the piece of code with blank lines. :The LaTeX listings package provides a lstlisting environment which can be used to embellish the source code. It can be used to define a code environment to use within Haskell to print the symbols in the following manner: : \newenvironment \begin comp :: (beta -> gamma) -> (alpha -> beta) -> (alpha -> gamma) (g `comp` f) x = g(f x) \end :which can be configured to yield: :: \begin &comp :: (\beta \to \gamma) \to (\alpha \to \beta) \to (\alpha \to \gamma)\\ &(g \operatorname f) x = g(f x) \end :Although the package does not provide means to organize chunks of code, one can split the LaTeX source code in different files. Se
listings manual
for an overview. * The Web 68 Literate Programming system used
Algol 68 ALGOL 68 (short for ''Algorithmic Language 1968'') is an imperative programming language that was conceived as a successor to the ALGOL 60 programming language, designed with the goal of a much wider scope of application and more rigorously d ...
as the underlying programming language, although there was nothing in the pre-processor 'tang' to force the use of that language. * The customization mechanism of the
Text Encoding Initiative The Text Encoding Initiative (TEI) is a text-centric community of practice in the academic field of digital humanities, operating continuously since the 1980s. The community currently runs a mailing list, meetings and conference series, and main ...
which enables the constraining, modification, or extension of the TEI scheme enables users to mix prose documentation with fragments of schema specification in their One Document Does-it-all format. From this prose documentation, schemas, and processing model pipelines can be generated and Knuth's Literate Programming paradigm is cited as the inspiration for this way of working.


See also

*
Documentation generator A documentation generator is a programming tool that generates software documentation intended for programmers ( API documentation) or end users (end-user guide), or both, from a set of source code files, and in some cases, binary files. Some gen ...
– the inverse on literate programming where documentation is embedded in and generated from source code *
Notebook interface A notebook interface (also called a computational notebook) is a virtual notebook environment used for literate programming, a method of writing computer programs. Some notebooks are WYSIWYG environments including executable calculations embedded i ...
– virtual notebook environment used for literate programming *
Sweave Sweave is a function in the statistical programming language R that enables integration of R code into LaTeX or LyX documents. The purpose is "to create dynamic reports, which can be updated automatically if data or analysis change". The data an ...
and
Knitr knitr is an engine for dynamic report generation with R. It is a package in the programming language R that enables integration of R code into LaTeX, LyX, HTML, Markdown, AsciiDoc, and reStructuredText documents. The purpose of knitr is to allow ...
– examples of use of the "noweb"-like Literate Programming tool inside the R language for creation of dynamic statistical reports *
Self-documenting code In computer programming, self-documenting (or self-describing) source code and user interfaces follow naming conventions and structured programming conventions that enable use of the system without prior specific knowledge. In web development, s ...
– source code that can be easily understood without documentation


References


Further reading

* * * (includes software) * * * *


External links


LiterateProgramming
at
WikiWikiWeb The WikiWikiWeb is the first wiki, or user-editable website. It was launched on 25 March 1995 by programmer Ward Cunningham to accompany the Portland Pattern Repository website discussing software design patterns. The name ''WikiWikiWeb'' origi ...

Literate Programming FAQ
at
CTAN CTAN (an acronym for "Comprehensive TeX Archive Network") is the authoritative place where TeX related material and software can be found for download. Repositories for other projects, such as the MiKTeX distribution of TeX, constantly mirror mo ...
{{Donald Knuth navbox Articles with example code Computer-related introductions in 1981 Programming paradigms