HOME

TheInfoList



OR:

The structured program theorem, also called the Böhm–Jacopini theorem, is a result in
programming language theory Programming language theory (PLT) is a branch of computer science that deals with the design, implementation, analysis, characterization, and classification of formal languages known as programming languages. Programming language theory is clos ...
. It states that a class of
control-flow graph In computer science, a control-flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution. The control-flow graph was discovered by Frances E. Allen, who noted that ...
s (historically called
flowchart A flowchart is a type of diagram that represents a workflow or process. A flowchart can also be defined as a diagrammatic representation of an algorithm, a step-by-step approach to solving a task. The flowchart shows the steps as boxes of va ...
s in this context) can compute any
computable function Computable functions are the basic objects of study in computability theory. Computable functions are the formalized analogue of the intuitive notion of algorithms, in the sense that a function is computable if there exists an algorithm that can do ...
if it combines subprograms in only three specific ways ( control structures). These are #Executing one subprogram, and then another subprogram (sequence) #Executing one of two subprograms according to the value of a
boolean Any kind of logic, function, expression, or theory based on the work of George Boole is considered Boolean. Related to this, "Boolean" may refer to: * Boolean data type, a form of data with only two possible values (usually "true" and "false" ...
expression (selection) #Repeatedly executing a subprogram as long as a boolean expression is true (iteration) The structured chart subject to these constraints may however use additional variables in the form of
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented ...
s (stored in an extra integer variable in the original proof) in order to keep track of information that the original program represents by the program location. The construction was based on Böhm's programming language
P′′ P′′ (P double prime) is a primitive computer programming language created by Corrado BöhmBöhm, C.: "On a family of Turing machines and the related programming language", ICC Bull. 3, 185-194, July 1964.Böhm, C. and Jacopini, G.: "Flow diag ...
. The theorem forms the basis of
structured programming Structured programming is a programming paradigm aimed at improving the clarity, quality, and development time of a computer program by making extensive use of the structured control flow constructs of selection ( if/then/else) and repetition ( ...
, a programming paradigm which eschews goto commands and exclusively uses subroutines, sequences, selection and iteration.


Origin and variants

The theorem is typically credited to a 1966 paper by
Corrado Böhm Corrado Böhm (17 January 1923 – 23 October 2017) was a Professor Emeritus at the University of Rome "La Sapienza" and a computer scientist known especially for his contributions to the theory of structured programming, constructive mathemati ...
and Giuseppe Jacopini.
David Harel David Harel ( he, דוד הראל; born 12 April 1950) is a computer scientist, currently serving as President of the Israel Academy of Sciences and Humanities. He has been on the faculty of the Weizmann Institute of Science in Israel since 1980, ...
wrote in 1980 that the Böhm–Jacopini paper enjoyed "universal popularity", particularly with proponents of structured programming. Harel also noted that "due to its rather technical style he 1966 Böhm–Jacopini paperis apparently more often cited than read in detail" and, after reviewing a large number of papers published up to 1980, Harel argued that the contents of the Böhm–Jacopini proof were usually misrepresented as a folk theorem that essentially contains a simpler result, a result which itself can be traced to the inception of modern computing theory in the papers of
von Neumann Von Neumann may refer to: * John von Neumann (1903–1957), a Hungarian American mathematician * Von Neumann family * Von Neumann (surname), a German surname * Von Neumann (crater), a lunar impact crater See also * Von Neumann algebra * Von Neu ...
and
Kleene Stephen Cole Kleene ( ; January 5, 1909 – January 25, 1994) was an American mathematician. One of the students of Alonzo Church, Kleene, along with Rózsa Péter, Alan Turing, Emil Post, and others, is best known as a founder of the branch of ...
. Harel also writes that the more generic name was proposed by H.D. Mills as "The Structure Theorem" in the early 1970s.


Single-while-loop, folk version of the theorem

This version of the theorem replaces all the original program's control flow with a single global while loop that simulates a
program counter The program counter (PC), commonly called the instruction pointer (IP) in Intel x86 and Itanium microprocessors, and sometimes called the instruction address register (IAR), the instruction counter, or just part of the instruction sequencer, is ...
going over all possible labels (flowchart boxes) in the original non-structured program. Harel traced the origin of this folk theorem to two papers marking the beginning of computing. One is the 1946 description of the
von Neumann architecture The von Neumann architecture — also known as the von Neumann model or Princeton architecture — is a computer architecture based on a 1945 description by John von Neumann, and by others, in the ''First Draft of a Report on the EDVAC''. Th ...
, which explains how a
program counter The program counter (PC), commonly called the instruction pointer (IP) in Intel x86 and Itanium microprocessors, and sometimes called the instruction address register (IAR), the instruction counter, or just part of the instruction sequencer, is ...
operates in terms of a while loop. Harel notes that the single loop used by the folk version of the structured programming theorem basically just provides
operational semantics Operational semantics is a category of formal programming language semantics in which certain desired properties of a program, such as correctness, safety or security, are verified by constructing proofs from logical statements about its execu ...
for the execution of a flowchart on a von Neumann computer. Another, even older source that Harel traced the folk version of the theorem is
Stephen Kleene Stephen Cole Kleene ( ; January 5, 1909 – January 25, 1994) was an American mathematician. One of the students of Alonzo Church, Kleene, along with Rózsa Péter, Alan Turing, Emil Post, and others, is best known as a founder of the branch of ...
's normal form theorem from 1936.
Donald Knuth Donald Ervin Knuth ( ; born January 10, 1938) is an American computer scientist, mathematician, and professor emeritus at Stanford University. He is the 1974 recipient of the ACM Turing Award, informally considered the Nobel Prize of computer sci ...
criticized this form of the proof, which results in
pseudocode In computer science, pseudocode is a plain language description of the steps in an algorithm or another system. Pseudocode often uses structural conventions of a normal programming language, but is intended for human reading rather than machine re ...
like the one below, by pointing out that the structure of the original program is completely lost in this transformation. Similarly, Bruce Ian Mills wrote about this approach that "The spirit of block structure is a style, not a language. By simulating a Von Neumann machine, we can produce the behavior of any spaghetti code within the confines of a block-structured language. This does not prevent it from being spaghetti." p := 1 while p > 0 do if p = 1 then perform step 1 from the flowchart p := resulting successor step number of step 1 from the flowchart (0 if no successor) end if if p = 2 then perform step 2 from the flowchart p := resulting successor step number of step 2 from the flowchart (0 if no successor) end if ... if p = n then perform step n from the flowchart p := resulting successor step number of step n from the flowchart (0 if no successor) end if end while


Böhm and Jacopini's proof

The proof in Böhm and Jacopini's paper proceeds by induction on the structure of the flow chart. Because it employed pattern matching in graphs, the proof of Böhm and Jacopini's was not really practical as a
program transformation A program transformation is any operation that takes a computer program and generates another program. In many cases the transformed program is required to be semantically equivalent to the original, relative to a particular formal semantics and ...
algorithm, and thus opened the door for additional research in this direction.


Implications and refinements

The Böhm–Jacopini proof did not settle the question of whether to adopt
structured programming Structured programming is a programming paradigm aimed at improving the clarity, quality, and development time of a computer program by making extensive use of the structured control flow constructs of selection ( if/then/else) and repetition ( ...
for software development, partly because the construction was more likely to obscure a program than to improve it. On the contrary, it signalled the beginning of the debate.
Edsger Dijkstra Edsger Wybe Dijkstra ( ; ; 11 May 1930 – 6 August 2002) was a Dutch computer scientist, programmer, software engineer, systems scientist, and science essayist. He received the 1972 Turing Award for fundamental contributions to developing progra ...
's famous letter, " Go To Statement Considered Harmful," followed in 1968. Some academics took a purist approach to the Böhm–Jacopini result and argued that even instructions like break and return from the middle of loops are bad practice as they are not needed in the Böhm–Jacopini proof, and thus they advocated that all loops should have a single exit point. This purist approach is embodied in the
Pascal programming language Pascal is an imperative and procedural programming language, designed by Niklaus Wirth as a small, efficient language intended to encourage good programming practices using structured programming and data structuring. It is named in honour of ...
(designed in 1968–1969), which up to the mid-1990s was the preferred tool for teaching introductory programming classes in academia.Roberts, E. 995br>Loop Exits and Structured Programming: Reopening the Debate
" ACM SIGCSE Bulletin, (27)1: 268–272.
Edward Yourdon Edward Nash Yourdon (April 30, 1944 – January 20, 2016) was an American software engineer, computer consultant, author and lecturer, and software engineering methodology pioneer. He was one of the lead developers of the structured analysis tech ...
notes that in the 1970s there was even philosophical opposition to transforming unstructured programs into structured ones by automated means, based on the argument that one needed to think in structured programming fashion from the get go. The pragmatic counterpoint was that such transformations benefited a large body of existing programs. Among the first proposals for an automated transformation was a 1971 paper by Edward Ashcroft and Zohar Manna. The direct application of the Böhm–Jacopini theorem may result in additional local variables being introduced in the structured chart, and may also result in some
code duplication In computer programming, duplicate code is a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a n ...
. The latter issue is called the loop and a half problem in this context. Pascal is affected by both of these problems and according to empirical studies cited by
Eric S. Roberts Eric S. Roberts is an American computer scientist noted for his contributions to computer science education through textbook authorship and his leadership in computing curriculum development. He is a co-chair of the ACM Education Council, forme ...
, student programmers had difficulty formulating correct solutions in Pascal for several simple problems, including writing a function for searching an element in an array. A 1980 study by Henry Shapiro cited by Roberts found that using only the Pascal-provided control structures, the correct solution was given by only 20% of the subjects, while no subject wrote incorrect code for this problem if allowed to write a return from the middle of a loop. In 1973,
S. Rao Kosaraju Sambasiva Rao Kosaraju is a professor of computer science at Johns Hopkins University, and division director for Computing & Communication Foundations at the National Science Foundation. Furthermore, Kosaraju proved that a strict hierarchy of programs exists, nowadays called the ''Kosaraju hierarchy'', in that for every integer ''n'', there exists a program containing a multi-level break of depth ''n'' that cannot be rewritten as program with multi-level breaks of depth less than ''n'' (without introducing additional variables). Kosaraju cites the multi-level break construct to the
BLISS BLISS is a system programming language developed at Carnegie Mellon University (CMU) by W. A. Wulf, D. B. Russell, and A. N. Habermann around 1970. It was perhaps the best known system language until C debuted a few years later. Since then, C b ...
programming language. The multi-level breaks, in the form a leave ''label'' keyword were actually introduced in the BLISS-11 version of that language; the original BLISS only had single-level breaks. The BLISS family of languages didn't provide an unrestricted goto. The
Java programming language Java is a high-level, class-based, object-oriented programming language that is designed to have as few implementation dependencies as possible. It is a general-purpose programming language intended to let programmers ''write once, run anywh ...
would later follow this approach as well. A simpler result from Kosaraju's paper is that a program is reducible to a structured program (without adding variables) if and only if it does not contain a loop with two distinct exits. Reducibility was defined by Kosaraju, loosely speaking, as computing the same function and using the same "primitive actions" and predicates as the original program, but possibly using different control flow structures. (This is a narrower notion of reducibility than what Böhm–Jacopini used.) Inspired by this result, in section VI of his highly-cited paper that introduced the notion of
cyclomatic complexity Cyclomatic complexity is a software metric used to indicate the complexity of a program. It is a quantitative measure of the number of linearly independent paths through a program's source code. It was developed by Thomas J. McCabe, Sr. in 1976. ...
, Thomas J. McCabe described an analogue of
Kuratowski's theorem In graph theory, Kuratowski's theorem is a mathematical forbidden graph characterization of planar graphs, named after Kazimierz Kuratowski. It states that a finite graph is planar if and only if it does not contain a subgraph that is a subdivi ...
for the
control-flow graph In computer science, a control-flow graph (CFG) is a representation, using graph notation, of all paths that might be traversed through a program during its execution. The control-flow graph was discovered by Frances E. Allen, who noted that ...
s (CFG) of non-structured programs, which is to say, the minimal subgraphs that make the CFG of a program non-structured. These subgraphs have a very good description in natural language. They are: # branching out of a loop (other than from the loop cycle test) # branching into a loop # branching into a decision (i.e. into an if "branch") # branching out of a decision McCabe actually found that these four graphs are not independent when appearing as subgraphs, meaning that a necessary and sufficient condition for a program to be non-structured is for its CFG to have as subgraph one of any subset of three of these four graphs. He also found that if a non-structured program contains one of these four sub-graphs, it must contain another distinct one from the set of four. This latter result helps explain how the control flow of non-structured program becomes entangled in what is popularly called "
spaghetti code Spaghetti code is a pejorative phrase for unstructured and difficult-to- maintain source code. Spaghetti code can be caused by several factors, such as volatile project requirements, lack of programming style rules, and software engineers with insu ...
". McCabe also devised a numerical measure that, given an arbitrary program, quantifies how far off it is from the ideal of being a structured program; McCabe called his measure essential complexity.The original paper is For a secondary exposition see McCabe's characterization of the forbidden graphs for structured programming can be considered incomplete, at least if the Dijkstra's D structures are considered the building blocks. Up to 1990 there were quite a few proposed methods for eliminating gotos from existing programs, while preserving most of their structure. The various approaches to this problem also proposed several notions of equivalence, which are stricter than simply Turing equivalence, in order to avoid output like the folk theorem discussed above. The strictness of the chosen notion of equivalence dictates the minimal set of control flow structures needed. The 1988
JACM The ''Journal of the ACM'' is a peer-reviewed scientific journal covering computer science in general, especially theoretical aspects. It is an official journal of the Association for Computing Machinery. Its current editor-in-chief is Venkatesan ...
paper by Lyle Ramshaw surveys the field up to that point, as well proposing its own method. Ramshaw's algorithm was used for example in some Java
decompiler A decompiler is a computer program that translates an executable file to a high-level source file which can be recompiled successfully. It does therefore the opposite of a typical compiler, which translates a high-level language to a low-level l ...
s because the
Java virtual machine A Java virtual machine (JVM) is a virtual machine that enables a computer to run Java programs as well as programs written in other languages that are also compiled to Java bytecode. The JVM is detailed by a specification that formally describes ...
code has branch instructions with targets expressed as offsets, but the high-level Java language only has multi-level break and continue statements.http://www.openjit.org/publications/pro1999-06/decompiler-pro-199906.pdf Ammarguellat (1992) proposed a transformation method that goes back to enforcing single-exit.


Application to Cobol

In the 1980s IBM researcher
Harlan Mills Harlan D. Mills (May 14, 1919 – January 8, 1996) was Professor of Computer Science at the Florida Institute of Technology and founder of Software Engineering Technology, Inc. of Vero Beach, Florida (since acquired by Q-Labs). Mills' cont ...
oversaw the development of the COBOL Structuring Facility, which applied a structuring algorithm to
COBOL COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural and, since 2002, object-oriented language. COBOL is primarily us ...
code. Mills's transformation involved the following steps for each procedure. #Identify the
basic block In compiler construction, a basic block is a straight-line code sequence with no branches in except to the entry and no branches out except at the exit. This restricted form makes a basic block highly amenable to analysis. Compilers usually decom ...
s in the procedure. #Assign a unique
label A label (as distinct from signage) is a piece of paper, plastic film, cloth, metal, or other material affixed to a container or product, on which is written or printed information or symbols about the product or item. Information printed dir ...
to each block's entry path, and label each block's exit paths with the labels of the entry paths they connect to. Use 0 for return from the procedure and 1 for the procedure's entry path. #Break the procedure into its basic blocks. #For each block that is the destination of only one exit path, reconnect that block to that exit path. #Declare a new variable in the procedure (called L for reference). #On each remaining unconnected exit path, add a statement that sets L to the label value on that path. #Combine the resulting programs into a selection statement that executes the program with the entry path label indicated by L #Construct a loop that executes this selection statement as long as L is not 0. #Construct a sequence that initializes L to 1 and executes the loop. Note that this construction can be improved by converting some cases of the selection statement into subprocedures.


See also

*
Structured programming Structured programming is a programming paradigm aimed at improving the clarity, quality, and development time of a computer program by making extensive use of the structured control flow constructs of selection ( if/then/else) and repetition ( ...
*
Turing completeness In computability theory, a system of data-manipulation rules (such as a computer's instruction set, a programming language, or a cellular automaton) is said to be Turing-complete or computationally universal if it can be used to simulate any Tur ...


References


Further reading

Material not yet covered above: * * {{cite book , doi = 10.1007/3-540-57785-8_128 , chapter=One binary horn clause is enough , volume=775 , date=1994 , pages=19–32 , first=Philippe , last=Devienne, title=Stacs 94 , series=Lecture Notes in Computer Science , isbn=978-3-540-57785-0 , citeseerx=10.1.1.14.537 Programming language theory Models of computation Theorems in computational complexity theory