In computer science, the Sethi–Ullman algorithm is an algorithm named after Ravi Sethi and Jeffrey D. Ullman, its inventors, for translating abstract syntax trees into machine code that uses as few registers as possible.

Overview

When generating code for arithmetic expressions, the compiler has to decide which is the best way to translate the expression in terms of the number of instructions used as well as the number of registers needed to evaluate a certain subtree. Especially when free registers are scarce, the order of evaluation can affect the length of the generated code, because different orderings may cause more or fewer intermediate values to be spilled to memory and later restored. The Sethi–Ullman algorithm (also known as Sethi–Ullman numbering) produces code that needs the fewest instructions possible as well as the fewest storage references, under the assumption that at most commutativity and associativity apply to the operators used, and that distributive laws, i.e. a * b + a * c = a * (b + c), do not hold. The algorithm also succeeds if neither commutativity nor associativity holds for the expressions used, so that arithmetic transformations cannot be applied. It does not take advantage of common subexpressions, nor does it apply directly to expressions represented as general directed acyclic graphs rather than trees.

Simple Sethi–Ullman algorithm

The simple Sethi–Ullman algorithm works as follows (for a load/store architecture):

1. Traverse the abstract syntax tree in pre- or postorder (the number of an inner node can be fixed only once both of its children have been numbered).
   a. For every non-constant leaf node, assign a 1 (i.e. 1 register is needed to hold the variable/field/etc.) if it is the left child of its parent; otherwise assign a 0. For every constant leaf node (the right-hand side of an operation: literals, values), assign a 0.
   b. For every non-leaf node n, assign the number of registers needed to evaluate the respective subtrees of n. If the number of registers needed in the left subtree (l) is not equal to the number needed in the right subtree (r), the number of registers needed for the current node n is max(l, r). If l = r, the number of registers needed for the current node is r + 1.
2. Code emission
   a. If the left subtree of node n needs more registers than the right subtree, the left subtree is evaluated first (evaluating the right subtree first would tie up one additional register to hold its result, which could force the left subtree to spill). If the right subtree needs more registers than the left subtree, the right subtree is evaluated first accordingly. If both subtrees need an equal number of registers, the order of evaluation is irrelevant.
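The numbering step above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the `Node` class, the `is_constant` flag, and the function names `label` and `evaluate_first` are hypothetical helpers introduced here, and constants are given a 0 exactly as the rule above states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Expression-tree node (hypothetical helper); a leaf has no children."""
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    is_constant: bool = False


def label(node: Node, is_left_child: bool = True) -> int:
    """Return the number of registers needed to evaluate `node`."""
    if node.left is None and node.right is None:
        # Leaf: a non-constant left operand must be loaded into a register;
        # right operands and constants are assumed usable directly.
        return 1 if (is_left_child and not node.is_constant) else 0
    l = label(node.left, True)
    r = label(node.right, False)
    # Unequal needs: the larger one suffices. Equal needs: one extra register.
    return max(l, r) if l != r else r + 1


def evaluate_first(node: Node) -> str:
    """Say which subtree of an inner node should be evaluated first."""
    l = label(node.left, True)
    r = label(node.right, False)
    if l > r:
        return "left"
    if r > l:
        return "right"
    return "either"


# Usage: a - b * c, where both operands of '-' need one register each.
expr = Node("-", Node("a"), Node("*", Node("b"), Node("c")))
print(label(expr), evaluate_first(expr))  # prints: 2 either
```

The function recomputes labels on every call for brevity; a production compiler would more likely store the label on each node during a single bottom-up pass.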

Example

For the arithmetic expression a = (b + c + f * g) * (d + 3), the abstract syntax tree looks like this:

           =
          / \
         a   *
            / \
           /   \
          +     +
         / \   / \
        /   \ d   3
       +     *
      / \   / \
     b   c f   g

To continue with the algorithm, we only need to examine the arithmetic expression (b + c + f * g) * (d + 3), i.e. we only have to look at the right subtree of the assignment '=':

             *
            / \
           /   \
          +     +
         / \   / \
        /   \ d   3
       +     *
      / \   / \
     b   c f   g

Now we start traversing the tree (in preorder for now), assigning the number of registers needed to evaluate each subtree (note that the last summand in the expression (b + c + f * g) * (d + 3) is a constant):

              *₂
             /  \
            /    \
           +₂     +₁
          / \    / \
         /   \  d₁  3₀
        +₁    *₁
       / \   / \
      b₁  c₀ f₁  g₀

From this tree it can be seen that we need 2 registers to compute the left subtree of the '*', but only 1 register to compute the right subtree. The nodes 'c' and 'g' do not need registers for the following reason: if T is a tree leaf, then the number of registers needed to evaluate T is either 1 or 0, depending on whether T is a left or a right subtree, since an operation such as add R1, A can handle the right operand A directly without first loading it into a register. The constant 3 likewise needs no register.

Therefore we should emit code for the left subtree first, because we might be in a situation where only 2 registers are left to compute the whole expression. If we computed the right subtree first (which needs only 1 register), we would then need a register to hold its result while computing the left subtree (which would still need 2 registers), therefore needing 3 registers concurrently. Computing the left subtree first needs 2 registers, but its result can be held in 1, and since the right subtree needs only 1 register to compute, the whole expression can be evaluated with only 2 registers.
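The evaluation order chosen in this example can be checked with a small code-emission sketch. Everything in it is an assumption made for illustration: trees are nested tuples `(op, left, right)`, the mnemonics `load`, `add` and `mul` and the register names `R1`, `R2` are invented, constants are assumed to occur only as right operands, and spilling is not handled (the sketch simply fails if it runs out of registers).

```python
MNEMONIC = {"+": "add", "*": "mul"}  # hypothetical instruction names


def need(t, is_left=True):
    """Sethi-Ullman number of tree `t` (leaf: str variable or int constant)."""
    if not isinstance(t, tuple):
        return 1 if (is_left and isinstance(t, str)) else 0
    l, r = need(t[1], True), need(t[2], False)
    return max(l, r) if l != r else r + 1


def gen(t, free, code):
    """Append instructions for `t` to `code`; return the result register.

    `free` lists the registers that may be used. No spilling is done, so an
    IndexError is raised when `free` is exhausted.
    """
    if not isinstance(t, tuple):
        reg = free[0]
        code.append(f"load {reg}, {t}")
        return reg
    op, left, right = t
    if not isinstance(right, tuple):
        # A right leaf is used directly as an operand, as in "add R1, A".
        lreg = gen(left, free, code)
        code.append(f"{MNEMONIC[op]} {lreg}, {right}")
        return lreg
    if need(left, True) >= need(right, False):
        # Left subtree first; its result occupies one register meanwhile.
        lreg = gen(left, free, code)
        rreg = gen(right, [r for r in free if r != lreg], code)
    else:
        # Right subtree is the more demanding one, so evaluate it first.
        rreg = gen(right, free, code)
        lreg = gen(left, [r for r in free if r != rreg], code)
    code.append(f"{MNEMONIC[op]} {lreg}, {rreg}")
    return lreg


# (b + c + f * g) * (d + 3) with only two registers available
expr = ("*", ("+", ("+", "b", "c"), ("*", "f", "g")), ("+", "d", 3))
code = []
result = gen(expr, ["R1", "R2"], code)
print("\n".join(code))
```

With two registers the sketch emits eight instructions and leaves the result in `R1`. Called with a single register, it raises an `IndexError`, which is consistent with the root label of 2.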

Advanced Sethi–Ullman algorithm

In an advanced version of the Sethi–Ullman algorithm, the arithmetic expressions are first transformed, exploiting the algebraic properties of the operators used.
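As an illustration of what such a transformation could look like, the sketch below swaps the operands of commutative operators whenever doing so lowers the register count given by the simple numbering. This is only one plausible transformation, assumed here for the sake of example; the text above does not say which transformations the advanced algorithm actually performs. The tuple tree format `(op, left, right)` and the names `need`, `reorder` and `COMMUTATIVE` are hypothetical.

```python
COMMUTATIVE = {"+", "*"}  # operators assumed to be commutative


def need(t, is_left=True):
    """Simple Sethi-Ullman number (leaf: str variable or int constant)."""
    if not isinstance(t, tuple):
        return 1 if (is_left and isinstance(t, str)) else 0
    l, r = need(t[1], True), need(t[2], False)
    return max(l, r) if l != r else r + 1


def reorder(t):
    """Swap operands of commutative operators when that saves registers."""
    if not isinstance(t, tuple):
        return t
    op, left, right = t
    left, right = reorder(left), reorder(right)
    if op in COMMUTATIVE and need((op, right, left)) < need((op, left, right)):
        left, right = right, left
    return (op, left, right)


# a + b * c needs 2 registers as written, but only 1 as b * c + a,
# because the leaf 'a' then becomes a right operand.
before = ("+", "a", ("*", "b", "c"))
after = reorder(before)
print(need(before), need(after))  # prints: 2 1
```

A non-commutative operator such as subtraction is left untouched, so `a - b * c` still needs 2 registers under this sketch.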
