Sun–Ni Law

Within theoretical computer science, the Sun–Ni law (or Sun and Ni's law, also known as memory-bounded speedup) is a memory-bounded speedup model which states that, as computing power increases, the corresponding increase in problem size is constrained by the system's memory capacity. In general, as a system grows in computational power, the problems run on the system grow in size. Whereas Amdahl's law assumes that the problem size remains constant as system sizes grow, and Gustafson's law proposes that the problem size should scale but be bound by a fixed amount of time, the Sun–Ni law states that the problem size should scale but be bound by the memory capacity of the system.
The Sun–Ni law was initially proposed by Xian-He Sun and Lionel Ni at the ACM/IEEE Supercomputing Conference in 1990 (Sun and Ni 1990; Sun and Ni 1993). With the increasing disparity between CPU speed and memory access latency, application execution time often depends on the memory speed of the system (Wulf and McKee 1995).
As predicted by Sun and Ni, data access has become the premier performance bottleneck for high-end computing. From this one can see the intuition behind the law: as system resources increase, applications are often bottlenecked by memory speed and bandwidth, so an application can achieve a larger speedup by utilizing all the memory capacity in the system. The law can be applied to different layers of a memory hierarchy, from L1 cache to main memory. Through its memory-bounded function, ''W = G(M)'', it reveals the trade-off between computing and memory in algorithm and system architecture design.

All three speedup models, Sun–Ni, Gustafson, and Amdahl, provide a metric to analyze speedup for parallel computing. Amdahl's law focuses on the time reduction for a given fixed-size problem: it states that the sequential portion of the problem (algorithm) limits the total speedup that can be achieved as system resources increase. Gustafson's law suggests that it is beneficial to build a large-scale parallel system, as the speedup can grow linearly with the system size if the problem size is scaled up to maintain a fixed execution time (Sun and Chen 2010). Yet as memory access latency often becomes the dominant factor in an application's execution time (Rogers et al. 2009), applications may not scale up to meet the time-bound constraint. The Sun–Ni law, instead of constraining the problem size by time, constrains the problem size by the memory capacity of the system; in other words, it bounds the problem based on memory. The law is a generalization of Amdahl's law and Gustafson's law: when the memory-bounded function ''G(M) = 1'', it resolves to Amdahl's law; when the memory-bounded function ''G(M) = m'', where ''m'' is the number of processors, it resolves to Gustafson's law.
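As a concrete illustration of these two limiting cases, the following is a minimal Python sketch contrasting fixed-size (Amdahl) and fixed-time (Gustafson) speedup; the function names and the values of ''f'' and ''n'' are illustrative assumptions, not taken from the original papers.

    def amdahl(f, n):
        # Fixed-size model: only the parallel fraction f is sped up.
        return 1.0 / ((1 - f) + f / n)

    def gustafson(f, n):
        # Fixed-time model: the workload scales with n at constant run time.
        return (1 - f) + f * n

    f = 0.95
    for n in (16, 256, 4096):
        # Amdahl's speedup saturates near 1/(1-f) = 20, while
        # Gustafson's grows roughly linearly with n.
        print(n, round(amdahl(f, n), 1), round(gustafson(f, n), 1))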


Derivation

Let \textstyle W^* be the scaled workload under a memory space constraint. The memory-bounded speedup can be defined as:

''Sequential time to solve W^*'' / ''Parallel time to solve W^*''

Suppose \textstyle f is the portion of the workload that can be parallelized and \textstyle (1-f) is the sequential portion of the workload. Let \textstyle y = g(x) be the function that reflects the parallel workload increase factor as the memory capacity increases m times. Let \textstyle W = g(M) and \textstyle W^* = g(m \cdot M), where \textstyle M is the memory capacity of one node. Thus,

W^* = g(m \cdot g^{-1}(W))

The memory-bounded speedup is then:

\frac{(1-f)W + f \, g(m \cdot g^{-1}(W))}{(1-f)W + \frac{f \, g(m \cdot g^{-1}(W))}{m}}

For any power function \textstyle g(x) = a x^b and for any rational numbers a and b, we have:

g(mx) = a(mx)^b = m^b \cdot a x^b = m^b g(x) = \bar{g}(m) \, g(x)

where \textstyle \bar{g}(m) = m^b is the same power function with the coefficient set to 1. Thus, by taking the highest-degree term to determine the complexity of the algorithm, one can rewrite the memory-bounded speedup as:

\frac{(1-f) + f \, \bar{g}(m)}{(1-f) + \frac{f \, \bar{g}(m)}{m}}

In this equation, \textstyle \bar{g}(m) represents the influence of memory change on the change in problem size. Suppose \textstyle \bar{g}(m) = 1; then the memory-bounded speedup model reduces to Amdahl's law, since the problem size is fixed or independent of the resource increase. Suppose \textstyle \bar{g}(m) = m; then the memory-bounded speedup model reduces to Gustafson's law, which means that when memory capacity increases m times and the workload also increases m times, all the data needed is local to every node in the system. Often, for simplicity and to match the notation of Amdahl's law and Gustafson's law, the letter ''G'' is used to represent the memory-bound function \textstyle \bar{g}(m), and ''n'' replaces ''m''. Using this notation one gets:

\text{Speedup}_{\text{memory-bounded}} = \frac{(1-f) + f \, G(n)}{(1-f) + \frac{f \, G(n)}{n}}
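A minimal numeric sketch of this final formula (again with illustrative values of ''f'' and ''n'') checks the two reductions:

    def sun_ni(f, n, g_bar):
        # Memory-bounded speedup with workload growth factor G(n) = g_bar(n).
        g = g_bar(n)
        return ((1 - f) + f * g) / ((1 - f) + f * g / n)

    f, n = 0.95, 64
    print(sun_ni(f, n, lambda m: 1.0))     # G(n) = 1: matches Amdahl's 1/((1-f) + f/n)
    print(sun_ni(f, n, lambda m: m))       # G(n) = n: matches Gustafson's (1-f) + f*n
    print(sun_ni(f, n, lambda m: m**0.5))  # sublinear G(n) lies between the two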


Examples

Suppose one would like to determine the memory-bounded speedup of matrix multiplication. The memory requirement of matrix multiplication is roughly \textstyle x = 3N^2, where N is the dimension of the two N \times N source matrices, and the computation requirement is \textstyle 2N^3. Thus we have:

g(x) = 2(x/3)^{3/2} = \frac{2}{3^{3/2}} \, x^{3/2}

and

\bar{g}(x) = x^{3/2}

Thus the memory-bounded speedup for matrix multiplication is:

\frac{(1-f) + f \, \bar{g}(m)}{(1-f) + \frac{f \, \bar{g}(m)}{m}} = \frac{(1-f) + f \, m^{3/2}}{(1-f) + f \, m^{1/2}}

The following matrix factorization example illustrates the rapid increase in parallel execution time under memory-bounded scaling. The execution time for factorizing an N \times N matrix on a uniprocessor is \textstyle O(N^3), while the memory usage is \textstyle O(N^2). Suppose a 10,000-by-10,000 matrix takes 800 MB of memory and can be factorized in 1 hour on a uniprocessor. If the memory capacity is scaled up 1,024 times, the matrix dimension can grow 32 times, and the workload grows \textstyle 32^3 = 32{,}768 times; assuming the parallelizable work scales perfectly, the scaled 320,000-by-320,000 matrix can then be factorized in about 32,768/1,024 = 32 hours on 1,024 processors. The time increase is quite large, but the increase in problem size may be more valuable to someone whose primary goal is accuracy. For example, an astrophysicist may be more interested in simulating an N-body problem with as many particles as possible. This example shows that for computation-intensive applications, memory capacity does not need to scale proportionally with computing power.
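A short Python sketch of the matrix-multiplication case above; the value of ''f'' is an assumed illustration.

    def mm_memory_bounded_speedup(f, m):
        # For dense N x N matrix multiplication, memory ~ 3*N^2 and
        # work ~ 2*N^3, giving the growth factor g_bar(m) = m**1.5.
        g = m ** 1.5
        return ((1 - f) + f * g) / ((1 - f) + f * g / m)

    f = 0.99
    for m in (4, 16, 64):
        # Exceeds Gustafson's (1-f) + f*m and approaches the ideal
        # speedup m as the scaled workload grows more compute-intensive.
        print(m, round(mm_memory_bounded_speedup(f, m), 2))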


Applications and effects of the Sun–Ni law

The memory-bounded speedup model is the first work to reveal that memory is the performance constraint for high-end computing, and it presents a quantitative mathematical formulation for the trade-off between memory and computing. It is based on the memory-bounded function ''W = G(M)'', where ''W'' is the work and thus, for most applications, also the computation; ''M'' is the memory requirement in terms of capacity; and ''G'' is the reuse rate. ''W = G(M)'' gives a very simple but effective description of the relation between computation and memory requirement. From an architecture viewpoint, the memory-bounded model suggests that the size, as well as the speed, of the cache(s) should match the CPU performance. Modern microprocessors such as the Pentium Pro, Alpha 21164, StrongARM SA-110, and Loongson-3A use 80% or more of their transistors for the on-chip cache rather than for computing components. From an algorithm design viewpoint, the number of memory accesses should be reduced; that is, data should be reused when possible. The function ''G()'' gives the reuse rate. Today, the term ''memory-bound functions'' has become a general term referring to functions that involve extensive memory access (Meyer, Sanders, and Sibeyn 2003). Memory complexity analysis has become a discipline of computer algorithm analysis.
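To illustrate the reuse rate, a small sketch using the dense matrix-multiplication figures from the examples above shows how reuse grows with problem size:

    def reuse_rate(n):
        # Dense matrix multiplication: ~2*n^3 operations over ~3*n^2
        # words of memory, so each word is reused about (2/3)*n times.
        work = 2 * n**3
        memory = 3 * n**2
        return work / memory

    for n in (1_000, 10_000, 100_000):
        print(n, reuse_rate(n))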


References

Xian-He Sun and Lionel Ni, "Another View on Parallel Speedup", Proceedings of the ACM/IEEE Supercomputing Conference (Supercomputing '90), 1990.

Xian-He Sun and Lionel Ni, "Scalable Problems and Memory-Bounded Speedup", Journal of Parallel and Distributed Computing, Vol. 19, pp. 27–37, September 1993.

Wm. A. Wulf and Sally A. McKee, "Hitting the Memory Wall: Implications of the Obvious", ACM SIGARCH Computer Architecture News, Vol. 23, pp. 20–24, March 1995.

U. Meyer, P. Sanders, and J. F. Sibeyn (Eds.), Algorithms for Memory Hierarchies, Vol. 2625 of LNCS, Springer, 2003.

Brian M. Rogers, Anil Krishna, Gordon B. Bell, Ken Vu, Xiaowei Jiang, and Yan Solihin, "Scaling the Bandwidth Wall: Challenges in and Avenues for CMP Scaling", ACM SIGARCH Computer Architecture News, Vol. 37, pp. 371–382, June 2009.

Xian-He Sun and Yong Chen, "Reevaluating Amdahl's Law in the Multicore Era", Journal of Parallel and Distributed Computing, Vol. 70, pp. 183–188, February 2010.