Communication-avoiding Algorithm
Communication-avoiding algorithms minimize the movement of data within a memory hierarchy in order to improve running time and energy consumption. They minimize the total of two costs (in terms of time and energy): arithmetic and communication. Communication, in this context, refers to moving data either between levels of memory or between processors over a network, and it is much more expensive than arithmetic.

Motivation
Consider the following running-time model:
* Measure of computation = time per FLOP = γ
* Measure of communication = number of words of data moved = β
⇒ Total running time = γ·(no. of FLOPs) + β·(no. of words)

Since ''β'' >> ''γ'' as measured in both time and energy, the communication cost dominates the computation cost. Technological trends indicate that the relative cost of communication is increasing on a variety of platforms, from cloud computing to supercomputers to mobile devices. The report also predicts that the gap between DRAM access t ...
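As a rough illustration of this model (not from the source; the constants and traffic counts below are assumed, order-of-magnitude values), the following sketch compares the modeled running time of a naive n×n matrix multiplication with a blocked version whose data movement approaches the Θ(n³/√M) communication lower bound for a fast memory of M words:

```python
# A rough sketch of the running-time model T = gamma*(#FLOPs) + beta*(#words moved),
# applied to n x n matrix multiplication with and without blocking.
# The constants and traffic formulas below are illustrative assumptions, not measured values.

def modeled_time(flops, words, gamma=1e-9, beta=1e-7):
    """Total modeled time: arithmetic cost plus communication cost (beta >> gamma)."""
    return gamma * flops + beta * words

n = 4096          # matrix dimension
M = 256 * 1024    # assumed fast-memory (cache) capacity, in words

flops = 2 * n**3                       # multiply-adds for classical matmul

naive_words   = n**3                   # naive triple loop: roughly one slow-memory access per inner iteration
blocked_words = 2 * n**3 / (M ** 0.5)  # blocked matmul approaches the Theta(n^3 / sqrt(M)) lower bound

print(f"naive:   {modeled_time(flops, naive_words):8.2f} s (modeled)")
print(f"blocked: {modeled_time(flops, blocked_words):8.2f} s (modeled)")
```

Even though both variants perform identical arithmetic, the blocked variant's modeled time is far less dominated by the β term, which is the point of communication-avoiding reformulations.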



Memory Hierarchy
In computer architecture, the memory hierarchy separates computer storage into a hierarchy based on response time. Since response time, complexity, and capacity are related, the levels may also be distinguished by their performance and controlling technologies. The memory hierarchy affects performance in computer architectural design, algorithm predictions, and lower-level programming constructs involving locality of reference. Designing for high performance requires considering the restrictions of the memory hierarchy, i.e. the size and capabilities of each component. Each of the various components can be viewed as part of a hierarchy of memories (m1, m2, ..., mn) in which each member mi is typically smaller and faster than the next highest member mi+1 of the hierarchy. To limit waiting by higher levels, a lower level will respond by filling a buffer and then signaling to activate the transfer. There are four major storage levels:
* ''Internal'' – Processor registers and ...
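A minimal sketch of this idea, using assumed, purely illustrative capacities, latencies, and hit fractions for a four-level hierarchy; it shows how the average access time stays low as long as most requests are served by the small, fast levels:

```python
# A small illustrative model (assumed numbers) of a memory hierarchy m1..mn:
# each level is smaller and faster than the one below it, and the average
# access time depends on how often requests are satisfied at each level.

levels = [
    # (name,       capacity_bytes, access_time_ns, fraction_of_requests_served)
    ("registers",  1 * 1024,       0.3,            0.40),
    ("L1 cache",   64 * 1024,      1.0,            0.40),
    ("L2 cache",   1 * 1024**2,    4.0,            0.15),
    ("DRAM",       16 * 1024**3,   100.0,          0.05),
]

# Expected access time: each request is served by the first level that holds the data.
avg = sum(hit * t for _, _, t, hit in levels)
print(f"average access time: {avg:.2f} ns")
for name, cap, t, hit in levels:
    print(f"{name:10s} capacity={cap:>14,d} B  latency={t:6.1f} ns  served={hit:.0%}")
```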



Cache-oblivious Algorithm
In computing, a cache-oblivious algorithm (or cache-transcendent algorithm) is an algorithm designed to take advantage of a processor cache without having the size of the cache (or the length of the cache lines, etc.) as an explicit parameter. An optimal cache-oblivious algorithm is a cache-oblivious algorithm that uses the cache optimally (in an asymptotic sense, ignoring constant factors). Thus, a cache-oblivious algorithm is designed to perform well, without modification, on multiple machines with different cache sizes, or for a memory hierarchy with different levels of cache having different sizes. Cache-oblivious algorithms are contrasted with explicit ''loop tiling'', which explicitly breaks a problem into blocks that are optimally sized for a given cache. Optimal cache-oblivious algorithms are known for matrix multiplication, matrix transposition, sorting, and several other problems. Some more general algorithms, such as Cooley–Tukey FFT, are optimally cache-oblivious ...
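As a concrete, simplified example of the cache-oblivious style (a sketch, not taken from the source), the following divide-and-conquer matrix transpose recursively halves the larger dimension; it never takes the cache size as a parameter, yet at some recursion depth the sub-blocks fit whatever cache is present:

```python
# A sketch of a cache-oblivious, divide-and-conquer matrix transpose: it recursively
# splits the matrix until blocks are small, without ever being told the cache size.
# Written against plain Python lists for clarity; the cutoff of 16 is an arbitrary choice.

def transpose(A, B, r0, c0, rows, cols):
    """Write the transpose of A[r0:r0+rows, c0:c0+cols] into B (so that B[j][i] = A[i][j])."""
    if rows <= 16 and cols <= 16:
        for i in range(r0, r0 + rows):
            for j in range(c0, c0 + cols):
                B[j][i] = A[i][j]
    elif rows >= cols:
        half = rows // 2
        transpose(A, B, r0, c0, half, cols)
        transpose(A, B, r0 + half, c0, rows - half, cols)
    else:
        half = cols // 2
        transpose(A, B, r0, c0, rows, half)
        transpose(A, B, r0, c0 + half, rows, cols - half)

n, m = 100, 60
A = [[i * m + j for j in range(m)] for i in range(n)]
B = [[0] * n for _ in range(m)]
transpose(A, B, 0, 0, n, m)
assert all(B[j][i] == A[i][j] for i in range(n) for j in range(m))
```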



Parallel Computing
Parallel computing is a type of computation in which many calculations or processes are carried out simultaneously. Large problems can often be divided into smaller ones, which can then be solved at the same time. There are several different forms of parallel computing: bit-level, instruction-level, data, and task parallelism. Parallelism has long been employed in high-performance computing, but has gained broader interest due to the physical constraints preventing frequency scaling. S.V. Adve ''et al.'' (November 2008), "Parallel Computing Research at Illinois: The UPCRC Agenda" (PDF), Parallel@Illinois, University of Illinois at Urbana-Champaign: "The main techniques for these performance benefits—increased clock frequency and smarter but increasingly complex architectures—are now hitting the so-called power wall. The computer industry has accepted that future performance increases must largely come from increasing the number of processors (or cores) on a die, rather than m ...
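A minimal data-parallel sketch (the problem, chunk sizes, and worker count are arbitrary choices made for illustration): a large summation is divided into smaller pieces that are computed simultaneously by a pool of worker processes and then combined:

```python
# A minimal data-parallel sketch: a large summation is split into chunks that are
# processed simultaneously by a pool of worker processes, then the partial results
# are combined. Chunk count and problem size are illustrative assumptions.

from concurrent.futures import ProcessPoolExecutor

def partial_sum(bounds):
    """Sum of squares over a half-open interval [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    chunks = [(k * step, n if k == workers - 1 else (k + 1) * step) for k in range(workers)]

    with ProcessPoolExecutor(max_workers=workers) as pool:
        total = sum(pool.map(partial_sum, chunks))

    assert total == sum(i * i for i in range(n))   # same answer as the serial computation
    print(total)
```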




Laura Grigori
Laura Grigori is a French-Romanian applied mathematician and computer scientist known for her research on numerical linear algebra and communication-avoiding algorithms. She is a director of research at the French Institute for Research in Computer Science and Automation (INRIA) in Paris, and heads the "Alpines" scientific computing project, jointly affiliated with INRIA and Sorbonne University.

Education and career
Grigori earned her Ph.D. from Henri Poincaré University in 2001. Her dissertation, ''Prédiction de structure et algorithmique parallèle pour la factorisation LU des matrices creuses'', concerned parallel algorithms for LU decomposition of sparse matrices. After postdoctoral research at the University of California, Berkeley and the Lawrence Berkeley National Laboratory, she became a researcher for INRIA in 2004, and became the head of the Alpines project in 2013. In 2021 she joined the SIAM Council as a Member-at-Large.

Recogni ...


Information Processing Techniques Office
The Information Processing Techniques Office (IPTO), originally "Command and Control Research" (Lyon, Matthew; Hafner, Katie (1999-08-19), ''Where Wizards Stay Up Late: The Origins of the Internet'', p. 39, Simon & Schuster, Kindle Edition), was part of the Defense Advanced Research Projects Agency of the United States Department of Defense.

Origin
According to an ARPA-sponsored history of the organization, IPTO grew from a distinctly unpromising beginning: the Air Force had a large, expensive computer (AN/FSQ 321A) which was intended as a backup for the SAGE air defense program but was no longer needed; and it also had too few required tasks to maintain the desired staffing level at its main software contractor, the System Development Corporation (SDC). Accordingly, the Under Secretary of Defense for Research and Engineering decided to capitalize on these "sunk costs" and SDC expertise by standing up an ARPA program in Command & Control Research. It was accordingly begun in June 1961 ...


Defense Advanced Research Projects Agency
The Defense Advanced Research Projects Agency (DARPA) is a research and development agency of the United States Department of Defense responsible for the development of emerging technologies for use by the military. Originally known as the Advanced Research Projects Agency (ARPA), the agency was created on February 7, 1958, by President Dwight D. Eisenhower in response to the Soviet launching of Sputnik 1 in 1957. By collaborating with academia, industry, and government partners, DARPA formulates and executes research and development projects to expand the frontiers of technology and science, often beyond immediate U.S. military requirements (Dwight D. Eisenhower and Science & Technology, 2008, Dwight D. Eisenhower Memorial Commission). ''The Economist'' has called DARPA the agency "that shaped the modern world," and pointed out that "Moderna's COVID-19 vaccine sits alongside weather satellites, GPS, Unmann ...


Data Locality
In computer science, locality of reference, also known as the principle of locality, is the tendency of a processor to access the same set of memory locations repetitively over a short period of time. There are two basic types of reference locality: temporal and spatial locality. Temporal locality refers to the reuse of specific data and/or resources within a relatively small time duration. Spatial locality (also termed ''data locality''; "NIST Big Data Interoperability Framework: Volume 1", doi:10.6028/NIST.SP.1500-1r2) refers to the use of data elements within relatively close storage locations. Sequential locality, a special case of spatial locality, occurs when data elements are arranged and accessed linearly, such as traversing the elements in a one-dimensional array. Locality is a type of predictable behavior that occurs in computer systems. Systems that exhibit strong ''locality of reference'' are great candidates for performance optimiza ...
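A small demonstration of spatial locality (a sketch under the assumption of a row-major, C-ordered array; sizes are arbitrary): summing row by row touches consecutive memory locations, while summing column by column strides across rows, so the row-wise loop is typically noticeably faster:

```python
# Spatial locality in practice: a 2-D array stored in row-major order is summed
# once along contiguous rows and once along strided columns.

import time
import numpy as np

a = np.random.rand(4000, 4000)   # stored in row-major (C) order

t0 = time.perf_counter()
s1 = sum(a[i, :].sum() for i in range(a.shape[0]))   # contiguous rows: good spatial locality
t1 = time.perf_counter()
s2 = sum(a[:, j].sum() for j in range(a.shape[1]))   # strided columns: poor spatial locality
t2 = time.perf_counter()

print(f"row-wise:    {t1 - t0:.3f} s")
print(f"column-wise: {t2 - t1:.3f} s")
```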



Fast Fourier Transform
A fast Fourier transform (FFT) is an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in the frequency domain and vice versa. The DFT is obtained by decomposing a sequence of values into components of different frequencies. This operation is useful in many fields, but computing it directly from the definition is often too slow to be practical. An FFT rapidly computes such transformations by factorizing the DFT matrix into a product of sparse (mostly zero) factors. As a result, it manages to reduce the complexity of computing the DFT from O(N^2), which arises if one simply applies the definition of DFT, to O(N log N), where N is the data size. The difference in speed can be enormous, especially for long data sets where ''N'' may be in the thousands or millions. In the presence of round-off error, many FFT algorithm ...
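A sketch contrasting the O(N^2) evaluation of the DFT definition with a radix-2 Cooley–Tukey FFT (this toy version assumes N is a power of two; it is an illustration, not a production implementation):

```python
# Direct O(N^2) DFT versus a recursive radix-2 Cooley-Tukey FFT, O(N log N).

import cmath

def dft(x):
    """Direct evaluation of the DFT definition: O(N^2) operations."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    N = len(x)
    if N == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    twiddle = [cmath.exp(-2j * cmath.pi * k / N) * odd[k] for k in range(N // 2)]
    return ([even[k] + twiddle[k] for k in range(N // 2)] +
            [even[k] - twiddle[k] for k in range(N // 2)])

x = [complex(i % 5, 0) for i in range(16)]
assert all(abs(a - b) < 1e-9 for a, b in zip(dft(x), fft(x)))   # both give the same spectrum
```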



FLOP
In computing, floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate measure than measuring instructions per second. Floating-point arithmetic Floating-point arithmetic is needed for very large or very small real numbers, or computations that require a large dynamic range. Floating-point representation is similar to scientific notation, except everything is carried out in base two, rather than base ten. The encoding scheme stores the sign, the exponent (in base two for Cray and VAX, base two or ten for IEEE floating point formats, and base 16 for IBM Floating Point Architecture) and the significand (number after the radix point). While several similar formats are in use, the most common is ANSI/IEEE Std. 754-1985. This standard defines the format for 32-bit numbers called ''single precision'', as well as 64-b ...
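A small sketch of the representation described above, pulling apart the sign, exponent, and significand fields of an IEEE 754 double-precision (64-bit) value:

```python
# Decompose an IEEE 754 double into its sign, (unbiased) exponent, and fraction bits.

import struct

def fields(x: float):
    bits, = struct.unpack(">Q", struct.pack(">d", x))   # reinterpret the 64 bits as an integer
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF          # 11-bit biased exponent
    fraction = bits & ((1 << 52) - 1)        # 52-bit fraction (implicit leading 1 for normal numbers)
    return sign, exponent - 1023, fraction

for value in (1.0, -2.5, 0.1):
    s, e, f = fields(value)
    print(f"{value:>5}: sign={s} exponent={e} fraction=0x{f:013x}")
```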



CPU Cache
A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels (L1, L2, often L3, and rarely even L4), with separate instruction-specific and data-specific caches at level 1. Cache memory is typically implemented with static random-access memory (SRAM), which in modern CPUs is by far the largest part of the chip by area, but SRAM is not always used for all levels (of I- or D-cache), or even any level; sometimes the later levels, or all levels, are implemented with eDRAM. Other types of caches exist (that are not counted towards the "cache size" of the most important caches mentioned above), such as the translation lookaside buffer (TLB), which is part of the memory management unit (MMU) w ...
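An illustrative timing sketch (the array sizes below are assumptions and results vary by machine): the same total number of element accesses is cheaper when the working set is small enough to stay resident in the caches than when every pass must stream from main memory:

```python
# Temporal locality and cache residency: perform the same total number of accesses
# over a small, cache-resident array and over a large array that cannot fit in any cache.

import time
import numpy as np

def time_passes(n_elements, total_accesses):
    a = np.random.rand(n_elements)
    passes = total_accesses // n_elements
    t0 = time.perf_counter()
    s = 0.0
    for _ in range(passes):
        s += a.sum()          # each pass touches every element once
    return time.perf_counter() - t0

total = 200_000_000
small = time_passes(100_000, total)      # ~800 KB working set: reused from cache across passes
large = time_passes(50_000_000, total)   # ~400 MB working set: streamed from DRAM every pass
print(f"small working set: {small:.3f} s   large working set: {large:.3f} s")
```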

