GotoBLAS

	GotoBLAS In scientific computing, GotoBLAS and GotoBLAS2 are open source implementations of the BLAS (Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types. GotoBLAS was developed by Kazushige Goto at the Texas Advanced Computing Center. , it was used in seven of the world's ten fastest supercomputers. GotoBLAS remains available, but development ceased with a final version touting optimal performance on Intel's Nehalem architecture (contemporary in 2008). OpenBLAS is an actively maintained fork of GotoBLAS, developed at the Lab of Parallel Software and Computational Science, ISCAS. GotoBLAS was written by Goto during his sabbatical leave from the Japan Patent Office in 2002. It was initially optimized for the Pentium 4 processor and managed to immediately boost the performance of a supercomputer based on that CPU from 1.5 TFLOPS to 2 TFLOPS. , the library was available at no cost for noncommercial use. A later open source version was r ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Basic Linear Algebra Subprograms Basic Linear Algebra Subprograms (BLAS) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. They are the ''de facto'' standard low-level routines for linear algebra libraries; the routines have bindings for both C ("CBLAS interface") and Fortran ("BLAS interface"). Although the BLAS specification is general, BLAS implementations are often optimized for speed on a particular machine, so using them can bring substantial performance benefits. BLAS implementations will take advantage of special floating point hardware such as vector registers or SIMD instructions. It originated as a Fortran library in 1979* and its interface was standardized by the BLAS Technical (BLAST) Forum, whose latest BLAS report can be found on the netlib website. This Fortran library is known as the ''reference implementation'' (sometimes co ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	OpenBLAS In scientific computing, OpenBLAS is an open-source implementation of the BLAS (Basic Linear Algebra Subprograms) and LAPACK APIs with many hand-crafted optimizations for specific processor types. It is developed at the Lab of Parallel Software and Computational Science, ISCAS. OpenBLAS adds optimized implementations of linear algebra kernels for several processor architectures, including Intel Sandy Bridge and Loongson Loongson () is the name of a family of general-purpose, MIPS architecture-compatible microprocessors, as well as the name of the Chinese fabless company (Loongson Technology) that develops them. The processors are alternately called Godson proc .... It claims to achieve performance comparable to the Intel MKL: this mostly holds true on the BLAS part, while the LAPACK part falls behind. On machines that support the AVX2 instruction set, OpenBLAS can achieve similar performance to MKL, but there are currently almost no open source libraries comparable to ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Kazushige Goto is a software engineer specializing in high performance, hand-written, machine code. Education Goto was a research associate at the Texas Advanced Computing Center at the University of Texas at Austin when he wrote his famously hand-optimized assembly routines for supercomputing A supercomputer is a computer with a high level of performance as compared to a general-purpose computer. The performance of a supercomputer is commonly measured in floating-point operations per second (FLOPS) instead of million instructions ... and PC platforms that outperform the best compiler generated code. Several of the fastest supercomputers in the world still use his implementation of the Basic Linear Algebra Subprograms (BLAS) known as GotoBLAS. Career In 2010, Goto joined Microsoft's Technical Computing Group with the title of Senior Researcher. In July 2012, he joined Intel with the title of Software Engineer. Goto continues to write hand-optimized machine code, utilizing ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Math Kernel Library Intel oneAPI (compute acceleration), oneAPI Math Kernel Library (Intel oneMKL; formerly Intel Math Kernel Library or Intel MKL) is a Library (computer science), library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math. The library supports Intel processors and is available for Windows, Linux and macOS operating systems. ''Intel oneAPI Math Kernel Library'' is not to be confused with ''oneAPI Math Kernel Library'' (oneMKL) Interfaces, a piece of open-source glue code that allows Intel MKL routines to be used from Data Parallel C++. History and licensing Intel launched the Math Kernel Library on May 9, 2003, and called it blas.lib. The project's development teams are located in Russia and the United States. The library was available in a standalone form, free of charge under the terms of Intel Simplified Software License which allow redist ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	BSD License BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. This is in contrast to copyleft licenses, which have share-alike requirements. The original BSD license was used for its namesake, the Berkeley Software Distribution (BSD), a Unix-like operating system. The original version has since been revised, and its descendants are referred to as modified BSD licenses. BSD is both a license and a class of license (generally referred to as BSD-like). The modified BSD license (in wide use today) is very similar to the license originally used for the BSD version of Unix. The BSD license is a simple license that merely requires that all code retain the BSD license notice if redistributed in source code format, or reproduce the notice if redistributed in binary format. The BSD license (unlike some other licenses e.g. GPL) does not require that source code be distributed at all. Terms In additi ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Numerical Linear Algebra Numerical linear algebra, sometimes called applied linear algebra, is the study of how matrix operations can be used to create computer algorithms which efficiently and accurately provide approximate answers to questions in continuous mathematics. It is a subfield of numerical analysis, and a type of linear algebra. Computers use floating-point arithmetic and cannot exactly represent irrational data, so when a computer algorithm is applied to a matrix of data, it can sometimes increase the difference between a number stored in the computer and the true number that it is an approximation of. Numerical linear algebra uses properties of vectors and matrices to develop computer algorithms that minimize the error introduced by the computer, and is also concerned with ensuring that the algorithm is as efficient as possible. Numerical linear algebra aims to solve problems of continuous mathematics using finite precision computers, so its applications to the natural and social sciences ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	ACM Transactions On Mathematical Software ''ACM Transactions on Mathematical Software'' (''TOMS'') is a quarterly scientific journal that aims to disseminate the latest findings of note in the field of numeric, symbolic, algebraic, and geometric computing applications. It is one of the oldest scientific journals specifically dedicated to mathematical algorithms and their implementation in software, and has been published since March 1975 by the Association for Computing Machinery (ACM). The journal is described as follows on thTOMS Homepage of the ACM Digital Librarypage: The purpose of the journal was laid out by its founding editor, John Rice, in the inaugural issue. The decision to found the journal came out of the 1970 Mathematical Software Symposium at Purdue University, also organized by Rice, who then negotiated with both SIAM and the ACM regarding its publication. The journal publishes two kinds of articles: Regular research papers that advance the development of algorithms and software for mathematical computi ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	New York Times ''The New York Times'' (''the Times'', ''NYT'', or the Gray Lady) is a daily newspaper based in New York City with a worldwide readership reported in 2020 to comprise a declining 840,000 paid print subscribers, and a growing 6 million paid digital subscribers. It also is a producer of popular podcasts such as '' The Daily''. Founded in 1851 by Henry Jarvis Raymond and George Jones, it was initially published by Raymond, Jones & Company. The ''Times'' has won 132 Pulitzer Prizes, the most of any newspaper, and has long been regarded as a national "newspaper of record". For print it is ranked 18th in the world by circulation and 3rd in the U.S. The paper is owned by the New York Times Company, which is publicly traded. It has been governed by the Sulzberger family since 1896, through a dual-class share structure after its shares became publicly traded. A. G. Sulzberger, the paper's publisher and the company's chairman, is the fifth generation of the family to head the p ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Automatically Tuned Linear Algebra Software Automatically Tuned Linear Algebra Software (ATLAS) is a software library for linear algebra. It provides a mature open source implementation of BLAS APIs for C and Fortran77. ATLAS is often recommended as a way to automatically generate an optimized BLAS library. While its performance often trails that of specialized libraries written for one specific hardware platform, it is often the first or even only optimized BLAS implementation available on new systems and is a large improvement over the generic BLAS available at Netlib. For this reason, ATLAS is sometimes used as a performance baseline for comparison with other products. ATLAS runs on most Unix-like operating systems and on Microsoft Windows (using Cygwin). It is released under a BSD-style license without advertising clause, and many well-known mathematics applications including MATLAB, Mathematica, Scilab, SageMath, and some builds of GNU Octave may use it. Functionality ATLAS provides a full implementation of the ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	L2 Cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels (L1, L2, often L3, and rarely even L4), with different instruction-specific and data-specific caches at level 1. The cache memory is typically implemented with static random-access memory (SRAM), in modern CPUs by far the largest part of them by chip area, but SRAM is not always used for all levels (of I- or D-cache), or even any level, sometimes some latter or all levels are implemented with eDRAM. Other types of caches exist (that are not counted towards the "cache size" of the most important caches mentioned above), such as the translation lookaside buffer (TLB) which is part of the memory management unit (MMU) ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	CPU Cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, which stores copies of the data from frequently used main memory locations. Most CPUs have a hierarchy of multiple cache levels (L1, L2, often L3, and rarely even L4), with different instruction-specific and data-specific caches at level 1. The cache memory is typically implemented with static random-access memory (SRAM), in modern CPUs by far the largest part of them by chip area, but SRAM is not always used for all levels (of I- or D-cache), or even any level, sometimes some latter or all levels are implemented with eDRAM. Other types of caches exist (that are not counted towards the "cache size" of the most important caches mentioned above), such as the translation lookaside buffer (TLB) which is part of the memory management unit ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]