HOME

TheInfoList



OR:

In
scientific computing Computational science, also known as scientific computing or scientific computation (SC), is a field in mathematics that uses advanced computing capabilities to understand and solve complex problems. It is an area of science that spans many disc ...
, GotoBLAS and GotoBLAS2 are open source implementations of the
BLAS Basic Linear Algebra Subprograms (BLAS) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix ...
(Basic Linear Algebra Subprograms)
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
with many hand-crafted optimizations for specific
processor Processor may refer to: Computing Hardware * Processor (computing) **Central processing unit (CPU), the hardware within a computer that executes a program *** Microprocessor, a central processing unit contained on a single integrated circuit (I ...
types. GotoBLAS was developed by Kazushige Goto at the
Texas Advanced Computing Center The Texas Advanced Computing Center (TACC) at the University of Texas at Austin, United States, is an advanced computing research center that provides comprehensive advanced computing resources and support services to researchers in Texas and acr ...
. , it was used in seven of the world's ten fastest supercomputers. GotoBLAS remains available, but development ceased with a final version touting optimal performance on Intel's Nehalem architecture (contemporary in 2008).
OpenBLAS In scientific computing, OpenBLAS is an open-source implementation of the BLAS (Basic Linear Algebra Subprograms) and LAPACK APIs with many hand-crafted optimizations for specific processor types. It is developed at the Lab of Parallel Software ...
is an actively maintained fork of GotoBLAS, developed at the Lab of Parallel Software and Computational Science, ISCAS. GotoBLAS was written by Goto during his sabbatical leave from the Japan Patent Office in 2002. It was initially optimized for the
Pentium 4 Pentium 4 is a series of single-core CPUs for desktops, laptops and entry-level servers manufactured by Intel. The processors were shipped from November 20, 2000 until August 8, 2008. The production of Netburst processors was active from 200 ...
processor and managed to immediately boost the performance of a supercomputer based on that CPU from 1.5
TFLOPS In computing, floating point operations per second (FLOPS, flops or flop/s) is a measure of computer performance, useful in fields of scientific computations that require floating-point calculations. For such cases, it is a more accurate mea ...
to 2 TFLOPS. , the library was available at no cost for noncommercial use. A later open source version was released under the terms of the
BSD license BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. This is in contrast to copyleft licenses, which have share-alike requirements. The original BSD lice ...
. GotoBLAS's matrix-matrix multiplication routine, called GEMM in BLAS terms, is highly tuned for the
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
and
AMD64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging m ...
processor architectures by means of handcrafted
assembly code In computer programming, assembly language (or assembler language, or symbolic machine code), often referred to simply as Assembly and commonly abbreviated as ASM or asm, is any low-level programming language with a very strong correspondence b ...
. It follows a similar decomposition into smaller "kernel" routines that other BLAS implementations use, but where earlier implementations streamed data from the L1 processor cache, GotoBLAS uses the
L2 cache A CPU cache is a hardware cache used by the central processing unit (CPU) of a computer to reduce the average cost (time or energy) to access data from the main memory. A cache is a smaller, faster memory, located closer to a processor core, whic ...
. The kernel used for GEMM is a routine called GEBP, for "General block-times-panel multiply", which was experimentally found to be "inherently superior" over several other kernels that were considered in the design. Several other BLAS routines are, as is customary in BLAS libraries, implemented in terms of GEMM.


See also

*
Automatically Tuned Linear Algebra Software Automatically Tuned Linear Algebra Software (ATLAS) is a software library for linear algebra. It provides a mature open source implementation of BLAS APIs for C and Fortran77. ATLAS is often recommended as a way to automatically generate an ...
(ATLAS) *
Intel Math Kernel Library Intel oneAPI Math Kernel Library (Intel oneMKL; formerly Intel Math Kernel Library or Intel MKL) is a library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, ...
(MKL)


References

{{Numerical linear algebra Numerical linear algebra Numerical software Software using the BSD license