In scientific computing, GotoBLAS and GotoBLAS2 are open source implementations of the BLAS (Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types. GotoBLAS was developed by Kazushige Goto at the Texas Advanced Computing Center. At one point, it was used in seven of the world's ten fastest supercomputers.
GotoBLAS remains available, but development ceased with a final version touting optimal performance on Intel's Nehalem architecture (contemporary in 2008).
OpenBLAS is an actively maintained fork of GotoBLAS, developed at the Lab of Parallel Software and Computational Science, ISCAS.
GotoBLAS was written by Goto during his sabbatical leave from the Japan Patent Office in 2002. It was initially optimized for the Pentium 4 processor and immediately boosted the performance of a supercomputer based on that CPU from 1.5 TFLOPS to 2 TFLOPS. The library was at first available at no cost for noncommercial use. A later open source version was released under the terms of the BSD license.
GotoBLAS's matrix-matrix multiplication routine, called GEMM in BLAS terms, is highly tuned for the x86 and AMD64 processor architectures by means of handcrafted assembly code. It follows a similar decomposition into smaller "kernel" routines that other BLAS implementations use, but where earlier implementations streamed data from the L1 processor cache, GotoBLAS uses the L2 cache.
The kernel used for GEMM is a routine called GEBP, for "General block-times-panel multiply",
which was experimentally found to be "inherently superior" over several other kernels that were considered in the design.
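The decomposition of GEMM into block-times-panel products can be sketched in plain Python. This is only an illustration of the loop structure, not GotoBLAS's code: the function names `gebp` and `gemm_blocked` and the block sizes `mc` and `kc` are hypothetical, and Python gives no control over cache placement. In a tuned library the block of A would be sized to stay resident in cache while the panel of B streams past it, and the innermost loops would be handwritten assembly.

```python
def gebp(a_block, b_panel, c_panel):
    """Update c_panel += a_block @ b_panel in place.

    a_block is mc x kc (the "block"), b_panel is kc x n (the "panel"),
    and c_panel is the matching mc x n slice of the result.
    """
    for i, a_row in enumerate(a_block):
        c_row = c_panel[i]
        for p, a_ip in enumerate(a_row):
            b_row = b_panel[p]
            for j in range(len(b_row)):
                c_row[j] += a_ip * b_row[j]


def gemm_blocked(a, b, mc=2, kc=2):
    """Compute C = A @ B by splitting the work into GEBP calls.

    a and b are lists of rows; mc and kc are illustrative block sizes.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of A and B do not match")
    c = [[0.0] * n for _ in range(m)]
    for p0 in range(0, k, kc):
        # kc x n row panel of B, shared by every block in this column of A
        b_panel = b[p0:p0 + kc]
        for i0 in range(0, m, mc):
            # Copy ("pack") an mc x kc block of A into contiguous storage
            a_block = [row[p0:p0 + kc] for row in a[i0:i0 + mc]]
            # The slice holds references to the rows of c, so gebp
            # updates the result matrix in place
            gebp(a_block, b_panel, c[i0:i0 + mc])
    return c
```

Each call to `gebp` reuses one small block of A against a whole panel of B, which is the access pattern the block-times-panel name describes.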
Several other BLAS routines are, as is customary in BLAS libraries, implemented in terms of GEMM.
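As a rough illustration of building another routine on top of GEMM, the sketch below computes a symmetric matrix product (SYMM in BLAS terms) by expanding the stored triangle and delegating to a matrix multiply. Choosing SYMM as the example is an assumption; the text does not say which routines GotoBLAS handles this way or how. The helper names `gemm` and `symm_via_gemm` are hypothetical, and an optimized library would likely avoid the explicit full copy made here.

```python
def gemm(a, b):
    """Reference C = A @ B on lists of rows (stand-in for a tuned GEMM)."""
    n = len(b[0])
    return [
        [sum(a_ip * b[p][j] for p, a_ip in enumerate(row)) for j in range(n)]
        for row in a
    ]


def symm_via_gemm(a_lower, b):
    """Compute C = A @ B for symmetric A given by its lower triangle.

    a_lower is an n x n list of rows; entries above the diagonal are
    ignored, mirroring the BLAS convention of referencing one triangle.
    """
    n = len(a_lower)
    # Rebuild the full symmetric matrix from the lower triangle
    a_full = [
        [a_lower[i][j] if j <= i else a_lower[j][i] for j in range(n)]
        for i in range(n)
    ]
    # All of the arithmetic is delegated to the general multiply
    return gemm(a_full, b)
```

The appeal of this layering is that tuning effort spent on the general multiply carries over to every routine built on it.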
See also
* Automatically Tuned Linear Algebra Software (ATLAS)
* Intel Math Kernel Library (MKL)