Automatically Tuned Linear Algebra Software (ATLAS) is a

software library In computer science, a library is a collection of non-volatile resources used by computer programs, often for software development. These may include configuration data, documentation, help data, message templates, pre-written code and subr ...

for

linear algebra Linear algebra is the branch of mathematics concerning linear equations such as: :a_1x_1+\cdots +a_nx_n=b, linear maps such as: :(x_1, \ldots, x_n) \mapsto a_1x_1+\cdots +a_nx_n, and their representations in vector spaces and through matrices. ...

. It provides a mature

open source Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use the source code, design documents, or content of the product. The open-source model is a decentralized sof ...

implementation of

BLAS Basic Linear Algebra Subprograms (BLAS) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix ...

APIs for C and Fortran77. ATLAS is often recommended as a way to automatically generate an optimized BLAS library. While its performance often trails that of specialized libraries written for one specific hardware platform, it is often the first or even only optimized BLAS implementation available on new systems and is a large improvement over the generic BLAS available at

Netlib Netlib is a repository of software for scientific computing maintained by AT&T, Bell Laboratories, the University of Tennessee and Oak Ridge National Laboratory. Netlib comprises many separate programs and libraries. Most of the code is written in ...

. For this reason, ATLAS is sometimes used as a performance baseline for comparison with other products. ATLAS runs on most

Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, and ot ...

-like operating systems and on

Microsoft Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for serv ...

(using

Cygwin Cygwin ( ) is a POSIX-compatible programming and runtime environment that runs natively on Microsoft Windows. Under Cygwin, source code designed for Unix-like operating systems may be compiled with minimal modification and executed. The Cygwin in ...

). It is released under a

BSD-style license BSD licenses are a family of permissive free software licenses, imposing minimal restrictions on the use and distribution of covered software. This is in contrast to copyleft licenses, which have share-alike requirements. The original BSD lice ...

without advertising clause, and many well-known mathematics applications including

MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementation ...

Mathematica Wolfram Mathematica is a software system with built-in libraries for several areas of technical computing that allow machine learning, statistics, symbolic computation, data manipulation, network analysis, time series analysis, NLP, optimizat ...

Scilab Scilab is a free and open-source, cross-platform numerical computational package and a high-level, numerically oriented programming language. It can be used for signal processing, statistical analysis, image enhancement, fluid dynamics simulat ...

SageMath SageMath (previously Sage or SAGE, "System for Algebra and Geometry Experimentation") is a computer algebra system (CAS) with features covering many aspects of mathematics, including algebra, combinatorics, graph theory, numerical analysis, numbe ...

, and some builds of

GNU Octave GNU Octave is a high-level programming language primarily intended for scientific computing and numerical computation. Octave helps in solving linear and nonlinear problems numerically, and for performing other numerical experiments using a langu ...

may use it.

Functionality

ATLAS provides a full implementation of the BLAS APIs as well as some additional functions from

LAPACK LAPACK ("Linear Algebra Package") is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also ...

, a higher-level library built on top of BLAS. In BLAS, functionality is divided into three groups called levels 1, 2 and 3. * Level 1 contains ''vector operations'' of the form ::

\mathbf \leftarrow \alpha \mathbf + \mathbf \!

:as well as scalar

dot product In mathematics, the dot product or scalar productThe term ''scalar product'' means literally "product with a scalar as a result". It is also used sometimes for other symmetric bilinear forms, for example in a pseudo-Euclidean space. is an algebra ...

s and

vector norm In mathematics, a norm is a function (mathematics), function from a real number, real or complex number, complex vector space to the non-negative real numbers that behaves in certain ways like the distance from the Origin (mathematics), origin: it ...

s, among other things. * Level 2 contains ''matrix-vector operations'' of the form ::

\mathbf \leftarrow \alpha A \mathbf + \beta \mathbf \!

:as well as solving

T \mathbf = \mathbf

for

\mathbf

with

T

being triangular, among other things. * Level 3 contains ''matrix-matrix operations'' such as the widely used

General Matrix Multiply Basic Linear Algebra Subprograms (BLAS) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix ...

(GEMM) operation ::

C \leftarrow \alpha A B + \beta C \!

:as well as solving

B \leftarrow \alpha T^ B

for triangular matrices

T

, among other things.

Optimization approach

The

optimization Mathematical optimization (alternatively spelled ''optimisation'') or mathematical programming is the selection of a best element, with regard to some criterion, from some set of available alternatives. It is generally divided into two subfi ...

approach is called Automated Empirical Optimization of Software (AEOS), which identifies four fundamental approaches to computer assisted optimization of which ATLAS employs three: #

Parameter A parameter (), generally, is any characteristic that can help in defining or classifying a particular system (meaning an event, project, object, situation, etc.). That is, a parameter is an element of a system that is useful, or critical, when ...

ization—searching over the parameter space of a function, used for blocking factor, cache edge, etc. # Multiple implementation—searching through various approaches to implementing the same function, e.g., for SSE support before intrinsics made them available in C code # Code generation—programs that write programs incorporating what knowledge they can about what will produce the best performance for the system * Optimization of the level 1 BLAS uses parameterization and multiple implementation : Every ATLAS level 1 BLAS function has its own kernel. Since it would be difficult to maintain thousands of cases in ATLAS there is little architecture specific optimization for Level 1 BLAS. Instead multiple implementation is relied upon to allow for

compiler optimization In computing, an optimizing compiler is a compiler that tries to minimize or maximize some attributes of an executable computer program. Common requirements are to minimize a program's execution time, memory footprint, storage size, and power con ...

to produce high performance implementation for the system. * Optimization of the level 2 BLAS uses parameterization and multiple implementation : With

N^2

data and

N^2

operations to perform the function is usually limited by bandwidth to memory, and thus there is not much opportunity for optimization : All routines in the ATLAS level 2 BLAS are built from two Level 2 BLAS kernels: ** GEMV—matrix by vector multiply update: ::

\mathbf \leftarrow \alpha A \mathbf + \beta \mathbf \!

** GER—general rank 1 update from an outer product: ::

A \leftarrow \alpha \mathbf \mathbf^T + A \!

* Optimization of the level 3 BLAS uses code generation and the other two techniques : Since we have

N^3

ops with only

N^2

data, there are many opportunities for optimization

Level 3 BLAS

Most of the Level 3 BLAS is derived from GEMM, so that is the primary focus of the optimization. :

O(n^3)

operations vs.

O(n^2)

data The intuition that the

n^3

operations will dominate over the

n^2

data accesses only works for roughly square matrices. The real measure should be some kind of surface area to volume. The difference becomes important for very non-square matrices.

Can it afford to copy?

Copying the inputs allows the data to be arranged in a way that provides optimal access for the kernel functions, but this comes at the cost of allocating temporary space, and an extra read and write of the inputs. So the first question GEMM faces is, can it afford to copy the inputs? If so, * Put into block major format with good alignment * Take advantage of user contributed kernels and cleanup * Handle the transpose cases with the copy: make everything into TN (transpose - no-transpose) * Deal with α in the copy If not, * Use the nocopy version * Make no assumptions on the stride of matrix ''A'' and ''B'' in memory * Handle all transpose cases explicitly * No guarantee about alignment of data * Support α specific code * Run the risk of TLB issues, bad strides, etc. The actual decision is made through a simple

heuristic A heuristic (; ), or heuristic technique, is any approach to problem solving or self-discovery that employs a practical method that is not guaranteed to be optimal, perfect, or rational, but is nevertheless sufficient for reaching an immediate, ...

which checks for "skinny cases".

Cache edge

For 2nd Level Cache blocking a single cache edge parameter is used. The high level choose an order to traverse the blocks: ''ijk, jik, ikj, jki, kij, kji''. These need not be the same order as the product is done within a block. Typically chosen orders are ''ijk'' or ''jik''. For ''jik'' the ideal situation would be to copy ''A'' and the ''NB'' wide panel of ''B''. For ''ijk'' swap the role of ''A'' and ''B''. Choosing the bigger of ''M'' or ''N'' for the outer loop reduces the footprint of the copy. But for large ''K'' ATLAS does not even allocate such a large amount of memory. Instead it defines a parameter, ''Kp'', to give best use of the L2 cache. Panels are limited to ''Kp'' in length. It first tries to allocate (in the ''jik'' case)

M\cdot p + NB\cdot Kp + NB\cdot NB

. If that fails it tries

2\cdot Kp\cdot NB + NB\cdot NB

. (If that fails it uses the no-copy version of GEMM, but this case is unlikely for reasonable choices of cache edge.) ''Kp'' is a function of cache edge and ''NB''.

LAPACK

When integrating the ATLAS BLAS with

an important consideration is the choice of blocking factor for LAPACK. If the ATLAS blocking factor is small enough the blocking factor of LAPACK could be set to match that of ATLAS. To take advantage of recursive factorization, ATLAS provides replacement routines for some LAPACK routines. These simply overwrite the corresponding LAPACK routines from

Need for installation

Installing ATLAS on a particular platform is a challenging process which is typically done by a system vendor or a local expert and made available to a wider audience. For many systems, architectural default parameters are available; these are essentially saved searches plus the results of hand tuning. If the arch defaults work they will likely get 10-15% better performance than the install search. On such systems the installation process is greatly simplified.

References

External links

*
User contribution to ATLASA Collaborative guide to ATLAS Development
*Th

has links to the Quick reference guide to BLAS and Quick reference to ATLAS LAPACK API reference

for ATLAS {{Numerical linear algebra C (programming language) libraries Fortran libraries Numerical linear algebra Numerical software Software using the BSD license