BLAS

	BLAS Basic Linear Algebra Subprograms (BLAS) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. They are the ''de facto'' standard low-level routines for linear algebra libraries; the routines have bindings for both C ("CBLAS interface") and Fortran ("BLAS interface"). Although the BLAS specification is general, BLAS implementations are often optimized for speed on a particular machine, so using them can bring substantial performance benefits. BLAS implementations will take advantage of special floating point hardware such as vector registers or SIMD instructions. It originated as a Fortran library in 1979* and its interface was standardized by the BLAS Technical (BLAST) Forum, whose latest BLAS report can be found on the netlib website. This Fortran library is known as the ''reference implementation'' (sometimes co ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	OpenBLAS In scientific computing, OpenBLAS is an open-source implementation of the BLAS (Basic Linear Algebra Subprograms) and LAPACK APIs with many hand-crafted optimizations for specific processor types. It is developed at the Lab of Parallel Software and Computational Science, ISCAS. OpenBLAS adds optimized implementations of linear algebra kernels for several processor architectures, including Intel Sandy Bridge and Loongson Loongson () is the name of a family of general-purpose, MIPS architecture-compatible microprocessors, as well as the name of the Chinese fabless company (Loongson Technology) that develops them. The processors are alternately called Godson proc .... It claims to achieve performance comparable to the Intel MKL: this mostly holds true on the BLAS part, while the LAPACK part falls behind. On machines that support the AVX2 instruction set, OpenBLAS can achieve similar performance to MKL, but there are currently almost no open source libraries comparable to ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Automatically Tuned Linear Algebra Software Automatically Tuned Linear Algebra Software (ATLAS) is a software library for linear algebra. It provides a mature open source implementation of BLAS APIs for C and Fortran77. ATLAS is often recommended as a way to automatically generate an optimized BLAS library. While its performance often trails that of specialized libraries written for one specific hardware platform, it is often the first or even only optimized BLAS implementation available on new systems and is a large improvement over the generic BLAS available at Netlib. For this reason, ATLAS is sometimes used as a performance baseline for comparison with other products. ATLAS runs on most Unix-like operating systems and on Microsoft Windows (using Cygwin). It is released under a BSD-style license without advertising clause, and many well-known mathematics applications including MATLAB, Mathematica, Scilab, SageMath, and some builds of GNU Octave may use it. Functionality ATLAS provides a full implementation of the ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	LAPACK LAPACK ("Linear Algebra Package") is a standard software library for numerical linear algebra. It provides routines for solving systems of linear equations and linear least squares, eigenvalue problems, and singular value decomposition. It also includes routines to implement the associated matrix factorizations such as LU, QR, Cholesky and Schur decomposition. LAPACK was originally written in FORTRAN 77, but moved to Fortran 90 in version 3.2 (2008). The routines handle both real and complex matrices in both single and double precision. LAPACK relies on an underlying BLAS implementation to provide efficient and portable computational building blocks for its routines. LAPACK was designed as the successor to the linear equations and linear least-squares routines of LINPACK and the eigenvalue routines of EISPACK. LINPACK, written in the 1970s and 1980s, was designed to run on the then-modern vector computers with shared memory. LAPACK, in contrast, was designed to ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	BLIS (software) In scientific computing, BLIS (BLAS-like Library Instantiation Software) is an open-source framework for implementing a superset of BLAS (Basic Linear Algebra Subprograms) functionality for specific processor types that was recently awarded the J. H. Wilkinson Prize for Numerical Software. It exposes that functionality through two traditional Application Programming Interfaces (APIs): the BLAS interface and the CBLAS interface. BLIS also includes two APIs native to the framework: a typed (BLAS-like) API and an object API. These native interfaces provide access to BLAS-like functionality that is not supported by, but closely related to, operations found in the BLAS (and CBLAS). The framework is developed and supported by the Science of High-Performance Computing (SHPC) group of the Oden Institute for Computational Engineering and Sciences at The University of Texas at Austin and the Matthews Research Group at Southern Methodist University. BLIS yields high performance on many cu ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	ROCm ROCm is an Advanced Micro Devices (AMD) software stack for graphics processing unit (GPU) programming. ROCm spans several domains: general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), heterogeneous computing. It offers several programming models: HIP ( GPU-kernel-based programming), OpenMP/ Message Passing Interface (MPI) ( directive-based programming), OpenCL. ROCm is free, libre and open-source software (except the GPU firmware blobs), it is distributed under various licenses. Background The first GPGPU software stack from ATI/AMD was Close to Metal, which became Stream. ROCm was launched around 2016 with the Boltzmann Initiative. ROCm stack builds upon previous AMD GPU stacks, some tools trace back to GPUOpen, others to the Heterogeneous System Architecture (HSA). Heterogeneous System Architecture HSA was aimed at producing a middle-level, hardware-agnostic intermediate representation, that could be JIT-compiled to t ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Intel Math Kernel Library Intel oneAPI Math Kernel Library (Intel oneMKL; formerly Intel Math Kernel Library or Intel MKL) is a library of optimized math routines for science, engineering, and financial applications. Core math functions include BLAS, LAPACK, ScaLAPACK, sparse solvers, fast Fourier transforms, and vector math. The library supports Intel processors and is available for Windows, Linux and macOS operating systems. ''Intel oneAPI Math Kernel Library'' is not to be confused with ''oneAPI Math Kernel Library'' (oneMKL) Interfaces, a piece of open-source glue code that allows Intel MKL routines to be used from Data Parallel C++. History and licensing Intel launched the Math Kernel Library on May 9, 2003, and called it blas.lib. The project's development teams are located in Russia and the United States. The library was available in a standalone form, free of charge under the terms of Intel Simplified Software License which allow redistribution. Since April 2020, MKL has become part of oneAPI ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	Armadillo (C++ Library) Armadillo is a linear algebra software library for the C++ programming language. It aims to provide efficient and streamlined base calculations, while at the same time having a straightforward and easy-to-use interface. Its intended target users are scientists and engineers. It supports integer, floating point ( single and double precision), complex numbers, and a subset of trigonometric and statistics functions. Dense and sparse matrices are supported. Various matrix decompositions are provided through optional integration with Linear Algebra PACKage (LAPACK), Automatically Tuned Linear Algebra Software (ATLAS), and ARPACK. High-performance BLAS/LAPACK replacement libraries such as OpenBLAS and Intel MKL can also be used. The library employs a delayed-evaluation approach (during compile time) to combine several operations into one and reduce (or eliminate) the need for temporaries. Where applicable, the order of operations is optimised. Delayed evaluation and optimisation ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
picture info	CUDA CUDA (or Compute Unified Device Architecture) is a parallel computing platform and application programming interface (API) that allows software to use certain types of graphics processing units (GPUs) for general purpose processing, an approach called general-purpose computing on GPUs ( GPGPU). CUDA is a software layer that gives direct access to the GPU's virtual instruction set and parallel computational elements, for the execution of compute kernels. CUDA is designed to work with programming languages such as C, C++, and Fortran. This accessibility makes it easier for specialists in parallel programming to use GPU resources, in contrast to prior APIs like Direct3D and OpenGL, which required advanced skills in graphics programming. CUDA-powered GPUs also support programming frameworks such as OpenMP, OpenACC and OpenCL; and HIP by compiling such code to CUDA. CUDA was created by Nvidia. When it was first introduced, the name was an acronym for Compute Unified Device A ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	Netlib Netlib is a repository of software for scientific computing maintained by AT&T, Bell Laboratories, the University of Tennessee and Oak Ridge National Laboratory. Netlib comprises many separate programs and libraries. Most of the code is written in C and Fortran, with some programs in other languages. History The project began with email distribution on UUCP, ARPANET and CSNET in the 1980s. The code base of Netlib was written at a time when computer software was not yet considered merchandise. Therefore, no license terms or terms of use are stated for many programs. Before the Berne Convention Implementation Act of 1988 (and the earlier Copyright Act of 1976) works without an explicit copyright notice were public-domain software. Also, most of the Netlib code is work of US government employees and therefore in the public domain. [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	GPGPU General-purpose computing on graphics processing units (GPGPU, or less often GPGP) is the use of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU). The use of multiple video cards in one computer, or large numbers of graphics chips, further parallelizes the already parallel nature of graphics processing. Essentially, a GPGPU pipeline is a kind of parallel processing between one or more GPUs and CPUs that analyzes data as if it were in image or other graphic form. While GPUs operate at lower frequencies, they typically have many times the number of cores. Thus, GPUs can process far more pictures and graphical data per second than a traditional CPU. Migrating data into graphical form and then using the GPU to scan and analyze it can create a large speedup. GPGPU pipelines were developed at the beginning of the 21st century for grap ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]
	LINPACK Benchmarks The LINPACK Benchmarks are a measure of a system's floating-point computing power. Introduced by Jack Dongarra, they measure how fast a computer solves a dense ''n'' by ''n'' system of linear equations ''Ax'' = ''b'', which is a common task in engineering. The latest version of these benchmarks is used to build the TOP500 list, ranking the world's most powerful supercomputers. The aim is to approximate how fast a computer will perform when solving real problems. It is a simplification, since no single computational task can reflect the overall performance of a computer system. Nevertheless, the LINPACK benchmark performance can provide a good correction over the peak performance provided by the manufacturer. The peak performance is the maximal theoretical performance a computer can achieve, calculated as the machine's frequency, in cycles per second, times the number of operations per cycle it can perform. The actual performance will always be lower than the peak perf ... [...More Info...] [...Related Items...] OR: [Wikipedia] [Google] [Baidu]