NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
The predecessor of NumPy, Numeric, was originally created by Jim Hugunin with contributions from several other developers. In 2005, Travis Oliphant created NumPy by incorporating features of the competing Numarray into Numeric, with extensive modifications. NumPy is open-source software and has many contributors. NumPy is fiscally sponsored by NumFOCUS.
History
matrix-sig
The Python programming language was not originally designed for numerical computing, but attracted the attention of the scientific and engineering community early on. In 1995 the special interest group (SIG) ''matrix-sig'' was founded with the aim of defining an array computing package; among its members was Python designer and maintainer Guido van Rossum, who extended Python's syntax (in particular the indexing syntax) to make array computing easier.
Numeric
An implementation of a matrix package was completed by Jim Fulton, then generalized by Jim Hugunin and called ''Numeric'' (also variously known as the "Numerical Python extensions" or "NumPy"), with influences from the APL family of languages, Basis, MATLAB, FORTRAN, S and S+, and others.
Hugunin, a graduate student at the Massachusetts Institute of Technology (MIT), joined the Corporation for National Research Initiatives (CNRI) in 1997 to work on JPython, leaving Paul Dubois of Lawrence Livermore National Laboratory (LLNL) to take over as maintainer.
Other early contributors include David Ascher, Konrad Hinsen and Travis Oliphant.
Numarray
A new package called ''Numarray'' was written as a more flexible replacement for Numeric.
Like Numeric, it too is now deprecated.
Numarray had faster operations for large arrays, but was slower than Numeric on small ones, so for a time both packages were used in parallel for different use cases. The last version of Numeric (v24.2) was released on 11 November 2005, while the last version of numarray (v1.5.2) was released on 24 August 2006.
There was a desire to get Numeric into the Python standard library, but Guido van Rossum decided that the code was not maintainable in its state at the time.
NumPy
In early 2005, NumPy developer Travis Oliphant wanted to unify the community around a single array package and ported Numarray's features to Numeric, releasing the result as NumPy 1.0 in 2006.
This new project was part of SciPy. To avoid installing the large SciPy package just to get an array object, this new package was separated and called NumPy. Support for Python 3 was added in 2010 with NumPy version 1.5.0.
In 2011, PyPy started development on an implementation of the NumPy API for PyPy. As of 2023, it is not yet fully compatible with NumPy.
Features
NumPy targets the CPython reference implementation of Python, which is a non-optimizing bytecode interpreter. Mathematical algorithms written for this version of Python often run much slower than compiled equivalents due to the absence of compiler optimization. NumPy addresses the slowness problem partly by providing multidimensional arrays and functions and operators that operate efficiently on arrays; using these requires rewriting some code, mostly inner loops, using NumPy.
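As a rough illustration of what rewriting an inner loop can look like, the sketch below computes a sum of squared differences twice: once with an explicit Python loop and once with array operations. The function names and sample data are illustrative assumptions, not part of NumPy.

```python
import numpy as np

def ssd_loop(xs, ys):
    """Sum of squared differences using an explicit Python inner loop."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += (x - y) ** 2
    return total

def ssd_vectorized(xs, ys):
    """The same computation expressed with NumPy array operations."""
    diff = np.asarray(xs, dtype=float) - np.asarray(ys, dtype=float)
    return float(np.dot(diff, diff))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [0.0, 2.0, 5.0, 4.0]
print(ssd_loop(xs, ys))        # 5.0
print(ssd_vectorized(xs, ys))  # 5.0
```

In the vectorized form the per-element work runs inside NumPy's compiled routines rather than in the bytecode interpreter, which is where the speed-up described above would come from.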
Using NumPy in Python gives functionality comparable to MATLAB since they are both interpreted, and they both allow the user to write fast programs as long as most operations work on arrays or matrices instead of scalars. In comparison, MATLAB boasts a large number of additional toolboxes, notably Simulink, whereas NumPy is intrinsically integrated with Python, a more modern and complete programming language. Moreover, complementary Python packages are available; SciPy is a library that adds more MATLAB-like functionality and Matplotlib is a plotting package that provides MATLAB-like plotting functionality. Although MATLAB can perform sparse matrix operations, NumPy alone cannot perform such operations and requires the use of the scipy.sparse library. Internally, both MATLAB and NumPy rely on BLAS and LAPACK for efficient linear algebra computations.
Python bindings of the widely used computer vision library OpenCV utilize NumPy arrays to store and operate on data. Since images with multiple channels are simply represented as three-dimensional arrays, indexing, slicing or masking with other arrays are very efficient ways to access specific pixels of an image. Using the NumPy array as a universal data structure in OpenCV for images, extracted feature points, filter kernels and more vastly simplifies the programming workflow and debugging.
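The following sketch shows indexing, slicing and masking on a three-dimensional image array using NumPy alone, without OpenCV. The image size, the channel values and the threshold are arbitrary assumptions chosen for illustration; only the BGR channel order follows OpenCV's convention.

```python
import numpy as np

# A hypothetical 4x6 image with 3 channels in BGR order.
img = np.zeros((4, 6, 3), dtype=np.uint8)

img[:, :, 2] = 200          # indexing: set the red channel of every pixel
img[1:3, 2:5, 0] = 255      # slicing: set the blue channel in a rectangle

# Masking: select pixels whose blue value exceeds a threshold and recolor them.
mask = img[:, :, 0] > 128   # boolean array of shape (4, 6)
img[mask] = (0, 255, 0)     # paint the selected pixels pure green

print(int(mask.sum()))      # 6
print(img[1, 2].tolist())   # [0, 255, 0]
print(img[0, 0].tolist())   # [0, 0, 200]
```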
Importantly, many NumPy operations release the global interpreter lock, which allows for multithreaded processing.
NumPy also provides a C API, which allows Python code to interoperate with external libraries written in low-level languages.
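A minimal sketch of multithreaded processing with NumPy follows, assuming a standard-library thread pool and a hypothetical helper named gram. It only demonstrates that array work can be dispatched to threads and gives the same results as a serial run; whether it runs faster depends on the operation, the array sizes and the underlying BLAS build.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

rng = np.random.default_rng(0)
chunks = [rng.random((200, 200)) for _ in range(4)]

def gram(m):
    """Compute m @ m.T; the heavy lifting happens inside NumPy."""
    return m @ m.T

# Run the same function on every chunk, one chunk per worker thread.
with ThreadPoolExecutor(max_workers=4) as pool:
    threaded = list(pool.map(gram, chunks))

serial = [gram(m) for m in chunks]
print(all(np.allclose(t, s) for t, s in zip(threaded, serial)))  # True
```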
The ndarray data structure
The core functionality of NumPy is its "ndarray", for ''n''-dimensional array, data structure. These arrays are strided views on memory.
In contrast to Python's built-in list data structure, these arrays are homogeneously typed: all elements of a single array must be of the same type.
Such arrays can also be views into memory buffers allocated by C/C++, Python, and Fortran extensions to the CPython interpreter without the need to copy data around, giving a degree of compatibility with existing numerical libraries. This functionality is exploited by the SciPy package, which wraps a number of such libraries (notably BLAS and LAPACK). NumPy has built-in support for memory-mapped ndarrays.
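The idea of strided views and homogeneous typing can be sketched as follows. The array contents are arbitrary, and the printed stride values assume an 8-byte integer dtype, which is requested explicitly here.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)
print(a.strides)                # (32, 8): bytes to step per row, per column

t = a.T                         # transpose: same buffer, swapped strides
print(t.strides)                # (8, 32)
print(np.shares_memory(a, t))   # True

col = a[:, 1]                   # slicing yields a view, not a copy
col[0] = 100                    # writes through to the original array
print(a[0, 1])                  # 100

# Homogeneous typing: mixed inputs are converted to one common dtype.
f = np.array([1, 2.5, 3])
print(f.dtype)                  # float64
```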
Limitations
Inserting or appending entries to an array is not as trivial as it is with Python's lists.
The routine to extend arrays actually creates new arrays of the desired shape and padding values, copies the given array into the new one and returns it.
NumPy's concatenation operation does not actually link the two arrays but returns a new one, filled with the entries from both given arrays in sequence.
Reshaping the dimensionality of an array is only possible as long as the number of elements in the array does not change.
These circumstances originate from the fact that NumPy's arrays must be views on contiguous memory buffers.
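These limitations can be seen in a short sketch. It assumes that the routines the passage has in mind are np.pad, np.concatenate and reshape; the sample arrays are arbitrary.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5])

joined = np.concatenate([a, b])           # allocates a new array
padded = np.pad(a, (0, 2), mode="constant")  # new array, zero-padded at the end
print(joined.tolist())                    # [1, 2, 3, 4, 5]
print(padded.tolist())                    # [1, 2, 3, 0, 0]
print(np.shares_memory(a, joined))        # False

joined[0] = 99                            # the inputs are unaffected
print(int(a[0]))                          # 1

m = np.arange(6).reshape(2, 3)            # fine: 6 elements before and after
try:
    m.reshape(4, 2)                       # 8 elements requested, 6 available
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)                     # True
```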
Algorithms that are not expressible as a vectorized operation will typically run slowly because they must be implemented in "pure Python", while vectorization may increase memory complexity of some operations from constant to linear, because temporary arrays must be created that are as large as the inputs. Runtime compilation of numerical code has been implemented by several groups to avoid these problems; open source solutions that interoperate with NumPy include numexpr and Numba. Cython and Pythran are static-compiling alternatives to these.
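To make the point about temporary arrays concrete, the sketch below evaluates the same expression twice: once as a plain vectorized expression, and once reusing a single preallocated buffer through the out argument of NumPy's universal functions. This is a manual workaround shown for illustration only, not a substitute for the runtime compilers named above; the array size and values are arbitrary.

```python
import numpy as np

n = 1_000
a = np.ones(n)
b = np.full(n, 2.0)
c = np.full(n, 3.0)

# Vectorized expression: a * b is materialized as a temporary array of
# length n before c is added, and the result is a second new array.
result = a * b + c

# Same arithmetic, reusing one preallocated buffer via the ufunc `out` argument.
buf = np.empty(n)
np.multiply(a, b, out=buf)
np.add(buf, c, out=buf)

print(np.array_equal(result, buf))  # True
print(float(buf[0]))                # 5.0
```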
Many modern large-scale scientific computing applications have requirements that exceed the capabilities of the NumPy arrays. For example, NumPy arrays are usually loaded into a computer's memory, which might have insufficient capacity for the analysis of large datasets. Further, NumPy operations are executed on a single CPU. However, many linear algebra operations can be accelerated by executing them on clusters of CPUs or of specialized hardware, such as GPUs and TPUs, which many deep learning applications rely on.
As a result, several alternative array implementations have arisen in the scientific Python ecosystem in recent years, such as Dask for distributed arrays and TensorFlow or JAX for computations on GPUs. Because of NumPy's popularity, these often implement a subset of NumPy's API or mimic it, so that users can change their array implementation with minimal changes to their code.
A library named CuPy, accelerated by Nvidia's CUDA framework, has also shown potential for faster computing, being a 'drop-in replacement' of NumPy.
Examples
import numpy as np
from numpy.random import rand
from numpy.linalg import solve, inv
a = np.array([[1, 2, 3, 4], [3, 4, 6, 7], [5, 9, 0, 5]])
a.transpose()
Basic operations
>>> a = np.array([1, 2, 3, 6])
>>> b = np.linspace(0, 2, 4)  # create an array with four equally spaced points starting with 0 and ending with 2
>>> c = a - b
>>> c
array([ 1.        ,  1.33333333,  1.66666667,  4.        ])
>>> a**2
array([ 1,  4,  9, 36])
Universal functions
>>> a = np.linspace(-np.pi, np.pi, 100)
>>> b = np.sin(a)
>>> c = np.cos(a)
>>>
>>> # Functions can take both numbers and arrays as parameters.
>>> np.sin(1)
0.8414709848078965
>>> np.sin(np.array([1, 2, 3]))
array([ 0.84147098,  0.90929743,  0.14112001])
Linear algebra
>>> from numpy.random import rand
>>> from numpy.linalg import solve, inv
>>> a = np.array([[1, 2, 3], [3, 4, 6.7], [5, 9.0, 5]])
>>> a.transpose()
array([[ 1. ,  3. ,  5. ],
       [ 2. ,  4. ,  9. ],
       [ 3. ,  6.7,  5. ]])
>>> inv(a)
array([[-2.27683616,  0.96045198,  0.07909605],
       [ 1.04519774, -0.56497175,  0.1299435 ],
       [ 0.39548023,  0.05649718, -0.11299435]])
>>> b = np.array([3, 2, 1])
>>> solve(a, b)  # solve the equation ax = b
array([-4.83050847,  2.13559322,  1.18644068])
>>> c = rand(3, 3) * 20  # create a 3x3 random matrix of values within [0, 1) scaled by 20
>>> c
array([[  3.98732789,   2.47702609,   4.71167924],
       [  9.24410671,   5.5240412 ,  10.6468792 ],
       [ 10.38136661,   8.44968437,  15.17639591]])
>>> np.dot(a, c)  # matrix multiplication
array([[  53.61964114,   38.8741616 ,   71.53462537],
       [ 118.4935668 ,   86.14012835,  158.40440712],
       [ 155.04043289,  104.3499231 ,  195.26228855]])
>>> a @ c  # Starting with Python 3.5 and NumPy 1.10
array([[  53.61964114,   38.8741616 ,   71.53462537],
       [ 118.4935668 ,   86.14012835,  158.40440712],
       [ 155.04043289,  104.3499231 ,  195.26228855]])
Multidimensional arrays
>>> M = np.zeros(shape=(2, 3, 5, 7, 11))
>>> T = np.transpose(M, (4, 2, 1, 3, 0))
>>> T.shape
(11, 5, 3, 7, 2)
Incorporation with OpenCV
>>> import numpy as np
>>> import cv2
>>> r = np.reshape(np.arange(256*256) % 256, (256, 256)).astype(np.uint8)  # 256x256 pixel array with a horizontal gradient from 0 to 255 for the red color channel
>>> g = np.zeros_like(r)  # array of same size and type as r but filled with 0s for the green color channel
>>> b = r.T  # transposed r will give a vertical gradient for the blue color channel
>>> cv2.imwrite("gradients.png", np.dstack([b, g, r]))  # OpenCV images are interpreted as BGR; the depth-stacked array will be written to an 8-bit RGB PNG file called "gradients.png"
True
Nearest-neighbor search
Iterative Python algorithm and vectorized NumPy version.
>>> # # # Pure iterative Python # # #
>>> points = [[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],[8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]]
>>> qPoint = [4,5,3]
>>> minIdx = -1
>>> minDist = -1
>>> for idx, point in enumerate(points):  # iterate over all points
...     dist = sum([(dp-dq)**2 for dp,dq in zip(point,qPoint)])**0.5  # compute the Euclidean distance from each point to q
...     if dist < minDist or minDist < 0:  # if necessary, update minimum distance and index of the corresponding point
...         minDist = dist
...         minIdx = idx
...
>>> print(f"Nearest point to q: {points[minIdx]}")
Nearest point to q: [3, 4, 4]
>>> # # # Equivalent NumPy vectorization # # #
>>> import numpy as np
>>> points = np.array([[9,2,8],[4,7,2],[3,4,4],[5,6,9],[5,0,7],[8,2,7],[0,3,2],[7,3,0],[6,1,1],[2,9,6]])
>>> qPoint = np.array([4,5,3])
>>> minIdx = np.argmin(np.linalg.norm(points-qPoint, axis=1))  # compute all Euclidean distances at once and return the index of the smallest one
>>> print(f"Nearest point to q: {points[minIdx]}")
Nearest point to q: [3 4 4]
F2PY
Quickly wrap native code for faster scripts.
! Python Fortran native code call example
! f2py -c -m foo *.f90
! Compile Fortran into a Python module named foo using intent statements
! Fortran subroutines only, not functions; easier than JNI with a C wrapper
! Requires gfortran and make
subroutine ftest(a, b, n, c, d)
  implicit none
  integer, intent(in)  :: a, b, n
  integer, intent(out) :: c, d
  integer :: i
  c = 0
  do i = 1, n
    c = a + b + c
  end do
  d = (c * n) * (-1)
end subroutine ftest
>>> import numpy as np
>>> import foo
>>> a = foo.ftest(1, 2, 3)  # returns a tuple; c, d = foo.ftest(1, 2, 3) unpacks the two outputs
>>> print(a)
(9, -27)
>>> help("foo.ftest") # foo.ftest.__doc__
See also
* Array programming
* List of numerical-analysis software
* Theano (software)
* Matplotlib
* Fortran
* Row- and column-major order
* f2c
External links
* NumPy tutorials
* History of NumPy