Owl Scientific Computing

Owl Scientific Computing is a software system for scientific and engineering computing developed in the Department of Computer Science and Technology, University of Cambridge. The Systems Research Group (SRG) in the department recognises Owl as one of the representative systems developed in SRG in the 2010s. The source code is licensed under the MIT License and can be accessed from the GitHub repository. The library is mostly designed and developed in the functional programming language OCaml. As a functional programming language, OCaml offers runtime efficiency, a flexible module system, static type checking, a garbage collector, and powerful type inference. Owl inherits these features directly from OCaml. With Owl, users can write succinct, type-safe numerical applications in a functional language without sacrificing performance. This shortens the development life-cycle and reduces the cost of moving from prototype to production use. The system serves as the de facto tool for computation-intensive tasks in OCaml.


History

Owl was developed while Dr. Liang Wang was working as a postdoctoral researcher at OCaml Labs. It originated in July 2016 from a research project that studied the design of synchronous parallel machines for large-scale distributed computing. At that time, the libraries for numerical computing in the OCaml ecosystem were very limited and the tooling was fragmented. In order to test various analytical applications, many numerical functions had to be implemented, from low-level algebra and random number generators to high-level functionality such as algorithmic differentiation and deep neural networks. As these code snippets accumulated, the functions were taken out and wrapped into a standalone library named Owl. Owl's architecture went through at least a dozen iterations in the beginning, and some of the architectural changes were quite drastic. After one year of intensive development, Owl was capable of many complicated numerical tasks (e.g. image classification). Dr. Liang Wang held a tutorial at CUFP 2017 to demonstrate data science in OCaml. In 2018, Prof. Richard Mortier gave a talk about Owl at the Alan Turing Institute. To further promote OCaml and functional programming in data science, Owl provides abundant learning materials in the form of a detailed manual.


Design and Features

Owl implements many advanced numerical functions on top of its implementation of n-dimensional arrays. Compared to other numerical libraries, Owl is unique in several respects; for example, algorithmic differentiation and distributed computing are included as integral components of the core system to maximise developers' productivity. The figure below gives a bird's-eye view of Owl's system architecture.

The subsystem on the left is Owl's numerical subsystem. The modules contained in this subsystem fall into three categories. The first category is the core modules, which contain basic data structures, i.e. the N-dimensional array (Ndarray) in both dense and sparse forms. The Ndarray module supports various number types: float32, float64, complex32, complex64, int16, int32, etc. The core modules also provide foreign function interfaces to other low-level numerical libraries, such as CBLAS and LAPACK. These libraries are fully interfaced to the Linear Algebra module.

The second category is the classic analytics modules. This part contains basic mathematical and statistical functions, linear algebra, regression, optimisation, plotting, etc. Advanced mathematical and statistical functions such as statistical hypothesis testing and Markov chain Monte Carlo are also included. As a core functionality, Owl provides the algorithmic differentiation (or automatic differentiation) and dynamic computation graph modules.

The third category, the highest level in the Owl architecture, includes modules for more advanced numerical applications such as neural networks, natural language processing, data processing, etc. The Zoo system is used for efficient scripting and code sharing. The modules in the second category, especially algorithmic differentiation, make the code at this level quite concise.

The subsystem on the right is the Actor subsystem, which extends Owl's capability to parallel and distributed computing. The core idea is to transform a user application from sequential execution mode into parallel mode (using various computation engines) with minimal effort. The method is to compose the two subsystems with functors to generate a parallel version of the modules defined in the numerical subsystem.

Besides what is shown in this figure, Owl has several other features, for example the JavaScript and unikernel backends, integration with other frameworks such as TensorFlow and PyTorch, and the use of GPUs and other accelerator frameworks via a symbolic graph.
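
The functor-based composition of the numerical and Actor subsystems described above can be illustrated with a small sketch in plain OCaml. It uses only the standard library and does not reproduce Owl's or Actor's real interfaces: the names NUMERIC, ENGINE, Sequential, Chunked, Make and Square are all hypothetical, and the "parallel" engine merely processes chunks one after another where a real engine would hand each chunk to a worker. The point is only that the numerical code is written once and the execution strategy is chosen by functor application.

```ocaml
(* Hypothetical sketch; none of these names are Owl's or Actor's real API. *)

(* What the numerical side exposes: a per-element computation. *)
module type NUMERIC = sig
  type elt
  val kernel : elt -> elt
end

(* What a computation engine provides: a way to apply a function
   to every element of an array. *)
module type ENGINE = sig
  val map : ('a -> 'b) -> 'a array -> 'b array
end

(* Sequential engine: a plain Array.map. *)
module Sequential : ENGINE = struct
  let map = Array.map
end

(* Chunked engine: splits the input into fixed-size chunks, maps each
   chunk, and concatenates the results. The chunks are processed one
   after another here; a real engine would dispatch them to workers. *)
module Chunked : ENGINE = struct
  let chunk_size = 2

  let map f a =
    let n = Array.length a in
    let rec go start acc =
      if start >= n then Array.concat (List.rev acc)
      else
        let len = min chunk_size (n - start) in
        go (start + len) (Array.map f (Array.sub a start len) :: acc)
    in
    go 0 []
end

(* The functor composes an engine with a numerical module and
   generates the corresponding executable version. *)
module Make (E : ENGINE) (N : NUMERIC) = struct
  let run (xs : N.elt array) : N.elt array = E.map N.kernel xs
end

(* A toy numerical module: squaring floats. *)
module Square = struct
  type elt = float
  let kernel x = x *. x
end

module Seq_square = Make (Sequential) (Square)
module Par_square = Make (Chunked) (Square)

let () =
  let xs = [| 1.; 2.; 3.; 4.; 5. |] in
  (* Both versions must agree; only the execution strategy differs. *)
  assert (Seq_square.run xs = Par_square.run xs);
  Array.iter (Printf.printf "%g ") (Par_square.run xs);
  print_newline ()
```

Swapping Sequential for Chunked changes how the work is scheduled without touching Square, which is the property the text attributes to composing the two subsystems.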


Research

The Owl project is research-oriented and supports research on numerical computing in multiple related topics. Some of its research topics are listed below.
* Synchronous parallel distributed machine learning design. Owl is the first to propose using sampling to synchronise nodes in iterative algorithms. The work, published on arXiv, comes with a mathematical proof. The same idea was later proposed in top machine learning conferences.
* One of the factors that contributes to Owl's small code base is that it builds advanced analytical functions around algorithmic differentiation. This idea also proved popular and developed into the paradigm of differentiable programming. It is now used in popular numerical packages such as JuliaDiff.
* Using the computation graph offers another dimension of optimisation for computation in Owl. The computation graph also bridges Owl applications and hardware accelerators such as GPU and TPU. The computation graph later became a de facto intermediate representation. Standards such as the Open Neural Network Exchange and the Neural Network Exchange Format are now widely supported by various deep learning frameworks such as TensorFlow and PyTorch.
* The idea of service-level composition and serving was investigated in the Zoo subsystem of Owl. The prototype demonstrates the streamlining of various stages in code development, including composition, testing, distribution, validation, and deployment. It is very similar to the later MLOps concepts. Recently, this topic has attracted attention in top systems conferences such as OSDI.

As a result of research along some of these directions, the Owl project has produced several publications. In 2018, a paper titled Data Analytics Service Composition and Deployment on Edge Devices was accepted at the ACM SIGCOMM 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks. Two talks were also accepted at the OCaml Workshop of the International Conference on Functional Programming 2019, on the topics of numerical ordinary differential equation solving and executing Owl computation on GPUs. An internship at OCaml Labs investigated image segmentation and related memory optimisation in Owl. In 2022, the book <> was published by Springer.
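
To make the research topic on algorithmic differentiation more concrete, the following is a minimal sketch of forward-mode differentiation with dual numbers, written in plain OCaml with the standard library only. It is an illustration of the general technique and not Owl's own implementation; the type dual and the functions const, var, add, mul, dsin, diff and minimise are hypothetical names chosen for this example. The last function hints at why building analytical code around differentiation keeps it small: a gradient-descent optimiser needs only a few lines once diff is available.

```ocaml
(* Illustrative sketch of forward-mode algorithmic differentiation.
   All names are hypothetical; this is not Owl's implementation. *)

(* A dual number carries a value and the derivative of that value
   with respect to the input variable. *)
type dual = { v : float; d : float }

let const c = { v = c; d = 0. }  (* constants have derivative 0 *)
let var x = { v = x; d = 1. }    (* the input has derivative 1 *)

(* Each primitive propagates derivatives using the usual rules. *)
let add a b = { v = a.v +. b.v; d = a.d +. b.d }
let mul a b = { v = a.v *. b.v; d = (a.d *. b.v) +. (a.v *. b.d) }
let dsin a = { v = sin a.v; d = a.d *. cos a.v }

(* Derivative of f at x: evaluate f on a dual number seeded with 1. *)
let diff f x = (f (var x)).d

(* Plain gradient descent built directly on top of diff. *)
let minimise ?(rate = 0.1) ?(steps = 200) f x0 =
  let rec go x n =
    if n = 0 then x else go (x -. (rate *. diff f x)) (n - 1)
  in
  go x0 steps

let () =
  (* f(x) = x * x + sin x, so f'(x) = 2x + cos x. *)
  let f x = add (mul x x) (dsin x) in
  let x = 0.5 in
  assert (abs_float (diff f x -. ((2. *. x) +. cos x)) < 1e-12);
  (* g(x) = (x - 3)^2 has its minimum at x = 3. *)
  let g x =
    let e = add x (const (-3.)) in
    mul e e
  in
  let xmin = minimise g 0. in
  assert (abs_float (xmin -. 3.) < 1e-6);
  Printf.printf "f'(%g) = %g, argmin g = %g\n" x (diff f x) xmin
```

Forward mode is the simplest variant to show in a few lines; libraries aimed at neural networks typically also provide reverse mode, which is more efficient for functions with many inputs and few outputs.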


See also

* Array programming
* List of numerical-analysis software

