A vision processing unit (VPU) is, as of 2018, an emerging class of microprocessor; it is a specific type of AI accelerator, designed to accelerate machine vision tasks.
Overview
Vision processing units are distinct from video processing units (which are specialised for video encoding and decoding) in their suitability for running machine vision algorithms such as CNNs (convolutional neural networks), SIFT (scale-invariant feature transform) and similar.
They may include direct interfaces to take data from cameras (bypassing any off-chip buffers), and have a greater emphasis on on-chip dataflow between many parallel execution units with scratchpad memory, like a manycore DSP. But, like video processing units, they may have a focus on low-precision fixed-point arithmetic for image processing.
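As a rough illustration of the low-precision fixed-point arithmetic mentioned above, the following Python sketch applies a 3×3 smoothing kernel to an 8-bit image using only integer operations. It is a minimal sketch, not a description of any particular VPU: the 8 fractional bits, the Gaussian-like kernel, and the names `to_fixed` and `convolve3x3_fixed` are assumptions chosen for the example.

```python
# Illustrative sketch only: the Q-format (8 fractional bits) and all names
# here are hypothetical, not taken from any real VPU toolchain.
FRAC_BITS = 8


def to_fixed(x, frac_bits=FRAC_BITS):
    """Quantise a real-valued weight to an integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))


def convolve3x3_fixed(image, kernel_fixed, frac_bits=FRAC_BITS):
    """Apply a 3x3 kernel to an 8-bit image using integer arithmetic only.

    The kernel is applied without flipping (as is conventional in CNN
    frameworks); only the "valid" interior region is produced, so the
    output is 2 rows and 2 columns smaller than the input.
    """
    height, width = len(image), len(image[0])
    half = 1 << (frac_bits - 1)  # added before the shift to round to nearest
    out = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            acc = 0  # wide integer accumulator
            for ky in range(3):
                for kx in range(3):
                    acc += image[y + ky - 1][x + kx - 1] * kernel_fixed[ky][kx]
            value = (acc + half) >> frac_bits  # rescale back to pixel range
            row.append(max(0, min(255, value)))  # saturate to 8 bits
        out.append(row)
    return out


# A Gaussian-like smoothing kernel whose weights sum to 1.
GAUSS = [[1 / 16, 2 / 16, 1 / 16],
         [2 / 16, 4 / 16, 2 / 16],
         [1 / 16, 2 / 16, 1 / 16]]
GAUSS_FIXED = [[to_fixed(w) for w in row] for row in GAUSS]

if __name__ == "__main__":
    flat = [[100] * 4 for _ in range(4)]
    print(convolve3x3_fixed(flat, GAUSS_FIXED))
```

Because the quantised weights in this example sum to exactly 256 (that is, 1.0 with 8 fractional bits), a uniform image passes through unchanged. With kernels that do not quantise exactly, the result would differ slightly from a floating-point computation, which is the precision-for-cost trade that such hardware accepts.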
Contrast with GPUs
They are distinct from GPUs, which contain specialised hardware for rasterization and texture mapping (for 3D graphics), and whose memory architecture is optimised for manipulating bitmap images in off-chip memory (reading textures, and modifying frame buffers, with random access patterns). VPUs are optimised for performance per watt, while GPUs mainly focus on absolute performance.
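The difference between the two goals can be made concrete with a small calculation. The sketch below is illustrative only: the two devices and their throughput and power figures are hypothetical placeholders, not measurements of any real chip, and `perf_per_watt` is a name invented for this example.

```python
def perf_per_watt(ops_per_second, watts):
    """Return throughput divided by power draw (operations per second per watt)."""
    if watts <= 0:
        raise ValueError("power draw must be positive")
    return ops_per_second / watts


# Hypothetical devices: (operations per second, watts). Made-up numbers.
chips = {
    "hypothetical_vpu": (1.0e12, 1.0),
    "hypothetical_gpu": (10.0e12, 250.0),
}

# Ranking by absolute performance and by efficiency can pick different devices.
fastest = max(chips, key=lambda name: chips[name][0])
most_efficient = max(chips, key=lambda name: perf_per_watt(*chips[name]))

if __name__ == "__main__":
    print("highest absolute performance:", fastest)
    print("highest performance per watt:", most_efficient)
```

With these made-up figures, the device with the higher raw throughput is not the one with the higher throughput per watt, which is the situation that matters for battery-powered or thermally constrained systems.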
Target markets are robotics, the internet of things, new classes of digital cameras for virtual reality and augmented reality, smart cameras, and the integration of machine vision acceleration into smartphones and other mobile devices.
Examples
* Movidius Myriad X, the third-generation vision processing unit in the Myriad VPU line from Intel Corporation.
* Movidius Myriad 2, which finds use in Google Project Tango, Google Clips and DJI drones.
* Pixel Visual Core (PVC), a fully programmable image, vision and AI processor for mobile devices.
* Microsoft HoloLens, which includes an accelerator referred to as a ''Holographic Processing Unit'' (complementary to its CPU and GPU), aimed at interpreting camera inputs to accelerate environment tracking and vision for augmented reality applications.
* Eyeriss, a design from MIT intended for running convolutional neural networks.
* NeuFlow, a design by Yann LeCun (implemented in FPGA) for accelerating convolutions, using a dataflow architecture.
* Mobileye EyeQ, by Mobileye.
* Programmable Vision Accelerator (PVA), a 7-way VLIW vision processor designed by Nvidia.
Similar processors
Some processors are not described as VPUs, but are equally applicable to machine vision tasks. These may form a broader category of ''AI accelerators'' (to which VPUs may also belong); however, as of 2016 there is no consensus on the name:
* IBM TrueNorth, a neuromorphic processor aimed at similar sensor data pattern recognition and intelligence tasks, including video and audio.
* Qualcomm Zeroth neural processing unit, another entry in the emerging class of sensor- and AI-oriented chips.
See also
* Adapteva Epiphany, a manycore processor with similar emphasis on on-chip dataflow, focussed on 32-bit floating point performance.
* CELL, a multicore processor with features fairly consistent with vision processing units (SIMD instructions and datatypes suitable for video, and on-chip DMA between scratchpad memories).
* Coprocessor
* Graphics processing unit, also commonly used to run vision algorithms. Nvidia's Pascal architecture includes FP16 support, to provide a better precision/cost tradeoff for AI workloads.
* MPSoC
* OpenCL
* OpenVX
* Physics processing unit, a past attempt to complement the CPU and GPU with a high throughput accelerator.
* Tensor processing unit, a chip used internally by Google for accelerating AI calculations.
External links
* Eyeriss architecture
* Holographic processing unit
* NeuFlow: A Runtime Reconfigurable Dataflow Processor for Vision