Pixel Visual Core

The Pixel Visual Core (PVC) is a series of ARM-based system in package (SiP) image processors designed by Google. The PVC is a fully programmable image, vision and AI multi-core domain-specific architecture (DSA) for mobile devices and, in the future, for IoT. It first appeared in the Google Pixel 2 and 2 XL, which were introduced on October 19, 2017, and was also used in the Google Pixel 3 and 3 XL. Starting with the Pixel 4, the chip was replaced by the Pixel Neural Core.


History

Google previously used the CPU, GPU, image processing unit (IPU), and DSP of Qualcomm Snapdragon chips to handle image processing on its Google Nexus and Google Pixel devices. With the increasing importance of computational photography techniques, Google developed the Pixel Visual Core (PVC). Google claims the PVC uses less power than a CPU and GPU while still being fully programmable, unlike its tensor processing unit (TPU), which is an application-specific integrated circuit (ASIC). Conventional mobile devices are equipped with an image signal processor (ISP), which is a fixed-function image processing pipeline. In contrast, the PVC offers flexible, programmable functionality that is not limited to image processing.

The PVC in the Google Pixel 2 and 2 XL is labeled SR3HX X726C502. The PVC in the Google Pixel 3 and 3 XL is labeled SR3HX X739F030. With the PVC, the Pixel 2 and Pixel 3 obtained mobile DxOMark scores of 98 and 101, respectively. The latter was the top-ranked single-lens mobile DxOMark score, tied with the iPhone XR.


Pixel Visual Core software

A typical image-processing program for the PVC is written in Halide. Currently, the PVC supports only a subset of the Halide programming language, without floating-point operations and with limited memory access patterns. Halide is a domain-specific language that lets the user decouple the algorithm from the scheduling of its execution. In this way, the developer can write a program that is optimized for the target hardware architecture.
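
The decoupling of algorithm and schedule can be illustrated with a minimal plain-Python sketch. It is not Halide code and does not use the Halide API; the names `blur3`, `row_major`, `tiled` and `realize` are hypothetical and chosen only for illustration. The algorithm uses integer arithmetic only, in keeping with the restriction to non-floating-point operations mentioned above.

```python
# Plain-Python illustration of separating an algorithm from its schedule.
# This is NOT Halide code; every name below is hypothetical.

def blur3(image, x, y):
    """Algorithm: integer mean of a 3-pixel horizontal window (edges clamped)."""
    width = len(image[0])
    left = image[y][max(x - 1, 0)]
    right = image[y][min(x + 1, width - 1)]
    return (left + image[y][x] + right) // 3


def row_major(width, height):
    """Schedule 1: visit pixels row by row."""
    for y in range(height):
        for x in range(width):
            yield x, y


def tiled(width, height, tile=2):
    """Schedule 2: visit pixels tile by tile."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            for y in range(ty, min(ty + tile, height)):
                for x in range(tx, min(tx + tile, width)):
                    yield x, y


def realize(algorithm, schedule, image):
    """Evaluate the algorithm at every pixel, in the order the schedule dictates."""
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for x, y in schedule(width, height):
        out[y][x] = algorithm(image, x, y)
    return out


image = [[0, 3, 6, 9], [9, 6, 3, 0], [1, 2, 3, 4], [4, 4, 4, 4]]
result_rows = realize(blur3, row_major, image)
result_tiles = realize(blur3, tiled, image)
```

Both schedules produce the same output image; only the order of evaluation changes, which is the part a developer would tune for a given hardware target.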


Pixel Visual Core ISA

The PVC has two types of instruction set architecture (ISA): a virtual one and a physical one. First, a high-level language program is compiled into a ''virtual ISA (vISA)'', inspired by the RISC-V ISA, which abstracts completely from the target hardware generation. Then, the vISA program is compiled into the so-called ''physical ISA (pISA)'', which is a VLIW ISA. This compilation step takes into account the target hardware parameters (e.g. the size of the processing element (PE) array, the stencil processor (STP) size, etc.) and explicitly specifies memory movements. Decoupling ''vISA'' from ''pISA'' lets the former be cross-architecture and generation-independent, while ''pISA'' can be compiled offline or through JIT compilation.


Pixel Visual Core architecture

The Pixel Visual Core is designed to be a scalable, energy-efficient multi-core architecture, with designs using an even number of cores from 2 to 16. The core of a PVC is the image processing unit (IPU), a programmable unit tailored for image processing. The Pixel Visual Core architecture was also designed to be used either as its own chip, like the SR3HX, or as an IP block for a system on a chip (SoC).


Image Processing Unit (IPU)

The IPU core has a stencil processor (STP), a line buffer pool (LBP) and a network on chip (NoC). The STP mainly provides a 2-D SIMD array of processing elements (PEs) able to perform stencil computations, in which each output is computed from a small neighborhood of pixels. Though it seems similar to systolic array and wavefront computations, the STP has explicit software-controlled data movement. Each PE features 2x 16-bit arithmetic logic units (ALUs), 1x 16-bit multiplier–accumulator unit (MAC), 10x 16-bit registers, and 10x 1-bit predicate registers.


Line Buffer Pool (LBP)

Because DRAM access is one of the most energy-costly operations, each STP has temporary buffers, namely the LBP, to increase data locality. The LBP is a 2-D FIFO that accommodates different read and write sizes, and it follows a single-producer, multi-consumer behavioral model. Each LBP can have eight logical line buffer (LB) memories and one for DMA input-output operations. Owing to the high complexity of the memory system, the PVC designers describe the LBP controller as one of the most challenging components. The NoC is a ring network on chip in which each core communicates only with its neighbor cores, which saves energy and preserves the pipelined computational pattern.
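
The single-producer, multi-consumer behavior can be pictured with a toy model. This is a minimal sketch under stated assumptions, not a description of the actual LBP controller: the class `LineBuffer`, its methods, and the consumer names are hypothetical, rows are stored whole, and a row is discarded once every consumer has moved past it.

```python
from collections import deque


class LineBuffer:
    """Toy single-producer, multi-consumer FIFO of image rows (hypothetical model)."""

    def __init__(self, consumer_names):
        self.rows = deque()      # rows still needed by at least one consumer
        self.first_row = 0       # absolute index of rows[0]
        self.next_read = {name: 0 for name in consumer_names}

    def push(self, row):
        """Producer side: append one new image row."""
        self.rows.append(row)

    def read(self, name, height, step=1):
        """Consumer side: return the next `height`-row window, or None if not ready."""
        start = self.next_read[name]
        if start + height > self.first_row + len(self.rows):
            return None  # the producer has not written enough rows yet
        offset = start - self.first_row
        window = [self.rows[offset + i] for i in range(height)]
        self.next_read[name] = start + step
        self._retire()
        return window

    def _retire(self):
        """Drop rows that every consumer has already moved past."""
        oldest_needed = min(self.next_read.values())
        while self.first_row < oldest_needed:
            self.rows.popleft()
            self.first_row += 1


lb = LineBuffer(["blur3", "copy"])
for y in range(4):
    lb.push([y] * 4)              # the producer writes one row at a time
w3 = lb.read("blur3", height=3)   # a consumer that reads 3-row windows
w1 = lb.read("copy", height=1)    # a consumer that reads single rows
```

After both consumers have advanced past row 0, the model frees that row, so the data stays in the buffer only while some consumer still needs it.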


Stencil Processor (STP)

The STP has a 2-D array of PEs: for example, a 16x16 array of full PEs and four lanes of simplified PEs called the ''"halo"''. The STP also has a scalar processor, called the scalar lane (SCL), that adds control instructions with a small instruction memory. The last component of an STP is a load-store unit called the sheet generator (SHG), where the sheet is the PVC memory access unit.
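
A stencil needs input pixels beyond the edge of the region it writes, which is the general reason a border of extra storage around a compute array is useful. The sketch below shows only that size arithmetic for a 16x16 output tile; it does not model the halo lanes themselves, and the function names `input_extent` and `stencil_tile` are hypothetical.

```python
TILE = 16  # side of the full PE array in the example configuration above


def input_extent(tile, radius):
    """Side length of the input region needed to produce a tile x tile output."""
    return tile + 2 * radius


def stencil_tile(padded, radius):
    """Sum each (2r+1) x (2r+1) neighborhood of a padded tile."""
    size = len(padded) - 2 * radius
    window = 2 * radius + 1
    out = [[0] * size for _ in range(size)]
    for y in range(size):
        for x in range(size):
            out[y][x] = sum(
                padded[y + dy][x + dx]
                for dy in range(window)
                for dx in range(window)
            )
    return out


side = input_extent(TILE, radius=1)           # 3x3 stencil: 16 + 2 = 18
padded = [[1] * side for _ in range(side)]    # output tile plus a one-pixel border
out = stencil_tile(padded, radius=1)          # 16 x 16 result
```

With a 3x3 stencil, every one of the 256 outputs can be computed only because the one-pixel border around the tile is available.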


SR3HX design summary

The SR3HX PVC features a 64-bit ARMv8-A ARM Cortex-A53 CPU, 8x image processing unit (IPU) cores, 512 MB LPDDR4, MIPI, and PCIe. Each IPU core has 512 arithmetic logic units (ALUs), provided by 256 processing elements (PEs) arranged as a 16 x 16 2-dimensional array. The cores execute a custom VLIW ISA. There are two 16-bit ALUs per processing element, and they can operate in three distinct ways: independent, joined, and fused. The SR3HX PVC is manufactured as a SiP by TSMC using their 28HPM HKMG process. It was designed over 4 years in partnership with Intel (codename: Monette Hill).

Google claims that the SR3HX PVC is 7-16x more energy-efficient than the Snapdragon 835, that it can perform 3 trillion operations per second, and that HDR+ can run 5x faster and at less than one-tenth the energy of the Snapdragon 835 (Google, "Pixel Visual Core: image processing and machine learning on Pixel 2", 2017-10-17, https://www.blog.google/products/pixel/pixel-visual-core-image-processing-and-machine-learning-pixel-2/, accessed 2019-02-02). It supports Halide for image processing and TensorFlow for machine learning. The current chip runs at 426 MHz, and a single IPU is able to perform more than 1 TeraOPS.
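
The ALU counts above follow from simple arithmetic, sketched here. The last quantity, ALU cycles per second, is an assumption-laden back-of-envelope number (one count per ALU per clock cycle); it does not attempt to reproduce the operations-per-second figures quoted above, since the text does not say how those operations are counted.

```python
IPU_CORES = 8              # IPU cores in the SR3HX
PES_PER_CORE = 16 * 16     # 16 x 16 array of processing elements
ALUS_PER_PE = 2            # two 16-bit ALUs per processing element
CLOCK_HZ = 426_000_000     # 426 MHz

alus_per_core = PES_PER_CORE * ALUS_PER_PE       # 512
total_alus = IPU_CORES * alus_per_core           # 4096
# Assumption: one count per ALU per clock cycle, nothing else included.
alu_cycles_per_second = total_alus * CLOCK_HZ    # 1_744_896_000_000
```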

