
Tensor Processing Unit (TPU) is an AI accelerator application-specific integrated circuit (ASIC) developed by Google for neural network machine learning, using Google's own TensorFlow software. Google began using TPUs internally in 2015, and in 2018 made them available for third-party use, both as part of its cloud infrastructure and by offering a smaller version of the chip for sale.


Overview

The tensor processing unit was announced in May 2016 at Google I/O, when the company said that the TPU had already been used inside its data centers for over a year. The chip was designed specifically for Google's TensorFlow framework, a symbolic math library used for machine learning applications such as neural networks. In the video "TensorFlow: Open source machine learning", Jeffrey Dean describes TensorFlow as "machine learning software being used for various kinds of perceptual and language understanding tasks" (minute 0:47 of 2:17).
However, as of 2017 Google still used CPUs and GPUs for other types of machine learning. AI accelerator designs are also appearing from other vendors and are aimed at the embedded and robotics markets.

Google's TPUs are proprietary. Some models are commercially available, and on February 12, 2018, The New York Times reported that Google "would allow other companies to buy access to those chips through its cloud-computing service." Google has said that TPUs were used in the AlphaGo versus Lee Sedol series of man-machine Go games, as well as in the AlphaZero system, which produced chess, shogi and Go playing programs from the game rules alone and went on to beat the leading programs in those games. Google has also used TPUs for Google Street View text processing and was able to find all the text in the Street View database in less than five days. In Google Photos, an individual TPU can process over 100 million photos a day. The TPU is also used in RankBrain, which Google uses to provide search results.

Compared to a graphics processing unit, the TPU is designed for a high volume of low-precision computation (e.g. as little as 8-bit precision) with more input/output operations per joule, and it has no hardware for rasterisation or texture mapping. According to Norman Jouppi, the TPU ASICs are mounted in a heatsink assembly that can fit in a hard drive slot within a data center rack.

Different types of processors are suited to different types of machine learning models: TPUs are well suited to CNNs, while GPUs have benefits for some fully connected neural networks and CPUs can have advantages for RNNs.

Google provides third parties access to TPUs through its Cloud TPU service as part of the Google Cloud Platform and through its notebook-based services Kaggle and Colaboratory.


Products


First generation TPU

The first-generation TPU is an 8-bit matrix multiplication engine, driven with CISC instructions by the host processor across a PCIe 3.0 bus. It is manufactured on a 28 nm process with a die size ≤ 331 mm². The clock speed is 700 MHz and it has a thermal design power of 28–40 W. It has 28 MiB of on-chip memory, and 4 MiB of 32-bit accumulators taking the results of a 256×256 systolic array of 8-bit multipliers. Within the TPU package is 8 GiB of dual-channel 2133 MHz DDR3 SDRAM offering 34 GB/s of bandwidth. Instructions transfer data to or from the host, perform matrix multiplications or convolutions, and apply activation functions.
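
The figures in this section can be cross-checked with some back-of-the-envelope arithmetic. The sketch below is illustrative only and rests on assumptions that are not stated in the text: that every multiplier completes one multiply-accumulate per clock cycle, that a multiply-accumulate counts as two operations, that "2133 MHz" denotes 2133 million transfers per second, and that each DDR3 channel is 64 bits (8 bytes) wide. The function names are hypothetical.

```python
# Rough arithmetic on the first-generation TPU figures quoted above.
# Assumptions (not from the article): one multiply-accumulate (MAC) per
# multiplier per cycle, 2 operations per MAC, 64-bit DDR3 channels.

ARRAY_ROWS = 256          # systolic array height
ARRAY_COLS = 256          # systolic array width
CLOCK_HZ = 700e6          # 700 MHz clock


def peak_ops_per_second(rows=ARRAY_ROWS, cols=ARRAY_COLS,
                        clock_hz=CLOCK_HZ, ops_per_mac=2):
    """Upper bound on 8-bit operations per second for the array."""
    return rows * cols * clock_hz * ops_per_mac


def ddr3_bandwidth_bytes_per_second(channels=2,
                                    transfers_per_second=2133e6,
                                    bytes_per_transfer=8):
    """Theoretical DRAM bandwidth under the assumptions above."""
    return channels * transfers_per_second * bytes_per_transfer


def worst_case_dot_product(length=256, bits=8):
    """Largest magnitude a sum of `length` signed `bits`-bit products can reach."""
    most_negative = -(2 ** (bits - 1))
    return length * most_negative * most_negative


def fits_in_signed(value, bits):
    """True if `value` is representable as a signed integer of `bits` bits."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


if __name__ == "__main__":
    print(f"peak throughput: {peak_ops_per_second() / 1e12:.2f} tera-operations/s")
    print(f"DDR3 bandwidth:  {ddr3_bandwidth_bytes_per_second() / 1e9:.1f} GB/s")
    worst = worst_case_dot_product()
    print(f"worst-case 256-term sum: {worst}")
    print(f"fits in 16 bits: {fits_in_signed(worst, 16)}, "
          f"in 32 bits: {fits_in_signed(worst, 32)}")
```

Under these assumptions the DRAM estimate agrees with the 34 GB/s quoted above, and the last two lines suggest why the accumulators are wider than the multipliers: a column of 256 products of 8-bit values can overflow a 16-bit register but fits comfortably in 32 bits.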


Second generation TPU

The second-generation TPU was announced in May 2017. Google stated that the first-generation TPU design was limited by memory bandwidth, and that using 16 GB of High Bandwidth Memory in the second-generation design increased bandwidth to 600 GB/s and performance to 45 teraFLOPS. The TPUs are arranged into four-chip modules with a performance of 180 teraFLOPS, and 64 of these modules are assembled into 256-chip pods with 11.5 petaFLOPS of performance. Notably, while the first-generation TPUs were limited to integers, the second-generation TPUs can also calculate in floating point. This makes the second-generation TPUs useful for both training and inference of machine learning models. Google has stated these second-generation TPUs will be available on the Google Compute Engine for use in TensorFlow applications.
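
The module and pod figures above follow from the per-chip number by simple multiplication. The sketch below reproduces that arithmetic; it assumes peak performance adds up linearly across chips, which describes the quoted aggregate figures rather than the throughput of any real workload.

```python
# Aggregate peak performance of a second-generation TPU pod,
# assuming peak teraFLOPS add up linearly across chips.

CHIP_TFLOPS = 45        # per-chip peak quoted above
CHIPS_PER_MODULE = 4    # four-chip modules
MODULES_PER_POD = 64    # 64 modules per pod

module_tflops = CHIP_TFLOPS * CHIPS_PER_MODULE
chips_per_pod = CHIPS_PER_MODULE * MODULES_PER_POD
pod_pflops = module_tflops * MODULES_PER_POD / 1000  # tera -> peta

if __name__ == "__main__":
    print(f"module: {module_tflops} teraFLOPS")
    print(f"pod:    {chips_per_pod} chips, {pod_pflops:.2f} petaFLOPS")
```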


Third generation TPU

The third-generation TPU was announced on May 8, 2018. Google announced that the processors themselves are twice as powerful as the second-generation TPUs and would be deployed in pods with four times as many chips as the preceding generation. This results in an 8-fold increase in performance per pod (with up to 1,024 chips per pod) compared to the second-generation TPU deployment.
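
The 8-fold figure is the product of the two factors given above. A minimal sketch of that reasoning, assuming per-pod performance scales linearly with both per-chip performance and chip count (the helper name is hypothetical):

```python
def pod_speedup(per_chip_speedup: float, chip_count_ratio: float) -> float:
    """Per-pod speedup, assuming linear scaling in chip speed and chip count."""
    return per_chip_speedup * chip_count_ratio


V2_CHIPS_PER_POD = 256  # second-generation pod size

v3_chips_per_pod = V2_CHIPS_PER_POD * 4   # four times as many chips
v3_pod_speedup = pod_speedup(2, 4)        # twice as powerful, four times the chips

if __name__ == "__main__":
    print(f"{v3_chips_per_pod} chips per pod, {v3_pod_speedup}x per-pod performance")
```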


Fourth generation TPU

On May 18, 2021, Google CEO Sundar Pichai spoke about TPU v4 during his keynote at the Google I/O virtual conference. TPU v4 improved performance by more than 2x over TPU v3 chips. Pichai said, "A single v4 pod contains 4,096 v4 chips, and each pod has 10x the interconnect bandwidth per chip at scale, compared to any other networking technology."


Edge TPU

In July 2018, Google announced the Edge TPU. The Edge TPU is Google's purpose-built ASIC chip designed to run machine learning (ML) models for edge computing, meaning it is much smaller and consumes far less power than the TPUs hosted in Google datacenters (also known as Cloud TPUs). In January 2019, Google made the Edge TPU available to developers with a line of products under the Coral brand. The Edge TPU is capable of 4 trillion operations per second while using 2 W of electrical power.

The product offerings include a single-board computer (SBC), a system on module (SoM), a USB accessory, a mini PCI-e card, and an M.2 card. The SBC Coral Dev Board and Coral SoM both run Mendel Linux, a derivative of Debian. The USB, PCI-e, and M.2 products function as add-ons to existing computer systems, and support Debian-based Linux systems on x86-64 and ARM64 hosts (including Raspberry Pi).

The machine learning runtime used to execute models on the Edge TPU is based on TensorFlow Lite. The Edge TPU is only capable of accelerating forward-pass operations, which means it is primarily useful for performing inferences (although it is possible to perform lightweight transfer learning on the Edge TPU). The Edge TPU also only supports 8-bit math, meaning that for a network to be compatible with the Edge TPU, it needs to be either trained using the TensorFlow quantization-aware training technique or, since late 2019, converted using post-training quantization.

On November 12, 2019, Asus announced a pair of single-board computers (SBCs) featuring the Edge TPU: the Asus Tinker Edge T and Tinker Edge R, designed for IoT and edge AI. The SBCs officially support Android and Debian operating systems. Asus has also demonstrated a mini PC called Asus PN60T featuring the Edge TPU.

On January 2, 2020, Google announced the Coral Accelerator Module and Coral Dev Board Mini, to be demonstrated at CES 2020 later the same month. The Coral Accelerator Module is a multi-chip module featuring the Edge TPU, with PCIe and USB interfaces for easier integration. The Coral Dev Board Mini is a smaller SBC featuring the Coral Accelerator Module and MediaTek 8167s SoC.
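
The 8-bit requirement described in this section means that floating-point weights and activations have to be mapped onto 8-bit integers before a model can run on the Edge TPU. The sketch below illustrates the general idea with a plain-Python affine (scale and zero-point) quantizer. It is not the TensorFlow Lite converter or the Edge TPU toolchain; the function names are hypothetical, and the real tools choose ranges and rounding according to their own rules.

```python
# Illustrative affine quantization of floats to signed 8-bit integers.
# Mapping: real_value ~= (quantized_value - zero_point) * scale
# This is a conceptual sketch, not the TensorFlow Lite API.

QMIN, QMAX = -128, 127  # signed 8-bit range


def quantization_params(min_value, max_value, qmin=QMIN, qmax=QMAX):
    """Pick a scale and zero point covering [min_value, max_value]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    min_value = min(min_value, 0.0)
    max_value = max(max_value, 0.0)
    if max_value == min_value:
        return 1.0, 0
    scale = (max_value - min_value) / (qmax - qmin)
    zero_point = round(qmin - min_value / scale)
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize(values, scale, zero_point, qmin=QMIN, qmax=QMAX):
    """Map floats to clamped 8-bit integers."""
    return [max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map 8-bit integers back to approximate floats."""
    return [(q - zero_point) * scale for q in q_values]


if __name__ == "__main__":
    weights = [-0.8, -0.1, 0.0, 0.35, 1.2]  # made-up example values
    scale, zero_point = quantization_params(min(weights), max(weights))
    q = quantize(weights, scale, zero_point)
    restored = dequantize(q, scale, zero_point)
    for w, qi, r in zip(weights, q, restored):
        print(f"{w:+.3f} -> {qi:4d} -> {r:+.4f}")
```

Post-training quantization applies a mapping of this kind to an already trained model, while quantization-aware training simulates the rounding during training so the network can adapt to it.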


Pixel Neural Core

On October 15, 2019, Google announced the Pixel 4 smartphone, which contains an Edge TPU called the Pixel Neural Core.


Google Tensor

Google followed the Pixel Neural Core by integrating an Edge TPU into a custom system-on-chip named Google Tensor, which was released in 2021 with the Pixel 6 line of smartphones. The Google Tensor SoC demonstrated "extremely large performance advantages over the competition" in machine learning-focused benchmarks; although instantaneous power consumption was also relatively high, the improved performance meant less energy was consumed due to shorter periods requiring peak performance.


See also

* Cognitive computer
* AI accelerator
* Structure tensor, a mathematical foundation for TPUs
* Tensor Core, a similar architecture by Nvidia
* TrueNorth, a similar device simulating spiking neurons instead of low-precision tensors
* Vision processing unit, a similar device specialised for vision processing


External links


* Cloud Tensor Processing Units (TPUs) (Documentation from Google Cloud)
* Photo of Google's TPU chip and board
* Photo of Google's TPU v2 board
* Photo of Google's TPU v3 board
* Photo of Google's TPU v2 pod