MobileNet is a family of convolutional neural network (CNN) architectures designed for image classification, object detection, and other computer vision tasks. They are designed for small size, low latency, and low power consumption, making them suitable for on-device inference and edge computing on resource-constrained devices like mobile phones and embedded systems. They were originally designed to run efficiently on mobile devices with TensorFlow Lite.
The need for efficient deep learning models on mobile devices led researchers at Google to develop MobileNet. The family has four versions, each improving upon the previous one in terms of performance and efficiency.
Features
V1
MobileNetV1 was published in April 2017. Its main architectural innovation was the incorporation of depthwise separable convolutions. The depthwise separable convolution was first developed by Laurent Sifre during an internship at Google Brain in 2013, as an architectural variation on AlexNet to improve convergence speed and model size.
The depthwise separable convolution decomposes a single standard convolution into two convolutions: a depthwise convolution that filters each input channel independently and a pointwise (1×1) convolution that combines the outputs of the depthwise convolution. This factorization significantly reduces computational cost.
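The cost savings can be made concrete with a back-of-the-envelope count of multiply-accumulate operations. The sketch below uses the kernel-size/channel notation of the MobileNetV1 paper; the layer sizes in the example are illustrative, not taken from the paper.

```python
# Multiply-accumulate cost of a standard convolution versus a depthwise
# separable one. Notation: kernel size Dk, input channels M, output
# channels N, square output feature map of side Df.
def standard_conv_cost(Dk, M, N, Df):
    return Dk * Dk * M * N * Df * Df

def separable_conv_cost(Dk, M, N, Df):
    depthwise = Dk * Dk * M * Df * Df  # one Dk x Dk filter per input channel
    pointwise = M * N * Df * Df        # 1x1 convolution mixing channels
    return depthwise + pointwise

# Illustrative layer: 3x3 kernel, 64 -> 128 channels, 56x56 feature map.
std = standard_conv_cost(3, 64, 128, 56)
sep = separable_conv_cost(3, 64, 128, 56)
print(round(std / sep, 2))  # 8.41
```

The separable cost works out to 1/N + 1/Dk² of the standard cost, which is why 3×3 depthwise separable layers are roughly 8-9 times cheaper.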
MobileNetV1 has two hyperparameters: a width multiplier α, which controls the number of channels in each layer (smaller values of α lead to smaller and faster models, at the cost of reduced accuracy), and a resolution multiplier ρ, which controls the input resolution of the images. Lower resolutions result in faster processing but potentially lower accuracy.
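To illustrate how the two multipliers trade accuracy for compute, here is a small sketch with hypothetical layer sizes (the helper `separable_cost` is defined here for illustration, not taken from any MobileNet implementation):

```python
def separable_cost(Dk, M, N, Df, alpha=1.0, rho=1.0):
    """Cost of one depthwise separable layer under width multiplier
    alpha (thins every layer's channels) and resolution multiplier
    rho (shrinks the feature map)."""
    M, N = int(alpha * M), int(alpha * N)
    Df = int(rho * Df)
    return Dk * Dk * M * Df * Df + M * N * Df * Df

full = separable_cost(3, 64, 128, 56)
print(separable_cost(3, 64, 128, 56, alpha=0.5) / full)  # ~0.27: roughly alpha^2
print(separable_cost(3, 64, 128, 56, rho=0.5) / full)    # 0.25: exactly rho^2
```

Both multipliers reduce cost roughly quadratically, since α thins both the input and output channels of each layer and ρ shrinks both spatial dimensions of the feature map.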
V2
MobileNetV2 was published in 2018. It uses inverted residual layers and linear bottlenecks.
Inverted residuals modify the traditional residual block structure. Instead of compressing the input channels before the depthwise convolution, they ''expand'' them. This expansion is followed by a depthwise convolution and then a projection layer that reduces the number of channels back down. This inverted structure maintains representational capacity by letting the depthwise convolution operate in a higher-dimensional feature space, preserving more information as it flows through the block.
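The channel bookkeeping of such a block can be traced in a few lines (a toy sketch assuming the paper's default expansion factor of 6; the function name is invented for illustration):

```python
def inverted_residual_channels(c_in, expansion=6, c_out=None):
    """Trace channel counts through expand -> depthwise -> project."""
    c_out = c_in if c_out is None else c_out
    expanded = c_in * expansion  # 1x1 conv widens the representation
    after_dw = expanded          # depthwise conv keeps the channel count
    projected = c_out            # linear 1x1 conv narrows it back down
    return [c_in, expanded, after_dw, projected]

print(inverted_residual_channels(24))  # [24, 144, 144, 24]
```

Where a classic residual block is wide-narrow-wide, the inverted block is narrow-wide-narrow, with the skip connection joining the two narrow ends.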
Linear bottlenecks remove the typical ReLU activation function from the projection layers. The rationale is that nonlinear activations lose information in low-dimensional spaces, which is problematic when the number of channels is already small.
V3
MobileNetV3 was published in 2019. The publication included MobileNetV3-Small, MobileNetV3-Large, and MobileNetEdgeTPU (optimized for the Pixel 4). They were found by a form of neural architecture search (NAS) that takes mobile latency into account, to achieve a good trade-off between accuracy and latency. The architecture uses piecewise-linear approximations of the swish and sigmoid activation functions (which the authors called "h-swish" and "h-sigmoid"), squeeze-and-excitation modules, and the inverted bottlenecks of MobileNetV2.
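The two "hard" activations are simple to state: h-sigmoid(x) = ReLU6(x + 3)/6 and h-swish(x) = x · h-sigmoid(x). A NumPy sketch:

```python
import numpy as np

def relu6(x):
    return np.minimum(np.maximum(x, 0.0), 6.0)

def h_sigmoid(x):
    # piecewise-linear stand-in for the sigmoid: 0 below -3, 1 above +3
    return relu6(x + 3.0) / 6.0

def h_swish(x):
    # piecewise-linear stand-in for swish (x * sigmoid(x))
    return x * h_sigmoid(x)

x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
print(h_sigmoid(x))  # values: 0, 1/3, 0.5, 2/3, 1
```

Because they avoid the exponential, these approximations are cheap on mobile hardware and quantize well.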
V4
MobileNetV4 was published in September 2024. The publication included a large number of architectures found by NAS.
Inspired by Vision Transformers, the V4 series included multi-query attention.
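Multi-query attention shares a single key and value projection across all query heads, shrinking the key/value memory compared with standard multi-head attention. A minimal NumPy sketch (shapes and names are illustrative, not taken from the MobileNetV4 code):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_query_attention(x, Wq, Wk, Wv, num_heads):
    """x: (seq, d_model). Wq: (d_model, num_heads * d_head).
    Wk, Wv: (d_model, d_head) -- one shared key/value head."""
    seq, _ = x.shape
    d_head = Wk.shape[1]
    q = (x @ Wq).reshape(seq, num_heads, d_head)  # per-head queries
    k = x @ Wk                                    # single shared key
    v = x @ Wv                                    # single shared value
    out = []
    for h in range(num_heads):
        scores = q[:, h, :] @ k.T / np.sqrt(d_head)
        out.append(softmax(scores) @ v)
    return np.concatenate(out, axis=-1)           # (seq, num_heads * d_head)
```

Each head still has its own query projection, but all heads attend over the same keys and values, which is what makes the scheme cheap on memory-bound mobile accelerators.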
It also unified both inverted residual and inverted bottleneck from the V3 series with the "universal inverted bottleneck", which includes these two as special cases.
See also
* Convolutional neural network
* Deep learning
* TensorFlow Lite