Protocol Buffers (Protobuf) is a
free and open-source
Free and open-source software (FOSS) is software available under a Software license, license that grants users the right to use, modify, and distribute the software modified or not to everyone free of charge. FOSS is an inclusive umbrella term ...
cross-platform
Within computing, cross-platform software (also called multi-platform software, platform-agnostic software, or platform-independent software) is computer software that is designed to work in several Computing platform, computing platforms. Some ...
data format used to
serialize structured data. It is useful in developing programs that communicate with each other over a network or for storing data. The method involves an
interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.
Overview
Google
Google LLC (, ) is an American multinational corporation and technology company focusing on online advertising, search engine technology, cloud computing, computer software, quantum computing, e-commerce, consumer electronics, and artificial ...
developed Protocol Buffers for internal use and provided a
code generator for multiple languages under an
open-source
Open source is source code that is made freely available for possible modification and redistribution. Products include permission to use and view the source code, design documents, or content of the product. The open source model is a decentrali ...
license.
The design goals for Protocol Buffers emphasized simplicity and performance. In particular, it was designed to be smaller and faster than
XML
Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing data. It defines a set of rules for encoding electronic document, documents in a format that is both human-readable and Machine-r ...
.
Protocol Buffers is widely used at Google for storing and interchanging all kinds of structured information. The method serves as a basis for a custom
remote procedure call
In distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure (subroutine) to execute in a different address space (commonly on another computer on a shared computer network), which is written as if it were a ...
(RPC) system that is used for nearly all
inter-machine communication at Google.
Protocol Buffers is similar to the
Apache Thrift
Thrift is an IDL (interface definition language, Interface Definition Language) and Binary protocol, binary communication protocol used for defining and creating service (systems architecture), services for programming languages. It was developed ...
,
Ion, and Microsoft Bond protocols, offering a concrete RPC
protocol stack to use for defined
service
Service may refer to:
Activities
* Administrative service, a required part of the workload of university faculty
* Civil service, the body of employees of a government
* Community service, volunteer service for the benefit of a community or a ...
s called
gRPC
gRPC (acronym for gRPC Remote Procedure Calls) is a cross-platform high-performance remote procedure call (RPC) framework. gRPC was initially created by Google, but is open source and is used in many organizations. Use cases range from microservi ...
.
Data structure schemas (called ''messages'') and services are described in a proto definition file (
.proto
) and compiled with
protoc
. This compilation generates code that can be invoked by a sender or recipient of these data structures. For example,
example.pb.cc
and
example.pb.h
are generated from
example.proto
. They define
C++ classes for each message and service in
example.proto
.
Canonically, messages are serialized into a
binary wire
file:Sample cross-section of high tension power (pylon) line.jpg, Overhead power cabling. The conductor consists of seven strands of steel (centre, high tensile strength), surrounded by four outer layers of aluminium (high conductivity). Sample d ...
format which is compact,
forward- and
backward-compatible, but not
self-describing (that is, there is no way to tell the names, meaning, or full datatypes of fields without an external specification). There is no defined way to include or refer to such an external specification (
schema
Schema may refer to:
Science and technology
* SCHEMA (bioinformatics), an algorithm used in protein engineering
* Schema (genetic algorithms), a set of programs or bit strings that have some genotypic similarity
* Schema.org, a web markup vocab ...
) within a Protocol Buffers file. The officially supported implementation includes an ASCII serialization format, but this format—though self-describing—loses the forward- and backward-compatibility behavior, and is thus not a good choice for applications other than human editing and debugging.
Though the primary purpose of Protocol Buffers is to facilitate network communication, its simplicity and speed make Protocol Buffers an alternative to data-centric C++ classes and structs, especially where interoperability with other languages or systems might be needed in the future.
Limitations
Protobufs have no single specification. The format is best suited for small data chunks that don't exceed a few megabytes and can be loaded/sent into memory right away and therefore is not a streamable format. The library doesn't provide compression out of the box. The format also isn't well supported in non–object-oriented languages (e.g.
Fortran).
Example
A schema for a particular use of protocol buffers associates data types with field names, using integers to identify each field. (The protocol buffer data contains only the numbers, not the field names, providing some bandwidth/storage savings compared with systems that include the field names in the data.)
// polyline.proto
syntax = "proto2";
message Point
message Line
message Polyline
The "Point" message defines two mandatory data items, ''x'' and ''y''. The data item ''label'' is optional. Each data item has a tag. The tag is defined after the equal sign. For example, ''x'' has the tag 1.
The "Line" and "Polyline" messages, which both use Point, demonstrate how composition works in Protocol Buffers. Polyline has a ''repeated'' field, and thus Polyline behaves like a set of points (of unspecified number).
This schema can subsequently be compiled for use by one or more programming languages. Google provides a compiler called
protoc
which can produce output for C++, Java or Python. Other schema compilers are available from other sources to create language-dependent output for over 20 other languages.
For example, after a C++ version of the protocol buffer schema above is produced, a C++ source code file, polyline.cpp, can use the message objects as follows:
// polyline.cpp
#include "polyline.pb.h" // generated by calling "protoc polyline.proto"
Line* createNewLine(const std::string& name)
Polyline* createNewPolyline()
Language support
Protobuf 2.0 provides a
code generator for
C++,
Java
Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
,
C#, and
Python.
Protobuf 3.0 provides a code generator for
C++,
Java
Java is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea (a part of Pacific Ocean) to the north. With a population of 156.9 million people (including Madura) in mid 2024, proje ...
(including JavaNano, a dialect intended for
low-resource environments),
Kotlin,
Python,
Go,
Ruby
Ruby is a pinkish-red-to-blood-red-colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called sapph ...
,
Objective-C
Objective-C is a high-level general-purpose, object-oriented programming language that adds Smalltalk-style message passing (messaging) to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was ...
,
C#,
PHP,
Dart. It also supports JavaScript since 3.0.0-beta-2.
Third-party implementations are also available for
Ballerina,
C,
C++,
Dart,
Elixir
An elixir is a sweet liquid used for medical purposes, to be taken orally and intended to cure one's illness. When used as a dosage form, pharmaceutical preparation, an elixir contains at least one active ingredient designed to be taken orall ...
,
Erlang,
Haskell
Haskell () is a general-purpose, statically typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research, and industrial applications, Haskell pioneered several programming language ...
,
JavaScript
JavaScript (), often abbreviated as JS, is a programming language and core technology of the World Wide Web, alongside HTML and CSS. Ninety-nine percent of websites use JavaScript on the client side for webpage behavior.
Web browsers have ...
,
Julia,
Nim,
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Though Perl is not officially an acronym, there are various backronyms in use, including "Practical Extraction and Reporting Language".
Perl was developed ...
,
PHP,
Prolog
Prolog is a logic programming language that has its origins in artificial intelligence, automated theorem proving, and computational linguistics.
Prolog has its roots in first-order logic, a formal logic. Unlike many other programming language ...
,
R,
Rust
Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO(OH) ...
,
Scala, and
Swift
Swift or SWIFT most commonly refers to:
* SWIFT, an international organization facilitating transactions between banks
** SWIFT code
* Swift (programming language)
* Swift (bird), a family of birds
It may also refer to:
Organizations
* SWIF ...
.
See also
*
gRPC
gRPC (acronym for gRPC Remote Procedure Calls) is a cross-platform high-performance remote procedure call (RPC) framework. gRPC was initially created by Google, but is open source and is used in many organizations. Use cases range from microservi ...
*
Comparison of data-serialization formats
*
FlatBuffers
*
ASN.1
Abstract Syntax Notation One (ASN.1) is a standard interface description language (IDL) for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networ ...
References
External links
Official documentationat developers.google.com
*
Mouse Melon- A GUI for editing Protocol Buffers data
{{Google FOSS
Data serialization formats
Google software
Software using the BSD license