Protocol Buffers (Protobuf) is a
free and open-source cross-platform
In computing, cross-platform software (also called multi-platform software, platform-agnostic software, or platform-independent software) is computer software that is designed to work in several computing platforms. Some cross-platform software ...
data format used to
serialize structured data. It is useful in developing programs to communicate with each other over a network or for storing data. The method involves an
interface description language that describes the structure of some data and a program that generates source code from that description for generating or parsing a stream of bytes that represents the structured data.
Overview
Google
Google LLC () is an American Multinational corporation, multinational technology company focusing on Search Engine, search engine technology, online advertising, cloud computing, software, computer software, quantum computing, e-commerce, ar ...
developed Protocol Buffers for internal use and provided a
code generator In computing, Code generation denotes software techniques or systems that generate program code which may then be used independently of the generator system in a runtime environment.
Specific articles:
* Code generation (compiler), a mechanism to pr ...
for multiple languages under an
open-source license (see
below).
The design goals for Protocol Buffers emphasized simplicity and performance. In particular, it was designed to be smaller and faster than
XML
Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data. It defines a set of rules for encoding documents in a format that is both human-readable and machine-readable ...
.
Protocol Buffers are widely used at Google for storing and interchanging all kinds of structured information. The method serves as a basis for a custom
remote procedure call
In distributed computing, a remote procedure call (RPC) is when a computer program causes a procedure ( subroutine) to execute in a different address space (commonly on another computer on a shared network), which is coded as if it were a normal ( ...
(RPC) system that is used for nearly all
inter-machine communication at Google.
Protocol Buffers are similar to the
Apache Thrift (used by Facebook,
Evernote
Evernote is a note-taking and task management application. It is developed by the Evernote Corporation, headquartered in Redwood City, California. It is intended for archiving and creating notes in which photos, audio and saved web content can ...
),
Ion
An ion () is an atom or molecule with a net electrical charge.
The charge of an electron is considered to be negative by convention and this charge is equal and opposite to the charge of a proton, which is considered to be positive by conve ...
(created by Amazon), or Microsoft Bond protocols, offering as well a concrete RPC
protocol stack
The protocol stack or network stack is an implementation of a computer networking protocol suite or protocol family. Some of these terms are used interchangeably but strictly speaking, the ''suite'' is the definition of the communication protoco ...
to use for defined
services called
gRPC
gRPC (Google Remote Procedure Calls) is a cross-platform open source high performance Remote procedure call, Remote Procedure Call (RPC) framework. gRPC was initially created by Google, which has used a single general-purpose RPC infrastructure ...
.
Data structures (called ''messages'') and services are described in a proto definition file (
.proto
) and compiled with
protoc
. This compilation generates code that can be invoked by a sender or recipient of these data structures. For example,
example.pb.cc
and
example.pb.h
are generated from
example.proto
. They define
C++
C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
classes for each message and service in
example.proto
.
Canonically, messages are serialized into a
binary
Binary may refer to:
Science and technology Mathematics
* Binary number, a representation of numbers using only two digits (0 and 1)
* Binary function, a function that takes two arguments
* Binary operation, a mathematical operation that ta ...
wire
Overhead power cabling. The conductor consists of seven strands of steel (centre, high tensile strength), surrounded by four outer layers of aluminium (high conductivity). Sample diameter 40 mm
A wire is a flexible strand of metal.
Wire is c ...
format which is compact, forward- and backward-compatible, but not
self-describing
In computer programming, self-documenting (or self-describing) source code and user interfaces follow naming conventions and structured programming conventions that enable use of the system without prior specific knowledge. In web development, ...
(that is, there is no way to tell the names, meaning, or full datatypes of fields without an external specification). There is no defined way to include or refer to such an external specification (
schema
The word schema comes from the Greek word ('), which means ''shape'', or more generally, ''plan''. The plural is ('). In English, both ''schemas'' and ''schemata'' are used as plural forms.
Schema may refer to:
Science and technology
* SCHEMA ...
) within a Protocol Buffers file. The officially supported implementation includes an ASCII serialization format, but this format—though self-describing—loses the forward- and backward-compatibility behavior, and according to some, is thus not a good choice for applications other than debugging.
Though the primary purpose of Protocol Buffers is to facilitate network communication, its simplicity and speed make Protocol Buffers an alternative to data-centric C++ classes and structs, especially where interoperability with other languages or systems might be needed in the future.
Example
A schema for a particular use of protocol buffers associates data types with field names, using integers to identify each field. (The protocol buffer data contains only the numbers, not the field names, providing some bandwidth/storage savings compared with systems that include the field names in the data.)
// polyline.proto
syntax = "proto2";
message Point
message Line
message Polyline
The "Point" message defines two mandatory data items, ''x'' and ''y''. The data item ''label'' is optional. Each data item has a tag. The tag is defined after the equal sign. For example, ''x'' has the tag 1.
The "Line" and "Polyline" messages, which both use Point, demonstrate how composition works in Protocol Buffers. Polyline has a ''repeated'' field, which behaves like a vector.
This schema can subsequently be compiled for use by one or more programming languages. Google provides a compiler called
protoc
which can produce output for C++, Java or Python. Other schema compilers are available from other sources to create language-dependent output for over 20 other languages.
For example, after a C++ version of the protocol buffer schema above is produced, a C++ source code file, polyline.cpp, can use the message objects as follows:
// polyline.cpp
#include "polyline.pb.h" // generated by calling "protoc polyline.proto"
Line* createNewLine(const std::string& name)
Polyline* createNewPolyline()
Language support
Protobuf 2.0 provides a
code generator In computing, Code generation denotes software techniques or systems that generate program code which may then be used independently of the generator system in a runtime environment.
Specific articles:
* Code generation (compiler), a mechanism to pr ...
for
C++
C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
,
Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mos ...
,
C#, and
Python
Python may refer to:
Snakes
* Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia
** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia
* Python (mythology), a mythical serpent
Computing
* Python (pro ...
.
Protobuf 3.0 provides a code generator for
C++
C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
,
Java
Java (; id, Jawa, ; jv, ꦗꦮ; su, ) is one of the Greater Sunda Islands in Indonesia. It is bordered by the Indian Ocean to the south and the Java Sea to the north. With a population of 151.6 million people, Java is the world's mos ...
(including JavaNano, a dialect intended for
low-resource environments),
Python
Python may refer to:
Snakes
* Pythonidae, a family of nonvenomous snakes found in Africa, Asia, and Australia
** ''Python'' (genus), a genus of Pythonidae found in Africa and Asia
* Python (mythology), a mythical serpent
Computing
* Python (pro ...
,
Go,
Ruby
A ruby is a pinkish red to blood-red colored gemstone, a variety of the mineral corundum ( aluminium oxide). Ruby is one of the most popular traditional jewelry gems and is very durable. Other varieties of gem-quality corundum are called ...
,
Objective-C
Objective-C is a general-purpose, object-oriented programming language that adds Smalltalk-style messaging to the C programming language. Originally developed by Brad Cox and Tom Love in the early 1980s, it was selected by NeXT for its NeXT ...
,
C#. It also supports JavaScript since 3.0.0-beta-2.
Third-party implementations are also available for
Ballerina
A ballet dancer ( it, ballerina fem.; ''ballerino'' masc.) is a person who practices the art of classical ballet. Both females and males can practice ballet; however, dancers have a strict hierarchy and strict gender roles. They rely on ye ...
,
C,
C++
C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
,
Dart,
Elixir
ELIXIR (the European life-sciences Infrastructure for biological Information) is an initiative that will allow life science laboratories across Europe to share and store their research data as part of an organised network. Its goal is to bring t ...
,
Erlang,
Haskell
Haskell () is a general-purpose, statically-typed, purely functional programming language with type inference and lazy evaluation. Designed for teaching, research and industrial applications, Haskell has pioneered a number of programming lan ...
,
JavaScript
JavaScript (), often abbreviated as JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS. As of 2022, 98% of websites use JavaScript on the client side for webpage behavior, of ...
,
Perl
Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offic ...
,
PHP
PHP is a general-purpose scripting language geared toward web development. It was originally created by Danish-Canadian programmer Rasmus Lerdorf in 1993 and released in 1995. The PHP reference implementation is now produced by The PHP Group. ...
,
Prolog
Prolog is a logic programming language associated with artificial intelligence and computational linguistics.
Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is intended primarily ...
,
R,
Rust
Rust is an iron oxide, a usually reddish-brown oxide formed by the reaction of iron and oxygen in the catalytic presence of water or air moisture. Rust consists of hydrous iron(III) oxides (Fe2O3·nH2O) and iron(III) oxide-hydroxide (FeO( ...
,
Scala,
Swift
Swift or SWIFT most commonly refers to:
* SWIFT, an international organization facilitating transactions between banks
** SWIFT code
* Swift (programming language)
* Swift (bird), a family of birds
It may also refer to:
Organizations
* SWIFT, ...
,
Julia
Julia is usually a feminine given name. It is a Latinate feminine form of the name Julio and Julius. (For further details on etymology, see the Wiktionary entry "Julius".) The given name ''Julia'' had been in use throughout Late Antiquity (e.g ...
and
Nim.
See also
*
gRPC
gRPC (Google Remote Procedure Calls) is a cross-platform open source high performance Remote procedure call, Remote Procedure Call (RPC) framework. gRPC was initially created by Google, which has used a single general-purpose RPC infrastructure ...
*
Comparison of data-serialization formats
This is a comparison of data serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.
Overview
Syntax comparison of human-readable form ...
*
FlatBuffers
*
ASN.1
Abstract Syntax Notation One (ASN.1) is a standard interface description language for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networking, and ...
References
External links
Official documentationat developers.google.com
*
{{Google FOSS
Data serialization formats
Google software
Software using the BSD license