UBJSON
   HOME

TheInfoList



OR:

Universal Binary JSON (UBJSON) is a
computer A computer is a machine that can be Computer programming, programmed to automatically Execution (computing), carry out sequences of arithmetic or logical operations (''computation''). Modern digital electronic computers can perform generic set ...
data interchange format. It is a binary form directly imitating
JSON JSON (JavaScript Object Notation, pronounced or ) is an open standard file format and electronic data interchange, data interchange format that uses Human-readable medium and data, human-readable text to store and transmit data objects consi ...
, but requiring fewer bytes of data. It aims to achieve the generality of JSON, combined with being much easier to process than JSON.


Rationale and Objectives

UBJSON is a proposed successor to
BSON BSON (; Binary JSON) is a computer data interchange format extending JSON. It is a binary form for representing simple or complex data structures including associative arrays (also known as name-value pairs), integer indexed arrays, and a suit ...
, BJSON and others. UBJSON has the following goals: * Complete compatibility with the JSON specification – there is a 1:1 mapping between standard JSON and UBJSON. * Ease of implementation – only including data types that are widely supported in popular programming languages so that there are no problems with certain languages not being supported well. * Ease of use – it can be quickly understood and adopted. * Speed and efficiency – UBJSON uses data representations that are (roughly) 30% smaller than their compacted JSON counterparts and are optimized for fast parsing. Streamed serialisation is supported, meaning that the transfer of UBJSON over a network connection can start sending data before the final size of the data is known.


Data types and syntax

UBJSON data can be either a ''value'' or a ''container''.


Value types

UBJSON uses a single binary tuple to represent all JSON value types: type ength ata Each element in the tuple is defined as:


type

The type is a 1-byte
ASCII ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control c ...
character used to indicate the type of the data following it. The ASCII characters were chosen to make manually walking and debugging data stored in the UBJSON format as easy as possible (e.g. making the data relatively readable in a hex editor). Types are available for the five JSON value types. There is also a
no-op In computer science, a NOP, no-op, or NOOP (pronounced "no op"; short for no operation) is a machine language instruction and its assembly language mnemonic, programming language statement, or computer protocol command that does nothing. Mac ...
type used for stream keep-alive. *
Null Null may refer to: Science, technology, and mathematics Astronomy *Nuller, an optical tool using interferometry to block certain sources of light Computing *Null (SQL) (or NULL), a special marker and keyword in SQL indicating that a data value do ...
: Z *
No-op In computer science, a NOP, no-op, or NOOP (pronounced "no op"; short for no operation) is a machine language instruction and its assembly language mnemonic, programming language statement, or computer protocol command that does nothing. Mac ...
: N - no operation, to be ignored by the receiving end * Boolean types: true (T) and false (F) * Numeric types: int8 (i), uint8 (U), int16 (I), int32 (l), int64 (L), float32 (d), float64 (D), and high-precision (H) *
ASCII ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control c ...
character: C *
UTF-8 UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode Transformation Format 8-bit''. Almost every webpage is transmitted as UTF-8. UTF-8 supports all 1,112,0 ...
string: S High-precision numbers are represented as an arbitrarily long, UTF-8 string-encoded numeric value.


length (optional)

The length is an integer number (e.g. uint8, or int64) encoding the size of the data payload in bytes. It is used for strings, high-precision numbers and optionally containers. They are omitted for other types. Length is encoded following the same convention as integers, thus including its own type. For example, the string hello is encoded as S,U,0x05,h,e,l,l,o.


data (optional)

A sequence of bytes representing the actual
binary data Binary data is data whose unit can take on only two possible states. These are often labelled as 0 and 1 in accordance with the binary numeral system and Boolean algebra. Binary data occurs in many different technical and scientific fields, wh ...
for this type of value. All numbers are in
big-endian '' Jonathan_Swift.html" ;"title="Gulliver's Travels'' by Jonathan Swift">Gulliver's Travels'' by Jonathan Swift, the novel from which the term was coined In computing, endianness is the order in which bytes within a word (data type), word of d ...
order.


Container types

Similarly to JSON, UBJSON defines two container types: ''array'' and ''object''. Arrays are ordered sequences of elements, represented as a /code> followed by zero or more elements of value and container type and a trailing /code>. Objects are labeled sets of elements, represented as a . Each key is a string with the S character omitted, and each "value" can be any element of value or container type. Alternatively, arrays and objects may indicate the number of elements they contain as # followed by an integer number before their first element, in which case the trailing ] or } is omitted. Additionally, if all elements have the same type, the types can be omitted and replaced by a single $ followed by the type, in which case the element count must follow immediately. For example, the array a","b","c"/nowiki> may be represented as strongly typed array of uint8 values. This ensures binary efficiency while maintaining compatibility with JSON, even though JSON has no direct support for binary data.


Representation

The MIME type 'application/ubjson' is recommended, as is the file extension '.ubj' when stored in a file-system.


Software support

* Teradata Database * The
Wolfram Language The Wolfram Language ( ) is a proprietary, very high-level multi-paradigm programming language developed by Wolfram Research. It emphasizes symbolic computation, functional programming, and rule-based programming and can employ arbitrary stru ...
introduced support for UBJSON in 2017, with version 11.1 of the language.


See also

*
Comparison of data serialization formats This is a comparison of data serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file format A document file format is a Text file, text or bi ...
*
JSON JSON (JavaScript Object Notation, pronounced or ) is an open standard file format and electronic data interchange, data interchange format that uses Human-readable medium and data, human-readable text to store and transmit data objects consi ...
*
CBOR Concise Binary Object Representation (CBOR) is a binary data serialization format loosely based on JSON authored by Carsten Bormann and Paul Hoffman. Like JSON it allows the transmission of data objects that contain attribute–value pair, name ...
*
Smile A smile is a facial expression formed primarily by flexing the muscles at the sides of the mouth. Some smiles include a contraction of the muscles at the corner of the eyes, an action known as a Duchenne smile. Among humans, a smile expresses d ...
(binary JSON) *
Protocol Buffers Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. It is useful in developing programs that communicate with each other over a network or for storing data. The method involves an ...
*
Action Message Format Action Message Format (AMF) is a binary format used to serialize object graphs such as ActionScript objects and XML, or send messages between an Adobe Flash client and a remote service, usually a Flash Media Server or third party alternatives. T ...
*
Apache Thrift Thrift is an IDL (interface definition language, Interface Definition Language) and Binary protocol, binary communication protocol used for defining and creating service (systems architecture), services for programming languages. It was developed ...
* MessagePack *
Document-oriented database A document-oriented database, or document store, is a computer program and data storage system designed for storing, retrieving and managing document-oriented information, also known as semi-structured data. Document-oriented databases are one ...
(e.g.
MongoDB MongoDB is a source-available, cross-platform, document-oriented database program. Classified as a NoSQL database product, MongoDB uses JSON-like documents with optional database schema, schemas. Released in February 2009 by 10gen (now MongoDB ...
) *
Abstract Syntax Notation One Abstract Syntax Notation One (ASN.1) is a standard interface description language (IDL) for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networ ...
(ASN.1) *
Wireless Binary XML WAP Binary XML (WBXML) is a binary representation of XML. It was developed by the WAP Forum and since 2002 is maintained by the Open Mobile Alliance as a standard to allow XML documents to be transmitted in a compact manner over mobile networks and ...
(WBXML) *
Efficient XML Interchange Efficient XML Interchange (EXI) is a binary XML format for exchange of data on a computer network. It was developed by the W3C's Efficient Extensible Interchange Working Group and is one of the most prominent efforts to encode XML documents in a ...


References


External links

* {{Data Exchange Data serialization formats