Netstrings

In computer programming, a netstring is a formatting method for byte strings that uses a declarative notation to indicate the size of the string.

Netstrings store the byte length of the data that follows, making it easier to unambiguously pass text and byte data between programs that could be sensitive to values that could be interpreted as delimiters or terminators (such as a null character).

The format consists of the string's length written using ASCII digits, followed by a colon, the byte data, and a comma. "Length" in this context means "number of 8-bit units", so if the string is, for example, encoded using UTF-8, the length may differ from the number of textual characters in the string.

For example, the text "hello world!" encodes as:

<31 32 3a 68 65 6c 6c 6f 20 77 6f 72 6c 64 21 2c>

i.e.

12:hello world!,

And an empty string as:

<30 3a 2c>

i.e.

0:,

The comma makes it slightly simpler for humans to read netstrings that are used as adjacent records, and provides weak verification of correct parsing. Without the comma, the format mirrors how Bencode encodes strings.

The length is written without leading zeroes. The empty string is the only netstring that begins with zero. There is exactly one legal netstring encoding for any byte string.

Since the format is easy to generate and to parse, it is easy to support in programs written in different programming languages. In practice, netstrings are often used to simplify the exchange of byte strings or lists of byte strings, for example in the Simple Common Gateway Interface (SCGI) and the Quick Mail Queuing Protocol (QMQP).

Netstrings avoid complications that arise in trying to embed arbitrary data in delimited formats. For example, XML may not contain certain byte values and requires a nontrivial combination of escaping and delimiting, while generating multipart MIME messages involves choosing a delimiter that must not clash with the content of the data.

Netstrings can be stored recursively. The result of encoding a sequence of strings is a single string. Rewriting the above "hello world!" example to instead be a sequence of two netstrings, itself encoded as a single netstring, gives the following:

17:5:hello,6:world!,,

Parsing such a nested netstring is an example of duck typing, since the contained string ("5:hello,6:world!,") is both a string and a sequence of netstrings. Its effective type is determined by how the application chooses to interpret it, not by any explicit type declaration required by the netstring specification. In general, there are three ways that a program expecting a netstring may choose to interpret its contents:

* As human-readable text with no further automatic processing
* As encapsulated data in some pre-arranged fixed data serialization format (such as the binary contents of a C or C++ struct)
* As encapsulated metadata and data, using a tagged union convention to describe the types of nested netstrings, thereby establishing a self-describing hierarchical data serialization format. ("Tagged netstrings" and Bencode can be seen as extensions of netstring that support similar self-describing hierarchical formats; see "tnetstring: data serialization using typed netstrings".)

Because netstrings place no restrictions on the contents of the data they store, they cannot be embedded verbatim in most delimited formats without the possibility of interfering with the delimiting of the containing format.

In the context of network programming it is potentially useful that the receiving program is informed of the size of the data that follows, as it can allocate exactly enough memory, avoid the need for reallocation to accommodate more data, and preemptively reject data that would exceed size limits.
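
The encoding rules described in this article (ASCII decimal length, colon, data, comma, no leading zeroes) can be illustrated with a minimal Python sketch. The function names encode_netstring and decode_netstring are hypothetical and chosen only for this example; they are not part of any netstring specification or library.

```python
from typing import Tuple


def encode_netstring(data: bytes) -> bytes:
    """Return data framed as <length>:<data>, (hypothetical helper)."""
    # str(len(data)) never produces leading zeroes, so the result is canonical.
    return str(len(data)).encode("ascii") + b":" + data + b","


def decode_netstring(buf: bytes) -> Tuple[bytes, bytes]:
    """Decode one netstring from the start of buf.

    Returns (payload, remaining bytes); raises ValueError on malformed input.
    """
    colon = buf.find(b":")
    if colon < 1:
        raise ValueError("missing length prefix or colon")
    digits = buf[:colon]
    # The length must consist of ASCII digits only.
    if not all(0x30 <= b <= 0x39 for b in digits):
        raise ValueError("length must be written in ASCII digits")
    # "0" is the only length allowed to begin with a zero.
    if len(digits) > 1 and digits[0] == 0x30:
        raise ValueError("leading zeroes are not allowed")
    end = colon + 1 + int(digits)
    # The comma acts as a weak check that the length was honoured.
    if buf[end:end + 1] != b",":
        raise ValueError("truncated data or missing trailing comma")
    return buf[colon + 1:end], buf[end + 1:]


print(encode_netstring(b"hello world!"))  # b'12:hello world!,'
print(encode_netstring(b""))              # b'0:,'

# The length counts bytes, not characters: "h\u00e9llo" has 5 characters
# but occupies 6 bytes in UTF-8.
text = "h\u00e9llo"
print(len(text), len(text.encode("utf-8")))  # 5 6
```

Because the decoder reads exactly the declared number of bytes, the payload may itself contain colons, commas or null bytes without any escaping.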
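
The nested "hello" / "world!" example can be reproduced with a short sketch. The helpers encode_sequence and decode_all are hypothetical names introduced here for illustration, and the validation is deliberately minimal. Note that the same decoding function is applied twice: whether a payload is treated as plain bytes or as a further sequence of netstrings is purely the caller's decision.

```python
from typing import List


def encode_netstring(data: bytes) -> bytes:
    """Frame data as <length>:<data>, (hypothetical helper)."""
    return str(len(data)).encode("ascii") + b":" + data + b","


def encode_sequence(items: List[bytes]) -> bytes:
    """Encode each item as a netstring, then wrap the concatenation in one netstring."""
    return encode_netstring(b"".join(encode_netstring(item) for item in items))


def decode_all(buf: bytes) -> List[bytes]:
    """Return the payloads of the consecutive netstrings that make up buf."""
    items = []
    pos = 0
    while pos < len(buf):
        colon = buf.index(b":", pos)  # raises ValueError if no colon is left
        digits = buf[pos:colon]
        if not digits.isdigit():
            raise ValueError("length must be written in ASCII digits")
        end = colon + 1 + int(digits)
        if buf[end:end + 1] != b",":
            raise ValueError("truncated data or missing trailing comma")
        items.append(buf[colon + 1:end])
        pos = end + 1
    return items


outer = encode_sequence([b"hello", b"world!"])
print(outer)                  # b'17:5:hello,6:world!,,'

inner = decode_all(outer)[0]  # one payload, still just bytes
print(inner)                  # b'5:hello,6:world!,'
print(decode_all(inner))      # [b'hello', b'world!']
```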
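
The point about network programming can be sketched as a reader that checks the declared length before reading any payload. This is only an illustration under stated assumptions: read_netstring and MAX_LENGTH are hypothetical names, the 1 MiB limit is arbitrary, and the stream is assumed to be a blocking, buffered binary stream (such as io.BytesIO or the object returned by socket.makefile("rb")) whose read(n) returns n bytes unless the stream ends.

```python
import io
from typing import BinaryIO

MAX_LENGTH = 1 << 20  # hypothetical application limit (1 MiB)


def read_netstring(stream: BinaryIO, max_length: int = MAX_LENGTH) -> bytes:
    """Read one netstring from stream, rejecting oversized payloads early."""
    digits = b""
    while True:
        ch = stream.read(1)
        if ch == b":":
            break
        # Reject non-digits (including end of stream) and cap the number of
        # digits so a hostile peer cannot send an endless length prefix.
        if not ch.isdigit() or len(digits) >= len(str(max_length)):
            raise ValueError("bad or oversized length prefix")
        digits += ch
    if not digits or (len(digits) > 1 and digits.startswith(b"0")):
        raise ValueError("malformed length prefix")
    length = int(digits)
    # The size is known before any payload byte is read, so the data can be
    # refused without buffering it.
    if length > max_length:
        raise ValueError("declared length exceeds limit")
    payload = stream.read(length)  # a single read of exactly the declared size
    if len(payload) != length or stream.read(1) != b",":
        raise ValueError("truncated data or missing trailing comma")
    return payload


stream = io.BytesIO(b"5:hello,6:world!,")
print(read_netstring(stream))  # b'hello'
print(read_netstring(stream))  # b'world!'
```

With a raw, non-buffered socket, a single receive call may return fewer bytes than requested, so a real implementation would need to loop until the declared length has been received.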


See also

* Hollerith constant



External links

* http://cr.yp.to/proto/netstrings.txt
* http://wiki.tcl.tk/15074