
Bencode (pronounced like "Bee-encode") is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data. It supports four different types of values:
* byte strings,
* integers,
* lists, and
* dictionaries (associative arrays).

Bencoding is most commonly used in torrent files, and as such is part of the BitTorrent specification. These metadata files are simply bencoded dictionaries. Bencoding is simple and (because numbers are encoded as text in decimal notation) is unaffected by endianness, which is important for a cross-platform application like BitTorrent. It is also fairly flexible: as long as applications ignore unexpected dictionary keys, new keys can be added without creating incompatibilities.
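To make the endianness point concrete, the following Python sketch contrasts a fixed-width binary integer, whose bytes depend on byte order, with the bencoded form, which is plain decimal text. The value 1000 is an arbitrary example, and the `struct` module is used only for comparison; it plays no part in bencode itself.

```python
import struct

value = 1000

# A fixed-width 32-bit binary integer: the bytes depend on byte order.
little = struct.pack("<i", value)  # b'\xe8\x03\x00\x00'
big = struct.pack(">i", value)     # b'\x00\x00\x03\xe8'

# The bencoded integer is decimal ASCII text, so every platform
# produces and reads exactly the same bytes.
bencoded = b"i%de" % value         # b'i1000e'
decoded = int(bencoded[1:-1])      # 1000 on any machine

print(little == big)  # False
print(bencoded, decoded)
```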


Encoding algorithm

Bencode uses ASCII characters as delimiters and digits.
* An integer is encoded as i<integer encoded in base ten ASCII>e. Leading zeros are not allowed (although the number zero is still represented as "0"). Negative values are encoded by prefixing the number with a hyphen-minus. The number 42 would thus be encoded as i42e, 0 as i0e, and -42 as i-42e. Negative zero is not permitted.
* A byte string (a sequence of bytes, not necessarily characters) is encoded as <length>:<contents>. The length is encoded in base 10, like integers, but must be non-negative (zero is allowed); the contents are just the bytes that make up the string. The string "spam" would be encoded as 4:spam. This length-prefixed form is identical to how netstrings work, except that netstrings additionally append a comma suffix after the byte sequence. The specification does not deal with the encoding of characters outside the ASCII set; to mitigate this, some BitTorrent applications explicitly communicate the encoding (most commonly UTF-8) in various non-standard ways.
* A list of values is encoded as l<contents>e. The contents consist of the bencoded elements of the list, in order, concatenated. A list consisting of the string "spam" and the number 42 would be encoded as l4:spami42ee. Note the absence of separators between elements, and that the first character is the letter "l", not the digit "1".
* A dictionary is encoded as d<contents>e. The elements of the dictionary are encoded with each key immediately followed by its value. All keys must be byte strings and must appear in lexicographical order (sorted as raw byte strings, not as alphanumerics). A dictionary that associates the values 42 and "spam" with the keys "foo" and "bar", respectively (in other words, {"bar": "spam", "foo": 42}), would be encoded as d3:bar4:spam3:fooi42ee.

There are no restrictions on what kind of values may be stored in lists and dictionaries; they may (and usually do) contain other lists and dictionaries. This allows arbitrarily complex data structures to be encoded.
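The four rules above can be sketched as a small recursive encoder in Python. This is a minimal illustration, not the reference BitTorrent code: the function name `bencode` is hypothetical, and encoding Python `str` values as UTF-8 is an assumption (a common convention that the specification does not mandate).

```python
def bencode(value) -> bytes:
    """Return the bencoding of an int, bytes, str, list, or dict (minimal sketch)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, but bencode has no boolean type.
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        # i<base-ten digits>e; %d never produces leading zeros or "-0".
        return b"i%de" % value
    if isinstance(value, str):
        # Assumption: text is stored as UTF-8, a common but non-standard choice.
        value = value.encode("utf-8")
    if isinstance(value, (bytes, bytearray)):
        # <length>:<contents>, with the length in base ten.
        return b"%d:" % len(value) + bytes(value)
    if isinstance(value, list):
        # l<elements>e, elements concatenated with no separators.
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        pairs = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            if not isinstance(key, bytes):
                raise TypeError("dictionary keys must be byte strings")
            pairs.append((key, item))
        if len({key for key, _ in pairs}) != len(pairs):
            raise ValueError("duplicate dictionary key")
        # d<key><value>...e, with keys sorted as raw bytes.
        pairs.sort(key=lambda pair: pair[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in pairs) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode(42))                          # b'i42e'
print(bencode(-42))                         # b'i-42e'
print(bencode("spam"))                      # b'4:spam'
print(bencode(["spam", 42]))                # b'l4:spami42ee'
print(bencode({"foo": 42, "bar": "spam"}))  # b'd3:bar4:spam3:fooi42ee'
```

Because the dictionary keys are sorted before output, the insertion order of the Python dict does not affect the result, which is what makes the encoding of a given value unique.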


Features & drawbacks

Bencode is a very specialized kind of binary coding with some unique properties:
* For each possible (complex) value, there is only a single valid bencoding; i.e. there is a bijection between values and their encodings. This has the advantage that applications may compare bencoded values by comparing their encoded forms, eliminating the need to decode the values.
* Many bencoded values can be decoded by hand. However, since bencoded values often contain binary data, decoding may become quite complex. Bencode is not considered a human-readable encoding format.
* Bencoding serves similar purposes to data languages like JSON and YAML, allowing complex yet loosely structured data to be stored in a platform-independent way.

However, this specialization can cause some problems:
* There are very few bencode editors.
* Because bencoded files contain binary data, and because of some of the intricacies involved in the way binary strings are typically stored, it is often not safe to edit bencode files in text editors.
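The single-valid-encoding property only holds if decoders refuse the alternative spellings that the rules forbid. The following Python sketch is a strict decoder written under that assumption: it rejects leading zeros, negative zero, and unsorted or duplicate dictionary keys, and it returns byte strings as raw `bytes` rather than guessing a character encoding. The names `bdecode` and `_decode` are hypothetical, and rejecting leading zeros in string lengths is this sketch's own assumption.

```python
import re

# Canonical integer digits: "0", or an optional "-" then a nonzero leading digit.
_INT = re.compile(rb"0|-?[1-9][0-9]*")
# Assumption: string lengths follow the same no-leading-zeros rule.
_LEN = re.compile(rb"0|[1-9][0-9]*")


def bdecode(data: bytes):
    """Decode exactly one bencoded value, rejecting non-canonical input."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


def _decode(data: bytes, i: int):
    """Decode the value starting at offset i; return (value, next offset)."""
    c = data[i:i + 1]
    if c == b"i":
        end = data.index(b"e", i)  # ValueError if the terminator is missing
        digits = data[i + 1:end]
        if not _INT.fullmatch(digits):
            raise ValueError(f"non-canonical integer {digits!r}")
        return int(digits), end + 1
    if c.isdigit():
        colon = data.index(b":", i)
        length = data[i:colon]
        if not _LEN.fullmatch(length):
            raise ValueError(f"bad string length {length!r}")
        start, end = colon + 1, colon + 1 + int(length)
        if end > len(data):
            raise ValueError("byte string runs past end of data")
        return data[start:end], end
    if c == b"l":
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = _decode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":
        i += 1
        result, previous = {}, None
        while data[i:i + 1] != b"e":
            if not data[i:i + 1].isdigit():
                raise ValueError("dictionary keys must be byte strings")
            key, i = _decode(data, i)
            # Keys must be strictly increasing: sorted and free of duplicates.
            if previous is not None and key <= previous:
                raise ValueError(f"key {key!r} out of order")
            value, i = _decode(data, i)
            result[key] = value
            previous = key
        return result, i + 1
    raise ValueError(f"unexpected {c!r} at offset {i}")


print(bdecode(b"d3:bar4:spam3:fooi42ee"))  # {b'bar': b'spam', b'foo': 42}
for bad in (b"i042e", b"i-0e", b"d3:fooi42e3:bar4:spame"):
    try:
        bdecode(bad)
    except ValueError as exc:
        print("rejected:", exc)
```

Even this short decoder has to track offsets and nesting by hand, which illustrates why manual decoding of larger bencoded values quickly becomes tedious.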


See also

* BitTorrent



External links


* Bencoding specification
* File_Bittorrent2 - Another PHP Bencode/decode implementation
* The original BitTorrent implementation in Python as a standalone package
* BEncode Editor - a visual editor for BEncoded files
* Torrent File Editor - a cross-platform GUI editor for BEncode files
* bencode-tools - a C library for manipulating bencoded data and an XML-schema-like validator for bencode messages in Python
* Bento - a Bencode library in Elixir
* Beecoder - a file stream parser that encodes and decodes the "B-encode" data format in Java using the java.io.* stream API
* Bencode parsing in Java
* Bencode library in Scala
* Bencode parsing in C
* There are numerous Perl implementations on CPAN