HOME

TheInfoList



OR:

A binary file is a
computer file A computer file is a computer resource for recording data in a computer storage device, primarily identified by its file name. Just as words can be written to paper, so can data be written to a computer file. Files can be shared with and trans ...
that is not a
text file A text file (sometimes spelled textfile; an old alternative name is flatfile) is a kind of computer file that is structured as a sequence of lines of electronic text. A text file exists stored as data within a computer file system. In operat ...
. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document files containing formatted text, such as older Microsoft Word document files, contain the text of the document but also contain formatting information in binary form.


Structure

Binary files are usually thought of as being a sequence of
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
s, which means the binary digits (
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
s) are grouped in eights. Binary files typically contain bytes that are intended to be interpreted as something other than text characters. Compiled computer programs are typical examples; indeed, compiled applications are sometimes referred to, particularly by programmers, as binaries. But binary files can also mean that they contain images, sounds, compressed versions of other files, etc. – in short, any type of file content whatsoever. Some binary files contain headers, blocks of metadata used by a
computer program A computer program is a sequence or set of instructions in a programming language for a computer to execute. Computer programs are one component of software, which also includes documentation and other intangible components. A computer program ...
to interpret the data in the file. The header often contains a signature or ''magic'' number which can identify the format. For example, a GIF file can contain multiple images, and headers are used to identify and describe each block of image data. The leading bytes of the header would contain text like GIF87a or GIF89a that can identify the binary as a GIF file. If a binary file does not contain any headers, it may be called a flat binary file. A text file may consist partly or entirely of encoded binary information. When sending binary files over the network they may be encoded so that they use only printable characters. This is often necessary due to the limitations of network protocols used for internet browsing and e-mail communication. One such encoding is
Base64 In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in sequences of 24 bits that can be represented by four 6-bit Base64 digits. Common to all bina ...
. Also, files containing public-key and private-key information for use in systems employing
asymmetric cryptography Public-key cryptography, or asymmetric cryptography, is the field of cryptographic systems that use pairs of related keys. Each key pair consists of a public key and a corresponding private key. Key pairs are generated with cryptographic alg ...
(such as website certificates) may also be stored with the binary information encoded in printable characters.


Manipulation

To send binary files through certain systems (such as
email Electronic mail (email or e-mail) is a method of exchanging messages ("mail") between people using electronic devices. Email was thus conceived as the electronic ( digital) version of, or counterpart to, mail, at a time when "mail" mean ...
) that do not allow all data values, they are often translated into a plain text representation (using, for example,
Base64 In computer programming, Base64 is a group of binary-to-text encoding schemes that represent binary data (more specifically, a sequence of 8-bit bytes) in sequences of 24 bits that can be represented by four 6-bit Base64 digits. Common to all bina ...
). Encoding the data has the disadvantage of increasing the file size during the transfer (for example, using Base64 will increase the file's size by approximately 30%), as well as requiring translation back into binary after receipt. The increased size may be countered by lower-level link compression, as the resulting text data will have about as much less
entropy Entropy is a scientific concept, as well as a measurable physical property, that is most commonly associated with a state of disorder, randomness, or uncertainty. The term and the concept are used in diverse fields, from classical thermodynam ...
as it has increased size, so the actual data transferred in this scenario would likely be very close to the size of the original binary data. See
Binary-to-text encoding A binary-to-text encoding is encoding of data in plain text. More precisely, it is an encoding of binary data in a sequence of printable characters. These encodings are necessary for transmission of data when the channel does not allow binary ...
for more on this subject. Microsoft Windows and its standard libraries for the C and
C++ C++ (pronounced "C plus plus") is a high-level general-purpose programming language created by Danish computer scientist Bjarne Stroustrup as an extension of the C programming language, or "C with Classes". The language has expanded significan ...
programming languages allow the programmer to specify a parameter indicating if a file is expected to be plain text or binary when opening a file; this affects the standard library calls to read and write from the file in that the system converts between the C/C++ "end of line" character (the ASCII linefeed character) and the end-of-line sequence Windows expects in files (the ASCII
carriage return A carriage return, sometimes known as a cartridge return and often shortened to CR, or return, is a control character or mechanism used to reset a device's position to the beginning of a line of text. It is closely associated with the line feed ...
and linefeed characters in sequence). In
Unix-like A Unix-like (sometimes referred to as UN*X or *nix) operating system is one that behaves in a manner similar to a Unix system, although not necessarily conforming to or being certified to any version of the Single UNIX Specification. A Unix-li ...
systems, the C and C++ standard libraries on those systems also allow the programmer to specify whether a file is expected to be text or binary, but the libraries can and do ignore that parameter, as the end-of-line sequence in Unix-like systems is just the C/C++ end-of-line character.


Viewing

A
hex editor A hex editor (or ''binary file editor'' or ''byte editor'') is a computer program that allows for manipulation of the fundamental binary data that constitutes a computer file. The name 'hex' comes from 'hexadecimal', a standard numerical format f ...
or viewer may be used to view file data as a sequence of hexadecimal (or decimal, binary or ASCII character) values for corresponding bytes of a binary file. If a binary file is opened in a
text editor A text editor is a type of computer program that edits plain text. Such programs are sometimes known as "notepad" software (e.g. Windows Notepad). Text editors are provided with operating systems and software development packages, and can be ...
, each group of eight bits will typically be translated as a single character, and the user will see a (probably unintelligible) display of textual characters. If the file is opened in some other application, that application will have its own use for each byte: maybe the application will treat each byte as a number and output a stream of numbers between 0 and 255—or maybe interpret the numbers in the bytes as colors and display the corresponding picture. Other type of viewers (called 'word extractors') simply replace the unprintable characters with spaces revealing only the human-readable text. This type of view is useful for a quick inspection of a binary file in order to find passwords in games, find hidden text in non-text files and recover corrupted documents. It can even be used to inspect suspicious files (software) for unwanted effects. For example, the user would see any URL/email to which the suspected software may attempt to connect in order to upload unapproved data (to steal). If the file is itself treated as an executable and run, then the operating system will attempt to interpret the file as a series of instructions in its
machine language In computer programming, machine code is any low-level programming language, consisting of machine language instructions, which are used to control a computer's central processing unit (CPU). Each instruction causes the CPU to perform a very ...
.


Interpretation

Standards are very important to binary files. For example, a binary file interpreted by the ASCII character set will result in text being displayed. A custom application can interpret the file differently: a byte may be a sound, or a pixel, or even an entire word. Binary itself is meaningless, until such time as an executed algorithm defines what should be done with each bit, byte, word or block. Thus, just examining the binary and attempting to match it against known formats can lead to the wrong conclusion as to what it actually represents. This fact can be used in
steganography Steganography ( ) is the practice of representing information within another message or physical object, in such a manner that the presence of the information is not evident to human inspection. In computing/electronic contexts, a computer file, ...
, where an algorithm interprets a binary data file differently to reveal hidden content. Without the algorithm, it is impossible to tell that hidden content exists.


Binary compatibility

Two files that are binary compatible will have the same sequence of zeros and ones in the data portion of the file. The file header, however, may be different. The term is used most commonly to state that data files produced by one application are exactly the same as data files produced by another application. For example, some software companies produce applications for
Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for ser ...
and the
Macintosh The Mac (known as Macintosh until 1999) is a family of personal computers designed and marketed by Apple Inc. Macs are known for their ease of use and minimalist designs, and are popular among students, creative professionals, and software en ...
that are binary compatible, which means that a file produced in a Windows environment is interchangeable with a file produced on a Macintosh. This avoids many of the conversion problems caused by importing and exporting data. One possible binary compatibility issue between different computers is the
endianness In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the mos ...
of the computer. Some computers store the bytes in a file in a different order.


See also

*
Binary large object A binary large object (BLOB or blob) is a collection of binary data stored as a single entity. Blobs are typically images, audio or other multimedia objects, though sometimes binary executable code is stored as a blob. They can exist as persistent ...
*
Disassembler A disassembler is a computer program that translates machine language into assembly language—the inverse operation to that of an assembler. A disassembler differs from a decompiler, which targets a high-level language rather than an assembly l ...
* Executable *
List of file formats This is a list of file formats used by computers, organized by type. Filename extension it is usually noted in parentheses if they differ from the file format name or abbreviation. Many operating systems do not limit filenames to one extension s ...


References


External links

* {{Computer files Computer file formats