HOME

TheInfoList



OR:

Binary data is
data In the pursuit of knowledge, data (; ) is a collection of discrete values that convey information, describing quantity, quality, fact, statistics, other basic units of meaning, or simply sequences of symbols that may be further interpreted. ...
whose unit can take on only two possible states. These are often labelled as 0 and 1 in accordance with the
binary numeral system A binary number is a number expressed in the base-2 numeral system or binary numeral system, a method of mathematical expression which uses only two symbols: typically "0" (zero) and "1" ( one). The base-2 numeral system is a positional notation ...
and
Boolean algebra In mathematics and mathematical logic, Boolean algebra is a branch of algebra. It differs from elementary algebra in two ways. First, the values of the variables are the truth values ''true'' and ''false'', usually denoted 1 and 0, whereas i ...
. Binary data occurs in many different technical and scientific fields, where it can be called by different names including ''
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented ...
'' (binary digit) in computer science, '' truth value'' in
mathematical logic Mathematical logic is the study of formal logic within mathematics. Major subareas include model theory, proof theory, set theory, and recursion theory. Research in mathematical logic commonly addresses the mathematical properties of formal sy ...
and related domains and '' binary variable'' in statistics.


Mathematical and combinatoric foundations

A
discrete Discrete may refer to: *Discrete particle or quantum in physics, for example in quantum theory *Discrete device, an electronic component with just one circuit element, either passive or active, other than an integrated circuit *Discrete group, a g ...
variable that can take only one state contains zero
information Information is an abstract concept that refers to that which has the power to inform. At the most fundamental level information pertains to the interpretation of that which may be sensed. Any natural process that is not completely random, ...
, and is the next
natural number In mathematics, the natural numbers are those numbers used for counting (as in "there are ''six'' coins on the table") and ordering (as in "this is the ''third'' largest city in the country"). Numbers used for counting are called ''cardinal ...
after 1. That is why the
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represented ...
, a variable with only two possible values, is a standard primary
unit of information In computing and telecommunications, a unit of information is the capacity of some standard data storage system or communication channel, used to measure the capacities of other systems and channels. In information theory, units of information ar ...
. A collection of bits may have states: see
binary number A binary number is a number expressed in the base-2 numeral system or binary numeral system, a method of mathematical expression which uses only two symbols: typically "0" (zero) and "1" ( one). The base-2 numeral system is a positional notation ...
for details. Number of states of a collection of discrete variables depends
exponentially Exponential may refer to any of several mathematical topics related to exponentiation, including: *Exponential function, also: **Matrix exponential, the matrix analogue to the above *Exponential decay, decrease at a rate proportional to value *Expo ...
on the number of variables, and only as a
power law In statistics, a power law is a functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities: one qu ...
on number of states of each variable. Ten bits have more () states than three
decimal digit A numerical digit (often shortened to just digit) is a single symbol used alone (such as "2") or in combinations (such as "25"), to represent numbers in a positional numeral system. The name "digit" comes from the fact that the ten digits (Latin ...
s (). bits are more than sufficient to represent an information (a
number A number is a mathematical object used to count, measure, and label. The original examples are the natural numbers 1, 2, 3, 4, and so forth. Numbers can be represented in language with number words. More universally, individual numbers can ...
or anything else) that requires decimal digits, so information contained in discrete variables with 3, 4, 5, 6, 7, 8, 9, 10... states can be ever superseded by allocating two, three, or four times more bits. So, the use of any other small number than 2 does not provide an advantage. Moreover, Boolean algebra provides a convenient mathematical structure for collection of bits, with a semantic of a collection of
propositional variable In mathematical logic, a propositional variable (also called a sentential variable or sentential letter) is an input variable (that can either be true or false) of a truth function. Propositional variables are the basic building-blocks of proposit ...
s. Boolean algebra operations are known as "
bitwise operation In computer programming, a bitwise operation operates on a bit string, a bit array or a binary numeral (considered as a bit string) at the level of its individual bits. It is a fast and simple action, basic to the higher-level arithmetic oper ...
s" in computer science. Boolean functions are also well-studied theoretically and easily implementable, either with computer programs or by so-named
logic gate A logic gate is an idealized or physical device implementing a Boolean function, a logical operation performed on one or more binary inputs that produces a single binary output. Depending on the context, the term may refer to an ideal logic gate ...
s in
digital electronics Digital electronics is a field of electronics involving the study of digital signals and the engineering of devices that use or produce them. This is in contrast to analog electronics and analog signals. Digital electronic circuits are usually ...
. This contributes to the use of bits to represent different data, even those originally not binary.


In statistics

In
statistics Statistics (from German: ''Statistik'', "description of a state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a scientific, industri ...
, binary data is a
statistical data type In statistics, groups of individual data points may be classified as belonging to any of various statistical data types, e.g. categorical ("red", "blue", "green"), real number (1.68, -5, 1.7e+6), odd number (1,3,5) etc. The data type is a fundam ...
consisting of categorical data that can take exactly two possible values, such as "A" and "B", or "heads" and "tails". It is also called dichotomous data, and an older term is quantal data. The two values are often referred to generically as "success" and "failure". As a form of categorical data, binary data is
nominal data Level of measurement or scale of measure is a classification that describes the nature of information within the values assigned to variables. Psychologist Stanley Smith Stevens developed the best-known classification with four levels, or scale ...
, meaning the values are qualitatively different and cannot be compared numerically. However, the values are frequently represented as 1 or 0, which corresponds to counting the number of successes in a single trial: 1 (success) or 0 (failure); see . Often, binary data is used to represent one of two conceptually opposed values, e.g.: *the outcome of an experiment ("success" or "failure") *the response to a yes-no question ("yes" or "no") *presence or absence of some feature ("is present" or "is not present") *the truth or falsehood of a proposition ("true" or "false", "correct" or "incorrect") However, it can also be used for data that is assumed to have only two possible values, even if they are not conceptually opposed or conceptually represent all possible values in the space. For example, binary data is often used to represent the party choices of voters in elections in the United States, i.e. Republican or Democratic. In this case, there is no inherent reason why only two
political parties A political party is an organization that coordinates candidates to compete in a particular country's elections. It is common for the members of a party to hold similar ideas about politics, and parties may promote specific ideological or po ...
should exist, and indeed, other parties do exist in the U.S., but they are so minor that they are generally simply ignored. Modeling continuous data (or categorical data of more than 2 categories) as a binary variable for analysis purposes is called dichotomization (creating a
dichotomy A dichotomy is a partition of a whole (or a set) into two parts (subsets). In other words, this couple of parts must be * jointly exhaustive: everything must belong to one part or the other, and * mutually exclusive: nothing can belong simult ...
). Like all
discretization In applied mathematics, discretization is the process of transferring continuous functions, models, variables, and equations into discrete counterparts. This process is usually carried out as a first step toward making them suitable for numerical ...
, it involves
discretization error In numerical analysis, computational physics, and simulation, discretization error is the error resulting from the fact that a function of a continuous variable is represented in the computer by a finite number of evaluations, for example, on a ...
, but the goal is to learn something valuable despite the error: treating it as negligible for the purpose at hand, but remembering that it cannot be assumed to be negligible in general.


Binary variables

A binary variable is a
random variable A random variable (also called random quantity, aleatory variable, or stochastic variable) is a mathematical formalization of a quantity or object which depends on random events. It is a mapping or a function from possible outcomes (e.g., the po ...
of binary type, meaning with two possible values.
Independent and identically distributed In probability theory and statistics, a collection of random variables is independent and identically distributed if each random variable has the same probability distribution as the others and all are mutually independent. This property is usual ...
(i.i.d.) binary variables follow a
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabili ...
, but in general binary data need not come from i.i.d. variables. Total counts of i.i.d. binary variables (equivalently, sums of i.i.d. binary variables coded as 1 or 0) follow a
binomial distribution In probability theory and statistics, the binomial distribution with parameters ''n'' and ''p'' is the discrete probability distribution of the number of successes in a sequence of ''n'' independent experiments, each asking a yes–no questi ...
, but when binary variables are not i.i.d., the distribution need not be binomial.


Counting

Like categorical data, binary data can be converted to a
vector Vector most often refers to: *Euclidean vector, a quantity with a magnitude and a direction *Vector (epidemiology), an agent that carries and transmits an infectious pathogen into another living organism Vector may also refer to: Mathematic ...
of
count data Count (feminine: countess) is a historical title of nobility in certain European countries, varying in relative status, generally of middling rank in the hierarchy of nobility. Pine, L. G. ''Titles: How the King Became His Majesty''. New York: ...
by writing one coordinate for each possible value, and counting 1 for the value that occurs, and 0 for the value that does not occur. For example, if the values are A and B, then the data set A, A, B can be represented in counts as (1, 0), (1, 0), (0, 1). Once converted to counts, binary data can be grouped and the counts added. For instance, if the set A, A, B is grouped, the total counts are (2, 1): 2 A's and 1 B (out of 3 trials). Since there are only two possible values, this can be simplified to a single count (a scalar value) by considering one value as "success" and the other as "failure", coding a value of the success as 1 and of the failure as 0 (using only the coordinate for the "success" value, not the coordinate for the "failure" value). For example, if the value A is considered "success" (and thus B is considered "failure"), the data set A, A, B would be represented as 1, 1, 0. When this is grouped, the values are added, while the number of trial is generally tracked implicitly. For example, A, A, B would be grouped as 1 + 1 + 0 = 2 successes (out of n = 3 trials). Going the other way, count data with n = 1 is binary data, with the two classes being 0 (failure) or 1 (success). Counts of i.i.d. binary variables follow a
binomial distribution In probability theory and statistics, the binomial distribution with parameters ''n'' and ''p'' is the discrete probability distribution of the number of successes in a sequence of ''n'' independent experiments, each asking a yes–no questi ...
, with the total number of trials (points in the grouped data).


Regression

Regression analysis In statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the 'outcome' or 'response' variable, or a 'label' in machine learning parlance) and one ...
on predicted outcomes that are binary variables is known as binary regression; when binary data is converted to count data and modeled as i.i.d. variables (so they have a binomial distribution), binomial regression can be used. The most common regression methods for binary data are
logistic regression In statistics, the logistic model (or logit model) is a statistical model that models the probability of an event taking place by having the log-odds for the event be a linear combination of one or more independent variables. In regression analy ...
,
probit regression In statistics, a probit model is a type of regression where the dependent variable can take only two values, for example married or not married. The word is a portmanteau, coming from ''probability'' + ''unit''. The purpose of the model is to est ...
, or related types of binary choice models. Similarly, counts of i.i.d. categorical variables with more than two categories can be modeled with a multinomial regression. Counts of non-i.i.d. binary data can be modeled by more complicated distributions, such as the
beta-binomial distribution In probability theory and statistics, the beta-binomial distribution is a family of discrete probability distributions on a finite support of non-negative integers arising when the probability of success in each of a fixed or known number of Ber ...
(a
compound distribution In probability and statistics, a compound probability distribution (also known as a mixture distribution or contagious distribution) is the probability distribution that results from assuming that a random variable is distributed according to some p ...
). Alternatively, the ''relationship'' can be modeled without needing to explicitly model the distribution of the output variable using techniques from
generalized linear model In statistics, a generalized linear model (GLM) is a flexible generalization of ordinary linear regression. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a ''link function'' and by ...
s (GLM), such as
quasi-likelihood In statistics, quasi-likelihood methods are used to estimate parameters in a statistical model when exact likelihood methods, for example maximum likelihood estimation, are computationally infeasible. Due to the wrong likelihood being used, quasi-l ...
and a quasibinomial model; see .


In computer science

In modern computers, binary data refers to any data represented in binary form rather than interpreted on a higher level or converted into some other form. At the lowest level, bits are stored in a bistable device such as a flip-flop. While most binary data has
symbol A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different con ...
ic meaning (except for
don't care In digital logic, a don't-care term (abbreviated DC, historically also known as ''redundancies'', ''irrelevancies'', ''optional entries'', ''invalid combinations'', ''vacuous combinations'', ''forbidden combinations'', ''unused states'' or ''lo ...
s) not all binary data is numeric. Some binary data corresponds to computer instructions, such as the data within
processor register A processor register is a quickly accessible location available to a computer's processor. Registers usually consist of a small amount of fast storage, although some registers have specific hardware functions, and may be read-only or write-only ...
s decoded by the
control unit The control unit (CU) is a component of a computer's central processing unit (CPU) that directs the operation of the processor. A CU typically uses a binary decoder to convert coded instructions into timing and control signals that direct the ope ...
along the fetch-decode-execute cycle. Computers rarely modify individual bits for performance reasons. Instead, data is aligned in groups of a fixed number of bits, usually 1
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
(8 bits). Hence, "binary data" in computers are actually sequences of bytes. On a higher level, data is accessed in groups of 1
word A word is a basic element of language that carries an objective or practical meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no conse ...
(4 bytes) for
32-bit In computer architecture, 32-bit computing refers to computer systems with a processor, memory, and other major system components that operate on data in 32-bit units. Compared to smaller bit widths, 32-bit computers can perform large calculati ...
systems and 2 words for
64-bit In computer architecture, 64-bit integers, memory addresses, or other data units are those that are 64 bits wide. Also, 64-bit CPUs and ALUs are those that are based on processor registers, address buses, or data buses of that size. A comput ...
systems. In applied computer science and in the
information technology Information technology (IT) is the use of computers to create, process, store, retrieve, and exchange all kinds of data . and information. IT forms part of information and communications technology (ICT). An information technology system (I ...
field, the term ''binary data'' is often specifically opposed to ''text-based data'', referring to any sort of data that cannot be interpreted as text. The "text" vs. "binary" distinction can sometimes refer to the semantic content of a file (e.g. a written document vs. a
digital image A digital image is an image composed of picture elements, also known as ''pixels'', each with ''finite'', '' discrete quantities'' of numeric representation for its intensity or gray level that is an output from its two-dimensional functions f ...
). However, it often refers specifically to whether the individual bytes of a file are interpretable as text (see
character encoding Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values that ...
) or cannot so be interpreted. When this last meaning is intended, the more specific terms ''binary format'' and ''text(ual) format'' are sometimes used. Semantically textual data can be represented in binary format (e.g. when compressed or in certain formats that intermix various sorts of formatting codes, as in the DOC format used by
Microsoft Word Microsoft Word is a word processor, word processing software developed by Microsoft. It was first released on October 25, 1983, under the name ''Multi-Tool Word'' for Xenix systems. Subsequent versions were later written for several other pla ...
); contrarily, image data is sometimes represented in textual format (e.g. the
X PixMap X PixMap (XPM) is an image file format used by the X Window System, created in 1989 by Daniel Dardailler and Colas Nahaboo working at Bull Research Center at Sophia Antipolis, France, and later enhanced by Arnaud Le Hors. It is intended primaril ...
image format used in the
X Window System The X Window System (X11, or simply X) is a windowing system for bitmap displays, common on Unix-like operating systems. X provides the basic framework for a GUI environment: drawing and moving windows on the display device and interacting wi ...
).


See also

* Bit array *
Bernoulli distribution In probability theory and statistics, the Bernoulli distribution, named after Swiss mathematician Jacob Bernoulli,James Victor Uspensky: ''Introduction to Mathematical Probability'', McGraw-Hill, New York 1937, page 45 is the discrete probabili ...
*
Boolean data type In computer science, the Boolean (sometimes shortened to Bool) is a data type that has one of two possible values (usually denoted ''true'' and ''false'') which is intended to represent the two truth values of logic and Boolean algebra. It is name ...
*
Computer memory In computing, memory is a device or system that is used to store information for immediate use in a computer or related computer hardware and digital electronic devices. The term ''memory'' is often synonymous with the term ''primary storage ...
* Categorical data *
Qualitative data Qualitative properties are properties that are observed and can generally not be measured with a numerical result. They are contrasted to quantitative properties which have numerical characteristics. Some engineering and scientific properties are ...


References

* {{refend Statistical data types