Data (/ˈdeɪtə/ DAY-tə, /ˈdætə/ DAT-ə, /ˈdɑːtə/ DAH-tə)
is a set of values of qualitative or quantitative variables.
Data and information are often used interchangeably; however, the
extent to which a set of data is informative to someone depends on the
extent to which it is unexpected by that person. The amount of
information content in a data stream may be characterized by its
Shannon entropy.
While the concept of data is commonly associated with scientific
research, data is collected by a huge range of organizations and
institutions, including businesses (e.g., sales data, revenue,
profits, stock price), governments (e.g., crime rates, unemployment
rates, literacy rates) and non-governmental organizations (e.g.,
censuses of the number of homeless people by non-profit
organizations). Data is measured, collected and reported, and analyzed, whereupon it
can be visualized using graphs, images or other analysis tools. Data
as a general concept refers to the fact that some existing information
or knowledge is represented or coded in some form suitable for better
usage or processing.
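The idea that information content depends on how unexpected the data is can be made concrete with a small calculation. The sketch below assumes the measure intended above is Shannon entropy, estimated from symbol frequencies in the stream itself; the function name `shannon_entropy` and the sample strings are illustrative and not taken from the source.

```python
import math
from collections import Counter


def shannon_entropy(stream):
    """Estimate the Shannon entropy of a sequence, in bits per symbol.

    Probabilities are estimated from the observed symbol frequencies,
    which is a simplifying assumption for this sketch.
    """
    total = len(stream)
    if total == 0:
        return 0.0
    counts = Counter(stream)
    # Sum p * log2(1/p) over every distinct symbol in the stream.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# A fully predictable stream carries no information per symbol.
predictable = shannon_entropy("AAAAAAAA")
# Four equally likely symbols need two bits per symbol.
varied = shannon_entropy("ABCDABCD")
print(predictable, varied)
```

A stream in which every symbol is the same scores zero, while a stream that spreads evenly over more symbols scores higher, matching the intuition that unexpected data is more informative.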
Raw data ("unprocessed data") is a collection of
numbers or characters before it has been "cleaned" and corrected by
researchers. Raw data needs to be corrected to remove outliers or
obvious instrument or data entry errors (e.g., a thermometer reading
from an outdoor Arctic location recording a tropical temperature).
Data processing commonly occurs by stages, and the "processed data"
from one stage may be considered the "raw data" of the next stage.
Field data is raw data that is collected in an uncontrolled "in situ"
environment. Experimental data is data that is generated within the
context of a scientific investigation by observation and recording.
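The cleaning and staging of raw data described above can be sketched in a few lines. This is a minimal illustration only: the plausibility bounds, the function names `clean_readings` and `summarize`, and the sample readings are all hypothetical, and real cleaning procedures depend on the instrument and the study.

```python
from statistics import mean

# Hypothetical plausibility bounds for an outdoor Arctic station, in °C.
PLAUSIBLE_MIN_C = -70.0
PLAUSIBLE_MAX_C = 25.0


def clean_readings(raw_readings):
    """Stage 1: drop readings that fall outside the plausible range."""
    return [r for r in raw_readings if PLAUSIBLE_MIN_C <= r <= PLAUSIBLE_MAX_C]


def summarize(cleaned_readings):
    """Stage 2: treat the stage-1 output as raw input and summarize it."""
    return {"count": len(cleaned_readings), "mean_c": mean(cleaned_readings)}


# Made-up raw readings; 35.2 stands in for an obvious instrument error.
raw = [-31.5, -30.0, 35.2, -29.5, -33.0]
cleaned = clean_readings(raw)   # processed data of stage 1
summary = summarize(cleaned)    # stage 1 output is the raw data of stage 2
print(cleaned, summary)
```

The output of the first function is "processed" relative to the thermometer readings but is still "raw" from the point of view of the summary step, which is the staging relationship the text describes.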
Data has been described as the new oil of the digital economy.
Etymology and terminology
The first English use of the word "data" is from the 1640s. Using the
word "data" to mean "transmittable and storable computer information"
was first done in 1946. The expression "data processing" was first
used in 1954.
The Latin word data is the plural of datum, "(thing) given," neuter
past participle of dare "to give".
Data may be used as a plural
noun in this sense, with some writers, usually scientific writers,
in the 20th century using datum in the singular and data for plural.
However, in non-specialist, everyday writing, "data" is most commonly
used in the singular, as a mass noun (like "information" or "sand").
Data, information, knowledge and wisdom are closely related concepts,
but each has its own role in relation to the other, and each term has
its own meaning. According to a common view, data is collected and
analyzed; data only becomes information suitable for making decisions
once it has been analyzed in some fashion. Knowledge is derived
from extensive amounts of experience dealing with information on a
subject. For example, the height of Mount Everest is generally
considered data. The height can be recorded precisely with an
altimeter and entered into a database. This data may be included in a
book along with other data on Mount Everest to describe the mountain
in a manner useful for those who wish to make a decision about the
best method to climb it. Using an understanding based on experience
climbing mountains to advise persons on the way to reach Mount
Everest's peak may be seen as "knowledge". Some complement the series
"data", "information" and "knowledge" with "wisdom", which would mean
the state of a person who possesses certain "knowledge" and also
knows under which circumstances it is good to use it.
Data is often assumed to be the least abstract concept, information
the next least, and knowledge the most abstract. In this view, data
becomes information by interpretation; e.g., the height of Mount
Everest is generally considered "data", a book on Mount Everest's
geological characteristics may be considered "information", and a
climber's guidebook containing practical information on the best way
to reach Mount Everest's peak may be considered "knowledge".
"Information" bears a diversity of meanings that ranges from everyday
usage to technical use. The ordering described above, however, has
also been argued to provide an upside-down model of the relation between data,
information, and knowledge. Generally speaking, the concept of
information is closely related to notions of constraint,
communication, control, data, form, instruction, knowledge, meaning,
mental stimulus, pattern, perception, and representation.
Beynon-Davies uses the concept of a sign to differentiate between data
and information; data is a series of symbols, while information occurs
when the symbols are used to refer to something.
Before the development of computing devices and machines, only people
could collect data and impose patterns on it. Since the development of
computing devices and machines, these devices can also collect data.
As of the 2010s, computers are widely used in many fields to collect data
and sort or process it, in disciplines ranging from marketing and the
analysis of citizens' use of social services to scientific research.
These patterns in data are seen as information which can be used to
enhance knowledge. These patterns may be interpreted as "truth"
(though "truth" can be a subjective concept), and may be authorized as
aesthetic and ethical criteria in some disciplines or cultures. Events
that leave behind perceivable physical or virtual remains can be
traced back through data. Marks are no longer considered data once the
link between the mark and observation is broken.
Mechanical computing devices are classified according to the means by
which they represent data. An analog computer represents a datum as a
voltage, distance, position, or other physical quantity. A digital
computer represents a piece of data as a sequence of symbols drawn
from a fixed alphabet. The most common digital computers use a binary
alphabet, that is, an alphabet of two characters, typically denoted
"0" and "1". More familiar representations, such as numbers or
letters, are then constructed from the binary alphabet. Some special
forms of data are distinguished. A computer program is a collection of
data, which can be interpreted as instructions. Most computer
languages make a distinction between programs and the other data on
which programs operate, but in some languages, notably Lisp and
similar languages, programs are essentially indistinguishable from
other data. It is also useful to distinguish metadata, that is, a
description of other data. A similar yet earlier term for metadata is
"ancillary data." The prototypical example of metadata is the library
catalog, which is a description of the contents of books.
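The construction of familiar representations from a binary alphabet can be illustrated with a short sketch. It assumes ASCII encoding for letters and plain base-2 notation for integers; the helper names `to_bits` and `from_bits` are illustrative, not part of any standard described in the text.

```python
def to_bits(data: bytes) -> str:
    """Render each byte as eight characters drawn from the alphabet {0, 1}."""
    return " ".join(format(byte, "08b") for byte in data)


def from_bits(bits: str) -> bytes:
    """Rebuild the original bytes from space-separated groups of bits."""
    return bytes(int(chunk, 2) for chunk in bits.split())


# Letters: "H" is 72 and "i" is 105 in ASCII.
text_bits = to_bits("Hi".encode("ascii"))
print(text_bits)                            # 01001000 01101001
print(from_bits(text_bits).decode("ascii"))  # Hi

# Numbers: the integer 42 written in the same two-character alphabet.
number_bits = format(42, "b")
print(number_bits)                          # 101010
print(int(number_bits, 2))                  # 42
```

The round trip through `from_bits` shows that nothing is lost: the sequence of "0" and "1" symbols is the datum, and the letters or numbers are an interpretation layered on top of it.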
Gathering data can be accomplished through a primary source (the
researcher is the first person to obtain the data) or a secondary
source (the researcher obtains data that has already been
collected by other sources, such as data disseminated in a scientific
journal). Data analysis methodologies vary and include data
triangulation and data percolation. The latter offers an
articulate method of collecting, classifying and analyzing data using
five possible angles of analysis (of which at least three are used) in
order to maximize the research's objectivity and permit as complete an
understanding as possible of the phenomena under investigation: qualitative and
quantitative methods, literature reviews (including scholarly
articles), interviews with experts, and computer simulation. The data
are thereafter "percolated" using a series of pre-determined steps so
as to extract the most relevant information.
In other fields
Though data is also increasingly used in other fields, it has been
suggested that their highly interpretive nature might be at odds
with the ethos of data as "given". Peter Checkland introduced the term
capta (from the Latin capere, “to take”) to distinguish between an
immense number of possible data and a sub-set of them, to which
attention is oriented.
Johanna Drucker has argued that since the
humanities affirm knowledge production as "situated, partial, and
constitutive," using data may introduce assumptions that are
counterproductive, for example that phenomena are discrete or are
observer-independent. The term capta, which emphasizes the act of
observation as constitutive, is offered as an alternative to data for
visual representations in the humanities.
See also
Computer data processing
Environmental data rescue
Scientific data archiving
This article is based on material taken from the Free On-line
Dictionary of Computing prior to 1 November 2008 and incorporated
under the "relicensing" terms of the GFDL, version 1.3 or later.
^ The pronunciation /ˈdeɪtə/ DAY-tə is widespread throughout most
varieties of English. The pronunciation /ˈdætə/ DAT-ə is chiefly
Irish and North American. The pronunciation /ˈdɑːtə/ DAH-tə is
chiefly Australian, New Zealand, and South African. Each pronunciation
may be realized differently depending on the dialect/language of the
Data Is the New Oil of the Digital Economy
Data is the new Oil
^ a b http://www.etymonline.com/index.php?term=data
^ Hickey, Walt (2014-06-17). "Elitist, Superfluous, Or Popular? We
Polled Americans on the Oxford Comma". FiveThirtyEight. Retrieved
^ "Joint Publication 2-0, Joint Intelligence" (PDF). Defense Technical
Information Center (DTIC). Department of Defense. 22 June 2007.
pp. GL–11. Retrieved February 22, 2013.
^ Akash Mitra (2011). "Classifying data for successful
^ Tuomi, Ilkka (2000). "Data is more than knowledge". Journal of
Information Systems. 6 (3): 103–117.
^ P. Beynon-Davies (2002). Information Systems: An introduction to
informatics in organisations. Basingstoke, UK: Palgrave Macmillan.
^ P. Beynon-Davies (2009). Business information systems. Basingstoke,
UK: Palgrave. ISBN 978-0-230-20368-6.
^ Sharon Daniel. The Database: An Aesthetics of Dignity.
^ Mesly, Olivier (2015). Creating Models in Psychological Research.
États-Unis : Springer Psychology : 126 pages.
^ P. Checkland and S. Holwell (1998). Information, Systems, and
Information Systems: Making Sense of the Field. Chichester, West
Sussex: John Wiley & Sons. pp. 86–89.
^ Johanna Drucker (2011). "Humanities Approaches to Graphical
Data is a singular noun (a detailed assessment)