The Info List - Data


Data (/ˈdeɪtə/ DAY-tə, /ˈdætə/ DAT-ə, /ˈdɑːtə/ DAH-tə)[1] is a set of values of qualitative or quantitative variables. Data and information are often used interchangeably; however, the extent to which a set of data is informative to someone depends on the extent to which it is unexpected by that person. The amount of information content in a data stream may be characterized by its Shannon entropy. While the concept of data is commonly associated with scientific research, data is collected by a huge range of organizations and institutions, including businesses (e.g., sales data, revenue, profits, stock price), governments (e.g., crime rates, unemployment rates, literacy rates) and non-governmental organizations (e.g., censuses of the number of homeless people by non-profit organizations).

Data is measured, collected and reported, and analyzed, whereupon it can be visualized using graphs, images or other analysis tools. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing.

Raw data ("unprocessed data") is a collection of numbers or characters before it has been "cleaned" and corrected by researchers. Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature). Data processing commonly occurs by stages, and the "processed data" from one stage may be considered the "raw data" of the next stage. Field data is raw data that is collected in an uncontrolled "in situ" environment. Experimental data is data that is generated within the context of a scientific investigation by observation and recording.

Data has been described as the new oil of the digital economy.[2][3]
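
As an illustration of the remark that the information content of a data stream may be characterized by its Shannon entropy, the following is a minimal Python sketch. It estimates entropy from observed symbol frequencies, which is only one way to apply the idea; the function name `shannon_entropy` and the sample strings are assumptions made for this example and do not come from the article.

```python
import math
from collections import Counter


def shannon_entropy(stream):
    """Empirical Shannon entropy of a sequence, in bits per symbol.

    Probabilities are estimated from symbol frequencies in the stream
    itself (an assumption of this sketch).
    """
    total = len(stream)
    if total == 0:
        return 0.0
    counts = Counter(stream)
    # Sum of p * log2(1 / p) over the distinct symbols.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical streams: the more predictable the stream, the lower the entropy.
constant = "aaaaaaaa"     # one symbol only, nothing unexpected
alternating = "abababab"  # two equally frequent symbols
varied = "abcdefgh"       # eight equally frequent symbols

for name, stream in [("constant", constant),
                     ("alternating", alternating),
                     ("varied", varied)]:
    print(name, shannon_entropy(stream))
```

A stream that never changes carries no surprise and scores zero, while a stream spread evenly over more symbols scores higher. Note that this frequency-based estimate ignores ordering, so it treats "abababab" and "aabbabba" alike.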

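The cleaning of raw data and the staging of data processing described in the lead can be sketched in Python as follows. This is a simplified illustration, not a recommended cleaning method: the readings, the plausible temperature range, and the function names `clean_readings` and `mean_temperature` are all hypothetical.

```python
def clean_readings(raw, low, high):
    """Stage 1: split raw readings into plausible and rejected values.

    The range [low, high] is an assumed plausibility window chosen by
    the analyst; it is not derived from the data.
    """
    kept = [r for r in raw if low <= r <= high]
    rejected = [r for r in raw if not (low <= r <= high)]
    return kept, rejected


def mean_temperature(values):
    """Stage 2: summarize the values it is given."""
    return sum(values) / len(values)


# Hypothetical readings in degrees Celsius from an outdoor Arctic station.
# The value 27.4 is a tropical temperature and is treated as an obvious error.
raw_readings = [-31.5, -29.0, 27.4, -30.5, -33.0]

processed, rejected = clean_readings(raw_readings, low=-70.0, high=15.0)

# The "processed data" of stage 1 is the "raw data" of stage 2.
summary = mean_temperature(processed)

print(processed)  # readings that passed the range check
print(rejected)   # readings set aside for inspection
print(summary)    # mean of the cleaned readings
```

Keeping the rejected values rather than discarding them silently lets a researcher check whether a suspicious reading was an instrument fault or a genuine observation.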

1 Etymology and terminology
2 Meaning
3 In other fields
4 See also
5 References
6 External links

Etymology and terminology

The first English use of the word "data" is from the 1640s. The word was first used to mean "transmittable and storable computer information" in 1946, and the expression "data processing" was first used in 1954.[4] The Latin word data is the plural of datum, "(thing) given," neuter past participle of dare "to give".[4] Data may be used as a plural noun in this sense, with some writers, usually scientific writers, in the 20th century using datum in the singular and data for plural. However, in non-specialist, everyday writing, "data" is most commonly used in the singular, as a mass noun (like "information", "sand" or "rain").[5]

Meaning

Data, information, knowledge and wisdom are closely related concepts, but each has its own role in relation to the others, and each term has its own meaning. According to a common view, data is collected and analyzed; data only becomes information suitable for making decisions once it has been analyzed in some fashion.[6] Knowledge is derived from extensive amounts of experience dealing with information on a subject. For example, the height of Mount Everest is generally considered data. The height can be recorded precisely with an altimeter and entered into a database. This data may be included in a book along with other data on Mount Everest to describe the mountain in a manner useful for those who wish to make a decision about the best method to climb it. Using an understanding based on experience climbing mountains to advise persons on the way to reach Mount Everest's peak may be seen as "knowledge". Some complement the series "data", "information" and "knowledge" with "wisdom", which would mean the status of a person in possession of a certain "knowledge" who also knows under which circumstances it is good to use it.

Data is often assumed to be the least abstract concept, information the next least, and knowledge the most abstract.[7] In this view, data becomes information by interpretation; e.g., the height of Mount Everest is generally considered "data", a book on Mount Everest's geological characteristics may be considered "information", and a climber's guidebook containing practical information on the best way to reach Mount Everest's peak may be considered "knowledge". This view, however, has also been argued to provide an upside-down model of the relation between data, information, and knowledge.[8]

"Information" bears a diversity of meanings that ranges from everyday usage to technical use. Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation. Beynon-Davies uses the concept of a sign to differentiate between data and information; data is a series of symbols, while information occurs when the symbols are used to refer to something.[9][10]

Before the development of computing devices and machines, only people could collect data and impose patterns on it. Since the development of computing devices and machines, these devices can also collect data. In the 2010s, computers are widely used in many fields to collect data and sort or process it, in disciplines ranging from marketing and the analysis of social services usage by citizens to scientific research. These patterns in data are seen as information which can be used to enhance knowledge. These patterns may be interpreted as "truth" (though "truth" can be a subjective concept), and may be authorized as aesthetic and ethical criteria in some disciplines or cultures. Events that leave behind perceivable physical or virtual remains can be traced back through data. Marks are no longer considered data once the link between the mark and observation is broken.[11]

Mechanical computing devices are classified according to the means by which they represent data. An analog computer represents a datum as a voltage, distance, position, or other physical quantity. A digital computer represents a piece of data as a sequence of symbols drawn from a fixed alphabet. The most common digital computers use a binary alphabet, that is, an alphabet of two characters, typically denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed from the binary alphabet.

Some special forms of data are distinguished. A computer program is a collection of data, which can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some languages, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata, that is, a description of other data. A similar yet earlier term for metadata is "ancillary data." The prototypical example of metadata is the library catalog, which is a description of the contents of books.
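
The construction of familiar representations, such as numbers and letters, from a binary alphabet can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: it uses fixed-width unsigned integers and UTF-8 for text, which are common conventions but only two of many possible encodings, and the helper names `to_bits`, `text_to_bits` and `bits_to_text` are invented for this example.

```python
def to_bits(value, width=8):
    """Write a non-negative integer as a string over the alphabet {'0', '1'}.

    The result is zero-padded to `width` characters; larger values simply
    produce longer strings.
    """
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    return format(value, f"0{width}b")


def text_to_bits(text):
    """Encode text as a list of 8-symbol groups, one per UTF-8 byte."""
    return [to_bits(byte) for byte in text.encode("utf-8")]


def bits_to_text(bit_groups):
    """Decode groups produced by text_to_bits back into text."""
    return bytes(int(bits, 2) for bits in bit_groups).decode("utf-8")


print(to_bits(5))            # the number five as eight binary symbols
print(text_to_bits("A"))     # the letter A as one group of eight symbols
print(bits_to_text(text_to_bits("data")))  # round trip back to text
```

The same sequence of symbols can stand for a number or a letter; which one it means depends on the convention used to read it, which echoes the distinction drawn above between a series of symbols and what the symbols refer to.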

Data collection

Gathering data can be accomplished through a primary source (the researcher is the first person to obtain the data) or a secondary source (the researcher obtains data that has already been collected by other sources, such as data disseminated in a scientific journal). Data analysis methodologies vary and include data triangulation and data percolation.[12] The latter offers an articulate method of collecting, classifying and analyzing data using five possible angles of analysis (at least three) in order to maximize the research's objectivity and permit an understanding of the phenomena under investigation that is as complete as possible: qualitative and quantitative methods, literature reviews (including scholarly articles), interviews with experts, and computer simulation. The data are thereafter "percolated" using a series of pre-determined steps so as to extract the most relevant information.

In other fields

Though data is also increasingly used in other fields, it has been suggested that their highly interpretive nature might be at odds with the ethos of data as "given". Peter Checkland introduced the term capta (from the Latin capere, “to take”) to distinguish between an immense number of possible data and a sub-set of them, to which attention is oriented.[13] Johanna Drucker has argued that since the humanities affirm knowledge production as "situated, partial, and constitutive," using data may introduce assumptions that are counterproductive, for example that phenomena are discrete or are observer-independent.[14] The term capta, which emphasizes the act of observation as constitutive, is offered as an alternative to data for visual representations in the humanities.

See also

Biological data
Data acquisition
Data analysis
Data cable
Data curation
Dark data
Data domain
Data element
Data farming
Data governance
Data integrity
Data maintenance
Data management
Data mining
Data modeling
Data visualization
Computer data processing
Data preservation
Data publication
Data protection
Data remanence
Data set
Data warehouse
Database
Datasheet
Environmental data rescue
Fieldwork
Open data
Scientific data archiving
Statistics
Computer memory
Data structure
Secondary Data

References

This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later.

[1] The pronunciation /ˈdeɪtə/ DAY-tə is widespread throughout most varieties of English. The pronunciation /ˈdætə/ DAT-ə is chiefly Irish and North American. The pronunciation /ˈdɑːtə/ DAH-tə is chiefly Australian, New Zealand, and South African. Each pronunciation may be realized differently depending on the dialect/language of the speaker.
[2] Data Is the New Oil of the Digital Economy.
[3] Data is the new Oil.
[4] http://www.etymonline.com/index.php?term=data
[5] Hickey, Walt (2014-06-17). "Elitist, Superfluous, Or Popular? We Polled Americans on the Oxford Comma". FiveThirtyEight. Retrieved 2015-05-04.
[6] "Joint Publication 2-0, Joint Intelligence" (PDF). Defense Technical Information Center (DTIC). Department of Defense. 22 June 2007. pp. GL–11. Retrieved February 22, 2013.
[7] Akash Mitra (2011). "Classifying data for successful modeling".
[8] Tuomi, Ilkka (2000). "Data is more than knowledge". Journal of Management Information Systems. 6 (3): 103–117. doi:10.1080/07421222.1999.11518258.
[9] P. Beynon-Davies (2002). Information Systems: An introduction to informatics in organisations. Basingstoke, UK: Palgrave Macmillan. ISBN 0-333-96390-3.
[10] P. Beynon-Davies (2009). Business information systems. Basingstoke, UK: Palgrave. ISBN 978-0-230-20368-6.
[11] Sharon Daniel. The Database: An Aesthetics of Dignity.
[12] Mesly, Olivier (2015). Creating Models in Psychological Research. United States: Springer Psychology. 126 pages. ISBN 978-3-319-15752-8.
[13] P. Checkland and S. Holwell (1998). Information, Systems, and Information Systems: Making Sense of the Field. Chichester, West Sussex: John Wiley & Sons. pp. 86–89. ISBN 0-471-95820-4.
[14] Johanna Drucker (2011). "Humanities Approaches to Graphical Display".

External links

Look up data in Wiktionary, the free dictionary.

Data is a singular noun (a detailed assessment)
