Data
Data (/ˈdeɪtə/ DAY-tə, /ˈdætə/ DAT-ə, /ˈdɑːtə/ DAH-tə)[1] is a set of values of qualitative or quantitative variables. Data
Data and information are often used interchangeably; however, the extent to which a set of data is informative to someone depends on the extent to which it is unexpected by that person. The amount of information content in a data stream may be characterized by its Shannon entropy. While the concept of data is commonly associated with scientific research, data is collected by a huge range of organizations and institutions, including businesses (e.g., sales data, revenue, profits, stock price), governments (e.g., crime rates, unemployment rates, literacy rates) and non-governmental organizations (e.g., censuses of the number of homeless people by non-profit organizations). Data
Data is measured, collected and reported, and analyzed, whereupon it can be visualized using graphs, images or other analysis tools. Data as a general concept refers to the fact that some existing information or knowledge is represented or coded in some form suitable for better usage or processing. Raw data
Raw data ("unprocessed data") is a collection of numbers or characters before it has been "cleaned" and corrected by researchers. Raw data
Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature). Data processing commonly occurs by stages, and the "processed data"
from one stage may be considered the "raw data" of the next stage.
Field data is raw data that is collected in an uncontrolled "in situ"
environment.
Experimental data is data that is generated within the
context of a scientific investigation by observation and recording.
Data
Data has been described as the new oil of the digital economy.[2][3] Contents 1 Etymology and terminology 2 Meaning 3 In other fields 4 See also 5 References 6 External links Etymology and terminology[edit]
The first English use of the word "data" is from the 1640s. Using the
word "data" to mean "transmittable and storable computer information"
was first done in 1946. The expression "data processing" was first
used in 1954.[4]
The Latin word data is the plural of datum, "(thing) given," neuter
past participle of dare "to give".[4]
Biological data
References[edit] This article is based on material taken from the Free On-line Dictionary of Computing prior to 1 November 2008 and incorporated under the "relicensing" terms of the GFDL, version 1.3 or later. ^ The pronunciation /ˈdeɪtə/ DAY-tə is widespread throughout most
varieties of English. The pronunciation /ˈdætə/ DAT-ə is chiefly
Irish and North American. The pronunciation /ˈdɑːtə/ DAH-tə is
chiefly Australian, New Zealand, and South African. Each pronunciation
may be realized differently depending on the dialect/language of the
speaker.
^
External links[edit] Look up data in Wiktionary, the free dictionary.
v t e Statistics Outline Index Descriptive statistics Continuous data Center Mean arithmetic geometric harmonic Median Mode Dispersion Variance Standard deviation Coefficient of variation Percentile Range Interquartile range Shape Central limit theorem Moments Skewness Kurtosis L-moments Count data Index of dispersion Summary tables Grouped data Frequency distribution Contingency table Dependence Pearson product-moment correlation Rank correlation Spearman's rho Kendall's tau Partial correlation Scatter plot Graphics Bar chart Biplot Box plot Control chart Correlogram Fan chart Forest plot Histogram Pie chart Q–Q plot Run chart Scatter plot Stem-and-leaf display Radar chart
Study design Population Statistic Effect size Statistical power Sample size determination Missing data Survey methodology Sampling stratified cluster Standard error Opinion poll Questionnaire Controlled experiments Design control optimal Controlled trial Randomized Random assignment Replication Blocking Interaction Factorial experiment Uncontrolled studies Observational study Natural experiment Quasi-experiment Statistical inference Statistical theory Population Statistic Probability distribution Sampling distribution Order statistic Empirical distribution Density estimation Statistical model Lp space Parameter location scale shape Parametric family Likelihood (monotone) Location–scale family Exponential family Completeness Sufficiency Statistical functional Bootstrap U V Optimal decision loss function Efficiency Statistical distance divergence Asymptotics Robustness Frequentist inference Point estimation Estimating equations Maximum likelihood Method of moments M-estimator Minimum distance Unbiased estimators Mean-unbiased minimum-variance Rao–Blackwellization Lehmann–Scheffé theorem
Plug-in Interval estimation Confidence interval Pivot Likelihood interval Prediction interval Tolerance interval Resampling Bootstrap Jackknife Testing hypotheses 1- & 2-tails Power Uniformly most powerful test Permutation test Randomization test Multiple comparisons Parametric tests Likelihood-ratio Wald Score Specific tests
Goodness of fit Chi-squared G-test Kolmogorov–Smirnov Anderson–Darling Lilliefors Jarque–Bera Normality (Shapiro–Wilk) Likelihood-ratio test Model selection Cross validation AIC BIC Rank statistics Sign Sample median Signed rank (Wilcoxon) Hodges–Lehmann estimator Rank sum (Mann–Whitney) Nonparametric anova 1-way (Kruskal–Wallis) 2-way (Friedman) Ordered alternative (Jonckheere–Terpstra) Bayesian inference Bayesian probability prior posterior Credible interval Bayes factor Bayesian estimator Maximum posterior estimator Correlation Regression analysis Correlation Pearson product-moment
Partial correlation
Regression analysis Errors and residuals
Regression model validation
Mixed effects models
Simultaneous equations models
Linear regression Simple linear regression Ordinary least squares General linear model Bayesian regression Non-standard predictors Nonlinear regression Nonparametric Semiparametric Isotonic Robust Heteroscedasticity Homoscedasticity Generalized linear model Exponential families Logistic (Bernoulli) / Binomial / Poisson regressions Partition of variance
Categorical / Multivariate / Time-series / Survival analysis Categorical Cohen's kappa Contingency table Graphical model Log-linear model McNemar's test Multivariate Regression Manova Principal components Canonical correlation Discriminant analysis Cluster analysis Classification Structural equation model Factor analysis Multivariate distributions Elliptical distributions Normal Time-series General Decomposition Trend Stationarity Seasonal adjustment Exponential smoothing Cointegration Structural break Granger causality Specific tests Dickey–Fuller Johansen Q-statistic (Ljung–Box) Durbin–Watson Breusch–Godfrey Time domain
partial (PACF)
Frequency domain Spectral density estimation Fourier analysis Wavelet Whittle likelihood Survival Survival function
Hazard function Nelson–Aalen estimator Test Log-rank test Applications Biostatistics Bioinformatics Clinical trials / studies Epidemiology Medical statistics Engineering statistics Chemometrics Methods engineering Probabilistic design Process / quality control Reliability System identification Social statistics Actuarial science Census Crime statistics Demography Econometrics National accounts Official statistics Population statistics Psychometrics Spatial statistics Cartography Environmental statistics Geographic information system Geostatistics Kriging Category Portal Commons |