Data are individual facts, statistics, or items of information, often numeric. In a more technical sense, data are a set of values of qualitative or quantitative variables about one or more persons or objects, while a datum (singular of ''data'') is a single value of a single variable. Although the terms "data" and "information" are often used interchangeably, they have distinct meanings. In some popular publications, data are said to be transformed into information when they are viewed in context or in post-analysis. However, in academic treatments of the subject, data are simply units of information. Data are used in scientific research, business management (e.g., sales data, revenue, profits, stock price), finance, governance (e.g., crime rates, unemployment rates, literacy rates), and in virtually every other form of human organizational activity (e.g., censuses of the number of homeless people by non-profit organizations). Data are measured, collected, reported, and analyzed, and used to create data visualizations such as graphs, tables, or images. Data as a general concept refers to the fact that some existing information or knowledge is ''represented'' or ''coded'' in some form suitable for better usage or processing.

''Raw data'' ("unprocessed data") is a collection of numbers or characters before it has been "cleaned" and corrected by researchers. Raw data needs to be corrected to remove outliers or obvious instrument or data entry errors (e.g., a thermometer reading from an outdoor Arctic location recording a tropical temperature). Data processing commonly occurs in stages, and the "processed data" from one stage may be considered the "raw data" of the next stage. Field data is raw data that is collected in an uncontrolled ''in situ'' environment. Experimental data is data that is generated within the context of a scientific investigation by observation and recording. Data has been described as the new oil of the digital economy.
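
The staged cleaning of raw data described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a standard procedure: the readings, the plausible range of -70 to 20 degrees Celsius for an Arctic station, and the rule of discarding values more than 5 degrees from the median are all hypothetical choices made for the example.

```python
from statistics import median

# Hypothetical raw readings (degrees Celsius) from an outdoor Arctic thermometer.
# 31.5 mimics the "tropical temperature" entry error; -35.0 is a subtler outlier.
raw_readings = [-12.4, -11.9, -13.1, 31.5, -12.8, -35.0, -12.0]


def remove_implausible(readings, low=-70.0, high=20.0):
    """Stage 1: drop values outside an assumed physically plausible range."""
    return [r for r in readings if low <= r <= high]


def remove_outliers(readings, max_dev=5.0):
    """Stage 2: drop values more than max_dev degrees from the median (assumed rule)."""
    m = median(readings)
    return [r for r in readings if abs(r - m) <= max_dev]


# The processed output of stage 1 is the "raw data" fed into stage 2.
stage1 = remove_implausible(raw_readings)
stage2 = remove_outliers(stage1)
print(stage2)  # [-12.4, -11.9, -13.1, -12.8, -12.0]
```

The first stage removes the obvious entry error; the second treats the output of the first as its raw data and removes a subtler outlier. Other outlier tests, such as ones based on the interquartile range, could be substituted for the median-distance rule.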

# Etymology and terminology

The first English use of the word "data" is from the 1640s. The word "data" was first used to mean "transmissible and storable computer information" in 1946. The expression "data processing" was first used in 1954. The Latin word ''data'' is the plural of ''datum'', "(thing) given", the neuter past participle of ''dare'', "to give". In English the word ''data'' may be used as a plural noun in this sense, with some writers (usually those working in the natural sciences, life sciences, and social sciences) using ''datum'' in the singular and ''data'' for the plural, especially in the 20th century and in many cases also the 21st; for example, APA style, as of its 7th edition, still requires "data" to be plural. However, in everyday language and much of the usage of software development and computer science, "data" is most commonly used in the singular as a mass noun (like "sand" or "rain"). The term ''big data'' takes the singular.

# Meaning

Data, information, knowledge, and wisdom are closely related concepts, but each has its own meaning and its own role relative to the others. According to a common view, data are collected and analyzed; data only becomes information suitable for making decisions once it has been analyzed in some fashion. One can say that the extent to which a set of data is informative to someone depends on the extent to which it is unexpected by that person. The amount of information contained in a data stream may be characterized by its Shannon entropy.
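
As a rough illustration of how Shannon entropy characterizes the information in a data stream, the sketch below estimates it from observed symbol frequencies as H = Σ p·log2(1/p), in bits per symbol. The term log2(1/p), the "surprisal", grows as a symbol becomes less expected, matching the point above that unexpected data are more informative. It assumes the symbols are independent and that their observed frequencies stand in for the true probabilities; the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Estimate Shannon entropy in bits per symbol from observed symbol frequencies.

    Assumes symbols are independent; each term p * log2(1/p) weights a symbol's
    surprisal (how unexpected it is) by how often it occurs.
    """
    counts = Counter(symbols)
    n = len(symbols)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


print(shannon_entropy("AAAAAAAA"))  # 0.0: a constant stream is fully predictable
print(shannon_entropy("ABABABAB"))  # 1.0
print(shannon_entropy("ABCDABCD"))  # 2.0
```

On this measure a stream that never varies carries no information, while one whose four symbols are equally frequent carries 2 bits per symbol.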

Knowledge is the understanding based on extensive experience dealing with information on a subject. For example, the height of Mount Everest is generally considered data. The height can be measured precisely with an altimeter and entered into a database. This data may be included in a book along with other data on Mount Everest to describe the mountain in a manner useful for those who wish to decide on the best method to climb it. An understanding, based on experience climbing mountains, that could advise persons on the way to reach Mount Everest's peak may be seen as "knowledge". The practical climbing of Mount Everest's peak based on this knowledge may be seen as "wisdom". In other words, wisdom refers to the practical application of a person's knowledge in those circumstances where good may result. Thus wisdom complements and completes the series "data", "information", and "knowledge" of increasingly abstract concepts.

Data are often assumed to be the least abstract concept, information the next least, and knowledge the most abstract. In this view, data becomes information by interpretation; e.g., the height of Mount Everest is generally considered "data", a book on Mount Everest's geological characteristics may be considered "information", and a climber's guidebook containing practical information on the best way to reach Mount Everest's peak may be considered "knowledge". This view, however, has also been argued to reverse the way in which data emerges from information, and information from knowledge.

"Information" bears a diversity of meanings that ranges from everyday usage to technical use. Generally speaking, the concept of information is closely related to notions of constraint, communication, control, data, form, instruction, knowledge, meaning, mental stimulus, pattern, perception, and representation. Beynon-Davies uses the concept of a sign to differentiate between data and information: data are a series of symbols, while information occurs when the symbols are used to refer to something.

Before the development of computing devices and machines, people had to collect data manually and impose patterns on it. Since then, these devices have also been able to collect data. By the 2010s, computers were widely used to collect, sort, and process data in disciplines ranging from marketing and the analysis of citizens' use of social services to scientific research. Patterns found in data are seen as information that can be used to enhance knowledge. These patterns may be interpreted as "truth" (though "truth" can be a subjective concept) and may be authorized as aesthetic and ethical criteria in some disciplines or cultures. Events that leave behind perceivable physical or virtual remains can be traced back through data. Marks are no longer considered data once the link between the mark and observation is broken.

Mechanical computing devices are classified according to how they represent data. An analog computer represents a datum as a voltage, distance, position, or other physical quantity. A digital computer represents a piece of data as a sequence of symbols drawn from a fixed alphabet. The most common digital computers use a binary alphabet, that is, an alphabet of two characters typically denoted "0" and "1". More familiar representations, such as numbers or letters, are then constructed from the binary alphabet.

Some special forms of data are distinguished. A computer program is a collection of data that can be interpreted as instructions. Most computer languages make a distinction between programs and the other data on which programs operate, but in some, notably Lisp and similar languages, programs are essentially indistinguishable from other data. It is also useful to distinguish metadata, that is, a description of other data. A similar yet earlier term for metadata is "ancillary data". The prototypical example of metadata is the library catalog, which is a description of the contents of books.
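
The way familiar representations such as numbers and letters are built on a binary alphabet can be shown with Python's built-in `format`, `ord`, and `str.encode`. The 8-bit width and the choice of UTF-8 as the text encoding are assumptions for this example, and `to_bits` is a hypothetical helper rather than a standard function.

```python
def to_bits(value: int, width: int = 8) -> str:
    """Write a non-negative integer as a fixed-width string over the alphabet {"0", "1"}."""
    return format(value, f"0{width}b")


# A number, written directly in the binary alphabet.
print(to_bits(42))  # 00101010

# A letter: first mapped to a number (its Unicode code point), then to bits.
print(to_bits(ord("A")))  # 01000001

# A short text: encoded to bytes (UTF-8 assumed), each byte written as 8 bits.
print(" ".join(to_bits(b) for b in "Hi".encode("utf-8")))  # 01001000 01101001
```

Each layer reuses the one below it: text becomes bytes, bytes are integers, and integers are written with only the two symbols "0" and "1".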

# Data documents

Whenever data needs to be registered, it exists in the form of a data document. Kinds of data documents include:

* data repository
* data study
* data set
* software
* data paper
* database
* data handbook
* data journal

Some of these data documents (data repositories, data studies, data sets, and software) are indexed in Data Citation Indexes, while data papers are indexed in traditional bibliographic databases, e.g., Science Citation Index.

## Data collection

Gathering data can be accomplished through a primary source (the researcher is the first person to obtain the data) or a secondary source (the researcher obtains data that have already been collected by others, such as data disseminated in a scientific journal). Data analysis methodologies vary and include data triangulation and data percolation. The latter offers an articulated method of collecting, classifying, and analyzing data using five possible angles of analysis (qualitative and quantitative methods, literature reviews including scholarly articles, interviews with experts, and computer simulation), at least three of which are applied, in order to maximize the research's objectivity and permit as complete an understanding of the phenomena under investigation as possible. The data are thereafter "percolated" using a series of pre-determined steps so as to extract the most relevant information.

# In other fields

Although data are also increasingly used in other fields, it has been suggested that their highly interpretive nature might be at odds with the ethos of data as "given". Peter Checkland introduced the term ''capta'' (from the Latin ''capere'', "to take") to distinguish between an immense number of possible data and the subset of them to which attention is directed. Johanna Drucker has argued that since the humanities affirm knowledge production as "situated, partial, and constitutive", using ''data'' may introduce assumptions that are counterproductive, for example that phenomena are discrete or are observer-independent. The term ''capta'', which emphasizes the act of observation as constitutive, is offered as an alternative to ''data'' for visual representations in the humanities.

# See also

* Biological data
* Computer memory
* Data acquisition
* Data analysis
* Data bank
* Data cable
* Data curation
* Dark data
* Data domain
* Data farming
* Data governance
* Data integrity
* Data maintenance
* Data management
* Data mining
* Data modeling
* Data point
* Data visualization
* Computer data processing
* Data preservation
* Data protection
* Data remanence
* Data science
* Data set
* Data structure
* Data warehouse
* Database
* Datasheet
* Environmental data rescue
* Fieldwork
* Information engineering
* Machine learning
* Open data
* Scientific data archiving
* Statistics
* Secondary data