Raw data, also known as primary data, are data (e.g., numbers, instrument readings, figures, etc.) collected from a source. In the context of examinations, the raw data might be described as a raw score (after test scores). If a scientist sets up a computerized thermometer which records the temperature of a chemical mixture in a test tube every minute, the list of temperature readings for every minute, as printed out on a spreadsheet or viewed on a computer screen, is "raw data". Raw data have not been subjected to processing, to "cleaning" by researchers to remove outliers, obvious instrument reading errors or data entry errors, or to any analysis (e.g., determining central tendency measures such as the average or median result). Nor have raw data been subject to any other manipulation by a software program or a human researcher, analyst or technician. Raw data is a relative term, because even once raw data have been "cleaned" and processed by one team of researchers, another team may consider these processed data to be "raw data" for another stage of research. Raw data can be input to a computer program or used in manual procedures such as analyzing statistics from a survey. The term "raw data" can also refer to the binary data on electronic storage devices, such as hard disk drives (also referred to as "low-level data").
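The thermometer example can be made concrete with a small sketch. The readings, the `-999.0` sentinel value, and the plausible range of 0 to 50 degrees are all invented for illustration; a real cleaning rule would depend on the instrument and the experiment. The point is only that the raw list is left untouched while a cleaned copy is analyzed.

```python
from statistics import mean, median

# Hypothetical raw readings (degrees Celsius), one per minute.
# -999.0 stands in for an instrument error code and 85.0 for an outlier.
raw_readings = [21.4, 21.6, 21.5, -999.0, 21.9, 22.1, 85.0, 22.3]


def clean(readings, low=0.0, high=50.0):
    """Return a new list holding only readings inside an assumed plausible range."""
    return [r for r in readings if low <= r <= high]


# The raw list is never modified; cleaning produces a separate, processed copy.
cleaned = clean(raw_readings)

summary = {
    "n_raw": len(raw_readings),
    "n_clean": len(cleaned),
    "mean": mean(cleaned),
    "median": median(cleaned),
}
```

Keeping `raw_readings` intact means another team could later apply a different cleaning rule to the same source, which is the sense in which "raw" is a relative term.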


Generating data

Data are created in two ways. The first, 'captured data', results from purposeful investigation or analysis. The second, 'exhaust data', is gathered, usually by machines or terminals, as a secondary function. For example, cash registers, smartphones, and speedometers serve a main function but may collect data as a secondary task. Exhaust data is usually too large or of too little use to process, so it becomes 'transient' and is thrown away.


Examples

In computing, raw data may have the following attributes: it may contain human, machine, or instrument errors; it may not be validated; it might be in different regional or colloquial formats; it may be uncoded or unformatted; or some entries might be "suspect" (e.g., outliers), requiring confirmation or citation. For example, a data input sheet might contain dates as raw data in many forms: "31st January 1999", "31/01/1999", "31/1/99", "31 Jan", or "today". Once captured, this raw data may be processed and stored in a normalized format, perhaps a Julian date, to make it easier for computers and humans to interpret during later processing.

Raw data (sometimes colloquially called "sources" data or "eggy" data, the latter a reference to the data being "uncooked", that is, "unprocessed", like a raw egg) are the data input to processing. A distinction is made between ''data'' and ''information'', to the effect that information is the ''end'' product of ''data'' processing. Raw data that has undergone processing is sometimes referred to colloquially as "cooked" data. Although raw data has the potential to become "information", extraction, organization, analysis, and formatting for presentation are required before it can be transformed into usable information.

For example, a point-of-sale terminal (POS terminal, a computerized cash register) in a busy supermarket collects huge volumes of raw data each day about customers' purchases. However, this list of grocery items, their prices, and the time and date of purchase does not yield much information until it is processed. Once processed and analyzed by a software program, or even by a researcher using pen, paper and a calculator, this raw data may indicate the particular items that each customer buys, when they buy them, and at what price. An analyst or manager could also calculate the average total sales per customer or the average expenditure per day of the week by hour. This processed and analyzed data provides information that the manager could then use to help determine, for example, how many cashiers to hire and at what times. Such ''information'' could then become ''data'' for further processing, for example as part of a predictive marketing campaign. As a result of processing, raw data sometimes ends up in a database, which makes the data accessible for further processing and analysis in any number of different ways.
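The date example above can be sketched as a small normalization routine. This is one possible approach, not a standard one: it assumes day-first ordering, English month names, and a caller-supplied `reference` date that stands in for "today" and for any missing year. The function names are made up for this illustration. The normalized form used here is the Julian Day Number, obtained by adding a fixed offset to Python's `date.toordinal()`.

```python
import re
from datetime import date, datetime

# Offset between Python's proleptic Gregorian ordinal and the Julian Day Number.
JDN_OFFSET = 1721425


def parse_raw_date(text: str, reference: date) -> date:
    """Parse one raw entry; `reference` supplies 'today' and any missing year."""
    cleaned = text.strip().lower()
    if cleaned == "today":
        return reference
    # Drop ordinal suffixes: "31st" becomes "31".
    cleaned = re.sub(r"(\d+)(st|nd|rd|th)\b", r"\1", cleaned)
    candidates = [
        (cleaned, "%d %B %Y"),                        # 31 January 1999
        (cleaned, "%d/%m/%Y"),                        # 31/01/1999
        (cleaned, "%d/%m/%y"),                        # 31/1/99
        (f"{cleaned} {reference.year}", "%d %b %Y"),  # 31 Jan (year assumed)
    ]
    for value, fmt in candidates:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized raw date: {text!r}")


def to_julian_day_number(d: date) -> int:
    """Convert a calendar date to its Julian Day Number."""
    return d.toordinal() + JDN_OFFSET


raw_entries = ["31st January 1999", "31/01/1999", "31/1/99", "31 Jan", "today"]
reference = date(1999, 1, 31)  # assumed date on which the sheet was filled in
normalized = [to_julian_day_number(parse_raw_date(e, reference)) for e in raw_entries]
```

With that reference date, all five raw forms collapse to the same integer, which is what makes the normalized value easier to compare and sort than the original strings. Entries that match none of the assumed formats raise an error rather than being guessed at.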
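The point-of-sale example can be sketched in the same spirit. The records below are invented, the field layout (customer, timestamp, item, price in cents) is an assumption, and the two functions are only one way to turn such rows into the summaries the text mentions. For simplicity the second function reports totals per weekday and hour rather than averages.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw POS records: (customer_id, ISO timestamp, item, price_in_cents).
raw_transactions = [
    ("c1", "2024-03-04T09:15:00", "milk", 120),
    ("c1", "2024-03-04T09:15:00", "bread", 230),
    ("c2", "2024-03-04T09:40:00", "eggs", 300),
    ("c3", "2024-03-05T17:05:00", "coffee", 650),
    ("c2", "2024-03-05T17:30:00", "tea", 250),
]


def average_sales_per_customer(rows):
    """Average total spend per distinct customer, in cents."""
    totals = defaultdict(int)
    for customer, _stamp, _item, price in rows:
        totals[customer] += price
    return sum(totals.values()) / len(totals)


def sales_by_weekday_hour(rows):
    """Total spend keyed by (weekday, hour); weekday 0 is Monday."""
    buckets = defaultdict(int)
    for _customer, stamp, _item, price in rows:
        when = datetime.fromisoformat(stamp)
        buckets[(when.weekday(), when.hour)] += price
    return dict(buckets)


per_customer = average_sales_per_customer(raw_transactions)
by_slot = sales_by_weekday_hour(raw_transactions)
```

The raw rows say little on their own; the two derived values are the kind of "information" a manager might use when deciding how many cashiers to schedule, and they could in turn serve as input data for a later analysis.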
Tim Berners-Lee (inventor of the World Wide Web) argues that sharing raw data is important for society. Inspired by a post by Rufus Pollock of the Open Knowledge Foundation, his call to action is "Raw Data Now", meaning that everyone should demand that governments and businesses share the data they collect as raw data. He points out that "data drives a huge amount of what happens in our lives… because somebody takes the data and does something with it." To Berners-Lee, it is essentially from this sharing of raw data that advances in science will emerge.

Advocates of open data argue that once citizens and civil society organizations have access to data from businesses and governments, they will be able to do their ''own'' analysis of the data, which can empower people and civil society. For example, a government may claim that its policies are reducing the unemployment rate, but a poverty advocacy group may be able to have its staff econometricians do their own analysis of the raw data, which may lead the group to draw different conclusions from the data set.


See also

* Standard score


Further reading


* Give Us the Data Raw, and Give it to Us Now - the blog post from Rufus Pollock that inspired Tim Berners-Lee
* Tim Berners-Lee Gives the Web a New Definition