
Raw data, also known as primary data, are data (e.g., numbers, instrument readings, figures, etc.) collected from a source. In the context of examinations, the raw data might be described as a raw score (the test score before any adjustment). If a scientist sets up a computerized thermometer that records the temperature of a chemical mixture in a test tube every minute, the list of temperature readings for every minute, as printed out on a spreadsheet or viewed on a computer screen, is "raw data". Raw data have not been subjected to processing, to "cleaning" by researchers to remove outliers, obvious instrument reading errors or data entry errors, or to any analysis (e.g., determining central tendency aspects such as the average or median result). Nor have raw data been subject to any other manipulation by a software program or a human researcher, analyst or technician. Raw data is a relative term, because even once raw data have been "cleaned" and processed by one team of researchers, another team may consider these processed data to be "raw data" for another stage of research. Raw data can be input to a computer program or used in manual procedures such as analyzing statistics from a survey. The term "raw data" can also refer to the binary data on electronic storage devices, such as hard disk drives (also referred to as "low-level data").
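The thermometer example can be illustrated with a minimal Python sketch. The readings, the plausible range of 0 to 50 degrees Celsius, and the function name `clean` are all hypothetical, chosen only for illustration; a simple range check stands in here for whatever cleaning procedure a researcher would actually apply.

```python
from statistics import mean, median

# Hypothetical raw readings in degrees Celsius, one per minute.
# -999.0 mimics an instrument error code and 85.0 an implausible spike.
raw_readings = [21.4, 21.6, 21.5, -999.0, 21.9, 22.1, 85.0, 22.0]


def clean(readings, low=0.0, high=50.0):
    """Keep only readings inside an assumed plausible range [low, high]."""
    return [r for r in readings if low <= r <= high]


# "Cleaning" followed by a simple analysis of central tendency.
cleaned = clean(raw_readings)
summary = {
    "n": len(cleaned),
    "mean": mean(cleaned),
    "median": median(cleaned),
}
```

The list `raw_readings` is left unmodified, so the raw data remain available if another team wishes to apply a different cleaning rule.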


Generating data

Data can be generated in two ways. The first, called 'captured data', is produced through purposeful investigation or analysis. The second, called 'exhaust data', is usually gathered by machines or terminals as a secondary function. For example, cash registers, smartphones, and speedometers serve a main function but may collect data as a secondary task. Exhaust data is often too voluminous or of too little use to process, and so becomes 'transient', that is, it is thrown away.


Examples

In computing, raw data may have the following attributes: it may contain human, machine, or instrument errors; it may not be validated; it might be in different regional or colloquial formats; it might be uncoded or unformatted; or some entries might be "suspect" (e.g., outliers), requiring confirmation or citation. For example, a data input sheet might contain dates as raw data in many forms: "31st January 1999", "31/01/1999", "31/1/99", "31 Jan", or "today". Once captured, this raw data may be processed and stored in a normalized format, perhaps a Julian date, to make it easier for computers and humans to interpret during later processing.

Raw data (sometimes colloquially called "sources" data or "eggy" data, the latter a reference to the data being "uncooked", that is, "unprocessed", like a raw egg) are the data input to processing. A distinction is made between ''data'' and ''information'', to the effect that information is the ''end'' product of ''data'' processing. Raw data that have undergone processing are sometimes referred to colloquially as "cooked" data. Although raw data have the potential to become "information", extraction, organization, analysis, and formatting for presentation are required before they can be transformed into usable information. For example, a point-of-sale terminal (POS terminal, a computerized cash register) in a busy supermarket collects huge volumes of raw data each day about customers' purchases. However, this list of grocery items, their prices, and the time and date of purchase does not yield much information until it is processed. Once processed and analyzed by a software program, or even by a researcher using a pen and paper and a calculator, this raw data may indicate the particular items that each customer buys, when they buy them, and at what price; an analyst or manager could also calculate the average total sales per customer or the average expenditure by day of the week and hour. This processed and analyzed data provides information that the manager could then use to help determine, for example, how many cashiers to hire and at what times. Such ''information'' could then become ''data'' for further processing, for example as part of a predictive marketing campaign.

As a result of processing, raw data sometimes ends up being put in a database, which makes the data accessible for further processing and analysis in any number of different ways.
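The date normalization mentioned earlier in this section can be sketched in Python. This is one possible approach under stated assumptions, not a standard procedure: the function names, the list of accepted formats, and the day-first reading of "31/1/99" are all assumptions, and "Julian date" is interpreted here as the Julian Day Number. Entries without a year, such as "31 Jan", are deliberately left unresolved rather than guessed.

```python
import re
from datetime import date, datetime

# Assumed day-first formats matching the examples in the text.
# Month names are parsed with %B, which depends on the current locale
# (English in Python's default "C" locale).
_FORMATS = ("%d %B %Y", "%d/%m/%Y", "%d/%m/%y")


def normalize_date(raw, today=None):
    """Return a datetime.date for a raw entry, or None if it cannot be parsed."""
    text = raw.strip()
    if text.lower() == "today":
        # "today" only has meaning relative to the capture date.
        return today or date.today()
    # Drop ordinal suffixes: "31st" becomes "31".
    text = re.sub(r"(?<=\d)(st|nd|rd|th)\b", "", text, flags=re.IGNORECASE)
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    return None  # ambiguous or incomplete, e.g. "31 Jan" (no year)


def julian_day_number(d):
    """Julian Day Number of a Gregorian date (2000-01-01 maps to 2451545)."""
    return d.toordinal() + 1721425


raw_entries = ["31st January 1999", "31/01/1999", "31/1/99", "31 Jan"]
normalized = [normalize_date(entry) for entry in raw_entries]
```

Returning `None` for entries that cannot be resolved keeps "suspect" values visible for manual confirmation instead of silently inventing a year.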
Tim Berners-Lee (inventor of the World Wide Web) argues that sharing raw data is important for society. Inspired by a post by Rufus Pollock of the Open Knowledge Foundation, his call to action is "Raw Data Now", meaning that everyone should demand that governments and businesses share the data they collect as raw data. He points out that "data drives a huge amount of what happens in our lives… because somebody takes the data and does something with it." To Berners-Lee, it is essentially from this sharing of raw data that advances in science will emerge. Advocates of open data argue that once citizens and civil society organizations have access to data from businesses and governments, citizens and NGOs will be able to do their ''own'' analysis of the data, which can empower people and civil society. For example, a government may claim that its policies are reducing the unemployment rate, but a poverty advocacy group may be able to have its staff econometricians do their own analysis of the raw data, which may lead this group to draw different conclusions about the data set.


See also

* Standard score


Further reading


* Give Us the Data Raw, and Give it to Us Now - the blog post from Rufus Pollock that inspired Tim Berners-Lee
* Tim Berners-Lee Gives the Web a New Definition