Machine-generated Data
   HOME

TheInfoList



OR:

Machine-generated data is
information Information is an abstract concept that refers to that which has the power to inform. At the most fundamental level information pertains to the interpretation of that which may be sensed. Any natural process that is not completely random ...
automatically generated by a
computer process In computing, a process is the instance of a computer program that is being executed by one or many threads. There are many different process models, some of which are light weight, but almost all processes (even entire virtual machines) are ro ...
, application, or other mechanism without the active intervention of a human. While the term dates back over fifty years, there is some current indecision as to the scope of the term. Monash Research's Curt Monash defines it as "data that was produced entirely by machines OR data that is more about observing humans than recording their choices." Meanwhile, Daniel Abadi, CS Professor at
Yale Yale University is a private research university in New Haven, Connecticut. Established in 1701 as the Collegiate School, it is the third-oldest institution of higher education in the United States and among the most prestigious in the wor ...
, proposes a narrower definition, "Machine-generated data is data that is generated as a result of a decision of an independent computational agent or a measurement of an event that is not caused by a human action." Regardless of definition differences, both exclude data manually entered by a person.Monash, Three Broad Categories of Data Machine-generated data crosses all
industry sector Industry classification or industry taxonomy is a type of economic taxonomy that classifies companies, organizations and traders into industrial groupings based on similar production processes, similar products, or similar behavior in financial ...
s. Often and increasingly, humans are unaware their actions are generating the data.


Relevance

Machine-generated data has no single form; rather, the type, format, metadata, and frequency respond to some particular business purpose. Machines often create it on a defined time schedule or in response to a state change, action, transaction, or other event. Since the event is historical, the data is not prone to be updated or modified. Partly because of this quality, the
U.S. The United States of America (U.S.A. or USA), commonly known as the United States (U.S. or US) or America, is a country primarily located in North America. It consists of 50 states, a federal district, five major unincorporated territori ...
court system A court is any person or institution, often as a government institution, with the authority to adjudicate legal disputes between parties and carry out the administration of justice in civil, criminal, and administrative matters in accordanc ...
s consider machine-generated data as highly reliable. Machine-generated data is the lifeblood of the Internet of Things (IoT).


Growth

In 2009,
Gartner Gartner, Inc is a technological research and consulting firm based in Stamford, Connecticut that conducts research on technology and shares this research both through private consulting as well as executive programs and conferences. Its client ...
published that data will grow by 650% over the following five years.ScienceLogic Most of the growth in data is the byproduct of machine-generated data. IDC estimated that in 2020, there will be 26 times more connected things than people. Wikibon issued a forecast of $514 billion to be spent on the Industrial Internet in 2020.
Wikibon


Processing

Given the fairly static yet voluminous nature of machine-generated data, data owners rely on highly scalable tools to process and analyze the resulting
dataset A data set (or dataset) is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where every column of a table represents a particular variable, and each row corresponds to a given record of the ...
. Almost all machine-generated data is unstructured but then derived into a common structure. Typically, these derived structures contain many
data point In statistics, a unit of observation is the unit described by the data that one analyzes. A study may treat groups as a unit of observation with a country as the unit of analysis, drawing conclusions on group characteristics from data collected at ...
s/columns. With these data points, the challenge lies mostly with analyzing the data. Given high performance requirements along with large data sizes, traditional
database index A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of additional writes and storage space to maintain the index data structure. Indexes are used to quickly locate data without ...
ing and partitioning limits the size and history of the dataset for processing. Alternative approaches exist with columnar databases as only particular "columns" of the dataset would be accessed during particular analysis.


Examples

* Web server logsMonash, Examples of Machine Generated Data *
Call detail record A call detail record (CDR) is a data record produced by a telephone exchange or other telecommunications equipment that documents the details of a telephone call or other telecommunications transactions (e.g., text message) that passes through that ...
s * Financial instrument trades *Network event logs *
Security information and event management Security information and event management (SIEM) is a field within the field of computer security, where software products and services combine security information management (SIM) and security event management (SEM). They provide real-time ana ...
(SIEM) logs *
Telemetry Telemetry is the in situ collection of measurements or other data at remote points and their automatic transmission to receiving equipment (telecommunication) for monitoring. The word is derived from the Greek roots ''tele'', "remote", an ...
collected by the government


Notes


Reference List


Bibliography

* * * * * * * {{refend Computer data