Digital data stream

In connection-oriented communication, a data stream is the transmission of a sequence of digitally encoded coherent signals to convey information. Typically, the transmitted symbols are grouped into a series of packets. Data streaming has become ubiquitous. Anything transmitted over the Internet is transmitted as a data stream. Using a mobile phone to have a conversation transmits the sound as a data stream.


Formal definition

Formally, a data stream is any ordered pair (s, Δ) where:
# s is a sequence of tuples, and
# Δ is a sequence of positive real time intervals.
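
The ordered pair in this definition can be illustrated with a minimal Python sketch. It is not a canonical implementation: the class name `DataStream`, the method `arrival_times`, the restriction to a finite prefix of the stream, and the reading of each interval as the gap preceding the corresponding tuple are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import Any, List, Tuple


@dataclass
class DataStream:
    """A finite prefix of a data stream (s, delta). Hypothetical helper class."""
    s: List[Tuple[Any, ...]]   # the sequence of tuples
    delta: List[float]         # positive real time intervals

    def __post_init__(self) -> None:
        # Assumption of this sketch: one interval per tuple.
        if len(self.s) != len(self.delta):
            raise ValueError("s and delta must have the same length")
        # The definition requires every interval to be positive.
        if any(d <= 0 for d in self.delta):
            raise ValueError("every time interval must be positive")

    def arrival_times(self, start: float = 0.0) -> List[float]:
        """Cumulative arrival time of each tuple, treating delta[i] as the
        gap that precedes tuple i (an interpretation chosen here)."""
        return [start + t for t in accumulate(self.delta)]


stream = DataStream(
    s=[("click", 17), ("view", 42), ("click", 99)],
    delta=[0.5, 1.5, 2.0],
)
print(stream.arrival_times())  # [0.5, 2.0, 4.0]
```

Keeping the intervals separate from the tuples mirrors the definition: the payload and the timing are two independent sequences, and absolute timestamps are derived only when needed.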


Content

A data stream contains different sets of data, depending on the chosen data format.
* Attributes – each attribute of the data stream represents a certain type of data, e.g. segment / data point ID, timestamp, geodata.
* The timestamp attribute identifies when an event occurred.
* Subject ID is an algorithmically encoded ID extracted from a cookie.
* Raw data is information that comes straight from the data provider without having been processed by an algorithm or a human.
* Processed data is data that has been prepared (modified, validated or cleaned) for future use.
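
The difference between raw and processed data can be sketched in Python as follows. The field names (`ts`, `subject_id`, `data_point_id`, `geo`), the `Record` class and the `process` function are hypothetical, and the cleaning rules are only one plausible choice, not a description of any particular provider's pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Record:
    """One processed element of the stream (hypothetical schema)."""
    data_point_id: str              # segment / data point ID
    timestamp: datetime             # when the event occurred
    subject_id: str                 # encoded ID derived from a cookie
    geodata: Optional[str] = None   # e.g. a country code


def process(raw: dict) -> Optional[Record]:
    """Validate and clean one raw record; return None if it is unusable."""
    try:
        ts = datetime.fromtimestamp(float(raw["ts"]), tz=timezone.utc)
        subject = str(raw["subject_id"]).strip()
        point = str(raw["data_point_id"]).strip()
    except (KeyError, TypeError, ValueError, OverflowError, OSError):
        return None
    if not subject or not point:
        return None
    geo = raw.get("geo")
    geo = geo.strip().upper() if isinstance(geo, str) and geo.strip() else None
    return Record(point, ts, subject, geo)


# Raw data as it might arrive from a provider (invented sample values).
raw_records = [
    {"ts": "1700000000", "subject_id": " abc123 ", "data_point_id": "dp_7", "geo": "pl"},
    {"ts": "not-a-time", "subject_id": "x", "data_point_id": "dp_1"},
]
processed = [r for r in map(process, raw_records) if r is not None]
print(len(processed), processed[0].subject_id, processed[0].geodata)
```

The second raw record is dropped because its timestamp cannot be parsed, which is the validation step; the first is kept with whitespace trimmed and the geodata normalized, which is the cleaning step.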


Usage

Data streams are used in various areas:
* Fraud detection and scoring – raw data is used as source data for an anti-fraud algorithm (data analysis techniques for fraud detection). For example, timestamps, the number of cookie occurrences or an analysis of data points are used within the scoring system to detect fraud or to make sure that a message receiver is not a bot (so-called non-human traffic).
* Artificial intelligence – raw data serves as the training set and the test set when building AI and machine learning algorithms.
* Profiling and personalization – raw data is used to customize user profiles and divide them into segments, e.g., by gender or location (based on data points).
* Business intelligence – raw data is a source of information for BI systems, used for enriching user profiles with detailed information about them, e.g., purchase path or geodata. This information is used for business analysis and predictive research.
* Targeting – data processed by data scientists improves online campaigns and is used for reaching the target audience.
* CRM enrichment – raw data is integrated with a customer-relationship management system. CRM integration makes it possible to fill the gaps in users' profiles with demographic data, interests or buying intentions.


Integration

Core integrations with data streams are:
* Data streams are integrated with systems such as a customer data platform (CDP), customer relationship management (CRM) or a data management platform (DMP) to enrich users' profiles with external data. External sources make it possible to expand the knowledge about existing users.
* Data streams are used to enrich business intelligence systems, making analysis more precise and conclusions more accurate.
* In the case of content management system (CMS) integration, a data stream is used to identify users and personalize their visit, even if it is their first one. Based on data analysis, the actual content of the website is adapted to the user.
* Data streams are integrated with a demand-side platform (DSP) within the programmatic advertising ecosystem. Parties (e.g., advertisers) can exchange users' IDs and merge them with existing profiles.
* Data streams are used to choose relevant user segments (e.g., people interested in the automotive industry) and use them in an online campaign. Segments are enriched with more user characteristics from the data stream and then sent to the DSP.


Data sources visible

A data stream reveals what kind of device was used on the user side, as indicated by the user agent:
* mobile – the user browses with a mobile browser or a mobile app version, typically at a narrow screen resolution;
* desktop – the user browses with a desktop browser or a desktop app version.
The following information is shared from the device used:
* URL of the visited page where the event occurred
* User agent
* Geolocation
* Internet Protocol (IP) address
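
As a rough illustration of how a device class might be read from the user agent, the Python sketch below uses a deliberately simple heuristic: it treats any user agent string containing "Mobi" as mobile. This is an assumption made for the example, not a complete detection method, and real user agent strings vary widely. The function name `device_class`, the event field names and the sample values (including the documentation-range IP address) are hypothetical.

```python
def device_class(user_agent: str) -> str:
    """Classify a user agent as 'mobile' or 'desktop'.

    Heuristic assumed for this sketch: mobile browsers commonly include
    the token 'Mobi' in their user agent string.
    """
    return "mobile" if "Mobi" in user_agent else "desktop"


# One event carrying the device-side information listed above (sample values).
event = {
    "url": "https://example.com/products/42",
    "user_agent": (
        "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
        "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 "
        "Mobile/15E148 Safari/604.1"
    ),
    "geolocation": "PL",
    "ip": "203.0.113.7",
}

desktop_ua = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)

print(device_class(event["user_agent"]))  # mobile
print(device_class(desktop_ua))           # desktop
```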


Formats

A data point is a tag that collects information about a certain action performed by a user on a website. Data points exist in two types, the values of which are used to create appropriate audiences:
* 'event' – information about occurrences of a specific event (e.g., a click on a link or the display of an ad)
* 'attribute' – numerical or alphanumerical values.
Segment – a logical statement built on specific data points using AND, OR or NOT operators.
Hybrid data – raw data from both the Data Point and Segment data formats.
URLs – a set of information about a particular URL that has been visited.
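
A segment defined as a logical statement over data points can be sketched in Python with small combinator functions. The helper names (`has`, `AND`, `OR`, `NOT`), the data point IDs and the user profiles are invented for illustration; a user profile is modeled here simply as the set of data point IDs observed for that user.

```python
from typing import Callable, Set

Profile = Set[str]                    # data point IDs observed for one user
Segment = Callable[[Profile], bool]   # a segment is a predicate on a profile


def has(data_point: str) -> Segment:
    """True when the given data point was observed for the user."""
    return lambda profile: data_point in profile


def AND(*parts: Segment) -> Segment:
    return lambda profile: all(p(profile) for p in parts)


def OR(*parts: Segment) -> Segment:
    return lambda profile: any(p(profile) for p in parts)


def NOT(part: Segment) -> Segment:
    return lambda profile: not part(profile)


# Hypothetical segment: people interested in cars who have not bought one yet.
car_shoppers = AND(
    has("visited_car_reviews"),
    OR(has("clicked_dealer_ad"), has("used_loan_calculator")),
    NOT(has("bought_car")),
)

users = {
    "u1": {"visited_car_reviews", "clicked_dealer_ad"},
    "u2": {"visited_car_reviews", "used_loan_calculator", "bought_car"},
    "u3": {"clicked_dealer_ad"},
}
audience = sorted(uid for uid, profile in users.items() if car_shoppers(profile))
print(audience)  # ['u1']
```

Because each operator returns another predicate, segments can be nested to any depth and reused as building blocks for larger segments.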


GDPR

Information gathered from websites is based on user behavior. Data providers deliver both personal and non-personal information. There are two types of user data available in a data stream:
* Personally identifiable information (PII) – information that allows a person to be identified, either directly or in combination with data identification methods. Examples of PII are: insurance ID, email address, phone number, IP address, geolocation, biometric data.
* Non-personally identifiable information (non-PII) – information that cannot be used to identify a person or to track a location. A cookie or a device ID is an example of non-PII.

