In computer programming, data-driven programming is a programming paradigm in which the program statements describe the data to be matched and the processing required rather than defining a sequence of steps to be taken. Standard examples of data-driven languages are the text-processing languages sed and AWK, and the document transformation language XSLT. In sed and AWK the data is a sequence of lines in an input stream (these are thus also known as line-oriented languages), and pattern matching is primarily done via regular expressions or line numbers. XSLT templates instead match patterns against the nodes of an XML document tree.
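The pattern/action structure shared by sed and AWK can be imitated in a few lines of Python. This is a minimal sketch, not how either tool is implemented; the names `Rule` and `run_rules` and the two sample rules are hypothetical. Each rule states which lines it applies to (a regular expression) and what to do with them, and a single loop supplies the control flow, much as AWK applies every matching `pattern { action }` pair to each input line.

```python
import re
from typing import Callable, Iterable, List, Tuple

# A rule pairs a condition (a regular expression describing the data to match)
# with an action (the processing to apply to a matching line).
Rule = Tuple[re.Pattern, Callable[[str], str]]


def run_rules(rules: List[Rule], lines: Iterable[str]) -> List[str]:
    """Apply every matching rule to each input line, in rule order."""
    output: List[str] = []
    for line in lines:                        # main loop over the input stream
        for pattern, action in rules:
            if pattern.search(line):          # condition: does this line match?
                output.append(action(line))   # action: process the matching line
    return output


rules: List[Rule] = [
    (re.compile(r"^#"), lambda line: "comment: " + line[1:].strip()),
    (re.compile(r"\d+"), lambda line: "number in: " + line),
]

print(run_rules(rules, ["# header", "x = 42", "plain text"]))
# ['comment: header', 'number in: x = 42']
```

Lines that match no rule produce no output here, as in an AWK program whose patterns all fail; sed, by contrast, prints every line by default.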
Related paradigms
Data-driven programming is similar to event-driven programming, in that both are structured as pattern matching and resulting processing, and are usually implemented by a main loop, though they are typically applied to different domains. The condition/action model is also similar to aspect-oriented programming, where advice (the action) is executed when a join point matched by a pointcut (the condition) is reached. A similar paradigm is used in some tracing frameworks such as DTrace, where one lists probes (instrumentation points) and associated actions, which execute when the probe fires and its optional predicate (the condition) is satisfied.
Adapting abstract data type design methods to object-oriented programming results in a data-driven design.
This type of design is sometimes used in object-oriented programming to define classes during the conception of a piece of software.
Applications
Data-driven programming is typically applied to streams of structured data, for filtering, transforming, aggregating (such as computing statistics), or calling other programs. Typical streams include log files, delimiter-separated values, and email messages, notably for email filtering. For example, an AWK program may take as input a stream of log statements, send every line to the console, write the lines starting with WARNING to a "WARNING" file, and send an email to a sysadmin whenever a line starts with "ERROR". It could also record how many warnings are logged per day. Alternatively, one can process streams of delimiter-separated values, either line by line or by aggregating across lines, for example computing a sum or maximum. In email, a language like procmail can specify conditions to match on some emails, and what actions to take (deliver, bounce, discard, forward, etc.).
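The log-routing program described above can be sketched in Python, assuming a hypothetical line format `<LEVEL> <YYYY-MM-DD> <message>` so that "starts with WARNING" is a literal prefix test. The console and the "WARNING" file are modelled as in-memory lists, and `send_alert` is a hypothetical stand-in for emailing the sysadmin; in AWK the same three rules would be written as `pattern { action }` pairs.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List


def route_logs(lines: Iterable[str],
               send_alert: Callable[[str], None]) -> Dict[str, object]:
    """Route log lines by pattern and count warnings per day.

    Assumes (hypothetically) lines of the form "<LEVEL> <YYYY-MM-DD> <message>".
    """
    console: List[str] = []
    warning_file: List[str] = []
    warnings_per_day: Counter = Counter()

    for line in lines:
        console.append(line)                      # rule 1: every line goes to the console
        if line.startswith("WARNING"):            # rule 2: WARNING lines go to the file
            warning_file.append(line)
            fields = line.split()
            if len(fields) > 1:
                warnings_per_day[fields[1]] += 1  # state carried across lines
        if line.startswith("ERROR"):              # rule 3: ERROR lines trigger an alert
            send_alert(line)

    return {"console": console,
            "warning_file": warning_file,
            "warnings_per_day": dict(warnings_per_day)}


alerts: List[str] = []
result = route_logs([
    "INFO 2024-05-01 service started",
    "WARNING 2024-05-01 disk 85% full",
    "WARNING 2024-05-01 disk 90% full",
    "ERROR 2024-05-02 disk full",
    "WARNING 2024-05-02 retrying write",
], alerts.append)

print(result["warnings_per_day"])  # {'2024-05-01': 2, '2024-05-02': 1}
print(alerts)                      # ['ERROR 2024-05-02 disk full']
```

The per-day counter is the only part that needs memory between lines; the three routing rules each look at a single line in isolation.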
Some data-driven languages are Turing-complete, such as AWK and even sed, while others are intentionally very limited, notably for filtering. An extreme example of the latter is pcap, whose filter language only expresses filtering, the only action being "capture". Less extremely, Sieve has filters and actions, but its base standard has no variables or loops, only allowing stateless filtering statements: each input element is processed independently. Variables allow state, which in turn allows operations that depend on more than one input element, such as aggregation (summing inputs) or throttling (allowing at most 5 mails per hour from each sender, or limiting repeated log messages).
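Throttling shows why state matters: a stateless filter decides each message on its own, whereas "at most 5 mails per hour from each sender" requires remembering earlier messages. The sketch below is a hypothetical sliding-window implementation (the class name, parameters, and sender addresses are illustrative) and assumes timestamps, in seconds, arrive in non-decreasing order.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class SenderThrottle:
    """Stateful filter: allow at most `limit` messages per `window` seconds per sender."""

    def __init__(self, limit: int = 5, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        # Per-sender timestamps of recently accepted messages.
        self.recent: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, sender: str, timestamp: float) -> bool:
        times = self.recent[sender]
        # Forget accepted messages that have fallen out of the window.
        while times and timestamp - times[0] >= self.window:
            times.popleft()
        if len(times) < self.limit:
            times.append(timestamp)
            return True
        return False  # over the limit: rejected messages are not recorded


throttle = SenderThrottle()
print([throttle.allow("alice@example.com", t) for t in range(0, 600, 100)])
# [True, True, True, True, True, False]
print(throttle.allow("bob@example.com", 600))     # True: each sender has its own state
print(throttle.allow("alice@example.com", 3600))  # True: the message at t=0 has expired
```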
Data-driven languages frequently have a default action: if no condition matches, line-oriented languages may print the line (as in sed), or deliver a message (as in Sieve). In some applications, such as filtering, matching may be done exclusively (so only the first matching statement applies), while in other cases all matching statements are applied. In either case, failure to match any pattern may be treated as "default behavior" or as an error, to be caught by a catch-all statement at the end.
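The difference between exclusive (first-match) and inclusive (all-match) rule application, and the role of a default or catch-all, can be sketched as two small dispatch functions. The rules, the `echo` default, and the function names are hypothetical; the default here simply passes the line through unchanged, loosely analogous to sed printing lines it did not modify.

```python
import re
from typing import Callable, List, Tuple

# (regular expression, action) pairs; the rules themselves are hypothetical.
Rule = Tuple[str, Callable[[str], str]]


def apply_first(rules: List[Rule], line: str, default: Callable[[str], str]) -> str:
    """Exclusive matching: only the first matching rule fires."""
    for pattern, action in rules:
        if re.search(pattern, line):
            return action(line)
    return default(line)  # nothing matched: fall back to the default action


def apply_all(rules: List[Rule], line: str, default: Callable[[str], str]) -> List[str]:
    """Inclusive matching: every matching rule fires, in order."""
    results = [action(line) for pattern, action in rules if re.search(pattern, line)]
    return results if results else [default(line)]


def echo(line: str) -> str:
    """Default action: pass the line through unchanged."""
    return line


rules: List[Rule] = [
    (r"^ERROR", lambda line: "page on-call"),
    (r"disk", lambda line: "open disk ticket"),
]

print(apply_first(rules, "ERROR disk full", echo))  # page on-call
print(apply_all(rules, "ERROR disk full", echo))    # ['page on-call', 'open disk ticket']
print(apply_first(rules, "INFO all good", echo))    # INFO all good

# Treating "no match" as an error instead: an empty pattern matches every line,
# so with exclusive matching a catch-all placed last fires only when nothing earlier did.
strict = rules + [(r"", lambda line: "unmatched: " + line)]
print(apply_first(strict, "INFO all good", echo))   # unmatched: INFO all good
```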
Benefits and issues
While the benefits and issues vary between implementations, there are a few major potential benefits of, and problems with, this paradigm. Functionality only requires knowing the abstract data type of the variables it works with. Functions and interfaces can be used on all objects with the same data fields, for instance an object's "position". Data can be grouped into objects or "entities" according to preference with little to no consequence.
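In Python, functions that depend only on a shared data field can be typed structurally with `typing.Protocol` (Python 3.8+): a static type checker accepts any object that has a `position` attribute, and at runtime this is ordinary duck typing. The classes `Player` and `Crate`, the protocol `HasPosition`, and the functions `distance` and `move` are hypothetical illustrations.

```python
from dataclasses import dataclass
from typing import Protocol, Tuple

Vec = Tuple[float, float]


class HasPosition(Protocol):
    """Anything with a `position` field; no common base class is required."""
    position: Vec


@dataclass
class Player:
    name: str
    position: Vec


@dataclass
class Crate:
    weight: float
    position: Vec


def distance(a: HasPosition, b: HasPosition) -> float:
    """Euclidean distance; depends only on the `position` data."""
    (ax, ay), (bx, by) = a.position, b.position
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5


def move(entity: HasPosition, dx: float, dy: float) -> None:
    """Shift any positioned entity in place."""
    x, y = entity.position
    entity.position = (x + dx, y + dy)


player = Player("ada", (0.0, 0.0))
crate = Crate(10.0, (3.0, 4.0))
print(distance(player, crate))  # 5.0
move(crate, -3.0, -4.0)
print(crate.position)           # (0.0, 0.0)
```

`Player` and `Crate` share no base class; they are interchangeable for `distance` and `move` only because they carry the same field.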
While data-driven design does prevent coupling of data and functionality, in some cases data-driven programming has been argued to lead to bad object-oriented design, especially when dealing with more abstract data. This is because a purely data-driven object or entity is defined by the way it is represented. Any attempt to change the structure of the object would immediately break the functions that rely on it.
As an example, one might represent driving directions as a series of intersections (two intersecting streets) where the driver must turn right or left. If an intersection (in the United States) is represented in data by the ZIP code (5-digit number) and two street names (strings of text), bugs may appear when the program encounters a place where the same two streets intersect more than once within one ZIP code, because those distinct intersections then share a single representation. While this example may be oversimplified, restructuring of data is a fairly common problem in software engineering, either to eliminate bugs, increase efficiency, or support new features.
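The representation problem in the driving-directions example can be reproduced with a dictionary keyed by (ZIP code, street, street). This is a hypothetical sketch: the placeholder ZIP code and street names are invented, and a turn table is only one way such directions might be stored.

```python
from typing import Dict, NamedTuple


class Intersection(NamedTuple):
    zip_code: str   # 5-digit US ZIP code, kept as text to preserve leading zeros
    street_a: str
    street_b: str


# Hypothetical turn table: which way to turn at each intersection.
turns: Dict[Intersection, str] = {}

# The same two streets cross twice within one (placeholder) ZIP code.
first_crossing = Intersection("00000", "Main St", "River Rd")
second_crossing = Intersection("00000", "Main St", "River Rd")

turns[first_crossing] = "left"
turns[second_crossing] = "right"   # silently overwrites the first entry

print(len(turns))             # 1, not 2
print(turns[first_crossing])  # right: the first instruction is lost

# Listing the streets in the other order gives a different key for the same place.
print(Intersection("00000", "River Rd", "Main St") in turns)  # False
```

Fixing this means adding a field (for example coordinates or a sequence number), which changes the structure every function building or reading these keys depends on.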
Languages
* AWK
* BASIC
* Clojure
* fdm
* Lua
* maildrop
* Oz
* Perl: data-driven programming as in AWK and sed is one paradigm supported by Perl
* procmail
* Raku: Raku has grammars (and regexes) built in, and so supports data-driven programming
* REBOL, a Redbol language
* Red, a Redbol language
* Ren-C, a Redbol language
* sed
* Sieve
* Tab (language)
* XSLT
See also
* Array programming
* Backus–Naur form
External links
"The important part is moving program logic away from hardwired control structures and into data."