In computer programming, dataflow programming is a programming paradigm that models a program as a directed graph of the data flowing between operations, thus implementing dataflow principles and architecture. Dataflow programming languages share some features of functional languages, and were generally developed in order to bring some functional concepts to a language more suitable for numeric processing. Some authors use the term ''datastream'' instead of ''dataflow'' to avoid confusion with dataflow computing or dataflow architecture, based on an indeterministic machine paradigm. Dataflow programming was pioneered by Jack Dennis and his graduate students at MIT in the 1960s.


Considerations

Traditionally, a program is modelled as a series of operations happening in a specific order; this may be referred to as sequential, procedural, control flow (indicating that the program chooses a specific path), or imperative programming. The program focuses on commands, in line with the von Neumann vision of sequential programming, where data is normally "at rest". In contrast, dataflow programming emphasizes the movement of data and models programs as a series of connections. Explicitly defined inputs and outputs connect operations, which function like black boxes. An operation runs as soon as all of its inputs become valid. Thus, dataflow languages are inherently parallel and can work well in large, decentralized systems.
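The firing rule (an operation runs as soon as all of its inputs become valid) can be sketched in a few lines of Python. The Node class and all of its names below are illustrative, not taken from any particular dataflow runtime:

    class Node:
        def __init__(self, operation, num_inputs):
            self.operation = operation
            self.inputs = [None] * num_inputs
            self.received = 0
            self.consumers = []   # (downstream node, input slot) pairs

        def supply(self, slot, value):
            # Deliver one input token; fire once every input is valid.
            self.inputs[slot] = value
            self.received += 1
            if self.received == len(self.inputs):
                result = self.operation(*self.inputs)
                for node, dest in self.consumers:
                    node.supply(dest, result)

    # Wire up the graph for (a + b) * c; tokens may arrive in any order.
    add = Node(lambda a, b: a + b, num_inputs=2)
    mul = Node(lambda x, c: x * c, num_inputs=2)
    out = Node(print, num_inputs=1)
    add.consumers.append((mul, 0))
    mul.consumers.append((out, 0))

    add.supply(0, 2)
    mul.supply(1, 10)
    add.supply(1, 3)   # 'add' now fires, which fires 'mul' and prints 50

Note that no node consults a global program counter; each node reacts purely to the arrival of its own inputs, which is what makes the graph inherently parallelizable.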


State

One of the key concepts in computer programming is the idea of state, essentially a snapshot of various conditions in the system. Most programming languages require a considerable amount of state information, which is generally hidden from the programmer. Often, the computer itself has no idea which piece of information encodes the enduring state. This is a serious problem, as the state information needs to be shared across multiple processors in parallel processing machines. Most languages force the programmer to add extra code to indicate which data and parts of the code are important to the state. This code tends to be both expensive in terms of performance and difficult to read or debug. Explicit parallelism is one of the main reasons for the poor performance of Enterprise Java Beans when building data-intensive, non-OLTP applications. Where a sequential program can be imagined as a single worker moving between tasks (operations), a dataflow program is more like a series of workers on an assembly line, each doing a specific task whenever materials are available. Since the operations are only concerned with the availability of data inputs, they have no hidden state to track, and are all "ready" at the same time.
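The assembly-line analogy can be made concrete with ordinary queues and threads. In the following sketch (all names are illustrative), each worker owns one operation and keeps no state beyond its own queues; the queues carry all the data between stages:

    import queue, threading

    def worker(op, inbox, outbox):
        while True:
            item = inbox.get()
            if item is None:            # sentinel: shut the line down
                outbox.put(None)
                break
            outbox.put(op(item))

    cut, weld, done = queue.Queue(), queue.Queue(), queue.Queue()
    threading.Thread(target=worker, args=(lambda x: x * 2, cut, weld)).start()
    threading.Thread(target=worker, args=(lambda x: x + 1, weld, done)).start()

    for part in [1, 2, 3]:
        cut.put(part)
    cut.put(None)

    while (item := done.get()) is not None:
        print(item)                     # 3, 5, 7

Because neither worker shares variables with the other, nothing needs locking: the only synchronization is the availability of materials in the queues.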


Representation

Dataflow programs are represented in different ways. A traditional program is usually represented as a series of text instructions, which is reasonable for describing a serial system which pipes data between small, single-purpose tools that receive, process, and return. Dataflow programs start with an input, perhaps the command line parameters, and illustrate how that data is used and modified. The flow of data is explicit, often visually illustrated as a line or pipe. In terms of encoding, a dataflow program might be implemented as a hash table, with uniquely identified inputs as the keys, used to look up pointers to the instructions. When any operation completes, the program scans down the list of operations until it finds the first operation where all inputs are currently valid, and runs it. When that operation finishes, it will typically output data, thereby making another operation become valid. For parallel operation, only the list needs to be shared; it is the state of the entire program. Thus the task of maintaining state is removed from the programmer and given to the language's runtime. On machines with a single processor core where an implementation designed for parallel operation would simply introduce overhead, this overhead can be removed completely by using a different runtime.
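The hash-table encoding described above might look like the following sketch (all identifiers are illustrative): values live under uniquely identified keys, and a scheduler repeatedly scans the operation list for the first operation whose inputs are all currently valid:

    values = {"a": 2, "b": 3}                     # uniquely identified inputs
    operations = [
        # (output id, input ids, instruction)
        ("sum",  ("a", "b"),   lambda a, b: a + b),
        ("prod", ("sum", "b"), lambda s, b: s * b),
    ]

    pending = list(operations)
    while pending:
        for i, (out, ins, fn) in enumerate(pending):
            if all(k in values for k in ins):     # all inputs currently valid?
                values[out] = fn(*(values[k] for k in ins))
                del pending[i]
                break                             # rescan from the top
        else:
            break                                 # nothing runnable

    print(values["prod"])                         # (2 + 3) * 3 = 15

Here the 'values' table and the 'pending' list together are the entire program state, which is exactly why sharing only them suffices for parallel operation.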


Incremental updates

Some recent dataflow libraries such as Differential/Timely Dataflow have used incremental computing for much more efficient data processing.
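As a rough illustration of the idea (generic code, not the actual Differential/Timely Dataflow API), an incremental runtime keeps a dependency map and, when one input changes, recomputes only the outputs that depend on it:

    deps = {"total": ("price", "qty"), "label": ("name",)}
    rules = {"total": lambda p, q: p * q, "label": lambda n: n.upper()}
    cells = {"price": 4, "qty": 5, "name": "bolt"}

    def recompute(changed):
        # Only direct dependents are refreshed in this toy version.
        for out, ins in deps.items():
            if changed in ins:
                cells[out] = rules[out](*(cells[k] for k in ins))

    recompute("price")        # fills in total = 20
    cells["qty"] = 6
    recompute("qty")          # recomputes total only, not label
    print(cells["total"])     # 24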


History

A pioneer dataflow language was BLOck DIagram (BLODI), published in 1961 by John Larry Kelly, Jr., Carol Lochbaum and Victor A. Vyssotsky for specifying sampled data systems. A BLODI specification of functional units (amplifiers, adders, delay lines, etc.) and their interconnections was compiled into a single loop that updated the entire system for one clock tick. In a 1966 Ph.D. thesis, ''The On-line Graphical Specification of Computer Procedures'', Bert Sutherland created one of the first graphical dataflow programming frameworks in order to make parallel programming easier.

Subsequent dataflow languages were often developed at the large supercomputer labs. POGOL, an otherwise conventional data-processing language developed at NSA, compiled large-scale applications composed of multiple file-to-file operations, e.g. merge, select, summarize, or transform, into efficient code that eliminated the creation of or writing to intermediate files to the greatest extent possible. SISAL, a popular dataflow language developed at Lawrence Livermore National Laboratory, looks like most statement-driven languages, but its variables are assigned only once; this single-assignment rule allows the compiler to easily identify the inputs and outputs. A number of offshoots of SISAL have been developed, including SAC, ''Single Assignment C'', which tries to remain as close to the popular C programming language as possible. The United States Navy funded development of signal processing graph notation (SPGN) and ACOS starting in the early 1980s. This is in use on a number of platforms in the field today (Y.T. Chan, ''Underwater Acoustic Data Processing'').

A more radical concept is Prograph, in which programs are constructed as graphs onscreen, and variables are replaced entirely with lines linking inputs to outputs. Prograph was originally written on the Macintosh, which remained single-processor until the introduction of the DayStar Genesis MP in 1996.

There are many hardware architectures oriented toward the efficient implementation of dataflow programming models. MIT's tagged token dataflow architecture was designed by Greg Papadopoulos. Data flow has been proposed as an abstraction for specifying the global behavior of distributed system components: in the live distributed objects programming model, distributed data flows are used to store and communicate state, and as such, they play the role analogous to variables, fields, and parameters in Java-like programming languages.


Languages

Dataflow programming languages include:
* Céu (programming language)
* ASCET
* AviSynth scripting language, for video processing
* BMDFM Binary Modular Dataflow Machine
* CAL
* Cuneiform, a functional workflow language
* CMS Pipelines
* Hume
* Joule
* Keysight VEE
* KNIME, a free and open-source data analytics, reporting and integration platform
* LabVIEW, G
* Linda
* Lucid
* Lustre
* Max/MSP
* Microsoft Visual Programming Language - A component of Microsoft Robotics Studio designed for robotics programming
* Nextflow: a workflow language
* Orange - An open-source, visual programming tool for data mining, statistical data analysis, and machine learning
* Oz, now also distributed since 1.4.0
* Pipeline Pilot
* Prograph
* Pure Data
* Quartz Composer - Designed by Apple; used for graphic animations and effects
* SAC, Single Assignment C
* SIGNAL (a dataflow-oriented synchronous language enabling multi-clock specifications)
* Simulink
* SISAL
* SystemVerilog - A hardware description language
* Verilog - A hardware description language absorbed into the SystemVerilog standard in 2009
* VisSim - A block diagram language for simulation of dynamic systems and automatic firmware generation
* VHDL - A hardware description language
* Wapice IOT-TICKET implements an unnamed visual dataflow programming language for IoT data analysis and reporting
* XEE (Starlight) XML engineering environment
* XProc


Libraries

* Apache Beam: Java/Scala SDK that unifies streaming (and batch) processing with several execution engines supported (Apache Spark, Apache Flink, Google Dataflow, etc.)
* Apache Flink: Java/Scala library that allows streaming (and batch) computations to be run atop a distributed Hadoop (or other) cluster
* Apache Spark
* SystemC: Library for C++, mainly aimed at hardware design
* TensorFlow: A machine-learning library based on dataflow programming
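As a brief illustration of the style these libraries expose, here is a minimal pipeline written with the Apache Beam Python SDK (Beam ships a Python SDK alongside the Java/Scala ones); each stage is a node in the dataflow graph, and the chosen runner decides how to schedule and parallelize it:

    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (pipeline
         | beam.Create([1, 2, 3, 4])       # source node emitting elements
         | beam.Map(lambda x: x * x)       # transform node
         | beam.Filter(lambda x: x > 4)    # transform node
         | beam.Map(print))                # sink: prints 9 and 16 (order not guaranteed)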


See also

* Actor model
* Data-driven programming
* Digital signal processing
* Event-driven programming
* Flow-based programming
* Functional reactive programming
* Glossary of reconfigurable computing
* High-performance reconfigurable computing
* Incremental computing
* Parallel programming model
* Partitioned global address space
* Pipeline (Unix)
* Quantum circuit
* Signal programming
* Stream processing
* Yahoo Pipes


References


External links


Book: Dataflow and Reactive Programming Systems

Basics of Dataflow Programming in F# and C#

Dataflow Programming - Concept, Languages and Applications

Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing

Handling huge loads without adding complexity
The basic concepts of dataflow programming, Dr. Dobb's, Sept. 2011