In computer programming, dataflow programming is a programming paradigm that models a program as a directed graph of the data flowing between operations, thus implementing dataflow principles and architecture. Dataflow programming languages share some features of functional languages, and were generally developed in order to bring some functional concepts to a language more suitable for numeric processing. Some authors use the term ''datastream'' instead of ''dataflow'' to avoid confusion with dataflow computing or dataflow architecture, which are based on a non-deterministic machine paradigm. Dataflow programming was pioneered by Jack Dennis and his graduate students at MIT in the 1960s.


Considerations

Traditionally, a program is modelled as a series of operations happening in a specific order; this may be referred to as sequential, procedural, control flow (indicating that the program chooses a specific path), or imperative programming. The program focuses on commands, in line with the von Neumann vision of sequential programming, where data is normally "at rest". In contrast, dataflow programming emphasizes the movement of data and models programs as a series of connections. Explicitly defined inputs and outputs connect operations, which function like black boxes. An operation runs as soon as all of its inputs become valid. Thus, dataflow languages are inherently parallel and can work well in large, decentralized systems.
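
The firing rule can be illustrated with a minimal Python sketch. It assumes a small hypothetical graph (the operation names `a`, `b`, `total`, `product` and `result` are invented for illustration) and uses the standard-library `graphlib.TopologicalSorter` to report which operations have all of their inputs valid at each step. Operations that become ready together could, in principle, run in parallel; here they are run one after another for simplicity.

```python
from graphlib import TopologicalSorter

# Each operation is a black box: the names of its inputs plus a function.
# All names below are hypothetical and exist only for this sketch.
operations = {
    "a": ((), lambda: 2),
    "b": ((), lambda: 3),
    "total": (("a", "b"), lambda a, b: a + b),
    "product": (("a", "b"), lambda a, b: a * b),
    "result": (("total", "product"), lambda t, p: t - p),
}


def run(operations):
    """Fire each operation once its inputs are valid; return values and firing waves."""
    sorter = TopologicalSorter({name: deps for name, (deps, _) in operations.items()})
    sorter.prepare()
    values, waves = {}, []
    while sorter.is_active():
        # Everything returned here has all inputs valid and is independent
        # of the other operations in the same wave.
        ready = sorted(sorter.get_ready())
        for name in ready:
            deps, fn = operations[name]
            values[name] = fn(*(values[d] for d in deps))
            sorter.done(name)
        waves.append(ready)
    return values, waves


values, waves = run(operations)
print(waves)
print(values["result"])
```

Nothing in the sketch states an execution order; the order falls out of the connections between inputs and outputs.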


State

One of the key concepts in computer programming is the idea of state, essentially a snapshot of various conditions in the system. Most programming languages require a considerable amount of state information, which is generally hidden from the programmer. Often, the computer itself has no idea which piece of information encodes the enduring state. This is a serious problem, as the state information needs to be shared across multiple processors in parallel processing machines. Most languages force the programmer to add extra code to indicate which data and parts of the code are important to the state. This code tends to be both expensive in terms of performance and difficult to read or debug. Explicit parallelism is one of the main reasons for the poor performance of Enterprise Java Beans when building data-intensive, non-OLTP applications.

Where a sequential program can be imagined as a single worker moving between tasks (operations), a dataflow program is more like a series of workers on an assembly line, each doing a specific task whenever materials are available. Since the operations are only concerned with the availability of data inputs, they have no hidden state to track, and are all "ready" at the same time.
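
The assembly-line picture can be approximated in Python with one thread per worker and a queue between neighbouring workers. This is only a sketch under assumptions: the names `worker`, `run_line` and `DONE` are hypothetical, and a real dataflow runtime would schedule operations itself rather than leave that to threads and queues.

```python
import queue
import threading

DONE = object()  # sentinel that marks the end of the stream


def worker(task, inbox, outbox):
    """One station on the line: act whenever an item arrives, keep no other state."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)  # pass the end-of-stream marker downstream
            return
        outbox.put(task(item))


def run_line(tasks, items):
    """Connect one worker per task with queues and feed the items through."""
    queues = [queue.Queue() for _ in range(len(tasks) + 1)]
    threads = [
        threading.Thread(target=worker, args=(task, queues[i], queues[i + 1]))
        for i, task in enumerate(tasks)
    ]
    for t in threads:
        t.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)
    results = []
    while (out := queues[-1].get()) is not DONE:
        results.append(out)
    for t in threads:
        t.join()
    return results


results = run_line([lambda x: x + 1, lambda x: x * 10], [1, 2, 3])
print(results)
```

Each worker only waits for its next input, so the second station can already process the first item while the first station is working on the second one.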


Representation

Dataflow programs are represented in different ways. A traditional program is usually represented as a series of text instructions, which is reasonable for describing a serial system which pipes data between small, single-purpose tools that receive, process, and return it. Dataflow programs start with an input, perhaps the command line parameters, and illustrate how that data is used and modified. The flow of data is explicit, often visually illustrated as a line or pipe.

In terms of encoding, a dataflow program might be implemented as a hash table, with uniquely identified inputs as the keys, used to look up pointers to the instructions. When any operation completes, the program scans down the list of operations until it finds the first operation where all inputs are currently valid, and runs it. When that operation finishes, it will typically output data, thereby making another operation become valid.

For parallel operation, only the list needs to be shared; it is the state of the entire program. Thus the task of maintaining state is removed from the programmer and given to the language's runtime. On machines with a single processor core where an implementation designed for parallel operation would simply introduce overhead, this overhead can be removed completely by using a different runtime.
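
The scan-and-run scheme described above can be sketched in a few lines of Python. This is a simplified, single-threaded illustration rather than any particular runtime's design: a `dict` stands in for the hash table of valid values, and the function name `run_dataflow` and the example operations are assumptions made for this sketch.

```python
def run_dataflow(operations, inputs):
    """Repeatedly run the first operation whose inputs are all valid.

    operations: list of (input_names, output_name, function) tuples.
    inputs: dict of initially valid values, keyed by unique name.
    """
    state = dict(inputs)       # the shared table; this is the whole program state
    pending = list(operations)
    order = []                 # output names, in the order the operations fired
    while pending:
        for op in pending:
            needs, out, fn = op
            if all(name in state for name in needs):
                state[out] = fn(*(state[name] for name in needs))
                order.append(out)
                pending.remove(op)
                break          # rescan from the top of the list
        else:
            raise ValueError("no runnable operation: some input never becomes valid")
    return state, order


# The operations are deliberately listed out of order; data availability decides.
ops = [
    (("sum", "count"), "mean", lambda s, c: s / c),
    (("xs",), "sum", sum),
    (("xs",), "count", len),
]
state, order = run_dataflow(ops, {"xs": [2, 4, 9]})
print(order)
print(state["mean"])
```

Because the table holds everything the scheduler needs, a parallel variant would only have to share that one structure between workers.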


Incremental updates

Some recent dataflow libraries such as Differential/Timely Dataflow have used incremental computing, which recomputes only those outputs that depend on changed data, for more efficient data processing.
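
The idea of recomputing only what depends on a change can be shown with a toy Python sketch. It is not how Differential or Timely Dataflow work internally; the `Graph` class and the cell names are hypothetical, and the sketch assumes every formula is defined after the cells it depends on.

```python
class Graph:
    """Cells hold inputs or formulas; set() re-evaluates only affected formulas."""

    def __init__(self):
        self.values = {}
        self.formulas = {}    # name -> (dependency names, function), in definition order
        self.recomputed = []  # log of formula evaluations, for inspection

    def define(self, name, deps, fn):
        """Register a formula and evaluate it once; its dependencies must already exist."""
        self.formulas[name] = (deps, fn)
        self._evaluate(name)

    def _evaluate(self, name):
        deps, fn = self.formulas[name]
        self.values[name] = fn(*(self.values[d] for d in deps))
        self.recomputed.append(name)

    def set(self, name, value):
        """Change an input and propagate the change to dependent formulas only."""
        self.values[name] = value
        dirty = {name}
        # Definition order is a valid evaluation order under the stated assumption.
        for cell, (deps, _) in self.formulas.items():
            if dirty.intersection(deps):
                self._evaluate(cell)
                dirty.add(cell)


g = Graph()
g.set("a", 1)
g.set("b", 2)
g.set("c", 10)
g.define("ab", ("a", "b"), lambda a, b: a + b)
g.define("c2", ("c",), lambda c: c * 2)
g.define("total", ("ab", "c2"), lambda x, y: x + y)

g.recomputed.clear()
g.set("a", 5)          # only "ab" and "total" depend on "a"
print(g.recomputed)
print(g.values["total"])
```

After the change to `a`, the formula `c2` is left untouched because none of its inputs changed.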


History

A pioneer dataflow language was BLODI (BLOck DIagram), developed by John Larry Kelly, Jr., Carol Lochbaum and Victor A. Vyssotsky for specifying sampled data systems. A BLODI specification of functional units (amplifiers, adders, delay lines, etc.) and their interconnections was compiled into a single loop that updated the entire system for one clock tick.

In a 1966 Ph.D. thesis, ''The On-line Graphical Specification of Computer Procedures'', Bert Sutherland created one of the first graphical dataflow programming frameworks in order to make parallel programming easier. Subsequent dataflow languages were often developed at the large supercomputer labs. POGOL, an otherwise conventional data-processing language developed at NSA, compiled large-scale applications composed of multiple file-to-file operations, e.g. merge, select, summarize, or transform, into efficient code that eliminated the creation of or writing to intermediate files to the greatest extent possible. SISAL, a popular dataflow language developed at Lawrence Livermore National Laboratory, looks like most statement-driven languages, but variables should be assigned once. This allows the compiler to easily identify the inputs and outputs. A number of offshoots of SISAL have been developed, including SAC, ''Single Assignment C'', which tries to remain as close to the popular C programming language as possible.

The United States Navy funded development of ACOS and SPGN (signal processing graph notation) starting in the early 1980s. This is in use on a number of platforms in the field today (''Underwater Acoustic Data Processing'', Y.T. Chan).

A more radical concept is Prograph, in which programs are constructed as graphs onscreen, and variables are replaced entirely with lines linking inputs to outputs. Incidentally, Prograph was originally written on the Macintosh, which remained single-processor until the introduction of the DayStar Genesis MP in 1996.

There are many hardware architectures oriented toward the efficient implementation of dataflow programming models. MIT's tagged token dataflow architecture was designed by Greg Papadopoulos.

Data flow has been proposed as an abstraction for specifying the global behavior of distributed system components: in the live distributed objects programming model, distributed data flows are used to store and communicate state, and as such, they play a role analogous to variables, fields, and parameters in Java-like programming languages.


Languages

Dataflow programming languages include:

* Céu (programming language)
* ASCET
* AviSynth scripting language, for video processing
* BMDFM - Binary Modular Dataflow Machine
* CAL
* Cuneiform - A functional workflow language
* CMS Pipelines
* Hume
* Joule
* Keysight VEE
* KNIME - A free and open-source data analytics, reporting and integration platform
* LabVIEW, G
* Linda
* Lucid
* Lustre
* Max/MSP
* Microsoft Visual Programming Language - A component of Microsoft Robotics Studio designed for robotics programming
* Orange - An open-source, visual programming tool for data mining, statistical data analysis, and machine learning
* Oz - Now also distributed since 1.4.0
* Pipeline Pilot
* Prograph
* Pure Data
* Quartz Composer - Designed by Apple; used for graphic animations and effects
* SAC - Single Assignment C
* SIGNAL - A dataflow-oriented synchronous language enabling multi-clock specifications
* Simulink
* SISAL
* SystemVerilog - A hardware description language
* Verilog - A hardware description language absorbed into the SystemVerilog standard in 2009
* VisSim - A block diagram language for simulation of dynamic systems and automatic firmware generation
* VHDL - A hardware description language
* XEE (Starlight) - XML engineering environment
* XProc


Libraries

* Apache Beam: Java/Scala SDK that unifies streaming (and batch) processing with several execution engines supported (Apache Spark, Apache Flink, Google Dataflow etc.)
* Apache Flink: Java/Scala library that allows streaming (and batch) computations to be run atop a distributed Hadoop (or other) cluster
* Apache Spark
* SystemC: Library for C++, mainly aimed at hardware design
* TensorFlow: A machine-learning library based on dataflow programming


See also

* Actor model
* Data-driven programming
* Digital signal processing
* Event-driven programming
* Flow-based programming
* Functional reactive programming
* Glossary of reconfigurable computing
* High-performance reconfigurable computing
* Incremental computing
* Parallel programming model
* Partitioned global address space
* Pipeline (Unix)
* Quantum circuit
* Signal programming
* Stream processing
* Yahoo Pipes


External links


* Book: Dataflow and Reactive Programming Systems
* Basics of Dataflow Programming in F# and C#
* Dataflow Programming - Concept, Languages and Applications
* Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing
* Handling huge loads without adding complexity - The basic concepts of dataflow programming, Dr. Dobb's, Sept. 2011