In computer programming, dataflow programming is a programming paradigm that models a program as a directed graph of the data flowing between operations, thus implementing dataflow principles and architecture. Dataflow programming languages share some features of functional languages, and were generally developed in order to bring some functional concepts to a language more suitable for numeric processing. Some authors use the term "datastream" instead of "dataflow" to avoid confusion with dataflow computing or dataflow architecture, based on an indeterministic machine paradigm. Dataflow programming was pioneered by Jack Dennis and his graduate students at MIT in the 1960s.


Considerations

Traditionally, a program is modelled as a series of operations happening in a specific order; this may be referred to as sequential, procedural, control flow (indicating that the program chooses a specific path), or imperative programming. The program focuses on commands, in line with the von Neumann vision of sequential programming, where data is normally "at rest". In contrast, dataflow programming emphasizes the movement of data and models programs as a series of connections. Explicitly defined inputs and outputs connect operations, which function like black boxes. An operation runs as soon as all of its inputs become valid. Thus, dataflow languages are inherently parallel and can work well in large, decentralized systems.
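The firing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Node` class, its port names, and the `run` helper are hypothetical and do not come from any particular dataflow language. Each node is a black box that runs as soon as every one of its input ports holds a value, regardless of the order in which the values arrive.

```python
class Node:
    """An operation that fires once every named input port holds a value."""

    def __init__(self, name, func, input_ports):
        self.name = name
        self.func = func
        self.input_ports = list(input_ports)
        self.values = {}        # port name -> value received so far
        self.subscribers = []   # (downstream node, port name) pairs
        self.output = None

    def connect(self, downstream, port):
        """Wire this node's output to one input port of another node."""
        self.subscribers.append((downstream, port))

    def receive(self, port, value):
        """Accept a value on a port; fire when all ports are filled."""
        self.values[port] = value
        if all(p in self.values for p in self.input_ports):
            self.fire()

    def fire(self):
        """Run the operation, consume its inputs, and push the result on."""
        args = [self.values[p] for p in self.input_ports]
        self.values.clear()
        self.output = self.func(*args)
        for node, port in self.subscribers:
            node.receive(port, self.output)


# Graph for (a + b) * (a - b)
add = Node("add", lambda x, y: x + y, ["x", "y"])
sub = Node("sub", lambda x, y: x - y, ["x", "y"])
mul = Node("mul", lambda x, y: x * y, ["x", "y"])
add.connect(mul, "x")
sub.connect(mul, "y")


def run(a, b):
    """Feed the inputs in an arbitrary order; the data decides what runs."""
    sub.receive("y", b)
    add.receive("x", a)
    add.receive("y", b)   # add now has both inputs and fires
    sub.receive("x", a)   # sub fires, which in turn lets mul fire
    return mul.output
```

Calling `run(5, 3)` returns 16. Nothing in the graph states that `add` must run before `sub`; the two are independent, which is where the potential for parallel execution comes from.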


State

One of the key concepts in computer programming is the idea of state, essentially a snapshot of various conditions in the system. Most programming languages require a considerable amount of state information, which is generally hidden from the programmer. Often, the computer itself has no idea which piece of information encodes the enduring state. This is a serious problem, as the state information needs to be shared across multiple processors in parallel processing machines. Most languages force the programmer to add extra code to indicate which data and parts of the code are important to the state. This code tends to be both expensive in terms of performance and difficult to read or debug. Explicit parallelism is one of the main reasons for the poor performance of Enterprise Java Beans when building data-intensive, non-OLTP applications.

Where a sequential program can be imagined as a single worker moving between tasks (operations), a dataflow program is more like a series of workers on an assembly line, each doing a specific task whenever materials are available. Since the operations are only concerned with the availability of data inputs, they have no hidden state to track, and are all "ready" at the same time.
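The assembly-line picture can be approximated with ordinary threads and queues from the Python standard library. The sketch below is an assumption-laden illustration, not how any specific dataflow runtime works: `stage`, `run_pipeline`, and the `DONE` sentinel are names invented here. Each worker holds no state of its own and simply blocks until an item is available on its input queue.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the stream


def stage(func, inbox, outbox):
    """Worker: apply func to each item as soon as it arrives."""
    while True:
        item = inbox.get()      # blocks until an input is available
        if item is DONE:
            outbox.put(DONE)    # pass the shutdown signal downstream
            return
        outbox.put(func(item))


def run_pipeline(items, funcs):
    """Chain one worker thread per function and collect the results."""
    queues = [queue.Queue() for _ in range(len(funcs) + 1)]
    workers = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for w in workers:
        w.start()
    for item in items:
        queues[0].put(item)
    queues[0].put(DONE)

    results = []
    while True:
        out = queues[-1].get()
        if out is DONE:
            break
        results.append(out)
    for w in workers:
        w.join()
    return results
```

For example, `run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 10])` returns `[20, 30, 40]`. All workers are started up front and are "ready" at the same time; which one is busy at any moment depends only on where the data currently is.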


Representation

Dataflow programs are represented in different ways. A traditional program is usually represented as a series of text instructions, which is reasonable for describing a serial system that pipes data between small, single-purpose tools that receive, process, and return data. Dataflow programs start with an input, perhaps the command line parameters, and illustrate how that data is used and modified. The flow of data is explicit, often visually illustrated as a line or pipe. In terms of encoding, a dataflow program might be implemented as a hash table, with uniquely identified inputs as the keys, used to look up pointers to the instructions. When any operation completes, the program scans down the list of operations until it finds the first operation where all inputs are currently valid, and runs it. When that operation finishes, it will typically output data, thereby making another operation become valid. For parallel operation, only the list needs to be shared; it is the state of the entire program. Thus the task of maintaining state is removed from the programmer and given to the language's runtime. On machines with a single processor core where an implementation designed for parallel operation would simply introduce overhead, this overhead can be removed completely by using a different runtime.
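The scan-and-run scheme described above can be sketched as a single-threaded runtime. This is a minimal illustration with simplifying assumptions: `run_dataflow` and the tuple layout of an operation are hypothetical, and the hash table here maps each uniquely named input directly to its value rather than to instruction pointers.

```python
def run_dataflow(operations, initial_values):
    """Run operations in data-availability order.

    operations: list of (output_name, input_names, func) tuples.
    initial_values: dict of externally supplied inputs.
    Returns (values, order): the table of all values, which is the
    entire program state, and the order in which operations fired.
    """
    values = dict(initial_values)   # hash table: name -> valid value
    pending = list(operations)
    order = []

    while pending:
        # Scan for the first operation whose inputs are all valid
        for i, (out, inputs, func) in enumerate(pending):
            if all(name in values for name in inputs):
                values[out] = func(*(values[n] for n in inputs))
                order.append(out)
                del pending[i]
                break
        else:
            # Nothing can fire: an input is missing or there is a cycle
            raise RuntimeError("no runnable operation; unresolved: "
                               + ", ".join(op[0] for op in pending))
    return values, order


# The listing order is deliberately "wrong"; the data decides the order.
ops = [
    ("total", ["sum", "diff"], lambda s, d: s * d),
    ("sum", ["a", "b"], lambda a, b: a + b),
    ("diff", ["a", "b"], lambda a, b: a - b),
]
values, order = run_dataflow(ops, {"a": 5, "b": 3})
```

Here `order` ends up as `["sum", "diff", "total"]` and `values["total"]` is 16. A parallel runtime could share the same table and pending list among several workers, while this single-threaded loop avoids any locking overhead.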


Incremental Updates

Some recent dataflow libraries such as Differential/Timely Dataflow have used incremental computing for much more efficient data processing.
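The general idea of incremental computing on a dataflow graph can be shown with a small sketch. It does not use or reproduce the API or algorithms of Differential or Timely Dataflow; `IncrementalGraph` and its methods are hypothetical names, and the approach is plain change propagation: when an input changes, only the derived values that depend on it are recomputed.

```python
class IncrementalGraph:
    """Recompute only the derived values affected by a changed input."""

    def __init__(self):
        self.values = {}   # node name -> current value
        self.rules = {}    # derived node -> (input names, function)
        self.order = []    # derived nodes in definition order
        self.log = []      # names of derived nodes that were evaluated

    def define(self, name, inputs, func):
        """Add a derived node. Assumes its inputs are already defined,
        so definition order is also a valid evaluation order."""
        self.rules[name] = (list(inputs), func)
        self.order.append(name)
        self._evaluate(name)

    def _evaluate(self, name):
        """Recompute one node; report whether its value changed."""
        inputs, func = self.rules[name]
        new = func(*(self.values[i] for i in inputs))
        self.log.append(name)
        changed = name not in self.values or self.values[name] != new
        self.values[name] = new
        return changed

    def set_input(self, name, value):
        """Update an input, then re-run only nodes that depend on a change."""
        if name in self.values and self.values[name] == value:
            return
        self.values[name] = value
        dirty = {name}
        for node in self.order:
            inputs, _ = self.rules[node]
            if dirty.intersection(inputs) and self._evaluate(node):
                dirty.add(node)


g = IncrementalGraph()
g.set_input("a", 2)
g.set_input("b", 3)
g.set_input("c", 10)
g.define("ab", ["a", "b"], lambda a, b: a * b)
g.define("c2", ["c"], lambda c: c * 2)
g.define("total", ["ab", "c2"], lambda x, y: x + y)

g.log.clear()
g.set_input("c", 11)   # only "c2" and "total" depend on "c"
```

After the last call, `g.log` is `["c2", "total"]` and `g.values["total"]` is 28; the node `ab` is never re-evaluated because none of its inputs changed. Propagation also stops early when a recomputed node produces the same value as before.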


History

A pioneer dataflow language was BLODI (BLOck DIagram), developed by John Larry Kelly, Jr., Carol Lochbaum and Victor A. Vyssotsky for specifying sampled data systems. A BLODI specification of functional units (amplifiers, adders, delay lines, etc.) and their interconnections was compiled into a single loop that updated the entire system for one clock tick.

In a 1966 Ph.D. thesis, The On-line Graphical Specification of Computer Procedures, Bert Sutherland created one of the first graphical dataflow programming frameworks in order to make parallel programming easier. Subsequent dataflow languages were often developed at the large supercomputer labs. POGOL, an otherwise conventional data-processing language developed at NSA, compiled large-scale applications composed of multiple file-to-file operations, e.g. merge, select, summarize, or transform, into efficient code that eliminated the creation of or writing to intermediate files to the greatest extent possible.

SISAL, a popular dataflow language developed at Lawrence Livermore National Laboratory, looks like most statement-driven languages, but variables should be assigned once. This allows the compiler to easily identify the inputs and outputs. A number of offshoots of SISAL have been developed, including SAC, Single Assignment C, which tries to remain as close to the popular C programming language as possible.

The United States Navy funded development of ACOS and SPGN (signal processing graph notation) starting in the early 1980s. This is in use on a number of platforms in the field today (Underwater Acoustic Data Processing, Y.T. Chan).

A more radical concept is Prograph, in which programs are constructed as graphs onscreen, and variables are replaced entirely with lines linking inputs to outputs. Incidentally, Prograph was originally written on the Macintosh, which remained single-processor until the introduction of the DayStar Genesis MP in 1996.

There are many hardware architectures oriented toward the efficient implementation of dataflow programming models. MIT's tagged token dataflow architecture was designed by Greg Papadopoulos.

Data flow has been proposed as an abstraction for specifying the global behavior of distributed system components: in the live distributed objects programming model, distributed data flows are used to store and communicate state, and as such, they play the role analogous to variables, fields, and parameters in Java-like programming languages.


Languages

Dataflow programming languages include:

* Céu
* ASCET
* AviSynth scripting language, for video processing
* BMDFM - Binary Modular Dataflow Machine
* CAL
* Cuneiform - A functional workflow language
* CMS Pipelines
* Hume
* Joule
* Keysight VEE
* KNIME - A free and open-source data analytics, reporting and integration platform
* LabVIEW, G
* Linda
* Lucid
* Lustre
* Max/MSP
* Microsoft Visual Programming Language - A component of Microsoft Robotics Studio designed for robotics programming
* Orange - An open-source, visual programming tool for data mining, statistical data analysis, and machine learning
* Oz - Now also distributed since 1.4.0
* Pipeline Pilot
* Prograph
* Pure Data
* Quartz Composer - Designed by Apple; used for graphic animations and effects
* SAC - Single Assignment C
* SIGNAL - A dataflow-oriented synchronous language enabling multi-clock specifications
* Simulink
* SISAL
* SystemVerilog - A hardware description language
* Verilog - A hardware description language absorbed into the SystemVerilog standard in 2009
* VisSim - A block diagram language for simulation of dynamic systems and automatic firmware generation
* VHDL - A hardware description language
* XEE (Starlight) - XML engineering environment
* XProc


Libraries

* Apache Beam: Java/Scala SDK that unifies streaming (and batch) processing with several execution engines supported (Apache Spark, Apache Flink, Google Dataflow etc.)
* Apache Flink: Java/Scala library that allows streaming (and batch) computations to be run atop a distributed Hadoop (or other) cluster
* Apache Spark
* SystemC: Library for C++, mainly aimed at hardware design.
* TensorFlow: A machine-learning library based on dataflow programming.


See also

* Actor model
* Data-driven programming
* Digital signal processing
* Event-driven programming
* Flow-based programming
* Functional reactive programming
* Glossary of reconfigurable computing
* High-performance reconfigurable computing
* Incremental computing
* Parallel programming model
* Partitioned global address space
* Pipeline (Unix)
* Quantum circuit
* Signal programming
* Stream processing
* Yahoo Pipes


External links


* Book: Dataflow and Reactive Programming Systems
* Basics of Dataflow Programming in F# and C#
* Dataflow Programming - Concept, Languages and Applications
* Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing
* Handling huge loads without adding complexity - The basic concepts of dataflow programming, Dr. Dobb's, Sept. 2011