Dataflow programming

In computer programming, dataflow programming is a programming paradigm that models a program as a directed graph of the data flowing between operations, thus implementing dataflow principles and architecture. Dataflow programming languages share some features of functional languages, and were generally developed in order to bring some functional concepts to a language more suitable for numeric processing. Some authors use the term ''datastream'' instead of ''dataflow'' to avoid confusion with dataflow computing or dataflow architecture, based on an indeterministic machine paradigm. Dataflow programming was pioneered by Jack Dennis and his graduate students at MIT in the 1960s.


Considerations

Traditionally, a program is modelled as a series of operations happening in a specific order; this may be referred to as sequential, procedural, control flow (indicating that the program chooses a specific path), or imperative programming. The program focuses on commands, in line with the von Neumann vision of sequential programming, where data is normally "at rest". In contrast, dataflow programming emphasizes the movement of data and models programs as a series of connections. Explicitly defined inputs and outputs connect operations, which function like black boxes. An operation runs as soon as all of its inputs become valid. Thus, dataflow languages are inherently parallel and can work well in large, decentralized systems.
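As a rough illustration of this firing rule (not drawn from any particular dataflow language), the following Python sketch models an operation as a node that runs as soon as every one of its named inputs has received a value; the Node class and the example wiring are purely hypothetical:

```python
# Illustrative sketch of the dataflow firing rule: a node runs as soon as
# every one of its named input ports has received a value.

class Node:
    def __init__(self, name, func, inputs):
        self.name = name          # unique identifier for the operation
        self.func = func          # the "black box" computation
        self.inputs = inputs      # names of the input ports
        self.values = {}          # input values received so far

    def receive(self, port, value):
        self.values[port] = value
        # Firing rule: run only when all inputs are valid.
        if all(p in self.values for p in self.inputs):
            return self.func(**self.values)
        return None


# Example: result = (a + b) * c, expressed as two connected operations.
add = Node("add", lambda a, b: a + b, ["a", "b"])
mul = Node("mul", lambda s, c: s * c, ["s", "c"])

s = add.receive("a", 2)          # not ready yet -> None
s = add.receive("b", 3)          # both inputs valid -> fires, s == 5
mul.receive("c", 10)             # waits for its other input
print(mul.receive("s", s))       # fires -> 50
```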


State

One of the key concepts in computer programming is the idea of state, essentially a snapshot of various conditions in the system. Most programming languages require a considerable amount of state information, which is generally hidden from the programmer. Often, the computer itself has no idea which piece of information encodes the enduring state. This is a serious problem, as the state information needs to be shared across multiple processors in parallel processing machines. Most languages force the programmer to add extra code to indicate which data and parts of the code are important to the state. This code tends to be both expensive in terms of performance and difficult to read or debug. Explicit parallelism is one of the main reasons for the poor performance of Enterprise Java Beans when building data-intensive, non-OLTP applications.

Where a sequential program can be imagined as a single worker moving between tasks (operations), a dataflow program is more like a series of workers on an assembly line, each doing a specific task whenever materials are available. Since the operations are only concerned with the availability of data inputs, they have no hidden state to track, and are all "ready" at the same time.
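The assembly-line analogy can be sketched with ordinary threads and queues. In the illustrative Python fragment below, each worker repeatedly processes an item whenever one is available on its input queue and passes the result downstream, sharing no hidden state with the other workers; the stage functions and queue wiring are invented for the example:

```python
import queue
import threading

def worker(stage, inbox, outbox):
    """Run one assembly-line stage: consume whenever material is available."""
    while True:
        item = inbox.get()
        if item is None:           # sentinel: end of the line
            outbox.put(None)
            break
        outbox.put(stage(item))

# Three independent stages wired together by queues (the "conveyor belts").
q1, q2, q3, done = (queue.Queue() for _ in range(4))
threading.Thread(target=worker, args=(lambda x: x + 1, q1, q2)).start()
threading.Thread(target=worker, args=(lambda x: x * 2, q2, q3)).start()
threading.Thread(target=worker, args=(str, q3, done)).start()

for n in [1, 2, 3]:
    q1.put(n)
q1.put(None)

while (out := done.get()) is not None:
    print(out)                     # "4", "6", "8"
```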


Representation

Dataflow programs are represented in different ways. A traditional program is usually represented as a series of text instructions, which is reasonable for describing a serial system which pipes data between small, single-purpose tools that receive, process, and return. Dataflow programs start with an input, perhaps the command line parameters, and illustrate how that data is used and modified. The flow of data is explicit, often visually illustrated as a line or pipe.

In terms of encoding, a dataflow program might be implemented as a hash table, with uniquely identified inputs as the keys, used to look up pointers to the instructions. When any operation completes, the program scans down the list of operations until it finds the first operation where all inputs are currently valid, and runs it. When that operation finishes, it will typically output data, thereby making another operation become valid. For parallel operation, only the list needs to be shared; it is the state of the entire program. Thus the task of maintaining state is removed from the programmer and given to the language's runtime. On machines with a single processor core, where an implementation designed for parallel operation would simply introduce overhead, this overhead can be removed completely by using a different runtime.
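A minimal sketch of this encoding, written here in plain Python rather than any real dataflow runtime, keeps a table of operations keyed by name and, after every completion, scans for the first operation whose inputs are all valid; all names and operations in it are hypothetical:

```python
# Hypothetical sketch of the table-driven scheduling described above.
# `values` holds uniquely identified data items; `operations` maps each
# operation name to (required inputs, output name, function).

values = {"x": 4, "y": 3}                      # initial inputs, e.g. command-line parameters
operations = {
    "sum":    (["x", "y"], "s", lambda x, y: x + y),
    "square": (["s"],      "q", lambda s: s * s),
    "report": (["q"],      "r", lambda q: f"result = {q}"),
}
pending = list(operations)

while pending:
    # Scan for the first operation whose inputs are all currently valid.
    ready = next((name for name in pending
                  if all(i in values for i in operations[name][0])), None)
    if ready is None:
        break                                   # nothing can fire any more
    inputs, output, func = operations[ready]
    values[output] = func(*(values[i] for i in inputs))  # run it, publishing its output
    pending.remove(ready)

print(values["r"])                              # "result = 49"
```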


Incremental Updates

Some recent dataflow libraries, such as Differential Dataflow and Timely Dataflow, have used incremental computing for much more efficient data processing.
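The underlying idea can be illustrated, in a greatly simplified form, as dependency tracking: when one input changes, only the operations downstream of it are re-evaluated rather than the whole graph. The Python sketch below shows only this dependency-tracking idea and is not how Differential or Timely Dataflow are actually implemented (those systems propagate changes to collections):

```python
# Greatly simplified illustration of incremental recomputation: when an
# input changes, only operations downstream of it are re-evaluated.

graph = {                        # node -> (dependencies, function), in topological order
    "total":   (["price", "qty"], lambda price, qty: price * qty),
    "tax":     (["total"],        lambda total: round(total * 0.2, 2)),
    "invoice": (["total", "tax"], lambda total, tax: total + tax),
}
values = {"price": 10.0, "qty": 3}

def evaluate(node):
    deps, func = graph[node]
    values[node] = func(*(values[d] for d in deps))

def update(name, value):
    """Change one input and re-run only the nodes that depend on it."""
    values[name] = value
    dirty = {name}
    for node, (deps, _) in graph.items():
        if dirty & set(deps):
            evaluate(node)
            dirty.add(node)

for node in graph:
    evaluate(node)                              # initial full evaluation
print(values["invoice"])                        # 36.0

update("qty", 4)                                # recomputes total, tax, invoice only
print(values["invoice"])                        # 48.0
```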


History

A pioneer dataflow language was BLODI (BLOck DIagram), developed by John Larry Kelly, Jr., Carol Lochbaum and Victor A. Vyssotsky for specifying sampled data systems. A BLODI specification of functional units (amplifiers, adders, delay lines, etc.) and their interconnections was compiled into a single loop that updated the entire system for one clock tick.

In a 1966 Ph.D. thesis, ''The On-line Graphical Specification of Computer Procedures'', Bert Sutherland created one of the first graphical dataflow programming frameworks in order to make parallel programming easier. Subsequent dataflow languages were often developed at the large supercomputer labs. POGOL, an otherwise conventional data-processing language developed at NSA, compiled large-scale applications composed of multiple file-to-file operations, e.g. merge, select, summarize, or transform, into efficient code that eliminated the creation of or writing to intermediate files to the greatest extent possible. SISAL, a popular dataflow language developed at Lawrence Livermore National Laboratory, looks like most statement-driven languages, but each variable may be assigned only once. This allows the compiler to easily identify the inputs and outputs. A number of offshoots of SISAL have been developed, including SAC, ''Single Assignment C'', which tries to remain as close to the popular C programming language as possible.

The United States Navy funded development of ACOS and SPGN (signal processing graph notation) starting in the early 1980s. This is in use on a number of platforms in the field today (Y.T. Chan, ''Underwater Acoustic Data Processing'').

A more radical concept is Prograph, in which programs are constructed as graphs onscreen, and variables are replaced entirely with lines linking inputs to outputs. Incidentally, Prograph was originally written on the Macintosh, which remained single-processor until the introduction of the DayStar Genesis MP in 1996.

There are many hardware architectures oriented toward the efficient implementation of dataflow programming models. MIT's tagged token dataflow architecture was designed by Greg Papadopoulos.

Data flow has been proposed as an abstraction for specifying the global behavior of distributed system components: in the live distributed objects programming model, distributed data flows are used to store and communicate state, and as such, they play the role analogous to variables, fields, and parameters in Java-like programming languages.


Languages

Dataflow programming languages include:
* Céu
* ASCET
* AviSynth scripting language, for video processing
* BMDFM (Binary Modular Dataflow Machine)
* CAL
* Cuneiform, a functional workflow language
* CMS Pipelines
* Hume
* Joule
* Keysight VEE
* KNIME, a free and open-source data analytics, reporting and integration platform
* LabVIEW, G
* Linda
* Lucid
* Lustre
* Max/MSP
* Microsoft Visual Programming Language - A component of Microsoft Robotics Studio designed for robotics programming
* Orange - An open-source, visual programming tool for data mining, statistical data analysis, and machine learning
* Oz, now also distributed since version 1.4.0
* Pipeline Pilot
* Prograph
* Pure Data
* Quartz Composer - Designed by Apple; used for graphic animations and effects
* SAC (Single Assignment C)
* SIGNAL (a dataflow-oriented synchronous language enabling multi-clock specifications)
* Simulink
* SISAL
* SystemVerilog - A hardware description language
* Verilog - A hardware description language absorbed into the SystemVerilog standard in 2009
* VisSim - A block diagram language for simulation of dynamic systems and automatic firmware generation
* VHDL - A hardware description language
* XEE (Starlight) XML engineering environment
* XProc


Libraries

* Apache Beam: Java/Scala SDK that unifies streaming (and batch) processing with several supported execution engines (Apache Spark, Apache Flink, Google Dataflow, etc.)
* Apache Flink: Java/Scala library that allows streaming (and batch) computations to be run atop a distributed Hadoop (or other) cluster
* Apache Spark
* SystemC: Library for C++, mainly aimed at hardware design
* TensorFlow: A machine-learning library based on dataflow programming


See also

* Actor model
* Data-driven programming
* Digital signal processing
* Event-driven programming
* Flow-based programming
* Functional reactive programming
* Glossary of reconfigurable computing
* High-performance reconfigurable computing
* Incremental computing
* Parallel programming model
* Partitioned global address space
* Pipeline (Unix)
* Quantum circuit
* Signal programming
* Stream processing
* Yahoo Pipes


References


External links


Book: Dataflow and Reactive Programming Systems

Basics of Dataflow Programming in F# and C#

Dataflow Programming - Concept, Languages and Applications

Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing

Handling huge loads without adding complexity
The basic concepts of dataflow programming, Dr. Dobb's, Sept. 2011