Cuneiform is an open-source workflow language for large-scale scientific data analysis. It is a statically typed functional programming language that promotes parallel computing. Its versatile foreign function interface lets users integrate software written in many external programming languages. At the organizational level, Cuneiform provides facilities such as conditional branching and general recursion, which make it Turing-complete. Cuneiform thereby attempts to close the gap between scientific workflow systems such as Taverna, KNIME, or Galaxy and large-scale data analysis programming models such as MapReduce or Pig Latin, while offering the generality of a functional programming language.
Cuneiform is implemented in distributed Erlang. When run in distributed mode, it drives a POSIX-compliant distributed file system such as Gluster or Ceph (or a FUSE integration of some other file system, e.g., HDFS). Alternatively, Cuneiform scripts can be executed on top of HTCondor or Hadoop.
Cuneiform is influenced by the work of Peter Kelly, who proposes functional programming as a model for scientific workflow execution. In this respect, Cuneiform differs from related workflow languages based on dataflow programming, such as the Swift parallel scripting language.
External software integration
External tools and libraries (e.g., R or Python libraries) are integrated via a foreign function interface. In this respect Cuneiform resembles KNIME, which allows the use of external software through snippet nodes, and Taverna, which offers BeanShell services for integrating Java software. By defining a task in a foreign language, it is possible to use the API of an external tool or library directly. Tools can thus be integrated without writing a wrapper or reimplementing the tool.
Currently supported foreign programming languages are:
* Bash
* Elixir
* Erlang
* Java
* JavaScript
* MATLAB
* GNU Octave
* Perl
* Python
* R
* Racket
Support for AWK and gnuplot as foreign languages is planned.
Type system
Cuneiform provides a simple, statically checked type system.
Although Cuneiform provides lists as compound data types, it omits the traditional list accessors (head and tail). This avoids runtime errors that might arise when accessing the empty list. Instead, lists are accessed in an all-or-nothing fashion, only by mapping or folding over them. Cuneiform also omits arithmetic (at the organizational level), which excludes the possibility of division by zero. Because every partially defined operation is omitted, runtime errors can arise only in foreign code.
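The contrast between a partial accessor and total list processing can be sketched in Python. This is an analogy rather than Cuneiform code, and the helper names head, map_all, and fold_all are hypothetical. Mapping and folding are defined for every list, including the empty one, whereas head fails on the empty list.

```python
from functools import reduce
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def head(xs: List[A]) -> A:
    """Partial accessor of the kind Cuneiform omits: undefined on []."""
    return xs[0]  # raises IndexError when xs is empty


def map_all(f: Callable[[A], B], xs: List[A]) -> List[B]:
    """Total: visits every element; the empty list maps to []."""
    return [f(x) for x in xs]


def fold_all(f: Callable[[B, A], B], init: B, xs: List[A]) -> B:
    """Total: combines every element; the empty list folds to init."""
    return reduce(f, xs, init)


print(map_all(str.upper, []))                            # []
print(fold_all(lambda acc, x: acc + x, "", ["a", "b"]))  # ab
try:
    head([])
except IndexError:
    print("head([]) raises IndexError")
```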
Base data types
Cuneiform provides Booleans, strings, and files as base data types. Files are used to exchange data in arbitrary formats between foreign functions.
Records and pattern matching
Cuneiform provides records (structs) as compound data types. The example below defines a variable r holding a record with two fields, a1 and a2. The first field is a string and the second is a Boolean.
let r : <a1 : Str, a2 : Bool> =
  <a1 = "my string", a2 = true>;
Records can be accessed either via projection or via pattern matching. The example below first projects r onto its field a1. It then extracts both fields, a1 and a2, from r by pattern matching.
let a1 : Str = ( r|a1 );
let <a1 = a1 : Str, a2 = a2 : Bool> = r;
Lists and list processing
Cuneiform also provides lists as compound data types. The example below defines a variable xs holding a file list with three elements.
let xs : [File] =
  ['a.txt', 'b.txt', 'c.txt' : File];
Lists can be processed with the for and fold operators. The for operator can be given multiple lists, which it consumes element-wise (similar to for/list in Racket, mapcar in Common Lisp, or zipwith in Erlang).
The example below maps over a single list, producing a file list.
for x <- xs do
  process-one( arg1 = x )
  : File
end;
The example below zips two lists, again producing a file list.
for x <- xs, y <- ys do
  process-two( arg1 = x, arg2 = y )
  : File
end;
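The element-wise behaviour of for can be mimicked in Python by zipping the input lists, as in the sketch below. The function names for_each, process_one, and process_two are hypothetical stand-ins. The sketch rejects lists of unequal length; it does not claim to reproduce how Cuneiform itself handles that case.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def for_each(body: Callable[..., T], *lists: Sequence) -> List[T]:
    """Apply body element-wise to one or more lists (analogue of for).

    Assumption: lists of different lengths are rejected in this sketch.
    """
    if len({len(xs) for xs in lists}) > 1:
        raise ValueError("all lists must have the same length")
    return [body(*items) for items in zip(*lists)]


# Hypothetical stand-ins for the tasks process-one and process-two.
def process_one(arg1: str) -> str:
    return arg1 + ".out"


def process_two(arg1: str, arg2: str) -> str:
    return arg1 + "+" + arg2


xs = ["a.txt", "b.txt", "c.txt"]
ys = ["x.txt", "y.txt", "z.txt"]
print(for_each(process_one, xs))      # ['a.txt.out', 'b.txt.out', 'c.txt.out']
print(for_each(process_two, xs, ys))  # ['a.txt+x.txt', 'b.txt+y.txt', 'c.txt+z.txt']
```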
Finally, lists can be aggregated using the fold operator. The following example sums the elements of a list.
fold acc = 0, x <- xs do
  add( a = acc, b = x )
end;
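A fold threads an accumulator through the list from left to right. The Python sketch below shows this. Because Cuneiform has no arithmetic at the organizational level, add would have to be a foreign task in a real script. Here add is plain Python, and the names fold and add are illustrative only.

```python
from typing import Callable, Iterable, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def fold(body: Callable[[B, A], B], acc: B, xs: Iterable[A]) -> B:
    """Left fold: combine the accumulator with each element in turn."""
    for x in xs:
        acc = body(acc, x)
    return acc


def add(a: int, b: int) -> int:
    # In Cuneiform this would be a foreign task (e.g., in Bash or Python),
    # since the language itself omits arithmetic.
    return a + b


print(fold(add, 0, [1, 2, 3]))  # 6
```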
Parallel execution
Cuneiform is a purely functional language, i.e., it does not support mutable references. Consequently, it can use subterm independence to divide a program into parallelizable portions. The Cuneiform scheduler distributes these portions to worker nodes. In addition, Cuneiform uses a call-by-name evaluation strategy, so values are computed only if they contribute to the result. Finally, foreign function applications are memoized to speed up computations that contain previously derived results.
For example, suppose a function h consumes the results of applying f and g. The applications of f and g can then run in parallel, while h depends on both and can start only when f and g have finished.
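This dependency structure can be mimicked locally in Python with futures. In the sketch, f, g, and h are hypothetical stand-ins for foreign tasks. A ThreadPoolExecutor plays the role of Cuneiform's scheduler, which actually distributes work to worker nodes rather than local threads.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-ins for three foreign tasks f, g, and h.
def f(x: str) -> str:
    return "f(" + x + ")"


def g(x: str) -> str:
    return "g(" + x + ")"


def h(a: str, b: str) -> str:
    return "h(" + a + ", " + b + ")"


def run(x: str) -> str:
    with ThreadPoolExecutor(max_workers=2) as pool:
        # f and g share no data dependency, so both are submitted at once.
        fut_f = pool.submit(f, x)
        fut_g = pool.submit(g, x)
        # h needs both results; .result() blocks until each is available.
        return h(fut_f.result(), fut_g.result())


print(run("x0"))  # h(f(x0), g(x0))
```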
Similarly, mapping a function f over a three-element list creates three independent applications of f, which can run in parallel.
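A local analogue of such a parallel map is Executor.map from Python's concurrent.futures, sketched below. The function f is a hypothetical stand-in for a foreign task, and threads again stand in for Cuneiform's worker nodes. Executor.map returns results in input order even when the applications finish in a different order.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def f(x: str) -> str:  # hypothetical stand-in for a foreign task
    return x.upper()


def parallel_map(xs: List[str]) -> List[str]:
    # Each application f(x) is independent, so all may run concurrently;
    # results are still returned in the order of the input list.
    with ThreadPoolExecutor(max_workers=len(xs) or 1) as pool:
        return list(pool.map(f, xs))


print(parallel_map(["a", "b", "c"]))  # ['A', 'B', 'C']
```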
Likewise, when a record r is constructed from the results of applying f and g, the two applications are independent and can therefore run in parallel.
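Memoizing foreign function applications can be sketched in Python as a cache keyed on the task name and its arguments. This illustrates the general technique only. The names ForeignCache and apply are hypothetical, and the sketch assumes that foreign tasks are deterministic, so a repeated application can safely reuse the stored result.

```python
from typing import Any, Callable, Dict, Tuple


class ForeignCache:
    """Memoize foreign task applications by (task name, sorted arguments)."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, Tuple[Tuple[str, Any], ...]], Any] = {}
        self.executions = 0  # how many times a task body actually ran

    def apply(self, name: str, task: Callable[..., Any], **args: Any) -> Any:
        key = (name, tuple(sorted(args.items())))
        if key not in self._results:
            self.executions += 1
            self._results[key] = task(**args)
        return self._results[key]


def greet(person: str) -> str:  # stand-in for a foreign (e.g., Bash) task
    return "Hello " + person


cache = ForeignCache()
cache.apply("greet", greet, person="world")
cache.apply("greet", greet, person="world")  # served from the cache
print(cache.executions)  # 1
```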
Examples
A hello-world script:
def greet( person : Str ) -> <out : Str>
in Bash *{
  out="Hello $person"
}*
( greet( person = "world" )|out );
This script defines a task greet in Bash, which prepends "Hello " to its string argument person. The function produces a record with a single string field out. Applying greet with the argument person bound to the string "world" produces the record <out = "Hello world">. Projecting this record onto its field out yields the string "Hello world".
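How a foreign function interface might run such a Bash task body and collect its output variable can be sketched in Python with subprocess. This is a hypothetical mechanism, not Cuneiform's actual implementation. The function name run_bash_task is invented, the sketch requires bash on the PATH, and it assumes input and output names are valid shell identifiers. Each output variable is printed NUL-terminated, so values may contain spaces or newlines.

```python
import shlex
import subprocess
from typing import Dict, List


def run_bash_task(body: str, inputs: Dict[str, str], outputs: List[str]) -> Dict[str, str]:
    """Run a Bash task body and return its output variables (hypothetical FFI)."""
    # Bind each input as a shell variable, safely quoted.
    lines = [f"{name}={shlex.quote(value)}" for name, value in inputs.items()]
    lines.append(body)
    # Emit every output variable followed by a NUL byte.
    lines += [f'printf "%s\\0" "${{{name}}}"' for name in outputs]
    result = subprocess.run(
        ["bash", "-c", "\n".join(lines)],
        capture_output=True, text=True, check=True,
    )
    values = result.stdout.split("\0")[: len(outputs)]
    return dict(zip(outputs, values))


record = run_bash_task('out="Hello $person"', {"person": "world"}, ["out"])
print(record["out"])  # Hello world
```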
Command line tools can be integrated by defining a task in Bash:
def samtoolsSort( bam : File ) -> <sorted : File>
in Bash *{
  sorted=sorted.bam
  samtools sort -m 2G $bam -o $sorted
}*
This example defines a task samtoolsSort. It calls the tool SAMtools, consuming an input file in BAM format and producing a sorted output file, also in BAM format.
Release history
In April 2016, Cuneiform's implementation language switched from Java to Erlang. In February 2018, its major distributed execution platform changed from Hadoop to distributed Erlang. Additionally, from 2015 to 2018 HTCondor was maintained as an alternative execution platform.
Cuneiform's surface syntax was revised twice, as reflected in the major version number.
Version 1
In its first draft, published in May 2014, Cuneiform was closely related to Make. It constructed a static data dependency graph, which the interpreter traversed during execution. The major difference from later versions was the lack of conditionals, recursion, and static type checking. Files were distinguished from strings by prefixing single-quoted string values with a tilde (~). The script's query expression was introduced with the target keyword. Bash was the default foreign language. Function application had to be performed using an apply form that took task as its first keyword argument. One year later, this surface syntax was replaced by a streamlined but similar version.
The following example script downloads a reference genome from an FTP server.
declare download-ref-genome;
deftask download-fa( fa : ~path ~id ) *{
  wget $path/$id.fa.gz
  gunzip $id.fa.gz
  mv $id.fa $fa
}*
ref-genome-path = ~'ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes';
ref-genome-id = ~'chr22';
ref-genome = apply(
  task : download-fa
  path : ref-genome-path
  id : ref-genome-id
);
target ref-genome;
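The Make-like evaluation of a static dependency graph used by this first draft can be sketched with Python's graphlib module (Python 3.9 or later). The graph mirrors the variables of the download script above. The rule for ref-genome is a hypothetical stand-in that only derives a file name instead of downloading anything. As in Make, only the nodes the target transitively needs are evaluated, each after its dependencies. This is an illustration, not the original interpreter.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# Each variable maps to the variables it depends on (a static graph).
deps: Dict[str, List[str]] = {
    "ref-genome-path": [],
    "ref-genome-id": [],
    "ref-genome": ["ref-genome-path", "ref-genome-id"],
}

# How to compute each variable from already computed ones.
rules: Dict[str, Callable[[Dict[str, str]], str]] = {
    "ref-genome-path": lambda env: "ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes",
    "ref-genome-id": lambda env: "chr22",
    # Hypothetical stand-in for download-fa: derives a file name, no download.
    "ref-genome": lambda env: env["ref-genome-id"] + ".fa",
}


def needed(target: str) -> Dict[str, List[str]]:
    """Restrict the graph to the target and everything it transitively needs."""
    sub: Dict[str, List[str]] = {}
    stack = [target]
    while stack:
        node = stack.pop()
        if node not in sub:
            sub[node] = deps[node]
            stack.extend(deps[node])
    return sub


def build(target: str) -> str:
    """Evaluate each needed node after all of its dependencies, like Make."""
    env: Dict[str, str] = {}
    for node in TopologicalSorter(needed(target)).static_order():
        env[node] = rules[node](env)
    return env[target]


print(build("ref-genome"))  # chr22.fa
```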
Version 2

The second draft of the Cuneiform surface syntax, first published in March 2015, remained in use for three years, outlasting the transition from Java to Erlang as Cuneiform's implementation language. Evaluation differed from the earlier approach in that the interpreter reduced a query expression instead of traversing a static graph. While this surface syntax was in use, the interpreter was formalized and simplified, which resulted in a first specification of Cuneiform's semantics. The syntax featured conditionals; however, Booleans were encoded as lists, with the empty list serving as Boolean false and any non-empty list as Boolean true. Recursion was added later as a byproduct of formalization. Static type checking, however, was introduced only in Version 3.
The following script decompresses a zipped file and splits it into evenly sized partitions.
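deftask unzip( <out( File )> : zip( File ) ) in bash *{
  unzip -d dir $zip
  out=`ls dir | awk '{print "dir/" $0}'`
}*
deftask split( <out( File )> : file( File ) ) in bash *{
  split -l 1024 $file txt
  out=txt*
}*
sotu = "sotu/stateoftheunion1790-2014.txt.zip";
fileLst = split( file: unzip( zip: sotu ) );
fileLst;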
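The Version 2 encoding of Booleans as lists, described above, can be illustrated with a small Python conditional. The name if_list is hypothetical. The empty list selects the else branch and any non-empty list selects the then branch. Branches are passed as thunks so that only the selected one is evaluated.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def if_list(cond: List[object], then: Callable[[], T], otherwise: Callable[[], T]) -> T:
    """Conditional where [] means false and any non-empty list means true."""
    return then() if len(cond) > 0 else otherwise()


print(if_list([], lambda: "yes", lambda: "no"))     # no
print(if_list(["x"], lambda: "yes", lambda: "no"))  # yes
```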
Version 3
Compared with earlier drafts, the current version of Cuneiform's surface syntax attempts to close the gap to mainstream functional programming languages. It features a simple, statically checked type system and introduces records, in addition to lists, as a second type of compound data structure. Booleans are a separate base data type.
The following script untars a file resulting in a file list.
def untar( tar : File ) -> <fileLst : [File]>
in Bash *{
  tar xf $tar
  fileLst=`tar tf $tar`
}*
let hg38Tar : File =
  'hg38/hg38.tar';
let <fileLst = faLst : [File]> =
  untar( tar = hg38Tar );
faLst;