Data Format Description Language

Data Format Description Language (DFDL, often pronounced ''daff-o-dil''), published as an Open Grid Forum Recommendation in February 2021, is a modeling language for describing general text and binary data in a standard way. A DFDL model or schema allows any text or binary data to be read (or "parsed") from its native format and presented as an instance of an ''information set''. An information set is a logical representation of the data contents, independent of the physical format. For example, two records could be in different formats, because one has fixed-length fields and the other uses delimiters, yet contain exactly the same data; both would be represented by the same information set. The same DFDL schema also allows data to be taken from an instance of an information set and written out (or "serialized") to its native format.

DFDL is ''descriptive'' and not ''prescriptive''. DFDL is not a data format, nor does it impose the use of any particular data format. Instead, it provides a standard way of describing many different kinds of data formats. This approach has several advantages. It allows an application author to design an appropriate data representation according to their requirements while describing it in a standard way that can be shared, enabling multiple programs to directly interchange the data.

DFDL achieves this by building upon the facilities of W3C XML Schema 1.0. A subset of XML Schema is used, enough to enable the modeling of non-XML data. The motivations for this approach are to avoid inventing a completely new schema language, and to make it easy to convert general text and binary data, via a DFDL information set, into a corresponding XML document.

Educational material is available in the form of DFDL tutorials, videos and several hands-on DFDL labs.
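The idea of one information set behind several physical formats can be sketched in a few lines of Python. This is only an illustration: the field names, the fixed-length widths and the hand-written parsers are assumptions made for the example, standing in for what a DFDL processor would derive from a schema.

```python
# Illustrative only: hand-written parsers standing in for a DFDL processor.
# Field names and fixed-length widths are assumptions made for this sketch.

FIELDS = ("name", "age", "city")
WIDTHS = (10, 3, 12)  # hypothetical fixed-length layout, space-padded


def parse_delimited(text, sep=","):
    """Parse a separator-delimited record into a logical info set (a dict)."""
    name, age, city = text.split(sep)
    return {"name": name, "age": int(age), "city": city}


def parse_fixed(text):
    """Parse a fixed-length record into the same logical info set."""
    values, pos = [], 0
    for width in WIDTHS:
        values.append(text[pos:pos + width].strip())
        pos += width
    name, age, city = values
    return {"name": name, "age": int(age), "city": city}


def unparse_delimited(infoset, sep=","):
    """Serialize the info set back to the delimited physical format."""
    return sep.join(str(infoset[field]) for field in FIELDS)


delimited = "Alice,30,Lisbon"
fixed = "Alice".ljust(10) + "030" + "Lisbon".ljust(12)

# Two different physical formats, one logical information set.
same_infoset = parse_delimited(delimited) == parse_fixed(fixed)
```

Here `same_infoset` is true even though the two input strings differ byte for byte, and `unparse_delimited(parse_fixed(fixed))` writes the fixed-length record back out in the delimited format.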


History

DFDL was created in response to a need for grid APIs to be able to understand data regardless of source. A language was needed capable of modeling a wide variety of existing text and binary data formats. A working group was established at the Global Grid Forum (which later became the Open Grid Forum) in 2003 to create a specification for such a language.

A decision was made early on to base the language on a subset of W3C XML Schema, using annotations to carry the extra information necessary to describe non-XML physical representations. This is an established approach that is already used in commercial systems. DFDL takes this approach and evolves it into an open standard capable of describing many text or binary data formats.

Work continued on the language, resulting in the publication of a DFDL 1.0 specification as OGF Proposed Recommendation GFD.174 in January 2011. The official OGF Recommendation is GFD.240, published in February 2021, which obsoletes all prior versions and incorporates all issues noted to date. A summary of DFDL and its features is available at the OGF. Issues with the specification are tracked using GitHub issue trackers.


Implementations

Implementations of DFDL processors that can parse and serialize data using DFDL schemas are available.
* IBM has a production-ready DFDL 1.0 streaming parser, modeler and visual tester. This is available in several IBM products including IBM App Connect Enterprise (formerly known as IBM Integration Bus). A free developer edition is available.
* Apache Daffodil is an open-source DFDL processor having both parser and unparser, as well as integrations into Apache NiFi and the XML Calabash XProc pipeline engine. It continues to be under active development.
* European Space Agency project S2G Data Viewer includes a parser, DFDL4S, that implements a subset of the DFDL 1.0 specification.

A public repository for DFDL schemas that describe commercial and scientific data formats has been established on GitHub. DFDL schemas for formats like UN/EDIFACT, NACHA, MIL-STD-2045, NITF, and ISO8583 are available for free download.


Example

Take as an example a text data stream that gives the name, age and location of a person. The logical model for this data can be described by a fragment of an XML Schema document, in which the order, names, types and cardinality of the fields are expressed by the XML Schema model. To additionally model the physical representation of the data stream, DFDL augments the XML Schema fragment with annotations on the xs:element and xs:sequence objects. The property attributes on these DFDL annotations express that the data are represented in an ASCII text format, with fields being of variable length and delimited by commas. An alternative, more compact syntax is also provided, where DFDL properties are carried as non-native attributes on the XML Schema objects themselves.
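A minimal sketch of such a model, using the compact attribute syntax, might look like the following. The sample record, the element names and the selection of DFDL properties (`dfdl:separator`, `dfdl:encoding`, `dfdl:lengthKind`) are assumptions chosen for illustration and should be checked against the DFDL 1.0 specification. The small Python interpreter reads only the separator and the element types; it is a toy, not a DFDL processor.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
DFDL = "http://www.ogf.org/dfdl/dfdl-1.0/"

# Hypothetical schema: element names and property choices are illustrative.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence dfdl:separator="," dfdl:encoding="ASCII">
        <xs:element name="name" type="xs:string" dfdl:lengthKind="delimited"/>
        <xs:element name="age" type="xs:int" dfdl:lengthKind="delimited"/>
        <xs:element name="location" type="xs:string" dfdl:lengthKind="delimited"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""

# Only the two simple types used above are supported by this toy.
CONVERTERS = {"xs:string": str, "xs:int": int}


def load_model(schema_text):
    """Extract the separator and the ordered (name, converter) field list."""
    root = ET.fromstring(schema_text.strip())
    sequence = root.find(f".//{{{XS}}}sequence")
    separator = sequence.get(f"{{{DFDL}}}separator")
    fields = [
        (element.get("name"), CONVERTERS[element.get("type")])
        for element in sequence.findall(f"{{{XS}}}element")
    ]
    return separator, fields


def parse(data, model):
    """Read the physical text stream into a logical info set (a dict)."""
    separator, fields = model
    values = data.split(separator)
    return {name: convert(value) for (name, convert), value in zip(fields, values)}


def unparse(infoset, model):
    """Write the info set back out in its physical text format."""
    separator, fields = model
    return separator.join(str(infoset[name]) for name, _ in fields)


model = load_model(SCHEMA)
person = parse("Alice,30,Lisbon", model)
```

The same `model` drives both directions: `unparse(person, model)` reproduces the original comma-separated record, mirroring how one DFDL schema serves both parsing and serializing.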


Features

The goal of DFDL is to provide a rich modeling language capable of representing any text or binary data format. The 1.0 release is a major step towards this goal. The capability includes support for:
* Text data types such as strings, numbers, zoned decimals, calendars and Booleans
* Binary data types such as two's complement integers, BCD, packed decimals, floats, calendars and Booleans
* Fixed length data and data delimited by text or binary markup
* Language data structures found in languages like COBOL, C and PL/1
* Industry standards such as CSV, SWIFT, FIX, HL7, X12, HIPAA, EDIFACT, ISO 8583
* Any encoding and endian-ness
* Bit data of arbitrary length
* Pattern languages for text numbers and calendars
* Ordered, unordered and floating content
* Default values on parsing and serializing
* Nil values capability for handling out-of-band data
* Fixed and variable arrays
* XPath 2.0 expression language including variables to model dynamic data
* Speculative parsing and other mechanisms to resolve choices and optionality
* Validation to XML Schema 1.0 rules
* A scoping mechanism that allows common property values to be applied at multiple annotation points
* Hiding elements in the data from the information set
* Calculating element values for the information set
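To make the binary items in this list more concrete, the sketch below decodes two of the representations named above: a two's complement integer in either byte order, and a packed decimal. It is an illustrative assumption about what those representations look like on the wire (the packed decimal follows the common IBM-style layout, with the sign in the last nibble), not an excerpt from any DFDL processor.

```python
from decimal import Decimal


def decode_twos_complement(data, byteorder="big"):
    """Decode a signed two's complement integer in the given byte order."""
    return int.from_bytes(data, byteorder=byteorder, signed=True)


def decode_packed_decimal(data, scale=0):
    """Decode a packed decimal: two BCD digits per byte, sign in the last nibble.

    Assumes the common convention that sign nibbles 0xB and 0xD mean negative
    and every other sign nibble means positive; the sign nibble itself is not
    validated. `scale` is the number of implied decimal places.
    """
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))
    sign = nibbles.pop()
    if any(nibble > 9 for nibble in nibbles):
        raise ValueError("non-decimal digit nibble in packed data")
    magnitude = Decimal(int("".join(str(n) for n in nibbles))).scaleb(-scale)
    return -magnitude if sign in (0x0B, 0x0D) else magnitude


big = decode_twos_complement(b"\xff\xfe", "big")
little = decode_twos_complement(b"\xff\xfe", "little")
amount = decode_packed_decimal(bytes.fromhex("12345C"), scale=2)
```

The same two bytes yield different integers depending on byte order, which is why a format description has to state endianness explicitly rather than leave it to the reading program.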


See also

* Open Grid Forum
* W3C XML Schema


External links


* Open Grid Forum
* OGF DFDL home page
* OGF DFDL 1.0 specification GFD.240 (pdf)
* W3C XML Schema 1.0
* DFDL Working Group documents including videos
* DFDLSchemas on GitHub
* XML Calabash