
Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data. Originally developed at the U.S. National Center for Supercomputing Applications, it is supported by The HDF Group, a non-profit corporation whose mission is to ensure continued development of HDF5 technologies and the continued accessibility of data stored in HDF. In keeping with this goal, the HDF libraries and associated tools are available under a liberal, BSD-like license for general use. HDF is supported by many commercial and non-commercial software platforms and programming languages. The freely available HDF distribution consists of the library, command-line utilities, test suite source, Java interface, and the Java-based HDF Viewer (HDFView). The current version, HDF5, differs significantly in design and API from the major legacy version HDF4.


Early history

The quest for a portable scientific data format, originally dubbed AEHOO (All Encompassing Hierarchical Object Oriented format), was begun in 1987 by the Graphics Foundations Task Force (GFTF) at the National Center for Supercomputing Applications (NCSA). NSF grants received in 1990 and 1992 were important to the project. Around this time NASA investigated 15 different file formats for use in the Earth Observing System (EOS) project. After a two-year review process, HDF was selected as the standard data and information system.


HDF4

HDF4 is the older version of the format, although still actively supported by The HDF Group. It supports a proliferation of different data models, including multidimensional arrays, raster images, and tables. Each defines a specific aggregate data type and provides an API for reading, writing, and organizing the data and metadata. New data models can be added by the HDF developers or users. HDF is self-describing, allowing an application to interpret the structure and contents of a file with no outside information. One HDF file can hold a mix of related objects which can be accessed as a group or as individual objects. Users can create their own grouping structures called "vgroups."

The HDF4 format has many limitations. It lacks a clear object model, which makes continued support and improvement difficult. Supporting many different interface styles (images, tables, arrays) leads to a complex API. Support for metadata depends on which interface is in use; ''SD'' (Scientific Dataset) objects support arbitrary named attributes, while other types only support predefined metadata. Perhaps most importantly, the use of 32-bit signed integers for addressing limits HDF4 files to a maximum of 2 GB, which is unacceptable in many modern scientific applications.
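The 2 GB ceiling follows directly from the width of a signed 32-bit integer. The following is a minimal sketch of that arithmetic in Python; the helper name `fits_in_int32_addressing` and the example grid sizes are hypothetical, and the check ignores the space taken by file headers and metadata, so it is only a rough upper bound and not how the HDF4 library itself computes anything.

```python
# Signed 32-bit offsets run from 0 to 2**31 - 1, so at most 2**31 bytes
# (2 GiB) can be addressed.
MAX_ADDRESSABLE_BYTES = 2**31


def fits_in_int32_addressing(n_elements: int, bytes_per_element: int) -> bool:
    """Hypothetical helper: True if the raw array data alone stays within
    the range addressable by a signed 32-bit byte offset."""
    return n_elements * bytes_per_element <= MAX_ADDRESSABLE_BYTES


# A 10,000 x 10,000 grid of 8-byte floats is 800,000,000 bytes: it fits.
small_grid_ok = fits_in_int32_addressing(10_000 * 10_000, 8)

# A 20,000 x 20,000 grid of 8-byte floats is 3,200,000,000 bytes: too large.
large_grid_ok = fits_in_int32_addressing(20_000 * 20_000, 8)

print(MAX_ADDRESSABLE_BYTES, small_grid_ok, large_grid_ok)
```

A single double-precision grid of 20,000 by 20,000 points already exceeds the limit, which gives a sense of why the restriction matters for scientific data.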


HDF5

The HDF5 format is designed to address some of the limitations of the HDF4 library, and to address current and anticipated requirements of modern systems and applications. In 2002 it won an R&D 100 Award.

HDF5 simplifies the file structure to include only two major types of object:
* Datasets, which are typed multidimensional arrays
* Groups, which are container structures that can hold datasets and other groups

This results in a truly hierarchical, filesystem-like data format. In fact, resources in an HDF5 file can be accessed using the POSIX-like syntax ''/path/to/resource''. Metadata is stored in the form of user-defined, named attributes attached to groups and datasets. More complex storage APIs representing images and tables can then be built up using datasets, groups and attributes.

In addition to these advances in the file format, HDF5 includes an improved type system, and dataspace objects which represent selections over dataset regions. The API is also object-oriented with respect to datasets, groups, attributes, types, dataspaces and property lists. The latest version of NetCDF, version 4, is based on HDF5.

Because it uses B-trees to index table objects, HDF5 works well for time series data such as stock price series, network monitoring data, and 3D meteorological data. The bulk of the data goes into straightforward arrays (the table objects) that can be accessed much more quickly than the rows of an SQL database, but B-tree access is available for non-array data. The HDF5 data storage mechanism can be simpler and faster than an SQL star schema.
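The group, dataset, and attribute model described in this section can be illustrated with a small in-memory sketch. This is a toy model in plain Python, not the HDF5 library or any of its bindings: the class names `Group` and `Dataset`, the methods `create_group` and `resolve`, and the example path are all hypothetical, chosen only to show how a POSIX-like path walks a hierarchy of groups down to a dataset with named attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Dataset:
    """Toy stand-in for a dataset: a typed array plus named attributes."""
    dtype: str
    shape: tuple
    data: list
    attrs: Dict[str, object] = field(default_factory=dict)


@dataclass
class Group:
    """Toy stand-in for a group: holds datasets and other groups."""
    children: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)
    attrs: Dict[str, object] = field(default_factory=dict)

    def create_group(self, name: str) -> "Group":
        """Create an empty child group and return it."""
        group = Group()
        self.children[name] = group
        return group

    def resolve(self, path: str) -> Union["Group", Dataset]:
        """Walk a POSIX-like path such as '/a/b/c' starting at this group."""
        node: Union[Group, Dataset] = self
        for part in path.strip("/").split("/"):
            if not part:
                continue  # tolerate '/' and doubled slashes
            if not isinstance(node, Group):
                raise KeyError(f"{part!r} lies below a dataset, not a group")
            node = node.children[part]
        return node


# Build a small hierarchy: /weather/station_42/temperature
root = Group()
station = root.create_group("weather").create_group("station_42")
station.attrs["location"] = "hypothetical site"
station.children["temperature"] = Dataset(
    dtype="float64",
    shape=(3,),
    data=[11.5, 12.0, 12.4],
    attrs={"units": "degC"},
)

temperature = root.resolve("/weather/station_42/temperature")
print(temperature.shape, temperature.attrs["units"])  # prints "(3,) degC"
```

Because only two kinds of node exist, the same path lookup serves every kind of stored object, and metadata travels with the node it describes instead of depending on a per-interface mechanism.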


Feedback

Criticism of HDF5 follows from its monolithic design and lengthy specification.
* HDF5 does not enforce the use of UTF-8, so client applications may be expecting ASCII in most places.
* Dataset data cannot be freed in a file without generating a file copy using an external tool (h5repack).


Interfaces


Officially supported APIs

* C
* C++
* CLI – .NET
* Fortran, Fortran 90
* HDF5 Lite (H5LT) – a light-weight interface for C
* HDF5 Image (H5IM) – a C interface for images or rasters
* HDF5 Table (H5TB) – a C interface for tables
* HDF5 Packet Table (H5PT) – interfaces for C and C++ to handle "packet" data, accessed at high speeds
* HDF5 Dimension Scale (H5DS) – allows dimension scales to be added to HDF5
* Java


Third-party bindings

* CGNS uses HDF5 as main storage
* Common Lisp library hdf5-cffi
* D offers bindings to the C API, with a high-level h5py style D wrapper under development
* Dymola introduced support for HDF5 export using an implementation called SDF (Scientific Data Format) with release Dymola 2016 FD01
* Erlang, Elixir, and LFE may use the bindings for BEAM languages
* GNU Data Language
* Go through the gonum hdf5 package
* HDFql enables users to manage HDF5 files through a high-level language (similar to SQL) in C, C++, Java, Python, C#, Fortran and R
* Huygens Software uses HDF5 as primary storage format since version 3.5
* IDL
* IGOR Pro offers full support of HDF5 files
* JHDF5, an alternative Java binding that takes a different approach from the official HDF5 Java binding, which some users find simpler
* jHDF, a pure Java implementation providing read-only access to HDF5 files
* JSON through hdf5-json
* Julia provides HDF5 support through the HDF5 package
* LabVIEW can gain HDF support through third-party libraries, such as h5labview and lvhdf5
* Lua through the lua-hdf5 library
* MATLAB, Scilab or Octave – use HDF5 as primary storage format in recent releases
* Mathematica offers immediate analysis of HDF and HDF5 data
* Perl through PDL::IO::HDF5
* Python supports HDF5 via h5py (both high- and low-level access to HDF5 abstractions) and via a high-level interface with advanced indexing and database-like query capabilities. HDF4 is available via Python-HDF4 and/or PyHDF for both Python 2 and Python 3. The popular data manipulation package pandas can import from and export to HDF5.
* R offers support in the rhdf5 and hdf5r packages
* Rust can gain HDF support through third-party libraries like hdf5


Tools


* Apache Spark HDF5 Connector: HDF5 Connector for Apache Spark
* Apache Drill HDF5 Plugin: HDF5 Plugin for Apache Drill, which enables SQL queries over HDF5 files
* HDF Product Designer: interoperable HDF5 data product creation GUI tool
* HDF Explorer: a data visualization program that reads the HDF, HDF5 and netCDF data file formats
* HDFView: a browser and editor for HDF files
* ViTables: a browser and editor for HDF5 and PyTables files, written in Python
* Panoply: a netCDF, HDF and GRIB data viewer


See also

* Common Data Format (CDF)
* FITS, a data format used in astronomy
* GRIB (GRIdded Binary), a data format used in meteorology
* HDF Explorer
* NetCDF; the NetCDF Java library reads HDF5, HDF4, HDF-EOS and other formats using pure Java
* Protocol Buffers – Google's data interchange format


External links

* Official website
* What is HDF5?
* HDF-EOS Tools and Information Center
* Open Navigation Surface