Data Interchange Format

Data Interchange Format (.dif) is a text file format used to import and export single spreadsheets between spreadsheet programs. Applications that still support the DIF format are Collabora Online, Excel, Gnumeric, and LibreOffice Calc. Microsoft Excel's implementation caused interoperability problems; see § Discrepancies in implementations. Historical applications that supported it until they reached end of life, or that no longer acknowledge support for the format, are dBase, FileMaker, Framework, Lotus 1-2-3, Multiplan, OpenOffice.org Calc and StarCalc.

A limitation of the DIF format is that it cannot handle multiple spreadsheets in a single workbook.

Because of the similarity in abbreviation and in age (both date to the early 1980s), the DIF spreadsheet format is often confused with Navy DIF; Navy DIF, however, is an unrelated "document interchange format" for word processors. "Among the file formats designed to facilitate the interchange of text files between microcomputers running different word processing software, IBM's Document Content Architecture (DCA) and the U.S. Navy's document interchange format (DIF) seem to have the greatest support."


History

DIF was developed by Software Arts, Inc. (the developers of the VisiCalc program) in the early 1980s. The specification was included in many copies of VisiCalc, and published in Byte Magazine. Bob Frankston developed the format, with input from others, including Mitch Kapor, who helped so that it could work with his VisiPlot program. (Kapor later went on to found Lotus and make Lotus 1-2-3 happen.) The specification bears a 1981 copyright. DIF was a registered trademark of Software Arts Products Corp. (a legal name for Software Arts at the time).


Syntax

DIF stores everything in an ASCII text file, which mitigated many cross-platform issues at the time of its creation. Modern spreadsheet software, e.g. OpenOffice.org Calc and Gnumeric, offers more character encodings for export and import.

The file is divided into two sections: header and data. Everything in DIF is represented by a two- or three-line chunk: header chunks take three lines, data chunks two. A header chunk starts with a text identifier that is all caps, contains only alphabetic characters, and is less than 32 letters long. The following line must be a pair of numbers, and the third line must be a quoted string. A data chunk, by contrast, starts with a number pair, and the next line is a quoted string or a keyword.
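The shape of a header chunk described above can be checked mechanically. The following is a minimal sketch in Python, not part of the specification; the function name `looks_like_header_chunk` and the exact number-pair pattern (plain decimal numbers, no exponents) are assumptions made for illustration.

```python
import re

# Identifier rule from the text: all caps, alphabetic only, fewer than 32 letters.
IDENTIFIER = re.compile(r"[A-Z]{1,31}")
# Assumed shape of a number pair: an integer type code, a comma, a decimal number.
PAIR = re.compile(r"-?\d+,-?\d+(\.\d+)?")


def looks_like_header_chunk(lines):
    """Return True if the three lines have the shape of a DIF header chunk."""
    if len(lines) != 3:
        return False
    identifier, pair, text = (line.strip() for line in lines)
    return (
        IDENTIFIER.fullmatch(identifier) is not None
        and PAIR.fullmatch(pair) is not None
        and len(text) >= 2
        and text.startswith('"')
        and text.endswith('"')
    )
```

Note that keywords such as BOT or V also satisfy the identifier rule, so a reader has to rely on position in the file, not on the text alone, to tell a header identifier from a data keyword.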


Values

A value occupies two lines, the first a pair of numbers and the second either a string or a keyword. The first number of the pair indicates type:
* −1 – directive type, the second number is ignored, the following line is one of these keywords:
** BOT – beginning of tuple (start of row)
** EOD – end of data
* 0 – numeric type, value is the second number, the following line is one of these keywords:
** V – valid
** NA – not available
** ERROR – error
** TRUE – true boolean value
** FALSE – false boolean value
* 1 – string type, the second number is ignored, the following line is the string in double quotes
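The type rules above translate fairly directly into code. The sketch below decodes one two-line data value in Python. The function name `decode_value` and the `(kind, payload)` return shape are hypothetical, and un-doubling `""` inside strings is an assumption inferred from the example file in this article. It handles data values only, not the empty-string form used in header chunks.

```python
def decode_value(pair_line, second_line):
    """Decode one two-line DIF data value into a (kind, payload) tuple."""
    type_text, number_text = pair_line.strip().split(",", 1)
    type_code = int(type_text)
    second = second_line.strip()

    if type_code == -1:
        # Directive: the second number is ignored; the keyword is BOT or EOD.
        if second not in ("BOT", "EOD"):
            raise ValueError(f"unknown directive {second!r}")
        return ("directive", second)

    if type_code == 0:
        # Numeric: the keyword on the second line qualifies the number.
        if second == "V":
            number = float(number_text)
            return ("number", int(number) if number.is_integer() else number)
        if second in ("TRUE", "FALSE"):
            return ("boolean", second == "TRUE")
        if second in ("NA", "ERROR"):
            return (second.lower(), None)
        raise ValueError(f"unknown numeric keyword {second!r}")

    if type_code == 1:
        # String: strip the enclosing quotes; assume "" encodes a literal quote.
        if len(second) < 2 or not (second.startswith('"') and second.endswith('"')):
            raise ValueError(f"string value is not quoted: {second!r}")
        return ("string", second[1:-1].replace('""', '"'))

    raise ValueError(f"unknown type code {type_code}")
```

For instance, `decode_value("0,1", "V")` yields `("number", 1)` and `decode_value("-1,0", "BOT")` yields `("directive", "BOT")`.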


Header chunk

A header chunk is composed of an identifier line followed by the two lines of a value.
* TABLE – the version follows as a numeric value; the otherwise unused second line of the value contains a generator comment
* VECTORS – the number of columns follows as a numeric value
* TUPLES – the number of rows follows as a numeric value
* DATA – after a dummy 0 numeric value, the data for the table follow, each row preceded by a BOT value, the entire table terminated by an EOD value
The numeric values in header chunks use just an empty string instead of the validity keywords.
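As an illustration of the four header chunks, the following Python sketch builds the header lines for a table of a given size. The function `dif_header` and its parameters are hypothetical names, and doubling embedded quotes in the generator comment is an assumption rather than something the description above states.

```python
def dif_header(columns, rows, generator="", version=1):
    """Return the twelve header lines (four three-line chunks) of a DIF file."""
    lines = []
    for identifier, number, text in (
        ("TABLE", version, generator),  # version plus generator comment
        ("VECTORS", columns, ""),       # number of columns
        ("TUPLES", rows, ""),           # number of rows
        ("DATA", 0, ""),                # dummy 0 value; data values follow
    ):
        lines.append(identifier)
        lines.append(f"0,{number}")     # type 0 marks a numeric value
        escaped = text.replace('"', '""')
        lines.append(f'"{escaped}"')    # header values use a quoted string, not V
    return lines
```

Calling `dif_header(2, 3, "EXCEL")` reproduces the header of the example file in this article, without the annotations.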


Discrepancies in implementations

Some implementations (notably those of older Microsoft products) swapped the meaning of VECTORS and TUPLES. Some implementations are insensitive to errors in the dimensions of the table as written in the header and simply use the layout in the DATA section.
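A tolerant reader can compare the declared dimensions with the actual layout of the DATA section. The Python sketch below is one possible way to do this, under the assumption that the input is a well-formed list of lines; the function name `check_dimensions` and the verdict strings are invented for illustration.

```python
def check_dimensions(lines):
    """Compare VECTORS/TUPLES in the header with the layout of the data section.

    Returns (verdict, actual_columns, actual_rows), where verdict is
    "consistent", "swapped" or "mismatch".
    """
    lines = [line.strip() for line in lines]
    declared = {}
    pos = 0
    # Header chunks are three lines each; DATA is the last of them.
    while pos + 2 < len(lines):
        identifier = lines[pos]
        number = lines[pos + 1].split(",", 1)[1]
        pos += 3
        if identifier == "DATA":
            break
        declared[identifier] = int(number)

    # Data values are two lines each; count the cells that follow every BOT.
    row_lengths = []
    while pos + 1 < len(lines):
        pair, second = lines[pos], lines[pos + 1]
        pos += 2
        if pair.split(",", 1)[0] == "-1":
            if second == "EOD":
                break
            row_lengths.append(0)
        elif row_lengths:
            row_lengths[-1] += 1

    actual_rows = len(row_lengths)
    actual_columns = max(row_lengths, default=0)
    header = (declared.get("VECTORS"), declared.get("TUPLES"))
    if header == (actual_columns, actual_rows):
        verdict = "consistent"
    elif header == (actual_rows, actual_columns):
        verdict = "swapped"
    else:
        verdict = "mismatch"
    return verdict, actual_columns, actual_rows
```

For a square table the two interpretations cannot be told apart, so the sketch reports "consistent" in that case.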


Example

For example, assume a table with two columns, one column header row and two data rows:

Text                            Number
hello                           1
has a double quote " in text    -3

In a .dif file, this would be (→ indicates comments):
TABLE
0,1
"EXCEL"
VECTORS     → the number of columns follows as a numeric value
0,2         → '0' indicates that it's a numeric type, '2' since we have 2 columns
""
TUPLES      → the number of rows follows as a numeric value
0,3         → '0' indicates that it's a numeric type, '3' since we have 3 rows
""
DATA        → after a dummy 0 numeric value, the data for the table follow
0,0         → this is the dummy 0 numeric value
""
-1,0        → '-1' for the directive type. This is followed by either a 'BOT' or an 'EOD'
BOT         → signifies the start of a row
1,0         → '1' since the cell contains a string. (The second number is ignored)
"Text"      → this is the String that's in the cell
1,0         → '1' since the cell contains a string.
"Number" 
-1,0  
BOT         → another row 
1,0         → a string follows
"hello"
0,1         → numeric value ('0') of value '1'
V           → 'V' is for 'Valid'
-1,0 
BOT         → another row
1,0
"has a double quote "" in text"
0,-3
V
-1,0 
EOD         → End of Data
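To show how the pieces fit together, here is a small Python sketch that reads the file above (with the → annotations removed, since they are not part of the format) back into rows. The function `parse_dif` is a hypothetical helper, not a reference implementation: it ignores the declared dimensions, assumes `""` encodes a literal quote, and maps NA and ERROR to `None`.

```python
EXAMPLE = '''TABLE
0,1
"EXCEL"
VECTORS
0,2
""
TUPLES
0,3
""
DATA
0,0
""
-1,0
BOT
1,0
"Text"
1,0
"Number"
-1,0
BOT
1,0
"hello"
0,1
V
-1,0
BOT
1,0
"has a double quote "" in text"
0,-3
V
-1,0
EOD
'''


def parse_dif(text):
    """Parse DIF text into (header, rows)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header, pos = {}, 0
    # Header: three-line chunks up to and including DATA.
    while pos + 2 < len(lines):
        identifier, pair, _comment = lines[pos:pos + 3]
        pos += 3
        if identifier == "DATA":
            break
        header[identifier] = int(pair.split(",", 1)[1])
    rows = []
    # Data: two-line values until the EOD directive.
    while pos + 1 < len(lines):
        pair, second = lines[pos], lines[pos + 1]
        pos += 2
        type_code, number = pair.split(",", 1)
        if type_code == "-1":
            if second == "EOD":
                break
            rows.append([])  # BOT opens a new row
        elif type_code == "1":
            rows[-1].append(second[1:-1].replace('""', '"'))
        elif second == "V":
            value = float(number)
            rows[-1].append(int(value) if value.is_integer() else value)
        elif second in ("TRUE", "FALSE"):
            rows[-1].append(second == "TRUE")
        else:
            rows[-1].append(None)  # NA and ERROR carry no usable number
    return header, rows


header, rows = parse_dif(EXAMPLE)
print(header)
print(rows)
```

The parsed rows are the header row `["Text", "Number"]` followed by the two data rows, with the doubled quote collapsed to a single one.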


See also

* Data exchange
* Comma-separated values (CSV format)


Sources

* Jeff Walden: File Formats for Popular PC Software. John Wiley & Sons, Inc., 1986.
* Comment from Dan Bricklin, one of the developers of VisiCalc, on the discussion page of this article
* David Miller: Commodore 64 Data Files, A BASIC Tutorial. 1984. ISBN 0835907910. Pages 212-231.


External links


* Announcement of DIF Clearinghouse by Software Arts Products Corp.