A document file format is a text or binary file format for storing documents on a storage medium, especially for use by computers.
There currently exist a multitude of incompatible document file formats.
Examples of XML-based open standards are DocBook, XHTML, and, more recently, the ISO/IEC standards OpenDocument (ISO 26300:2006) and Office Open XML (ISO 29500:2008).
In 1993, the ITU-T tried to establish a standard for document file formats, known as the Open Document Architecture (ODA), which was supposed to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. It did not succeed.
Page description languages such as PostScript and PDF have become the ''de facto'' standard for documents that a typical user should only be able to create and read, not edit. In 2001, a series of ISO/IEC standards for PDF began to be published, including the specification for PDF itself, ISO 32000.
HTML is the most widely used open international standard and is also used as a document file format. It has also become an ISO/IEC standard (ISO 15445:2000).
The default binary file format used by Microsoft Word (.doc) has become a widespread ''de facto'' standard for office documents, but it is a proprietary format and is not always fully supported by other word processors.
Common document file formats
* ASCII, UTF-8: plain text formats
* AmigaGuide
* .doc for Microsoft Word: structured binary format developed by Microsoft (specifications available since 2008 under the Open Specification Promise)
* DjVu: file format designed primarily to store scanned documents
* DocBook: an XML format for technical documentation
* HTML (.html, .htm): open standard (ISO standard since 2000), possibly in combination with the image files it refers to
* FictionBook (.fb2): open XML-based e-book format
* Markdown (.md): markup language for creating formatted text using plain text
* Office Open XML: .docx (XML-based standard for office documents)
* OpenDocument: .odt (XML-based standard for office documents)
* OpenOffice.org XML: .sxw (open, XML-based format for office documents)
* OXPS: Open XML Paper Specification (Windows 8.1 and above; the older version, XPS, is used in Windows 7)
* PalmDoc: handheld document format
* .pages for Pages
* PDF: open standard for document exchange. ISO standards include PDF/X (eXchange), PDF/A (Archive), PDF/E (Engineering), ISO 32000 (PDF), PDF/UA (Accessibility) and PDF/VT (Variable data and transactional printing). PDF is readable on almost every platform with free or open source readers. Open source PDF creators are also available.
* PostScript: .ps
* Rich Text Format (RTF): format developed by Microsoft since 1987 for Microsoft products and cross-platform document interchange
* SYmbolic LinK (SYLK)
* Scalable Vector Graphics (SVG): graphics format primarily for vector-based images
* TeX: open-source typesetting program and format. First successful mathematical notation language.
* TEI: XML format for digital publication
* Troff
* Uniform Office Format: Chinese standard
* WordPerfect (.wpd, .wp, .wp7, .doc); note that the .doc extension can be confused with the Microsoft Word format
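Because a filename extension such as .doc can belong to more than one format, software often inspects the first bytes of a file instead of trusting its name. The following Python sketch illustrates the idea for a few formats from the list above. The function name `sniff_format` and the table `SIGNATURES` are hypothetical names chosen for this illustration, the signature table is deliberately incomplete, and the fallback to "plain text" assumes that anything decodable as UTF-8 is text.

```python
# Leading byte signatures ("magic numbers") for a few document formats.
# This table is a small illustrative subset, not a complete registry.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"%!PS", "PostScript"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE compound file (e.g. legacy Word .doc)"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx, .odt)"),
    (b"{\\rtf", "Rich Text Format"),
    (b"\xffWPC", "WordPerfect"),
    (b"AT&TFORM", "DjVu"),
]


def sniff_format(data: bytes) -> str:
    """Guess a document format from the leading bytes of a file (hypothetical helper)."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # No known signature: treat the data as text if it decodes as UTF-8
    # (ASCII is a subset of UTF-8), otherwise report an unknown binary file.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown binary"
    return "plain text or text-based markup"
```

Called on the contents of two files that both carry the .doc extension, such a check would report an OLE compound file for a legacy Word document and WordPerfect for a WordPerfect document. A ZIP signature alone does not distinguish .docx from .odt; that would require looking inside the archive.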
See also
* List of document file formats
* List of document markup languages
* Comparison of document markup languages
* Open format
* Word processor
* Desktop publishing
* LaTeX
External links
Lost in Translation: Interoperability Issues for Open Standards - ODF and OOXML as Examples