A document file format is a text or binary file format for storing documents on a storage medium, especially for use by computers.
There currently exist a multitude of incompatible document file formats.
Examples of XML-based open standards are DocBook, XHTML, and, more recently, the ISO/IEC standards OpenDocument (ISO 26300:2006) and Office Open XML (ISO 29500:2008).
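Both ISO/IEC formats named above are packaged as ZIP archives of XML parts, so telling them apart means looking inside the archive rather than at the XML alone. The following is a minimal sketch, not a validator: it assumes that an OpenDocument package carries a `mimetype` member and that an Office Open XML package carries a `[Content_Types].xml` member. The function names `classify_zip_document` and `make_sample` are hypothetical, and the sample archives are built in memory purely for illustration.

```python
import io
import zipfile


def classify_zip_document(data: bytes) -> str:
    """Return a coarse label for a ZIP-packaged XML document (illustrative only)."""
    if not zipfile.is_zipfile(io.BytesIO(data)):
        return "not a ZIP container"
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        names = set(archive.namelist())
        if "mimetype" in names:
            # Assumption: OpenDocument packages declare their media type here.
            media_type = archive.read("mimetype").decode("ascii", "replace").strip()
            if media_type.startswith("application/vnd.oasis.opendocument."):
                return "OpenDocument (" + media_type + ")"
        if "[Content_Types].xml" in names:
            # Assumption: Office Open XML packages list their part types here.
            return "Office Open XML"
    return "unknown ZIP-based format"


def make_sample(members: dict) -> bytes:
    """Build a tiny in-memory ZIP archive from a name -> content mapping."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, content in members.items():
            archive.writestr(name, content)
    return buffer.getvalue()


# Hypothetical stand-ins for real .odt and .docx files.
odt_sample = make_sample({
    "mimetype": "application/vnd.oasis.opendocument.text",
    "content.xml": "<placeholder/>",
})
docx_sample = make_sample({
    "[Content_Types].xml": "<Types/>",
    "word/document.xml": "<placeholder/>",
})

print(classify_zip_document(odt_sample))
print(classify_zip_document(docx_sample))
```

A real consumer would go on to parse the XML parts; this sketch only shows that the container layout is enough to distinguish the two families.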
In 1993, the ITU-T tried to establish a standard for document file formats, known as the Open Document Architecture (ODA), which was supposed to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. It did not succeed.
Page description languages such as PostScript and PDF have become the ''de facto'' standard for documents that a typical user should only be able to create and read, not edit. In 2001, a series of ISO/IEC standards for PDF began to be published, including the specification for PDF itself, ISO 32000.
HTML is the most widely used open international standard, and it is also used as a document file format. It has also become an ISO/IEC standard (ISO 15445:2000).
The default binary file format used by Microsoft Word (.doc) has become a widespread ''de facto'' standard for office documents, but it is a proprietary format and is not always fully supported by other word processors.
Common document file formats
* ASCII, UTF-8: plain text formats
* AmigaGuide
* .doc for Microsoft Word: structural binary format developed by Microsoft (specifications available since 2008 under the Open Specification Promise)
* DjVu: file format designed primarily to store scanned documents
* DocBook: an XML format for technical documentation
* HTML (.html, .htm): open standard (ISO from 2000), possibly in combination with the image files it refers to
* FictionBook (.fb2): open XML-based e-book format
* Markdown (.md): markup language for creating formatted text using plain text
* Office Open XML: .docx (XML-based standard for office documents)
* OpenDocument: .odt (XML-based standard for office documents)
* OpenOffice.org XML: .sxw (open, XML-based format for office documents)
* OXPS: Open XML Paper Specification (Windows 8.1 and above; the older version is XPS, used in Windows 7)
* PalmDoc: handheld document format
* .pages for Pages
* PDF: open standard for document exchange. ISO standards include PDF/X (eXchange), PDF/A (Archive), PDF/E (Engineering), ISO 32000 (PDF), PDF/UA (Accessibility) and PDF/VT (Variable data and transactional printing). PDF is readable on almost every platform with free or open source readers. Open source PDF creators are also available.
* PostScript: .ps
* Rich Text Format (RTF): format developed by Microsoft since 1987 for Microsoft products and cross-platform document interchange
* SYmbolic LinK (SYLK)
* Scalable Vector Graphics (SVG): graphics format primarily for vector-based images
* TeX: open-source typesetting program and format. First successful mathematical notation language.
* TEI: XML format for digital publication
* Troff
* Uniform Office Format: Chinese standard
* WordPerfect (.wpd, .wp, .wp7, .doc) (Note: possible confusion with the Word format extension)
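Because an extension such as .doc is shared by more than one format in the list above, software often inspects the first bytes of a file instead of trusting its name. The sketch below illustrates that idea under stated assumptions: the signature table is a small, incomplete selection of commonly documented leading byte sequences, and the names `SIGNATURES` and `sniff_format` are hypothetical rather than part of any standard or library. Plain text and most markup formats have no fixed signature, so they fall through to the "unknown" case.

```python
# Leading byte sequences ("magic numbers") for a few formats from the list.
# This table is illustrative and deliberately incomplete.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"%!PS", "PostScript"),
    (b"{\\rtf", "Rich Text Format"),
    (b"PK\x03\x04", "ZIP container (for example .docx or .odt)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1",
     "OLE2 compound file (for example legacy Word .doc)"),
    (b"\xffWPC", "WordPerfect"),
    (b"AT&TFORM", "DjVu"),
]


def sniff_format(head: bytes) -> str:
    """Guess a document format from the first bytes of a file."""
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    return "unknown (possibly plain text or markup)"


# Two files that could both be named "report.doc" are told apart by content.
word_head = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1" + b"\x00" * 8
wordperfect_head = b"\xffWPC" + b"\x00" * 12

print(sniff_format(word_head))
print(sniff_format(wordperfect_head))
print(sniff_format(b"%PDF-1.7\n"))
```

Signature matching only narrows the candidates: a ZIP or OLE2 container still has to be opened to learn which document format it holds.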
See also
* List of document file formats
* List of document markup languages
* Comparison of document markup languages
* Open format
* Word processor
* Desktop publishing
* LaTeX
External links
Lost in Translation: Interoperability Issues for Open Standards - ODF and OOXML as Examples