HOME

TheInfoList



OR:

A document file format is a
text Text may refer to: Written word * Text (literary theory) In literary theory, a text is any object that can be "read", whether this object is a work of literature, a street sign, an arrangement of buildings on a city block, or styles of clothi ...
or
binary file A binary file is a computer file that is not a text file. The term "binary file" is often used as a term meaning "non-text file". Many binary file formats contain parts that can be interpreted as text; for example, some computer document files ...
format for storing
document A document is a writing, written, drawing, drawn, presented, or memorialized representation of thought, often the manifestation of nonfiction, non-fictional, as well as fictional, content. The word originates from the Latin ', which denotes ...
s on a
storage media Data storage is the recording (storing) of information (data) in a storage medium. Handwriting, phonographic recording, magnetic tape, and optical discs are all examples of storage media. Biological molecules such as RNA and DNA are cons ...
, especially for use by
computer A computer is a machine that can be Computer programming, programmed to automatically Execution (computing), carry out sequences of arithmetic or logical operations (''computation''). Modern digital electronic computers can perform generic set ...
s. There currently exists a multitude of incompatible document file formats. Examples of
XML Extensible Markup Language (XML) is a markup language and file format for storing, transmitting, and reconstructing data. It defines a set of rules for encoding electronic document, documents in a format that is both human-readable and Machine-r ...
-based
open Open or OPEN may refer to: Music * Open (band), Australian pop/rock band * The Open (band), English indie rock band * ''Open'' (Blues Image album), 1969 * ''Open'' (Gerd Dudek, Buschi Niebergall, and Edward Vesala album), 1979 * ''Open'' (Go ...
standards are
DocBook DocBook is a Semantics (computer science), semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of docume ...
,
XHTML Extensible HyperText Markup Language (XHTML) is part of the family of XML markup languages which mirrors or extends versions of the widely used HyperText Markup Language (HTML), the language in which Web pages are formulated. While HTML, pr ...
, and, more recently, the
ISO The International Organization for Standardization (ISO ; ; ) is an independent, non-governmental, international standard development organization composed of representatives from the national standards organizations of member countries. Me ...
/
IEC The International Electrotechnical Commission (IEC; ) is an international standards organization that prepares and publishes international standards for all electrical, electronic and related technologies. IEC standards cover a vast range of ...
standards
OpenDocument The Open Document Format for Office Applications (ODF), also known as OpenDocument, standardized as ISO 26300, is an open file format for word processor, word processing documents, spreadsheets, Presentation program, presentations and ...
(ISO 26300:2006) and
Office Open XML Office Open XML (also informally known as OOXML) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. Ecma International standardized the initial version ...
(ISO 29500:2008). In 1993, the
ITU-T The International Telecommunication Union Telecommunication Standardization Sector (ITU-T) is one of the three Sectors (branches) of the International Telecommunication Union (ITU). It is responsible for coordinating Standardization, standards fo ...
tried to establish a standard for document file formats, known as the
Open Document Architecture The Open Document Architecture (ODA) and interchange format (informally referred to as just ODA) is a free and open international standard document file format maintained by the ITU-T to replace all proprietary document file formats. ODA is det ...
(ODA) which was supposed to replace all competing document file formats. It is described in ITU-T documents T.411 through T.421, which are equivalent to ISO 8613. It did not succeed.
Page description language In digital printing, a page description language (PDL) is a computer language that describes the appearance of a printed page in a higher level than an actual output bitmap (or generally raster graphics). An overlapping term is printer control ...
s such as
PostScript PostScript (PS) is a page description language and dynamically typed, stack-based programming language. It is most commonly used in the electronic publishing and desktop publishing realm, but as a Turing complete programming language, it c ...
and
PDF Portable document format (PDF), standardized as ISO 32000, is a file format developed by Adobe Inc., Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, computer hardware, ...
have become the '' de facto'' standard for documents that a typical user should only be able to create and read, not edit. In 2001, a series of
ISO The International Organization for Standardization (ISO ; ; ) is an independent, non-governmental, international standard development organization composed of representatives from the national standards organizations of member countries. Me ...
/
IEC The International Electrotechnical Commission (IEC; ) is an international standards organization that prepares and publishes international standards for all electrical, electronic and related technologies. IEC standards cover a vast range of ...
standards for PDF began to be published, including the specification for PDF itself, ISO-32000.
HTML Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets ( ...
is the most used and open international standard and it is also used as document file format. It has also become
ISO The International Organization for Standardization (ISO ; ; ) is an independent, non-governmental, international standard development organization composed of representatives from the national standards organizations of member countries. Me ...
/
IEC The International Electrotechnical Commission (IEC; ) is an international standards organization that prepares and publishes international standards for all electrical, electronic and related technologies. IEC standards cover a vast range of ...
standard (ISO 15445:2000). The default binary file format used by
Microsoft Word Microsoft Word is a word processor program, word processing program developed by Microsoft. It was first released on October 25, 1983, under the name Multi-Tool Word for Xenix systems. Subsequent versions were later written for several other platf ...
( .doc) has become a widespread '' de facto'' standard for office documents, but it is a
proprietary format A proprietary file format is a file format of a company, organization, or individual that contains data that is ordered and stored according to a particular encoding-scheme, such that the decoding and interpretation of this stored data is easily ac ...
and is not always fully supported by other word processors.


Common document file formats

Below is a list of some of the more common document file formats, common
file extension File or filing may refer to: Mechanical tools and processes * File (tool), a tool used to remove fine amounts of material from a workpiece. ** Filing (metalworking), a material removal process in manufacturing ** Nail file, a tool used to gen ...
s used by the formats in parentheses: *
ASCII ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control c ...
,
UTF-8 UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode Transformation Format 8-bit''. Almost every webpage is transmitted as UTF-8. UTF-8 supports all 1,112,0 ...
(.txt, others) — any of some plain text encodings that may have differing line endings depending on what system they were created or edited on * Amigaguide (.guide) — a
hypertext Hypertext is E-text, text displayed on a computer display or other electronic devices with references (hyperlinks) to other text that the reader can immediately access. Hypertext documents are interconnected by hyperlinks, which are typic ...
document format designed for the
Amiga Amiga is a family of personal computers produced by Commodore International, Commodore from 1985 until the company's bankruptcy in 1994, with production by others afterward. The original model is one of a number of mid-1980s computers with 16-b ...
that is used to document Amiga programs *
Microsoft Word Microsoft Word is a word processor program, word processing program developed by Microsoft. It was first released on October 25, 1983, under the name Multi-Tool Word for Xenix systems. Subsequent versions were later written for several other platf ...
(.doc, .docx) — structural binary ( .doc) and XML-based text formats ( .docx) developed primarily by Microsoft, both of which are subject to the
Microsoft Open Specification Promise The Microsoft Open Specification Promise (or OSP) is a promise by Microsoft, published in September 2006, to not assert its patents, in certain conditions, against implementations of a certain list of specifications. The OSP is not a licence, but ...
and are used to store
word processing A word processor (WP) is a device or computer program that provides for input, editing, formatting, and output of text, often with some additional features. Word processor (electronic device), Early word processors were stand-alone devices dedicate ...
documents * DjVu (.djv, .djvu) — a file format designed primarily to store scanned documents, especially ones containing a mixture of text, line drawings, and images *
DocBook DocBook is a Semantics (computer science), semantic markup language for technical documentation. It was originally intended for writing technical documents related to computer hardware and software, but it can be used for any other sort of docume ...
(.dbk, .xml) — an XML-based format intended for writing technical documentation *
HTML Hypertext Markup Language (HTML) is the standard markup language for documents designed to be displayed in a web browser. It defines the content and structure of web content. It is often assisted by technologies such as Cascading Style Sheets ( ...
(.html, .htm) — an
ad hoc ''Ad hoc'' is a List of Latin phrases, Latin phrase meaning literally for this. In English language, English, it typically signifies a solution designed for a specific purpose, problem, or task rather than a Generalization, generalized solution ...
hypertext document format originally created for the
World Wide Web The World Wide Web (WWW or simply the Web) is an information system that enables Content (media), content sharing over the Internet through user-friendly ways meant to appeal to users beyond Information technology, IT specialists and hobbyis ...
, initially developed as an
open standard An open standard is a standard that is openly accessible and usable by anyone. It is also a common prerequisite that open standards use an open license that provides for extensibility. Typically, anybody can participate in their development due to ...
by the
W3C The World Wide Web Consortium (W3C) is the main international standards organization for the World Wide Web. Founded in 1994 by Tim Berners-Lee, the consortium is made up of member organizations that maintain full-time staff working together in ...
and currently being developed as one by the
WHATWG The Web Hypertext Application Technology Working Group (WHATWG) is a community of people interested in evolving HTML and related technologies. The WHATWG was founded by individuals from Apple Inc., the Mozilla Foundation and Opera Software, ...
* FictionBook (.fb2, .fb3) — an open, XML-based
e-book An ebook (short for electronic book), also spelled as e-book or eBook, is a book publication made available in electronic form, consisting of text, images, or both, readable on the flat-panel display of computers or other electronic devices. Al ...
format that originated and gained popularity in Russia *
Markdown Markdown is a lightweight markup language for creating formatted text using a plain-text editor. John Gruber created Markdown in 2004 as an easy-to-read markup language. Markdown is widely used for blogging and instant messaging, and also used ...
(.md) — a simple, plain text markup language with several different implementations that is popular on blogs and content management systems *
OpenDocument The Open Document Format for Office Applications (ODF), also known as OpenDocument, standardized as ISO 26300, is an open file format for word processor, word processing documents, spreadsheets, Presentation program, presentations and ...
(.odt, .fodt) — an open, XML-based standard for office documents, including word processing documents, spreadsheets, presentations, and graphics * OpenOffice.org XML (.sxw, .sxc, .sxd, .sxi, others) — an open, XML-based format for office documents including word processing documents, spreadsheets, presentations, graphics, and formulas *
Open XML Paper Specification Open XML Paper Specification (also referred to as OpenXPS) is an open specification for a page description language and a fixed-document format. Microsoft developed it as the XML Paper Specification (XPS). In June 2009, Ecma International adop ...
(.xps, .oxps) — an
XAML Extensible Application Markup Language (XAML ) is a declarative XML-based language developed by Microsoft for initializing structured values and objects. It is available under Microsoft's Open Specification Promise. XAML is used extensively i ...
-based page description format designed by Microsoft (.xps), intended to compete with the Portable Document Format (PDF) and was later standardized by
Ecma International Ecma International () is a Nonprofit organization, nonprofit standards organization for information and communication systems. It acquired its current name in 1994, when the European Computer Manufacturers Association (ECMA) changed its name to ...
as ECMA-388 (.oxps) * PalmDOC (.pdb) — a special version of the PDB record database format used by
Palm OS Palm OS (also known as Garnet OS) is a discontinued mobile operating system initially developed by Palm, Inc., for personal digital assistants (PDAs) in 1996. Palm OS was designed for ease of use with a touchscreen-based graphical user interface. ...
used to store e-books and other text documents for handheld devices * Pages (.pages) — a document file format used to store word processing documents for
Apple An apple is a round, edible fruit produced by an apple tree (''Malus'' spp.). Fruit trees of the orchard or domestic apple (''Malus domestica''), the most widely grown in the genus, are agriculture, cultivated worldwide. The tree originated ...
's Pages app, as a part of its
iWork iWork is an office suite of applications created by Apple Inc., Apple for its macOS, iPadOS, and iOS operating systems, and also available cross-platform through the iCloud website. iWork includes the presentation program, presentation applicat ...
office suite *
Portable Document Format Portable document format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating syste ...
(.pdf) — a now standardized (ISO 32000), open format based on
PostScript PostScript (PS) is a page description language and dynamically typed, stack-based programming language. It is most commonly used in the electronic publishing and desktop publishing realm, but as a Turing complete programming language, it c ...
, developed by
Adobe Adobe (from arabic: الطوب Attub ; ) is a building material made from earth and organic materials. is Spanish for mudbrick. In some English-speaking regions of Spanish heritage, such as the Southwestern United States, the term is use ...
in 1992, that is able to store documents, forms,
rich media Interactive media refers to digital experiences that dynamically respond to user input, delivering content such as text, images, animations, video, audio, and even AI-driven interactions. Over the years, interactive media has expanded across ...
, and graphics (PDF and
PDF/UA PDF/UA (PDF/Universal Accessibility), formally ISO 14289, is an International Organization for Standardization (ISO) standard for accessible PDF technology. A technical specification intended for developers implementing PDF writing and processing ...
) for document exchange (
PDF/X PDF/X is a subset of the ISO standard for PDF. The purpose of PDF/X is to facilitate graphics exchange, and it therefore has a series of printing-related requirements which do not apply to standard PDF files. For example, in PDF/X-1a all fonts nee ...
and
PDF/VT PDF/VT is an international standard published by ISO in August 2010 as ISO 16612-2. It defines the use of PDF as an exchange format optimized for variable and transactional printing. Built on top of PDF/X-4, it is the first variable-data printi ...
), archival (
PDF/A PDF/A is an International Organization for Standardization, ISO-standardized version of the Portable Document Format (PDF) specialized for use in the archive, archiving and long-term digital preservation, preservation of electronic documents. PDF ...
), and engineering (
PDF/E ISO 24517-1:2008 is an ISO Standard published in 2008. * Document management—Engineering document format using PDF—Part 1: Use of PDF 1.6 (PDF/E-1) This standard defines a format (PDF/E) for the creation of documents used in geospatial, con ...
) *
PostScript PostScript (PS) is a page description language and dynamically typed, stack-based programming language. It is most commonly used in the electronic publishing and desktop publishing realm, but as a Turing complete programming language, it c ...
(.ps) — a page description and programming language designed by Adobe for use with printing, display systems, and storing documents *
Rich Text Format ) As an example, the following RTF code would be rendered as follows: This is some bold text. Character encoding A standard RTF file can only consist of 7-bit ASCII characters, but can use escape sequences to encode other characters. ...
(.rtf) — a
proprietary {{Short pages monitor