LaTeXML

LaTeXML is a free, public domain software package which converts LaTeX documents to XML, HTML, EPUB, JATS and TEI.


Workflow

LaTeXML's primary output format is an XML representation of (La)TeX's document model. A postprocessor can convert these XML documents into other structured formats. Common use cases create HTML with mathematical formulas as images, or XHTML, HTML5, and EPUB with formulas as MathML. Compared to other LaTeX-to-XML processors, LaTeXML aims to preserve the semantic structures of the LaTeX markup. This makes it a good basis for semantic services like math search. Conversion times range from 30 milliseconds for a single formula (in the LaTeXML daemon) to minutes for book-size documents.


History

LaTeXML was started in the context of the Digital Library of Mathematical Functions at NIST, where LaTeX documents needed to be prepared for publication on the Web. The system has been under active development for over a decade, and has attracted a small but dedicated community of developers and users centered on Bruce Miller, the original project author. The current released version is LaTeXML 0.8.7. It was released in December 2022, and development remains active on the public repository.


Notable usage

LaTeXML was used to convert 90% (60% without errors) of 530,000 documents from the arXiv to XML. As a result of this ongoing effort to enhance coverage, LaTeXML supports a large range of LaTeX packages. The ACL 2014 conference used LaTeXML to convert submitted papers to XML. This followed existing work on converting the ACL Anthology papers to high-quality semantic markup for further analysis. Since February 2013, LaTeXML has been used to render the web pages of the peer-produced mathematics website PlanetMath. In July 2015, it was adopted by Authorea for their advanced LaTeX support. In 2018, the second data release of the European Space Agency's Gaia project was realized via LaTeXML. In February 2022, arXiv announced an experimental service based on LaTeXML, offering 1.78 million documents as HTML5. A LaTeXML developer claimed successful conversion of 74% of arXiv, with 97% of articles "at least partially viewable". As of the start of 2024, that experiment has been promoted to arXiv's main article pages.


Implementation

The core of LaTeXML is a Perl reimplementation of TeX's parsing and digestion algorithm coupled with a customizable XML emitter. To conserve the semantic structures in the LaTeX markup, LaTeXML needs XML bindings for all LaTeX packages with high-level macro definitions. The LaTeXML distribution currently provides XML bindings for over 200 commonly used LaTeX packages such as AMSTeX, Babel and PGF/TikZ (which only has experimental support). The LaTeXML conversion consists of two stages:

* the first one parses LaTeX and converts that into a LaTeX-near XML document type, and
* the second (post-processing) transforms the XML into one of the standardized structured output formats.

LaTeXML 0.8 added daemon functionality which enabled multiple conversions and easy embedding into web services. LaTeXML 0.8.7 was the first version emitting the "MathML Core" markup language for mathematical syntax, new in MathML 4.
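
The two conversion stages can be illustrated with a short Python sketch that drives the command-line tools. This is only an illustration under stated assumptions: it assumes the `latexml` and `latexmlpost` executables from the LaTeXML distribution are installed, and that they accept the `--destination` and `--format` options as used here (check the `--help` output of the installed version). The helper names `build_pipeline` and `convert`, and the file name `paper.tex`, are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List

# Assumed mapping from a latexmlpost output format to a file extension.
EXTENSIONS = {"html": ".html", "html5": ".html", "xhtml": ".xhtml"}


def build_pipeline(tex_file: str, output_format: str = "html5") -> List[List[str]]:
    """Return the two command lines of a two-stage conversion.

    Stage 1 (latexml) turns the LaTeX source into LaTeXML's own XML.
    Stage 2 (latexmlpost) transforms that XML into the requested format.
    The option names are assumptions; verify them against the installed tools.
    """
    if output_format not in EXTENSIONS:
        raise ValueError(f"unsupported output format: {output_format}")
    source = Path(tex_file)
    intermediate = source.with_suffix(".xml")
    final = source.with_suffix(EXTENSIONS[output_format])
    return [
        ["latexml", f"--destination={intermediate}", str(source)],
        ["latexmlpost", f"--format={output_format}",
         f"--destination={final}", str(intermediate)],
    ]


def convert(tex_file: str, output_format: str = "html5") -> None:
    """Run both stages in order, stopping at the first failure."""
    for command in build_pipeline(tex_file, output_format):
        if shutil.which(command[0]) is None:
            raise FileNotFoundError(f"{command[0]} was not found on PATH")
        subprocess.run(command, check=True)
```

Calling `convert("paper.tex")` would, under these assumptions, first write `paper.xml` and then `paper.html`. Keeping the command construction separate from execution makes it possible to inspect or log the commands without LaTeXML being installed.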


See also

* pdfTeX


External links


Official Homepage for LaTeXML

LaTeXML source code

LaTeXML web server, services, and demos