ALTO (Analyzed Layout and Text Object) is an open XML Schema developed by the EU-funded project METAe. The standard was initially developed to describe the OCR text and layout information of pages of digitized material. The goal was to describe layout and text in enough detail that the original appearance could be reconstructed from the digitized information, similar to saving an image losslessly. ALTO is often used in combination with the Metadata Encoding and Transmission Standard (METS), which describes the digitized object as a whole and creates references across the ALTO files, e.g. a description of the reading sequence. The standard has been hosted by the Library of Congress since 2010 and is maintained by an Editorial Board established at the same time. From the final version of the ALTO standard in June 2004 (version 1.0) up to version 1.4, ALTO was maintained by CCS Content Conversion Specialists GmbH, Hamburg.


Versions

The latest schema version and an overview of all versions, with links to the schemas, can be found at https://github.com/altoxml


Structure

An ALTO file consists of three major sections as children of the root element:
* The <Description> section contains metadata about the ALTO file itself and processing information on how the file was created.
* The <Styles> section contains the text and paragraph styles with their individual descriptions:
** <TextStyle> has font descriptions
** <ParagraphStyle> has paragraph descriptions, e.g. alignment information
* The <Layout> section contains the content information. It is subdivided into <Page> elements.
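
The three-section structure described above can be inspected with a few lines of Python. The following is a minimal sketch, not a reference implementation: the sample document is a hypothetical, heavily simplified ALTO-like file (no namespace, not schema-valid), the `ID` values are made up for illustration, and the helper names `local_name` and `summarize` are assumptions of this example. Real ALTO files declare a version-specific XML namespace, so the sketch compares local element names only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified ALTO-like document.
# It has no namespace and is NOT schema-valid; it only mirrors
# the three top-level sections named in the text.
SAMPLE = """<alto>
  <Description/>
  <Styles>
    <TextStyle ID="TS1"/>
    <ParagraphStyle ID="PS1"/>
  </Styles>
  <Layout>
    <Page ID="P1"/>
    <Page ID="P2"/>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop a leading '{namespace}' so the code is independent of the ALTO version."""
    return tag.rsplit("}", 1)[-1]


def summarize(xml_text):
    """Return the top-level sections, the kinds of styles, and the page IDs."""
    root = ET.fromstring(xml_text)
    # Map each direct child of the root element by its local name.
    sections = {local_name(child.tag): child for child in root}
    styles = sections.get("Styles")
    layout = sections.get("Layout")
    style_kinds = []
    if styles is not None:
        style_kinds = [local_name(s.tag) for s in styles]
    page_ids = []
    if layout is not None:
        page_ids = [p.get("ID") for p in layout if local_name(p.tag) == "Page"]
    return {
        "sections": list(sections),
        "style_kinds": style_kinds,
        "page_ids": page_ids,
    }


summary = summarize(SAMPLE)
print(summary)
```

To apply the same function to a file on disk, read its contents as text and pass them to `summarize`; sections that are absent simply yield empty lists.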


See also

* Metadata Encoding and Transmission Standard (METS)
* Dublin Core, an ISO metadata standard
* Preservation Metadata: Implementation Strategies (PREMIS)
* Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
* hOCR


External links


* ALTO (Analyzed Layout and Text Object) standards on the Library of Congress website
* ALTOxml on GitHub: https://altoxml.github.io/ and https://github.com/altoxml
* More info about METS/ALTO by CCS GmbH
* METS ALTO Introduction by CCS GmbH (archived 2014-09-04): https://web.archive.org/web/20140904022519/http://content-conversion.com/wp-content/uploads/2014/09/CCS-METS-ALTO-Info_basic_20140902.pdf
* XSLT-Transformations from and to ALTO