Page Analysis And Ground Truth Elements
Page Analysis and Ground Truth Elements (PAGE) is an XML standard for encoding digitised documents. Comparable to ALTO (XML), it allows the organisation and structure of a page and its contents to be described. PAGE XML can be used to describe:
* page content (regions, lines of text, words, glyphs, reading order, text content, ...)
* the evaluation of the layout analysis (evaluation profiles, evaluation results, ...)
* the cutting of the document image (cutting grids)
The format is developed by the Pattern Recognition & Image Analysis Lab (PRIMA) at the University of Salford in Greater Manchester. It was designed to be used in conjunction with automatic segmentation and transcription techniques (OCR and HTR): PAGE aims to support each step in the processing chain for document image analysis, from image enhancement through layout analysis to OCR. The PAGE XML schema is notably used as an export and import format by automatic transcription software such as eScriptor ...
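To make the page-content part of the description concrete, the following is a minimal sketch of reading regions, text lines and reading order from a PAGE XML file with Python's standard library. The sample document is hand-written for illustration and is not taken from any real dataset. It assumes the 2019-07-15 version of the page content schema; the function names `parse_points` and `lines_in_reading_order` are hypothetical helpers, not part of any PAGE tooling.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace of the 2019-07-15 PAGE content schema.
# Files written against other schema versions use a different date suffix.
PAGE_NS = {"pc": "http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15"}

# Hypothetical, hand-written sample: two text regions whose reading order
# differs from their order in the file.
PAGE_SAMPLE = """
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="page_0001.png" imageWidth="1000" imageHeight="1500">
    <ReadingOrder>
      <OrderedGroup id="ro1">
        <RegionRefIndexed index="0" regionRef="r2"/>
        <RegionRefIndexed index="1" regionRef="r1"/>
      </OrderedGroup>
    </ReadingOrder>
    <TextRegion id="r1">
      <Coords points="100,400 900,400 900,600 100,600"/>
      <TextLine id="r1l1">
        <Coords points="100,400 900,400 900,450 100,450"/>
        <TextEquiv><Unicode>Body text line</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
    <TextRegion id="r2">
      <Coords points="100,100 900,100 900,200 100,200"/>
      <TextLine id="r2l1">
        <Coords points="100,100 900,100 900,200 100,200"/>
        <TextEquiv><Unicode>Heading</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>
"""


def parse_points(points):
    """Turn a 'x1,y1 x2,y2 ...' polygon string into a list of (x, y) tuples."""
    return [tuple(int(v) for v in pair.split(",")) for pair in points.split()]


def lines_in_reading_order(xml_text):
    """Return (region id, line id, text) tuples, following ReadingOrder if present."""
    root = ET.fromstring(xml_text)
    page = root.find("pc:Page", PAGE_NS)
    regions = {r.get("id"): r for r in page.findall("pc:TextRegion", PAGE_NS)}

    refs = page.findall(
        "pc:ReadingOrder/pc:OrderedGroup/pc:RegionRefIndexed", PAGE_NS
    )
    refs.sort(key=lambda ref: int(ref.get("index")))
    # Fall back to file order when no reading order is recorded.
    ordered_ids = [ref.get("regionRef") for ref in refs] or list(regions)

    result = []
    for region_id in ordered_ids:
        for line in regions[region_id].findall("pc:TextLine", PAGE_NS):
            text = line.findtext(
                "pc:TextEquiv/pc:Unicode", default="", namespaces=PAGE_NS
            )
            result.append((region_id, line.get("id"), text))
    return result


if __name__ == "__main__":
    for region_id, line_id, text in lines_in_reading_order(PAGE_SAMPLE):
        print(region_id, line_id, text)
```

The sketch only handles top-level text regions and a flat ordered group; real files may also contain nested regions, words, glyphs and unordered groups, which would need additional handling.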


ALTO (XML)
Analyzed Layout and Text Object (ALTO) is an open XML Schema developed by the EU-funded project METAe. The standard was initially developed to describe the text (OCR, optical character recognition) and layout information of pages of digitized material. The goal was to describe the layout and text in enough detail to reconstruct the original appearance from the digitized information, similar to the approach of a lossless image saving operation. ALTO is often used in combination with the Metadata Encoding and Transmission Standard (METS), which describes the whole digitized object and creates references across the ALTO files, e.g. to describe the reading sequence. The standard has been hosted by the Library of Congress since 2010 and is maintained by the Editorial Board established at the same time. From the final version of the ALTO standard in June 2004 (version 1.0), ALTO was maintained by CCS Content Conversion Specialists GmbH, Hamburg, up to version ...
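As an illustration of how ALTO ties text to its position on the page, here is a minimal sketch that reads text lines and their bounding boxes from an ALTO file with Python's standard library. The sample document is hand-written for illustration, assumes the ALTO version 4 namespace, and the function name `alto_lines` is a hypothetical helper rather than part of any ALTO tooling.

```python
import xml.etree.ElementTree as ET

# Assumption: ALTO version 4 namespace; older files use other namespace URIs.
ALTO_NS = {"alto": "http://www.loc.gov/standards/alto/ns-v4#"}

# Hypothetical, hand-written sample: one text block with one two-word line.
ALTO_SAMPLE = """
<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="p1" WIDTH="1000" HEIGHT="1500">
      <PrintSpace>
        <TextBlock ID="b1">
          <TextLine ID="l1" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="40">
            <String CONTENT="Hello" HPOS="100" VPOS="200" WIDTH="120" HEIGHT="40"/>
            <SP/>
            <String CONTENT="world" HPOS="240" VPOS="200" WIDTH="160" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>
"""


def alto_lines(xml_text):
    """Return (line id, text, (hpos, vpos, width, height)) for each TextLine."""
    root = ET.fromstring(xml_text)
    result = []
    for line in root.iterfind(".//alto:TextLine", ALTO_NS):
        # Each String element carries one word in its CONTENT attribute.
        words = [s.get("CONTENT") for s in line.findall("alto:String", ALTO_NS)]
        bbox = tuple(
            float(line.get(name)) for name in ("HPOS", "VPOS", "WIDTH", "HEIGHT")
        )
        result.append((line.get("ID"), " ".join(words), bbox))
    return result


if __name__ == "__main__":
    for line_id, text, bbox in alto_lines(ALTO_SAMPLE):
        print(line_id, text, bbox)
```

Joining words with a single space is a simplification; it ignores hyphenation elements and any measurement unit declared in the file's description section.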



University Of Salford
The University of Salford is a public research university in Salford, Greater Manchester, England, west of Manchester city centre. The Royal Technical Institute, Salford, which opened in 1896, became a College of Advanced Technology in 1956 and gained university status in 1967, following the Robbins Report into higher education. Its campus is set in parkland on the banks of the River Irwell.
History
Origins of the Royal Technical Institute
The university's origins can be traced to the opening in 1896 of the Royal Technical Institute, Salford, a merger of Salford Working Men's College (founded in 1858) and Pendleton Mechanics' Institute (founded in 1850). The Royal Technical Institute received royal letters after the then-Duke and Duchess of York (later King George V and Queen Mary) officiated at its opening ...


Optical Character Recognition
Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo (for example the text on signs and billboards in a landscape photo) or from subtitle text superimposed on an image (for example, from a television broadcast). Widely used as a form of data entry from printed paper data records (whether passport documents, invoices, bank statements, computerized receipts, business cards, mail, printed data, or any suitable documentation), it is a common method of digitizing printed texts so that they can be electronically edited, searched, stored more compactly, displayed online, and used in machine processes such as cognitive computing, machine translation, (extracted) text-to-speech, key data and text mining. OCR is a field of research in pattern recognition, artificial intelligen ...



Handwritten Text Recognition
Handwriting recognition (HWR), also known as handwritten text recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices. The image of the written text may be sensed "off line" from a piece of paper by optical scanning (optical character recognition) or intelligent word recognition. Alternatively, the movements of the pen tip may be sensed "on line", for example by a pen-based computer screen surface, a generally easier task as there are more clues available. A handwriting recognition system handles formatting, performs correct segmentation into characters, and finds the most plausible words.
Offline recognition
Offline handwriting recognition involves the automatic conversion of text in an image into letter codes that are usable within computer and text-processing applications. The data obtained by this form is regarded as a static representation ...



eScriptorium
eScriptorium is a platform for manual or automated segmentation and text recognition of historical manuscripts and prints.
Details
The software is open source and is developed at the Paris Sciences et Lettres University as part of the projects Scripta and RESILIENCE, with contributions from other institutions, partly funded by the EU's Horizon 2020 funding program and a grant from the Andrew W. Mellon Foundation. Scanned pages from manuscripts and prints can be imported into eScriptorium and exported as text in various formats (text, ALTO or PAGE XML, TEI). The text areas and the text lines within them are first identified in the images, manually or automatically (segmentation). The text lines are then transcribed manually or automatically. Both automatic segmentation and text recognition can be trained using manually created or corrected examples (ground truth). The new models created in this way can be shared with others and can therefore be easily reused. eScriptorium is ...


Transkribus
Transkribus is a platform for the text recognition, image analysis and structure recognition of historical documents. The platform was created in the context of the two EU projects "tranScriptorium" (2013–2015) and "READ" (Recognition and Enrichment of Archival Documents, 2016–2019). It was developed by the University of Innsbruck. Since July 1, 2019, the platform has been directed and further developed by READ-COOP, a non-profit cooperative. The platform integrates tools developed by research groups throughout Europe, including the Pattern Recognition and Human Language Technology (PRHLT) group of the Technical University of Valencia and the Computational Intelligence Technology Lab (CITlab) group of the University of Rostock. Comparable programs that offer similar functions include eScriptorium ...




Deutsche Forschungsgemeinschaft
The German Research Foundation (Deutsche Forschungsgemeinschaft; DFG) is a German research funding organization, which functions as a self-governing institution for the promotion of science and research in the Federal Republic of Germany. In 2019, the DFG had a funding budget of €3.3 billion.
Function
The DFG supports research in science, engineering, and the humanities through a variety of grant programmes, research prizes, and by funding infrastructure. The self-governed organization is based in Bonn and financed by the German states and the federal government of Germany. The organization consists of approximately 100 research universities and other research institutions. The DFG endows various research prizes, including the Leibniz Prize. The Polish-German science award Copernicus is offered jointly with the Foundation for Polish Science. According to a 2017 article in The Guardian, the DFG has announced it will publish its research in online open-access journals.
Background
In 1937, th ...