eScriptorium

eScriptorium is a platform for manual or automated segmentation and text recognition of historical manuscripts and prints.


Details

eScriptorium is open source software developed at Paris Sciences et Lettres University as part of the projects ''Scripta'' and ''RESILIENCE'', with contributions from other institutions. It is partly funded by the EU's Horizon 2020 funding program and by a grant from the Andrew W. Mellon Foundation. Scanned pages from manuscripts and prints can be imported into eScriptorium and exported as text in various formats (plain text, ALTO or PAGE XML, TEI). The text regions and the text lines within them are first identified in the images, either manually or automatically (segmentation). The text lines are then transcribed, again manually or automatically. Both automatic segmentation and text recognition can be trained using manually created or corrected examples (ground truth). The models created in this way can be shared with others and are therefore easy to reuse. eScriptorium is built on top of the free OCR software ''Kraken'' by Benjamin Kiessling, a derivative of the OCR software ''OCRopus''. Kraken is suitable for handwritten and printed texts and also supports scripts that are written from right to left, such as Hebrew and Arabic. Comparable programs that offer similar functions are OCR4All and Transkribus.
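The export step described above can be illustrated with a minimal sketch. It is not eScriptorium's or Kraken's actual exporter: the `Line` class and both functions are hypothetical names introduced here, and the output follows only the basic ALTO element hierarchy (`Layout`, `Page`, `PrintSpace`, `TextBlock`, `TextLine`, `String`) without being validated against the schema. It assumes each segmented line is a simple bounding box with a transcription.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Namespace of ALTO version 4; the choice of version is an assumption of this sketch.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v4#"


@dataclass
class Line:
    """Hypothetical record for one segmented line: pixel bounding box plus text."""
    x: int
    y: int
    width: int
    height: int
    text: str


def lines_to_text(lines):
    """Plain-text export: one transcribed line per output line."""
    return "\n".join(line.text for line in lines)


def lines_to_alto(lines, page_width, page_height):
    """Serialize the lines into a minimal, unvalidated ALTO-style document."""
    ET.register_namespace("", ALTO_NS)  # emit ALTO as the default namespace

    def tag(name):
        return f"{{{ALTO_NS}}}{name}"

    alto = ET.Element(tag("alto"))
    layout = ET.SubElement(alto, tag("Layout"))
    page = ET.SubElement(layout, tag("Page"),
                         WIDTH=str(page_width), HEIGHT=str(page_height))
    space = ET.SubElement(page, tag("PrintSpace"))
    block = ET.SubElement(space, tag("TextBlock"), ID="block_0")
    for i, line in enumerate(lines):
        text_line = ET.SubElement(
            block, tag("TextLine"), ID=f"line_{i}",
            HPOS=str(line.x), VPOS=str(line.y),
            WIDTH=str(line.width), HEIGHT=str(line.height),
        )
        # Store the whole transcription as a single String element.
        ET.SubElement(text_line, tag("String"), CONTENT=line.text)
    return ET.tostring(alto, encoding="unicode")


sample = [
    Line(10, 20, 300, 40, "In principio"),
    Line(10, 70, 300, 40, "erat verbum"),
]
plain = lines_to_text(sample)
alto_xml = lines_to_alto(sample, page_width=400, page_height=600)
```

Keeping the line geometry next to the text is what distinguishes ALTO and PAGE XML from a plain-text export: the transcription stays linked to its position on the scanned page.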

