GOCR

GOCR (or JOCR) is a free optical character recognition program, initially written by Jörg Schulenburg. It converts image files (such as portable pixmap or PCX files) into text files.


Features

GOCR claims it can handle single-column sans-serif fonts 20–60 pixels in height. It reports difficulty with serif fonts, overlapping characters, handwritten text, mixed fonts, noisy images, strongly skewed text, and text in any script other than the Latin alphabet. GOCR can also decode barcodes.


User interface

GOCR can be used as a stand-alone command-line application or as a back-end to other programs. It comes with a graphical interface, gocr.tcl, and can also serve as an OCR engine in OCRFeeder.


Development

Version 0.3.0 was released in December 2000, 0.3.5 in February 2002, and 0.37 in August 2002. Between version 0.40 (March 2005) and 0.43 (December 2006), the recognition engine was gradually replaced with a vector-based version. Later releases followed: 0.48 in August 2009, 0.49 in September 2010, 0.50 in March 2013, and 0.51 in August 2017.


Nomenclature

The application was originally named GOCR, which stands for GNU Optical Character Recognition. When the project was to be registered on SourceForge, the name GOCR was already taken, so it was registered as JOCR (Jörg's Optical Character Recognition). As a result, the project and application are known as both GOCR and JOCR, which Schulenburg admits is problematic.


Formats

GOCR reads the following image formats directly:
* PNM
* PBM
* PGM
* PPM
* PCX (some)
* TGA

Other formats are converted automatically by piping them through netpbm-progs, gzip, or bzip2 via a Unix pipe. These image types include:
* pnm.gz
* pnm.bz2
* PNG
* JPG
* TIFF
* GIF
* BMP
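The conversion step amounts to choosing a filter program whose standard output is PNM data that GOCR can read. The Python sketch below illustrates that idea only; it is not GOCR's own code. The extension-to-command table (including the choice of `pngtopnm`, `jpegtopnm`, `tifftopnm`, `giftopnm`, and `bmptopnm` from Netpbm) and the helper names `is_netpbm` and `filter_command` are assumptions made for illustration. It also shows how the natively read Netpbm formats can be recognized by their two-byte magic numbers.

```python
import os
from typing import List, Optional

# Netpbm "magic numbers": P1/P4 = PBM, P2/P5 = PGM, P3/P6 = PPM (ASCII/binary).
NETPBM_MAGIC = {b"P1", b"P2", b"P3", b"P4", b"P5", b"P6"}

# Hypothetical extension-to-command table, for illustration only.
# Each command writes PNM data (or, for the decompressors, the decompressed
# stream) to standard output. GOCR's real source may choose other programs.
FILTERS = {
    ".gz": ["gzip", "-cd"],    # e.g. page.pnm.gz
    ".bz2": ["bzip2", "-cd"],  # e.g. page.pnm.bz2
    ".png": ["pngtopnm"],
    ".jpg": ["jpegtopnm"],
    ".jpeg": ["jpegtopnm"],
    ".tif": ["tifftopnm"],
    ".tiff": ["tifftopnm"],
    ".gif": ["giftopnm"],
    ".bmp": ["bmptopnm"],
}


def is_netpbm(header: bytes) -> bool:
    """True if the data starts with a Netpbm magic number (readable as-is)."""
    return header[:2] in NETPBM_MAGIC


def filter_command(filename: str) -> Optional[List[str]]:
    """Return the argv of the filter to pipe the file through, or None if the
    file is assumed to be readable directly (e.g. PNM, PCX, TGA)."""
    ext = os.path.splitext(filename)[1].lower()
    cmd = FILTERS.get(ext)
    return None if cmd is None else cmd + [filename]


if __name__ == "__main__":
    for name in ["scan.pnm", "scan.pnm.gz", "photo.JPG", "page.tiff"]:
        print(name, "->", filter_command(name))
    print(is_netpbm(b"P4\n640 480\n"))  # True
```

Reading the chosen command's standard output as the image stream mirrors the Unix-pipe approach described above; in C the same effect is commonly obtained with `popen`.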


External links


GOCR Main Page