
Forms processing is the process of capturing information entered into the data fields of a form and converting it into an electronic format. This can be done manually or automatically, but the general process is that hard-copy data is filled out by humans and then "captured" from the respective fields and entered into a database or other electronic format.


Overview

In the broadest sense, forms processing systems range from the processing of small application forms to large-scale survey forms with multiple pages. Manual forms processing has several common problems: it demands a great deal of tedious human effort, the data keyed in by the operator may contain typos, and the lengthy process consumes many hours of labor. If the forms are processed using computer software-driven applications, these problems can be resolved or minimized to a great extent. Most methods for forms processing address the following areas.


Manual data entry

This method of data processing involves human operators keying in data found on the form. The manual process of data entry has many disadvantages in speed, accuracy and cost. Based on average professional typist speeds of 50 to 80 wpm, one could generously estimate about two hundred pages per hour for forms with fifteen one-word fields (not counting the time for reading and sorting pages). In contrast, modern commercial scanners can scan and digitize up to 200 pages per ''minute''. The second major disadvantage of manual data entry is the likelihood of typographical errors. When the cost of labor and working space is factored in, manual data entry is a very inefficient process.
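The throughput comparison above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes typing speed is the sole limit on manual entry, and the function and constant names are illustrative.

```python
def pages_per_hour(words_per_minute: float, words_per_page: int) -> float:
    """Pages keyed per hour if typing speed were the only limit."""
    return words_per_minute * 60 / words_per_page


WORDS_PER_PAGE = 15             # fifteen one-word fields, as in the text
SCANNER_PAGES_PER_MINUTE = 200  # upper figure quoted for commercial scanners

manual_low = pages_per_hour(50, WORDS_PER_PAGE)   # slower typist
manual_high = pages_per_hour(80, WORDS_PER_PAGE)  # faster typist
scanner_per_hour = SCANNER_PAGES_PER_MINUTE * 60

print(f"manual entry: {manual_low:.0f} to {manual_high:.0f} pages/hour")
print(f"scanner:      {scanner_per_hour} pages/hour")
```

Under these assumptions manual entry yields 200 to 320 pages per hour, against 12,000 pages per hour for the scanner. The scanner figure covers digitizing the image only; recognition and verification take additional time.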


Automated forms processing

This method automates data processing by using pre-defined templates and configurations. A template, in this case, is a ''map'' of the document that details where the data fields are located within the form or document. Automatic form input systems are preferable to manual data entry because they reduce the problems that manual data processing faces. They use different recognition methods: optical character recognition (OCR) for machine print, optical mark reading (OMR) for check boxes and mark-sense boxes, bar code recognition (BCR) for barcodes, and intelligent character recognition (ICR) for hand print. With an automated forms processing system, users can convert scanned document images into a computer-readable format such as ANSI, XML, CSV or PDF, or feed the data directly into a database. Forms processing has developed beyond basic data capture. It not only encompasses a recognition process but also helps manage the complete life cycle of documents, from scanning of the document to extraction of the data, and often to delivery into a back-end system. In some cases it may also include generating well-formatted results through calculations and analysis. An automated forms processing system can be valuable if there is a need to process hundreds or thousands of images every day.
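A template of the kind described above can be pictured as a list of named zones, each tagged with the recognition method that should read it. The following is a minimal sketch under that assumption; the class `FieldZone`, the field names and the coordinates are all hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldZone:
    """One entry in the template: where a field sits and how to read it."""
    name: str
    x: int
    y: int
    width: int
    height: int
    method: str  # "OCR", "ICR", "OMR" or "BCR"


# Hypothetical template for a one-page application form (pixel coordinates).
TEMPLATE = [
    FieldZone("form_id", 10, 10, 200, 40, "BCR"),
    FieldZone("printed_account_no", 10, 60, 200, 30, "OCR"),
    FieldZone("applicant_name", 10, 100, 400, 30, "ICR"),
    FieldZone("newsletter_opt_in", 10, 140, 20, 20, "OMR"),
]


def zones_by_method(template):
    """Group field names by the recognition method they need."""
    grouped = {}
    for zone in template:
        grouped.setdefault(zone.method, []).append(zone.name)
    return grouped


print(zones_by_method(TEMPLATE))
```

Grouping by method shows which recognition engines a given form needs before any image is processed.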


First Step: Assessment of the form structure

The first step in understanding automated forms processing is to analyze the type of form from which data is to be extracted. For the purpose of extracting data, forms can be classified into one of two high-level categories. Four categories have been proposed; however, the document capture industry has settled on these two:
# Fixed forms. This type of form is defined as one in which the data to be extracted is always found in the same absolute position on a page. This allows a type of lens grid to be applied to the document, and to every subsequent occurrence of this document, in order to extract the data. An example of a fixed form is a typical credit application form.
# Semi-structured (or unstructured) forms. This type of form is one in which the location of the data, and of the fields holding the data, varies from document to document. It is perhaps most easily defined by the fact that it is not a fixed form. In the document capture industry, a semi-structured form is also called an unstructured form. Examples include letters, contracts, and invoices. According to a study by AIIM, about 80% of the documents in an organization fall under the semi-structured definition.
Although the components (described below) used to extract data from either type of form are the same, the way in which they are applied varies considerably with the type of document.
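Because a fixed form keeps every field at the same absolute position, extraction reduces to cutting the same rectangles out of every page. The sketch below illustrates this with a grid of text characters standing in for a pixel image; the zone coordinates and the helper names `crop` and `extract` are assumptions made for the example.

```python
def crop(page, x, y, width, height):
    """Return the rectangle of the page starting at column x, row y."""
    return [row[x:x + width] for row in page[y:y + height]]


def extract(page, zones):
    """Apply the same fixed zones to a page and return the field values."""
    return {
        field: "".join(crop(page, *box)).strip()
        for field, box in zones.items()
    }


# Zones are (x, y, width, height) and are identical for every page.
ZONES = {"name": (6, 0, 9, 1), "city": (6, 1, 9, 1)}

page_1 = ["NAME: ALICE    ", "CITY: OSLO     "]
page_2 = ["NAME: BOB      ", "CITY: LIMA     "]

for page in (page_1, page_2):
    print(extract(page, ZONES))
```

A semi-structured form breaks this approach, since the zones would have to be located anew on each document, for example by searching for a label such as "NAME:" first.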


Components

The components used in data processing with an automatic form-input system include:
# OCR – Optical character recognition
# OMR – Optical mark recognition
# ICR – Intelligent character recognition
# BCR – Barcode recognition
# MICR – Magnetic ink character recognition

OCR recognizes machine-printed uppercase and lowercase alphabetic characters, accented characters, digits, many currency symbols, arithmetic symbols, expanded punctuation characters and more.

ICR recognizes hand-printed American and European English characters using pre-defined character sets: uppercase, lowercase and mixed case alphabetic characters, digits, currency symbols (including $ (dollar), ¢ (cent), € (euro), £ (pound) and ¥ (yen)), arithmetic characters and punctuation characters (including period, comma, single quote, double quote, ! & ( ) ? @ \ # % * + – / : ; < = >).

MICR is a recognition technology that facilitates the processing of the MICR fonts printed on cheques. It reduces the chance of errors in cheque clearing and makes the transfer of funds easier and faster. MICR provides a secure, high-speed method of scanning and processing information.

Optical mark recognition (OMR) identifies hand-filled bubbles or check boxes on printed forms. OMR usually supports both single and multiple mark recognition. The fields to be recognized can be specified as grids (rows by columns) or as single bubbles.

Barcode recognition can read more than 20 industry-standard 1D and 2D barcode types, including Code39, CODABAR, Interleaved 2 of 5, Code93 and more. It automatically detects all barcodes in an image or in a specified area within the image.
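As a rough illustration of OMR over a grid, the sketch below treats each bubble as a number between 0 and 1 giving the fraction of dark pixels inside it, and marks a bubble as filled when that fraction reaches a threshold. The 0.5 threshold, the input format and the function names are assumptions for the example, not values taken from any OMR product.

```python
def read_marks(fill_ratios, threshold=0.5):
    """For each row of bubbles, return the column indexes that are filled."""
    return [
        [col for col, ratio in enumerate(row) if ratio >= threshold]
        for row in fill_ratios
    ]


def single_choice(marked_columns):
    """Return the one marked column, or None if zero or several are marked."""
    return marked_columns[0] if len(marked_columns) == 1 else None


# Three questions (rows) with three answer bubbles each (columns).
grid = [
    [0.9, 0.1, 0.0],  # one clear mark
    [0.1, 0.8, 0.7],  # two marks: valid only for multiple-mark fields
    [0.0, 0.1, 0.2],  # no mark
]

marks = read_marks(grid)
print(marks)
print([single_choice(row) for row in marks])
```

The same `read_marks` result serves both modes: a multiple-mark field accepts the list as is, while a single-mark field rejects rows with no mark or more than one.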


Process

The process of automated forms processing typically includes the following steps:
# A batch of completed forms is scanned using a high-speed scanner.
# Images are cleaned with document image processing algorithms to improve accuracy.
# Forms are classified against the original template forms, and the fields are extracted using the appropriate recognition components.
# Fields that the system flagged with low confidence are queued for verification by a human operator.
# Verified data is saved into a database or exported to a searchable text format such as CSV, XML or PDF.
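The last two steps, routing low-confidence fields to a human and exporting the verified data, can be sketched as follows. The 0.9 confidence threshold, the shape of the recognition result (a value paired with a confidence score) and the function names are assumptions for illustration; CSV export uses Python's standard `csv` module.

```python
import csv
import io


def split_by_confidence(fields, threshold=0.9):
    """Separate recognized fields into accepted values and a review queue."""
    accepted, review = {}, {}
    for name, (value, confidence) in fields.items():
        target = accepted if confidence >= threshold else review
        target[name] = value
    return accepted, review


def to_csv(records):
    """Serialize a list of verified records (dicts with equal keys) as CSV."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical recognition output: field -> (value, confidence).
recognized = {"name": ("ALICE", 0.98), "city": ("0SLO", 0.61)}

accepted, review = split_by_confidence(recognized)
print("needs review:", review)

# A human operator corrects the queued field before export.
verified = {**accepted, "city": "OSLO"}
print(to_csv([verified]))
```

Only the queued fields reach the operator, so the human effort scales with the number of uncertain fields rather than with the number of forms.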


Prerequisites

Although automated forms processing has many advantages over manual data entry, it still comes with some limitations. To achieve the best accuracy, the following prerequisites should be met:
# Scan format: the format of the scanned file, the resolution (DPI) and the color mode.
# Configuration: the layout of the scanned image needs to be configured for the automation.
# Recognition: the pre-defined output formats.
# Result/analysis: any specific format required for presenting the captured values.
One very important consideration is indexing: determining the metadata that will be used to describe the data contained within the documents. This attribute perhaps drives the forms processing solution more than any other.
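The scan-format prerequisite can be enforced with a simple check before a batch enters the pipeline. The sketch below is illustrative only: the accepted file formats, the color modes and the 300 DPI minimum are assumed example values, not requirements stated by the source or by any specific system.

```python
from dataclasses import dataclass

# Assumed example limits; a real deployment would set its own.
ALLOWED_FORMATS = {"tiff", "png", "pdf"}
ALLOWED_COLOR_MODES = {"bitonal", "grayscale", "color"}
MIN_DPI = 300


@dataclass
class ScanSettings:
    file_format: str
    dpi: int
    color_mode: str


def check_scan_settings(settings):
    """Return a list of problems; an empty list means the scan is acceptable."""
    problems = []
    if settings.file_format.lower() not in ALLOWED_FORMATS:
        problems.append(f"unsupported file format: {settings.file_format}")
    if settings.dpi < MIN_DPI:
        problems.append(f"resolution {settings.dpi} DPI is below {MIN_DPI}")
    if settings.color_mode.lower() not in ALLOWED_COLOR_MODES:
        problems.append(f"unsupported color mode: {settings.color_mode}")
    return problems


print(check_scan_settings(ScanSettings("tiff", 300, "grayscale")))
print(check_scan_settings(ScanSettings("jpeg", 150, "grayscale")))
```

Rejecting unsuitable scans at this stage is cheaper than discovering poor recognition accuracy after extraction.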


External links


AIIM market intelligence reports

