
Document Capture Software refers to applications that provide the ability and feature set to automate the process of scanning paper documents or importing electronic documents, often for the purposes of feeding advanced document classification and data collection processes. Most scanning hardware, both scanners and copiers, provides the basic ability to scan to any number of image file formats, including PDF, TIFF, JPG, and BMP. This basic functionality is augmented by document capture software, which can add efficiency and standardization to the process.


Typical Features

Typical features of Document Capture Software include:
* Barcode recognition
* Patch Code recognition
* Separation
* Optical Character Recognition (OCR)
* Optical Mark Recognition (OMR)
* Quality Assurance
* Indexing
* Migration
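
As an illustration of how separation and barcode recognition can work together, the following minimal sketch splits a scanned batch into documents at separator sheets. It assumes an upstream recognizer has already flagged separator pages; the `Page`, `Document`, and `separate` names and the `is_separator` flag are hypothetical and not taken from any particular product.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    """One scanned page. `is_separator` is a hypothetical flag that a
    barcode or patch code recognizer would set upstream."""
    number: int
    is_separator: bool = False
    barcode_value: Optional[str] = None


@dataclass
class Document:
    index_value: Optional[str]  # taken from the separator's barcode, if any
    pages: List[Page] = field(default_factory=list)


def separate(pages: List[Page]) -> List[Document]:
    """Split a batch into documents, starting a new one at each separator page."""
    documents: List[Document] = []
    current: Optional[Document] = None
    for page in pages:
        if page.is_separator:
            # The separator sheet itself is dropped; its barcode becomes index data.
            current = Document(index_value=page.barcode_value)
            documents.append(current)
            continue
        if current is None:
            # Pages before the first separator form an unindexed document.
            current = Document(index_value=None)
            documents.append(current)
        current.pages.append(page)
    return documents
```

In this sketch the barcode on the separator sheet doubles as the index value for the document that follows it, which is one common way to combine separation with indexing.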


Goal for Implementation of a Document Capture Solution

The goal of implementing a document capture solution is to reduce the time spent scanning, separating, enhancing, organizing, classifying, normalizing, and collecting information from document collections, and to produce metadata along with an image or PDF file and/or OCR text. This information is then migrated to a file share, FTP site, database, Document Management system, or Enterprise Content Management system. These systems often provide a search function that lets users find assets based on the produced metadata and then view them using document imaging software.
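
The migration step described above can be sketched for the simplest target, a file share. This is a hedged illustration only: the `migrate` function, the sidecar naming scheme (`.json` for metadata, `.txt` for OCR text), and the directory layout are assumptions, not a format defined by any capture product.

```python
import json
import shutil
from pathlib import Path


def migrate(image_path: Path, metadata: dict, ocr_text: str, target_dir: Path) -> Path:
    """Copy a captured image to target_dir and write metadata and OCR text beside it."""
    target_dir.mkdir(parents=True, exist_ok=True)
    target_image = target_dir / image_path.name
    shutil.copy2(image_path, target_image)

    # Sidecar files share the image's base name so a downstream system can pair them.
    target_image.with_suffix(".json").write_text(
        json.dumps(metadata, indent=2, sort_keys=True), encoding="utf-8"
    )
    target_image.with_suffix(".txt").write_text(ocr_text, encoding="utf-8")
    return target_image
```

A database or ECM target would replace the file copy with that system's own import interface, but the pairing of image, metadata, and text stays the same.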


Document Capture System Solutions - General


Integration with Document Management System

Enterprise Content Management (ECM) systems and their Document Management System (DMS) components are being adopted by many organizations as a corporate document management system for all types of electronic files, e.g. MS Word, PDF ... However, much of the information held by organizations is on paper, and this needs to be integrated within the same document repository. By scanning, organizations convert paper documents into image formats such as TIF, JPG, and PDF, and can also extract valuable index information or business data from the documents using OCR technology. Digital documents and associated metadata can be stored in the ECM in a variety of formats. The most popular of these formats is PDF, which not only provides an accurate representation of the document but also allows all the OCR text in the document to be stored behind the PDF image. This format is known as PDF with hidden text or text-searchable PDF. It allows users to search for documents by using keywords in the metadata fields or by searching the content of PDF files across the repository.
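
The two search paths mentioned above, metadata fields and document content, can be illustrated with a small in-memory sketch. It assumes the hidden OCR text has already been extracted from each PDF; `StoredDocument` and `search` are hypothetical names, and a real repository would use a proper full-text index rather than a linear scan.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StoredDocument:
    name: str
    metadata: Dict[str, str]  # index fields captured at scan time
    hidden_text: str          # OCR text stored behind the page image


def search(repository: List[StoredDocument], keyword: str) -> List[str]:
    """Return names of documents whose metadata values or OCR text contain keyword."""
    needle = keyword.casefold()
    hits = []
    for doc in repository:
        in_metadata = any(needle in value.casefold() for value in doc.metadata.values())
        in_content = needle in doc.hidden_text.casefold()
        if in_metadata or in_content:
            hits.append(doc.name)
    return hits
```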


Advantages of scanning documents into an ECM/DMS

Information held on paper is usually just as valuable to organizations as the electronic documents that are generated internally. Often this information represents a large proportion of the day-to-day correspondence with suppliers and customers. The ability to manage and share this information internally through a document management system such as SharePoint or a CMIS-compatible repository improves collaboration between departments or employees and reduces the risk of losing this information through disasters such as floods or fire. Organizations adopting an ECM/DMS often implement electronic workflow, which allows the information held on paper to be included as part of an electronic business process and incorporated into a customer record file along with other associated office documents and emails. For business-critical documents, such as purchase orders and supplier invoices, digitizing documents helps speed up business transactions and reduces the manual effort involved in keying data into business systems such as CRM, ERP, and accounting. Scanned invoices can also be routed to managers for payment approval via email or an electronic workflow.


Electronic Document Capture

In the earlier implementations of Document Capture Software, the technology focused solely on the digitization and capture of information from paper documents. Document images were acquired from document scanners via TWAIN/ISIS drivers. Only image-based file formats like TIF, JPG, and BMP were typically compatible with these solutions. In recent years, as the volume of electronically created documents and the number of proprietary file formats continue to increase at exponential rates, the need for handling documents that exist in electronic formats has grown. Document capture products have adapted to work with non-image file formats, with the end goal of creating a unified processing workflow capable of handling all incoming documents. The ability to import files from a variety of sources is one example of such adaptation. Importing documents from ECM/DMS software solutions, email servers, FTP, and EDI is now as much a requirement of document capture software as paper capture is. The normalization of output files to text-based PDF format is another critical factor in the long-term archival of proprietary electronic file formats. Normalization expands access to and usage of files to users throughout the enterprise, rather than only those who created the original electronic file.
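
A unified workflow of the kind described above has to decide, per incoming file, which steps are needed before normalization. The sketch below shows one way to make that decision from the file extension. The extension set and the step names (`ocr`, `convert_to_text_pdf`) are assumptions chosen for illustration; real products support many more formats and inspect file contents rather than names.

```python
from pathlib import PurePath
from typing import List

# Assumed extension set for illustration only.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".bmp"}


def plan_steps(filename: str) -> List[str]:
    """Return the hypothetical processing steps needed to normalize one incoming file."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        # Scanned images carry no text layer, so OCR has to run first.
        return ["ocr", "convert_to_text_pdf"]
    # Electronically created files already contain text; only conversion is needed.
    return ["convert_to_text_pdf"]
```

Both branches end in the same step, so paper-originated and electronic documents leave the workflow in one text-based PDF format regardless of how they arrived.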

