Fuzzy matching is a technique used in computer-assisted translation as a special case of record linkage. It works with matches that may be less than 100% perfect when finding correspondences between segments of a text and entries in a database of previous translations. It usually operates on sentence-level segments, but some translation technology allows matching at a phrasal level. It is used when the translator is working with translation memory (TM), and it relies on approximate string matching.


Background

When no exact match can be found in the TM database for the text being translated, the tool can search for a match that is less than exact. The translator sets the fuzzy match threshold to a percentage value below 100%, and the database then returns any stored matches that meet that threshold. The primary function of fuzzy matching is to assist the translator by speeding up the translation process; it is not designed to replace the human translator.
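The threshold lookup described above can be illustrated with a minimal Python sketch. It is not how any particular TM tool is implemented: the function names `similarity` and `fuzzy_lookup`, the 75% default threshold, and the sample memory are all assumptions made for illustration, and the score comes from the standard library's `difflib.SequenceMatcher` rather than from the scoring formula of a commercial product.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score for two segments (case-insensitive)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_lookup(segment: str, memory: dict, threshold: float = 75.0) -> list:
    """Return (score, source, target) tuples scoring at or above the threshold."""
    hits = []
    for source, target in memory.items():
        score = similarity(segment, source)
        if score >= threshold:
            hits.append((score, source, target))
    # Best proposal first.
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Hypothetical translation memory: source segment -> stored translation.
memory = {
    "Click the Save button to store the file.":
        "Cliquez sur le bouton Enregistrer pour stocker le fichier.",
    "The printer is out of paper.":
        "L'imprimante n'a plus de papier.",
}

new_segment = "Click the Save button to store the document."
for score, source, target in fuzzy_lookup(new_segment, memory):
    print(f"{score:.0f}% match: {source!r} -> {target!r}")
```

Raising the threshold returns fewer but closer proposals; lowering it returns more proposals that need heavier editing.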


History

Because of the polymorphous and dynamic nature of language, particularly English (which accounts for 90% of all source texts undergoing translation in the localisation industry [citation needed]), methods are always being sought to make the translation process easier and faster. Since the late 1980s, translation memory tools have been developed to increase translators' productivity. In the 1990s, fuzzy matching began to take off as a prominent feature of TM tools, and despite some issues concerning the extra work involved in editing a fuzzy match "proposal", it remains popular. It is currently a feature of most popular TM tools.


Methodology

The TM tool searches the database to locate segments that are an approximate match for a segment in a new source text to be translated. The TM, in effect, "proposes" the match to the translator; it is then up to the translator to accept this proposal or to edit it so that it corresponds more fully to the new source text. In this way, fuzzy matching can speed up the translation process and lead to increased productivity.

This raises questions, however, about the quality of the resulting translations. On occasion a translator is under pressure to deliver on time and is thus led to accept a fuzzy match proposal without checking its suitability and context. TM databases are built up by input from numerous different translators working on a variety of different texts, with a danger that sentences extracted from this word "tapestry" will be a stitched-together hodgepodge of styles, the antithesis of the consistency being sought, which some critics have dubbed "sentence salad". The question of how far to trust the TM's proposals can be a problem when trying to strike a balance between a faster translation process and the quality of that translation. Nevertheless, fuzzy matching is still an important part of the translator's tool-kit.
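One way to support the "accept or edit" step is to show the translator exactly where the stored source segment differs from the new one. The sketch below is an illustrative assumption rather than a description of any existing tool: the helper name `word_changes` is hypothetical, and it uses word-level opcodes from Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def word_changes(stored_source: str, new_source: str) -> list:
    """List (operation, old_words, new_words) for each differing word span."""
    old, new = stored_source.split(), new_source.split()
    matcher = SequenceMatcher(None, old, new)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only the spans the translator must review
    ]


changes = word_changes(
    "Click the Save button to store the file.",
    "Click the Save button to store the document.",
)
for operation, old_words, new_words in changes:
    print(operation, old_words, new_words)
```

An empty result means the two source segments contain the same words in the same order, so the stored translation is more likely to be reusable as-is; any reported span marks the part of the proposal that probably needs editing, although context still has to be checked by the translator.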


External links


Esselink, B. (2000). A Practical Guide to Localization. John Benjamins Publishing Company.