Cologne Phonetics
   HOME

TheInfoList



OR:

Cologne phonetics (also Kölner Phonetik, Cologne process) is a
phonetic algorithm A phonetic algorithm is an algorithm for indexing of words by their pronunciation. Most phonetic algorithms were developed for English and are not useful for indexing words in other languages. Because English spelling varies significantly depending ...
which assigns to words a sequence of digits, the phonetic code. The aim of this procedure is that identical sounding words have the same code assigned to them. The algorithm can be used to perform a similarity search between words. For example, it is possible in a name list to find entries like "Meier" under different spellings such as "Maier", "Mayer", or "Mayr". The Cologne phonetics is related to the well known
Soundex Soundex is a phonetic algorithm for indexing names by sound, as pronounced in English. The goal is for homophones to be encoded to the same representation so that they can be matched despite minor differences in spelling. The algorithm mainly enc ...
phonetic algorithm but is optimized to match the German language. The algorithm was published in 1969 by Hans Joachim Postel.


Method

The Cologne phonetics matches each letter of a word to a digit between "0" and "8". To select the appropriate digit, at most one adjacent letter is used as a context. Some rules apply specifically to the initials of words. In this way similar sounds are supposed to be assigned the same code. The letters "W" and "V" for example, are both encoded with the number "3". The phonetic code for "Wikipedia" is "3412" (W=3, K=4, P=1, and D=2). Unlike the Soundex code, the length of the codes from the Cologne phonetics method is not limited.


Procedure

{, class="wikitable" ! Letter !! Context !!align="center", Code , - , A, E, I, J, O, U, Y , , , , align="center", 0 , - , H , , , , align="center", - , - , B , , , , align="center" rowspan="2", 1 , - , P , , not before H , - , D, T , , not before C, S, Z , , align="center", 2 , - , F, V, W , , , , align="center" rowspan="2", 3 , - , P , , before H , - , G, K, Q , , , , align="center" rowspan="3", 4 , - , rowspan="2", C , , in the initial sound before A, H, K, L, O, Q, R, U, X , - , before A, H, K, O, Q, U, X except after S, Z , - , X , , not after C, K, Q , , align="center", 48 , - , L , , , , align="center", 5 , - , M, N , , , , align="center", 6 , - , R , , , , align="center", 7 , - , S, Z , , , , align="center" rowspan="6", 8 , - , rowspan="3", C , , after S, Z , - , in initial position except before A, H, K, L, O, Q, R, U, X , - , not before A, H, K, O, Q, U, X , - , D, T , , before C, S, Z , - , X , , after C, K, Q That for the letter "C" the rule "SC" has priority over the rule "CH" was taken into account by the addition of ''"except after S, Z"'' in line 10 of the table. This is not explicitly mentioned in the original publication but can be inferred from the examples listed there, e.g. for "Breschnew" the code "17863" is specified. Lowercase letters are encoded accordingly; all other characters (such as hyphens) are ignored. For the umlauts Ä, Ö, Ü, as well as ß, that are not taken into account in the conversion table, it suggests itself to match them to the vowels (code "0") respective to the group S, Z (code "8"). Processing of a word is done in three steps: # Encode letter by letter from left to right according to the conversion table. # Remove all digits occurring more than once next to each other. # Remove all code "0" except at the beginning.


Example

The name Müller-Lüdenscheidt will be coded as follows: # Encode each letter: 60550750206880022 # Collapse of all multiple consecutive code digits: 6050750206802 # Remove all "0" digits: 65752682


Literature

Hans Joachim Postel: ''Die Kölner Phonetik. Ein Verfahren zur Identifizierung von Personennamen auf der Grundlage der Gestaltanalyse.'' in: IBM-Nachrichten, 19. Jahrgang, 1969, S. 925-931.


See also

*
Metaphone Metaphone is a phonetic algorithm, published by Lawrence Philips in 1990, for indexing words by their English pronunciation. It fundamentally improves on the Soundex algorithm by using information about variations and inconsistencies in English sp ...


External links

* Martin Wilz
Aspekte der Kodierung phonetischer Ähnlichkeiten in deutschen Eigennamen
(PDF-Datei; 502 kB). Magisterarbeit an der Philosophischen Fakultät der Universität zu Köln, 2005; enthält eine Implementierung in der Programmiersprache
Perl Perl is a family of two high-level, general-purpose, interpreted, dynamic programming languages. "Perl" refers to Perl 5, but from 2000 to 2019 it also referred to its redesigned "sister language", Perl 6, before the latter's name was offici ...
. * Maroš Kollár
Perl-Implementierung der Kölner Phonetik
und ähnlicher Verfahren als freie Software im CPAN (Comprehensive Perl Archive Network) * Andy Theiler
PHP und Oracle PL/SQL-Implementierung
der Kölner Phonetik * Nicolas Zimmer
PHP-Implementation der Kölner Phonetik
in einem Kommentar zum Eintrag ''soundex'' im PHP-Manual, 2008. Phonetic algorithms