Caverphone

Caverphone, within linguistics and computing, is a phonetic matching algorithm invented to identify English names by their sounds. It was originally built to process a custom dataset compiled between 1893 and 1938 in southern Dunedin, New Zealand. It started from a similar concept to Metaphone and has since been developed to accommodate general English.


Etymology

Caverphone was created by David Hood in the Caversham Project at the University of Otago in New Zealand in 2002, and revised in 2004. It was created to assist in data matching between late 19th-century and early 20th-century electoral rolls, where the name only needed to be in a "commonly recognisable form". The algorithm was intended to apply to those names that could not easily be matched between electoral rolls, after the exact matches were removed from the pool of potential matches. The algorithm is optimised for accents present in the study area (the southern part of the city of Dunedin, New Zealand).


Procedure


Caverphone 1.0

The rules of the algorithm are applied consecutively to any particular name, as a series of replacements. The algorithm is as follows:
# Convert to lowercase
# Remove anything not a-z
# If the name starts with
## cough, replace it by cou2f
## rough, replace it by rou2f
## tough, replace it by tou2f
## enough, replace it by enou2f
## gn, replace it by 2n
# If the name ends with
## mb, replace it by m2
# Replace
## cq with 2q
## ci with si
## ce with se
## cy with sy
## tch with 2ch
## c with k
## q with k
## x with k
## v with f
## dg with 2g
## tio with sio
## tia with sia
## d with t
## ph with fh
## b with p
## sh with s2
## z with s
## any initial vowel with an A
## all other vowels with a 3
## 3gh3 with 3kh3
## gh with 22
## g with k
## groups of the letter s with a S
## groups of the letter t with a T
## groups of the letter p with a P
## groups of the letter k with a K
## groups of the letter f with a F
## groups of the letter m with a M
## groups of the letter n with a N
## w3 with W3
## wy with Wy
## wh3 with Wh3
## why with Why
## w with 2
## any initial h with an A
## all other occurrences of h with a 2
## r3 with R3
## ry with Ry
## r with 2
## l3 with L3
## ly with Ly
## l with 2
## j with y
## y3 with Y3
## y with 2
# Remove all
## 2
## 3
# Put six 1s on the end
# Take the first six characters as the code
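The rules above can be sketched in Python as a chain of string replacements. This is a minimal illustration that follows the rule list as written here, not a reference implementation; the function name `caverphone1` and the table names are assumptions made for this example.

```python
import re

# Prefix rules, applied only at the start of the name.
PREFIXES_V1 = (("cough", "cou2f"), ("rough", "rou2f"), ("tough", "tou2f"),
               ("enough", "enou2f"), ("gn", "2n"))

# Plain substring replacements applied in order, before vowel handling.
EARLY_V1 = (("cq", "2q"), ("ci", "si"), ("ce", "se"), ("cy", "sy"),
            ("tch", "2ch"), ("c", "k"), ("q", "k"), ("x", "k"), ("v", "f"),
            ("dg", "2g"), ("tio", "sio"), ("tia", "sia"), ("d", "t"),
            ("ph", "fh"), ("b", "p"), ("sh", "s2"), ("z", "s"))

# Plain substring replacements applied in order, after the initial-h rule.
LATE_V1 = (("r3", "R3"), ("ry", "Ry"), ("r", "2"), ("l3", "L3"), ("ly", "Ly"),
           ("l", "2"), ("j", "y"), ("y3", "Y3"), ("y", "2"))


def caverphone1(name: str) -> str:
    """Sketch of Caverphone 1.0, following the rule list step by step."""
    s = re.sub(r"[^a-z]", "", name.lower())
    for prefix, repl in PREFIXES_V1:
        if s.startswith(prefix):
            s = repl + s[len(prefix):]
            break
    if s.endswith("mb"):
        s = s[:-2] + "m2"
    for old, new in EARLY_V1:
        s = s.replace(old, new)
    s = re.sub(r"^[aeiou]", "A", s)   # initial vowel
    s = re.sub(r"[aeiou]", "3", s)    # all other vowels
    s = s.replace("3gh3", "3kh3").replace("gh", "22").replace("g", "k")
    for letter in "stpkfmn":          # collapse runs into one capital
        s = re.sub(letter + "+", letter.upper(), s)
    for old, new in (("w3", "W3"), ("wy", "Wy"), ("wh3", "Wh3"),
                     ("why", "Why"), ("w", "2")):
        s = s.replace(old, new)
    s = re.sub(r"^h", "A", s)         # initial h
    s = s.replace("h", "2")           # all other h
    for old, new in LATE_V1:
        s = s.replace(old, new)
    s = s.replace("2", "").replace("3", "")
    return (s + "111111")[:6]         # pad with 1s, keep six characters
```

Under this sketch, `caverphone1("Lee")` gives `L11111` and `caverphone1("Thompson")` gives `TMPSN1`, which agrees with the worked traces in the Examples section.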


Caverphone 2.0

# Start with a word
# Convert to lowercase
# Remove anything not in the standard alphabet (typically a-z)
# Remove final e
# If the name starts with
## cough make it cou2f
## rough make it rou2f
## tough make it tou2f
## enough make it enou2f
## trough make it trou2f
## gn make it 2n
# If the name ends with
## mb make it m2
# Replace
## cq with 2q
## ci with si
## ce with se
## cy with sy
## tch with 2ch
## c with k
## q with k
## x with k
## v with f
## dg with 2g
## tio with sio
## tia with sia
## d with t
## ph with fh
## b with p
## sh with s2
## z with s
## an initial vowel with an A
## all other vowels with a 3
## j with y
## an initial y3 with Y3
## an initial y with A
## y with 3
## 3gh3 with 3kh3
## gh with 22
## g with k
## groups of the letter s with a S
## groups of the letter t with a T
## groups of the letter p with a P
## groups of the letter k with a K
## groups of the letter f with a F
## groups of the letter m with a M
## groups of the letter n with a N
## w3 with W3
## wh3 with Wh3
## if the name ends in w replace the final w with 3
## w with 2
## an initial h with an A
## all other occurrences of h with a 2
## r3 with R3
## if the name ends in r replace the final r with 3
## r with 2
## l3 with L3
## if the name ends in l replace the final l with 3
## l with 2
# Remove all 2s
# If the name ends in 3, replace the final 3 with A
# Remove all 3s
# Put ten 1s on the end
# Take the first ten characters as the code
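The revised rules can be sketched the same way. The following Python is an illustrative sketch that mirrors the list above in order, not an official implementation; the function name `caverphone2` and the table names are assumptions made for this example. The main differences from version 1.0 are the removal of a final e, the treatment of y as a vowel-like letter, the end-of-name rules for w, r and l, and the ten-character code.

```python
import re

# Prefix rules, applied only at the start of the name.
PREFIXES_V2 = (("cough", "cou2f"), ("rough", "rou2f"), ("tough", "tou2f"),
               ("enough", "enou2f"), ("trough", "trou2f"), ("gn", "2n"))

# Plain substring replacements applied in order, before vowel handling.
EARLY_V2 = (("cq", "2q"), ("ci", "si"), ("ce", "se"), ("cy", "sy"),
            ("tch", "2ch"), ("c", "k"), ("q", "k"), ("x", "k"), ("v", "f"),
            ("dg", "2g"), ("tio", "sio"), ("tia", "sia"), ("d", "t"),
            ("ph", "fh"), ("b", "p"), ("sh", "s2"), ("z", "s"))


def caverphone2(name: str) -> str:
    """Sketch of Caverphone 2.0, following the rule list step by step."""
    s = re.sub(r"[^a-z]", "", name.lower())
    s = re.sub(r"e$", "", s)              # remove final e
    for prefix, repl in PREFIXES_V2:
        if s.startswith(prefix):
            s = repl + s[len(prefix):]
            break
    if s.endswith("mb"):
        s = s[:-2] + "m2"
    for old, new in EARLY_V2:
        s = s.replace(old, new)
    s = re.sub(r"^[aeiou]", "A", s)       # initial vowel
    s = re.sub(r"[aeiou]", "3", s)        # all other vowels
    s = s.replace("j", "y")
    s = re.sub(r"^y3", "Y3", s)
    s = re.sub(r"^y", "A", s)
    s = s.replace("y", "3")
    s = s.replace("3gh3", "3kh3").replace("gh", "22").replace("g", "k")
    for letter in "stpkfmn":              # collapse runs into one capital
        s = re.sub(letter + "+", letter.upper(), s)
    s = s.replace("w3", "W3").replace("wh3", "Wh3")
    s = re.sub(r"w$", "3", s).replace("w", "2")
    s = re.sub(r"^h", "A", s).replace("h", "2")
    s = s.replace("r3", "R3")
    s = re.sub(r"r$", "3", s).replace("r", "2")
    s = s.replace("l3", "L3")
    s = re.sub(r"l$", "3", s).replace("l", "2")
    s = s.replace("2", "")                # remove all 2s
    s = re.sub(r"3$", "A", s)             # a final 3 becomes A
    s = s.replace("3", "")                # remove all remaining 3s
    return (s + "1" * 10)[:10]            # pad with 1s, keep ten characters
```

Under this sketch, `caverphone2("Lee")` gives `LA11111111` and `caverphone2("Thompson")` gives `TMPSN11111`, which agrees with the worked traces in the Examples section.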


Examples


Caverphone 1.0

Lee -> lee
lee -> l33
l33 -> L33
L33 -> L
L -> L111111
L111111 -> L11111

Thompson -> thompson
thompson -> th3mps3n
th3mps3n -> th3mpS3n
th3mpS3n -> Th3mpS3n
Th3mpS3n -> Th3mPS3n
Th3mPS3n -> Th3MPS3n
Th3MPS3n -> Th3MPS3N
Th3MPS3N -> T23MPS3N
T23MPS3N -> TMPSN
TMPSN -> TMPSN111111
TMPSN111111 -> TMPSN1


Caverphone 2.0

Lee -> lee
lee -> le
le -> l3
l3 -> L3
L3 -> LA
LA -> LA1111111111
LA1111111111 -> LA11111111

Thompson -> thompson
thompson -> th3mps3n
th3mps3n -> th3mpS3n
th3mpS3n -> Th3mpS3n
Th3mpS3n -> Th3mPS3n
Th3mPS3n -> Th3MPS3n
Th3MPS3n -> Th3MPS3N
Th3MPS3N -> T23MPS3N
T23MPS3N -> TMPSN
TMPSN -> TMPSN1111111111
TMPSN1111111111 -> TMPSN11111


See also

* Soundex
* New York State Identification and Intelligence System
* Match rating approach
* Metaphone
* Cologne phonetics



External links


* Caversham Project - Caversham data set of names and accents in the southern part of Dunedin, New Zealand in 1893-1938.
* Original (2002) Caverphone algorithm
* Revised (2004) Caverphone algorithm
* Implementations:
** C# Revised Implementation
** Java implementation in the Apache Commons Codec project
** PHP implementation
** Python implementation: caverphone algorithm (version 2.0) - AdvaS Advanced Search project