Pronunciation Assessment

Automatic pronunciation assessment is the use of speech recognition to verify the correctness of pronounced speech, as distinguished from manual assessment by an instructor or proctor. Also called speech verification, pronunciation evaluation, and pronunciation scoring, this technology is mainly applied in computer-aided pronunciation teaching (CAPT) when combined with
computer-aided instruction for computer-assisted language learning (CALL), speech remediation, or accent reduction. Pronunciation assessment does not determine unknown speech (as in dictation or automatic transcription). Instead, knowing the expected word(s) in advance, it attempts to verify the correctness of the learner's pronunciation and ideally their intelligibility to listeners, sometimes along with often inconsequential prosody such as intonation, pitch, tempo, rhythm, and syllable and word stress. Pronunciation assessment is also used in reading tutoring, for example in products such as Microsoft Teams and those from Amira Learning. Automatic pronunciation assessment can also be used to help diagnose and treat speech disorders such as apraxia.

The earliest work on pronunciation assessment avoided measuring genuine listener intelligibility, a shortcoming corrected in 2011 at the Toyohashi University of Technology. That correction is included in the
Versant high-stakes English fluency assessment from Pearson and mobile apps from 17zuoye Education & Technology, but still missing in 2023 products from Google Search, Microsoft, Educational Testing Service, Speechace, and ELSA. Assessing authentic listener intelligibility is essential for avoiding inaccuracies from accent bias, especially in high-stakes assessments; from words with multiple correct pronunciations; and from phoneme coding errors in machine-readable pronunciation dictionaries.

In 2022, researchers found that some newer speech-to-text systems, based on end-to-end reinforcement learning to map audio signals directly into words, produce word and phrase confidence scores closely correlated with genuine listener intelligibility. In the
Common European Framework of Reference for Languages (CEFR) assessment criteria for "overall phonological control", intelligibility outweighs formally correct pronunciation at all levels.

Although there are as yet no industry-standard benchmarks for evaluating pronunciation assessment accuracy, researchers occasionally release evaluation speech corpora for others to use in improving assessment quality. Such evaluation databases often emphasize formally unaccented pronunciation to the exclusion of genuine intelligibility evident from blinded listener transcriptions. Promising areas for improvement under development in 2024 include articulatory feature extraction and transfer learning to suppress unnecessary corrections. Other advances under development include "augmented reality" interfaces for mobile devices that use optical character recognition to provide pronunciation training on text found in user environments.
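
As a rough illustration of how intelligibility might be estimated from blinded listener transcriptions, the following Python sketch aligns each listener's transcript with the expected words and reports, for each expected word, the fraction of listeners who wrote it down. This is only a minimal sketch under stated assumptions, not the method of any product or study named above: the function names `normalize` and `word_intelligibility`, the punctuation handling, and the use of `difflib` for alignment are all illustrative choices.

```python
import difflib

PUNCTUATION = ".,;:!?\"'"


def normalize(text: str) -> list[str]:
    """Lowercase the text and strip surrounding punctuation from each word."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def word_intelligibility(expected: str, transcripts: list[str]) -> list[tuple[str, float]]:
    """Return (word, fraction of listeners who transcribed it) for each expected word.

    Assumption: a word counts as understood by a listener when a sequence
    alignment matches it to the same word in that listener's transcript.
    """
    if not transcripts:
        raise ValueError("at least one listener transcript is required")
    expected_words = normalize(expected)
    heard_counts = [0] * len(expected_words)
    for transcript in transcripts:
        heard_words = normalize(transcript)
        matcher = difflib.SequenceMatcher(a=expected_words, b=heard_words, autojunk=False)
        # Each matching block covers a run of expected words the listener reproduced.
        for block in matcher.get_matching_blocks():
            for i in range(block.a, block.a + block.size):
                heard_counts[i] += 1
    return [(w, n / len(transcripts)) for w, n in zip(expected_words, heard_counts)]


if __name__ == "__main__":
    # Hypothetical example: three listeners transcribe the same learner recording.
    scores = word_intelligibility(
        "The ship sails today.",
        ["the ship sails today", "the sheep sails today", "the sheep sells today"],
    )
    for word, fraction in scores:
        print(f"{word}: {fraction:.2f}")
```

In this hypothetical example, "the" and "today" are understood by all three listeners, "sails" by two of them, and "ship" by only one, so a score of this kind reflects what listeners actually understood, not how closely the learner matched a dictionary pronunciation.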


See also

* Phonetics
* Speech segmentation, often called "forced alignment" (of audio to its expected phonemes) in this context
* Statistical classification


External links

* International Speech Communication Association (ISCA) Special Interest Group on Speech and Language Technologies in Education (SLaTE)