Speech Recognition & Synthesis, formerly known as Speech Services, is a screen reader application developed by Google for its Android operating system. It enables applications to read aloud (speak) the text on the screen, with support for many languages. Its text-to-speech output may be used by apps such as Google Play Books for reading books aloud, Google Translate for reading translations aloud to demonstrate the pronunciation of words, Google TalkBack, and other spoken-feedback accessibility applications, as well as by third-party apps. Users must install voice data for each language.

Supported languages

* Afrikaans (South Africa)
* Albanian (Albania)
* Amharic (Ethiopia)
* Arabic (Saudi Arabia)
* Assamese (India)
* Basque (Spain)
* Bengali (Bangladesh)
* Bengali (India)
* Bodo (India)
* Bosnian (Bosnia and Herzegovina)
* Bulgarian (Bulgaria)
* Burmese (Myanmar)
* Cantonese (Hong Kong)
* Catalan (Spain)
* Chinese (China)
* Chinese (Taiwan)
* Croatian (Croatia)
* Czech (Czech Republic)
* Danish (Denmark)
* Dogri (India)
* Dutch (Belgium)
* Dutch (Netherlands)
* English (Australia)
* English (Nigeria)
* English (India)
* English (United Kingdom)
* English (United States)
* Estonian (Estonia)
* Filipino (Philippines)
* Finnish (Finland)
* French (Canada)
* French (France)
* Galician (Spain)
* German (Germany)
* Greek (Greece)
* Gujarati (India)
* Hausa (Nigeria)
* Hebrew (Israel)
* Hindi (India)
* Hungarian (Hungary)
* Icelandic (Iceland)
* Indonesian (Indonesia)
* Italian (Italy)
* Japanese (Japan)
* Javanese (Indonesia)
* Kannada (India)
* Kashmiri (India)
* Khmer (Cambodia)
* Konkani (India)
* Korean (South Korea)
* Latin (Vatican City)
* Latvian (Latvia)
* Lithuanian (Lithuania)
* Maithili (India)
* Malay (Malaysia)
* Malayalam (India)
* Manipuri (India)
* Marathi (India)
* Nepali (Nepal)
* Norwegian (Norway)
* Odia (India)
* Polish (Poland)
* Portuguese (Brazil)
* Portuguese (Portugal)
* Punjabi (India)
* Romanian (Romania)
* Russian (Russia)
* Sanskrit (India)
* Santali (India)
* Serbian (Serbia)
* Sindhi (India)
* Sinhala (Sri Lanka)
* Slovak (Slovakia)
* Slovenian (Slovenia)
* Spanish (Spain)
* Spanish (United States)
* Sundanese (Indonesia)
* Swahili (Kenya)
* Swedish (Sweden)
* Tamil (India)
* Telugu (India)
* Thai (Thailand)
* Turkish (Turkey)
* Ukrainian (Ukraine)
* Urdu (Pakistan)
* Urdu (India)
* Vietnamese (Vietnam)
* Welsh (United Kingdom)

History

Some developers, such as Hyundai in 2015, have adapted their Android Auto apps to include Text-to-Speech. Apps such as textPlus and WhatsApp use Text-to-Speech to read notifications aloud and provide voice-reply functionality. Google Cloud Text-to-Speech is powered by WaveNet, software created by Google's UK-based AI subsidiary DeepMind, which Google acquired in 2014. The service aims to distinguish itself from its competitors, Amazon and Microsoft.

Most voice synthesizers (including Apple's Siri) use concatenative synthesis, in which a program stores individual phonemes and then pieces them together to form words and sentences. WaveNet instead synthesizes speech with human-like emphasis and inflection on syllables, phonemes, and words. Unlike most other text-to-speech systems, a WaveNet model creates raw audio waveforms from scratch. The model uses a neural network trained on a large volume of speech samples. During training, the network learns the underlying structure of speech, such as which tones follow each other and what a realistic speech waveform looks like. Given a text input, the trained model generates the corresponding speech waveform one sample at a time, at up to 24,000 samples per second, with smooth transitions between the individual sounds.

The service was renamed Speech Recognition & Synthesis in 2023.
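The difference between concatenative synthesis and WaveNet's sample-by-sample generation can be illustrated with a toy sketch in Python. `concatenative` joins pre-recorded waveform snippets, one per phoneme. `generate` builds a waveform one sample at a time, drawing each sample from a probability distribution conditioned on the samples already produced. `toy_next_sample_distribution` is a hypothetical placeholder for WaveNet's trained neural network, quantizing amplitudes to 256 levels is an assumption made for this sketch, and the text input that conditions the real model is omitted. None of these names correspond to a Google API.

```python
from __future__ import annotations

import math
import random

SAMPLE_RATE = 24_000  # upper sample rate quoted in the text (samples per second)


def concatenative(units: dict[str, list[float]], phonemes: list[str]) -> list[float]:
    """Concatenative synthesis: look up a stored waveform per phoneme and join them end to end."""
    out: list[float] = []
    for p in phonemes:
        out.extend(units[p])  # raises KeyError if no unit was recorded for this phoneme
    return out


def toy_next_sample_distribution(history: list[int], levels: int) -> list[float]:
    """HYPOTHETICAL stand-in for a trained network.

    Returns one probability per quantized amplitude level, favoring levels close
    to the previous sample. A real WaveNet conditions on a long window of past
    samples and on features derived from the input text.
    """
    # Start at the middle level (zero amplitude for a symmetric quantizer).
    prev = history[-1] if history else levels // 2
    weights = [math.exp(-((k - prev) ** 2) / 8.0) for k in range(levels)]
    total = sum(weights)
    return [w / total for w in weights]


def generate(num_samples: int, levels: int = 256, seed: int = 0) -> list[int]:
    """Autoregressive generation: draw each sample conditioned on all samples so far."""
    rng = random.Random(seed)
    samples: list[int] = []
    for _ in range(num_samples):
        probs = toy_next_sample_distribution(samples, levels)
        samples.append(rng.choices(range(levels), weights=probs, k=1)[0])
    return samples


if __name__ == "__main__":
    units = {"h": [0.0, 0.2], "i": [0.5, -0.5, 0.1]}  # hypothetical recorded units
    print(concatenative(units, ["h", "i"]))  # [0.0, 0.2, 0.5, -0.5, 0.1]
    # A tenth of a second at 24,000 samples per second is 2,400 sequential steps.
    audio = generate(SAMPLE_RATE // 10)
    print(len(audio))  # 2400
```

Because each generated sample depends on the ones before it, one second of audio at 24,000 samples per second requires 24,000 sequential steps through the model, which is why naive sample-by-sample generation is far more computationally expensive than joining stored units.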

