Google Text-to-Speech

Speech Recognition & Synthesis, formerly known as Speech Services, is a screen reader application developed by Google for its Android operating system. It powers applications to read aloud (speak) the text on the screen, with support for many languages. Text-to-Speech may be used by apps such as Google Play Books for reading books aloud, Google Translate for reading translations aloud so that users can hear how words are pronounced, Google TalkBack, and other spoken-feedback accessibility applications, as well as by third-party apps. Users must install voice data for each language.


Supported languages

* Afrikaans (South Africa)
* Albanian (Albania)
* Amharic (Ethiopia)
* Arabic (Saudi Arabia)
* Assamese (India)
* Basque (Spain)
* Bengali (Bangladesh)
* Bengali (India)
* Bodo (India)
* Bosnian (Bosnia and Herzegovina)
* Bulgarian (Bulgaria)
* Burmese (Myanmar)
* Cantonese (Hong Kong)
* Catalan (Spain)
* Chinese (China)
* Chinese (Taiwan)
* Croatian (Croatia)
* Czech (Czech Republic)
* Danish (Denmark)
* Dogri (India)
* Dutch (Belgium)
* Dutch (Netherlands)
* English (Australia)
* English (Nigeria)
* English (India)
* English (United Kingdom)
* English (United States)
* Estonian (Estonia)
* Filipino (Philippines)
* Finnish (Finland)
* French (Canada)
* French (France)
* Galician (Spain)
* German (Germany)
* Greek (Greece)
* Gujarati (India)
* Hausa (Nigeria)
* Hebrew (Israel)
* Hindi (India)
* Hungarian (Hungary)
* Icelandic (Iceland)
* Indonesian (Indonesia)
* Italian (Italy)
* Japanese (Japan)
* Javanese (Indonesia)
* Kannada (India)
* Kashmiri (India)
* Khmer (Cambodia)
* Konkani (India)
* Korean (South Korea)
* Latin (Vatican City)
* Latvian (Latvia)
* Lithuanian (Lithuania)
* Maithili (India)
* Malay (Malaysia)
* Malayalam (India)
* Manipuri (India)
* Marathi (India)
* Nepali (Nepal)
* Norwegian (Norway)
* Odia (India)
* Polish (Poland)
* Portuguese (Brazil)
* Portuguese (Portugal)
* Punjabi (India)
* Romanian (Romania)
* Russian (Russia)
* Sanskrit (India)
* Santali (India)
* Serbian (Serbia)
* Sindhi (India)
* Sinhala (Sri Lanka)
* Slovak (Slovakia)
* Slovenian (Slovenia)
* Spanish (Spain)
* Spanish (United States)
* Sundanese (Indonesia)
* Swahili (Kenya)
* Swedish (Sweden)
* Tamil (India)
* Telugu (India)
* Thai (Thailand)
* Turkish (Turkey)
* Ukrainian (Ukraine)
* Urdu (Pakistan)
* Urdu (India)
* Vietnamese (Vietnam)
* Welsh (United Kingdom)


History

Some app developers have adapted their Android Auto apps to include Text-to-Speech, such as Hyundai in 2015. Apps such as textPlus and WhatsApp use Text-to-Speech to read notifications aloud and provide voice-reply functionality.

Google Cloud Text-to-Speech is powered by WaveNet, software created by Google's UK-based AI subsidiary DeepMind, which was bought by Google in 2014. It tries to distinguish itself from its competitors, Amazon and Microsoft. Most voice synthesizers (including Apple's Siri) use concatenative synthesis, in which a program stores individual phonemes and then pieces them together to form words and sentences. WaveNet synthesizes speech with human-like emphasis and inflection on syllables, phonemes, and words. Unlike most other text-to-speech systems, a WaveNet model creates raw audio waveforms from scratch. The model uses a neural network that has been trained on a large volume of speech samples. During training, the network extracts the underlying structure of the speech, such as which tones follow each other and what a realistic speech waveform looks like. When given a text input, the trained WaveNet model can generate the corresponding speech waveforms from scratch, one sample at a time, with up to 24,000 samples per second and smooth transitions between the individual sounds.

The service was renamed Speech Recognition & Synthesis in 2023.
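The difference between concatenative synthesis and sample-by-sample generation, as described in this section, can be illustrated with a toy sketch. This is not WaveNet and not any Google API: the function names, the tiny unit inventory, and the tone predictor are all hypothetical stand-ins chosen for illustration. In particular, the predictor here is a fixed two-sample recurrence that produces a pure tone, whereas a real model would be a trained neural network conditioned on the input text.

```python
import math

SAMPLE_RATE = 24_000  # samples per second (upper figure quoted in the text)


def concatenate_units(unit_names, inventory):
    """Concatenative synthesis: join stored recordings end to end."""
    waveform = []
    for name in unit_names:
        waveform.extend(inventory[name])
    return waveform


def make_tone_predictor(freq_hz, sample_rate=SAMPLE_RATE):
    """Hypothetical stand-in for a trained network.

    Predicts the next sample of a pure tone from the two previous samples,
    using sin((n+1)t) = 2*cos(t)*sin(n*t) - sin((n-1)t).
    """
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)

    def predict_next(history):
        return coeff * history[-1] - history[-2]

    return predict_next


def tone_seed(freq_hz, sample_rate=SAMPLE_RATE):
    """First two samples of sin(2*pi*f*t), used to start the recurrence."""
    step = 2.0 * math.pi * freq_hz / sample_rate
    return [0.0, math.sin(step)]


def generate_autoregressive(predict_next, seed, n_samples):
    """Generate a waveform one sample at a time, feeding each output back in."""
    waveform = list(seed)
    while len(waveform) < n_samples:
        waveform.append(predict_next(waveform))
    return waveform[:n_samples]


if __name__ == "__main__":
    # Toy inventory: each "unit" is a short list of made-up amplitude values.
    inventory = {"h": [0.1, 0.2], "i": [0.3, 0.1, -0.2]}
    print(concatenate_units(["h", "i"], inventory))

    # Half a second of audio at 24,000 samples per second needs
    # 12,000 sequential prediction steps.
    n = int(0.5 * SAMPLE_RATE)
    wave = generate_autoregressive(make_tone_predictor(440.0), tone_seed(440.0), n)
    print(len(wave))
```

The sketch makes one point concrete: concatenation is a lookup and a join, while sample-by-sample generation is a sequential loop in which every output depends on earlier outputs, so the number of prediction steps grows with the duration of the audio multiplied by the sample rate.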


See also

* Speech synthesis
* VoiceOver
* Live Transcribe

