CereProc is a speech synthesis company based in Edinburgh, Scotland, founded in 2005. The company specialises in creating natural and expressive-sounding text-to-speech voices, synthesis voices with regional accents, and in voice cloning.

Voice building technology

CereProc creates voices using two different voice-building technologies: unit selection synthesis and parametric modelling.

CereProc's unit selection voices are built from large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, syllables, morphemes, words, phrases, and sentences. The division into segments is done using a specially modified speech recogniser. An index of the units in the speech database is then created based on the segmentation and on acoustic parameters such as the fundamental frequency (pitch), duration, position in the syllable, and neighbouring phones. At runtime, the desired target utterance is created by determining the best chain of candidate units from the database (unit selection). Unit selection provides the greatest naturalness, because it applies digital signal processing (DSP) to the recorded speech only at concatenation points; DSP often makes recorded speech sound less natural.

CereProc's parametric voices produce speech using statistical modelling. In this system, the frequency spectrum (vocal tract), fundamental frequency (vocal source), and duration (prosody) of speech are modelled simultaneously. Speech waveforms are generated from these parameters using a vocoder. Critically, these voices can be built from significantly less recorded speech than unit selection voices and have a much smaller footprint when installed. Because of this, they are used for private voice cloning.
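The runtime step of unit selection, finding the best chain of candidate units, is commonly framed as a shortest-path search that balances how well each unit matches its target against how smoothly adjacent units join. The following is a minimal sketch of that idea only, not CereProc's implementation: the `Unit` fields, the cost functions `target_cost` and `join_cost`, their weights, and the toy database are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One recorded unit in a toy speech database (hypothetical fields)."""
    phone: str       # phone label, e.g. "k"
    pitch: float     # fundamental frequency in Hz
    duration: float  # length in milliseconds


def target_cost(unit, want_pitch, want_duration):
    """Assumed cost: distance of a candidate from the desired pitch and duration."""
    return abs(unit.pitch - want_pitch) / 10.0 + abs(unit.duration - want_duration) / 10.0


def join_cost(left, right):
    """Assumed cost: penalty for the pitch jump where two units are concatenated."""
    return abs(left.pitch - right.pitch) / 10.0


def select_units(targets, database):
    """Return (total_cost, chain) for the cheapest chain of candidate units.

    targets is a list of (phone, desired_pitch, desired_duration) tuples.
    The search is a Viterbi-style dynamic programme over the candidates.
    """
    if not targets:
        return 0.0, []

    # Candidate units for each target position, looked up by phone label.
    candidates = [[u for u in database if u.phone == phone] for phone, _, _ in targets]
    if any(not c for c in candidates):
        raise ValueError("no candidate unit for at least one target phone")

    # best[i][j] = (cost of the cheapest chain ending in candidates[i][j], back-pointer)
    _, pitch0, dur0 = targets[0]
    best = [[(target_cost(u, pitch0, dur0), None) for u in candidates[0]]]
    for i in range(1, len(targets)):
        _, pitch, dur = targets[i]
        row = []
        for u in candidates[i]:
            prev_cost, prev_j = min(
                (best[i - 1][k][0] + join_cost(p, u), k)
                for k, p in enumerate(candidates[i - 1])
            )
            row.append((prev_cost + target_cost(u, pitch, dur), prev_j))
        best.append(row)

    # Trace the back-pointers from the cheapest final unit.
    j = min(range(len(best[-1])), key=lambda k: best[-1][k][0])
    total = best[-1][j][0]
    chain = []
    for i in range(len(targets) - 1, -1, -1):
        chain.append(candidates[i][j])
        j = best[i][j][1]
    chain.reverse()
    return total, chain


if __name__ == "__main__":
    toy_database = [
        Unit("k", 100.0, 50.0), Unit("k", 200.0, 50.0),
        Unit("a", 110.0, 80.0), Unit("a", 190.0, 80.0),
        Unit("t", 120.0, 60.0),
    ]
    toy_targets = [("k", 195.0, 50.0), ("a", 150.0, 80.0), ("t", 120.0, 60.0)]
    cost, units = select_units(toy_targets, toy_database)
    print(cost, [(u.phone, u.pitch) for u in units])
```

In this sketch the join cost is what stops the search from picking each unit in isolation: a candidate that matches its target slightly worse can still win if it concatenates more smoothly with its neighbours. A production system would use richer features, such as the position in the syllable and neighbouring phones mentioned above.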

Voices and languages

CereProc has 58 generally available voices that speak 23 languages in a number of different regional accents:

* American English: Isabella, Katherine, Hannah, Megan, Adam, Nathan, Andy (child voice), Jordan (child voice), Carolyn, Sam (non-binary)
* Southern English: Sarah, William, Jack, Lauren, Giles, Amy, Lily (child voice)
* Northern English: Jess
* Scottish English: Heather, Kirsty, Stuart, Andrew (child voice), Mairi (child voice)
* Glasgow English: Dodo
* Lancashire English: Claire
* Irish English: Caitlin
* West Midlands English: Sue
* Special FX voices: Demon, Ghost, Goblin, Pixie, Robot
* Metropolitan French: Suzanne, Laurent
* Canadian French: Florence
* Catalan: Rita
* Castilian Spanish: Sara
* Mexican Spanish: Ana
* Italian: Laura, Dario, Francesco (child voice), Nicoletta (child voice)
* Irish: Peig
* Dutch: Ada
* Standard German: Gudrun, Alex
* Austrian German: Leopold
* European Portuguese: Lúcia
* Brazilian Portuguese: Gabriel
* Japanese: Yuki
* Scottish Gaelic: Ceitidh
* Swedish: Ylva
* Polish: Pola
* Romanian: Daria
* French-accented English: Nicole
* Russian: Avrora
* Mandarin: Mailin
* Danish: Marie
* Norwegian (Bokmål): Clara
* Norwegian (Nynorsk): Hulda
* Lithuanian: Mantas, Egle

In addition, the company has developed a number of celebrity voices that are not generally available to the public. These include George W. Bush, Barack Obama, and Arnold Schwarzenegger.

Voice cloning

In 2009, film critic Roger Ebert employed CereProc to create a synthetic version of his voice. Ebert had lost the power of speech following surgery to treat thyroid cancer. CereProc mined tapes and DVD commentaries featuring Ebert's voice to create a text-to-speech voice that sounded more like his own. Ebert used the voice in his March 2, 2010, appearance on ''The Oprah Winfrey Show''.

NFL player Steve Gleason had his voice cloned by CereProc following his diagnosis with motor neurone disease (MND). Gleason appeared in a Microsoft Super Bowl XLVIII commercial praising the power of technology, using his synthetic voice to narrate.

CereProc voice cloning technology is currently being used in the UK by people with MND to create synthesis voices before they lose the power of speech. This process was featured in a BBC Radio 4 documentary, ''Giving the Critic Back His Voice'', broadcast in August 2011. ("Giving the Critic Back His Voice". BBC Radio Scotland Programmes. Retrieved October 26, 2011.)

System compatibility

CereProc voices can be deployed on different operating systems and on different types of devices. CereProc desktop voices are compatible with Microsoft Windows and Apple Mac OS X. They install as system voices and can be used by other speech-enabled applications. CereProc's client/server system cServer, aimed principally at the corporate IVR market, can be run on Windows and Linux. CereProc Mobile voices can be deployed on Android and Apple iOS. The SDK is available for Android, Linux, macOS, iOS, and Windows, and has bindings for C/C++, C#, Java, and Python.
