CereProc is a speech synthesis company based in Edinburgh, Scotland, founded in 2005. The company specialises in creating natural and expressive-sounding text-to-speech voices, synthesis voices with regional accents, and in voice cloning.

Voice building technology

CereProc creates voices using two different voice-building technologies: unit selection synthesis and parametric modelling. CereProc's unit selection voices are built from large databases of recorded speech. During database creation, each recorded utterance is segmented into some or all of the following: individual phones, syllables, morphemes, words, phrases, and sentences. The division into segments is done using a specially modified speech recogniser. An index of the units in the speech database is then created based on the segmentation and acoustic parameters like the fundamental frequency (pitch), duration, position in the syllable, and neighbouring phones. At runtime, the desired target utterance is created by determining the best chain of candidate units from the database (unit selection). Unit selection provides the greatest naturalness, because it applies digital signal processing (DSP) to the recorded speech only at concatenation points. DSP often makes recorded speech sound less natural.

CereProc's parametric voices produce speech synthesis based on statistical modelling methodologies. In this system, the frequency spectrum (vocal tract), fundamental frequency (vocal source), and duration (prosody) of speech are modelled simultaneously. Speech waveforms are generated from these parameters using a vocoder. Critically, these voices can be built from significantly less recorded speech than unit selection voices and have a much smaller footprint when installed. Because of this, they are used for private voice cloning.
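The runtime search described for unit selection voices, choosing the best chain of candidate units from the database, is commonly framed as a shortest-path problem solved with dynamic programming. The following is a minimal sketch of that general idea, not CereProc's implementation: the `Unit` fields, the `target_cost` and `join_cost` functions, their weights, and the tiny example database are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unit:
    """One indexed unit in a hypothetical speech database."""
    phone: str       # phone label
    pitch: float     # mean fundamental frequency in Hz
    duration: float  # length in seconds


def target_cost(unit, want_pitch, want_duration):
    # How far the unit is from the requested pitch and duration (assumed weights).
    return abs(unit.pitch - want_pitch) / 100.0 + abs(unit.duration - want_duration) * 10.0


def join_cost(left, right):
    # Penalise pitch jumps at the concatenation point (assumed measure).
    return abs(left.pitch - right.pitch) / 100.0


def select_units(targets, database):
    """Return (chain, total_cost) for the cheapest sequence of units.

    targets is a list of (phone, pitch, duration) tuples.
    """
    layers = []
    for phone, pitch, duration in targets:
        candidates = [u for u in database if u.phone == phone]
        if not candidates:
            raise ValueError(f"no unit for phone {phone!r}")
        layers.append((candidates, pitch, duration))

    # costs[j] is the cheapest cost of any chain ending in candidate j.
    first, pitch, duration = layers[0]
    costs = [target_cost(u, pitch, duration) for u in first]
    back = []  # back[i][j]: best predecessor in layer i for candidate j of layer i + 1

    for i in range(1, len(layers)):
        prev = layers[i - 1][0]
        candidates, pitch, duration = layers[i]
        new_costs, pointers = [], []
        for u in candidates:
            k = min(range(len(prev)), key=lambda n: costs[n] + join_cost(prev[n], u))
            new_costs.append(costs[k] + join_cost(prev[k], u) + target_cost(u, pitch, duration))
            pointers.append(k)
        costs = new_costs
        back.append(pointers)

    # Trace the best chain backwards from the cheapest final candidate.
    j = min(range(len(costs)), key=costs.__getitem__)
    total = costs[j]
    chain = [layers[-1][0][j]]
    for i in range(len(back) - 1, -1, -1):
        j = back[i][j]
        chain.append(layers[i][0][j])
    chain.reverse()
    return chain, total


# Hypothetical two-phone database with a low-pitched and a high-pitched take of each phone.
database = [
    Unit("h", 120.0, 0.05), Unit("h", 200.0, 0.05),
    Unit("i", 125.0, 0.10), Unit("i", 210.0, 0.10),
]
chain, total = select_units([("h", 120.0, 0.05), ("i", 120.0, 0.10)], database)
print([u.pitch for u in chain], total)
```

The join cost is what keeps the chain from mixing takes with mismatched pitch, which is the point at which signal processing would otherwise have to smooth the concatenation.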

Voices and languages

CereProc has 81 generally-available voices that speak 24 languages in a number of different regional accents:
* American English: Isabella, Katherine, Hannah, Megan, Adam, Nathan, Andy (child voice), Jordan (child voice), Carolyn, Sam (gender neutral voice)
* Southern English: Sarah, William, Jack, Lauren, Giles, Amy, Lily (child voice), Ben (child voice)
* Northern English: Jess
* Scottish English: Heather, Kirsty, Stuart, Andrew (child voice), Mairi (child voice)
* Glasgow English: Dodo
* Lancashire English: Claire
* Irish English: Caitlin
* Welsh English: Seren (child voice), Catrin (child voice), Gethin (child voice), Owain (child voice), Rhodri (teenage voice), Tomos (teenage voice), Ffion (teenage voice), Rhian (teenage voice)
* West Midlands English: Sue
* Special FX voices: Demon, Ghost, Goblin, Pixie, Robot
* Metropolitan French: Suzanne, Laurent
* Canadian French: Florence
* Catalan: Rita
* Castilian Spanish: Sara
* Mexican Spanish: Ana
* Italian: Laura, Dario, Francesco (child voice), Nicoletta (child voice)
* Irish: Peig
* Dutch: Ada
* Standard German: Gudrun, Alex
* Austrian German: Leopold
* European Portuguese: Lúcia
* Brazilian Portuguese: Gabriel
* Japanese: Yuki
* Scottish Gaelic: Ceitidh
* Swedish: Ylva, Anders
* Polish: Pola
* Romanian: Daria
* French-accented English: Nicole
* Russian: Avrora
* Mandarin: Mailin
* Danish: Marie, Lars
* Norwegian (Bokmål): Clara, Magnus
* Norwegian (Nynorsk): Hulda
* Lithuanian: Mantas, Egle
* Welsh: Seren (child voice), Catrin (child voice), Gethin (child voice), Owain (child voice), Rhodri (teenage voice), Tomos (teenage voice), Ffion (teenage voice), Rhian (teenage voice)

In addition, the company has developed a number of celebrity voices that are not generally available to the public. These include George W. Bush, Barack Obama and Arnold Schwarzenegger.

Voice cloning

In 2009, film critic Roger Ebert employed CereProc to create a synthetic version of his voice. Ebert had lost the power of speech following surgery to treat thyroid cancer. CereProc mined tapes and DVD commentaries featuring Ebert's voice to create a text-to-speech voice that sounded more like his own. Roger Ebert used the voice in his March 2, 2010, appearance on ''The Oprah Winfrey Show''.

NFL player Steve Gleason had his voice cloned by CereProc following his diagnosis with motor neurone disease (MND). Gleason appeared in a Microsoft Super Bowl XLVIII commercial praising the power of technology, using his synthetic voice to narrate.

CereProc voice cloning technology is currently being used in the UK by people with MND to create synthesis voices before they lose the power of speech. This process was featured in a BBC Radio 4 documentary, ''Giving the Critic Back His Voice'', broadcast in August 2011 ("Giving the Critic Back His Voice", BBC Radio Scotland Programmes. Retrieved October 26, 2011).

System compatibility

CereProc voices can be deployed on different operating systems and on different types of devices. CereProc desktop voices are compatible with Microsoft Windows and Apple Mac OS X. They install as system voices and can be used by other speech-enabled applications. CereProc's client/server system cServer, aimed principally at the corporate IVR market, can be run on Windows and Linux. CereProc Mobile voices can be deployed on Android and Apple iOS. The SDK is available for Android, Linux, macOS, iOS, and Windows, and has bindings for C/C++, C#, Java, and Python.
