Gnuspeech is an extensible text-to-speech computer software package that produces artificial speech output based on real-time articulatory speech synthesis by rules. That is, it converts text strings into phonetic descriptions, aided by a pronouncing dictionary, letter-to-sound rules, and rhythm and intonation models; transforms the phonetic descriptions into parameters for a low-level articulatory speech synthesizer; uses these to drive an articulatory model of the human vocal tract, producing an output suitable for the normal sound output devices used by various computer operating systems; and does this at the same or faster rate than the speech is spoken for adult speech.

Design

The synthesizer is a tube resonance, or waveguide, model that simulates the behavior of the real vocal tract directly, and reasonably accurately, unlike formant synthesizers that indirectly model the speech spectrum. The control problem is solved by using René Carré's Distinctive Region Model, which relates changes in the radii of eight longitudinal divisions of the vocal tract to corresponding changes in the three frequency formants in the speech spectrum that convey much of the information of speech. The regions are, in turn, based on work by the Stockholm Speech Technology Laboratory of the Royal Institute of Technology (KTH) on "formant sensitivity analysis", that is, how formant frequencies are affected by small changes in the radius of the vocal tract at various places along its length.
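
As an illustration of the tube-resonance idea, the following is a minimal Python sketch and not Gnuspeech's own code. It assumes a lossless tube closed at the glottis and open at the lips, a speed of sound of 350 m/s, and a textbook Kelly-Lochbaum style junction formula; the function names and the example radii are hypothetical. It shows the resonances of a uniform tube and how unequal section radii give rise to reflections at the junctions between sections.

```python
import math

SPEED_OF_SOUND_M_S = 350.0  # assumed value for warm, humid air


def uniform_tube_formants(length_m, count=3, c=SPEED_OF_SOUND_M_S):
    """Quarter-wavelength resonances (Hz) of a uniform tube that is
    closed at one end (glottis) and open at the other (lips)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


def reflection_coefficients(radii_cm):
    """Junction reflection coefficients between adjacent tube sections.

    Uses k = (A_i - A_next) / (A_i + A_next) with A = pi * r^2.
    The sign convention differs between texts; this is one common choice.
    """
    areas = [math.pi * r * r for r in radii_cm]
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


if __name__ == "__main__":
    # A 17.5 cm uniform tube resonates near 500, 1500 and 2500 Hz.
    print(uniform_tube_formants(0.175))
    # Eight hypothetical region radii (cm); equal neighbours give k = 0.
    print(reflection_coefficients([0.8, 0.8, 1.0, 1.2, 1.4, 1.2, 1.0, 0.9]))
```

Changing one radius changes the reflection coefficients on either side of that region, which in turn shifts the resonances of the whole tube. That dependence of formant frequencies on local radius is what formant sensitivity analysis describes.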

History

Gnuspeech was originally commercial software produced by the now-defunct Trillium Sound Research for the NeXT computer as various grades of "TextToSpeech" kit. Trillium Sound Research was a technology transfer spin-off company formed at the University of Calgary, Alberta, Canada, where papers and manuals relevant to the system are maintained. It was based on long-standing research in the computer science department on computer-human interaction using speech. The initial version in 1992 used a formant-based speech synthesizer. When NeXT ceased manufacturing hardware, the synthesizer software was completely rewritten and also ported to NSFIP (NextStep For Intel Processors), using the waveguide approach to acoustic tube modeling based on the research at the Center for Computer Research in Music and Acoustics (CCRMA) at Stanford University, especially the Music Kit. The synthesis approach is explained in more detail in a paper presented to the American Voice I/O Society in 1995.

The system used the onboard 56001 Digital Signal Processor (DSP) on the NeXT computer, and a Turtle Beach add-on board with the same DSP on the NSFIP version, to run the waveguide (also known as the tube model). Speed limitations meant that the shortest vocal tract length that could be used for speech in real time (that is, generated at the same or faster rate than it was "spoken") was around 15 centimeters, because the sample rate for the waveguide computations increases with decreasing vocal tract length. Faster processor speeds are progressively removing this restriction, an important advance for producing children's speech in real time.

Since NeXTSTEP is discontinued and NeXT computers are rare, one option for executing the original code is the use of virtual machines. The Previous emulator, for example, can emulate the DSP in NeXT computers, which can be used by the Trillium software.

Trillium ceased trading in the late 1990s and the Gnuspeech project was first entered into the GNU Savannah repository under the terms of the GNU General Public License in 2002, as official GNU software. Due to its free and open source license, which allows customization of the code, Gnuspeech has been utilized in academic research (Xiong, F.; Barker, J., "Deep Learning of Articulatory-Based Representations and Applications for Improving Dysarthric Speech Recognition", ITG Conference on Speech Communication, Germany, 2018).
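
The remark above that the waveguide sample rate rises as the vocal tract gets shorter can be made concrete with a small calculation. The Python sketch below is only an illustration under stated assumptions: the tube is divided into a fixed number of equal sections, sound crosses one section in exactly one sample period, the speed of sound is 350 m/s, and the default of 10 sections and the function name are hypothetical choices rather than values taken from the Gnuspeech sources.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed: 350 m/s expressed in cm/s


def required_sample_rate(tract_length_cm, sections=10, c=SPEED_OF_SOUND_CM_S):
    """Sample rate (Hz) at which sound travels one tube section per sample.

    Each section is tract_length_cm / sections long, so the travel time
    per section is that length divided by c, and the rate is its inverse.
    """
    return sections * c / tract_length_cm


if __name__ == "__main__":
    # Shorter tracts (for example a child's) need a higher sample rate.
    for length_cm in (17.5, 15.0, 12.0, 10.0):
        print(length_cm, round(required_sample_rate(length_cm)))
```

Under these assumptions the required rate is inversely proportional to tract length, so halving the length doubles the number of waveguide updates needed per second of speech. This is why a fixed-speed DSP imposes a lower bound on the tract length that can be synthesized in real time.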

External links

Gnuspeech on GNU Savannah

Overview of the Gnuspeech system