CMU Sphinx, also called Sphinx for short, is the general term for a group of speech recognition systems developed at Carnegie Mellon University. These include a series of speech recognizers (Sphinx 2 through 4) and an acoustic model trainer (SphinxTrain). In 2000, the Sphinx group at Carnegie Mellon committed to open-sourcing several speech recognizer components, including Sphinx 2 and later Sphinx 3 (in 2001). The speech decoders come with acoustic models and sample applications. The available resources also include software for acoustic model training, language model compilation, and a public domain pronunciation dictionary, cmudict. Sphinx encompasses a number of software systems, described below.

Sphinx

Sphinx is a continuous-speech, speaker-independent recognition system that uses hidden Markov acoustic models (HMMs) and an n-gram statistical language model. It was developed by Kai-Fu Lee. Sphinx demonstrated the feasibility of continuous-speech, speaker-independent, large-vocabulary recognition, the possibility of which was in dispute at the time (1986). Sphinx is of historical interest only; it has been superseded in performance by subsequent versions. An archival article (lee_k_f_1990_1.pdf) describes the system in detail.
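As an illustration of what an n-gram statistical language model computes, the following is a minimal sketch of a bigram model with add-one smoothing. It is a toy example and not code from Sphinx: the corpus, the function names `train_bigram` and `bigram_prob`, and the choice of smoothing are all assumptions made for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Count histories and bigrams over tokenized sentences (toy data)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        unigrams.update(tokens[:-1])                   # counts of each history word
        bigrams.update(zip(tokens[:-1], tokens[1:]))   # counts of (history, word)
    return unigrams, bigrams

def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """P(word | prev) with add-one smoothing (an illustrative choice)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Hypothetical three-sentence corpus.
corpus = [["call", "home"], ["call", "the", "office"], ["go", "home"]]
unigrams, bigrams = train_bigram(corpus)

# Words that can follow a history: every corpus word plus the end marker.
vocab = {w for s in corpus for w in s} | {"</s>"}
p = bigram_prob("call", "home", unigrams, bigrams, len(vocab))
```

During recognition, a probability of this kind is combined with the acoustic score from the HMMs so that likely word sequences are preferred over unlikely ones.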

Sphinx 2

Sphinx 2 is a fast, performance-oriented recognizer, originally developed by Xuedong Huang at Carnegie Mellon and released as open source with a BSD-style license on SourceForge by Kevin Lenzo at LinuxWorld in 2000. Sphinx 2 focuses on real-time recognition suitable for spoken language applications. As such, it incorporates functionality such as end-pointing, partial hypothesis generation, and dynamic language model switching. It is used in dialog systems and language learning systems, and it can be used in computer-based PBX systems such as Asterisk. Sphinx 2 code has also been incorporated into a number of commercial products. It is no longer under active development (other than for routine maintenance). Current real-time decoder development is taking place in the PocketSphinx project. An archival article (huang92sphinxii.pdf) describes the system.

Sphinx 3

Sphinx 2 used a ''semi-continuous'' representation for acoustic modeling (i.e., a single set of Gaussians is used for all models, with individual models represented as a weight vector over these Gaussians). Sphinx 3 adopted the prevalent ''continuous'' HMM representation and has been used primarily for high-accuracy, non-real-time recognition. Recent developments in algorithms and hardware have made Sphinx 3 "near" real-time, although not yet suitable for critical interactive applications. Sphinx 3 is under active development and, in conjunction with SphinxTrain, provides access to a number of modern modeling techniques, such as LDA/MLLT, MLLR and VTLN, that improve recognition accuracy (see the article on speech recognition for descriptions of these techniques).
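The difference between the semi-continuous and continuous representations can be sketched in a few lines. The example below is a simplified, one-dimensional illustration and not code from Sphinx 2 or Sphinx 3: the codebook values, the state weights, and all function names are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

# Semi-continuous: one codebook of (mean, variance) pairs shared by all states.
# Toy values, chosen only for illustration.
CODEBOOK = [(-1.0, 1.0), (0.0, 1.0), (2.0, 0.5)]

def codebook_densities(x):
    """Evaluate every shared Gaussian once per observation."""
    return [gaussian_pdf(x, mean, var) for mean, var in CODEBOOK]

def semi_continuous_likelihood(densities, weights):
    """A state is only a weight vector over the shared densities."""
    return sum(w * d for w, d in zip(weights, densities))

def continuous_likelihood(x, mixture):
    """A state owns its Gaussians: a list of (weight, mean, variance)."""
    return sum(w * gaussian_pdf(x, mean, var) for w, mean, var in mixture)

x = 0.3
shared = codebook_densities(x)  # computed once, reused by both states below
state_a = semi_continuous_likelihood(shared, [0.7, 0.2, 0.1])
state_b = semi_continuous_likelihood(shared, [0.1, 0.3, 0.6])
state_c = continuous_likelihood(x, [(0.5, -0.5, 0.8), (0.5, 1.5, 1.2)])
```

In the semi-continuous case the Gaussians are evaluated once per observation and every state reuses the result, which keeps computation low. In the continuous case each state carries its own Gaussians, which allows a closer fit to the data at a higher computational cost.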

Sphinx 4

Sphinx 4 is a complete rewrite of the Sphinx engine, written entirely in the Java programming language, with the goal of providing a more flexible framework for research in speech recognition. Sun Microsystems supported the development of Sphinx 4 and contributed software engineering expertise to the project. Participants included individuals at MERL, MIT and CMU. (Currently supported languages are C, C++, C#, Python, Ruby, Java, and JavaScript.)

Current development goals include:
* developing a new (acoustic model) trainer
* implementing speaker adaptation (e.g. MLLR)
* improving configuration management
* creating a graph-based UI for graphical system design

PocketSphinx

PocketSphinx is a version of Sphinx that can be used in embedded systems (e.g., based on an ARM processor). PocketSphinx is under active development and incorporates features such as fixed-point arithmetic and efficient algorithms for GMM computation.
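To illustrate why fixed-point arithmetic suits processors without fast floating-point hardware, the sketch below computes the variance-weighted squared distance at the core of a diagonal-covariance Gaussian score using only integer operations. This is not PocketSphinx's actual number format or code: the 12-bit fractional scaling (`FRAC_BITS`) and all function names are assumptions made for illustration.

```python
FRAC_BITS = 12            # hypothetical format: value v is stored as round(v * 2**12)
ONE = 1 << FRAC_BITS

def to_fixed(value):
    """Convert a float to the illustrative fixed-point integer format."""
    return int(round(value * ONE))

def to_float(fixed):
    """Convert a fixed-point integer back to a float."""
    return fixed / ONE

def fixed_mul(a, b):
    """Multiply two fixed-point numbers, rescaling the product."""
    return (a * b) >> FRAC_BITS

def weighted_sq_dist_fixed(x, mean, half_inv_var):
    """Sum of (x_i - mean_i)^2 / (2 * var_i), using integer arithmetic only."""
    total = 0
    for xi, mi, hi in zip(x, mean, half_inv_var):
        diff = xi - mi
        total += fixed_mul(fixed_mul(diff, diff), hi)
    return total

# Toy two-dimensional feature vector and Gaussian parameters.
x = [to_fixed(v) for v in (0.5, -1.25)]
mean = [to_fixed(v) for v in (0.0, 0.25)]
half_inv_var = [to_fixed(1.0 / (2.0 * var)) for var in (1.0, 2.0)]

dist = weighted_sq_dist_fixed(x, mean, half_inv_var)
```

The Gaussian log-likelihood is a constant minus this distance, so a decoder can precompute the constant and the `1 / (2 * var)` terms and evaluate each Gaussian with integer additions, multiplications, and shifts.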

External links

* CMU Sphinx homepage
* The Sphinx repository on GitHub should be considered the definitive source for code
* SourceForge hosts older releases and files
* NeXT on Campus Fall 1990 (this document is in PostScript format compressed with gzip), ''Carnegie Mellon University - Breakthroughs in speech recognition and document management'', pp. 12-13