CMU Sphinx, also called Sphinx for short, is the general term for a group of speech recognition systems developed at Carnegie Mellon University. These include a series of speech recognizers (Sphinx 2 through 4) and an acoustic model trainer (SphinxTrain).
In 2000, the Sphinx group at Carnegie Mellon committed to open-sourcing several speech recognizer components, including Sphinx 2 and later Sphinx 3 (in 2001). The speech decoders come with acoustic models and sample applications. The available resources also include software for acoustic model training and language model compilation, as well as a public domain pronunciation dictionary, cmudict.
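A pronunciation dictionary of this kind maps each word to one or more phone sequences. The following is a minimal sketch of reading such a file, assuming the plain-text layout commonly used for cmudict (a word followed by its phones, comment lines starting with `;;;`, and alternate pronunciations marked with a numeric suffix such as `(1)`). The sample entries and the function name `parse_pronunciations` are illustrative and are not part of any Sphinx API.

```python
import re
from collections import defaultdict

# Illustrative entries in the assumed cmudict text style.
SAMPLE = """\
;;; comment lines start with three semicolons
HELLO  HH AH0 L OW1
HELLO(1)  HH EH0 L OW1
WORLD  W ER1 L D
"""

# Alternate-pronunciation suffix, e.g. "(1)" in "HELLO(1)".
_VARIANT = re.compile(r"\(\d+\)$")


def parse_pronunciations(text):
    """Map each word to a list of pronunciations (each a list of phones)."""
    lexicon = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";;;"):
            continue  # skip blank lines and comments
        word, *phones = line.split()
        # Strip the variant suffix so all pronunciations share one key.
        lexicon[_VARIANT.sub("", word)].append(phones)
    return dict(lexicon)


lexicon = parse_pronunciations(SAMPLE)
print(lexicon["HELLO"])
```

Grouping variants under one key lets a decoder consider every listed pronunciation of a word when building its search network.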
Sphinx encompasses a number of software systems, described below.
Sphinx
Sphinx is a continuous-speech, speaker-independent recognition system that uses hidden Markov acoustic models (HMMs) and an n-gram statistical language model. It was developed by Kai-Fu Lee. Sphinx demonstrated the feasibility of continuous-speech, speaker-independent, large-vocabulary recognition, the possibility of which was in dispute at the time (1986).
Sphinx is of historical interest only; it has been superseded in performance by subsequent versions.
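To illustrate what an n-gram statistical language model estimates, the sketch below computes unsmoothed bigram (n = 2) probabilities from a toy corpus. It is a generic illustration, not the Sphinx implementation: the corpus, the sentence markers `<s>` and `</s>`, and the function name `train_bigram` are assumptions, and practical recognizers add smoothing or back-off so that unseen word pairs do not receive zero probability.

```python
from collections import Counter


def train_bigram(sentences):
    """Return a function giving maximum-likelihood P(word | prev)."""
    histories = Counter()  # how often each token appears as a history
    bigrams = Counter()    # how often each (prev, word) pair appears
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))

    def prob(prev, word):
        if histories[prev] == 0:
            return 0.0  # unseen history; no smoothing in this sketch
        return bigrams[(prev, word)] / histories[prev]

    return prob


# Hypothetical toy corpus.
corpus = ["call home", "call the office", "call home now"]
prob = train_bigram(corpus)
print(prob("call", "home"))
```

During decoding, scores of this kind are combined with the acoustic scores from the HMMs to rank competing word sequences.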
Sphinx 2
Sphinx 2 is a fast, performance-oriented recognizer, originally developed by Xuedong Huang at Carnegie Mellon and released as open source under a BSD-style license on SourceForge by Kevin Lenzo at LinuxWorld in 2000. Sphinx 2 focuses on real-time recognition suitable for spoken language applications. As such, it incorporates functionality such as end-pointing, partial hypothesis generation, and dynamic language model switching. It is used in dialog systems and language learning systems, and can be used in computer-based PBX systems such as Asterisk. Sphinx 2 code has also been incorporated into a number of commercial products. It is no longer under active development (other than for routine maintenance). Current real-time decoder development is taking place in the PocketSphinx project.
Sphinx 3
Sphinx 2 used a semi-continuous representation for acoustic modeling (i.e., a single set of Gaussians is used for all models, with individual models represented as a weight vector over these Gaussians). Sphinx 3 adopted the prevalent continuous HMM representation and has been used primarily for high-accuracy, non-real-time recognition. Recent developments (in algorithms and in hardware) have made Sphinx 3 "near" real-time, although not yet suitable for critical interactive applications. Sphinx 3 is under active development and, in conjunction with SphinxTrain, provides access to a number of modern modeling techniques, such as LDA/MLLT, MLLR and VTLN, that improve recognition accuracy (see the article on Speech Recognition for descriptions of these techniques).
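The difference between the two representations can be sketched with one-dimensional Gaussians. In the semi-continuous case every state scores an observation against the same shared set of Gaussians and stores only its own weight vector; in the continuous case each state owns its Gaussians as well as its weights. All numbers and names below (`SHARED_CODEBOOK`, `state_a`, `state_b`) are made up for illustration and do not come from any Sphinx model.

```python
import math


def gaussian(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def mixture_likelihood(x, weights, components):
    """Weighted sum of Gaussian densities; components are (mean, var)."""
    return sum(w * gaussian(x, m, v) for w, (m, v) in zip(weights, components))


# Semi-continuous: one codebook shared by all states, per-state weights only.
SHARED_CODEBOOK = [(-1.0, 1.0), (0.0, 1.0), (1.0, 1.0)]
semi_states = {
    "state_a": [0.7, 0.2, 0.1],
    "state_b": [0.1, 0.2, 0.7],
}

# Continuous: each state has its own weights and its own Gaussians.
continuous_states = {
    "state_a": ([0.5, 0.5], [(-1.2, 0.5), (-0.4, 0.8)]),
    "state_b": ([0.6, 0.4], [(0.9, 0.6), (1.6, 1.1)]),
}


def semi_continuous_score(state, x):
    return mixture_likelihood(x, semi_states[state], SHARED_CODEBOOK)


def continuous_score(state, x):
    weights, components = continuous_states[state]
    return mixture_likelihood(x, weights, components)


print(semi_continuous_score("state_a", -1.0), continuous_score("state_b", 1.0))
```

With a shared codebook the Gaussian densities need to be evaluated only once per observation and can be reused by every state, which is one reason the semi-continuous form suits fast decoding; the continuous form has more parameters per state and can model each state more precisely.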
Sphinx 4
Sphinx 4 is a complete rewrite of the Sphinx engine with the goal of providing a more flexible framework for research in speech recognition, written entirely in the Java programming language. Sun Microsystems supported the development of Sphinx 4 and contributed software engineering expertise to the project. Participants included individuals at MERL, MIT and CMU. (Currently supported languages are C, C++, C#, Python, Ruby, Java, and JavaScript.)
Current development goals include:
* developing a new (acoustic model) trainer
* implementing speaker adaptation (e.g. MLLR)
* improving configuration management
* creating a graph-based UI for graphical system design
PocketSphinx
PocketSphinx is a version of Sphinx that can be used in embedded systems (e.g., based on an ARM processor). PocketSphinx is under active development and incorporates features such as fixed-point arithmetic and efficient algorithms for GMM computation.
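Fixed-point arithmetic replaces floating-point values with scaled integers, which helps on processors without a floating-point unit. The sketch below evaluates the log-density of a one-dimensional Gaussian using integer operations only. The 16-bit fractional format and all function names are assumptions chosen for illustration; they do not describe PocketSphinx's actual internal representation.

```python
import math

FRACTION_BITS = 16          # assumed format: 16 fractional bits
ONE = 1 << FRACTION_BITS    # fixed-point representation of 1.0


def to_fixed(value):
    """Convert a float to a scaled integer."""
    return int(round(value * ONE))


def to_float(fixed):
    """Convert a scaled integer back to a float (for inspection only)."""
    return fixed / ONE


def fixed_mul(a, b):
    """Multiply two fixed-point numbers and rescale the product."""
    # In C the intermediate product would need a 64-bit integer.
    return (a * b) >> FRACTION_BITS


def fixed_log_gaussian(x, mean, inv_two_var, log_norm):
    """log N(x) = log_norm - (x - mean)^2 / (2 * var), integers only."""
    diff = x - mean
    return log_norm - fixed_mul(fixed_mul(diff, diff), inv_two_var)


# Constants are precomputed once, offline, in floating point.
var = 1.0
inv_two_var = to_fixed(1.0 / (2.0 * var))
log_norm = to_fixed(-0.5 * math.log(2.0 * math.pi * var))

score = fixed_log_gaussian(to_fixed(1.0), to_fixed(0.0), inv_two_var, log_norm)
print(to_float(score))
```

Working in the log domain turns the products of a Gaussian evaluation into additions and avoids computing an exponential at run time, at the cost of a small quantization error.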
See also
* Speech recognition software for Linux
* List of speech recognition software
* Project LISTEN
External links
* CMU Sphinx homepage (Sphinx developers recommend Vosk now)
* Sphinx repository on GitHub, which should be considered the definitive source for code
* SourceForge, which hosts older releases and files
* NeXT on Campus, Fall 1990 (PostScript format, compressed with gzip): Carnegie Mellon University - Breakthroughs in speech recognition and document management, pp. 12-13