VoxForge is a free speech corpus and acoustic model repository for open source speech recognition engines. VoxForge was set up to collect transcribed speech to create a free GPL speech corpus for use with open source speech recognition engines. The speech audio files will be 'compiled' into acoustic models for use with open source speech recognition engines such as Julius, ISIP, Sphinx, and HTK (note: HTK has distribution restrictions). VoxForge has used LibriVox as a source of audio data since 2007 (forum post on voxforge.org).

References

Sources

* Deep learning for spoken language identification
* Tools for Collecting Speech Corpora via Mechanical-Turk, https://www.cs.cmu.edu/~ianlane/pub/LANE-mturk10.pdf
* An Integrated Approach to Robust Speech Recognition for a Command and Control Application on the Motorcycle

