
Lingua Libre is an online collaborative project and tool by the Wikimedia France association, which aims to build a collaborative, multilingual, audiovisual corpus under a free license.


Description

Lingua Libre makes it possible to record words, phrases or sentences in any language, either spoken (audio recording) or signed (video recording). Words are presented to the speaker as a list, which can be created on the spot or in advance, or taken from an existing Wikimedia category. The speaker simply reads the word displayed on the screen, and the software moves on to the next word when it detects silence after the word has been read. This principle, borrowed from the open-source Shtooka recorder with the help of its creator, Nicolas Vion, makes it possible to record several hundred words per hour. The recordings are then uploaded automatically from the web client to the Wikimedia Commons media library. In spring 2021, Lingua Libre was offline due to a fire in Strasbourg, but no audio recordings were lost.
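The auto-advance principle described above can be illustrated with a common energy-threshold technique. The signal is split into short frames and each frame's loudness (root-mean-square energy) is measured. A long enough run of quiet frames after speech is then treated as the end of a word.

The following Python sketch is not Lingua Libre's or Shtooka's actual code. The function names, frame length, threshold and minimum silence length are hypothetical values chosen for illustration. Samples are assumed to be mono floats between -1 and 1.

```python
import math
from typing import List, Sequence, Tuple


def frame_rms(samples: Sequence[float], frame_len: int) -> List[float]:
    """Root-mean-square energy of consecutive, non-overlapping frames.

    A trailing partial frame shorter than frame_len is ignored.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def split_on_silence(
    samples: Sequence[float],
    frame_len: int = 160,          # hypothetical: 10 ms of audio at 16 kHz
    threshold: float = 0.02,       # hypothetical RMS level separating speech from silence
    min_silence_frames: int = 30,  # hypothetical: 300 ms of silence ends a word
) -> List[Tuple[int, int]]:
    """Return (start, end) sample offsets of each detected word.

    A word starts at the first frame whose energy reaches the threshold and
    ends once min_silence_frames consecutive quiet frames follow it; that is
    the moment a recorder would advance to the next word in the list.
    """
    segments: List[Tuple[int, int]] = []
    in_word = False
    word_start = 0
    last_loud = 0
    silent_run = 0
    for i, energy in enumerate(frame_rms(samples, frame_len)):
        if energy >= threshold:
            if not in_word:
                in_word = True
                word_start = i
            last_loud = i
            silent_run = 0
        elif in_word:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Enough silence: close the word at the end of its last loud frame.
                segments.append((word_start * frame_len, (last_loud + 1) * frame_len))
                in_word = False
                silent_run = 0
    if in_word:
        # The recording stopped before enough silence followed the last word.
        segments.append((word_start * frame_len, (last_loud + 1) * frame_len))
    return segments


if __name__ == "__main__":
    # Two synthetic "words" separated by silence, using 10-sample frames.
    signal = [0.0] * 30 + [0.5, -0.5] * 20 + [0.0] * 60 + [0.5, -0.5] * 10 + [0.0] * 10
    print(split_on_silence(signal, frame_len=10, min_silence_frames=5))
    # → [(30, 70), (130, 150)]
```

The sketch requires a minimum run of quiet frames rather than reacting to the first one. This keeps a brief pause inside a word, such as the closure before a plosive consonant, from cutting the recording short. The cost is a small delay before the next word is shown.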


Use of the recordings

The recordings can be consulted either on Lingua Libre or on Commons. They are mainly used on other Wikimedia projects, for example to illustrate entries in the Wiktionaries or proper nouns in Wikipedia articles. Reuse of the recordings for language teaching is also envisaged. Language learners can freely download pronunciations and use them in GoldenDict, a popular dictionary program, as ''“pronunciation dictionaries”'' that work without an internet connection. The recordings are also reused in Natural Language Processing projects, for example to train Mozilla's DeepSpeech speech recognition engine.


Versions

Lingua Libre was initiated on January 23, 2015 and has had three successive versions:


Lingua Libre v.1 (2016)

As part of the ''Languages of France'' project, which aims to document and promote the regional languages of France on Wikimedia projects and on the Internet in general, design work on Lingua Libre began in November 2015, with partial funding from the DGLFLF (General Delegation for the French Language and the Languages of France). The first version of the project was launched in August 2016. Supporting only audio recording at the time, Lingua Libre was demonstrated at a workshop on the Occitan language in December 2016. It was then presented to the online Wikimedia community and at international events in 2017.


Lingua Libre v.2 (2018)

A complete rebuild was launched at the end of 2017. The new version of Lingua Libre is based on MediaWiki and uses Wikibase and OAuth to integrate better with the Wikimedia environment. The interface is translated via Translatewiki.net so that the project can be used by many language communities. The new version of the site was ready in June 2018 and opened to the public in August 2018.


Lingua Libre v.2.2 (2020)

In 2020, major changes were made to the platform. A new visual design was developed specifically for the site, and the .org domain replaced the .fr domain used until then. Lingua Libre also began supporting sign languages through video recording.

Gallery: screenshots of the Recording Studio in September 2017 (v.1), December 2018 (v.2) and October 2020 (v.2.2).


Statistics

In the first two years after the project's launch, approximately 10,000 recordings were made. The transition to v.2 was accompanied by a sharp increase in contributions. The number of recordings grew tenfold in less than a year, passing the 100,000 mark in May 2019. These recordings were made by 127 speakers in almost 50 languages. By September 2020, the platform had more than 300,000 recordings in 90 languages from more than 350 speakers. The 500,000-recording milestone was reached in June 2021, with 540 speakers of 120 languages (source: Lingua Libre's statistics page).


See also

* Forvo
* Common Voice
* GoldenDict
* Tatoeba

