HFST

Helsinki Finite-State Technology (HFST) is a computer programming library and set of utilities for natural language processing with finite-state automata and finite-state transducers. It is free and open-source software, released under a mix of the GNU General Public License version 3 (GPLv3) and the Apache License.


Features

The library serves as a common interface to multiple interchangeable backends, such as OpenFST, foma and SFST. The utilities comprise various compilers, such as hfst-twolc (a compiler for morphological two-level rules), hfst-lexc (a compiler for lexicon definitions) and hfst-regexp2fst (a regular expression compiler). Functions from Xerox's proprietary scripting language xfst are duplicated in hfst-xfst, and those of the pattern matching utility pmatch in hfst-pmatch, which goes beyond the finite-state formalism in having recursive transition networks (RTNs). The library and utilities are written in C++, with an interface to the library in Python and a utility for looking up results from transducers ported to Java and Python. Transducers in HFST may incorporate weights, depending on the backend. Weighted FST operations are currently possible only via the OpenFST backend. HFST provides two ''native'' backends, one designed for fast lookup (''hfst-optimized-lookup''), the other for format interchange. Both of them can be weighted.
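
The idea of looking up results from a weighted transducer can be illustrated with a minimal, self-contained sketch. It does not use HFST or its Python interface: the class `ToyTransducer`, the helper `build_walk_analyser`, the analysis tags and the weights are all hypothetical and chosen only for illustration. The sketch assumes that weights are added along a path and that a lower total is better, as in the tropical semiring commonly used for weighted transducers.

```python
from collections import defaultdict

EPSILON = ""  # empty symbol: the arc consumes no input


class ToyTransducer:
    """A tiny weighted finite-state transducer (illustrative, not HFST's API)."""

    def __init__(self, start=0):
        self.start = start
        self.finals = {}               # state -> final weight
        self.arcs = defaultdict(list)  # state -> [(in_sym, out_str, weight, target)]

    def add_arc(self, source, in_sym, out_str, target, weight=0.0):
        self.arcs[source].append((in_sym, out_str, weight, target))

    def set_final(self, state, weight=0.0):
        self.finals[state] = weight

    def lookup(self, symbols):
        """Return (output, weight) pairs for all accepting paths, cheapest first.

        Assumes the transducer has no cycles of input-epsilon arcs;
        otherwise this simple depth-first search would not terminate.
        """
        results = []
        # Each item: (state, input position, output so far, weight so far)
        stack = [(self.start, 0, "", 0.0)]
        while stack:
            state, pos, out, weight = stack.pop()
            if pos == len(symbols) and state in self.finals:
                results.append((out, weight + self.finals[state]))
            for in_sym, out_str, w, target in self.arcs[state]:
                if in_sym == EPSILON:
                    stack.append((target, pos, out + out_str, weight + w))
                elif pos < len(symbols) and symbols[pos] == in_sym:
                    stack.append((target, pos + 1, out + out_str, weight + w))
        return sorted(results, key=lambda pair: pair[1])


def build_walk_analyser():
    """Build a toy analyser for the surface forms 'walk' and 'walks'."""
    t = ToyTransducer()
    for i, ch in enumerate("walk"):
        t.add_arc(i, ch, ch, i + 1)  # copy the stem letter by letter
    # State 4 is reached after the stem; the weights below are made up.
    t.add_arc(4, EPSILON, "+N+Sg", 5, weight=1.0)
    t.add_arc(4, EPSILON, "+V+Inf", 5, weight=0.5)
    t.add_arc(4, "s", "+N+Pl", 5, weight=1.0)
    t.add_arc(4, "s", "+V+3Sg", 5, weight=0.5)
    t.set_final(5)
    return t


if __name__ == "__main__":
    analyser = build_walk_analyser()
    for word in ("walk", "walks", "walked"):
        print(word, analyser.lookup(word))
```

In this toy example the surface form "walks" is ambiguous between a plural noun and a third-person verb, and the weights determine the order in which the two analyses are returned. A form that the transducer does not accept, such as "walked", yields an empty result list.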


Uses

HFST has been used for writing various linguistic tools, such as spell-checkers, hyphenators, and morphologies. Morphological dictionaries written in other formalisms have also been converted to HFST's formats.


See also

* Foma (software)


External links

* https://github.com/hfst/hfst/wiki - A documentation wiki


References

* Lindén, Krister; Axelson, Erik; Drobac, Senka; Hardwick, Sam; Kuokkala, Juha; Niemi, Jyrki; Pirinen, Tommi; Silfverberg, Miikka (2013). "HFST - A System for Creating NLP Tools". In Mahlow, Cerstin; Piotrowski, Michael (eds.). ''Systems and Frameworks for Computational Morphology''. Communications in Computer and Information Science. Vol. 380. Humboldt-Universität in Berlin: Springer. pp. 53–71. https://researchportal.helsinki.fi/en/publications/hfsta-system-for-creating-nlp-tools (conference: Systems and Frameworks for Computational Morphology, http://sfcm.eu/sfcm2013/)