
Grammatical Framework (GF) is a programming language for writing grammars of natural languages. GF is capable of parsing and generating texts in several languages simultaneously while working from a language-independent representation of meaning. Grammars written in GF can be compiled into a platform-independent format and then used from different programming languages, including C, Java, C#, Python and Haskell. A companion to GF is the ''GF Resource Grammar Library'', a reusable library for dealing with the morphology and syntax of a growing number of natural languages. Both GF itself and the GF Resource Grammar Library are open-source. Typologically, GF is a functional programming language. Mathematically, it is a type-theoretic formal system (a logical framework, to be precise) based on Martin-Löf's intuitionistic type theory, with additional judgments tailored specifically to the domain of linguistics.


Language features

* a static type system, to detect potential programming errors
* functional programming, for powerful abstractions
* support for writing libraries, to be used on other grammars
* tools for information extraction, to convert linguistic resources into GF


Tutorial

Goal: write a multilingual grammar for expressing statements about John and Mary loving each other.


Abstract and concrete modules

In GF, grammars are divided into two module types:
* an abstract module, containing the judgement forms cat and fun.
** cat, or category declarations, list the categories, i.e. all the possible types of trees there can be.
** fun, or function declarations, state functions and their types; these must be implemented by concrete modules (see below).
* one or more concrete modules, containing the judgement forms lincat and lin.
** lincat, or linearization type definitions, say what type of objects linearization produces for each category listed in cat.
** lin, or linearization rules, implement the functions declared in fun. They say how trees are linearized.
Consider an abstract syntax Zero (module header: abstract Zero) together with an English concrete syntax ZeroEng (module header: concrete ZeroEng of Zero). In that pair, Str (token list or "string") is the only linearization type.
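The bodies of the two modules are not reproduced in this text. The following is a minimal sketch of what they could look like, assuming the categories S, NP, VP and V2 and the functions Pred, Compl, John, Mary and Love. These names are inferred from the tree Pred Mary (Compl Love Mary) and the sentence "John loves Mary" that appear in the shell session later in this tutorial. Each module goes in its own file.

```gf
-- Zero.gf: abstract syntax (sketch; names inferred from the tutorial's examples)
abstract Zero = {
  cat
    S ; NP ; VP ; V2 ;          -- sentence, noun phrase, verb phrase, two-place verb
  fun
    Pred  : NP -> VP -> S ;     -- predication: subject + verb phrase
    Compl : V2 -> NP -> VP ;    -- complementation: verb + object
    John, Mary : NP ;
    Love : V2 ;
}
```

```gf
-- ZeroEng.gf: English concrete syntax (sketch)
concrete ZeroEng of Zero = {
  lincat
    S, NP, VP, V2 = Str ;       -- every category is linearized as a plain string
  lin
    Pred np vp  = np ++ vp ;    -- subject before verb phrase
    Compl v2 np = v2 ++ np ;    -- verb before object
    John = "John" ;
    Mary = "Mary" ;
    Love = "loves" ;
}
```

With these definitions, the tree Pred John (Compl Love Mary) would linearize to "John loves Mary".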


Making a grammar multilingual

A single abstract syntax may be paired with many concrete syntaxes, in our case one for each natural language we wish to add. The same system of trees can be given:
* different words
* different word orders
* different linearization types
The French concrete syntax is the module ZeroFre, declared as concrete ZeroFre of Zero.
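The body of the French module is not reproduced in this text. As a hedged sketch, assuming the abstract syntax Zero declares the categories S, NP, VP, V2 and the functions Pred, Compl, John, Mary, Love, and using the words from the sentence "Jean aime Marie" shown in the translation section, it could read:

```gf
-- ZeroFre.gf: French concrete syntax (sketch; only the words differ from English)
concrete ZeroFre of Zero = {
  lincat
    S, NP, VP, V2 = Str ;
  lin
    Pred np vp  = np ++ vp ;
    Compl v2 np = v2 ++ np ;
    John = "Jean" ;
    Mary = "Marie" ;
    Love = "aime" ;
}
```

In this sketch French uses the same word order and linearization types as English, so it only illustrates the "different words" case.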


Translation and multilingual generation

We can now use our grammar to translate phrases between French and English. The following commands can be executed in the GF interactive shell.
Import many grammars with the same abstract syntax:
    > import ZeroEng.gf ZeroFre.gf
    Languages: ZeroEng ZeroFre
Translation: pipe parsing into linearization:
    > parse -lang=Eng "John loves Mary" | linearize -lang=Fre
    Jean aime Marie
Multilingual generation: linearize into all languages:
    > generate_random | linearize -treebank
    Zero: Pred Mary (Compl Love Mary)
    ZeroEng: Mary loves Mary
    ZeroFre: Marie aime Marie


Parameters, tables

Latin has ''cases'': nominative for the subject, accusative for the object.
* ''Ioannes Mariam amat'' "John-Nom loves Mary-Acc"
* ''Maria Ioannem amat'' "Mary-Nom loves John-Acc"
We use a parameter type for case (just 2 of Latin's 6 cases). The linearization type of NP is a table type: from Case to Str. The linearization of a noun phrase is an inflection table. When using an NP, we select (!) the appropriate case from the table. The Latin concrete syntax is the module ZeroLat, declared as concrete ZeroLat of Zero.
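The body of the Latin module is not reproduced in this text. The following sketch shows how a parameter type and tables could be used, assuming the abstract syntax Zero with categories S, NP, VP, V2 and functions Pred, Compl, John, Mary, Love. The parameter name Case and its constructors Nom and Acc are illustrative choices; the word forms come from the two example sentences above.

```gf
-- ZeroLat.gf: Latin concrete syntax (sketch with a case parameter)
concrete ZeroLat of Zero = {
  param
    Case = Nom | Acc ;                  -- only two of the Latin cases are modelled
  lincat
    S, VP, V2 = Str ;
    NP = Case => Str ;                  -- table type: one string per case
  lin
    Pred np vp  = np ! Nom ++ vp ;      -- subject is selected in the nominative
    Compl v2 np = np ! Acc ++ v2 ;      -- object in the accusative, before the verb
    John = table {Nom => "Ioannes" ; Acc => "Ioannem"} ;
    Mary = table {Nom => "Maria" ; Acc => "Mariam"} ;
    Love = "amat" ;
}
```

Under these assumptions, Pred John (Compl Love Mary) linearizes to "Ioannes Mariam amat".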


Discontinuous constituents, records

In Dutch, the verb ''heeft lief'' is a discontinuous constituent. The linearization type of V2 is a record type with two fields. The linearization of Love is a record. The values of fields are picked by projection (.). The Dutch concrete syntax is the module ZeroDut, declared as concrete ZeroDut of Zero.
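The body of the Dutch module is not reproduced in this text. A hedged sketch of the record-based approach follows, assuming the abstract syntax Zero with categories S, NP, VP, V2 and functions Pred, Compl, John, Mary, Love. The field names v and p are illustrative, and the words are taken from the sentence "Jan heeft Marie lief" used in the visualization section.

```gf
-- ZeroDut.gf: Dutch concrete syntax (sketch with a two-field record for the verb)
concrete ZeroDut of Zero = {
  lincat
    S, NP, VP = Str ;
    V2 = {v : Str ; p : Str} ;              -- verb part and particle part
  lin
    Pred np vp  = np ++ vp ;
    Compl v2 np = v2.v ++ np ++ v2.p ;      -- the object goes between the two parts
    John = "Jan" ;
    Mary = "Marie" ;
    Love = {v = "heeft" ; p = "lief"} ;
}
```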


Variable and inherent features, agreement, Unicode support

For Hebrew, NP has gender as its inherent feature, a field in the record. VP has gender as its variable feature, an argument of a table. In predication, the VP receives the gender of the NP. The Hebrew concrete syntax is the module ZeroHeb, declared as concrete ZeroHeb of Zero.
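The body of the Hebrew module is not reproduced in this text. The sketch below illustrates inherent and variable features under the same assumptions about the abstract syntax Zero (categories S, NP, VP, V2; functions Pred, Compl, John, Mary, Love). The parameter type Gender, the field names s and g, and the specific Hebrew word forms are illustrative choices rather than a quotation of the original module.

```gf
-- ZeroHeb.gf: Hebrew concrete syntax (sketch with gender agreement)
concrete ZeroHeb of Zero = {
  flags coding = utf8 ;                       -- source file contains Unicode strings
  param
    Gender = Masc | Fem ;
  lincat
    S = Str ;
    NP = {s : Str ; g : Gender} ;             -- gender is inherent: a record field
    VP, V2 = Gender => Str ;                  -- gender is variable: a table argument
  lin
    Pred np vp  = np.s ++ vp ! np.g ;         -- the VP agrees with the subject's gender
    Compl v2 np = table {g => v2 ! g ++ "את" ++ np.s} ;
    John = {s = "ג׳ון" ; g = Masc} ;
    Mary = {s = "מרי" ; g = Fem} ;
    Love = table {Masc => "אוהב" ; Fem => "אוהבת"} ;
}
```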


Visualizing parse trees

GF has built-in functions for visualizing parse trees and word alignments. The following commands generate parse trees for the given phrases and open the resulting PNG image with the viewer named in the -view option (here eog).
    > parse -lang=Eng "John loves Mary" | visualize_parse -view="eog"
    > parse -lang=Dut "Jan heeft Marie lief" | visualize_parse -view="eog"


Generating word alignment

# In languages L1 and L2: link every word with its smallest spanning subtree.
# Delete the intervening tree, combining links directly from L1 to L2.
In general, this gives phrase alignment. Links can cross, and phrases can be discontinuous. The align_words command follows a syntax similar to that of visualize_parse:
    > parse -lang=Fre "Marie aime Jean" | align_words -lang=Fre,Dut,Lat -view="eog"


Resource Grammar Library

In natural language applications, libraries are a way to cope with the thousands of details involved in syntax, lexicon, and inflection. The GF Resource Grammar Library is the standard library for Grammatical Framework. It covers the morphology and basic syntax for an increasing number of languages, currently including Afrikaans, Amharic (partial), Arabic (partial), Basque (partial), Bulgarian, Catalan, Chinese, Czech (partial), Danish, Dutch, English, Estonian, Finnish, French, German, Greek ancient (partial), Greek modern, Hebrew (fragments), Hindi, Hungarian (partial), Interlingua, Italian, Japanese, Korean (partial), Latin (partial), Latvian, Maltese, Mongolian, Nepali, Norwegian bokmål, Norwegian nynorsk, Persian, Polish, Punjabi, Romanian, Russian, Sindhi, Slovak (partial), Slovene (partial), Somali (partial), Spanish, Swahili (fragments), Swedish, Thai, Turkish (fragments), and Urdu. In addition, 14 languages have WordNet lexicon and large-scale parsing extensions. Full API documentation of the library can be found on the RGL Synopsis page. A separate page gives the languages currently available in the GF Resource Grammar Library, including their maturity.
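To illustrate how an application grammar can delegate syntax and inflection to the library, here is a hedged sketch of an English concrete syntax for the tutorial's Zero grammar written against the RGL. It assumes the abstract syntax Zero with categories S, NP, VP, V2 and functions Pred, Compl, John, Mary, Love; the module name ZeroEngRGL is hypothetical. The constructors mkS, mkCl, mkVP, mkNP, mkPN, mkV2 and mkV are used as described in the RGL Synopsis, but the sketch has not been checked against a particular RGL release.

```gf
-- ZeroEngRGL.gf: English concrete syntax built on the RGL (sketch; module name is hypothetical)
concrete ZeroEngRGL of Zero = open SyntaxEng, ParadigmsEng in {
  lincat
    -- reuse the library's linearization types instead of plain strings
    S  = SyntaxEng.S ;
    NP = SyntaxEng.NP ;
    VP = SyntaxEng.VP ;
    V2 = SyntaxEng.V2 ;
  lin
    Pred np vp  = mkS (mkCl np vp) ;     -- clause in the default tense and polarity
    Compl v2 np = mkVP v2 np ;
    John = mkNP (mkPN "John") ;
    Mary = mkNP (mkPN "Mary") ;
    Love = mkV2 (mkV "love") ;           -- the library derives the inflected forms
}
```

The design point is that the application grammar only names lexical items and syntactic combinations; agreement, word order and inflection are handled by the library.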


Uses of GF

GF was first created in 1998 at Xerox Research Centre Europe, Grenoble, in the project Multilingual Document Authoring. At Xerox, it was used for prototypes including a restaurant phrase book, a database query system, a formalization of alarm system instructions with translations into 5 languages, and an authoring system for medical drug descriptions. Later projects using GF and involving third parties include:
* REMU: Reliable Multilingual Digital Communication, a project funded by the Swedish Research Council from 2013 to 2017
* MOLTO: multilingual online translation, an EU project that ran from 2010 to 2013
* SALDO: Swedish morphological dictionary based on rules developed for GF and Functional Morphology
* multilingual generation of mathematical exercises (commercial project)
* TALK: multilingual and multimodal spoken dialogue systems
Academically, GF has been used in many PhD theses and has resulted in many scientific publications. Commercially, GF has been used by a number of companies, in domains such as e-commerce, health care and translating formal specifications to natural language.


Community


Developer mailing list

There is an active group for developers and users of GF alike, located at https://groups.google.com/group/gf-dev


Summer schools


2020 – GF as a resource for Computational Law (Singapore)

The seventh GF summer school, postponed due to COVID-19, is to be held in Singapore. Co-organised with the Singapore Management University's Centre for Computational Law, the summer school will have a special focus on computational law.


2018 – Sixth GF Summer School (Stellenbosch, South Africa)

The sixth GF summer school was the first one held outside Europe. The major themes of the summer school were African language resources and the growing usage of GF in commercial applications.


2017 – GF in a Full Stack of Language Technology (Riga, Latvia)

The fifth GF summer school was held in Riga, Latvia. This summer school had a number of participants from startups, presenting industrial use cases of GF.


2016 – Summer School in Rule-Based Machine Translation (Alicante, Spain)

GF was one of the four platforms featured at the Summer School in Rule-Based Machine Translation, along with Apertium, Matxin and TectoMT.


2015 – Fourth GF Summer School (Gozo, Malta)

The fourth GF summer school was held on Gozo island in Malta. Like the previous edition in 2013, this summer school featured collaborations with other resources, such as Apertium and FrameNet.


2013 – Scaling Up Grammatical Resources (Lake Chiemsee, Germany)

The third GF summer school was held on Frauenchiemsee island in Bavaria, Germany, with the special theme "Scaling up Grammar Resources". This summer school focused on extending the existing resource grammars with the ultimate goal of dealing with any text in the supported languages. Lexicon extension is an obvious part of this work, but new grammatical constructions were also of interest. There was a special interest in porting resources from other open-source approaches, such as WordNets and Apertium, and reciprocally in making GF resources easily reusable in other approaches.


2011 – Frontiers of Multilingual Technologies (Barcelona, Spain)

The second GF summer school, subtitled ''Frontiers of Multilingual Technologies'', was held in 2011 in Barcelona, Spain. It was sponsored by CLT, the Centre for Language Technology of the University of Gothenburg, and by UPC, Universitat Politècnica de Catalunya. The school addressed new languages and also promoted ongoing work on languages that were already under construction. Work on missing EU languages was especially encouraged. The school began with a 2-day GF tutorial, serving those interested in getting an introduction to GF or an overview of ongoing work. All results of the summer school are available as open-source software released under the LGPL license.


2009 – GF Summer School (Gothenburg, Sweden)

The first GF summer school was held in 2009 in Gothenburg, Sweden. It was a collaborative effort to create grammars of new languages in Grammatical Framework. These grammars were added to the Resource Grammar Library, which previously had 12 languages. Around 10 new languages were already under construction, and the school aimed to address 23 new languages. All results of the summer school were made available as open-source software released under the LGPL license. The summer school was organized by the Language Technology Group at the Department of Computer Science and Engineering. The group is a part of the Centre of Language Technology, a focus research area of the University of Gothenburg. The code created by the school participants is accessible in a subdirectory of the GF darcs repository.




External links


Grammatical Framework homepage