Collaborating on language processing for Basque and Sami (Laponian)

Researchers working on Basque and Sami (Laponian) are collaborating on automatic language processing. Linda Wiechetek, a researcher from the University of Tromsø (Norway), is visiting the Ixa Group in Donostia from April to July 2010. Her visit is funded by the NILS mobility project.

(Deia, Berria, EITB)

Why Sami and Basque? Why do we work with this unusual language pair?
Some of the reasons are:

  • Both are small languages.
  • Both have limited resources for developing and using language technology. (Nowadays Sami is even less resourced than Basque.)
  • Sami and Basque morphologies are very rich and demand adequate tools, such as our morphological transducers and our syntactic disambiguation and analysis modules. Many of the better-resourced languages with highly developed language technology, such as English, Spanish and French, do not need such complex modules to create their basic tools.
  • There are clear syntactic parallels between Basque and Sami, including grammatical cases and postpositions that cause morpho-syntactic ambiguity.

In this context we are collaborating in the following areas:

  • Use of semantic prototype features in Constraint Grammar for syntactic disambiguation.
  • Use of semantic features in Constraint Grammar for lexical/syntactic transfer in Machine Translation.
  • Use of information on verb-subcategorization for syntactic disambiguation.
  • Use of verb-subcategorization information for lexical and syntactic transfer in Machine Translation.
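
To give a rough idea of what the first item in the list above means, here is a toy Python sketch of a Constraint Grammar style REMOVE rule that uses a semantic feature. It is not taken from the Basque or Sami grammars: the tag names (`@SUBJ`, `@ADVL`, `Sem/Human`), the tiny `SEMANTIC_LEXICON` and the helper functions are all hypothetical, and real grammars are written in a dedicated rule formalism rather than in Python. The assumption is a verb that expects a human subject, so a subject reading of a non-human noun is discarded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """One possible analysis of a word: its lemma plus a set of tags."""
    lemma: str
    tags: frozenset


# Hypothetical semantic prototype lexicon (invented for this illustration).
SEMANTIC_LEXICON = {
    "teacher": {"Sem/Human"},
    "school": {"Sem/Place"},
}


def remove_readings(cohort, should_remove):
    """Drop the readings that match, but never remove the last remaining one."""
    kept = [r for r in cohort if not should_remove(r)]
    return kept if kept else list(cohort)


def is_implausible_subject(reading):
    """A subject reading is implausible here if the noun is not marked as human."""
    semantics = SEMANTIC_LEXICON.get(reading.lemma, set())
    return "@SUBJ" in reading.tags and "Sem/Human" not in semantics


# "school" is ambiguous between a subject and an adverbial reading.
cohort = [
    Reading("school", frozenset({"N", "@SUBJ"})),
    Reading("school", frozenset({"N", "@ADVL"})),
]

disambiguated = remove_readings(cohort, is_implausible_subject)
print([sorted(r.tags) for r in disambiguated])
```

Only the adverbial reading survives in this example. The "never remove the last reading" guard mirrors the usual Constraint Grammar convention that a word always keeps at least one analysis.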

The parser for Basque is not yet as accurate as parsers for English. The Sami parser, on the other hand, already achieves good accuracy, but valency information is still needed for other tasks such as machine translation (MT) and question answering (QA).
With this collaboration between Basque and Sami researchers we aim to improve our NLP tools.

Besides that, Linda can now speak some Basque, and we are learning some words in Sami.
That's another way of collaborating ;-)