General resources

This line of research aims to compile and develop the basic resources needed to carry out the remaining tasks of the project. Examples include:

  • Tools for building corpora from the Internet, whether monolingual or multilingual, general or domain-specific, comparable or parallel
  • Corpora of all kinds obtained from the Internet
  • Tools for the automatic building of pivot-language based dictionaries
  • Tools for extracting terminology from monolingual and multilingual corpora (both parallel and comparable), improved and extended to new languages
  • Monolingual or bilingual dictionaries, general or terminological
  • Tools for incorporating semantic knowledge into dictionaries
  • Tools for building general and domain-specific ontologies, manually or automatically
  • General and domain-specific ontologies
  • Syntactic dependency analysers
  • Systems for identifying sentence and syntagma boundaries
  • Semantic analysers
  • Continuous speech recognition engines
  • Speech recognisers for Basque and English
  • Objective evaluation techniques for text-to-speech conversion systems
  • Voice transformation techniques
  • Techniques for voice segment detection
  • Techniques for detecting speaker turn changes in conversations
  • Techniques for classifying speakers
  • Dialogue systems

Resources and tools that do not yet exist but are considered fundamental to the development of language technologies will also be developed, even if they are not strictly necessary for the remaining tasks of the project.