Language, Speech and Multimedia Technologies Observatory
02/25/2011 - 14:35

Iñaki Alegria represented the Ixa group at the XII Simposium organized this year by the Centro de Lingüística Aplicada (CLA) in Santiago de Cuba.
Iñaki gave a 10-hour course on Foma, a tool that is very useful for easily implementing morphological tools.

The Centro de Lingüística Aplicada has just turned 40. Congratulations!

To mark the CLA research centre's 40th anniversary, they have sent the IXA group the sculpture in the photo as a gift, as a way of celebrating our collaboration.

Thank you very much. And congratulations to Eloina, Julio Vitelio, Leonel and
all the researchers who founded that research centre and keep it going!

The IXA group has been collaborating with the CLA research centre for the last 10 years.

One result of that collaboration is, for example, the third edition of the Cuban Diccionario Básico Escolar (DBE), published last year.
The dictionary is encoded in XML, and it was edited with leXkit, a dictionary-editing environment developed in the Ixa group.
02/25/2011 - 09:55

Event type: 
Public event
Monday, 11 April, 2011 (All day) - Friday, 15 April, 2011 (All day)
Interaction type:
Face-to-face meeting
RIL at the Hungarian Academy of Sciences
Benczúr u. 33



47° 30' 41.2488" N, 19° 4' 33.8808" E

A Thematic Training Course on Processing Morphologically Rich Languages will be hosted in Budapest by the University of Helsinki and the Hungarian Academy of Sciences. This PhD level course is a part of the thematic training programme offered by the Marie Curie ITN project CLARA. We also welcome other national and international course participants.

Role of CLARIN: 
CLARA is a CLARIN-related training programme
CLARIN representative(s): 
Csaba Oravecz
Tamás Váradi
Krister Lindén
Kimmo Koskenniemi
Relevant for groups:
02/23/2011 - 18:55
Hello, My Name is... RDF
RDF stands for Resource Description Framework and it is a flexible schema-less data model. Do not confuse or compare it with XML (more about this later)! It is one of the core technologies of the Semantic Web and the current W3C standard to represent data on the web. But what is RDF exactly?
As I mentioned, it is a data model. It can be compared to the relational model which is the way you organize data in a relational database: group related things in tables with attributes, create links between tables, etc. RDF is just another way of organizing your data. In which way? As a graph.
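
A minimal sketch of the "data as a graph" idea, written in plain Python rather than with any RDF library. Each statement is a (subject, predicate, object) triple, which can be read as a labelled edge between two nodes. The `ex:` names and both helper functions are made up for illustration.

```python
from collections import defaultdict

# Each statement is one edge of the graph: subject --predicate--> object.
# All identifiers below are hypothetical examples.
triples = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "ex:name", "Alice"),
    ("ex:Bob", "ex:name", "Bob"),
]


def build_graph(statements):
    """Index triples by subject so outgoing edges are easy to follow."""
    graph = defaultdict(list)
    for subject, predicate, obj in statements:
        graph[subject].append((predicate, obj))
    return graph


def objects(graph, subject, predicate):
    """Return every object reachable from `subject` via `predicate`."""
    return [o for p, o in graph.get(subject, []) if p == predicate]


graph = build_graph(triples)
print(objects(graph, "ex:Alice", "ex:knows"))
```

Unlike a relational table, no schema is declared up front: adding a new kind of fact only means adding another triple.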

02/21/2011 - 14:35
META-NET aims to build the technological 
foundations of a multilingual European information society.

In order to make this a success, we need the support and 
participation of the Language Technology (LT) community.

By joining the Multilingual European Technology Alliance (META), you can 
play a part in this exciting initiative, which aims to bring about a 
large-scale increase in funding for Language Technology Research on the 
national and international level.

META-NET is a Network of Excellence aiming to forge the Multilingual 
Europe Technology Alliance. It has three main lines of action:

1. META-VISION: Building a community with a shared vision and strategic 
research agenda for Europe's Language Technology landscape. The agenda 
will contain high level recommendations, ideas for visionary LT-based 
applications and suggestions for joint actions to be presented to the EC 
and national as well as regional bodies.

2. META-SHARE:  A sustainable network of repositories
02/20/2011 - 09:35

This joint research effort will integrate IBM's Watson and Nuance's voice and clinical language solutions to provide enhanced access to critical and timely information.
02/18/2011 - 09:45

Watson convincingly beat the best champion Jeopardy! players. The apparent significance of this varies hugely, depending on your background knowledge about the related machine learning, NLP, and search technology. For a random person, this might seem evidence of serious machine intelligence, while for people working on the system itself, it probably seems like a reasonably good assemblage of existing technologies with several twists to make the entire system work.

Above all, I think we should congratulate the people who managed to put together and execute this project—many years of effort by a diverse set of highly skilled people were needed to make this happen. In academia, it’s pretty difficult for one professor to assemble that quantity of talent, and in industry it’s rarely the case that such a capable group has both a worthwhile project and the support needed to pursue something like this for several years before success.

Alina invited me to the Jeopardy watching party at IBM, which was pretty fun, and it gave me a chance to talk to several people, principally Gerry Tesauro (2nd from the right). It’s cool to see people asking for autographs :)

I wasn’t surprised to see Watson win. Partly, this is simply because when a big company does a publicity stunt like this, it’s with a pretty solid expectation of victory. Partly, this is because I already knew that computers could answer trivia questions moderately well(*), so the question was just how far this could be improved. Gerry tells me that although Watson’s error rate is still significant, one key element is the ability to estimate with high accuracy when they can answer with high accuracy. Gerry also tells me the Watson papers will be coming out later this summer, with many more details.

What happens next? I don’t expect the project to be shelved like Deep Blue was, for two reasons. The first is that there is clearly very substantial room for improvement, and the second is that having a natural language question/answering device that can quickly search and respond from large sets of text is obviously valuable. The first means that researchers are interested, and the second that the money to support them can probably be found. The history of textual entailment challenges is another less centralized effort in about the same direction.

In the immediate future (next few years), applications in semi-open domains may become viable, particularly when a question/answer device knows when to answer “I don’t know”. Fully conversational speech recognition working in an open domain should take somewhat longer, because speech recognition software has additional error points, conversational systems aren’t so easy to come by, and in a fully open domain the error rates will be higher. Getting the error rate on questions down to the level that a human with access to the internet has difficulty beating is the tricky challenge which has not yet been addressed. It’s a worthy goal to work towards.
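
The idea of a device that knows when to answer “I don’t know” can be sketched as a confidence threshold. This is only an illustration under stated assumptions, not how Watson works: the function name, the candidate format and the threshold value are all hypothetical.

```python
def answer_or_abstain(candidates, threshold=0.5):
    """Return the most confident answer, or abstain if confidence is too low.

    `candidates` is a list of (answer, confidence) pairs with confidence in
    [0, 1]. Both the format and the default threshold are assumptions.
    """
    if not candidates:
        return "I don't know"
    best_answer, best_confidence = max(candidates, key=lambda pair: pair[1])
    if best_confidence >= threshold:
        return best_answer
    return "I don't know"


print(answer_or_abstain([("Budapest", 0.9), ("Bucharest", 0.1)]))
print(answer_or_abstain([("Budapest", 0.3), ("Bucharest", 0.25)]))
```

Such a rule is only useful if the confidence estimates are themselves reliable, which is the point Gerry makes above.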

Many people believe in human exceptionalism, so when seeing a computer beat Jeopardy, they are surprised that humans aren’t exceptional there. We should understand that this has happened many times before, with chess and mathematical calculation being two areas where computers now dominate, but which were once thought to be the essence of intelligence by some. Similarly, it is not difficult to imagine automated driving (after all, animals can do it), gross object recognition, etc…

To avert surprise in the future, human exceptionalists should understand what the really hard things for an AI to do are. It’s important to understand that there are various levels of I in AI. A few I think about are:

  1. Animal Intelligence. The ability to understand your place in the world, navigate the world, and accomplish something. Some of these tasks are solved, but many others are not yet. This level implies that routine tasks can be automated. Automated driving, farming, factories, etc…
  2. Turing Test Intelligence. The ability to mimic a typical human well enough to fool a typical human in open conversation. Watson doesn’t achieve this, but the thrust of the research is in this direction as open domain question answering is probably necessary for this. Nonroutine noncreative tasks might be accomplished by the computer. Think of an automated secretary.
  3. Pandora’s box Intelligence. The ability to efficiently self-program in an open domain so as to continuously improve. At this level human exceptionalism fails, and it is difficult to predict what happens next.

So, serious evidence of (2) or (3) is what I watch for.

(*) About 10 years ago, a friend of a friend was on WWTBAM and called the friend for help on a question. The friend typed the question and multiple choice answers into CMU’s Zephyr system, where a bot I made queried (question, answer) pairs on Google to discover which had the most web pages. It worked.
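
A rough sketch of the hit-count trick described in the footnote, not the original bot. The search backend is passed in as a function; `hit_count` and the toy numbers below are made up, and a real version would need some actual search API.

```python
def pick_answer(question, choices, hit_count):
    """Pick the choice whose (question, answer) query gets the most hits.

    `hit_count` is a hypothetical callable mapping a query string to the
    number of matching web pages.
    """
    return max(choices, key=lambda choice: hit_count(f"{question} {choice}"))


# Toy stand-in for a search engine: invented counts for illustration only.
fake_counts = {
    "capital of Hungary Budapest": 1200,
    "capital of Hungary Bucharest": 300,
    "capital of Hungary Vienna": 150,
    "capital of Hungary Prague": 90,
}

choice = pick_answer(
    "capital of Hungary",
    ["Budapest", "Bucharest", "Vienna", "Prague"],
    lambda query: fake_counts.get(query, 0),
)
print(choice)
```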
02/16/2011 - 21:50

The initial agenda for the educational conference at the 2011 Semantic Technology Conference (SemTech 2011) has just been published and is available here:

SemTech 2011 Full Agenda

I hope you can join us.

Tony Shaw

Program Co-Chair

PS: Discounted early registration rates expire on March 9.

02/16/2011 - 01:10
Topic: "Modality and negation in natural language processing:
 current trends and future directions"
Speaker: Roser Morante
Researcher in the BIOGRAPH project (Walter Daelemans's group)
 CLiPS-Computational Linguistics research group
University of Antwerp
Venue: Assembly Hall of the Faculty of Informatics
Date: February 23
Time: 16:00

Several recent language-technology applications try to understand texts:
  • opinion analysis (does the text speak positively or negatively?),
  • sentiment analysis (is the text sad or happy?),
  • text summarization,
  • question answering systems.
In all of them, the modality of the verb and negation are serious obstacles
 to understanding the text correctly.

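Examples of modality: Potential: etor daiteke ("he/she may come"); Conditional: etorriko balitz ("if he/she were to come");
 Consequence: etorriko nintzateke ("I would come"); Subjunctive: etor dadin ("that he/she come").
Example of negation: ez dator ("he/she is not coming")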

How does the meaning of a sentence change when the mood of the verb changes?
How does the meaning of a sentence change when it contains a negation?
That is the subject of the talk.

Research on modality and negation focuses on finding subjective,
uncertain and counterfactual information in texts, be it in scientific
papers, product reviews, or opinions in blogs. This type of research is
concerned with processing texts at the information level and aims at
deep text understanding. Modality and negation are phenomena relevant
for all applications that are concerned with some form of text
understanding, including text mining, sentiment analysis, recognizing
textual entailment, information extraction, text summarization, and
question answering. Hence, the adequate modeling of these phenomena is
of crucial importance to the natural language processing (NLP) community
as a whole.

Whereas from a theoretical perspective, the study of modality has a long
tradition, only in the recent years have these topics attracted the
attention of NLP researchers. Mainly, the development of sentiment
analysis techniques and the growing need of mining biomedical texts have
been the causes for the interest in these semantic aspects of language.
In this talk I will define modality and negation from an NLP
perspective, I will motivate the need for processing these phenomena,
and I will summarize existing research on processing modality and
negation, touching on diverse aspects ranging from task modelling to
feature visualization. Finally, I will speculate about future
developments in this research area.
02/06/2011 - 23:30


The IXA group is taking part in PATHS (Personalised Access To cultural Heritage Spaces, 2011-2013), a European project that started in January. Over the next 3 years it will work together with 5 other partners. The aim of this project, funded by the European Commission, is to help improve the use of the Europeana digital library, within the objectives of ICT-2009.4.1: Digital Libraries and Digital Preservation. In this way, PATHS will create views adapted to diverse forms of knowledge and cultural expression and, more concretely, will offer guided access to cultural works.


Europeana is a vast library of digital content that gives open access to
the collections of many European museums, libraries, archives and
audiovisual collections. Its main goal is the digitisation and
preservation of the European Union's cultural heritage. It currently
holds 15 million items in a variety of formats: image, text, audio and
video. Among its other goals is the wish to share Europe's cultural and
scientific diversity.


Thanks to online digital libraries, a large body of cultural heritage
material is available today. However, such large quantities can be
confusing for users who have no guidance. These users may have
difficulty interpreting the information they find. The PATHS project
will create a system offering personalised, interactive guided tours,
which will let users navigate through the digital library. Users will
thus be offered related or potentially interesting content, helping them
interpret the results of their information search.

Navigation will be based on the different paths that can be followed
through the digital collection. A path can be about any topic, for
example artists and their media ("Picasso's paintings"), historical
periods ("the Cold War"), places ("Venice") or well-known people
("Muhammad Ali"). There will be several ways to create and follow
paths: defined in advance by experts, proposed automatically by the
PATHS system itself or, if desired, created by users. The PATHS project
will offer users an innovative way of accessing digital content, and in
addition the users' experience will be useful for enriching the digital
library itself. To that end, PATHS will improve user-driven information
access technology, and content will be analysed and enriched by means of
language technologies. The project will focus on users' needs in order
to identify different types of users and … to them

As research objectives, the following goals have been defined in advance towards the project's main aim:

  • An analysis of users' needs will be carried out in order to
    provide suitable access to cultural heritage.

  • Cultural heritage
    … for an improved navigation system.

  • Navigation system …

  • An entry point
    offering … access.

  • The system
    will be taken to mobile devices and to social networks such as Facebook.

  • A user-based
    evaluation will be carried out.

To that end, research will be carried out in
the following areas:

  • Information …
    : Navigation adapted to the user will be developed and, for that,
    the user's needs will have to be identified and modelled.

  • Education and …
    : Thematic paths adapted to students' needs will be created so that
    navigation is guided. Alternatively, students will be left free to
    explore on their own.

  • Content analysis
    and enrichment
    : The representation of digital content will be worked on, on the
    one hand to establish relations between items in the collection and,
    on the other, to link digital items to external resources (for
    example, the painting “Mona Lisa” to a given Wikipedia entry, and
    the opera “Mona Lisa” to its corresponding Wikipedia entry)

There are six partners in the PATHS project:
01/31/2011 - 16:00

Global Industry Analysts projects that the speech technology industry will grow to $20.9 billion by 2015.
