Speaker
Affiliation
UPC Barcelona, visiting Sussex
Abstract
Arabic WordNet (AWN) is arguably one of the most important freely available lexical resources developed so far for the Arabic language. AWN has been built following the design of Princeton WordNet and adopting the EuroWordNet methodology of manually encoding a set of base concepts while maximizing compatibility across wordnets. As a result, there is a straightforward mapping from Arabic WordNet onto Princeton WordNet 2.0 and many other wordnets. The Suggested Upper Merged Ontology (SUMO) is mapped by hand to all synsets of Princeton WordNet and has been extended with a number of concepts that correspond to words that are lexicalized in Arabic but not in English. This provides an interlingua that is not limited by the lexicalization of any particular human language and that underlies the development of semantics-based computational tools for multilingual NLP. In this talk I will present the methodologies used and the challenges faced while constructing a WordNet for Arabic, and highlight some experiments we conducted that exploit Arabic lexical and morphological rules to reduce human effort and extend AWN (semi-)automatically. I will conclude by showing the interfaces we developed for lexicographers and users of AWN, the downloadable AWN browser, and an online demo of Arabic Word Spotter, which identifies the words in an Arabic web page that are covered by AWN and provides their translations.