University of Sussex

Building a WordNet for Arabic: Methodology and Challenges

Speaker

Musa Alkhalifa

Affiliation

UPC Barcelona, visiting Sussex

Abstract

Arabic WordNet (AWN) can be considered one of the most important freely available lexical resources developed so far for the Arabic language. AWN has been built following the design of Princeton WordNet and adopting the EuroWordNet methodology of manually encoding a set of base concepts while maximizing compatibility across wordnets. As a result, there is a straightforward mapping from Arabic WordNet onto Princeton WordNet 2.0 and many other wordnets. The Suggested Upper Merged Ontology (SUMO), which is mapped by hand to all synsets of Princeton WordNet, has been extended with a number of concepts corresponding to words that are lexicalized in Arabic but not in English. It thus provides an interlingua that is not limited by the lexicalization of any particular human language and that underlies the development of semantics-based computational tools for multilingual NLP. In this talk I will present the methodologies used and the challenges faced while constructing a WordNet for Arabic, and highlight some experiments we conducted, exploiting Arabic lexical and morphological rules, to reduce human effort and extend AWN (semi-)automatically. I will conclude by showing the interfaces we developed for lexicographers and users of AWN, the downloadable AWN browser, and an online demo of Arabic Word Spotter, which identifies the words in an Arabic web page that are covered by AWN and provides their translations.

