University of Sussex

Ranking Word Senses for Disambiguation: Models and Applications

Participants

The project started in September 2005, is funded by the UK EPSRC, and runs for three years.

Summary

When faced with the question "Which plants thrive in chalky soil?", humans have no trouble understanding that the plants are botanical rather than industrial. Furthermore, humans recognise that the answers "Sweetcorn and cabbage family vegetables do well on chalky soil", "Sweetcorn and cabbage grow well on chalky ground", and "Maize and cabbage-like vegetables grow well on chalky soil" are all paraphrases and mean more or less the same thing. Semantic interpretation and disambiguation are performed effortlessly by humans but pose great difficulties for computer-based applications that extract, filter and manipulate information from textual data, such as Question Answering and Information Retrieval systems. With the rapidly growing amounts of text stored by businesses and available over the Internet, such applications are becoming increasingly important, and improved methods for identifying the intended meanings of words (word senses) are a key technology for them.

The most accurate techniques for word sense disambiguation (WSD) to date are those trained on text in which each word has been manually annotated with its intended sense. A major shortcoming of these methods, though, is that their accuracy is strongly correlated with the quantity of training data available, and such data is in short supply because producing it is very labour-intensive. For many words the distribution of senses is highly skewed, and WSD systems work best when they take the most frequent sense into account. However, the most frequent sense of a word is often unknown, particularly in domains (subject areas) for which no text has ever been manually annotated.
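To make the role of the most frequent sense concrete, the Python sketch below computes the "most frequent sense" baseline from a table of sense counts, together with the accuracy that baseline achieves when every occurrence of the word is tagged with that sense. The sense labels and counts are hypothetical, chosen only to illustrate a skewed distribution and how the predominant sense can change from one domain to another; they are not taken from the project's data.

```python
# Hypothetical sense labels and counts, for illustration only.

def most_frequent_sense(sense_counts: dict[str, int]) -> str:
    """Return the sense with the highest count (the 'most frequent sense' heuristic)."""
    return max(sense_counts, key=sense_counts.get)


def mfs_baseline_accuracy(sense_counts: dict[str, int]) -> float:
    """Accuracy of tagging every occurrence with the most frequent sense."""
    total = sum(sense_counts.values())
    return sense_counts[most_frequent_sense(sense_counts)] / total


# A skewed distribution: one sense dominates in a gardening corpus.
gardening_counts = {"plant/flora": 90, "plant/factory": 8, "plant/planted_person": 2}
# In an industrial corpus the predominant sense is different.
industry_counts = {"plant/flora": 10, "plant/factory": 30}

print(most_frequent_sense(gardening_counts), mfs_baseline_accuracy(gardening_counts))
# plant/flora 0.9
print(most_frequent_sense(industry_counts), mfs_baseline_accuracy(industry_counts))
# plant/factory 0.75
```

The heuristic is only as good as the counts behind it: a system that assumed the gardening distribution would tag the industrial text with the wrong sense most of the time, which is why these counts need to be estimated for each domain.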

In this project we will develop novel ways of estimating the frequency distributions of senses of words from raw (unannotated) text. We will exploit these distributions in WSD systems which do not rely on the availability of hand-labelled resources and will demonstrate the benefits of our methods in application to Question Answering.
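The summary does not say how the sense distributions are to be estimated. One approach, described in the McCarthy et al. papers listed on this page, ranks a word's senses by a prevalence score: each of the word's nearest distributional neighbours (found automatically in raw text) casts a vote weighted by its distributional similarity to the word, and that vote is shared among the word's senses according to a semantic similarity measure between each sense and the neighbour. The Python sketch below is a simplified version of that idea. All similarity values are hypothetical toy numbers standing in for a distributional thesaurus and a WordNet similarity measure, and the function names are illustrative.

```python
def sense_neighbour_sim(sense, neighbour, sense_sim, neighbour_senses):
    """Similarity between a target sense and a neighbour: the maximum over the
    neighbour's own senses (zero if no similarity value is known)."""
    return max(
        (sense_sim.get((sense, ns), 0.0) for ns in neighbour_senses[neighbour]),
        default=0.0,
    )


def prevalence_scores(senses, neighbours, sense_sim, neighbour_senses):
    """Score each sense by summing weighted neighbour votes.

    `neighbours` maps each distributional neighbour to its distributional
    similarity with the target word; each neighbour's vote is shared among
    the senses in proportion to their similarity to that neighbour.
    """
    scores = {s: 0.0 for s in senses}
    for neighbour, dist_sim in neighbours.items():
        sims = {s: sense_neighbour_sim(s, neighbour, sense_sim, neighbour_senses) for s in senses}
        total = sum(sims.values())
        if total == 0:
            continue  # neighbour is unrelated to every sense, so it casts no vote
        for s in senses:
            scores[s] += dist_sim * sims[s] / total
    return scores


def to_distribution(scores):
    """Normalise prevalence scores into an estimated sense distribution."""
    total = sum(scores.values())
    if total == 0:
        return {s: 1 / len(scores) for s in scores}  # no evidence: fall back to uniform
    return {s: v / total for s, v in scores.items()}


# Toy inputs (hypothetical values).
senses = ["plant/flora", "plant/factory"]
neighbours = {"flower": 0.30, "shrub": 0.25, "factory": 0.10}
neighbour_senses = {"flower": ["flower/bloom"], "shrub": ["shrub/bush"], "factory": ["factory/works"]}
sense_sim = {
    ("plant/flora", "flower/bloom"): 0.8, ("plant/factory", "flower/bloom"): 0.2,
    ("plant/flora", "shrub/bush"): 0.9, ("plant/factory", "shrub/bush"): 0.1,
    ("plant/flora", "factory/works"): 0.1, ("plant/factory", "factory/works"): 0.9,
}

scores = prevalence_scores(senses, neighbours, sense_sim, neighbour_senses)
dist = to_distribution(scores)
predominant = max(dist, key=dist.get)
print(predominant)  # plant/flora
```

With these toy numbers plant/flora scores 0.475 and plant/factory 0.175, so about 73% of the estimated probability mass goes to the flora sense. An estimated distribution like this can then serve as a prior, or its top sense as a first-sense heuristic, in a WSD system that needs no hand-labelled data.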

Project Publications

Diana McCarthy and Roberto Navigli (accepted for publication, 2009) The English Lexical Substitution Task. To appear in Language Resources and Evaluation Special Issue on Computational Semantic Analysis of Language: SemEval-2007 and Beyond, Agirre, E., Màrquez, L. and Wicentowski, R. (Eds). Springer.

Rob Koeling and Diana McCarthy (2008) From Predicting Predominant Senses to Using Local Context for Word Sense Disambiguation. In Semantics in Text Processing. STEP 2008 Conference Proceedings, Bos, J. and Delmonte, R. (Eds). College Publications. 129-138.

Ryu Iida, Diana McCarthy, and Rob Koeling (2008) Gloss-Based Semantic Similarity Metrics for Predominant Sense Acquisition. In Proceedings of the Third International Joint Conference on Natural Language Processing, 561-568. Hyderabad, India.

Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll (2007) Unsupervised Acquisition of Predominant Word Senses. Computational Linguistics, 33(4). 553-590.

Rob Koeling and Diana McCarthy (2007) Sussx: WSD using Automatically Acquired Predominant Senses. In Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval-2007), 314-317. Prague, Czech Republic.

Mirella Lapata and Frank Keller (2007) An Information Retrieval Approach to Sense Ranking. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, 348-355. Rochester, New York.

Sebastian Padó and Mirella Lapata (2007) Dependency-based Construction of Semantic Space Models. Computational Linguistics, 33(2). 161-199.

Rob Koeling, Diana McCarthy, and John Carroll (2007) Text Categorization for Improved Priors of Word Meaning. In Proceedings of the Eighth International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2007). Mexico City, Mexico. (Received the 3rd Best Paper Award).

Roberto Navigli and Mirella Lapata (2007) Graph Connectivity Measures for Unsupervised Word Sense Disambiguation. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, 1683-1688. Hyderabad, India.

Samuel Brody, Roberto Navigli, and Mirella Lapata (2006) Ensemble Methods for Unsupervised WSD. In Proceedings of the Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics, Sydney, Australia.

Diana McCarthy (2006) Relating WordNet Senses for Word Sense Disambiguation. In Proceedings of the EACL-2006 Workshop on Making Sense of Sense: Bringing Psycholinguistics and Computational Linguistics Together, 17-24. Trento, Italy. [Gold Standard Data]

Rob Koeling, Diana McCarthy, and John Carroll (2005) Domain-Specific Sense Distributions and Predominant Sense Acquisition. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing HLT/EMNLP-2005. Vancouver, Canada. [Gold Standard Data]

Background References

Mirella Lapata and Chris Brew (2004) Verb Class Disambiguation Using Informative Priors. Computational Linguistics, 30(1). 45-73.

Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll (2004) Finding Predominant Word Senses in Untagged Text. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, 280-287. Barcelona, Spain. (Received the Best Paper Award).

Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll (2004) Automatic Identification of Infrequent Word Senses. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), 1220-1226. Geneva, Switzerland.


Site maintained by: Jonathon Read