Speaker
Affiliation
Sussex
Abstract
The Distributional Similarity of Sub-Parses
Julie Weeds, David Weir and Bill Keller
In this work we explore computing distributional similarity between sub-parses, i.e., fragments of a parse tree. In the same way that lexical distributional similarity is used to estimate lexical semantic similarity, we propose using distributional similarity between sub-parses to estimate the semantic similarity of phrases. Such a technique will allow us to identify paraphrases where the component words are not semantically similar. We demonstrate the potential of the method by applying it to a small number of examples and showing that the paraphrases are more similar than the non-paraphrases.
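As a rough illustration of the idea, and not the authors' implementation, the sketch below treats each sub-parse as a string key and compares the contexts it occurs in. Cosine similarity is used here only as a familiar stand-in; the abstract does not say which distributional similarity measure is used. The sub-parse patterns, the context features, and the names `context_profile` and `cosine` are all hypothetical.

```python
from collections import Counter
from math import sqrt


def context_profile(observations):
    """Count how often a sub-parse is seen with each context feature."""
    return Counter(observations)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    shared = set(a) & set(b)
    dot = sum(a[f] * b[f] for f in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy, invented data: each sub-parse is listed with the fillers observed
# in its open slots. Real profiles would be collected from a parsed corpus.
profiles = {
    "X solve Y": context_profile(
        ["subj:team", "obj:problem", "obj:puzzle", "subj:student"]
    ),
    "X find solution to Y": context_profile(
        ["subj:team", "obj:problem", "obj:puzzle"]
    ),
    "X cause Y": context_profile(
        ["subj:storm", "obj:problem", "obj:damage"]
    ),
}

paraphrase_sim = cosine(profiles["X solve Y"], profiles["X find solution to Y"])
other_sim = cosine(profiles["X solve Y"], profiles["X cause Y"])
```

On this toy data the paraphrase pair shares most of its contexts and so scores higher than the non-paraphrase pair, even though "solve" and "find solution to" have no words in common.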
Using Emoticons to Reduce Dependency in Machine Learning Techniques for Sentiment Classification
Jonathon Read
Sentiment classification seeks to classify a piece of text according to its author's general feeling toward the subject, be it positive or negative. Traditional machine learning techniques have been applied to this problem with reasonable success, but they have been shown to work well only when there is a good match between the training and test data with respect to topic. This presentation demonstrates that a match with respect to domain and time is also important, and presents preliminary experiments with training data labeled with emoticons, which has the potential to be independent of domain, topic and time.
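A minimal sketch of how emoticons might serve as noisy sentiment labels is given below. It is an assumption-laden illustration rather than the presenter's pipeline: the emoticon lists, the rule of discarding texts with mixed or missing emoticons, and the function name `label_by_emoticon` are all hypothetical.

```python
# Assumed emoticon inventories; the abstract does not list the ones used.
POSITIVE = (":-)", ":)", ":D")
NEGATIVE = (":-(", ":(")


def label_by_emoticon(text):
    """Return (cleaned_text, label), or None if no single label applies."""
    has_pos = any(e in text for e in POSITIVE)
    has_neg = any(e in text for e in NEGATIVE)
    # Skip texts with no emoticon or with conflicting emoticons.
    if has_pos == has_neg:
        return None
    label = "positive" if has_pos else "negative"
    # Remove the emoticons so a classifier cannot simply memorise them.
    cleaned = text
    for emoticon in POSITIVE + NEGATIVE:
        cleaned = cleaned.replace(emoticon, "")
    return " ".join(cleaned.split()), label


raw_texts = [
    "Great film :-)",
    "so bored :(",
    "no emoticon here",
    "mixed feelings :) :(",
]
training_data = [
    pair for pair in map(label_by_emoticon, raw_texts) if pair is not None
]
```

The resulting `training_data` pairs could then be passed to any standard text classifier. Because the labels come from the writers themselves rather than from a topic-specific annotated corpus, such data can in principle be gathered across domains, topics and time periods.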
Empirically-based Control of Natural Language Generation
Daniel S. Paiva and Roger Evans
We present a new approach to controlling the behaviour of a natural language generation system by correlating internal decisions taken during free generation of a wide range of texts with the surface stylistic characteristics of the resulting outputs, and using the correlation to control the generator. This contrasts with the generate-and-test architecture adopted by most previous empirically-based generation approaches, offering a more efficient, generic and holistic method of generator control. We illustrate the approach by describing a system in which stylistic variation (in the sense of Biber (1988)) can be effectively controlled during the generation of short medical information texts.