Participants
The LEXSYS project was funded by EPSRC (ref GR/K97400) 1996-1999. Principal investigators were David Weir and John Carroll; research fellows on the project were Nicolas Nicolov, Martine Smets, and Olga Shaumyan.
Summary
One of the insights that has emerged from work on probabilistic approaches to natural language processing over the last decade or so is the importance of modeling the (statistical) dependencies between words. This has highlighted the value of lexicalized tree grammar frameworks, in which each word is associated with one or more 'elementary' tree structures (which are combined to produce complete syntactic structures); these elementary trees are an appropriate place to specify information about those words' dependents.
In the LEXSYS project we have hand-crafted a wide-coverage, lexicalized tree grammar, implemented an associated parser that assigns rich descriptions to the sentences it parses, and created a system for structural disambiguation with such grammars. The main results of the project are the novel techniques we have devised and implemented to tackle a number of important problems in developing large grammars and processing with them. These can be divided into three areas:
- Grammar size
We have developed techniques for encoding what is logically a single grammar in a variety of different ways, each encoding tailored to a particular task. For example, one encoding is oriented towards optimizing the grammar development process, whereas an entirely different encoding is used for parsing. Thus, although the grammar we have developed would appear to be large if its content were to be explicitly enumerated, we have exploited the fact that it is inherently redundant along certain dimensions to substantially reduce problems stemming from its size.
- Efficiency
We have designed the grammar in a way that addresses the computational problems that typically arise when large feature structures are used extensively in hand-crafted grammars. The key has been to localize, within the elementary structures of the grammar, those dependencies that a parser is required to check. We have investigated the extent to which this can be achieved in wide-coverage tree grammars, and eliminated feature passing in our grammar, thus avoiding the cost of feature-structure unification.
- Disambiguation
We have devised and experimented with a probabilistic technique for acquiring knowledge of which words are able to function as dependents of others, using the WordNet semantic hierarchy to group together senses of nouns into semantically similar classes. We use the resulting probabilistic model for disambiguation. We have also investigated the issue of how frequency information can be associated directly with lexicalized tree grammars, describing and classifying a number of schemes, and evaluating the degree to which each can, in principle, distinguish the probability of particular kinds of derivational phenomena.
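The general idea behind class-based estimation of dependencies can be illustrated with a minimal sketch. It is not the project's actual technique: the hierarchy below is a hypothetical hand-written stand-in for WordNet, the counts are invented, word-sense ambiguity is ignored, and the class is chosen at a fixed level above the noun, which is a simplification assumed here. The names `PARENT`, `ancestors`, `class_counts`, and `class_probability` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy hierarchy (child -> parent); a stand-in for WordNet,
# not real WordNet data.
PARENT = {
    "pizza": "food",
    "pasta": "food",
    "food": "entity",
    "fork": "tool",
    "spoon": "tool",
    "tool": "entity",
}


def ancestors(node):
    """Return the node followed by all of its ancestors up to the root."""
    chain = [node]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def class_counts(noun_counts):
    """Add each noun's count to the noun itself and to every class above it."""
    totals = defaultdict(int)
    for noun, count in noun_counts.items():
        for node in ancestors(noun):
            totals[node] += count
    return totals


def class_probability(noun, noun_counts, level=1):
    """Estimate how likely the noun's class is as a dependent of some head.

    The class is the node `level` steps above the noun (assumed fixed here).
    """
    total = sum(noun_counts.values())
    if total == 0:
        return 0.0
    chain = ancestors(noun)
    cls = chain[min(level, len(chain) - 1)]
    return class_counts(noun_counts)[cls] / total


# Invented counts of nouns observed as the direct object of "eat".
eat_objects = {"pizza": 9, "spoon": 1}

# "pasta" and "fork" were never observed, but their classes were.
print(class_probability("pasta", eat_objects))
print(class_probability("fork", eat_objects))
```

In this sketch an unseen noun such as "pasta" inherits evidence from "pizza" through the shared class "food", so it scores higher as an object of "eat" than "fork" does. A disambiguation component could compare such scores across competing attachments.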
Publications
John Carroll and David Weir. 1997. Encoding frequency information in lexicalized grammars. In Proceedings of the Fifth International Workshop on Parsing Technologies, pages 8-17.
John Carroll, Nicolas Nicolov, Olga Shaumyan, Martine Smets, and David Weir. 1998. The LEXSYS Project. In Proceedings of the Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks, pages 29-33.
John Carroll, Nicolas Nicolov, Olga Shaumyan, Martine Smets, and David Weir. 1998. Grammar compaction and computation sharing in automata-based parsing. In Proceedings of the First Workshop on Tabulation in Parsing and Deduction (TAPD), pages 16-25.
John Carroll, Nicolas Nicolov, Olga Shaumyan, Martine Smets, and David Weir. 1999. Parsing with an extended domain of locality. In Proceedings of the Eighth Conference of the European Chapter of the Association for Computational Linguistics, pages 217-224.
John Carroll, Nicolas Nicolov, Olga Shaumyan, Martine Smets, and David Weir. 2000. Engineering a wide-coverage lexicalized grammar. In Proceedings of the Fifth International Workshop on Tree Adjoining Grammars and Related Frameworks.
Stephen Clark and David Weir. 1999. An iterative approach to estimating frequencies over a semantic hierarchy. In Proceedings of the Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 258-265.
Stephen Clark and David Weir. 2000. A class-based probabilistic approach to structural disambiguation. In Proceedings of the 18th International Conference on Computational Linguistics (COLING).
Roger Evans and David Weir. 1997. Automaton-based parsing for lexicalized grammars. In Proceedings of the Fifth International Workshop on Parsing Technologies, pages 66-76.
Roger Evans and David Weir. 1998. A structure-sharing parser for lexicalized grammars. In Proceedings of the 36th Meeting of the Association for Computational Linguistics and the 17th International Conference on Computational Linguistics, pages 372-378.
Roger Evans, Gerald Gazdar, and David Weir. 2000. 'Lexical Rules' are just lexical rules. In Anne Abeille and Owen Rambow, editors, Tree Adjoining Grammars: linguistic, formal and computational properties, CSLI Lecture Notes. University of Chicago Press.
Martine Smets. 1998. Comparison of XTAG and LEXSYS grammars. In Proceedings of the Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks, pages 159-163.
Martine Smets and Roger Evans. 1998. A compact encoding of a DTG grammar. In Proceedings of the Fourth International Workshop on Tree Adjoining Grammars and Related Frameworks, pages 164-167.
K. Vijay-Shanker and David Weir. 1999. Exploring the underspecified world of Lexicalized Tree Adjoining Grammars. In Proceedings of the Sixth Meeting on Mathematics of Language.