Speaker
Affiliation
Nara Institute of Science and Technology
Abstract
Corpus-based approaches to natural language processing have now attained very good performance on basic NL analysis. Producing highly accurate NL analyzers requires robust and effective machine learning models, a selection of useful features, and accurately annotated training data. In this talk, I will first introduce the use of Support Vector Machines and the models for POS tagging, base phrase chunking, and word dependency parsing. I will then discuss feature selection, especially as a way of speeding up the process. Finally, I will introduce our recently developed tools for managing annotated corpora and the dictionary, which provide flexible search and error correction of annotated corpora.