A parser evaluation corpus of English based on a grammatical relation annotation scheme is now available. It consists of 500 sentences (around 10,000 words) randomly extracted from the SUSANNE corpus.
There are four files: the (tokenised) raw text, the lemmatised and numbered sentences, the grammatical relation annotation, and software that can be used to automatically evaluate parser output. An up-to-date specification of the annotation scheme is also online. (Please note that this specification refers to the latest version of the annotated corpus, and supersedes the one in the publications listed below.) The corpus is free for research purposes; for any proposed commercial use please contact John Carroll.
We would be pleased to receive comments on the scheme, the annotated corpus or the evaluation software.
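As a rough illustration of what automatically evaluating parser output against a grammatical relation annotation can involve, the sketch below scores one sentence by precision, recall and F-score over sets of relations. It is not the distributed evaluation software and does not reflect its file formats or matching rules: the tuple representation, the function name `score_relations` and the relation labels in the example are all assumptions made for illustration.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation of one grammatical relation:
# (relation label, head lemma, dependent lemma). The real annotation
# files may encode relations differently.
Relation = Tuple[str, str, str]


def score_relations(
    gold: Iterable[Relation], predicted: Iterable[Relation]
) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for one sentence.

    A predicted relation counts as correct only if it matches a gold
    relation exactly; this strict matching is an assumption of the sketch.
    """
    gold_set: Set[Relation] = set(gold)
    pred_set: Set[Relation] = set(predicted)
    matched = len(gold_set & pred_set)

    precision = matched / len(pred_set) if pred_set else 0.0
    recall = matched / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Illustrative relation labels only, not taken from the corpus files.
gold = [("subj", "sleep", "cat"), ("det", "cat", "the")]
predicted = [("subj", "sleep", "cat"), ("obj", "sleep", "cat")]
precision, recall, f_score = score_relations(gold, predicted)
print(precision, recall, f_score)
```

In this example one of the two predicted relations matches the gold annotation, so precision, recall and F-score all come out at 0.5.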
Recent changes:
10 October 2002 - a new version of the grammatical relation annotation incorporating around 20 additions/fixes in response to comments from Ron Kaplan.
7 November 2002 - fixed annotation of "got" used as an auxiliary.
Descriptions of the grammatical relation annotation scheme are published in
Proposals for improvements to the scheme are discussed in