Efficiency

The efficiency of a given chart parser on a given input is inversely related to the amount of excess structure present in the chart at the end of the (attempted) parse. If the input string is ungrammatical and yet the chart is full of arcs, then clearly the parser has been doing lots of work to no good effect. Ideally, we would like the parser to build very few arcs for such an input, and for it then to be able to report that the string had no parses. Likewise, if the string is grammatical and the chart is full of arcs, then the parser will have been maximally efficient if every inactive arc present is employed in some structure assigned to the string, and if every active arc present represents an hypothesis that ultimately proves successful, such as results in the addition of an inactive edge. Unused inactive edges, and active edges that lead nowhere, represent wasted activity and thus directly reflect the inefficiency of the parser. The goal of experimenting with the rule invocation strategy is to reduce this wastage to a minimum.

The efficiency of a chart parser is also greatly dependent on how well the data structures used are tuned to support the basic parsing operations that are performed again and again. A chart parser spends a lot of its time retrieving information from the chart and the grammar; for instance:
When an inactive arc has been added to the chart, it looks for active arcs that require the relevant category at the point where the inactive arc begins, in order to apply the fundamental rule.
When an active arc has been added to the chart, it looks for inactive arcs for the first category that the active arc requires, which start where the active arc ends, in order to apply the fundamental rule.
When an inactive arc has been added to the chart, it looks for all grammar rules where the first category on the RHS is the same as the category of the arc, in order to apply the bottom-up rule.
When an active arc has been added to the chart, it looks for all grammar rules whose LHS is the first category that the arc requires, in order to apply the top-down rule.

If each time the parser went looking for an arc of a particular type it had to look through every arc in the chart, performance would slow down dramatically as soon as the chart achieved any size. This is the sort of performance that we would expect if the chart was represented as a simple list of edge data structures. A better implementation would keep the arcs of the chart in a data structure that was indexed by the starting and/or finishing points of the arcs, and possibly also by certain relevant categories in the arcs - for instance, by the label of an inactive arc and the first tofind category of an active arc. In this way, the search could proceed rapidly to a small set of potentially relevant arcs, without having to consider the whole chart.

Obviously our representation of the chart and/or agenda as a simple Prolog list of edge data structures is one of the worst possible choices as regards the efficiency of finding an edge of a given type in a large chart/agenda. The approach of representing the chart as Prolog assertions is marginally better, as many Prolog implementations index the clauses of a predicate to take account of some of the arguments of the predicate. This means that not all the possible clauses would have to be considered when a search is made for an edge with particular characteristics. The automatic indexing introduced by a Prolog compiler is not necessarily tuned to the type of searches that one actually wants to perform, however, and so rather more careful programming would be required to build a really efficient chart parser in Prolog.

Send us a comment.



[Contents] [Previous] [Next]
This document was translated by troff2html v0.21 on October 22, 1996.