The active chart

If we return to our original example, we can see that the use of a simple WFST will save work because it means that a successfully found noun phrase only needs to be found once. But it will not save the parser any time reinvestigating hypotheses that have previously failed. Moreover, if computations in different parts of the search space are both simultaneously in the middle of looking for noun phrases, there will be nothing to stop the work being done twice, even if it is successful in the end. The only way we can coordinate such work and avoid duplicating previous attempts at parsing is to have an explicit representation about the different goals and hypotheses that the parser has at any one time. The WFST as we have presented it is a good way of representing facts about structure, but what about structural hypotheses, conjectures or goals? By way of illustration, consider Figure 6.7. This wholly informal object is intended to represent the state of analysis of a parser that:
has established that the string may consist of an NP V NP sequence,
is exploring the hypothesis that it can be analyzed as an S consisting of an NP VP sequence,
has got to the point where it has decided to treat the NP it has already established as identical with the NP of the NP VP sequence, and
needs to establish that the subsequent V NP sequence can be covered by VP. We have already seen that WFSTs can readily represent partial analyses but what we have here is a partial analysis together with a hypothesis. We plainly cannot represent such hypotheses in our WFST data structure as characterized earlier. However, two small changes to the data structure will give us what we need. Instead of requiring our directed graph to be strictly acyclic, we will relax things to permit single arcs ('empty arcs') that cycle back to the node they start out from, as in Figure 6.8(a), but we will continue to prohibit graphs that have cycles like that shown in Figure 6.8(b). The motivation for this minor relaxation of the acyclicity requirement will become apparent as we go on. The second change we need to make is to elaborate the label on arcs from a simple category to a decorated rule of the grammar. If S -> NP VP is a rule of the grammar, then the following objects ('dotted rules') are all well-formed arc labels:




S -> .NP VP S -> NP.VP S -> NP VP.

Informally, the 'dot' in one of these labels indicates to what extent the hypothesis that this rule is applicable has been verified by the parser. The first kind of label, which will only ever occur on the kind of arc that cycles back to the node from which it emerged, denotes the hypothesis that an S can be found covering a substring that starts from the node in question and which covers that substring in virtue of the latter also being covered by an NP VP sequence. It indicates that such a hypothesis has been made, but that it has not necessarily been even partially verified. The second and third kind of label, which will never occur on the kind of arc that cycles back to the node from which it emerged, denote the same hypothesis, but indicate that the hypothesis has been partially or wholly confirmed. The second kind of label will only be found on arcs that cover an NP; that is, between nodes that are already spanned by an arc labelled NP -> <categories>., and thus indicates that the first part of the hypothesis, the presence of an initial NP, has been confirmed. The third kind of label indicates a fully confirmed hypothesis and will only be found on arcs that cover a string made up of an NP substring followed by a VP substring. This third kind of label is thus the closest equivalent to the earlier simple category labels that we used in WFSTs, only it provides a little extra information since it indicates the sequence that warrants the presence of the category ascription.



A WFST that has been modified to include hypotheses in the manner described is known as an active chart,

which we shall simply abbreviate to chart hereafter in accordance with common usage, following the work of Kay, and the family of parsers that exploit the chart as a data structure are known as chart parsers. A node in a chart is often referred to as a vertex and the arcs as edges. Arcs that represent unconfirmed hypotheses are known as active edges and those that represent confirmed hypotheses - that is, those with a label of the form C -> <categories>.) are known as inactive edges.

Obviously, charts can represent everything that a WFST can. Figure 6.9 shows how our earlier example WFSTs would appear in chart notation. As we would expect, these charts only exhibit inactive edges. Figure 6.10 is the chart that corresponds to our informal hypothesis diagram in Figure 6.7. Notice that this contains two active edges.



We pointed out earlier that the WFST is completely neutral with respect to parsing strategy and this is true also of the chart.

We can represent a chart as a set of structures each of which has the following attributes:




<START> = ... some integer ... <FINISH> = ... some integer ... <LABEL> = ... some category ... <FOUND> = ... some category sequence ... <TOFIND> = ... some category sequence ...

where <LABEL> is the LHS of the appropriate dotted rule, <FOUND> is the sequence of RHS categories to the left of the dot and <TOFIND> is the sequence of RHS categories to the right of the dot. In this representation, an edge whose value for TOFIND is the empty sequence will be an inactive edge, and all other edges will be active. We will occasionally use an n-tuple notation in the text to abbreviate such records. Thus, the two tuples:




<0, 2, S -> NP.VP> <3, 5, NP -> Det N.>

represent the following active and inactive edges, respectively:




<START> = 0 <FINISH> = 2 <LABEL> = S <FOUND> = <NP> <TOFIND> = <VP>

<START> = 3 <FINISH> = 5 <LABEL> = NP <FOUND> = <Det, N> <TOFIND> = <>

Send us a comment.



[Contents] [Previous] [Next]
This document was translated by troff2html v0.21 on October 22, 1996.