If you have access to computer mail facilities, you may occasionally get annoyed with people who besiege you with irrelevant messages and seem to expect a sensible reply each time. You may have speculated that, for people whose messages are particularly predictable and uninteresting, the process of producing replies might be automated in some way. Such an automated reply generator would not have to be very sophisticated, but it would have to respond robustly to any of a large number of potential natural language inputs, and the responses would have to be fairly plausible. Thus, such a system might be designed to respond to particular patterns in the input and produce responses that are related to them. For instance, a general rule like:
I have discovered a new bug in ... . ==> Yes, I found that one last week. But I do not have time to fix ... just now.
would enable the system to respond to a number of different inputs. So if the input was 'I have discovered a new bug in the operating system,' the response would be 'Yes, ... but I do not have time to fix the operating system just now.' A simple program along these lines, called ELIZA, was written by Joseph Weizenbaum in the 1960s. ELIZA was an attempt to reproduce some of the conversational abilities of a non-directive psychologist. It is reputed to have fooled some of its conversational partners into thinking that they were interacting with another human being.
Given the machinery of FSTs developed in the last chapter, a strategy for implementing our reply generator immediately suggests itself. We simply encode the different stimulus-response rules in one giant FST that transduces possible input sentences into possible outputs. Once we start to build an FST with the kind of complexity required by a system like ELIZA, however, we encounter certain problems. To be able to copy arbitrary parts of the input into the output - for instance, the name of the computer program in the above rule - we need to be able to specify an abbreviation that stands for any pair of identical legal symbols. Thus:
EQUAL abbreviates: a_a, about_about, acorn_acorn, ... .
for each possible word in the English language. While it would be tedious to specify such an abbreviation in our current notation, it would be simple to make our recognition programs deal with this identity mapping specially.
Less tractable are problems to do with producing patterns, which have the right degree of specificity without having the size of the FST explode drastically. Let us concentrate for the moment purely on the problem of recognizing instances of patterns for which certain responses are plausible. We will come back to the general transduction issues later. Consider, for instance, the problem for someone who wishes to write an ELIZA program that will give the following responses:
everyone hates me ==> who do you think hates you? everyone dismisses me when i ask a question ==> who do you think dismisses you?
but will not produce the following:
everyone with me is so stupid ==> who do you think with you? everyone except me is happy ==> who do you think except you?
The desired responses seem to be appropriate when the input is of the form 'everyone ... me ...', but only when the first slot is filled with an English verb, such as hates or dismisses. We can account for the fact that the second two responses are inappropriate by the fact that with and except are not verbs. In an FSTN to recognize this pattern, this regularity appears as follows:
Name EVERYONE-ME: Initial 1 Final 4 From 1 to 2 by everyone From 2 to 3 by VERB From 3 to 4 by me From 4 to 4 by ANYWORD. VERB abbreviates: hates, dismisses, likes, ... . ANYWORD abbreviates: a, about, acorn, ... .
where ANYWORD is an abbreviation that covers every English word.
Now let us try to generalize this. For, actually, in the pattern 'everyone ... me ...', we ought to allow for the first slot to be filled with more than one word, as the following examples show:
everyone will punish me ==> who do you think will punish you? everyone has forgotten me ==> who do you think has forgotten you?
The obvious solution is to include a whole bunch of connected arcs between 2 and 3, instead of the single VERB arc. This will encode the notion of 'valid sequence of verbs'. A fairly simple-minded version might involve the following arcs in the bunch:
From 2 to 21 by has From 21 to 3 by V-PERF From 2 to 22 by will From 22 to 3 by V-BASE From 2 to 3 by V-PRES
where:
V-PERF abbreviates: forgotten, punished, loved, ... . V-BASE abbreviates: forget, punish, love, ... . V-PRES abbreviates: forgets, punishes, loves ... .
This seems quite reasonable, but consider what happens when we want to allow our program to respond to other patterns such as the following:
everyone will love you ==> who do you think will love me?
Because the pattern for 'everyone ... you ...' also involves sequences of verbs, when we encode it as a network we will have to go through the construction of a bunch of arcs that looks just like the ones we have just developed for the first pattern. Indeed, we may need to have a copy of these arcs in many places in a complete ELIZA system. The FSTN notation is not being as useful here as it might be. Quite apart from forcing us to develop networks that are quite large for relatively uncomplicated patterns, it has not given us any way to think about the 'verb sequence' bunch of arcs as a single object. In particular, if we subsequently decide to refine our notion of valid sequence of verbs, we will have to make changes in all the places where this bundle of arcs appears in the networks. In our ELIZA program, we are thus encountering in a much more serious way the problem of notational inadequacy which we noted briefly in connection with our transducer SWAHILI-2 and the end of Chapter 2.
A fundamental advance is made by using recursive transition networks (RTNs) instead of FSTNs. Basically, RTNs are just like FSTNs except that they introduce the extra concept of a named subnetwork. That is, it is possible for an arc to name a subnetwork to be traversed, instead of a specific word (or class of words) that is to appear. The idea is that if we have a commonly used bunch of arcs, we can express this abstraction by making it into a self-contained, named network. This network can then be referenced by its name in a network that needs it, rather than having to appear expanded out in every place. Note that we have not used the names of networks at all so far. In contrast to the practice in this book, some authors name a subnetwork by the initial node that the traversal is to start from. This allows the network writer to choose one of several initial nodes as being appropriate for a particular use of a network, but has the disadvantage that node names have to be globally accessible.
Just as an FSTN can be regarded as a specification of an FSA, so an RTN can be regarded as a specification of a machine, a pushdown automaton (PA). Informally, in an RTN, to traverse an arc that is labelled with a subnetwork name instead of a word, it is necessary to traverse the subnetwork named, but remembering where to resume when that has been done. A pushdown automaton is an FSA that is equipped with an extra memory, a stack, that can be used for this purpose. We will examine the role of a stack in RTN traversal later in this chapter. For an RTN, network traversal is defined partially in terms of itself. This is the reason for the word 'recursive' in recursive transition network. We will see in a later section that, although this informal definition looks dangerously circular, it is possible to make computational sense of it.
Here is how our earlier finite state ELIZA network fragments could be reconceived in RTN terms:
Name EVERYONE-ME: Initial 1 Final 4 From 1 to 2 by everyone From 2 to 3 by VERB-GROUP From 3 to 4 by me From 4 to 4 by ANYWORD.
Name VERB-GROUP: Initial 1 Final 2 From 1 to 11 by has From 11 to 2 by V-PERF From 1 to 12 by will From 12 to 2 by V-BASE From 1 to 2 by V-PRES.
ANYWORD abbreviates: a, about, acorn, ... . V-PERF abbreviates: forgotten, punished, loved, ... . V-BASE abbreviates: forget, punish, love, ... . V-PRES abbreviates: forgets, punishes, loves ... .
Having the VERB-GROUP network mentioned in patterns like EVERYONE-ME is an asset because it allows several words to appear between the 'everyone' and the 'me', but eliminates the problems otherwise caused by strings like:
everyone who Mayumi saw me with ...
Note that we could absorb our convention for abbreviations into the more general notion of named subnetworks. But in computational practice, we want a better way of dealing with abbreviations than thinking of them as networks, whether explicitly or implicitly present.
In RTNs, as we have defined them, a (sub)network is only referred to outside of itself by its name, and therefore it does not matter if a node name that is used in one network is also used in another. For instance, there is a node '1' in each of the two networks shown.
Send us a comment.