We have seen that resolving ambiguity in sentences requires an interpreter to have a general knowledge of the world. Understanding the significance of a vague utterance expressed in context also requires knowledge. Thus, a theoretical model of language understanding is not complete without a model of knowledge representation and retrieval, and we cannot construct a robust understanding computer without providing it with an encyclopaedic knowledge of the world. These are rather pessimistic conclusions, but they need not prevent us from continuing with the theoretical study of language or indeed from constructing useful computer programs that operate in limited domains. They do, however, suggest that we must try to classify in a rigorous way the kinds of knowledge that guide a language user and codify enough of it to ensure that our overall models are realistic.
How can knowledge of the world guide a reader to correctly interpret a partially ambiguous piece of natural language encountered in context? Where a sentence is ambiguous, world knowledge must indicate that some possible readings are to be preferred over others. At the simplest, it may allow some readings to be rejected because they presuppose physically impossible situations, but even this will be inadequate if the writer is using metaphor or has established a context where the normal physical laws do not apply. Another idea is to exploit the potential a text has to create expectations in the mind of a reader and to think of the reader as preferring readings that are in accord with these expectations. One such model of expectation-based understanding is provided by Schank's script applier mechanism (discussed in Chapter 10). This works on the principle that many events in the world - especially those involving humans, such as going to a restaurant or using public transport - proceed in a stereotyped way. Therefore, when we read about them, at any point there is a limited number of things that we expect to happen next. We can represent knowledge of prototypical events as sequences of expected actions in 'scripts' for the events. To come up with sensible expectations, a robust natural language interpreter will need to have a large number of such scripts. This raises many questions. How can an interpreter decide which script is appropriate for a given situation? How can it tell when the current script is no longer adequate? How should it handle deviations from expected behaviour? Moreover, the more interesting stories about human beings frequently cannot be understood in terms of stereotyped situations. Rather, it is necessary to reason at a lower level about the goals and plans of the participants to generate expectations.
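The idea of a script as a sequence of expected actions can be sketched in a few lines of Python. This is only an illustration of the principle and not Schank's script applier mechanism: the RESTAURANT event list, the expected_next function and the window parameter are all hypothetical names invented for this sketch.

```python
# Hypothetical script: an ordered list of stereotyped restaurant events.
RESTAURANT = ["enter", "sit", "order", "eat", "pay", "leave"]


def expected_next(script, observed, window=2):
    """Return the events expected to follow the last observed event.

    Returns None when the last observed event is not part of the script,
    which signals that the current script may no longer be adequate.
    """
    if not observed:
        return script[:window]
    last = observed[-1]
    if last not in script:
        return None
    i = script.index(last)
    # Expect more than one event ahead, since stories often skip steps.
    return script[i + 1 : i + 1 + window]


print(expected_next(RESTAURANT, ["enter", "sit"]))
print(expected_next(RESTAURANT, ["enter", "fire alarm"]))
```

Even this toy version exposes the questions raised above: it has no way of choosing between competing scripts, and its only response to an unexpected event is to give up.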
If we are reading a story, we will be more successful if we can understand the goals and plans of the various characters. We must also, however, bear in mind to some extent the objectives of the writer. Knowledge of the other participant in a communicative exchange is even more important if the medium is speech, as spoken utterances are frequently abbreviated, elliptical, oblique and subject to context-peculiar interpretation. Consider, for example, the following:
A: Excuse me, do you know if there is a newsagent near here?
B: It's early closing today.
A: What about Brighton?
B: Johnson's in Ship Street is open until 5.
It is hard, if not impossible, to come up with any sensible interpretation of this exchange between two people if we ignore the fact that the people are communicating and cooperating. Likewise, a hearer who can ask 'Why is she telling me this?' and come up with a reasonable hypothesis is going to be in a better position to understand than someone who cannot. An intelligent hearer must regard utterances in the same way as any other actions performed by an intelligent being. That is, an utterance is an action that, given certain preconditions, will achieve effects planned for by the speaker.
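Treating an utterance as an action with preconditions and effects can be sketched in the style of a simple planning operator. The representation below is an assumption made for illustration (the Action class, the perform function and the belief strings are all hypothetical), with beliefs reduced to plain strings in a set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A planning operator: applicable when its preconditions hold."""
    name: str
    preconditions: frozenset
    effects: frozenset


def perform(action, state):
    """Return the state after the action, or None if it is not applicable."""
    if not action.preconditions <= state:
        return None
    return state | action.effects


# Hypothetical operator for B's utterance "It's early closing today."
inform = Action(
    name="B informs A of early closing",
    preconditions=frozenset({"B believes early_closing"}),
    effects=frozenset({"A believes early_closing"}),
)

before = frozenset({"B believes early_closing"})
after = perform(inform, before)
print(sorted(after))
```

A hearer asking 'Why is she telling me this?' is, in these terms, searching for a plan of the speaker's in which the effects of the utterance serve some goal.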
The notion of planning has always been of great interest to AI workers and there is now the possibility that planning work in other domains, such as the movement of robot arms, will be applicable to natural language understanding and production (see Chapter 10).
Unfortunately for NLP, the plan associated with an utterance is rarely, if ever, transparently marked in the syntax. Often, the intention is conveyed in a form of words that superficially suggests a different intention. A classic example is an utterance like:
Can you pass the salt?
which looks as if it ought to be a question but which is normally intended as a request, or an utterance like:
It is rather cold in here.
which looks like a simple assertion of fact but may well convey intentions having to do with getting a window or door closed by the addressee. NLP researchers now hope that these so-called 'indirect speech acts' can be explained in a unified framework that treats utterances as actions involving the beliefs and goals of the participants, and that assumes principles of cooperative behaviour on the part of the speaker and the addressee. That is, a successful utterance will cause a change in the addressee's beliefs or goals. The cooperative addressee will attempt to understand the relevance of this change for the speaker's overall plan and will hence establish a suitable cooperative response. To adequately appreciate the effects of utterances, then, it is necessary to be able to reason about beliefs.
In general, it is necessary to reason not only about the speaker's beliefs and the addressee's beliefs, but also about the speaker's beliefs about the addressee's beliefs, the addressee's beliefs about the speaker's beliefs about the addressee and so on. For instance, to be successfully ironic, the speaker must be expressing some proposition that he or she believes to be false. However, if that is all that is required, then simple lies would be instances of irony. But they are not. In addition, the speaker must believe that the addressee believes it is false and the speaker must believe that the addressee believes that the speaker believes it is false.
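The three conditions on irony given above can be written down directly if nested beliefs are represented as nested tuples. This is a minimal sketch under that assumption. The names believes and is_ironic, and the labels "S" (speaker) and "A" (addressee), are hypothetical and chosen only for this example.

```python
def believes(agent, prop):
    """Build a (possibly nested) belief term."""
    return ("believes", agent, prop)


def is_ironic(p, facts):
    """Check the three belief conditions for the speaker S uttering p."""
    not_p = ("not", p)
    return (
        believes("S", not_p) in facts
        and believes("S", believes("A", not_p)) in facts
        and believes("S", believes("A", believes("S", not_p))) in facts
    )


p = "the weather is lovely"
not_p = ("not", p)

# A simple lie: the speaker believes p is false, and nothing more.
lie = {believes("S", not_p)}

# Irony: the falsity of p is also believed to be mutually recognized.
irony = {
    believes("S", not_p),
    believes("S", believes("A", not_p)),
    believes("S", believes("A", believes("S", not_p))),
}

print(is_ironic(p, lie), is_ironic(p, irony))
```

The lie satisfies only the first condition, which is why it is not classed as irony.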
Once we start to consider either natural language conversations of more than one utterance or extended discourses, such as this introductory chapter, it becomes clear that there is more structure to be found than that in the sum of the utterances, even taking into account the goals of the participants. That is, people adhere to certain rules and conventions about how conversations and discourses should be organized - for example, when turn-taking should happen in the former.
From a computational point of view, the identification of these rules and conventions can serve not only to enable computers to produce more acceptable conversational behaviour, but also to control the inferences that must be made for successful understanding. For instance, if specific linguistic devices are used in a language for indicating the beginning - for example, 'by the way' - and the end - for example, 'anyway' - of a digression, or for indicating how the speaker or writer is shifting the focus from one topic to another, this is information that a computer program should pick up.
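As a rough illustration of how a program might pick up such devices, the sketch below tracks how deeply nested in digressions each utterance is. It assumes, much too simply, that a cue phrase always appears at the start of an utterance and always has its discourse function. The function name and cue lists are hypothetical.

```python
# Hypothetical cue phrases marking the start and end of a digression.
PUSH_CUES = ("by the way",)
POP_CUES = ("anyway",)


def digression_depths(utterances):
    """Return the digression nesting depth of each utterance."""
    depth = 0
    depths = []
    for utterance in utterances:
        text = utterance.lower().lstrip()
        if text.startswith(PUSH_CUES):
            depth += 1
        elif text.startswith(POP_CUES) and depth > 0:
            depth -= 1
        depths.append(depth)
    return depths


dialogue = [
    "The train leaves at six.",
    "By the way, did you feed the cat?",
    "I think she did.",
    "Anyway, we should book seats.",
]
print(digression_depths(dialogue))
```

A program with this information could restrict its inferences about the last utterance to the material at depth 0, setting aside the digression about the cat.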
An example from the work of Grosz serves to illustrate the point. Early computational approaches to the interpretation of pronouns used simple heuristics based on recency. So, such a program, on encountering a pronoun, would prefer to interpret it as referring to an item in the previous sentence rather than the one before that, and so on. One of the naturally occurring dialogues that Grosz recorded, however, contained an example of what appears to be a pronoun referring back to something last mentioned 60 utterances (30 minutes) previously. Grosz accounts for this in terms of the discourse having an implicit structure that is tree shaped, rather than linear. Thus, although the reference is back to an object that is chronologically distant, it is to something relatively close in the implicit discourse structure.
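The contrast between the two strategies can be made concrete with a small sketch. It is not a reconstruction of Grosz's system: the Segment class, the two resolve functions and the example entities are all hypothetical, and the structural strategy is reduced to searching the current discourse segment and then its ancestors.

```python
class Segment:
    """A node in a tree-shaped discourse structure."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.entities = []  # entities mentioned here, in order of mention


def resolve_by_recency(mentions):
    """Linear heuristic: choose the most recently mentioned entity."""
    return mentions[-1] if mentions else None


def resolve_by_structure(segment):
    """Search the current segment, then its ancestors, for a referent."""
    while segment is not None:
        if segment.entities:
            return segment.entities[-1]
        segment = segment.parent
    return None


# Hypothetical dialogue: a task segment and a long, now closed, subtask.
task = Segment("assemble pump")
task.entities.append("pump")
subtask = Segment("tighten fittings", parent=task)
subtask.entities.extend(["wrench", "bolt"])
mentions = ["pump", "wrench", "bolt"]  # linear order of mention

# The subtask is finished, so the pronoun is interpreted in the task segment.
print(resolve_by_recency(mentions), resolve_by_structure(task))
```

However long the subtask lasts, the entity introduced in the parent segment remains one step away in the tree, which is the sense in which a chronologically distant referent can be structurally close.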