The pathological nature of logical inference

It is important to note that both forwards and backwards inference involve searching in the process of ascertaining whether a goal is true. In forwards inference, a search must be made through all the consequences of the available facts to see if the goal belongs there. Some of these consequences will be quite irrelevant to the goal. In backwards inference, a search must be made through all possible ways in which the rules can be used to establish the goal as a conclusion. Some of these ways will be unsuccessful, because relevant conditions cannot be shown to be true.

One particular danger is that, given a particular set of facts and rules, the inference search space may be infinite, and the inference system may never terminate and tell us that a goal can or cannot be shown to be true. For instance, consider the use of the following information by a backwards inference system:


human(X) if human(Y) and mother(X, Y) human(beryl) mother(petunia, beryl)

If we ask such a system to find all possible humans, using the first rule the system will generate a subgoal essentially identical to the original goal - to find a human X, it will seek to find a human Y. Of course, the use of this rule will be safe if the system has always found an example of X and Y in a mother relation before embarking on the recursive subgoal, which relies, of course, on mother terminating nicely. But, unless we give it this hint, or write the rule in a specific order using knowledge about how the inference system works, there is nothing to stop the system getting into an infinite loop. This particular problem is very similar to the left recursion problem that we encountered in Chapter 5 with top-down parsers. If we try to introduce checks to catch infinite loops, we are liable to cut out valid parts of the search space. For instance, it is quite reasonable to find a human by finding another human who is one of their parents. This is how we would establish that Petunia was a human. If we were not allowed to have 'looking for a human' as a subgoal of 'looking for a human', we could not show this (given our premises).

If we have a logical language that allows function symbols, the kinds of infinite loops that are possible can be much more subtle. For instance, given the rules:


human(X) if human(mother_of(X)) human(mother_of(mother_of(petunia)))

a backwards inference system, asked to find a human X could use the first rule to generate a subgoal that was the same except that X was replaced by mother_of(X). We can thus get an infinite sequence of subgoals:


human(X) human(mother_of(X)) human(mother_of(mother_of(X))) human(mother_of(mother_of(mother_of(X)))) ...

As it happens, the third of these matches the information about Petunia, but there is no way in general that we can tell how many of these subgoals might be relevant.

The situation with a forwards inference system is, unfortunately, no better. Consider the following information, where richer(X, Y) means that X is richer than Y:


richer(X, Z) if richer(X, Y) and richer(Y, Z) richer(pickens, boesky) richer(boesky, icahn) richer(icahn, hunt)

and imagine what happens when we add richer(hunt, iacocca). As expected, the system will infer for each person that they are richer than iacocca (even if this is not true in the real world). But some things can be inferred in more than one way:


richer(pickens, iacocca) because richer(pickens, boesky) and richer(boesky, iacocca) richer(pickens, iacocca) because richer(pickens, icahn) and richer(icahn, iacocca) richer(pickens, iacocca) because richer(pickens, hunt) and richer(hunt, iacocca)

Moreover, a given fact can be used in many ways to prove something else. For instance, richer(icahn, iacocca) can be used to prove richer(boesky, iacocca) and richer(pickens, iacocca). We have to be careful to avoid infinite loops where we repeatedly rederive and add the same fact to the database.

In an example like this, we can check for loops quite simply, but once again function symbols produce problems. Consider the following statements about geometry, where p1 is a point and, given any two points, there is a point half-way between them:


point(half_way(X, Y)) if point(X) and point(Y) and different(X, Y) different(half_way(X, Y), X) different(half_way(X, Y), Y) point(p1)

If we add the facts point(p2) and different(p1,p2), then we get an infinite set of consequences including:


point(half_way(p1, p2))) point(half_way(half_way(p1, p2), p2))) point(half_way(half_way(half_way(p1, p2), p2), half_way(p1, p2)))) point(half_way(half_way(half_way(half_way(p1, p2), p2),half_way(p1, p2)), half_way(half_way(p1, p2), p2))))

As in the mother example, there is no straightforward way, given an arbitrary set of other rules, to determine just which of these are going to be of any use for productive inferences.

It is not just infinite loops that worry the computational linguist (and expert system designers, theorem provers, game programmers, and the like), in general it is just any large search spaces. And these are inevitable if we are to encode any reasonable amount of real-world knowledge in logic. For instance, if we wish to find out whether Morgan Brothers is registered with VISA, we might decide to find out whether Morgan Brothers is considered credit worthy. This might involve considering the income and expenditure of the company over the last year, looking for instances where the company may have been in trouble. Or it might involve considering what customers Morgan Brothers has had recently, or the reputations of the directors of the company, and so on. Each of these possibilities could involve a substantial search among possible proofs without any guarantee of success. And any proof procedure that works purely mechanically from the rules, without any special domain-dependent strategies, is bound to be inefficient and cannot be relied upon to generate answers in a reasonable time.

The computational intractability of logical inference is underlined by the formal result that theorem proving in the full predicate calculus is undecidable. This means that there cannot be a computer program which, given an arbitrary set of axioms and a desired conclusion:
Prints out 'yes' if the conclusion follows from the axioms.
Prints out 'no' if the conclusion does not follow from the axioms.
Always finishes running, printing 'yes' or 'no' as above.

What is it about logic that leads us to this predicament? The trouble is that in choosing logic as a meaning representation language we are choosing a language that makes very weak claims about what the world is like. So, it is not surprising that logic can only provide relatively weak and general inference techniques. For instance, using standard logic commits us to not much more than the following statements:
The world consists of objects and relations - although we do not know what these are.
A logical statement is either true or false - although it may be difficult to work out which.
Logical inference is monotonic - new information may lead to new inferences, but it cannot invalidate earlier ones - although we do not know how to carry out inferences without combinatorial search and infinite loops.
The logical operators and quantifiers (all, and, exists, not, or) are sensible ways to construct complex statements about the world.

How can we solve this problem with uniform proof procedures? One strategy is to investigate ways in which domain-dependent information can be used to guide a uniform proof procedure to make it more intelligent. Alternatively, we may wish to use restrictions of standard logic, where inferences are more computationally feasible. Another possibility is to use other logics, when non-standard rules of inference are required.

We mentioned earlier, the problem caused by the fact that even though inferential search spaces may not be infinite they can still be unmanageably large if we encode all our world knowledge simply as logical rules. When a human being has to reason about the world, he or she does not seem to take into account all the factors that might conceivably be relevant - somehow the context focuses the attention in a useful way. It is therefore plausible to design ways of organizing logical rules and restricting their visibility to the reasoner, so that it spends less time on dead ends. Such an approach, if it is only based on heuristics, is liable to adversely affect the ability of the inference system to produce all the logically valid consequences, unfortunately.

Restricting inferential visibility is one of the principles behind the ideas of frames (which we first mentioned in Chapter 1) and one of the motivations for the object-oriented programming paradigm that exploits the notion of frame. A frame can be thought of as a collection of inference rules that have a common condition. For instance, if we are in a context where we are talking about automobiles, then it might be sensible to make the following information about automobiles and their attributes immediately accessible:


automobile(X) if vehicle(X) and number_of_wheels(X, 4) and less(weight_of(X), 100Kg) and human(owner_of(X))

vehicle(X) if automobile(X) number_of_wheels(X, 4) if automobile(X) less(weight_of(X), 100Kg) if automobile(X) human(owner_of(X)) if automobile(X)

Similarly, since automobile involves the concept of person, we might in certain circumstances (for instance, when we need to know about the owner) make a similar set of inference rules about humans and their attributes immediately accessible. The organization of inference rules into larger chunks is one way in which domain-dependent knowledge can be used to control inference.

Send us a comment.



[Contents] [Previous] [Next]
This document was translated by troff2html v0.21 on October 22, 1996.