Czestawa Szymanowska leaves the house of her relatives in San Francisco and takes the bus to the airport. Although her relatives left Poland in the 1960s, they continue to speak Polish at home, so the fact that Czestawa can speak hardly a word of English has caused her no problems whatsoever. But she is alone at the airport and she needs to know if she can reroute her return flight via London, so she can visit a cousin who her relatives have told her now lives there. There are long queues at the check-in desks so Czestawa goes over to one of the public terminals in the foyer and types her question in, in Polish - actually only an approximation to Polish, since the terminal has an ASCII keyboard and she is forced to omit diacritics. The machine duly replies, in Polish.
Putting aside for the moment all questions about how such a machine could have understood her query and constructed an answer, let us simply consider the question of how such a machine might infer that her query was expressed in Polish, rather than Spanish, Russian, English or any of the other dozen or so languages that it is equipped to handle.
Here are a couple of strawman solutions. Firstly, different languages employ the letters of the Roman alphabet with different frequencies. The letter 'j', for example, is much more commonly used in Dutch than it is in English. So, one possibility is to compile letter frequency tables for the various languages and then use these to compare against the frequency profiles of pieces of unknown text. The problem with this solution, which might be very satisfactory for books or newspapers, in the present context is that the sample of text is likely to be very small, perhaps only two or three words, and thus will not have any meaningful letter frequency profile.
A second candidate solution is suggested by the fact that in order to be made available for its presumed function, such a machine must, presumably, contain a parser for Spanish, a parser for Arabic, a parser for English, and so on.
So, presented with the input in a language of unknown identity, we run the Spanish parser on it; if that fails to arrive at a parse, then we run the Arabic parser, and so on. Eventually, we try the Polish parser and succeed. The obvious problem with this solution, at least in a world of serial machines and relatively slow parsers, is that by the time that it realizes that the query was in Polish, the traveller Czestawa will have missed her plane. It is a case of using a sledgehammer to crack a nut.
Suppose this was what Czestawa typed in:
Czy pasazer jadacy do Warszawy moze jechac przez Londyn?
Now, the authors of this book do not speak a word of Polish, nor a word of
Spanish, but if they are told that what Czestawa
typed in is either Polish or Spanish, then
they can tell right away that it is Polish
- told that it was either Polish or Serbo-Croat, they might be less confident
in their judgement.
Indeed, if you were to put your hand over all but the first word, they would
still guess confidently that they were dealing with Polish, not Spanish. So,
the authors are
willing to make a decision here based on exposure to less than
half-dozen letters in a language none of whose words are known to them. The
reasoning used is something like this:
'Czy' could not be a Spanish word but it could be a Polish word.
If the first word is Polish rather than Spanish,
then the chances are extremely high that the whole utterance will be in
Polish, not a mixture of Polish and Spanish.
But how can we be so sure that 'czy' is not a word of Spanish? We cannot read, utter or understand any Spanish, and we have not bothered to look up 'czy' in a Spanish dictionary. Well, we cannot be absolutely sure, of course. But we have naive theories about the possible orthography (written form) of Spanish words and Polish words. These naive graphotactic theories are consistent with 'czy' being a Polish word, but not consistent with it being a Spanish word.
If we could articulate theories of this kind, improve them in the light of the way Polish and Spanish graphotactics really work, and encode them in a manner that a computer could understand, then we would be a long way towards solving the problem that Czestawa's interlocutor solves in our scenario above.
However, before we tackle the problem posed by our airport advice unit, let us turn to a much more trivial problem, one which is apparently quite unrelated: getting a computer to laugh.
Laughing, for present purposes, consists of displaying the sequences of characters 'ha!', 'haha!', 'hahaha!', and so on. Figure 2.1 is a diagram of an abstract machine, TO-LAUGH-1, that will do just that.

This diagram, a
finite-state transition network (FSTN)
is
a conventional illustration of a very simple finite-state automaton
(FSA) with four states, 1, 2, 3
and 4, state 1 being an initial state, signalled by
the '->' pointing into it, and state 4 being a final state,
signalled by
the '->' emerging from it. The four nodes of the graph representing
these states are
linked by four directed
arcs:
one, which starts from state 1 and goes to state 2, is labelled 'h';
another, which starts from state 2 and goes to state 3, is labelled 'a';
another, which starts from state 3 and goes to state 2, is labelled 'h';
and
the final one,
which is labelled '!', which goes from state 3 to state 4.
In machines of
this kind, we will also make use
of unlabelled arcs, which are referred to as 'jump' arcs for reasons that will
become apparent.
How is this diagram to be understood? Well, the machine must start in an initial state, which can only be state 1 in this instance. If the initial state is also a final state, then it can simply do nothing, but that does not apply in the present case.
If the state it finds itself in is not a final state, then it must change state by traversing an arc that starts from the state it is in and changing into the state to which the arc points. In traversing the arc, the machine must follow the instruction that labels the arc. Here, and for the present, we will interpret a label like 'h' as an instruction to print the letter 'h' - on a VDU, a printer, a paper tape, whatever. Since only one arc leaves state 1, our machine has no choices open to it: it has to change state to state 2 and print an 'h'. State 2 is not a final state, and like 1, it has but a single arc leaving it. So, the machine must follow this arc, print an 'a', and find itself in state 3. Now, for the first time, our machine is faced with a choice: it can move to state 4 or back to state 2. If it moves on to state 4, generating a '!', then it will have reached a final state. Now although an FSA, for that is what our diagram depicts, is not required to halt when it reaches a final state, unless the latter has no arcs leaving it, it is permitted to halt in final states. In this case, however, there is nowhere else to go and the machine will halt, leaving the string 'ha!' on the VDU as a trace of its activity. State 3 has another arc leaving it, labelled with 'h' and leading back to state 2. The machine may choose to traverse this arc. If it does so, then it will find itself faced with exactly the same sequence of arcs, actions and choices as it was when it was last in that state. And so we may see 'haha!' printed, or 'hahahaha!', or any such sequence.
We are interpreting this FSTN as a machine for producing laughter and it is important to appreciate that the choice that needs to be made when the machine finds itself in state 3, whether to move to state 4 or back to state 2, is a choice that is outside the machine's own competence. If we were to program a laughing machine based on this automaton, then our program would need to consult some outside authority, such as a random number generator, the time of day or the user, at this point.
But this 'production' or 'generation' interpretation of our FSTN is not the only possible interpretation, nor even the only useful one. Imagine a video game that tells the user jokes. When the user has got the joke he or she is supposed to type in 'haha!' or 'hahaha!' or the like. So, we see dialogues like this:
Game: <joke> User: haha! Game: You liked my joke. I will tell you another. Game: <joke> User: that was awful Game: You are not laughing. You did not like that joke. Never mind, perhaps this will amuse you. Game: <another joke> User: hohoho! Game: You are not laughing. You did not like that joke. Never mind, perhaps this will amuse you. Game: <yet another joke>
Putting on one side reservations we might have about the plausibility of any human user bothering to interact with a program as obviously dim as this one, let us consider how we could press our FSTN into service to check whether the user is 'laughing'. It turns out that we can, simply by changing our interpretation of the labels on the arcs. In the new interpretation, the game prints out its joke and then positions itself in an initial state corresponding to the initial node (1) of the network. It begins to inspect the user's input. If the first letter of that input is an 'h' then it traverses the arc that connects state 1 to state 2 and moves over the letter 'h' in the string that forms the user's input. Then it has to attempt to match the next letter in the string with the letter that labels one of the arcs leaving its current state; if there is a match, then it proceeds as before. If there is another letter in the string, but no match with an arc, then the machine fails to recognize the string. Likewise, if there are no further letters in the string and the machine finds itself in a non-final state. But if there are no further letters and the machine is in a final state, then it has succeeded in recognizing the string and so can print out its little message 'You liked my joke', and so on.
Notice that, in this dialogue, the game has failed to recognize 'hohoho!' as an instance of laughter. The reader should elaborate the machine in Figure 2.1 so as to permit the recognition of 'hohoho!', 'hoho!', 'hehe!', 'hehehehe!' and the like as laughter, but exclude 'hoha!', 'hehoha!' and other mixed cases.
In illustrating two different interpretations of our FSTN as machines, we have shown that the FSTN is actually quite a neutral description of the possible sequences of symbols that count as laughing. Because of this neutrality, we were able to interpret it both as a specification of a machine to recognize laughter and as a machine to generate laughter. To be precise, we should distinguish clearly between the abstract description of laughing (the FSTN) and the different interpretations of it (the FSAs). For instance, there is a conceptual difference between the nodes in the graph and the possible states of a machine that the graph might represent. In practice, however, it is common to blur such distinctions, and so we will talk about FSTNs and the FSAs they could represent interchangeably, only making the distinction when it is important.
Recognition with an FSA so far seems very simple. But suppose we now consider a variant of our network, TO-LAUGH-2, shown in Figure 2.2.

There is an important distinction between FSTNs like that in Figure 2.1 and that in Figure 2.2. The first specifies a deterministic automaton which, as its name suggests, is one whose behaviour (during recognition) is fully determined by the state it is in and the symbol it is looking at.
Our second machine is not deterministic because if it finds itself in state 2 and looking at an 'a' then it can either return to state 1 and attempt to find a following 'h', or it can move to state 3 in the hope that the 'a' is the penultimate symbol in the string (since 3 leads to a final state in one transition). As this description is intended to imply, an implementation of a non-deterministic machine can follow false paths. Such an implementation will either need the ability to explore more than one path through the network of arcs simultaneously or else have the ability to backtrack to an earlier choice point when it discovers that it has been on a wild goose chase.
It was mentioned previously that unlabelled arcs can also appear in FSTNs. When the machine is generating, it can traverse such an arc, a 'jump' arc, without generating any symbol. When it is recognizing, it can likewise traverse such an arc without consuming any symbols from the input string. Jump arcs are an important source of non-determinism, because when an ordinary labelled arc leaves a state as well as a jump arc, the machine always has the option of taking the jump, even if the next symbol is the one required by the labelled arc. Figure 2.3 shows another non-deterministic laughing machine, TO-LAUGH-3, where the non-determinism arises at state 3.

The distinction between deterministic and non-deterministic automata is an important one, and we shall return to it from time to time in the course of this book. In the present context, that provided by finite-state machines, the distinction is one of style rather than substance: for every non-deterministic FSTN, there is an equivalent deterministic FSTN that describes exactly the same set of strings of symbols. We will not attempt to prove this here
but our reader's attention is drawn to the fact that the deterministic FSTN in Figure 2.1 is exactly equivalent to the non-deterministic FSTN in Figure 2.2 in terms of the strings of symbols that it describes.
At this point, the link between our laughing machine and the Polish tourist in San Francisco airport may be beginning to become apparent. If we can encode our knowledge of possible English/French/Polish... words in FSTNs, there can be a relatively simple mechanism for determining in which languages a given sequence of words might be acceptable. Before we return to Czestawa, however, we will step back from the examples and consider an alternative way of talking about FSTNs. The graphic state diagram used in Figure 2.1 is perspicuous, at least for simple machines, but it does not lend itself to ready communication with computers, certainly not with those that are restricted to character and keyboard input. Nor does it lend itself readily to mathematical or logical analysis. Besides, a single notation provides only one perspective on a class of formal objects. Different notations, although all ultimately equivalent, can provide different perspectives and different ways of thinking about the formal objects, and how to use them and implement them. Accordingly, we will now define an intendedly perspicuous formal language for FSTNs.
Send us a comment.