The earliest computers were primarily number processors, and the resources of the first programmable calculators are not unlike those of the early computers. If you try to imagine getting a cheap programmable calculator to translate Russian into English, you may get a sense of the magnitude of the task that confronted the pioneers of natural language processing (NLP) in the 1950s and early 1960s. Even today, computers represent linguistic objects in non-linguistic ways. Consider the word GO. Many computers will represent this word as a sequence of two bytes, namely 01000111 (= 71) and 01001111 (= 79), these being the ASCII codes for the letters G and O, respectively. If you give a computer a list of names and ask it to sort them, George will precede Olga in the resulting list, not because G precedes O in the alphabet, but because 71 is a smaller number than 79.
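The point about character codes is easy to check in a modern language. The following minimal Python sketch (assuming Python 3.9 or later; the helper name `codes` is ours, purely for illustration) prints the codes for G and O and shows that the default sort order is simply the order of those numbers.

```python
def codes(word: str) -> list[tuple[str, int, str]]:
    """Return (letter, decimal code, 8-bit binary) for each character of `word`."""
    # ord() gives the Unicode code point, which equals the ASCII code
    # for plain ASCII letters such as G and O.
    return [(ch, ord(ch), format(ord(ch), "08b")) for ch in word]

print(codes("GO"))
# [('G', 71, '01000111'), ('O', 79, '01001111')]

# Default string sorting compares character codes, so 'G' (71) sorts before 'O' (79).
print(sorted(["Olga", "George"]))
# ['George', 'Olga']

# The same numeric comparison puts every lowercase letter (codes 97-122)
# after every uppercase letter (codes 65-90), whatever the alphabet says.
print(sorted(["olga", "George", "Olga"]))
# ['George', 'Olga', 'olga']
```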
Three decades of computer science have given us programming languages that make it as easy to talk about linguistic objects like words and sentences as it is to talk about numbers and addition. But as recently as 30 years ago, things were very different, and the earliest work in NLP should be seen in the context of the resources that were available then.
As those who count sheep know well, counting is a very boring task. Even the earliest computers counted quickly and accurately, and they did not get bored. They could, for example, count how many times 'the' occurs in Hamlet. Some of the earliest work that came to be known as computational linguistics did exactly this kind of counting. A typical application was the attempt to attribute authorship to texts whose authorship was in doubt. In this kind of research, computers are used to compile statistics (for example, the frequency of occurrence of the word 'upon') for texts whose origin is not in doubt. These figures are then compared with a corresponding set compiled from the disputed or unknown text, and a case is made that the text was, or was not, written by the same author.
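The counting-and-comparing method can be sketched in a few lines. This is a toy illustration only: the helper names, the sample texts and the distance measure (the sum of absolute differences between per-thousand rates of a few marker words) are all hypothetical choices, and a real attribution study would need far larger samples and a more careful statistical comparison.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def rate_per_thousand(text: str, word: str) -> float:
    """How often `word` occurs per 1,000 tokens of `text`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 1000 * Counter(tokens)[word] / len(tokens)

def closest_author(disputed: str, known: dict[str, str], markers: list[str]) -> str:
    """Pick the known author whose marker-word rates are nearest to the
    disputed text's, using the summed absolute difference as a toy distance."""
    target = [rate_per_thousand(disputed, w) for w in markers]

    def distance(author: str) -> float:
        profile = [rate_per_thousand(known[author], w) for w in markers]
        return sum(abs(p - t) for p, t in zip(profile, target))

    return min(known, key=distance)

# Hypothetical toy samples, far too short for a real study.
known = {
    "A": "upon the hill and upon the road we walked upon the stones",
    "B": "on the hill and on the road we walked on the stones",
}
disputed = "we sat upon the wall upon the hill"

print(Counter(tokenize(disputed))["the"])               # 2
print(rate_per_thousand(known["A"], "upon"))            # 250.0
print(closest_author(disputed, known, ["upon", "on"]))  # A
```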
Other work once considered to be computational linguistics involved the use of computers to derive indexes and concordances from computer-readable texts. Nowadays, such work continues under the rubric of 'literary and linguistic computing', but no longer really counts as computational linguistics. And, of course, these days, even humble word-processing programs often come equipped with sophisticated indexing utilities that can be used to do certain tasks that once required serious computational effort.
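A concordance is usually presented in keyword-in-context (KWIC) form, with each occurrence of a word shown in its immediate context. The function below is a minimal Python sketch of that idea; its name, the `width` parameter and the punctuation handling are our own assumptions, not a description of any particular concordance program.

```python
import re

def concordance(text: str, keyword: str, width: int = 3) -> list[tuple[str, str, str]]:
    """Keyword-in-context (KWIC) entries: (left context, matched word, right context)
    for each occurrence of `keyword`, with up to `width` words on each side."""
    words = text.split()
    entries = []
    for i, w in enumerate(words):
        # Ignore surrounding punctuation and case when matching.
        if re.sub(r"^\W+|\W+$", "", w).lower() == keyword.lower():
            left = " ".join(words[max(0, i - width):i])
            right = " ".join(words[i + 1:i + 1 + width])
            entries.append((left, w, right))
    return entries

for left, match, right in concordance("To be, or not to be, that is the question", "be", 2):
    # Right-align the left context so the keywords line up in a column.
    print(f"{left:>12} [{match}] {right}")
```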
One of the first linguistic applications of computers to be envisaged and funded was machine translation (MT). The military and intelligence communities in the US and abroad, in particular, had high hopes for MT and invested accordingly. But, despite the level of funding, the first generation of work in MT was very disappointing.
There was little appreciation of the fact that translation essentially involves meaning, nor of the extent of ambiguity in ordinary text. The linguistic theories assumed, to the extent that any were assumed, were rudimentary. And even if they had not been, the computational resources needed to support more sophisticated theories were simply not available. First-generation MT work amounted to little more than machine-language programs for word-by-word substitution. With the wisdom of hindsight, it is unsurprising that the results were of no practical use. By the mid-1960s this had become very apparent, and US government agency funding for MT research dried up completely in the aftermath of a damning report on MT prepared by the National Academy of Sciences in 1966. No large commercial MT systems existed at the end of that decade. Two decades later, the situation had changed radically, as we shall see in the final section.
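The weakness of word-by-word substitution is easy to demonstrate. The sketch below uses English to French rather than Russian to English, and its four-word dictionary is hypothetical; the point is only that a lookup table with one equivalent per word cannot choose between senses or handle agreement.

```python
# Hypothetical toy dictionary: exactly one French equivalent per English word,
# which is the assumption that word-by-word substitution makes.
LEXICON = {
    "the": "le",        # ignores gender: "la" is needed before a feminine noun
    "bank": "banque",   # financial sense only; a riverbank is "rive"
    "of": "de",
    "river": "rivière",
}

def translate_word_by_word(sentence: str) -> str:
    """Replace each word with its dictionary entry, leaving unknown words unchanged."""
    return " ".join(LEXICON.get(word, word) for word in sentence.lower().split())

print(translate_word_by_word("the bank of the river"))
# le banque de le rivière
```

The result chooses the financial sense of bank and gets the gender of both articles wrong; an acceptable translation would be la rive de la rivière.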
Many of the developments in NLP have arisen from a changing view of the nature of computers. Although computers are good at arithmetic, it is better to think of them as very general symbol-manipulation machines. The symbols that computers manipulate can represent numbers, or they can represent more complex objects like words, sentences, trees or networks. The machine code instructions that a computer executes perform very simple operations, like shunting information from one part of the machine's memory to another or adding two numbers together. The problem with many early programming languages, such as FORTRAN, was that they forced the programmer to think in terms of numbers and to specify algorithms at a level close to the actual machine code. The 'high-level' languages developed since then (for example, APL, Pascal and Prolog) allow the programmer to specify instructions in terms of richer, more problem-oriented concepts. Compilers translate from this more abstract level into the primitive instructions that the machine actually executes, relieving programmers of the burden of rephrasing every idea in those terms and leaving them free to concentrate on the problems they are really interested in.
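To make the idea of symbols standing for trees concrete, here is a minimal Python sketch in which a phrase-structure tree is an ordinary nested tuple and two small functions manipulate it as a symbolic object. The tuple encoding, the example sentence and the function names are illustrative assumptions.

```python
# A parse tree as nested tuples: (category, child, child, ...); a leaf is a word.
tree = ("S",
        ("NP", ("Det", "the"), ("N", "dog")),
        ("VP", ("V", "chased"),
               ("NP", ("Det", "a"), ("N", "cat"))))

def leaves(node) -> list[str]:
    """Collect the words at the leaves of the tree, left to right."""
    if isinstance(node, str):
        return [node]
    _category, *children = node
    return [word for child in children for word in leaves(child)]

def count_category(node, category: str) -> int:
    """Count how many constituents in the tree carry the given category label."""
    if isinstance(node, str):
        return 0
    label, *children = node
    return (label == category) + sum(count_category(c, category) for c in children)

print(leaves(tree))                # ['the', 'dog', 'chased', 'a', 'cat']
print(count_category(tree, "NP"))  # 2
```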
A crucial landmark in the development of NLP as we know it today was the appearance in 1971 of Winograd's SHRDLU program. Winograd's program was written in LISP, the language of choice for most artificial intelligence (AI) researchers during the 1970s.
One of Winograd's major contributions was to provide an 'existence proof', showing that natural language understanding, albeit in restricted domains, was indeed possible for a computer. SHRDLU demonstrated, in a primitive way, a combination of abilities that had not been seen together before in a computer program: it could interpret questions, statements and commands, draw inferences, explain its actions and learn new words. SHRDLU was a considerable achievement for one person, and one that would have been impossible without high-level programming languages.
Computer programming is the activity of giving a computer a precise and detailed set of instructions for performing some task. Much human knowledge does seem to be represented in this procedural way. For instance, the chances are that you think of the knot in your shoelaces in terms of the sequence of actions you would have to go through to tie it. In fact, you may find it hard to describe the knot adequately without actually going through the motions. Other human knowledge, on the other hand, seems less dependent on how it is to be used. For instance, the knowledge that Paris is the capital of France could be used in a number of different ways in different contexts.
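The contrast can be illustrated with a single stored fact put to several uses. In Prolog, which this book uses, one fact such as capital(paris, france) can be queried in each of these ways directly; the Python sketch below, with hypothetical names and a toy fact table, has to write each use as a separate function, but the knowledge itself is stated only once.

```python
from typing import Optional

# Hypothetical fact base: each pair records (capital, country) exactly once.
CAPITALS = [("Paris", "France"), ("Rome", "Italy"), ("Madrid", "Spain")]

def capital_of(country: str) -> Optional[str]:
    """Use the facts to answer 'What is the capital of X?'."""
    return next((city for city, land in CAPITALS if land == country), None)

def country_of(city: str) -> Optional[str]:
    """Use the same facts to answer 'Which country has X as its capital?'."""
    return next((land for c, land in CAPITALS if c == city), None)

def is_capital(city: str, country: str) -> bool:
    """Use them again to check a claim such as 'Paris is the capital of France'."""
    return (city, country) in CAPITALS

print(capital_of("France"), country_of("Paris"), is_capital("Paris", "Italy"))
# Paris France False
```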
If we look at a computer program that performs some task involving natural language, we might well ask 'What knowledge does this program have of grammar? Of word meanings? Of the application domain it operates within?' The trouble is that this knowledge may well be implicit in the instructions that specify how to perform the specific task. The procedural representation suggested by computer implementation can thus get in the way of a theoretical characterization of the task and of what knowledge is required to perform it. A way out of this problem is to represent the rules and principles themselves declaratively as symbolic structures to be manipulated by a program.
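The difference can be seen in a deliberately tiny example. Both halves of the following Python sketch encode the same toy grammar (hypothetical, like all the names here), but only the declarative version lets another program ask what the grammar contains.

```python
def procedural_recognizer(words: list[str]) -> bool:
    """Accept three-word sentences like 'the dog barked'. The grammar is never
    stated; it is buried in the order and content of these tests."""
    return (len(words) == 3
            and words[0] in {"the", "a"}
            and words[1] in {"dog", "cat"}
            and words[2] in {"barked", "slept"})

# The same knowledge stated declaratively, as inspectable data.
RULES = [("S", ["NP", "V"]), ("NP", ["Det", "N"])]
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N",
           "barked": "V", "slept": "V"}

def words_of_category(category: str) -> list[str]:
    """Answer 'Which words does the program know as <category>?'."""
    return sorted(word for word, cat in LEXICON.items() if cat == category)

def expansions_of(category: str) -> list[list[str]]:
    """Answer 'How can a <category> be built?'."""
    return [rhs for lhs, rhs in RULES if lhs == category]

print(procedural_recognizer("the dog barked".split()))  # True
print(words_of_category("N"))                            # ['cat', 'dog']
print(expansions_of("NP"))                               # [['Det', 'N']]
```

A general program is still needed to use such rules for recognition, but the rules themselves can now be read, queried and changed without touching that program.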
Having programs work with explicitly represented, inspectable rules has proved very successful in applications of AI, where such rule-based systems have been developed for tasks like medical diagnosis and the interpretation of geological measurements. Programming languages such as OPS5 have emerged that allow the programmer simply to specify the rules and to leave many of the actual processing decisions to the machine. One especially exciting development is the rise of 'logic programming' languages. Prolog, due to Alain Colmerauer (himself a computational linguist), is the best-known of these languages and the one that we use in this book. The idea behind these languages (not yet completely realized) is that programmers should simply describe their problems in logic, expressing what is to be done rather than how.
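The basic idea of a rule-based system can be conveyed by a tiny forward-chaining interpreter: the programmer writes only condition-conclusion rules, and a general loop decides which rules to fire. This is a Python sketch under stated assumptions, not OPS5, which has its own rule syntax and conflict-resolution strategy; the animal-identification rules are hypothetical.

```python
# Each rule: (set of conditions that must all hold, conclusion to add).
RULES = [
    ({"has_feathers"}, "is_bird"),
    ({"is_bird", "cannot_fly", "swims"}, "is_penguin"),
    ({"has_fur"}, "is_mammal"),
]

def forward_chain(facts: set[str], rules: list[tuple[set[str], str]]) -> set[str]:
    """Repeatedly fire any rule whose conditions are all known facts,
    adding its conclusion, until a full pass adds nothing new."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known

print(sorted(forward_chain({"has_feathers", "cannot_fly", "swims"}, RULES)))
# ['cannot_fly', 'has_feathers', 'is_bird', 'is_penguin', 'swims']
```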
In NLP, an example of this might be a programmer specifying a grammar in much the same way as a descriptive linguist. With this representation, the computer would then be able both to generate example sentences allowed by the grammar and to determine whether given sentences were indeed grammatical. As yet, logic programming languages can only produce this kind of performance for very simple grammars, but a great deal of effort is being spent on improving them. Attempts are even being made to design new kinds of computers that will support these languages better than conventional ones.
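The idea of writing a grammar once and using it both ways can be sketched even outside a logic programming language. In the Python sketch below, the grammar is plain data (a hypothetical toy), and two small general-purpose interpreters use it: one enumerates every sentence the grammar allows, the other checks whether a given sentence is grammatical. Generation terminates only because this grammar has no recursive rules, and the naive top-down recognizer would loop forever on a left-recursive rule such as NP → NP PP.

```python
from itertools import product

# A hypothetical toy grammar, written once as data. Symbols that are not
# keys of GRAMMAR are treated as words.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["saw"], ["slept"]],
}

def generate(symbol: str = "S") -> list[list[str]]:
    """Every word sequence the symbol can derive (finite: no recursive rules)."""
    if symbol not in GRAMMAR:  # a word derives only itself
        return [[symbol]]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each symbol on the right-hand side.
        for parts in product(*(generate(s) for s in rhs)):
            results.append([word for part in parts for word in part])
    return results

def parse(symbols: list[str], words: list[str]) -> bool:
    """True if the symbol sequence can derive exactly these words
    (naive top-down, depth-first search)."""
    if not symbols:
        return not words
    first, rest = symbols[0], symbols[1:]
    if first not in GRAMMAR:  # a word: must match the next input word
        return bool(words) and words[0] == first and parse(rest, words[1:])
    return any(parse(rhs + rest, words) for rhs in GRAMMAR[first])

def recognize(sentence: str) -> bool:
    return parse(["S"], sentence.split())

print(len(generate()))                  # 40
print(recognize("the dog saw a cat"))   # True
print(recognize("dog the saw a cat"))   # False
```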