We implement and present our lexicons in the lexical knowledge representation language DATR (see Evans & Gazdar 1989a, 1989b, 1996; Keller 1995, 1996). DATR is a rather spartan nonmonotonic language for defining inheritance networks with path-value equations. The development of DATR was guided by a number of concerns which we summarise here. The objective was to design a language which (i) has an explicit theory of inference, (ii) has an explicit declarative semantics, (iii) can be readily and efficiently implemented, (iv) has the necessary expressive power to encode the lexical information presupposed by work in the unification grammar tradition, and (v) can express all the evident generalisations and subgeneralisations about such entries. In keeping with its intentionally minimalist character, it lacks many of the constructs embodied either in general purpose AI knowledge representation languages or in contemporary grammar formalisms. The language is nonetheless sufficiently expressive to represent concisely the structure of lexical information across a variety of domains of language description.
It should be stressed that DATR itself is no more than a very general language for lexical description and therefore does not commit or restrict the linguist using it to any particular linguistic framework, theory or formalism, nor is it restricted in the class of natural languages that it can be used to describe. Clearly, it is well suited to lexical frameworks that embrace or are consistent with inheritance and non-monotonicity through networks of nodes, but these are not requirements. DATR can be (and has been) used to implement differing theoretical approaches (including ILEX, HPSG, LTAG, Word Grammar, Finite State Morphology, Network Morphology and Paradigm Function Morphology), and is perhaps best thought of as a programming language which can be used to implement and test linguistic theories. Indeed, it would not be entirely misleading to think of DATR as a kind of assembly language for constructing (or reconstructing) higher level theories of lexical representation. Unlike most other formal languages proposed for lexical knowledge representation, DATR is also not restricted in the domains of linguistic description to which it can sensibly be applied. It is designed to be equally applicable at phonological, orthographic, morphological, syntactic and semantic domains of description. But it is not intended to replace existing approaches to those domains. DATR cannot be (sensibly) used without a prior decision as to the theoretical frameworks in which the description is to be conducted; there is thus no `default' framework for describing, say, morphological facts in DATR.
In DATR, information is organised as a network of nodes, where a node is essentially just a collection of related information. In the context of lexical description, a node might correspond to a phoneme, a syllable, a morpheme, a word, a lexeme, etc., or a class of such items. For example, we might have a node describing an abstract Word in German, a node for the class of German nouns, a node for the subclass of German nouns that mark plurals with -s, a node for the particular noun lexeme Klub (`club') and still more for the individual words that are instances of this lexeme (Klub, Klub-s). Each node has associated with it a set of equations that define partial functions from paths to values, where paths and values are both sequences of atoms (which are primitive objects). Atoms in paths are sometimes referred to as attributes. The syntax and terminology of DATR, like its name and its minimalist philosophy, owe more than a little to those of the unification grammar language PATR (Shieber 1986).
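The node-and-equation organisation just described can be made concrete with a small sketch. The Python fragment below is not DATR and does not reproduce its full semantics; it is a toy model, under simplifying assumptions, of nodes as partial functions from paths to values with default inheritance. Lookup uses the longest defined prefix of the queried path, and a more specific node can override an inherited default. All names in it (`Inherit`, `query`, the node `Frau`, the attributes `lang`, `cat`, `root` and `plural suffix`, and the `-en` default) are hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional, Tuple, Union

Path = Tuple[str, ...]   # a path is a sequence of atoms (attributes)
Value = Tuple[str, ...]  # a value is likewise a sequence of atoms


class Inherit:
    """Right-hand side meaning: take the value of the same path at another node."""

    def __init__(self, node: str) -> None:
        self.node = node


# Each node maps some paths to either a value or an inheritance link.
Network = Dict[str, Dict[Path, Union[Value, Inherit]]]

# Hypothetical fragment modelled on the German example in the text.
NETWORK: Network = {
    "Word": {("lang",): ("german",)},
    "Noun": {
        (): Inherit("Word"),               # by default, behave like Word
        ("cat",): ("noun",),
        ("plural", "suffix"): ("en",),     # illustrative default only
    },
    "Noun_S": {
        (): Inherit("Noun"),
        ("plural", "suffix"): ("s",),      # overrides the inherited default
    },
    "Klub": {(): Inherit("Noun_S"), ("root",): ("klub",)},
    "Frau": {(): Inherit("Noun"), ("root",): ("frau",)},
}


def query(net: Network, node: str, path: Path) -> Optional[Value]:
    """Evaluate node:path using the longest prefix of path defined at node."""
    equations = net[node]
    for n in range(len(path), -1, -1):     # longest prefix first
        prefix = path[:n]
        if prefix in equations:
            rhs = equations[prefix]
            if isinstance(rhs, Inherit):
                # Follow the link, asking the parent node for the full path.
                return query(net, rhs.node, path)
            return rhs
    return None                            # the function is partial: undefined here
```

In this sketch, `query(NETWORK, "Klub", ("plural", "suffix"))` yields `("s",)` because `Noun_S` overrides the default, whereas the same query at `Frau` falls through to `Noun` and yields `("en",)`. A path that no node on the inheritance chain defines, such as `("gender",)`, yields `None`.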
There have been more than a dozen different implementations of the DATR language. They include Roger Evans's (Brighton) implementation, which is written in Prolog and runs on most Unix platforms; Dafydd Gibbon's (Bielefeld) DDATR Scheme, NODE Sicstus Prolog, ZDATR (in C), and awk implementations; Jim Kilbury's (Düsseldorf) QDATR Arity, Quintus and Sicstus Prolog implementations; and Gabriel Illouz's (Paris) implementation of CDATR (in C). All of these are freely available on request, as is an extensive archive of over two hundred example fragments, some of which illustrate formal techniques and others of which are applications of DATR to the lexical phonology, morphology, syntax or semantics of a wide variety of different languages (including nontrivial fragments of aspects of the lexicons of Arabic, Czech, Dakota, English, French, German, Gikuyu, Italian, Japanese, Latin, Polish, Portuguese, Russian and Spanish, and smaller indicative fragments for Baoule, Dan, Dutch, Hua, Nyanja, Serbo-Croat, Swahili, Tem and Welsh Romany).
