There are several ways of conceiving of the (inflectional) lexicon in NLP. One way is to construe the lexicon as a list of stems and irregular forms. Regular inflection is then handled extralexically. This approach was near-universal in the days when NLP's ambitions rarely strayed beyond the coverage of English. It presupposes a partition of inflection into the regular and the irregular. Subregularities have to be forced into one category or the other.
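The first approach can be sketched in a few lines of Python. This is a minimal illustration under our own simplifying assumptions, not a real NLP system: the lexicon lists only stems and irregular forms, and regular inflection is handled outside the lexicon by a rule. All names here are hypothetical.

```python
# Sketch of approach one: the lexicon lists stems and irregular
# forms; regular inflection is handled extralexically by a rule.

stems = {"walk": "verb", "bank": "noun"}          # stem list
irregulars = {"went": ("go", "verb", "past")}     # irregulars, form by form

def inflect_past(verb_stem):
    """Extralexical regular rule: past tense adds -ed."""
    return verb_stem + "ed"

def past(verb):
    # Irregular forms are looked up; regular forms are derived by rule.
    for form, (stem, cat, feat) in irregulars.items():
        if stem == verb and feat == "past":
            return form
    return inflect_past(verb)

print(past("walk"))  # walked
print(past("go"))    # went
```

Note that a subregular pattern (such as sing/sang) must be forced into one of the two boxes: either listed form by form as irregular, or handled by an extra extralexical rule.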
Another way takes the lexicon to consist of words - that is, the inflected forms that constitute the basic elements of sentential syntax. The task of the lexicon is to capture the mapping from such forms to the sets of bundles of abstract syntactic and semantic information that they express. Thus the word banks, for example, needs to be specified in the lexicon in such a way that we can tell that it is either the plural of the noun bank or the third person singular present of the verb bank. This is typically done by invoking disjunction and negation, or by elaborating a type system in which most forms find a single place.
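A word-based lexicon can be sketched as a map from word forms to sets of feature bundles. The representation below is purely illustrative (the bundle attributes and the name analyses are our own assumptions); it shows how the ambiguity of banks is recorded.

```python
# Sketch of a word-based lexicon: each word form maps to the set of
# morphosyntactic analyses (feature bundles) that it expresses.

word_lexicon = {
    "banks": [
        {"lemma": "bank", "cat": "noun", "num": "plural"},
        {"lemma": "bank", "cat": "verb", "per": "3",
         "num": "singular", "tense": "present"},
    ],
    "bank": [
        {"lemma": "bank", "cat": "noun", "num": "singular"},
        {"lemma": "bank", "cat": "verb", "form": "base"},
    ],
}

def analyses(form):
    """Return all morphosyntactic analyses of a word form."""
    return word_lexicon.get(form, [])

print(len(analyses("banks")))  # → 2: banks is two-ways ambiguous
```

Listing the disjuncts exhaustively, as here, is the crudest option; the type-system alternative mentioned above would instead assign banks a single underspecified type.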
Yet another way is to construe the lexicon as defining a set of abstract objects, standardly called lexemes. These lexemes themselves constitute definitions of sets of inflected forms. From this perspective, pairs consisting of a morphosyntactic feature specification and a phonologically specified word form are just properties, generally implicit ones, of lexemes. Such properties are on a par with all the other properties of lexemes, syntactic, semantic, and so on. The various phoneme sequences that correspond to distinct word forms have no particular ontological status in this approach. However, given such a sequence as a starting point, the lexeme-based approach will implicitly define the set of morphosyntactic feature specifications that map into it.
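The lexeme-based view can be sketched as follows, again under illustrative assumptions: here a lexeme pairs a stem with a simple suffixation table, and the methods realize and specs_for are hypothetical names for the forward mapping (feature specification to word form) and the implicit reverse mapping described above.

```python
# Sketch of a lexeme-based lexicon: a lexeme defines its set of
# inflected forms; (feature spec, form) pairs are implicit properties.

class Lexeme:
    def __init__(self, stem, suffixes):
        self.stem = stem
        self.suffixes = suffixes  # feature specification -> suffix

    def realize(self, spec):
        """Map a morphosyntactic feature specification to a word form."""
        return self.stem + self.suffixes[spec]

    def specs_for(self, form):
        """Given a phoneme/letter sequence, recover the feature
        specifications that this lexeme maps into it (the implicit
        reverse mapping)."""
        return {s for s in self.suffixes if self.realize(s) == form}

BANK_N = Lexeme("bank", {("num", "sg"): "", ("num", "pl"): "s"})

print(BANK_N.realize(("num", "pl")))   # banks
print(BANK_N.specs_for("banks"))       # {('num', 'pl')}
```

The word form banks has no status of its own here; it exists only as the output of BANK_N.realize for a particular feature specification.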
To a limited extent, the second and third approaches are interchangeable. A lexeme-based lexicon can be compiled into a word-based lexicon (though not one that captures generalizations about words). And a word-based lexicon can be compiled into a lexeme-based lexicon (though not one that captures generalizations about lexemes). As long as the relevant generalizations are captured somewhere, it may not matter that they are lost in a compiled form. If the compiled form is hugely redundant and one's application makes compactness desirable (as it might be for lemmatisation, tagging, or parsing), then standard computer science techniques can deliver compaction.
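The first direction of compilation can be sketched directly. Assuming the simplified representation below (a lexeme as a stem plus a table from feature specifications to suffixes - an assumption of ours, not a claim about any particular system), compilation just enumerates every form of every lexeme and indexes the analyses by word form:

```python
# Sketch of compiling a lexeme-based lexicon into a word-based one.

lexemes = {
    "BANK_N": ("bank", {"sg": "", "pl": "s"}),
    "WALK_V": ("walk", {"base": "", "3sg": "s", "past": "ed"}),
}

def compile_to_words(lexemes):
    """Enumerate each lexeme's forms and index the resulting
    (lexeme, feature spec) analyses by word form."""
    word_lexicon = {}
    for name, (stem, suffixes) in lexemes.items():
        for spec, suffix in suffixes.items():
            word_lexicon.setdefault(stem + suffix, []).append((name, spec))
    return word_lexicon

words = compile_to_words(lexemes)
print(words["banks"])  # [('BANK_N', 'pl')]
```

The output is flat and redundant - the regularity shared by BANK_N and WALK_V is stated twice - which is exactly the sense in which generalizations about lexemes are lost in the compiled form.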
We adopt the third approach in this tutorial. Here, inflectional morphology falls within the tradition that treats paradigms (inflectional classes, declensions, conjugations, etc.) not as epiphenomena but rather as analytically central. The core notion is the lexeme, not the word or the morpheme. Words exist, but only as realizations of (morphosyntactic specifications of) lexemes - hence the use of the term realizational to characterize this tradition. Morphemes also exist, but only as second-class citizens. The appearance of a morpheme is just one among several ways in which morphosyntactic information gets expressed in the realization of a lexeme as a word. And the rules responsible for realization are all default rules, so irregularity, subregularity, and regularity are just special cases of the same thing.
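The default-rule idea can be made concrete with a small sketch. Under our own illustrative encoding (an ordered list of rules, most specific first, where the first applicable rule wins - the names rules, stems, and realize are hypothetical), irregulars, subregulars, and regulars differ only in how specific their rule's condition is:

```python
# Sketch of realization by default rules: rules are ordered from
# most to least specific, and the first applicable rule wins.

SING_CLASS = {"SING_V", "RING_V"}  # a subregular class of lexemes
stems = {"GO_V": "go", "SING_V": "sing", "WALK_V": "walk"}

# Each rule pairs a condition on (lexeme, spec) with a realization.
rules = [
    # irregular: specific to a single lexeme
    (lambda lex, spec: lex == "GO_V" and spec == "past",
     lambda stem: "went"),
    # subregular: covers a class of lexemes (i -> a ablaut)
    (lambda lex, spec: spec == "past" and lex in SING_CLASS,
     lambda stem: stem.replace("i", "a")),
    # regular default: applies whenever nothing more specific does
    (lambda lex, spec: spec == "past",
     lambda stem: stem + "ed"),
]

def realize(lexeme, spec):
    """Apply the first (most specific) rule whose condition holds."""
    for applies, build in rules:
        if applies(lexeme, spec):
            return build(stems[lexeme])
    raise ValueError("no rule applies")

print(realize("GO_V", "past"))    # went
print(realize("SING_V", "past"))  # sang
print(realize("WALK_V", "past"))  # walked
```

Nothing in the rule format distinguishes the three cases: went is not marked as an exception, it simply pre-empts the more general rules by being more specific.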
Assuming a word-based view of the lexicon, give a concise statement of the morphosyntactic properties of the German adjective form /alt@r/.
Assuming a word-based view of the lexicon, give a concise statement of the morphosyntactic properties of the English word were (not forgetting its occurrence in clauses like if she were to leave, ...).