Why are some languages more regular than others?

Many years ago, I did anthropological fieldwork among the Dorze of Southern Ethiopia. Since no grammar of the Dorze language was available, I had to work out its basic morphological and syntactic rules for myself. The good news was that once I had identified a rule, I could apply it across the board: there were hardly any exceptions. In this respect, Dorze stood in sharp contrast with Amharic, the dominant language of then-imperial Ethiopia. Amharic (like English) is a language with many irregularities. Dorze regularity was found not only at the morphological level, but also at the phonological level. The many words that had been borrowed from Amharic into Dorze had all, except for the most recent ones, acquired fully regular Dorze phonology.

Why are some languages quite regular and others not? I remember posing the question to the historical linguist Robert Hetzron, whom I met at the time in Addis Ababa. It is, he suggested, because, in the process of language acquisition, children spontaneously tend to over-regularize. They apply any rule they have acquired to all possible instances (in English, for instance, they may over-generalize the ordinary rule for the past tense and say “he goed” instead of “he went”). In societies where adults correct children, these mistaken regularizations are suppressed and irregularities are maintained; in societies where adults leave children alone in this respect, irregularities are less stable, and the language tends to be more regular. Gary Marcus et al., in their monograph on “Overregularization in language acquisition” (1992), quote Jill de Villiers half-joking: “Leave children alone and they'd tidy up the English language.”

The view that children are the main source of language regularization is an old one. According to Max Müller, for instance, “It is likely… that the gradual disappearance of irregular declensions and conjugations is due, in literary as well as in illiterate language, to the dialect of children. The language of children is more regular than our own” [Müller 1890: 75].

This view, however, is far from being generally accepted. Marcus et al. (1992), for instance, hold that over-regularization errors are rare. Language learners, they argue, tend to assume that only one form expresses a grammatical function (either “goed” or “went”, but not both) and that the form they hear from adults (e.g., “went”) is the right one (but see Maratsos 2000). Moreover, and apparently inconsistent with Hetzron's suggestion, it is generally agreed that the role of explicit correction in language learning is at best quite marginal.

Still, the question I am raising is not the historical question: why is there a tendency towards regularization in the evolution of a given language (in the absence of typically language-contact-related sources of irregularity)? It is the comparative question: why are some languages more regular than others (even limiting the comparison to related languages with similar morphologies)? It could be that both children and adults contribute to historical regularization, or even that the adult contribution is more important, and that, nevertheless, children are more conformist; if they are given evidence that a certain form is the socially approved one, they may inhibit over-regularization to a higher degree. This evidence, by the way, can take many forms and be more subtle than explicit correction. If, on the other hand, adults don’t seem to care and there is no such evidence, then children may over-regularize more freely, with the cumulative result that irregularities would disappear in fewer generations. (It would be useful, incidentally, to have evidence on the frequency of over-generalization in children drawn not just from English, where there is now social pressure for conformity, but also from languages where there is no such pressure, such as Fijian.)

So, I don’t know what the right answer is to the question I was puzzling about in Ethiopia. I would welcome 1) any theoretical suggestion, and 2) any relevant (positive or negative) evidence of a correlation between, on the one hand, regularity/irregularity in the morphological and phonological forms of a language and, on the other hand, the prevalence in the linguistic community of views regarding ‘correct’ and ‘incorrect’ usage, views expressed in particular in the form of feedback, however subtle, to language learners.

some ideas
Tuesday, 03 January 2012 19:24

Exciting issue. I am not an expert, but some factors that might influence the relative regularity of languages have been proposed:

- The proportion of second-language learners, and more generally of strangers, among speakers. Linguists like Peter Trudgill have claimed that rules are more systematic in 'exoteric' languages, i.e. languages that are learnt and spoken by non-natives. Their main argument is not that adult learners would overregularize more than children (though, for all I know, that is not entirely impossible). Rather, they claim that adult learners demand rules and explanations. (For an exposition of this claim, Wray and Grace 2007 can be useful.) If this theory is right, though, your Dorze case would be an obvious counter-example.

- Use in writing by a long-lasting administration. Writing can be used as a tool by grammarians and language legislators, but it can also freeze many linguistic accidents, thus preventing the regularization that often results from oral wear-and-tear. This might explain your Dorze case. But I admit that the argument cuts both ways, as linguists will sometimes also associate writing with regularization.

- Frequency of use. This one is a personal speculation. We know from quantitative studies, mostly of English, that, inside a given language, the death rate of irregular forms is strongly linked to their frequency of use. Perhaps one could use this observation to guess what shapes between-language differences in regularity? After all, average frequency of use is likely to differ between dialects as well as between words. There are languages with few words whose speakers talk a lot (such as international English), alongside languages that have many words but are rarely spoken (like many literary languages). Words would be more frequently used in the former, which would lead me to expect more regularity.

Mostly, though, linguists react with hostility to the suggestion that demography and social structure could influence regularity in languages. Accepting it is only a short step away from noticing variations in linguistic complexity, which is anathema to most linguists. This prevention is slowly giving way, making those issues very exciting to follow.


Wednesday, 04 January 2012 11:50

"Words would be more frequently used in the former, which would lead me to expect more regularity."

I meant *less* regularity.

Irregular patterns: Exponents or 'Adjustments'?
Friday, 06 January 2012 03:21

I think it's useful to keep in mind a distinction in the way irregular patterns may relate to the expression of morpho(syntactic) features. Consider two examples from Germanic languages:

(1) ox-oxen, goose-geese, sheep-sheep
(2) Baum-Bäume, Wurst-Würste, Schuh-Schuhe

In (1) we have the traditional examples of irregular number marking in English (including 'sheep', with no phonetically explicit plural marking). "Irregular" here refers to a pattern that is distinct from that created by the rule deriving the majority of plural forms (i.e., suffixation of -s) which is also the productive rule for plural marking (if a new item enters the language, as in the experimental context of the "Wug tests", this is the plural marking it's going to receive).

Examples (2) show one of the morphologically alternative ways to code plural number in German nouns. The irregularity here lies not in the main exponent of the feature [+plural], which is -e in all forms, but in the behavior of the stem. "Irregular" means that in addition to the rule-based generalization linking plural formation with the attachment of -e, there's an unpredictable (in the sense of being something external to the morphological rule itself) pattern whereby particular lexical items have to be marked as undergoing a vowel quality change ('Umlaut').

Now, the key question is: what's the status of the vowel quality changes in German? This is by no means a settled issue in linguistic theory (with similar patterns abounding in a variety of languages; e.g. Greenberg 1950: 150-151 on Nilotic languages), but it is likely to have consequences for our thoughts on the historical dynamics of such patterns. If the stem vowel changes are seen as simple 'phonological adjustments', being then "side-effects" of -e affixation in German (or rather, an adjustment triggered by a [+plural] feature, or triggered by a floating phonological feature), then these patterns have in a sense no functional importance, as far as the expression of number is concerned (cf. e.g., Halle & Marantz’s 1993 Distributed Morphology). If, on the other hand, these vowel changes are seen as part of the expression of the notion of plurality, then these markings do have functional significance (Spencer 2001). It’s also important to note that for several researchers who opt for the first analysis, forms such as sheep (plural) and geese in English should be analyzed as having a ‘phonologically null’ affix as the exponent for the notion of plurality, thus: sheep-0 and geese-0. This affix is a regular morphological formative for such theories, much as -s or -e, with the sole difference that it happens to be phonologically void.

As a final note, once one assumes that even sound change, which is ‘regular’ in the sense of being conditioned by phonetic factors alone, may be conditioned by morphological factors, for example when the application of such changes could lead to the deletion of a marker for an important grammatical feature (Kiparsky 1972, Campbell 1974), then the relation of such ‘irregular’ morphological patterns to the morphosyntactic features they may, or may not, express is of great relevance to an understanding of the historical outcome or the historical origins of such patterns.

Campbell, L. (1974) “On Conditions on Sound Change” In: J. Anderson & C. Jones (eds.) Historical Linguistics II. North-Holland: 89-97.
Greenberg, J. (1950) “Studies in African Linguistic Classification V: Eastern Sudanic” Southwestern Journal of Anthropology 6 (2): 143-160.
Halle, M. & A. Marantz (1993) “Distributed Morphology: The Pieces of Inflection” In: K. Hale & S. J. Keyser (eds.) The View from Building 20. MIT Press: 111-176.
Kiparsky, P. (1972) “Explanation in Phonology” In: S. Peters (ed.) Goals of Linguistic Theory. Prentice-Hall.
Spencer, A. (2001) “Morphophonological Operations” In: A. Spencer & A. Zwicky (eds.) The Handbook of Morphology. Blackwell.

anathema to most linguists?
Sunday, 15 January 2012 17:44
This is a response to Morin's comment: "linguists react with hostility to the suggestion that demography and social structure could influence regularity in languages" and his later claim about stuff that's "anathema to linguists". Generalizing about what linguists think -- the meme that linguists are uninterested in all sorts of interesting things -- is a very pervasive, annoying and weird phenomenon in academic circles. I'm a linguist, and I'm fairly sick of it. Linguists by definition know more about language than most people who haven't spent their lives studying it, and perhaps they do react negatively to the vigor and self-assuredness with which amateurs so often make pronouncements about language - but the linguists I know are also keenly aware of how much about language remains a mystery to us, and are generally as intrigued as any serious researcher by new and even shocking observations - so long as we can be sure they are real!


That said, my first worry about the example under discussion is whether spending more time with Dorze might not reveal irregularities and unpredictabilities that Sperber's fieldwork did not uncover. But let's suppose not. One question that comes to my mind is whether small population size might play a role in leveling morphological and phonological irregularity in a language, along the lines of the link between small phoneme inventories and small numbers of speakers suggested by Hay and Bauer (2007) ["Phoneme Inventory Size and Population Size", Language 83, 388-400] (an article that this linguist, at least, found fascinating).


Another observation is that even within languages that show sizable numbers of irregularities, such as English, there are pockets of absolute regularity. Though there are plenty of unpredictable past-tense and past-participle forms of English verbs (sing-sang-sung etc.), there are zero irregular present participles. You always add -ing to the form used in the infinitive; no English verb ever does it differently, and speakers appear to have no doubts about this. So the language faculty must make available a switch of some sort that turns on and off even the possibility of irregularity for a given affix. One could imagine some languages making use of this switch more than others, and others leaving it permanently in one or the other position - possibly as a consequence of extra-linguistic factors of the sort under discussion here.

Dear angry reader...
Monday, 16 January 2012 14:25

Many bona fide professional linguists share my impression. In their 2007 book on language complexity, Geoffrey Sampson, David Gil and Peter Trudgill describe exactly that situation (see here). Like me, they feel that the notion of languages differing in complexity is not accepted in many, though not all, academic circles. Being linguists themselves, they have, like me, a lot of respect for the profession. None of us is claiming that "linguists are uninterested in all sorts of interesting things". On the contrary, my comment was stressing the fact that, I quote, "this prevention is slowly giving way, making those issues very exciting to follow." Ex-ci-ting. I love and respect the work of those linguists, as I do that of many others.

Who's angry?
Monday, 16 January 2012 16:07

Thanks for your reply, but I am in no way an "angry reader". Frustrated, sure. But "angry" goes a bit too far. I am just a reader who, by virtue of my academic profession, you predicted would be "hostile" to your speculations about answers to Sperber's question and view them as "anathema". I think it's useful to call people to account every now and then on the stereotyping of linguists, because it is a weird and counterproductive problem of today's academic world -- and that's all I did (proceeding immediately to some more constructive remarks). The fact that a blurb for a book edited by some "bona fide professional linguists" agrees with you is neither here nor there.

P.S. My co-authors and I actually spent about a page on the culture/structure issue in our Language article about Pirahã (http://ling.auf.net/lingBuzz/000411). See the discussion on page 358.

Sociolinguistic Typology
Wednesday, 18 January 2012 23:35

There is a rather long – 288 pp – attempt to answer this and similar questions in my new book, Sociolinguistic typology: social determinants of linguistic complexity, published by Oxford University Press. The short answer seems to be that there are tendencies to regularisation and other forms of simplification in all languages. But there are also tendencies to irregularisation in all languages. And there seem to be sociolinguistic factors which influence which of these tendencies is most likely to predominate in the long run and in the short run. Those factors are: degree of adult language contact; degree of social stability; density of social networks; amount of communally shared information; and, yes, community size – but this last is by no means the most important factor. It follows that some languages are indeed more complex than others – at least using my definition of complexity, which makes it equivalent to L2 difficulty. Some linguists still find this controversial. Sociolinguists and dialectologists who have studied adult language contact-induced simplification at first hand have, on the other hand, never had any problem with the notion of non-equicomplexity.

