Saturday, April 04, 2015

Improving language?

In a natural segue from Ibn Khaldun, I've been reading Ernest Gellner - specifically, Words and Things, his attack on Linguistic Philosophy (that is, on Wittgenstein and his followers at Oxford). As he presents it, Linguistic Philosophy amounted to, essentially, the descriptive study of lexicography and semantics. Since meaning is defined by usage, any statement that would be accepted as true in ordinary language is ipso facto true, and any philosophical argument suggesting otherwise can only be the result of some semantic misunderstanding; a philosopher's only legitimate goal is to figure out how words are used in ordinary language to prevent such misunderstandings. The key weak point of this view, for Gellner, is its underlying assumption that ordinary language is unimprovable:
To "observe how we use words" is to make statements, in ordinary language, about the role, function, effects, and context of expressions. But in doing this, the concepts and presuppositions of that ordinary language are taken for granted and insinuated as the only possible view [...] It is true that certain things may be said in favour of ordinary language. It would not be in use, and it would not have survived were it not wholly without merit. But this argument, as in politics where it is often used to buttress conservatism, proves fairly little. Very silly and undesirable things often survive, and neither society nor language is such a tightly integrated whole as would disastrously suffer from alteration of some one part. (pp. 195-197)
For Gellner, contra Wittgenstein, ordinary language can be improved upon by the very activity of reflecting on it, leaving a positive role for philosophy after all:
[T]here are many language games which become unworkable when properly understood: where self-consciousness not merely does not "leave everything as it is" but simply necessitates change. Many "conceptual systems", in primitive societies and in advanced ones, contain confusions and absurdities which are essential for their functioning. To lay them bare is to make such a framework unworkable. (p. 206)
The notion of improving language (my paraphrase) would need a lot more working out than I see in this book, but presumably means something like "make the concepts and presuppositions underlying language use more internally coherent and in better accord with non-linguistic experience."

Such a standard would not necessarily imply that one language can be superior to another. For one thing, while such concepts and presuppositions certainly play a role in language use, they don't seem to be critical to the definition of a language; you can change them and leave the language sufficiently intact to be mostly understood by speakers who have retained the old ones. A single language has room for many different kinds of language use.

However, it would suggest a potentially interesting alternative to a purely descriptive approach to linguistics. If Linguistic Philosophy was the effort to identify ways in which attention to ordinary native speakers' usage might correct misunderstandings embedded in philosophical thought, would Philosophical Linguistics be the effort to identify ways in which attention to philosophical thought might correct misunderstandings embedded in ordinary native speakers' usage?

Saturday, March 14, 2015

Sapir-Whorf is no shortcut

Lately the Sapir-Whorf hypothesis - that the language you speak influences the way you think - has had a bit of a revival; investigators such as Boroditsky or Levinson have finally managed to demonstrate small Whorfian effects on colour perception and sense of direction. Unfortunately, these successes only underscore how difficult it would be to make a convincing case for the version of this idea that perennially fascinates the public: the idea that language determines aspects of our worldview. Well before Sapir or Whorf, Nietzsche summarises it in Beyond Good and Evil:
"The strange family resemblance of all Indian, Greek, and German philosophizing is explained easily enough. Where there is affinity of languages, it cannot fail, owing to the common philosophy of grammar - I mean, owing to the unconscious domination and guidance by similar grammatical functions - that everything is prepared at the outset for a similar development and sequence of philosophical systems; just as the way seems barred against certain other possibilities of world-interpretation. It is highly probable that philosophers within the domain of the Ural-Altaic languages (where the concept of the subject is least developed) look otherwise "into the world", and will be found on paths of thought different from those of the Indo-Germanic peoples and the Muslims [...]" (Walter Kaufmann's translation)
If a community's grammar really does affect its worldview, two centuries of speculation have hardly brought us any nearer to proving it, much less figuring out how. The commonsense converse, that a community's worldview affects its grammar, is rather better supported. But this idea's attraction for intellectuals, I think, is basically technological: it holds out the promise of being able to change the way people think "just" by changing the way they talk, as envisioned for Newspeak and Láadan. Ironically, it's observably true that imposing a new language on a previously monolingual community usually implies major changes in the way they think - that's what happens when you introduce compulsory schooling - but that has less to do with the language than with the institutions diffusing it.

The technological question remains, then: can we redesign some aspects of our language to help us think more effectively?

For grammar, the answer is not obvious. For the lexicon, however, the answer is yes, and we do it all the time. If something seems to need a name, we give it one - "mouse" or "selfie". Sometimes we choose a name that transparently encodes a property of this item that's particularly important to remember - "henbane" or "fool's gold". Ask any taxonomist whether the existence and form of a name matters, or any mathematician whether all notations are equal.

But this isn't actually the shortcut that some science fiction would have us believe. Many readers probably know that "henbane" is some kind of plant, but couldn't identify it if it was sitting in front of them, much less take advantage of knowing the name to prevent some unfortunate fowl's death. Understanding a given domain requires you to have words for the items signified by its technical vocabulary, but the most important part of that is learning to identify and think about the referents. Hundreds of New Age texts attest to the fact that you can use the vocabulary of quantum mechanics without understanding the first thing about it.

This points the way towards a solution, but not a very linguistic one: If you want to make your language better for thinking with, then first learn to perceive and think about the world more clearly yourself, and then share what you learn (and the labels you've given to it) with other interested speakers. Make a point of spotting and labelling relevant differences between things or situations, and involve yourself in a wider range of situations than you're used to. A sign is a link between word and world - between the set of all possible combinations of phonemes, meaningless in themselves, and the set of everything the speaker has some idea how to recognise. Expanding the former is meaningless unless you're expanding the latter.

Saturday, March 07, 2015

Ibn Khaldun: Arabic dialects are independent languages

In Part 39 of the Muqaddimah, written in 1377, Ibn Khaldun discusses Arabic dialectology and language contact, reaching substantially correct conclusions marred only by the lack of attention to the role of purely internal developments in language change. The section is worth reading, if you haven't already come across it; it gives some idea of just how divergent the different Arabic "dialects" already were in his time. Like a lot of his work, if he had written it today, it would get many Arab nationalists up in arms! The translation is my own, and needs double-checking - appropriately, the Arabic of Ibn Khaldun is often difficult for modern Arabic readers.

"That the language of the city dwellers and townsmen is a language independent of the language of Mudar [Classical Arabic]

Know that the customary medium of discourse in the towns and among the city-dwellers is not the old language of Mudar, nor the language of the people of the generation (of Arabs). Rather, it is a different language, independent, and far from the language of Mudar and of this generation of Arabs in our time. Indeed, it is further from the language of Mudar (than the language of modern Arabs is).

The fact that it is an independent language is obvious; witness how many changes it has which grammarians consider as solecisms. Nevertheless, it varies in its expressions depending on the town. The language of the Mashriq is somewhat different from that of the Maghreb, and likewise that of Andalus from both. Yet each succeeds, with his own language, in realising his purpose and expressing what is within him. That is what is meant by "tongue" and "language". The loss of case-/mood-suffixes is not a problem for them, as we have already said regarding the Arabs of the present day.

As for the fact that it is further than the language of this generation (of Arabs) from the original language, that is because distance from the language depends on mixing with non-Arabness. The more one mixes with non-Arabs, the further one gets from the original tongue, because habits are acquired by learning, as we have said, and this (linguistic) habit is a mixture of the original habits which the Arabs had and the secondary habits which the non-Arabs had. So the more they hear it from non-Arabs and grow up with it, the further they get from the original habit.

You may observe this in the towns of Ifriqiya and the Maghreb and Andalus and the Mashriq:

  • As for Ifriqiya and the Maghreb, the Arabs there mixed with the non-Arab Berbers as they spread their civilisation among them. Hardly a town or a generation was isolated from them. Thus non-Arabness came to predominate over the Arab tongue which they had had. It became a different, mixed language, within which non-Arabness predominated for the reasons outlined. So it is further from the original tongue.
  • Likewise the Mashriq. When the Arabs prevailed over its nations, the Persians and the Turks, they mixed with them. Their languages then spread among them through the labourers and farmers and captives whom they took as servants and nannies and wet-nurses. As a result, their own language was corrupted by corruption of their (linguistic) habits, until it became a different language.
  • Likewise the people of Andalus, with the non-Arab Galicians and Franks.

All the people of the towns from these regions came to have a different language, specific to them and distinct from that of Mudar [=Classical Arabic], and distinct each from the other - as we shall recall. It is as if it were a different language due to their generations' mastery of the linguistic habit of it. And God creates and decrees what He will."

Thursday, December 11, 2014

A Mexican colony in Louisiana before Columbus?

In the latest issue of the International Journal of American Linguistics, Cecil Brown, Soren Wichmann, and David Beck announce a rather interesting finding: that Chitimacha [is] A Mesoamerican Language in the Lower Mississippi Valley. I don't know much about any of the languages involved, but insofar as I can judge it, it strikes me as quite convincing. They find 91 cognates between Chitimacha, a language of southern Louisiana, and Totozoquean, a language family of southern Mexico consisting of Totonacan and Mixe-Zoquean. Most of these cognates are very straightforward, with identical meanings and obviously similar, regularly corresponding sounds, and 36 of them involve words basic enough to be on the 100-word Swadesh or Leipzig-Jakarta lists. The grammatical similarities are rather less extensive, but there are a few. So, pending other specialists' comments, it looks like Chitimacha was brought to Louisiana by a migration across the Gulf of Mexico, from somewhere around the Isthmus area.

There is some useful shared cultural vocabulary, including "paper", "to write", "lime", "maize (corn)", "leached corn", and "to shell corn", and it looks like Caddo - spoken just upriver - in turn borrowed much of its maize-related vocabulary from Chitimacha. In combination with archeological evidence, this leads the authors to favour a migration date either some time around 850 AD, when the Caddo began low-level maize cultivation, or sometime around 1200-1450 AD, when they intensified it. Such a late date seems a little troubling, given how few cognates are to be found; Korandje separated from Songhay around 1200 AD, and there are well over 200 shared items there, mostly belonging to basic vocabulary. The ancestor of Chitimacha would have to have already been rather different from any other Totozoquean language even before they reached Louisiana; but then why did they apparently leave no trace in Mexico itself? Perhaps a study of southern Mexican place names could shed some light on the question.

This looks like historical linguistics at its best: a surprising long-distance connection affecting both language and culture. Now it's up to the historians and archeologists to fill in the gaps: why did southern Mexicans find it worth while to cross the Gulf to Louisiana in significant numbers?

Sunday, November 30, 2014

Good prescriptivism?

People tend to enter their first linguistics classes with a vague but strongly felt idea, instilled by English teachers or by society at large, that some ways of speaking are bad, illogical, sloppy, rule-breaking, etc. One of our first tasks is thus to explain to them that, actually, such ways of speaking are just as logical and law-governed as Standard English; they're simply obeying a different set of rules. Not infrequently, we follow that up by telling them everything that's wrong with the prescriptive rules of Standard English, based ironically on a very similar set of tropes: they're illogical (stop splitting infinitives because you can't do that in Latin), they're historically inaccurate (don't use singular they even though the King James Bible does), they're incompatible with the rules of modern spoken English (e.g. "it is I") to the point of confusing speakers into gross solecisms ("they gave it to John and I"). Unless we're careful, the students end up walking away from all that with the impression that linguists think prescriptivism is bad, full stop. That, however, would be a mistake. As irritating as these problems and misconceptions are, they don't affect the case for having a prescriptive standard language - just the extent of its ambitions and the details of its usage.

Prescriptivism, of course, is all about power: who gets to talk how where, and who gets to say how they should talk. As good libertarians, our first reflex might be to say that this is all unnecessary: let everyone decide for themselves! That has two different problems. The first is that, when people decide for themselves, what they end up with is in fact a set of implicit rules for what's appropriate in which circumstances, and if you want to make life easier for visitors from other cultures, the least you can do is make those rules explicit somewhere. The other is that, in the event of any clashes, it's the more powerful individual that gets to decide, which is a particular problem in the case of public services. You want a driver's license, and you only speak English? Sorry, our local transport officials aren't really comfortable with English, so you'd better brush up on your Russian.

The latter example may sound like fantasy to American or English readers (not so much to the Irish or Welsh), but it's rather close to reality in a lot of the world. If you understand Arabic, have a look at this video of Moncef Marzouki, one of the two current presidential candidates in Tunisia, having a go at his Tunisian interviewer for using too many French words: "Respect the Arabic language! Plutôt, what does plutôt mean? You say plutôt, what's that? My sister in Douz won't understand plutôt. [...] [Interviewer: It's a chance for her to learn...] No, she needn't learn - you learn the language of Tunisians!"

It's populism, of course - but, like a lot of populism, it makes a good point. Why the heck should the average citizen have to speak a foreign language to deal with officials and other elites in his/her own country? (Especially in one as close to monolingual as Tunisia?) In such a situation, if the populace doesn't prescriptively impose their language preferences through concerted action, the bureaucracy will simply impose their own in one-to-one interactions.

Thursday, November 27, 2014

Berber subclassification: Reading Nait-Zerrad

Kamal Nait-Zerrad's 2001 article "Esquisse d'une classification linguistique des parlers berbères" presents a good deal of useful data, but does so in a manner that I find makes it rather difficult to figure out what's going on without plenty of pencil work. In case anyone else has the same experience, here's my take on it. I will not focus on, or even necessarily present, his interpretation here - read the article for that; rather, I'm more interested in figuring out the implications of the data he presents in the light of other work before and since, and in the light of accepted principles of historical-comparative linguistics.

First, he looks at a number of morphological and phonetic isoglosses:

1. The 3rd person singular preterite of CC verbs: yərra vs. yərru. Following Kossmann (2001), we now know that these are actually CC+glottal stop, so the data exemplifies two different sound changes: the relatively trivial *-aʔ > -a, and the more surprising *-aʔ > o > u. The former is the commonest outcome; the latter is exemplified by: Ait Seghrouchen, Figuig, Beni Snous, Bissa, Timimoun, Mzab, Ouargla, Nefusa. (Ghadames still has o).

2. The proximal demonstrative suffix: -a vs. -u. Again, -a is the default, but -u appears in the same set of varieties as seen in 1, plus one more: Iznasen.

3. The 3rd person singular aorist of CCV verbs: ad yəbḍu vs. ad yəbḍa. Here, -u is the default, and is closer to the original, while -a has spread from the preterite. The -a form is found in the same set of varieties as 2 (excluding Nefusi), plus several more: Rif, Metmata, Chaoui, Jerba.

4. Initial vowel dropping: a- vs. 0-. A number of *(t)a-CV-initial nouns drop the original vowel of the prefix in the same set of varieties as 3, plus Nefusi, Chenoua, and Siwa.

5. Velar softening: in many varieties, in many words, what would elsewhere be k/kk/g/gg corresponds to c/čč/j/ǧǧ. The latter outcome is observed in the same set of varieties as 4, minus Nefusi.

6. Final *-əv: this is retained as such in Ghadames and Awjila, and as contrastive length in Zenaga. Otherwise, it becomes -u in most varieties, but -i in the same varieties as listed in 4, plus El-Fogaha (with a few question marks where the author had insufficient data). Cf. Kossmann (1995).

All of 1-6 pick out Zenati varieties, but the exact set differs: 1-2 pick out a core Zenati consisting almost entirely of northern Saharan varieties, while 3-6 pick out a broader Zenati including the semi-arid mountainous lands stretching from the Rif to southern Tunisia, and vary in their inclusion of varieties further east (Nefusi, El-Fogaha, Siwi). Chaker (1972) cites 1-2 and 5 as possibly justifying a Zenati subgrouping, while Kossmann (1999) defines Zenati in terms of 3, 4, and one other morphological innovation, and then cites 5 and 6 as common phonological innovations.

7. Negative intensive theme: retention/loss. The negative intensive is retained in northwestern Morocco (Rif, Iznasen, Senhaja, Ait Seghrouchen, Figuig); in Bissa; in Tuareg and in the nearby oases of Mzab, Ouargla, and Ghadames; and in Jerba. Its loss everywhere else (according to his data, which should be re-checked) shows no prominent genetic patterning, and hence is probably relatively recent.
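The membership statements in isoglosses 1-6 chain off one another ("the same set as 3, plus..."), which makes them easy to misread. As a cross-check, here is a minimal Python sketch that simply transcribes those statements as set operations. It assumes that "Nefusa" and "Nefusi" label the same variety, and it ignores the question marks attached to isogloss 6; the variable names are mine, not Nait-Zerrad's.

```python
from collections import Counter

# Varieties showing the Zenati-type outcome for each isogloss,
# transcribed from the prose descriptions in points 1-6.
# Assumption: "Nefusa" and "Nefusi" refer to the same variety.
s1 = frozenset({"Ait Seghrouchen", "Figuig", "Beni Snous", "Bissa",
                "Timimoun", "Mzab", "Ouargla", "Nefusi"})
s2 = s1 | {"Iznasen"}                                          # same as 1, plus Iznasen
s3 = (s2 - {"Nefusi"}) | {"Rif", "Metmata", "Chaoui", "Jerba"}  # 2 minus Nefusi, plus four
s4 = s3 | {"Nefusi", "Chenoua", "Siwa"}                        # 3 plus three more
s5 = s4 - {"Nefusi"}                                           # 4 minus Nefusi
s6 = s4 | {"El-Fogaha"}                                        # 4 plus El-Fogaha

isoglosses = {1: s1, 2: s2, 3: s3, 4: s4, 5: s5, 6: s6}

# Varieties that show the Zenati-type outcome for every one of 1-6.
core = frozenset.intersection(*isoglosses.values())

# How many of the six isoglosses each variety takes part in.
counts = Counter(v for members in isoglosses.values() for v in members)

print(sorted(core))
for variety, n in counts.most_common():
    print(f"{n}/6  {variety}")
```

On this reading, the varieties sharing all six features are exactly the seven northern Saharan ones, with everything else participating in five or fewer - which is one way of seeing why the boundaries of Zenati look fuzzy.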

Then, he moves on to vocabulary, examining 11 lexical variables which I would summarise as follows:

Several forms appear specifically Zenati: irəḍ in the sense of "be dressed" (though it is more widespread in other senses), igur for "go", əɣs for "want", azəgrar for "long", anilti for "shepherd". Of these, El-Fogaha and Siwa share only əɣs for "want", whereas Nefusi shares all except "go in". adəf "go in" is Zenati-specific in the west, but more confusing in the east, being attested in Ghadames and (as an alternative to əggəz) in Air Tuareg.

Several forms appear specifically Tuareg: răgăz for "go", amaḍan for "shepherd", əggəz in the sense of "go in" (elsewhere "go down"), zəgrət (with the extra t) for "long".

One form unites southern/central Morocco with Kabyle: awtul "hare" (vs. pan-Berber a-yərẓiẓ.)

A couple of forms unite Libyan varieties with Tuareg, contrasting with Algerian and Moroccan varieties, in defiance of any plausible genetic classification, reminding us that a tree does not tell the whole story here:

  • iziḍ "donkey" (Tuareg, Ghadames, Nefusi, Siwa, Awjila) vs. aɣyul (everywhere else except El-Fogaha)
  • tufat/tifut/tafyi "tomorrow" (Tuareg, El-Fogaha, Siwa respectively) vs. azəkka (everywhere else except El-Fogaha, Awjila, and Zenaga)

Based somehow on all this, he proposes the following very odd tree:

  1. Group 1
    1. Senhaja, Middle Atlas, Shilha, Kabyle, Zenaga
    2. Tuareg
    3. El-Fogaha
    4. Awjila
    5. Siwa
  2. Nefusa
  3. Ghadames
  4. Group 4 ("Zenati")
    1. Ait Seghrouchen, Beni-Snous, Timimoun, Figuig, Bissa, Mzab, Ouargla
    2. Iznasen, Jerba
    3. Rif, Metmata, Aures
    4. Chenoua

Apparently, to get this he operated by successively applying at each stage the criterion from his list that divided the data into the lowest number of groups possible, without attempting to distinguish innovations from retentions, much less judge the relative likelihood of independent innovation. The fact that even such a crude method was still able to produce a recognisable Zenati subgroup says something either about the robustness of this distinction or about the selection of features. What this data set actually tells us, bearing in mind that shared retentions have no implications for subgrouping and that Zenaga fails to participate in a number of innovations that otherwise seem pan-Berber or nearly pan-Berber, is something quite different:

  • There is definitely a Zenati subgroup, as has been known at least since Destaing (1915), but its boundaries are a bit fuzzy. (If this reminds you of the situation of "Hilalian" g-dialects, that's probably not a coincidence.)
    • Western Zenati:
      • Core (mainly Northern Saharan): Ait Seghrouchen, Figuig, Beni Snous, Bissa, Timimoun, Mzab, Ouargla
      • Transitional (the High Plateau and its edges): Rif, Metmata, Chaoui, Jerba
      • Peripheral:
        • Chenoua (north-central Algeria)
        • Nefusi (northwestern Libya)
    • Eastern Zenati (Libya/Egypt): El-Fogaha, Siwa
  • There is definitely a Tuareg subgroup, as has always been known: Ahaggar, Iwellemmeden, Air, Taneslemt.
  • There just might be a subgroup combining Kabyle with Senhaja, Central Morocco and Shilha: they share the innovation *-əv > -u, and the word awtul "hare". The evidence for it is very weak, though, especially since *-əv > -u is also found in some Tuareg varieties.

The rest of the common features almost all look like shared retentions.
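For comparison, the splitting procedure that Nait-Zerrad's tree seems to rest on can be sketched in a few lines of Python. This is only my guess at the method, as described above, not his stated algorithm: at each stage, pick the feature that divides the current group into the fewest groups (but more than one), then repeat within each group using the remaining features. The varieties A-D and features f1-f2 below are hypothetical toy data, not values from the article.

```python
def split(varieties, features):
    """Recursively partition varieties, at each stage using the feature
    that yields the fewest groups (but more than one)."""
    best = None
    for name, values in features.items():
        groups = {}
        for v in varieties:
            groups.setdefault(values[v], []).append(v)
        if len(groups) > 1 and (best is None or len(groups) < len(best[1])):
            best = (name, groups)
    if best is None:
        # No remaining feature distinguishes these varieties: a leaf.
        return sorted(varieties)
    name, groups = best
    remaining = {k: v for k, v in features.items() if k != name}
    return {f"{name}={value}": split(members, remaining)
            for value, members in groups.items()}


# Hypothetical toy data: two features over four made-up varieties.
toy_features = {
    "f1": {"A": "x", "B": "x", "C": "y", "D": "y"},
    "f2": {"A": "p", "B": "q", "C": "p", "D": "r"},
}

tree = split(["A", "B", "C", "D"], toy_features)
print(tree)
```

Note that nothing in this procedure asks which value of a feature is the innovation and which the retention: any feature that splits the data neatly gets used, which is precisely the weakness discussed above.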

Sunday, November 16, 2014

Out now: The development of dative agreement in Berber

After about two years in the pipeline, an article summarising the results of my British Academy research on agreement in Berber has just come out in Transactions of the Philological Society. If you have access to Wiley Online Library, you can read it online: The development of dative agreement in Berber: beyond nominal hierarchies. If you're interested but don't have access, email me to ask for a copy. Here's the abstract:

Diachronically, agreement commonly emerges from clitic doubling, which in turn derives from topic shift constructions (Givón 1976) – a grammaticalisation pathway termed the Agreement Cycle. For accusatives, at the intermediate stages of this development, doubling constitutes a form of Differential Object Marking, and passes towards agreement as the conditions for its use are relaxed to cover larger sections of the Definiteness and Animacy Scales. Berber, a subfamily of Afroasiatic spoken in North Africa, shows widespread dative doubling with substantial variation across languages in the conditioning factors, which in one case has developed into inflectional dative agreement. Examination of a corpus covering eighteen Berber varieties suggests that low Definiteness/Animacy datives are less likely to be doubled. However, since most datives are both definite and animate, these factors account for very little of the observed variation. Much more can be accounted for by an unexpected factor: the choice of verb. “Say” consistently shows much higher frequencies of doubling, usually nearly 100 per cent. This observation can be explained on the hypothesis that doubling derives from afterthoughts, not from topic dislocation.