Thursday, October 30, 2014

Some Tuareg-Songhay loans

I'm almost three-quarters of the way through Heath's Grammar of Tamashek (Tuareg of Mali). The main interest lies in its efforts to reduce the bewildering complexity of Tuareg morphology to some sort of order, an impossible task which it accomplishes more successfully than any other Tuareg grammar I've looked at so far. Aside from this, however, it's raised some interesting etymological issues.

I've wondered for years where the Korandjé verb wəy "gather (firewood)" comes from. It normally appears in the idiom a-wwəy-ts skudzi [3Sg-gather-hither wood] "she gathered in firewood". On p. 333 of Heath's grammar, I found the explanation, in the following example:

i-wwáy=ədd i-sǽɣer-æn
3MaSgS-bring.Reslt-Centrip Pl-firewood-MaPl
[He] has brought firewood here.

The Tamasheq verb in question, awəy in the imperative, is simply the normal Berber word for "take, bring" (which in Korandjé is expressed with a Songhay verb, zəw), so I would have hesitated to connect them based on a dictionary entry alone. But given this attested usage with "firewood", the semantic specialisation poses no problems. What does surprise me is that it was borrowed as a bare stem, rather than with a fossilised 3rd person prefix y/i - contrast yəf (Tashelhiyt y-arf "roast", not attested in Tamasheq), ikna "make" (Tamasheq i-kna). Usually, only stems that start with a syllabic onset are borrowed into Korandjé without the y/i.

Another probable loan into Korandjé that I noticed going through the grammar is Korandjé ləwləw "shine, gleam" - cp. Tamasheq m̀ələwləw "shine".

However, a number of words have gone the other way - from Songhay into Tuareg. Heath comments on many of these in his dictionary (eg kə̀rikəw "practice sorcery"), but not all. One that struck me is the verb ḍùkr-æt "become angry at", obviously related to Gao Songhay dukur "be angry"; I don't recall seeing this verb elsewhere in Berber (not even in Alojaly's dictionary of Tamajeq), whereas it's widespread in Songhay.

Obviously cognate are Tamasheq é-tæqq "male ostrich" and widespread Songhay forms such as Gao taatagey, Fulan Kirya taataɣey "ostrich" (the shift of g to ɣ next to non-high back vowels is regular in several Songhay varieties, and in Tamasheq qq is the geminate equivalent of ɣ). The word is generic in Songhay but specific in Tuareg - the opposite of what we saw with "bring" - which suggests to me that it was borrowed into the latter, as does the fact that I don't find the term in Alojaly's Tamajeq dictionary. However, since ostriches are extinct in most Berber-speaking areas, it's difficult to prove the direction of borrowing.

Thursday, October 23, 2014

Berber: classification, Tasahlit, roots vs. stems

Today seems to be a good day for comparative Berber linguistics - the day's haul is worth sharing:

Maarten Kossmann has uploaded his preliminary classification of Berber varieties based on shared innovations: Berber subclassification (preliminary version). He divides Berber into seven blocks:

  1. Zenaga block (Zenaga of Mauritania, Tetserrét in Niger)
  2. Tuareg block
  3. Western Moroccan block (SW Morocco, Central Morocco, i.e. Tashelhiyt and most of Tamazight)
    possibly including NW Moroccan Berber (Ghomara, Senhadja de Sraïr)
  4. Zenatic block (Eastern Morocco, Western Algeria, Saharan oases, Tunisia, Zuara) extending towards the east with Sokna, Elfoqaha, Siwa
  5. Kabyle (N Algeria), possibly linked to the western Moroccan block
  6. Ghadames (Libya), probably to be linked to Djebel Nefusa (Libya)
  7. Awdjilah (Libya)
By and large, this appears very plausible, although it should be noted that Tunisian Berber and Zuwara are already somewhat peripheral to Zenati, not sharing western Zenati's innovative distribution of initial vowel dropping, and El-Fogaha is even more so than Siwa or Sokna. (As he notes, the much greater homogeneity and clearer boundaries of Zenati in the west imply that this group arrived in Algeria and Morocco from the east.) But, in principle, it is still necessary to identify specific innovations characteristic of each of these groups. It is also clear that the Zenaga block is by far the first split on the tree, and the list ought ideally to reflect that. But the moderately high degree of mutual intelligibility poses serious obstacles to applying the family tree model to Berber, as he discusses.

The most interesting Kabyle varieties for historical reconstruction are the little-known ones of the extreme east, "Tasahlit". As it happens, Abdelaziz Berkai has just uploaded his recent thesis, a dictionary and sketch grammar of the Tasahlit of Aokas: Essai d’élaboration d’un dictionnaire Tasaḥlit (parler d’Aokas)-français. The quality of his work appears excellent, and this will no doubt be a very useful resource. The choice of dialect, however, is not entirely ideal. It is clear from Basset's dialect atlas, and from the all too rare comments in Rabdi's grammar on neighbouring varieties, that the vocabulary of Aokas is still quite close to that of Bejaia; the really divergent varieties seem to be those of the Babor Mountains and Oued el Bared, approaching Jijel, and those are the ones most likely to give an insight into the dialect of the now largely Arabised Kutama.

I haven't yet had time to properly look at Samir Ben Si Said's thesis, De la nature de la variation diatopique en kabyle: étude de la formation des singulier et pluriel nominaux, but it tackles the synchronically as well as diachronically thorny problem of Berber non-concatenative morphology, and argues for an approach based more on roots than on stems, contrasting with another important study I've been working through lately, Heath's Grammar of Tamashek (Tuareg of Mali).

Tuesday, October 21, 2014

Subject-verb order in Tumzabt

Going through Brahim and Bekir Abdessalam's brief grammar of Tumzabt Berber (الوجيز في قواعد الكتابة والنحو الأمازيغية "المزابية": الجزء الأول) recently, I was struck by their discussion of the problem of subject-verb order. Berber in general allows both verb-subject and subject-verb order, with the case ("state") of the subject depending on which order is used. Determining which order is used under which circumstances, however, poses some difficulties; the same language may be described as VSO or SVO, depending on who you ask, and the determining factors certainly differ from one variety to another (cf. eg Mettouchi fc for Kabyle). Their take on the problem combines information structure with pragmatics and verbal mood. The latter two factors can very likely be reduced to information structure too, but that would require testing; in any case, the observation that VS order is required for serialization is interesting. Here's what they had to say, translated into English (pp. 129-130):

We observe that in the first set of examples, the subject precedes the verb; this is the usual form in an Amazigh clause consisting of a verb and a subject.

In the second set of examples, the subject follows the verb. This happens in the following cases:

  1. The subject may follow the verb when it is specific and known to the speaker and listener because there is a connection between speaking of it and a previous expression involving speaking of the same subject. For instance:

    twelleh! afunas-nni yetthaḍa - Watch out, that bull rampages.

    After the two parties have parted, they meet again the next day, and one says to the other:
    yak yhaḍ ufunas ay-tessečned asennaṭṭ! - Indeed that bull you showed me yesterday really did rampage!

    Here, the subject - the bull - is specific for both parties to the conversation in the second usage, since it had been spoken of earlier.

  2. For the sake of irony, which can only be deduced from the context surrounding this expression and from the circumstances of discourse, eg if we say:

    tiɣawsiwin-ess tqimant-edd ɣel wezğen, drus mi yefra igget, ay-tinid : yebṛem werğaz ! - His affairs stay half-done, rarely does he resolve even one, and you tell me: he's a careful man!

  3. The subject may follow the verb obligatorily in the serial aorist, eg:

    yuli tazdayt yuḍa-y-as wemjer - He climbed the date palm and the sickle fell from him [and dropped the sickle].

    It may also occur directly following the verb in the future tense aorist, eg:

    ad tatef teğrest ad yireḍ isemmuṛa n tḍuft or tağrest ad tatef ad yireḍ isemmuṛa n tḍuft - When winter comes, woolen clothes are worn.

They follow this up with an observation that seems quite astonishing from a comparative Berber perspective (p. 131):

A subject following the verb is put in the construct state if definite, this being the normal case for the postverbal subject, and is put in the free state if indefinite without any need for the [indefinite] article iggen / igget ["one"].

Unfortunately, they provide no examples to illustrate this claim.

Saturday, September 20, 2014

Neologisms in n- in Siwi Berber

(experimentally posting in French - opinions?)

Rather belatedly, this summer I started organising my lexicographic notes on the Siwi Berber of Egypt properly. Having reached 2300 words after transcribing three notebooks, I'm taking a break to offer an observation that might one day be useful for language planning, if that is conceivable for so small a minority variety ... To form deverbal nouns, Siwi Berber often uses an analytic strategy rather different from the morphological strategies preferred elsewhere in Berber: the genitive particle, n, + the verbal noun. I have nine clear examples, not to mention other, more opaque cases. The noun may be the object of the verb:

  • ačču eat : n-ačču food
  • aknaf roast : n-aknaf roasted offal / aubergine
  • alessa get dressed : n-alessa clothes
  • tiswi drink : n-tiswi drink (beverage)
or the instrument used to perform the verb's action:
  • ančlaħ slide : n-ančlaħ dune board
  • asebded stop : n-asebded stop button
  • aṣṣey hold : n-aṣṣey handle
  • azerzi chase away (flies) : n-azerzi fly-whisk
or even, more rarely, the place:
  • aɛenɛen sit : n-aɛenɛen the crosswise plank of a cart on which one sits
As "dune board" and "stop button" show, this formation is still productive. Most novelties naturally take the Arabic names used by their sellers, but if Siwis wanted to adopt purist forms, it would be easy to call the TV, for example, n-aẓeṛṛa - whereas, in fact, the best-known neologism in Siwa, among those who take an interest in such things, is the curious form elmeẓṛa, apparently derived from tiliẓṛi through oral transmission.

Sunday, September 14, 2014

On finding the sources of shared items, OR: The irrelevance of anteriority

Similarities between different languages are data. It's easy to come up with any of several wildly different measures of such similarities, typically by applying edit distances to wordlists (as in the ASJP*) or texts, but the result should not be mistaken for an analysis - it's just a measurement, a compression of the data. It doesn't tell you anything about the causes of these similarities on its own. Historical linguistics is not the measurement of similarities, but the effort to find the hypothesis about past events that best explains them. Your H0, of course, is always "coincidence". Once you've rejected that, you're left with the trickier task of disentangling contact from common ancestry - trickier because, quite often, they partially overlap.
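To make concrete what such a measurement looks like, here is a minimal Python sketch of one possible similarity measure: plain Levenshtein distance, normalised by word length and averaged over a wordlist. It is not the ASJP's actual procedure, which is more elaborate, and the three toy wordlists are invented placeholders rather than data from any real language.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalised_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer word's length (0.0 = identical)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def wordlist_distance(list_a: dict, list_b: dict) -> float:
    """Mean normalised distance over the meanings found in both wordlists."""
    shared = sorted(set(list_a) & set(list_b))
    if not shared:
        raise ValueError("the two wordlists share no meanings")
    total = sum(normalised_distance(list_a[m], list_b[m]) for m in shared)
    return total / len(shared)


# Invented toy forms (assumption: purely illustrative, not real languages).
TOY_A = {"two": "tana", "water": "mawi", "stone": "kiba"}
TOY_B = {"two": "tina", "water": "mawi", "stone": "kipa"}
TOY_C = {"two": "sol", "water": "ruku", "stone": "kiba"}

if __name__ == "__main__":
    print("A vs B:", wordlist_distance(TOY_A, TOY_B))
    print("A vs C:", wordlist_distance(TOY_A, TOY_C))
```

The output is a pair of numbers and nothing more: the code has no way of knowing whether "kiba" is shared by A and C through inheritance, borrowing, or chance.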

To understand linguistic causation in the past, an essential starting point is to look at it in the present. Suppose that you are a native speaker of English:

  1. If you say "football" or "garage" to your child while speaking English, it's because you grew up speaking English, and you know that this is what other English speakers say. The fact that French speakers happen to call it "football" too, if you're even aware of it, has nothing to do with your choice of words.
  2. If you say "football" or "garage" to your child while speaking French, it's because you later studied French, and you know that this is what French speakers say. The fact that it's also what English speakers say no doubt made it easier to memorise, but if French speakers had named them something else, you would be doing the same.

We thus see that, for shared words, inheritance from either of two radically different languages can yield precisely the same outcome. The fact that English and French share these words in the first place is obviously due to contact (in each direction). The fact that your child is growing up with them, however, is because you're faithfully passing on the existing norms of one or the other language, not because you're combining them. In historical linguistic jargon, the use of the word "football" is at this point being inherited, not borrowed. Thus, if an English-monolingual Cajun says "stupid", it's not because he's managed to hold on to his ancestors' French word "stupide", it's because that happens to be the English word for it.

So, if we have a word in language A, and find the same word in two potential source languages B and C, we can't determine which it came from by looking at which language was spoken in the area earlier, or which was spoken by the speakers' ancestors. We can only determine which it came from by determining which language (if either) was transmitted as a whole, and the evidence for that can only come from forms that aren't shared between B and C. I leave the application of this to Levantine ʕāmmiyya as an exercise for the reader.


* It's beating a dead horse at this point, but: this Automated Similarity Judgement Program? It, too, finds that Levantine is way closer to Standard Arabic than to Aramaic, just like any historical linguist could have told you from the start.

Saturday, September 13, 2014

Zombie hypotheses and the Zeitgeist

Everything I've been saying for the past 3 posts is basic textbook stuff, reflecting a stable consensus among Semitic historical linguists over, oh, the past two centuries or so. Why, then, is this zombie hypothesis that Levantine Arabic comes from Aramaic still popular in parts of the Levant? That's no great mystery: it comes from a more general movement to emphasise Levantine (and especially Lebanese) culture's continuity with the pre-Islamic Levant, and downplay the influence of Arabs. (Similar efforts have been made in North Africa, notably by Abdou Elimam.) As far as I can tell, the unstated reasoning goes something like this:
  1. Levantines are descended from the Aramaic-speaking natives of the land, not from Arab immigrants.
  2. Levantines' language contains a lot that sounds like Aramaic.
  3. Therefore, Levantine is a continuation of Aramaic, not of Arabic.

Step 3, of course, does not follow from Steps 1 and 2. Step 1 is irrelevant to the whole question; the language of your ancestors is very often not the ancestor of your language (ask any Irishman, or any Egyptian). Step 2 is necessary but insufficient for getting to Step 3, since the statement is just as true of Classical Arabic - or of Akkadian, or Ethiopic - as it is of Levantine; we've already seen that deciding linguistic ancestry requires a more sophisticated toolkit.

Nevertheless, this impulse to emphasise continuity and downplay movement deserves more attention. In the Arabic-speaking world, the conspicuous problems with the existing political and economic order, and the humiliating contrasts between the ideals of pan-Arabism and the reality of closed borders and unchallenged occupations, provide an obvious local motivation to downplay Arab identity, and language is so central to pan-Arab identity that it could hardly be left unchallenged. But the impulse is not unique to the region; in some respects, it faithfully reflects wider intellectual trends of the late 20th/early 21st century.

During this era, immediately following some of the largest migrations and invasions in human history, many archeologists and historians have come to feel more and more uncomfortable with the very idea of either. Changes in material culture previously seen as the result of migration were re-explained as diffusion or independent innovation, and reports of barbarian invasions were reinterpreted or dismissed. In some ways, this has been a useful corrective to a previous era's overemphasis on migration; it has arguably made linguists more conscious of the familiar fact that language shift does not necessarily imply invasion, much less population replacement. In others, its influence has been rather less helpful. Linguists reached the late 20th century with a well-tested toolkit for studying the origins of basic vocabulary and morphology, its predictions spectacularly confirmed by such discoveries as laryngeals in Hittite and labiovelars in Mycenaean Greek. Applying this to most Old World languages, and many American or Australian ones, yields a story of discontinuity (be it through language shift or population replacement) that would be familiar to any 19th-century philologist, but that grates somewhat on postmodern ears. Of course, the same toolkit often allows us to detect substrata - elements left over from the population's previous language after they shifted to another one - but that's not enough to satisfy everybody.

A few linguists have responded by trying to change the rules of the game, insisting that the origins of a language should be determined not by vocabulary and morphology, as is normally done, but by purely structural features. This is an important component of Wexler's generally rejected claims that Yiddish is non-Germanic (and that Modern Hebrew is non-Semitic), and is the very essence of Lefebvre's somewhat more popular claims that Haitian Creole is just relexified Fongbe (and almost anything else with "relexification" in the title.) This approach runs into severe problems almost instantly - establishing the history of syntactic or semantic patterns is far more difficult than establishing the history of vocabulary or morphology, simply because the former are far less arbitrary and are chosen from a far smaller set of possibilities. To make matters worse, we also find major discontinuities in such patterns in cases where both the population and the vocabulary were relatively stable, such as the transition from Old English to Modern English. Johanna Nichols' efforts point towards the possibility of getting around this by identifying highly time-stable typological features, but the results, at their best, are not nearly fine-grained enough to support narratives of continuity in any specific location. "Continuitarians" in the Arab world apparently haven't gotten around to adopting this approach yet, except occasionally in Morocco, where academic linguistics is unusually advanced for the region; they surely will, however, when they realise that it could be extended to cases like Egypt, rather than being limited to the Fertile Crescent.

For much of the world, especially Europe, a complete lack of ancient written documentation makes another response available: simply argue that the language currently spoken there must have been spoken far earlier than previously assumed, and hence got there not through invasion but through some more peaceful process. This yields the various Paleolithic Continuity Hypotheses. The main problem with this for linguists is that it forces us to postulate a much lower rate of linguistic change for the past than is observed for languages with a long written history, or even for unwritten languages that happen to have been recorded at long intervals; as a result, these hypotheses have remained fairly unpopular. For the Middle East, however, the point is moot: writing has a longer history there than anywhere else on the planet, and that history reveals regular episodes of language extinction, language shift, invasion, migration, exile, and everything else that we're supposed to be de-emphasising.

So if you really want to emphasise your language's continuity with your ancestors', these are two more promising ways to do it. But I would suggest that there's no reason to bother. If your current identity isn't working out for you, and you don't think you can reform it, why not work on creating a genuinely new one, rather than perpetuating the obsession with heritage by digging around in history for an even older one? It worked out pretty well for America, after all.

Thursday, September 11, 2014

Why "Levantine" is Arabic, not Aramaic: Part 3

We've seen that historical linguists decide which languages share a more recent common ancestor on the basis of shared innovations (or their absence). But if you're paying attention, you may have noticed a potential problem here: innovations can be shared for at least three reasons:
  • Common ancestry - the reason why, for example, Proto-Indo-European intervocalic *s has changed to r both in Spanish and in French.
  • Contact - for example, the change of r (the rolled r you get in Spanish) to R (the uvular r you get in French) started in French, but spread to other European languages such as German, probably due to the prestige of French among the upper classes (actually there's some debate about the direction of spread - see eg this paper by Kostakis - but either way it spread through contact)
  • Chance - for example, θ (th) has changed to t both in Jamaican English and in Levantine, but not because they share any common history or close ties.

So, when it comes to shared innovations, what can we do to distinguish the "confounding factors" of chance and contact from common ancestry? There are two obvious general approaches. The most securely reliable is to establish relative chronology: if change A was applied to the outputs of change B, then obviously change B is the older. Unfortunately, many pairs of changes are commutative - the relative order makes no difference to the output. That often forces us to resort to the more probabilistic criterion of number of changes: if language A shares a lot of common innovations with language B to the exclusion of C, and only a couple with language C to the exclusion of B, then it's more parsimonious to group A with B and find some other explanation for those shared with C. For better results, we can weight the innovations according to the chances of them occurring independently: for example, a change of ð > d is rather common worldwide, whereas a change of ɬʼ > ʕ is rather unusual.
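The "number of changes" criterion can be sketched in a few lines of Python. The sketch below is an illustration under loud assumptions: the innovation labels are a simplified inventory drawn from changes mentioned in this post, and the numeric weights are invented for the example (low for changes that recur independently all over the world, higher for unusual ones), not measured probabilities.

```python
def exclusive_shared(innovations: dict, x: str, y: str, z: str) -> set:
    """Innovations shared by x and y to the exclusion of z."""
    return (innovations[x] & innovations[y]) - innovations[z]


def weighted_support(innovations: dict, weights: dict,
                     x: str, y: str, z: str,
                     default_weight: float = 1.0) -> float:
    """Sum of the weights of the innovations grouping x with y against z."""
    return sum(weights.get(change, default_weight)
               for change in exclusive_shared(innovations, x, y, z))


# Simplified, illustrative inventory (assumption: not an exhaustive list).
INNOVATIONS = {
    "Arabic": {"qāl 'say'", "maʕ 'with'", "broken plurals", "i- in 'two'"},
    "Levantine": {"qāl 'say'", "maʕ 'with'", "broken plurals", "i- in 'two'",
                  "θ>t", "ð>d"},
    "Aramaic": {"θ>t", "ð>d", "postvocalic t>θ, d>ð"},
}

# Invented weights: common sound changes count for little,
# everything else gets the default weight of 1.0.
WEIGHTS = {"θ>t": 0.2, "ð>d": 0.2}

if __name__ == "__main__":
    with_arabic = weighted_support(INNOVATIONS, WEIGHTS,
                                   "Levantine", "Arabic", "Aramaic")
    with_aramaic = weighted_support(INNOVATIONS, WEIGHTS,
                                    "Levantine", "Aramaic", "Arabic")
    print("Levantine + Arabic, excluding Aramaic:", with_arabic)
    print("Levantine + Aramaic, excluding Arabic:", with_aramaic)
```

Whatever weights are chosen, the grouping with less support does not simply disappear: the innovations behind it still call for some other explanation, such as contact or chance.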

Levantine Arabic provides a useful case study: as NNT correctly pointed out, it shares a couple of innovative sound changes with Aramaic, in particular θ (th) > t, ð (dh) > d. (The hamza-y correspondence is a different issue - there's massive variation within Classical Arabic on where and whether hamza is realised, as can be seen from the different Qur'an reading traditions, and the consonantal orthography of Classical Arabic obviously reflects a dialect in which, like the majority of present-day dialects but unlike Modern Standard, hamza was hardly ever pronounced). Yet we have seen that Levantine Arabic does not share most of Aramaic's defining innovations, and does share important innovations of Arabic, such as the reflexes of proto-Semitic *g, *θʼ, *ɬʼ, and (depending on reconstruction) , the replacement of "say" (originally 'amar-) with qāl-, the metathesis of ʕam- "with" to maʕ-, or almost every detail of the extremely intricate broken plural system. How can this be explained?

If the explanation is common ancestry, then we should find the changes θ > t, ð > d only in Levantine words that are not Arabic innovations. In fact, however, we find them in words such as itnēn "two", in which the i- is an Arabic innovation - cp. Arabic iθnayni (acc/gen), Aramaic trēn, proto-Semitic *θn-ay-n(a). This hypothesis would also fail to account for the rest of the observations; if Levantine shares a more recent common ancestry with Aramaic than with Arabic, and is spoken exclusively in an area once dominated by Aramaic, then why on earth did it pick up so many innovations from Arabic while remaining immune to practically all the innovations Aramaic went through except these two? Both the criteria given above therefore point away from common ancestry as an explanation.

This suggests that we should consider contact. At first sight, you might think the answer is simple: Aramaic speakers couldn't pronounce interdentals, so they left them out of their Aramaic-accented Arabic. But that hypothesis would be absurd. By the late pre-Islamic era, all known varieties of Aramaic did in fact have the sounds θ and ð, due to a later development of t > θ, d > ð after vowels (except when doubled). We find these sounds alive and well in the only surviving Levantine Aramaic dialect, that of Maaloula: eg xoθla "wall", ḳrīθa "village", eḥða "one (f.)". Why, then, would Aramaic speakers change these sounds to t, d in Arabic?

How about the opposite contact situation: Arabic speakers living on the fringes of the Aramaic-speaking world copied the shift θ > t, ð > d from their neighbours, while those living further inland stuck with the traditional pronunciation? That is more plausible, but still a bit problematic. The development of t > θ, d > ð had already happened by 250 BC in Aramaic, so the shift would have to have been borrowed before that; but Arabic-speaking groups which used Aramaic as their high language, such as the Nabataeans of Petra, are only well-attested later than that.
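The chronological argument depends on the two Aramaic changes not being commutative, which a toy Python sketch can make visible. Everything here is an assumption made for illustration: the input forms are invented, the vowel inventory is a rough stand-in, and each change is reduced to a single exceptionless rule.

```python
import re

VOWELS = "aeiouāēīōū"  # rough stand-in inventory (assumption)


def merge_interdentals(word: str) -> str:
    """θ > t, ð > d in every position."""
    return word.translate(str.maketrans({"θ": "t", "ð": "d"}))


def spirantise_after_vowel(word: str) -> str:
    """t > θ, d > ð after a vowel, except when doubled."""
    pattern = rf"(?<=[{VOWELS}])([td])(?!\1)"
    return re.sub(pattern,
                  lambda m: {"t": "θ", "d": "ð"}[m.group(1)],
                  word)


def commutes(rule_a, rule_b, word: str) -> bool:
    """True if applying the two rules in either order gives the same output."""
    return rule_b(rule_a(word)) == rule_a(rule_b(word))


if __name__ == "__main__":
    toy = "θata"  # invented form, not a real word
    # Merger first, then spirantisation: a new postvocalic θ survives.
    print(spirantise_after_vowel(merge_interdentals(toy)))  # taθa
    # Spirantisation first, then merger: no θ is left anywhere.
    print(merge_interdentals(spirantise_after_vowel(toy)))  # tata
    print(commutes(merge_interdentals, spirantise_after_vowel, toy))  # False
    # Doubled consonants are untouched by spirantisation.
    print(spirantise_after_vowel("atta"))  # atta
```

Only the first ordering leaves any θ in the output, so a variety that still has θ after vowels must have undergone the merger before spirantisation; once spirantisation had applied, the merger was evidently no longer an active rule there to be copied.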

A third, more subtle contact explanation seems preferable. Aramaic speakers would certainly have taken advantage of the many similarities between Aramaic and Arabic to reduce the burden on their memories. But, whereas θ and ð are extremely common in Aramaic, in Arabic they are quite rare: in the Qur'ān, t is ten times commoner than θ, and while ð is about as common as d overall, practically all of its occurrences are limited to demonstratives. A good rule of thumb for the Aramaic learner of Arabic to apply would therefore be "replace Aramaic θ, ð with t, d except in demonstratives"; 9 times out of 10, the result would be correct Arabic, and the 10th time it would still be comprehensible. In such an environment, where Aramaic-speaking learners of Arabic outnumbered native speakers, it's not hard to imagine the distinction disappearing. If so, the loss of interdentals in Levantine would indeed reflect Aramaic influence - as a result of Aramaic speakers' effort to avoid Aramaic forms!