Jabal al-Lughat

Thursday, October 30, 2014

Some Tuareg-Songhay loans

I'm almost three-quarters of the way through Heath's Grammar of Tamashek (Tuareg of Mali). The main interest lies in its efforts to reduce the bewildering complexity of Tuareg morphology to some sort of order, an impossible task which it accomplishes more successfully than any other Tuareg grammar I've looked at so far. Aside from this, however, it's raised some interesting etymological issues.

I've wondered for years where the Korandjé verb wəy "gather (firewood)" comes from. It normally appears in the idiom a-wwəy-ts skudzi [3Sg-gather-hither wood] "she gathered in firewood". On p. 333 of Heath's grammar, I found the explanation, in the following example:

i-wwáy=ədd i-sǽɣer-æn
3MaSgS-bring.Reslt-Centrip Pl-firewood-MaPl
[He] has brought firewood here.

The Tamasheq verb in question, awəy in the imperative, is simply the normal Berber word for "take, bring" (which in Korandjé is expressed with a Songhay verb, zəw), so I would have hesitated to connect them based on a dictionary entry alone. But given this attested usage with "firewood", the semantic specialisation poses no problems. What does surprise me is that it was borrowed as a bare stem, rather than with a fossilised 3rd person prefix y/i - contrast yəf (Tashelhiyt y-arf "roast", not attested in Tamasheq), ikna "make" (Tamasheq i-kna). Usually, only stems that start with a syllabic onset are borrowed into Korandjé without the y/i.

Another probable loan into Korandjé that I noticed going through the grammar is Korandjé ləwləw "shine, gleam" - cp. Tamasheq m̀ələwləw "shine".

However, a number of words have gone the other way - from Songhay into Tuareg. Heath comments on many of these in his dictionary (eg kə̀rikəw "practice sorcery"), but not all. One that struck me is the verb ḍùkr-æt "become angry at", obviously related to Gao Songhay dukur "be angry"; I don't recall seeing this verb elsewhere in Berber (not even in Alojaly's dictionary of Tamajeq), whereas it's widespread in Songhay.

Obviously cognate are Tamasheq é-tæqq "male ostrich" and widespread Songhay forms such as Gao taatagey, Fulan Kirya taataɣey "ostrich" (the shift of g to ɣ next to non-high back vowels is regular in several Songhay varieties, and in Tamasheq qq is the geminate equivalent of ɣ). The word is generic in Songhay but specific in Tuareg - the opposite of what we saw with "bring" - which suggests to me that it was borrowed into the latter, as does the fact that I don't find the term in Alojaly's Tamajeq dictionary. However, since ostriches are extinct in most Berber-speaking areas, it's difficult to prove the direction of borrowing.

Thursday, October 23, 2014

Berber: classification, Tasahlit, roots vs. stems

Today seems to be a good day for comparative Berber linguistics - the day's haul is worth sharing:

Maarten Kossmann has uploaded his preliminary classification of Berber varieties based on shared innovations: Berber subclassification (preliminary version). He divides Berber into seven blocks:

Zenaga block (Zenaga of Mauritania, Tetserrét in Niger)
Tuareg block
Western Moroccan block (SW Morocco, Central Morocco, i.e. Tashelhiyt and most of Tamazight)
possibly including NW Moroccan Berber (Ghomara, Senhadja de Sraïr)
Zenatic block (Eastern Morocco, Western Algeria, Saharan oases, Tunisia, Zuara) extending towards the east with Sokna, Elfoqaha, Siwa
Kabyle (N Algeria), possibly linked to the western Moroccan block
Ghadames (Libya), probably to be linked to Djebel Nefusa (Libya)
Awdjilah (Libya)

By and large, this appears very plausible, although it should be noted that Tunisian Berber and Zuwara are already somewhat peripheral to Zenati, not sharing western Zenati's innovative distribution of initial vowel dropping, and El-Fogaha is even more so than Siwa or Sokna. (As he notes, the much greater homogeneity and clearer boundaries of Zenati in the west imply that this group arrived in Algeria and Morocco from the east.) But, in principle, it is still necessary to identify specific innovations characteristic of each of these groups. It is also clear that the Zenaga block is the earliest split on the tree, and the list ought ideally to reflect that. But the moderately high degree of mutual intelligibility poses serious obstacles to applying the family tree model to Berber, as he discusses.

The most interesting Kabyle varieties for historical reconstruction are the little-known ones of the extreme east, "Tasahlit". As it happens, Abdelaziz Berkai has just uploaded his recent thesis, a dictionary and sketch grammar of the Tasahlit of Aokas: Essai d’élaboration d’un dictionnaire Tasaḥlit (parler d’Aokas)-français. The quality of his work appears excellent, and this will no doubt be a very useful resource. The choice of dialect, however, is not entirely ideal. It is clear from Basset's dialect atlas, and from the all too rare comments in Rabdi's grammar on neighbouring varieties, that the vocabulary of Aokas is still quite close to that of Bejaia; the really divergent varieties seem to be those of the Babor Mountains and Oued el Bared, approaching Jijel, and those are the ones most likely to give an insight into the dialect of the now largely Arabised Kutama.

I haven't yet had time to properly look at Samir Ben Si Said's thesis, De la nature de la variation diatopique en kabyle: étude de la formation des singulier et pluriel nominaux, but it tackles the synchronically as well as diachronically thorny problem of Berber non-concatenative morphology, and argues for an approach based more on roots than on stems, contrasting with another important study I've been working through lately, Heath's Grammar of Tamashek (Tuareg of Mali).

Tuesday, October 21, 2014

Subject-verb order in Tumzabt

Going through Brahim and Bekir Abdessalam's brief grammar of Tumzabt Berber (الوجيز في قواعد الكتابة والنحو الأمازيغية "المزابية": الجزء الأول) recently, I was struck by their discussion of the problem of subject-verb order. Berber in general allows both verb-subject and subject-verb order, with the case ("state") of the subject depending on which order is used. Determining which order is used under which circumstances, however, poses some difficulties; the same language may be described as VSO or SVO, depending on who you ask, and the determining factors certainly differ from one variety to another (cf. eg Mettouchi fc for Kabyle). Their take on the problem combines information structure with pragmatics and verbal mood. The latter two factors can very likely be reduced to information structure too, but that would require testing; in any case, the observation that VS order is required for serialization is interesting. Here's what they had to say, translated into English (pp. 129-130):

We observe that in the first set of examples, the subject precedes the verb; this is the usual form in an Amazigh clause consisting of a verb and a subject.
In the second set of examples, the subject follows the verb. This happens in the following cases:

The subject may follow the verb when it is specific and known to the speaker and listener because there is a connection between speaking of it and a previous expression involving speaking of the same subject. For instance:
twelleh! afunas-nni yetthaḍa - Watch out, that bull rampages.
After the two parties have parted, they meet again the next day, and one says to the other:
yak yhaḍ ufunas ay-tessečned asennaṭṭ! - Indeed that bull you showed me yesterday really did rampage!
Here, the subject - the bull - is specific for both parties to the conversation in the second usage, since it had been spoken of earlier.
For the sake of irony, which can only be deduced from the context surrounding this expression and from the circumstances of discourse, eg if we say:
tiɣawsiwin-ess tqimant-edd ɣel wezğen, drus mi yefra igget, ay-tinid : yebṛem werğaz ! - His affairs stay half-done, rarely does he resolve even one, and you tell me: he's a careful man!
The subject may follow the verb obligatorily in the serial aorist, eg:
yuli tazdayt yuḍa-y-as wemjer - He climbed the date palm and the sickle fell from him [and dropped the sickle].
It may also occur directly following the verb in the future tense aorist, eg:
ad tatef teğrest ad yireḍ isemmuṛa n tḍuft or tağrest ad tatef ad yireḍ isemmuṛa n tḍuft - When winter comes, woolen clothes are worn.

They follow this up with an observation that seems quite astonishing from a comparative Berber perspective (p. 131):

A subject following the verb is put in the construct state if definite, this being the normal case for the postverbal subject, and is put in the free state if indefinite without any need for the [indefinite] article iggen / igget ["one"].

Unfortunately, they provide no examples to illustrate this claim.

Saturday, September 20, 2014

Neologisms in n- in Siwi Berber

(originally posted in French, as an experiment - opinions?)

Rather belatedly, this summer I started organising my lexicographic notes on the Siwi Berber of Egypt more carefully. Having reached 2300 words after transcribing three notebooks, I'm taking a break to share an observation that might one day be useful for language planning, if such a thing is conceivable for so small a minority variety... To form deverbal nouns, Siwi Berber often uses an analytic strategy rather different from the morphological strategies preferred elsewhere in Berber: the genitive particle, n, + the verbal noun. I have nine clear examples, not to mention other, more opaque cases. The noun may be the object of the verb:

ačču eat : n-ačču food
aknaf roast : n-aknaf roasted offal / aubergine
alessa get dressed : n-alessa clothes
tiswi drink : n-tiswi beverage

or the instrument used to carry out the action of the verb:

ančlaħ slide : n-ančlaħ dune board
asebded stop : n-asebded stop button
aṣṣey hold : n-aṣṣey handle
azerzi shoo away (flies) : n-azerzi fly whisk

or even, more rarely, the place:

aɛenɛen sit down : n-aɛenɛen the crosswise plank of a cart on which one sits

As "dune board" and "stop button" show, this pattern is still productive. Most novelties naturally take the Arabic names used by their sellers, but if Siwis wanted to adopt purist forms, it would be easy to call the television, for example, n-aẓeṛṛa - whereas, in fact, the best-known neologism in Siwa, among those who take an interest in such things, is the curious form elmeẓṛa, apparently derived from tiliẓṛi through oral transmission.

Sunday, September 14, 2014

On finding the sources of shared items, OR: The irrelevance of anteriority

Similarities between different languages are data. It's easy to come up with any of several wildly different measures of such similarities, typically by applying edit distances to wordlists (as in the ASJP*) or texts, but the result should not be mistaken for an analysis - it's just a measurement, a compression of the data. It doesn't tell you anything about the causes of these similarities on its own. Historical linguistics is not the measurement of similarities, but the effort to find the hypothesis about past events that best explains them. Your H₀, of course, is always "coincidence". Once you've rejected that, you're left with the trickier task of disentangling contact from common ancestry - trickier because, quite often, they partially overlap.
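To make concrete what such a measurement amounts to, here is a minimal Python sketch of a length-normalised edit distance averaged over a wordlist. It is only an illustration of the general idea, not the ASJP's actual procedure; the function names are made up for this example, and the three forms for "two" are the ones quoted in the Part 3 post on this blog.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalised_distance(a: str, b: str) -> float:
    """Edit distance divided by the length of the longer form (0 = identical)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_distance(wordlist_a: dict, wordlist_b: dict) -> float:
    """Average normalised distance over the meanings both wordlists cover."""
    shared = wordlist_a.keys() & wordlist_b.keys()
    if not shared:
        raise ValueError("the wordlists have no meanings in common")
    total = sum(normalised_distance(wordlist_a[m], wordlist_b[m]) for m in shared)
    return total / len(shared)


# One-item toy wordlists; a real comparison would use many meanings.
levantine = {"two": "itnēn"}
arabic = {"two": "iθnayni"}
aramaic = {"two": "trēn"}

print(mean_distance(levantine, arabic))
print(mean_distance(levantine, aramaic))
```

Whatever numbers come out, they are a compression of the data and nothing more: the function has no way of knowing whether a low distance reflects inheritance, borrowing, or chance.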

To understand linguistic causation in the past, an essential starting point is to look at it in the present. Suppose that you are a native speaker of English:

If you say "football" or "garage" to your child while speaking English, it's because you grew up speaking English, and you know that this is what other English speakers say. The fact that French speakers happen to call it "football" too, if you're even aware of it, has nothing to do with your choice of words.
If you say "football" or "garage" to your child while speaking French, it's because you later studied French, and you know that this is what French speakers say. The fact that it's also what English speakers say no doubt made it easier to memorise, but if French speakers had named them something else, you would be doing the same.

We thus see that, for shared words, inheritance from either of two radically different languages can yield precisely the same outcome. The fact that English and French share these words in the first place is obviously due to contact (in each direction). The fact that your child is growing up with them, however, is because you're faithfully passing on the existing norms of one or the other language, not because you're combining them. In historical linguistic jargon, the use of the word "football" is at this point being inherited, not borrowed. Thus, if an English-monolingual Cajun says "stupid", it's not because he's managed to hold on to his ancestors' French word "stupide", it's because that happens to be the English word for it.

So, if we have a word in language A, and find the same word in two potential source languages B and C, we can't determine which it came from by looking at which language was spoken in the area earlier, or which was spoken by the speakers' ancestors. We can only determine which it came from by determining which language (if either) was transmitted as a whole, and the evidence for that can only come from forms that aren't shared between B and C. I leave the application of this to Levantine ʕāmmiyya as an exercise for the reader.

* It's beating a dead horse at this point, but: this Automated Similarity Judgement Program? It, too, finds that Levantine is way closer to Standard Arabic than to Aramaic, just like any historical linguist could have told you from the start.

Saturday, September 13, 2014

Zombie hypotheses and the Zeitgeist

Everything I've been saying for the past 3 posts is basic textbook stuff, reflecting a stable consensus among Semitic historical linguists over, oh, the past two centuries or so. Why, then, is this zombie hypothesis that Levantine Arabic comes from Aramaic still popular in parts of the Levant? That's no great mystery: it comes from a more general movement to emphasise Levantine (and especially Lebanese) culture's continuity with the pre-Islamic Levant, and downplay the influence of Arabs. (Similar efforts have been made in North Africa, notably by Abdou Elimam.) As far as I can tell, the unstated reasoning goes something like this:

Levantines are descended from the Aramaic-speaking natives of the land, not from Arab immigrants.
Levantines' language contains a lot that sounds like Aramaic.
Therefore, Levantine is a continuation of Aramaic, not of Arabic.

Step 3, of course, does not follow from Steps 1 and 2. Step 1 is irrelevant to the whole question; the language of your ancestors is very often not the ancestor of your language (ask any Irishman, or any Egyptian). Step 2 is necessary but insufficient for getting to Step 3, since the statement is just as true of Classical Arabic - or of Akkadian, or Ethiopic - as it is of Levantine; we've already seen that deciding linguistic ancestry requires a more sophisticated toolkit.

Nevertheless, this impulse to emphasise continuity and downplay movement deserves more attention. In the Arabic-speaking world, the conspicuous problems with the existing political and economic order, and the humiliating contrasts between the ideals of pan-Arabism and the reality of closed borders and unchallenged occupations, provide an obvious local motivation to downplay Arab identity, and language is so central to pan-Arab identity that it could hardly be left unchallenged. But the impulse is not unique to the region; in some respects, it faithfully reflects wider intellectual trends of the late 20th/early 21st century.

During this era, immediately following some of the largest migrations and invasions in human history, many archeologists and historians have come to feel more and more uncomfortable with the very idea of either. Changes in material culture previously seen as the result of migration were re-explained as diffusion or independent innovation, and reports of barbarian invasions were reinterpreted or dismissed. In some ways, this has been a useful corrective to a previous era's overemphasis on migration; it has arguably made linguists more conscious of the familiar fact that language shift does not necessarily imply invasion, much less population replacement. In others, its influence has been rather less helpful. Linguists reached the late 20th century with a well-tested toolkit for studying the origins of basic vocabulary and morphology, its predictions spectacularly confirmed by such discoveries as laryngeals in Hittite and labiovelars in Mycenaean Greek. Applying this to most Old World languages, and many American or Australian ones, yields a story of discontinuity (be it through language shift or population replacement) that would be familiar to any 19th-century philologist, but that grates somewhat on postmodern ears. Of course, the same toolkit often allows us to detect substrata - elements left over from the population's previous language after they shifted to another one - but that's not enough to satisfy everybody.

A few linguists have responded by trying to change the rules of the game, insisting that the origins of a language should be determined not by vocabulary and morphology, as is normally done, but by purely structural features. This is an important component of Wexler's generally rejected claims that Yiddish is non-Germanic (and that Modern Hebrew is non-Semitic), and is the very essence of Lefebvre's somewhat more popular claims that Haitian Creole is just relexified Fongbe (and almost anything else with "relexification" in the title). This approach runs into severe problems almost instantly - establishing the history of syntactic or semantic patterns is far more difficult than establishing the history of vocabulary or morphology, simply because the former are far less arbitrary and are chosen from a far smaller set of possibilities. To make matters worse, we also find major discontinuities in such patterns in cases where both the population and the vocabulary were relatively stable, such as the transition from Old English to Modern English. Johanna Nichols' efforts point towards the possibility of getting around this by identifying highly time-stable typological features, but the results, at their best, are not nearly fine-grained enough to support narratives of continuity in any specific location. "Continuitarians" in the Arab world apparently haven't gotten around to adopting this approach yet, except occasionally in Morocco, where academic linguistics is unusually advanced for the region; they surely will, however, when they realise that it could be extended to cases like Egypt, rather than being limited to the Fertile Crescent.

For much of the world, especially Europe, a complete lack of ancient written documentation makes another response available: simply argue that the language currently spoken there must have been spoken far earlier than previously assumed, and hence got there not through invasion but through some more peaceful process. This yields the various Paleolithic Continuity Hypotheses. The main problem with this for linguists is that it forces us to postulate a much lower rate of linguistic change for the past than is observed for languages with a long written history, or even for unwritten languages that happen to have been recorded at long intervals; as a result, these hypotheses have remained fairly unpopular. For the Middle East, however, the point is moot: writing has a longer history there than anywhere else on the planet, and that history reveals regular episodes of language extinction, language shift, invasion, migration, exile, and everything else that we're supposed to be de-emphasising.

So if you really want to emphasise your languages' continuity with your ancestors', these are two more promising ways to do it. But I would suggest that there's no reason to bother. If your current identity isn't working out for you, and you don't think you can reform it, why not work on creating a genuinely new one, rather than perpetuating the obsession with heritage by digging around in history for an even older one? It worked out pretty well for America, after all.

Thursday, September 11, 2014

Why "Levantine" is Arabic, not Aramaic: Part 3

We've seen that historical linguists decide which languages share a more recent common ancestor on the basis of shared innovations (or their absence). But if you're paying attention, you may have noticed a potential problem here: innovations can be shared for at least three reasons:

Common ancestry - the reason why, for example, Proto-Indo-European intervocalic *s has changed to r both in Spanish and in French.
Contact - for example, the change of r (the rolled r you get in Spanish) to R (the uvular r you get in French) started in French, but spread to other European languages such as German, probably due to the prestige of French among the upper classes (actually there's some debate about the direction of spread - see eg this paper by Kostakis - but either way it spread through contact)
Chance - for example, θ (th) has changed to t both in Jamaican English and in Levantine, but not because they share any common history or close ties.

So, when it comes to shared innovations, what can we do to distinguish the "confounding factors" of chance and contact from common ancestry? There are two obvious general approaches. The most securely reliable is to establish relative chronology: if change A was applied to the outputs of change B, then obviously change B is the older. Unfortunately, many pairs of changes are commutative - the relative order makes no difference to the output. That often forces us to resort to the more probabilistic criterion of number of changes: if language A shares a lot of common innovations with language B to the exclusion of C, and only a couple with language C to the exclusion of B, then it's more parsimonious to group A with B and find some other explanation for those shared with C. For better results, we can weight the innovations according to the chances of them occurring independently: for example, a change of ð > d is rather common worldwide, whereas a change of ɬʼ > ʕ is rather unusual.
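Both criteria can be sketched in a few lines of Python. This is a toy illustration under loud assumptions: sound changes are modelled as context-free string replacements (real ones are usually conditioned), the test words are invented, and the weights are hypothetical numbers chosen only to rank the common change below the unusual one.

```python
def apply_changes(word, changes):
    """Apply context-free sound changes (old, new) to a word, in the given order."""
    for old, new in changes:
        word = word.replace(old, new)
    return word


def order_matters(word, change_a, change_b):
    """True if applying the two changes in opposite orders gives different outputs."""
    a_then_b = apply_changes(word, [change_a, change_b])
    b_then_a = apply_changes(word, [change_b, change_a])
    return a_then_b != b_then_a


def weighted_score(shared_innovations, weights):
    """Sum the weights of the innovations that two languages share exclusively."""
    return sum(weights[name] for name in shared_innovations)


# Commutative pair: the two changes never touch each other's input or output.
print(order_matters("θaða", ("θ", "t"), ("ð", "d")))  # False

# Non-commutative pair: the first change feeds the second, so order is recoverable.
print(order_matters("θa", ("θ", "t"), ("t", "s")))    # True

# Hypothetical weights: higher means less likely to arise independently.
weights = {"ð > d": 0.2, "ɬʼ > ʕ": 1.0}
a_with_b = ["ɬʼ > ʕ"]  # invented scenario: A shares the unusual change with B
a_with_c = ["ð > d"]   # and only the common one with C
print(weighted_score(a_with_b, weights) > weighted_score(a_with_c, weights))  # True
```

Only the second pair of changes leaves any trace of its relative order in the output, which is why so many cases have to be settled by the weighted count instead.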

Levantine Arabic provides a useful case study: as NNT correctly pointed out, it shares a couple of innovative sound changes with Aramaic, in particular θ (th) > t, ð (dh) > d. (The hamza-y correspondence is a different issue - there's massive variation within Classical Arabic on where and whether hamza is realised, as can be seen from the different Qur'an reading traditions, and the consonantal orthography of Classical Arabic obviously reflects a dialect in which, like the majority of present-day dialects but unlike Modern Standard, hamza was hardly ever pronounced). Yet we have seen that Levantine Arabic does not share most of Aramaic's defining innovations, and does share important innovations of Arabic, such as the reflexes of proto-Semitic *g, *θʼ, *ɬʼ, and (depending on reconstruction) *š, the replacement of "say" (originally 'amar-) with qāl-, the metathesis of ʕam- "with" to maʕ-, or almost every detail of the extremely intricate broken plural system. How can this be explained?

If the explanation is common ancestry, then we should find the changes θ > t, ð > d only in Levantine words that are not Arabic innovations. In fact, however, we find them in words such as itnēn "two", in which the i- is an Arabic innovation - cp. Arabic iθnayni (acc/gen), Aramaic trēn, proto-Semitic *θn-ay-n(a). This hypothesis would also fail to account for the rest of the observations; if Levantine shares a more recent common ancestry with Aramaic than with Arabic, and is spoken exclusively in an area once dominated by Aramaic, then why on earth did it pick up so many innovations from Arabic while remaining immune to practically all the innovations Aramaic went through except these two? Both the criteria given above therefore point away from common ancestry as an explanation.

This suggests that we should consider contact. At first sight, you might think the answer is simple: Aramaic speakers couldn't pronounce interdentals, so they left them out of their Aramaic-accented Arabic. But that hypothesis would be absurd. By the late pre-Islamic era, all known varieties of Aramaic did in fact have the sounds θ and ð, due to a later development of t > θ, d > ð after vowels (except when doubled). We find these sounds alive and well in the only surviving Levantine Aramaic dialect, that of Maaloula: eg xoθla "wall", ḳrīθa "village", eḥða "one (f.)". Why, then, would Aramaic speakers change these sounds to t, d in Arabic?

How about the opposite contact situation: Arabic speakers living on the fringes of the Aramaic-speaking world copied the shift θ > t, ð > d from their neighbours, while those living further inland stuck with the traditional pronunciation? That is more plausible, but still a bit problematic. The development of t > θ, d > ð had already happened by 250 BC in Aramaic, so the shift would have to have been borrowed before that; but Arabic-speaking groups which used Aramaic as their high language, such as the Nabataeans of Petra, are only well-attested later than that.

A third, more subtle contact explanation seems preferable. Aramaic speakers would certainly have taken advantage of the many similarities between Aramaic and Arabic to reduce the burden on their memories. But, whereas θ and ð are extremely common in Aramaic, in Arabic they are quite rare: in the Qur'ān, t is ten times commoner than θ, and while ð is about as common as d overall, practically all of its occurrences are limited to demonstratives. A good rule of thumb for the Aramaic learner of Arabic to apply would therefore be "replace Aramaic θ, ð with t, d except in demonstratives"; 9 times out of 10, the result would be correct Arabic, and the 10th time it would still be comprehensible. In such an environment, where Aramaic-speaking learners of Arabic outnumbered native speakers, it's not hard to imagine the distinction disappearing. If so, the loss of interdentals in Levantine would indeed reflect Aramaic influence - as a result of Aramaic speakers' effort to avoid Aramaic forms!

Tuesday, September 09, 2014

Why "Levantine" is Arabic, not Aramaic: Part 2

Last time, I promised to look at the "ratio of content ⊂ Arabic & ⊄ Aramaic". To do that, we need two things: data on the frequency of different words and morphemes, and etymologies for each word and morpheme. If this were English, I could offer you a 450-million-word online digital corpus for the former, and the OED for the latter. For Levantine Arabic the pickings are a bit scantier. There are indeed several digital corpora of Levantine Arabic, but none of them are publicly available, and none have published any frequency data that I can find offhand; and for etymologies, you have to consult, by hand, as many dictionaries (of several languages) as it takes.

So for present purposes, I will use a much smaller substitute, which can hardly be accused of any partiality to Standard Arabic: namely, a selection from ~~Said Akl's~~ Roomyo w Julyeet (CORRECTION: introduced by Said Akl), which I was lucky enough to run into at an Oxfam a few years ago. I picked a well-known section of the play whose language seemed relatively simple, with little or no visible Standard Arabic influence - the lines starting from "Romeo, Romeo, wherefore art thou Romeo?" (p. 62), including Romeo's reply and Juliet's reply to him (finishing on the second line of p. 63) - and counted morpheme frequencies (retaining his eccentric orthography). The 26 morphemes that occurred more than once account for about two-thirds of the selection, so looking at their etymologies gives us the maximum of information for the minimum of effort - and here they are. Only those that are unambiguously Arabic or unambiguously Aramaic are relevant to our purpose; the rest may be dismissed as "confounding factors":

w(e) و "and" (11 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
b(e)- / m- بـ٬ مـ [marker of the indicative imperfect] (10 occurrences): Innovative. This form is found as such neither in Classical Arabic nor in Aramaic, and its etymology poses some difficulties; if you know of any convincing work on this, let me know in the comments.
-aq ـك "you m. sg. oblique" (9 occurrences): Arabic. Both Aramaic and Arabic have cognates of this, but in Aramaic the consonant has changed to kh, whereas Levantine - like Arabic - has kept the original k.
¢esm اسم "name" (6 occurrences): Arabic. Both Aramaic and Arabic have cognates of this, but in Aramaic the consonant is sh, whereas in Levantine - as in Arabic - it's s. (There is controversy over which value is original.)
la "no, not, neither... nor" (5 occurrences): "Confounding". The form is shared identically by Arabic and Aramaic; the usage is actually closer to Arabic (where it negates verbs only in the imperfect and the negative imperative) than to Aramaic (where it negates verbs in all tenses), but we'll score it as shared.
-u / -h / -vowel length (depending on context) ـه "him, his" (5 occurrences): Arabic. Aramaic -eh could explain the h form and the vowel length form, but the -u can be satisfactorily derived only from Arabic -hu.
quun كون "be" (4 occurrences): "Confounding". In reality this is much more likely to be Arabic, since the normal Aramaic root for "be" is hwy, but kwn is attested in this sense in Aramaic too.
men من "from" (4 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
ḍall ضل "remain" (4 occurrences): Arabic. There is no Aramaic source for emphatic D.
(e)l الـ
- "the" (4 occurrences): Arabic. (Aramaic originally used suffixed -aa, which later lost its definite sense.)
- [relative marker] (3 occurrences): Innovative, but based on extending the functions of the Arabic definite article, and probably on shortening a form similar to Classical Arabic alladhii, which it resembles rather more than the Aramaic relative marker dh-.
¢ent انت "you (m. sg.)" (3 occurrences): Arabic. In Aramaic, the n disappeared, assimilated to the following t.
ma ما "not" (3 occurrences): Arabic. In Aramaic, maa is never used as a negator.
law لو "if" (3 occurrences): Arabic. (Aramaic does not generally use this, but where traces of a cognate are found, as in some frozen combinations, it takes the form luu, not law.)
cu شو "what?" (3 occurrences): Innovative, from Arabic. Found as such neither in Arabic nor in Aramaic, but its generally accepted etymology is Arabic, from a contraction of أي شي هو "what thing is it?".
sammi "name (v.)" (3 occurrences): Arabic, for the same reason as esm above.
e- / Ø- أـ [first person singular subject marker] (3 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
t- تـ [second person masculine singular subject marker] (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
-ni ـني "me" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
-a ـا "her" (2 occurrences): "Confounding". At first sight the loss of the h makes it appear closer to Aramaic than to Classical Arabic - but the h was also lost in -u "him", which cannot be explained as Aramaic.
-t ـت [feminine singular construct state marker] (2 occurrences): "Confounding". The form is compatible with Arabic or Aramaic origins (Aramaic had th, but we would expect that to be turned back into t, since Levantine has no interdentals). The function straightforwardly existed in Aramaic; in Classical Arabic, it did not, but the pre-pausal pronunciation of -at- as -ah provides an obvious source for it to develop from, and indeed it exists in practically all modern dialects (including those of the Arabian peninsula). If you're feeling really generous, though, you might ignore the latter fact and award this one to Aramaic.
¢ana أنا "I" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
hu هو "he" (2 occurrences): "Confounding". At first sight the Aramaic form huu is closer than Classical Arabic huwa, but loss of final vowels is regular in Levantine Arabic, so you would expect huwa to become hu anyway.
ya يا "oh" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
¢aw أو "or" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
xebb حب "love" (2 occurrences): "Confounding". Shared by Arabic and Aramaic in effectively identical form.
jez¢ جزء "part" (2 occurrences): Arabic. I haven't noticed an Aramaic cognate, but even if there is one, the palatalisation of the j (from original g) marks it as Arabic.

So, out of these 26 items - which together account for 107 out of the 161 morphemes in this selection - 10 are unambiguously Arabic (accounting for 46 morphemes), and none are unambiguously Aramaic. 15 items (accounting for 91 of the morphemes) could equally well be Arabic or Aramaic, and as such are irrelevant to determining which one predominates within Lebanese Arabic. (If you decide to be really generous to Aramaic, you might shift -a, hu, and -t to the Aramaic column, accounting for a grand total of 6 morphemes versus Arabic's 46.) The remaining single item, the imperfect prefix b-, is a later innovation whose history is unclear; even if someone found an Aramaic etymology for it and added it to all the unlikely cases mentioned, the ratio of "content ⊂ Arabic & ⊄ Aramaic" to "content ⊂ Aramaic & ⊄ Arabic" for this list would still be about 3:1. On a less generous and more plausible calculation, it's infinite (46:0). Either way, by this criterion, too, Levantine is Arabic, not Aramaic.
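The arithmetic behind the two ratios can be laid out explicitly. The following is only a sketch: the function names are mine, the token counts 46, 0 and 6 are the ones quoted above, and the number of b- tokens is not restated in this paragraph, so it is left as a hypothetical parameter rather than guessed.

```python
import math

ARABIC_ONLY = 46     # tokens of the 10 unambiguously Arabic items (from the text)
GENEROUS_SHIFT = 6   # tokens of -a, hu and -t if they are awarded to Aramaic


def exclusive_ratio(arabic_only: int, aramaic_only: int) -> float:
    """Ratio of Arabic-only tokens to Aramaic-only tokens."""
    if aramaic_only == 0:
        return math.inf  # nothing on the Aramaic side, so the ratio is unbounded
    return arabic_only / aramaic_only


def generous_ratio(b_prefix_tokens: int) -> float:
    """Most Aramaic-friendly tally: the shifted items plus every b- token.

    b_prefix_tokens is a hypothetical input; substitute the real count.
    """
    return exclusive_ratio(ARABIC_ONLY, GENEROUS_SHIFT + b_prefix_tokens)


print(exclusive_ratio(ARABIC_ONLY, 0))               # the plausible tally
print(exclusive_ratio(ARABIC_ONLY, GENEROUS_SHIFT))  # generous, before adding b-
```

Whatever count is plugged in for b-, the Aramaic-only side can only grow by that many tokens, which is why the ratio stays well above 1:1 even on the most generous reading.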

If you pick a long enough text, of course, you will eventually find an Aramaic loan or two. There are quite a few Aramaic loans in Levantine Arabic, depending on the dialect, and they must really stand out to a Levantine speaker studying Aramaic. But even in the most heavily Aramaic-influenced dialects, they occur far less frequently than unambiguously Arabic forms. While historical linguists' usual definition of language origin does not rely on any explicit frequency criteria, in all the cases I've seen, the most frequent source of vocabulary by token count for a sufficiently large text turns out to be what historical linguists would consider as that language's parent. In Levantine Arabic the effect is even stronger, since not only is the basic vocabulary of Arabic origin, so is most of the learned vocabulary.

Now, after all those calculations, I'm sure you're eager to read the lovers' dialogue, so here it is:

جلييت: يا روميو! يا روميو! ليش انت روميو؟
نكور بيك٬ ورفود اسمك٬
أو٬ إذا ما بدك٬ حلوف إنك بتحبني
وأنا ببطل كون من عايلت كابيولت.

روميو: بضل عم بسمعا
أو بحكي معا؟

جلييت: إسمك بس عدوي.
انت، بتضل انت زاتك٬ ولو ما كنت منتغيو.
و شو المنتغيو؟ لا هو إيد ولا إجر
ولا دراع ولا وج ولا أي جزء
من جسم الإنسان؟ آه، كون اسم تاني!
و شو فيه الاسم؟ ال منسميه ورد
لو شو ما سمينا بتضل ريحتو حلوة،
و هيك روميو، لو ما تسمى روميو
كان بيضل محتفظ بهالكمال المحبوب
ال بيملكو بدون عيب. يا روميو، تجرد من اسمك،
ومقابل اسمك ال هو مش جزء منك،
خدني أنا كلي!

And in the original orthography:

Sunday, September 07, 2014

Why "Levantine" is Arabic, not Aramaic: Part 1

Following in a long tradition of people imagining that knowing a few languages or a bit of mathematics implies they already know linguistics better than any self-styled specialist, the quasi-celebrity author Nassim Nicholas Taleb recently decided to claim that "Levantine is modernized Aramaic". (Let's not comment further on the attached table, whose attempt at Standard Arabic is painfully bad, and which omits the whole Aramaic column except for the title. Also, let's not confuse it with the separate question of how distant Levantine is from Standard Arabic.) The ensuing Twitter "debate", while of little value in itself, nicely illustrates a number of common misconceptions, some of them worth responding to in a less cramped medium. I'll start with the most explicitly political one, since it's bound to colour responses to any purely academic argument:

You just call it Arabic because Arabic is used for "high" functions in the region; If we were diglossic Levantine/Aramaic instead of Levantine/Arabic you would say the same.

Less than 90 km from NNT's hometown is a village where they do in fact still speak Aramaic, while of course still being diglossic in Arabic: Maaloula, in Syria. Despite heavy Arabic influence, this village's language has never once been mistaken for Arabic; its own people call it siryêni, and European Semitists recognised it as Aramaic as soon as even simple wordlists became available. If you happen to be Levantine, try listening to some of it (eg here) - how much of that do you understand? The same is true of other relict Semitic languages within the Arabic-speaking world, such as Mehri or Jibbali or Soqotri or Neo-Mandaic. I have more than one book in which Soqotri or Jibbali speakers attempt to prove that their languages are really Arabic, for much the same reasons that NNT wants his language not to be Arabic - but, notwithstanding the speakers' desires, Semitists had no trouble proving that these languages were not descended from Arabic. Conversely, the "high" languages of Malta have always been English and Italian, yet, despite Maltese nationalists' best efforts to show that Maltese was really Punic, European Semitists had no difficulty in identifying it as descended from Arabic. So, no, Semitic historical linguists do not base their decisions on what kind of diglossia happens to be around, nor were all those 19th-century German Orientalists secret agents sent back in time by the Baath Party. To the contrary, almost all Semitists I've known would be far more excited to discover that some undocumented variety was a new Semitic language than to find out that it was "just" another dialect of Arabic.

"Proving" Levantine comes from Arabic rather than Aramaic like "proving" Spanish comes from Italian, not latin.

How do linguists know that Spanish is descended directly from Latin, not from Italian? Simple: we look for cases in which Italian has made a change - innovated - and Spanish hasn't. Such cases are easy to find: for example, in Italian original *fl has become fi (thus fiore "flower") and original long *e in open syllables has become i (thus di "of"), whereas in Spanish original *fl remains fl, and *e remains e (thus flor, de). If Spanish were descended from Italian, then these changes would all have had to happen and then reverse themselves in Spain, which is very unlikely. We can know which form was original not just because in this case we have copious ancient data, but also by using comparative-historical reconstruction. The full toolkit would take too long to explain here (my favourite textbook is Lyle Campbell's Historical Linguistics), but basically, we:

establish sets of sounds corresponding systematically to one another;
figure out whether these correspondence sets systematically occur only in certain environments, and, if so, see whether there are any other correspondence sets occurring only in non-overlapping environments that they can be unified with.

This procedure allows us to prove that the ancestor language must have distinguished at least as many phonemes as members of the resulting set of correspondence sets, and - combined with a large body of knowledge about likely and unlikely sound changes - gives us a good chance of determining what the actual sounds of those phonemes were. This technique was, of course, developed mainly for reconstructing unattested languages, but way back in the 1950s, Robert A. Hall Jr. decided to test it by applying it to Romance. The result was, as you might hope, Vulgar Latin.
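For readers who think better in code, the first step of that procedure (collecting correspondence sets and noting where each one occurs) can be sketched as follows. This is a toy under stated assumptions: the two words are the Italian/Spanish examples above, the segment-by-segment alignments are my own, and the names `ALIGNED` and `correspondence_sets` are made up for the illustration. Real work needs hundreds of cognates, not two.

```python
from collections import Counter, defaultdict

# Hand-aligned cognates as (Italian segment, Spanish segment); "-" marks a gap.
# These alignments are an illustrative assumption, not a worked analysis.
ALIGNED = {
    "flower": [("f", "f"), ("i", "l"), ("o", "o"), ("r", "r"), ("e", "-")],
    "of": [("d", "d"), ("i", "e")],
}


def correspondence_sets(aligned):
    """Count each (Italian, Spanish) pairing and record its environments."""
    counts = Counter()
    environments = defaultdict(set)
    for pairs in aligned.values():
        for position, pair in enumerate(pairs):
            counts[pair] += 1
            # Environment = the preceding Italian segment, "#" if word-initial.
            previous = pairs[position - 1][0] if position > 0 else "#"
            environments[pair].add(previous)
    return counts, environments


counts, environments = correspondence_sets(ALIGNED)
for pair, n in counts.items():
    print(pair, n, sorted(environments[pair]))
```

The second step would then compare the environment sets of different correspondence sets to see which ones never overlap; with only two words, of course, nothing can be concluded from that.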

Now, let us apply this to Levantine, Arabic, and Aramaic. Reconstructing the common ancestor of Aramaic and Arabic (see eg here or even just here) shows that Aramaic features a number of innovations not shared with Arabic; conveniently, many of these are mergers. In particular, in Aramaic *`, *ʁ (gh), and *ɬʼ (lh) all merge to ` (ayin); *x (kh) and *ħ merge to ħ (heth); initial *w and *y merge to y. In Arabic, all of these distinctions are maintained. Now, the nice thing about mergers is that they can't be reversed; once two formerly distinct word classes feature the same phoneme, there's no way for the ordinary speaker to recover the distinction. A monolingual Aramaic speaker has no way of telling that the ` in 'ar`ā "earth" (< *'arɬʼ- + -ā) used to be pronounced differently from the ` in ṭar`ā "door", or in `aynā "eye". In Levantine, all of these distinctions are normally maintained, just as they are in Arabic; أرض has none of the consonants of عين. QED. (In fact, historical linguists have also succeeded in identifying some Aramaic loans into Levantine Arabic by finding the small minority of words in which these distinctions were lost.)

Indeed, you don't even need to look at phonology to figure this out; the grammar provides plenty of clues. In Aramaic, for example, almost every noun ends in -ā, except in a few specific contexts. This is an innovation specific to Aramaic, accomplished by gluing a former demonstrative on to the end of the noun, and preserved in every modern spoken Aramaic variety. In Arabic, it never happened - nor, obviously, in Levantine.
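The merger argument boils down to a simple check: a language cannot keep apart what its supposed ancestor had already merged. Here is a minimal sketch of that check, assuming the reflexes listed above. The Aramaic column follows the mergers in the text (with ʕ standing for the ayin written ` above); the symbols in the Arabic and Levantine columns are my own shorthand for sounds that stay distinct; the function names are invented for the illustration.

```python
# Reflexes of the proto-phonemes discussed above (proto-phoneme -> reflex).
ARAMAIC = {"*ʕ": "ʕ", "*ʁ": "ʕ", "*ɬʼ": "ʕ", "*x": "ħ", "*ħ": "ħ"}
ARABIC = {"*ʕ": "ʕ", "*ʁ": "ɣ", "*ɬʼ": "ḍ", "*x": "x", "*ħ": "ħ"}
LEVANTINE = {"*ʕ": "ʕ", "*ʁ": "ɣ", "*ɬʼ": "ḍ", "*x": "x", "*ħ": "ħ"}


def merged_pairs(reflexes):
    """Return every pair of proto-phonemes that ends up with the same reflex."""
    protos = sorted(reflexes)
    return {
        (a, b)
        for i, a in enumerate(protos)
        for b in protos[i + 1:]
        if reflexes[a] == reflexes[b]
    }


def could_descend_from(daughter, ancestor):
    """Necessary condition only: every merger in the ancestor must persist."""
    return merged_pairs(ancestor) <= merged_pairs(daughter)


print(could_descend_from(LEVANTINE, ARAMAIC))  # Aramaic's mergers are not shared
print(could_descend_from(LEVANTINE, ARABIC))   # Arabic has no mergers to violate
```

Passing this test does not prove descent by itself, and a handful of loanwords can show the donor language's mergers; it only rules out candidates whose mergers the daughter language fails to share.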

Of course, NNT shows no signs of even being aware of the relevance of regular sound correspondences, mergers, or any of the other elements in a historical linguist's toolkit, much less of accepting them as definitive criteria for language classification. At one point, however, he vaguely expresses the criterion he thinks should be definitive:

To prove that Levantine derives from Arabic WITHOUT Aramaic route you need to finds ratio of content ⊂ Arabic & ⊄ Aramaic. Not done.

Now that we've seen a little bit of how linguists determine what comes from Arabic and what comes from Aramaic, we're ready to look at the results of this criterion in the next post. You should be able to guess the answer already...

Thursday, August 21, 2014

Survey for Arabic/French bilingual Algerians

Although I usually write in English, I'm sure this blog has a few Algerian readers who are bilingual in Arabic and French. If you belong to that category, and have a few minutes to help an Algerian doctoral student at the University of Florida with her linguistic research, you can take this survey. I attach the letter I received.

Hello,

We are conducting a study on Arabic/French bilingual Algerians. If you would like to take part, please follow the link below. Rest assured that your answers will be anonymous. https://ufl.qualtrics.com/SE/?SID=SV_6QjKS7nNFYDYlBr

If the link does not open, copy and paste it into your browser.

For those of you who would like to complete the survey in two sittings, do not submit your answers; simply leave the survey by closing your browser. Once you are connected again, you can continue where you left off (the sentences may appear in a different order).

The survey has several lists of sentences. After completing and submitting the survey, you can (if you wish) follow the link again and complete another list. Please remember that you must not complete the same list twice. If you are given the same list, please leave the survey.

You may forward this message to other bilingual Algerians, but please do not post it on Facebook.

Please try to complete the survey as soon as possible, before it closes.

Thank you for sharing your time and your ideas.

Souad,

University of Florida

Monday, August 18, 2014

A South Arabian loan into Libyan Berber?

From Morocco to Oman, there is a long tradition of imagining that the Berbers of North Africa and the Mehris of South Arabia speak the same language. This is by no means confined to pan-Arab nationalists - Siwis have told me more than once that some friend of a friend had met non-Arabic-speaking Yemenis and understood their language, and I'm told many Mehris have the same belief. I've previously discussed some possible reasons for this belief, as well as the more obviously propagandistic claim that Arabic descends from Berber; both are false.

Nevertheless, it is true that significant numbers of Yemenis participated in the Arab migrations to North Africa during the Islamic era, and it's not inherently implausible that some should have brought their languages with them. In fact, I just came across what looks very much like a South Arabian loan into the northwestern Libyan Berber variety of Zuwara (At Willul).

In Zuwara, the usual word for "father" is baba, as in many other Berber varieties, but in a few collocations such as əg tíddart n ḥíbi-s "in her father's house", a different term ḥibi is substituted (Mitchell 2009:303, 341). This word is unlikely to be proto-Berber, since proto-Berber did not have a phoneme /ḥ/ and since it is quite unusual within Berber. And as far as I know, it is not used anywhere in Arabic (although Libyan dialects are not that well documented). One could try to link it to ḥabīb-ī "my beloved", but that would be phonetically irregular and semantically unlikely, since this term is normally used in the context of romantic love, or by parents addressing a child.

However, the normal word for "father" in Mehri is ḥīb - ḥayb-ī "my father", ḥīb-as "his father" (Watson 2012:149). In fact, Mehri adds this ḥ prefix to a number of kinship terms: ḥāmē "mother", ḥabrē "son", ḥabrīt "daughter" (ibid), as well as a number of other common nouns. Its function is to mark definiteness (ibid:64). But no such definite article has ever existed in Arabic or in Berber, so the only possible explanations for the similarity of Zuwara ḥibi are pure coincidence or borrowing from Mehri into Berber (perhaps via an Arabic dialect?). It will be interesting to see if other cases turn up.

And as long as I'm talking about Libyan Berber, I really ought to mention Marijn van Putten's new book A Grammar of Awjila Berber (see his announcement at Oriental Berber). This careful analysis of all the unfortunately limited data available on the very unusual Berber variety of Awjila, in the far east of Libya, is an important resource for Berber historical linguistics. I hope that things settle down in Libya soon enough to make a fuller description possible, but for the moment, this work appears unlikely to be superseded.

Saturday, August 09, 2014

Some minority languages of the Mosul Plain

For most of the past decade, while first the rest of Iraq and then Syria (150,000 dead, 2.5 million refugees) have burned, Northern Iraq has seemed like a relative oasis of calm. That has changed rather suddenly: with ISIS' religious persecution, and now American airstrikes, Northern Iraq and its minorities are suddenly prominent in the headlines. The headlines throw into sharp relief the region's status as perhaps the most religiously diverse place in the Middle East - but what they may not show is that this region is also a small-scale "residual zone" preserving rather more linguistic diversity than is typical for such a small area in the modern Fertile Crescent (not just Arabic and Kurdish!)

The most endangered language of the region is certainly Northeastern Neo-Aramaic (NENA), or Sûreth (ܣܘܪܝܬ). Once, Aramaic was the lingua franca of the Middle East, spoken in various dialects from Gaza to Basra, and written as far afield as China and India. By the early 20th century, it was restricted to a few hundred far-flung mountain villages; the largest dialect group, NENA, was centered on the Christian (Assyrian and Chaldean) villages of the Mosul Plain, such as Tel Kef (Telkepe) and Qaraqosh, and across the border in Iran and Turkey; a detailed map is available at Cambridge's NENA Database. Today, those who have stayed behind in ever harder conditions are substantially outnumbered by their diaspora in cities such as Detroit or Sydney, whose children increasingly just speak English - and, as of the past couple of days, media accounts suggest that fleeing refugees have left the Mosul Plain villages practically empty. Their exodus is rather reminiscent of what happened about a century ago: during the Armenian/Assyrian Genocide, the NENA-speaking Assyrians of Hakkari fled from Turkey never to return, taking refuge in Iraq and finally in Syria. It remains to be seen whether this exile will be as lasting as the previous one. If you're wondering how the language sounds, the NENA Database site has a number of recordings, some transcribed, such as The Story of the Cobbler; others can be heard at Semitisches Tonarchiv.

While Kurds prefer to consider Kurdish as one language, the two main Kurdish varieties of northern Iraq - Sorani and Kurmanji - are strikingly different from one another, and are usually considered as separate languages by academics. The smaller Gurani language (see DOBES), spoken in northwestern Iraq and also commonly labelled Kurdish, doesn't even belong to the same branch of Iranian as Sorani and Kurmanji. Many of its speakers belong to loosely Shia-affiliated minority religions, such as the Ahl-i Haqq and the Shabak, considered by ISIS as beyond the pale.

The other minority group unfortunate enough to have been pitched into the headlines, Yezidis, do not have a language of their own; they speak Kurmanji Kurdish. However, the Yezidis are associated with a unique writing system. In the early 20th century, manuscripts summarising Yezidi beliefs written in a unique alphabet (such as the Meshefa Resh "Black Scripture") came into the possession of Western researchers, and the alphabet in question duly found its way into compendia such as Diringer (1968). Later research, though, suggests that both these manuscripts and the alphabet they were written in were created for Western consumption, likely by a non-Yezidi bookseller, rather than representing a Yezidi tradition (Kreyenbroek and Rashow 2005, EI).

The region's Turkmen, many of whom have also apparently been persecuted by ISIS for their Shiism, speak a Turkic variety close to Turkish and Azeri. From what little information I've seen, it seems unlikely to qualify as a separate language, but does not seem to have attracted much research.

The Arabic dialects of northern Iraq - the so-called qeltu dialects, for their unique pronunciation of the word "I said" - are also quite interesting in their own right; the spoken Arabic dialect of Abbasid Baghdad seems likely to have belonged to this group. However, that is another story for another day...

Monday, July 14, 2014

Northern Songhay comparative wordlists

Linguistically, the northern and southern shores of the Sahara have remained surprisingly distinct, and most Saharan groups are easily identifiable as outposts of one or the other. Occasionally, however, a greater degree of language mixture is found. Nowhere is trans-Saharan language mixture more prominent than in Northern Songhay, a group of languages spoken in Niger, Mali, and Algeria combining a Songhay base with an enormous Berber superstratum, including Korandjé, a southwestern Algerian language I've been working on for a few years now.

Following an inquiry I recently received, I've been comparing Korandjé data to the Northern Songhay comparative wordlist in Rueck and Christiansen (1999). In the spirit of open data, you can view the wordlist (with a few remaining gaps to be filled) here: Korandjé 380-word list for Northern Songhay lexical comparison. Draft version, 14 July 2014. The results should be treated as provisional, since the Tasawaq part of this wordlist in particular appears a bit unreliable and since a few gaps remain in the Korandjé and even Tadaksahak lists, but are nevertheless interesting.

Counting cognates makes it very clear that Korandjé is the outlier, as might be expected based on geography:

	Korandjé	Tadaksahak	Tagdal	Tabarog	Tasawaq
Korandjé	–	139	140	141	152
Tadaksahak	139	–	242	238	214
Tagdal	140	242	–	304	237
Tabarog	141	238	304	–	229
Tasawaq	152	214	237	229	–

The other three Northern Songhay varieties (treating Tagdal+Tabarog as one variety) form a linkage, which, following Wolff and Alidou's suggestion, we might label Azawagh Songhay - from west to east: Tadaksahak, Tagdal+Tabarog, then Tasawaq. On this wordlist Korandjé is clearly closest to Tasawaq, but that's only because Korandjé and Tasawaq have both kept more Songhay vocabulary, a fact irrelevant for subgrouping. The only innovation in vocabulary that Korandjé and Tasawaq share to the exclusion of the rest is the borrowing of numerals from 5 up from Arabic, and if you look at the sound correspondences it's clear that Tasawaq and Korandjé each borrowed their current numerals separately from different dialects of Arabic. Tadaksahak, Tagdal, and Tabarog all show almost the same number of items shared with Korandjé due to common borrowing from Berber, and most of that is due to shared borrowings of widespread Berber words that could easily have happened independently. The use of a Berber form originally meaning "weaver" for "spider" in Korandjé and Tadaksahak alone is striking, but very likely coincidental.
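The counts in the table above can be summarised mechanically. The sketch below simply averages each variety's shared-item counts and picks out the lowest average and the highest single pair; the names `SHARED`, `shared` and `mean_shared` are my own, and an average of this kind is a crude descriptive summary, not a subgrouping method (shared retentions, as noted, prove nothing about subgrouping).

```python
from itertools import combinations

LANGS = ["Korandjé", "Tadaksahak", "Tagdal", "Tabarog", "Tasawaq"]

# Upper triangle of the table above: number of shared items per pair.
SHARED = {
    ("Korandjé", "Tadaksahak"): 139, ("Korandjé", "Tagdal"): 140,
    ("Korandjé", "Tabarog"): 141, ("Korandjé", "Tasawaq"): 152,
    ("Tadaksahak", "Tagdal"): 242, ("Tadaksahak", "Tabarog"): 238,
    ("Tadaksahak", "Tasawaq"): 214, ("Tagdal", "Tabarog"): 304,
    ("Tagdal", "Tasawaq"): 237, ("Tabarog", "Tasawaq"): 229,
}


def shared(a, b):
    """Look up a pair in either order, since the table is symmetric."""
    return SHARED[(a, b)] if (a, b) in SHARED else SHARED[(b, a)]


def mean_shared(lang):
    """Average number of items a variety shares with each of the others."""
    others = [x for x in LANGS if x != lang]
    return sum(shared(lang, x) for x in others) / len(others)


outlier = min(LANGS, key=mean_shared)
closest_pair = max(combinations(LANGS, 2), key=lambda pair: shared(*pair))

for lang in LANGS:
    print(f"{lang}: {mean_shared(lang):.2f}")
print("lowest average:", outlier)
print("closest pair:", closest_pair)
```

On these figures the lowest average belongs to Korandjé and the closest pair is Tagdal and Tabarog, in line with treating those two as a single variety.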

Another way to look at this is to note that 188 of the 332 items are shared across all of Azawagh Songhay, whereas only 108 are shared across all of Azawagh Songhay plus Korandjé. Of the latter, only 9 are Berber or Arabic loans, while 99 are Songhay retentions:

eye, ear, mouth, head, hair, neck, milk, belly, foot, hand, skin, blood, urine, liver, person, man, woman, owner, name, dog, cow, donkey, (venomous) snake, louse, meat, fat, stick, grass, rope, salt, pot, pit (hole), iron, fire, smoke, ashes, night, sun, day, yesterday, wind, water, stone, one, two, hot, cold, long, old, lots, red, black, white, dry, full, what, where, near, far, and, sit down, stand up, lie down, sleep, bite, eat, drink, suck, laugh, cry, see, hear, know, love, give, steal, hide, give birth, die, kill, walk, run, fall, wash, pierce, hit, tie, do, sew, bury, sandals, horse, truth, falsehood, finish, dig, stand, find.

This list is dominated by basic, rarely loaned words: nearly half of it overlaps with the Leipzig-Jakarta list. However, more culturally specific shared retentions such as "iron", "owner", "cow", "donkey", "horse", "pot", "sew", and "sandals" remind us that the split of Northern Songhay is after all rather recent (much more so, in fact, than these words alone might suggest).

These pan-Northern retentions, however, by no means exhaust the Songhay lexicon of Northern Songhay. Korandjé alone retains some 183 list items of Songhay origin, at least 135 of them shared with Tasawaq, while for many words (eg "four", "green"), only Tasawaq has kept Songhay forms. Well over 227 items have Songhay equivalents in at least one Azawagh Songhay variety, and more than 241 have equivalents either in Azawagh Songhay or in Korandjé. If the even more conservative (but extinct) Emghedesie variety were added to the list, that number would no doubt be even larger. Proto-Northern Songhay certainly had a significantly larger Songhay lexicon than any of its descendants does.

[Later addendum]: Removing all words with Arabic-derived Korandjé forms from the list makes no difference to the classification; the table ends up like this:

	Korandjé	Tadaksahak	Tagdal	Tabarog	Tasawaq
Korandjé	–	135	136	138	142
Tadaksahak	135	–	188	186	174
Tagdal	136	188	–	231	188
Tabarog	138	186	231	–	181
Tasawaq	142	174	188	181	–

Saturday, June 28, 2014

Grammatically analysing "Sahha Ramdankoum!"

Sahha Ramdankoum صحّة رمضانكم! This Darja phrase, which might be rendered as "happy Ramadan!", is familiar to any Algerian. It groups with a few others - notably Sahha Ftourkoum صحة فطوركم "happy fast-breaking dinner!" and Sahha Eidkoum صحة عيدكم "happy Eid!" - as an example of a not very productive template "Sahha X+2nd person possessive" expressing good wishes on the occasion of X. But what is "sahha" doing in such forms?

In many contexts, "sahha" is a noun meaning "health"; we can be sure it is a noun, since it can be the object of a preposition and take personal possessive endings, as in b-sahht-ek بصحتك "good for you" (with your health). But there is also a defective verb, taking 2nd person perfective endings: sahhit صحيت (to a man), sahhiti صحيتي (to a woman), sahhitou صحيتو (to a group) "thanks / well done" (a little stronger than sahha "thanks"). The expected 3rd person masculine singular form of this verb would be sahh صح or sahha صحى; sahh actually is attested as an impersonal verb (ysahh-lek يصحلك "it is appropriate for you"), but its meaning is sufficiently distant that it's not necessarily part of the same paradigm. So in principle, "sahha" in "Sahha Ramdanek" could be interpreted as a noun, or a verb. Is there any way to decide which?

If it's a noun, then the phrase's syntax is bizarre - the literal interpretation would then be "Health is your Ramadan", whereas to make it fit the actual meaning we want at least something like "Your Ramadan is health", which would be the opposite order (?Ramdanek Sahha رمضانك صحة). If it's a verb, on the other hand, the syntax is fine - subjects in Algerian Arabic routinely follow the verb, and perfective verbs are routinely used to express states, so we could interpret it as something like "Healthy is your Ramadan!" or even, if we allow the perfective to be optative as in Classical Arabic, "May your Ramadan be healthy!"

On the other hand, if it's a verb, then it should agree in gender and number with what follows it, with feminine "sahhat" صحات and plural "sahhaw" صحاو. This can't actually be tested directly: in all such expressions that I can think of, the noun happens to be masculine and singular, and this expression cannot normally be extended to congratulate people on other occasions. But if we imagine using this formula to congratulate someone on their happiness, I for one would much sooner say "Sahha Farhatkoum" صحة فرحتكم than "Sahhat Farhatkoum" صحات فرحتكم, which suggests that my mind, at least, is not analysing it as a verb.

Perhaps it's neither noun nor verb, then? There are a few words in Algerian Arabic that form predicates and come at the start of the clause, but do not take verbal morphology - for instance, makash ماكاش "there is no" or oulah ولاه "no need (for)". Putting it in this class would take care of the problem, but just leads us to a different one: can this class of non-verbal predicators be given a coherent positive definition, or is it just whatever happens to be left over from defining the major word classes?

Be that as it may, best wishes to all readers for this coming month, and, for those fasting it, Sahha Ramdankoum!

Tuesday, June 24, 2014

From Figuig to Igli: Berber in the Algeria-Morocco borderland

The number of good Berber descriptive dictionaries has been slowly but steadily increasing in recent years, but Hassane Benamara's new Dictionnaire amazigh-français : Parler de Figuig et ses régions (Rabat: IRCAM, 2013), which I was lucky enough to be lent a copy of lately, is surely one of the best. Apart from being quite unusually large (800 pages), it incorporates examples, multiple senses, pictures of items difficult to describe, and an appendix with encyclopedic information on culturally specific words such as festivals and children's games. It incorporates a few neologisms useful for schooling, but takes a fairly inclusive attitude towards Arabic loanwords. There are barely 15,000 people in Figuig, but, astonishingly enough, this is actually the second dictionary of Figuig Berber published by a native speaker; the first, Ali Sahli's معجم أمازيغي-عربي (خاص بلهجة أهالي فجيج) (Oujda: Al Anwar Al Maghribia, 2008), was a good effort, but is substantially shorter and used a less accurate transcription. (There's even another linguist from Figuig, Mohamed Yeou, threatening to make a third dictionary – if he goes ahead with the project, he'll have a high hurdle to clear.)

Across the border in Algeria, the situation is rather different. A number of towns across a wide area around Bechar and Ain Sefra speak Berber varieties closely related to that of Figuig, collectively imprecisely termed "Shelha". Some of them seem to be shifting to Arabic (on my latest trip, I was told that in Lahmar they had stopped speaking Berber with their children, and for Igli I had heard the same much earlier.) But little effort – and no official effort, as far as I know – is being made to document them. The only (very) partial exceptions of which I am aware are Igli and Boussemghoun.

For Igli (population 7000), I have already described the local Scouts' efforts to put together an online dictionary. More recently, however, I came across a laudable local attempt at approaching the problem academically: Fatima Mouili's The Berber Speech of Igli, Language towards Extinction. After a very brief summary of Igli grammar and phonology, unfortunately made frequently illegible by font problems, the author discusses the reasons for language shift. Corresponding to my impressions for the region, including Tabelbala, she cites emigration and the desire to ensure educational success as important drivers; others are more surprising, including the immigration of refugees expelled by the French from a nearby village during the Algerian War of Independence. Apparently, her thesis discusses similar issues, for those with 59€ to spare...

For Boussemghoun (population 4000), a few articles and a book by Mohamed Benali may be cited, all focusing – as far as I can see – exclusively on the sociolinguistic situation of Berber in the town. A local Berber-language poet billed as "the Ait Menguellet of Boussemghoun", Bashir Oulhaj, has a considerable presence on YouTube, eg here; he's even been interviewed by Figuig News. Boussemghoun seems to be treated as the centre for Amazigh identity in the region; the HCA has even organised a symposium there. Nevertheless, little if any descriptive work has been published on its variety of Berber.

Taken together, there are probably more speakers of Berber in southwestern Algeria than in and around Figuig. Why the difference, then? Is it because linguistics is better represented in Moroccan universities than in Algerian ones? (Notwithstanding some interesting work coming out of Algeria, I think that is fair – it would be hard to think of any linguist working in Algeria with a profile comparable to Abdelkader Fassi Fehri, for example.) Or is it because the Amazigh movement in Morocco is less closely associated with one side in the "culture war"? (Benali observes that, while most Semghounis wanted Berber to be taught in schools, they rejected the installation of an HCA office because they distrusted its politics.) Or are there more specific, purely local factors explaining the difference? That would be worth a study in itself – though perhaps not as much so as the Berber varieties in question!

Wednesday, June 18, 2014

Why Yiddish is not Slavic, and language families are not families

Recently I came across a popular article, Where Did Yiddish Come From?, discussing Paul Wexler's eccentric claim that Yiddish is a "relexified" Slavic language (and Modern Hebrew, in turn, "relexified" Yiddish). To make any sense of this claim, we have to stop and consider what historical linguists mean when they talk about language origins.

If you want to learn a language perfectly, the best way to start is to pick it up as a child from your family and the community they're part of. That way, you and your generation end up speaking the same language as your parents and their generation, modulo a few little innovations you threw in just to annoy them. As those little innovations pile up, generation on generation, sooner or later you end up speaking something that the first generation wouldn't have been able to understand. In such a scenario, everyone agrees, the latest generation's language – let's call it B – is descended from the first generation's (A). If some of the children of that first generation moved far away early on and went through the same process of gradual change, their descendants speak another language, C, which speakers of B can't understand, but which is also descended from A. So we say that B and C belong to the same language family, just as their speakers belong at some remove to the same extended family.

If you're reading this, it's probably too late to learn a language that way. (Sorry.) You can still learn another language, say B, but the odds are that, at best, you'll always speak it with a bit of a foreign accent, and keep using expressions that make sense in English but sound weird to native speakers. If you're just an individual migrant learning it to fit in, that won't matter in the long run – your kids will learn the language in the playground and come back speaking it better than you do. But what if it's not just you that's learning it, but also your spouse, and your brothers, and almost everyone you know? What if your whole community is starting to prefer to speak this language with their kids, instead of the one they grew up with? In that case, the kids will still end up speaking it – but instead of speaking it like natives, they'll probably end up speaking it with your foreign accent and all those expressions of yours that native speakers laugh at. In that scenario, does the kids' language (let's call it D) belong to the same language family as B and C, or not? That's the ambiguity that Wexler is playing with.

The obvious answer – and the one most linguists would give – is yes*. For one thing, assuming you did a half-decent job of learning B, it's the same language – speakers of D can understand speakers of B, and vice versa, even if they laugh at each other's crazy accents. The influence of Gaelic may pervade Irish English, but Irish English is still English, not some Celtic language. It's the vocabulary and the morphology that really make English understandable – a weird accent or a funny way of putting things is just not that big an obstacle on its own. Wexler proposes exactly the opposite criterion: "Yiddish – in contrast to its massive German vocabulary – has a native Slavic syntax and sound system – and thus must be classified as a Slavic language" (1993:5). The origins of Yiddish syntax and phonology I can't comment on, but there's a good reason why historical linguists normally prioritise the vocabulary and the morphology over the syntax and phonology, even apart from the one just given. Vocabulary and morphology are eminently reconstructible, using the comparative method. Phonology, on the other hand, can only be reconstructed from vocabulary, and syntax is notoriously hard to reconstruct at all. If language families were to be defined based on phonology and syntax, it would hardly be possible to define them, much less reconstruct them or state regular correspondences between them.

In short, saying that Yiddish (much less Modern Hebrew) belongs to the Slavic language family is just a word game – in the sense that historical linguists normally use the concept of "language family", it doesn't, and wouldn't even if every last Yiddish speaker happened to be of Slavic ancestry and to speak Yiddish with a heavy Slavic accent. But such word games do not vitiate Wexler's work. After a large enough community has shifted to a different language, it is usually possible to find traces of their former language – although identifying them as such, rather than as later borrowings, may be hard. That's what Wexler is trying to do for Yiddish, and that's how he supports his claim that Yiddish speakers' ancestors used to speak a Slavic language.

* However, the question can easily be made more controversial. Suppose you and your community didn't learn it that well to start with, and aren't trying to imitate native speakers anyway? In that case, the kids will end up speaking something that sounds utterly ridiculous to native speakers; the basic words are recognisable, but the way they're put together seems all wrong. Whatever Tok Pisin is, most people would agree that it's not English. A few people would defend the claim that Tok Pisin belongs to the same family as English, on the basis that that's where the vocabulary comes from, but most would say that it doesn't belong to any language family. The language family model presupposes that the language is being passed on reasonably well as a whole, including not just vocabulary but also some amount of grammar; if all that's learned is a bunch of words, the model breaks down. The border must be drawn somewhere between the extremes of Irish English and Tok Pisin, but linguists can and do disagree on where exactly to draw it.

Tuesday, June 10, 2014

The Subclassification of Songhay, now online

After more than a year, I can now finally put a PDF of my article The Subclassification of Songhay and its Historical Implications online for whoever may be interested. The abstract follows:

This paper seeks to establish the first cladistic subgrouping of Songhay explicitly based on shared arbitrary innovations, a prerequisite both for distinguishing recent loans from valid extra-Songhay comparanda and for determining how Songhay spread. The results indicate that the Northern Songhay languages of the Sahara form a valid subfamily, even though no known historical records link Tabelbala to the others, and that Northern Songhay and Western Songhay (spoken around Timbuktu and Djenné) together form a valid subfamily, Northwestern Songhay. The speakers of Proto-Northern Songhay practised cultivation and permanent architecture, but were unfamiliar with date palms. Proto-Northwestern Songhay was already in contact with Berber and probably (perhaps indirectly) with Arabic, and was spoken along the Niger River. Proto-Songhay itself appears likely to have been in contact with Gur languages, confirming its relatively southerly location. This result is compatible with two scenarios for the northerly spread of Songhay. On Hypothesis A, Northern Songhay spread out from an oasis north-east of Gao, probably Tadmakkat or Takedda, and Northwestern Songhay had been spoken in areas west of Gao which now speak Eastern Songhay. On Hypothesis B, Northern Songhay spread out from the Timbuktu region, and Western Songhay derives from heavy “de-creolising” influence by Eastern Songhay on an originally Northern Songhay language. To choose between these hypotheses, further fieldwork will be required.

Comments welcome!

Beni-Snous Berber

I have the pleasure of announcing that my article with Fatma Kherbache, Syntactically conditioned code-switching? The syntax of numerals in Beni-Snous Berber, has just been published online in the International Journal of Bilingualism. Long-term readers may recall that, five years ago now, I noticed an astonishing claim in Destaing's grammar of Beni-Snous Berber (spoken near Tlemcen, in western Algeria): that, with numerals above ten, they only used Arabic nouns. In this article, I finally try to get to the bottom of this, based both on Destaing's corpus and on data gathered by my co-author from the half-dozen or so last speakers; the real situation turns out to be a little more complicated than Destaing described, but his claim is correct as a statistical generalisation. Syntactically conditioned code-switching as a systematic part of otherwise monolingual discourse has rarely been described, but one other instance is reported – numeral+noun combinations in the Jerusalem dialect of Domari, an Indic language of the Levant spoken by the Dom "gypsies". Comparing the circumstances of switching in both languages supports the generalisation, building on Myers-Scotton's work, that syntactically conditioned code-switching (Matras' "bilingual suppletion") can only happen when a word shared by both languages has different selectional requirements in each language.

Sunday, June 08, 2014

Standard Arabic and cartoons

In the Arab world, practically all cartoons are dubbed in Standard Arabic, rather than in the different countries' spoken varieties. Until recently, Disney was the exception, using Egyptian Arabic; its decision to use Standard Arabic like everyone else has attracted some controversy (New Yorker, Language Log, Arabic Literature, MEI), although it will be very welcome in the Maghreb, where kids don't understand Egyptian anyway. In general, however, there's a strong consensus in favour of Standard Arabic in cartoons; it's seen as a good way to get children used to Standard Arabic, and thus prepare them for school. What are the effects of this?

Let's start by looking around us. We see that younger generations understand Standard Arabic rather well, and have a much larger Standard Arabic vocabulary than earlier generations did at the same age. A cursory search of the literature suggests that cartoons have played a role in this; for example, Weyers 1999 shows that American students of Spanish improved their listening comprehension and used a larger vocabulary after watching a Spanish-language telenovela, and Blosser 1988 shows that Hispanic children, once they've mastered the basics of English, improve their English by watching more TV (although this does not seem to work below the age of 2). So parents are probably right to think that Standard Arabic cartoons are helping their kids learn Standard Arabic.

However – let's be honest – those same younger generations remain largely unable to write a grammatically correct paragraph in it, and normally speak in Standard Arabic only to quote prestigious texts or to parody TV presenters or politicians. This suggests that what they're gaining from it is limited to what Weyers 1999 identified for learners of Spanish: better comprehension and a larger vocabulary, but not better production. I don't think it's an exaggeration to say that, in Algeria at least, Standard Arabic is effectively a read-only language: everyone under a certain age can understand it and read it, but practically no one can express themselves in it correctly or confidently. So, as an educational tool, cartoons have their limits.

But education isn't everything. Cartoons are one of the most secure domains of spoken Standard Arabic, right up there with news broadcasts, documentaries, and historical soap operas, and well ahead of teaching, sermons, political speeches, and interviews, all of which often use varying amounts of dialect. For younger children, unless their parents read to them, cartoons may well be the only context other than school and prayer where they regularly hear Standard Arabic (cp. Hamzaoui 2014), and in any case are one of the first contexts they learn to associate with Standard Arabic. Shouldn't we be asking how this affects their feelings about the language?

Sunday, April 27, 2014

Speaking in Oran

It's a bit last minute, but I'm glad to announce that I will be giving two talks in Oran over the next few days:

"Le contact linguistique au Sahara" at 2:00 pm, 29 April, at CRASC (Centre National de Recherche en Anthropologie Sociale et Culturelle), Technopole USTO, Oran.
"L'histoire du korandjé, une langue algérienne méconnue" at 10:00 am, 30 April, at CEMA (Centre d'Etudes Maghrébines en Algérie), Cité du Chercheur (ex-IAP), University of Oran Es-Senia.

It would be a pleasure to see some readers of this blog there.