Monday, August 10, 2015

Algerian Arabic in schools? More smoke than fire

Conveniently distracting public attention from the recent assassination attempt on the president's brother and the continuing drop in oil prices, Algeria's Minister of Education, Nouria Benghebrit, has recently provoked a stormy debate by announcing that preschool, 1st grade, and 2nd grade would from now on be taught partly in local dialects. (Contrary to some reporting, there is no question of introducing dialectal Arabic as a school subject.) In a TV interview, she points out that nearly 10% of Algerian children repeat second grade, and argues that the solution is for the teacher to make more use of the linguistic abilities they come into school with.

Public opinion in Dellys seems to be overwhelmingly against this move, and I suspect that applies to most of Algeria. The president of the teachers' union (UNPEF) called it "a dangerous precedent" reminiscent of "France's [colonial-era] efforts to erase the pillars of the national identity", while the Association of Muslim Ulama went so far as to call for a school boycott (also here) in the event of its implementation - echoing the strategy by which, twenty years ago, Kabyle activists forced the state to teach Berber in school. However, an independent teachers' union (SATEF) expressed its support, claiming that "in the 1970s Algeria called in Egyptian teachers who taught in Egyptian Arabic and no one said anything". I won't bother with the statements of party politicians, whose easily predictable positions are meaningless to anyone but the few who still take their game seriously; but it is interesting that quite a few journalists seem to have come out in support.

Despite all this noise, however, this move will have no direct consequences (except potentially for Berber speakers), for a simple reason: Algerian primary school teachers already, by necessity, teach largely in dialect. The minister admits as much in the same interview, breaking into French to say that this move will simply "déculpabiliser" the teachers. The more intelligent among her critics, such as Mohamed Djemai, make the same point. The problems Benghebrit points to are real enough - such as massive rates of subject failure in Arabic in completely Arabic-speaking regions - but telling teachers to start doing something that they're already doing is hardly likely to solve them!

Rather than as a change in practice, this statement could be understood as an attempt to move Algeria's Overton window (at least insofar as it's anything but a distraction). If a government minister can now get away with suggesting publicly that the dialect has a positive role to play in education, then maybe one ten years down the line will be able to make proposals that are currently unthinkable. Whether she can in fact get away with this suggestion, or gets thrown out for it at the next reshuffle, remains to be seen. Algerians often assume that any proposal to improve the status of dialectal Arabic is just a stalking horse for preserving the position of French, so it doesn't help that Benghebrit speaks rather poor Arabic: even her conversational dialectal Arabic sounds rather halting when contrasted with the fluency of her frequent and jarring shifts into French.

So would more dialect in education be a good thing?

The state, and many parents, want all children to learn Standard Arabic, French, and English, in that order. Children's exposure to Standard Arabic is practically limited to school and the cartoons they watch; rather, they speak Algerian Arabic, which shares some basic structure and vocabulary but is still effectively a different language, or Berber, which is radically different. In effect, children are learning Standard Arabic as a second language, as well as French and English. That being the case, the question we need to ask in all three cases (not just for Standard Arabic!) is: is it more effective to teach a second language in the learners' first language, or in the target language, or in a combination of both? Algerians usually assume that the answer is to teach exclusively in the target language. There isn't as much scientific evidence on this question as one might hope, and the question is often debated without any convincing empirical evidence (Bruhlmann 2012 summarises some of this in regards to English teaching). August et al. find that non-English-speakers in English-speaking countries learn to read English better if educated in both languages than if educated only in English; but such students are also extensively exposed to English outside their homes, making the situations less comparable. Any SLA researchers reading this are cordially invited to propose better references.

However, schooling is not just about learning languages. Is it more effective to teach other subjects in students' first language, or in a second language? The answer seems too obvious to bear investigating, but it too has occasionally been investigated - notably in the context of America's bilingual education debate. Both Rolstad et al. (2005) and Slavin and Cheng (2005), usefully summarised here, find that immigrant children in the US learn more if taught bilingually than if taught only in English, even as measured by tests in English. Similarly, students in Hong Kong taught in English (Lo and Lo 2013) were found to perform more poorly in non-language subjects than those taught in Chinese, despite the large difference between the Chinese used in school (Mandarin) and that spoken at home there (Cantonese). So it seems that the obvious conclusion is true: it's easier to learn non-language subjects in your first language (i.e. Algerian dialectal Arabic), and failing that, easier to learn them in a closely related language (i.e. Standard Arabic) than in a very different one (i.e. French). This suggests that dialectal Arabic should play a rather larger role in Algerian schools than almost anyone is willing to consider at the moment.

Quite apart from school performance or school curricula, though - and no matter what the underlying agenda may be - it's nice that Algerian dialectal Arabic is finally getting taken seriously enough for proposals like this to be heard. It may be a shame that most Algerians can only express themselves fluently in this "dialect", but it's a fact - and their voices should not be banished from public debate by their inability to dress their thoughts up in a more prestigious language. A good knowledge of its extensive vocabulary and its complex morphology is an achievement that takes years; why do we insist on treating it as an embarrassment or a sign of ignorance?

One of the most interesting responses to this debate was by a teacher, interviewed by Mohamed Saadoune. She strongly supports teaching in Fusha, not for nationalistic or religious reasons, but simply - because its use lets teachers reassert their authority over the class! "It clearly signals to the student that we are no longer in the street, but rather in a place where we learn, and where there are rules. Putting Standard Arabic into question quite simply signals to the students that there is no longer any difference between the street and the school. This necessary boundary risks being erased." Understandable as it might be given the state of Algerian society, this is a counsel of despair: it presupposes that the way you learn to behave in school has nothing to do with the rest of life! In that case, what good does school do at all? Surely the goal should be, precisely, to erase or at least blur the boundary between school and the street - to make it clear that what you ideally learn at school, including expanded vocabulary and polite behaviour, can and should be applied outside school?

Sunday, August 09, 2015

Can two kids change Algerian Arabic? (Probably not, but let's see.)

In central Algerian Arabic, feminine nouns are usually marked by a suffix -a, which becomes -ət when possessed. Pronominal possessors are indicated by suffixes, eg -i "my", -u "his". The lax vowel ə cannot occur in open syllables; when the suffix starts with a vowel, this is resolved by dropping it. If doing so would result in a three-consonant cluster, then, in certain cases, the latter is broken up by inserting a new schwa after the first consonant in the cluster, and geminating that consonant: thus jəfn-a جفنة "big bowl" becomes jəffən-t-i جفّنتي "my big bowl". I've been trying to figure out when exactly this happens in the dialect of Dellys, and finding a good deal of variation, especially in the treatment of sonorants: some people (especially but not exclusively the older ones) say səlʕ-t-i سلْعتي "my goods", leaving the cluster intact, while others say səlləʕ-t-i سلّعتي. I was surprised, however, to find two children, 8 and 10-year-old siblings, using a strategy not, as far as I know, used by adults for nouns at all: changing the problematic ə into a. This was confirmed not just by elicitation (zənq-at-i زنقاتي "my alley", ʕənb-at-i عنباتي "my grape", səlʕ-at-i سلعاتي "my goods") but also by sentences produced; thus for bəlɣ-a بلغة "pair of flip-flops":
ənta ʕəndək bəlɣ-at-ək w ana ʕəndi bəlɣ-at-i انتا عندك بلغاتك وانا عندي بلغاتي
"You have your flip-flops and I have my flip-flops."
and for xədm-a خدمة "work", completely unprompted:
kəmmli xədm-at-ək كمّلي خدماتك
"Finish your work."
which his older brother actually corrected to kəmmli xəddəm-t-ək كمّلي خدّمتك.

Adults' speech furnishes one plausible model for this strategy - not in nouns but in participles. The active feminine participle takes direct object pronoun suffixes, identical to the genitive ones except in the 1st person singular. In such forms, -ət becomes -at before a vowel, rather than dropping the ə: šayf-a شايفة "having seen (f.)", šayf-at-u شايفاتهُ "having seen him (f. subject)". But its extension to nouns is something quite new; neither their parents nor their elder brother nor any adult I've met use such forms.
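For readers who like to see a rule spelled out mechanically, here is a rough Python sketch of the three patterns described above for feminine nouns with a vowel-initial possessive suffix. It is only an illustration under simplifying assumptions: the function name and the strategy labels are my own inventions, each character is treated as one segment, and the geminating strategy is applied to every stem ending in two consonants, whereas in reality it applies only "in certain cases" that I have not yet pinned down.

```python
VOWELS = set("aiuə")
STRATEGIES = ("geminate", "keep_cluster", "a_vowel")  # hypothetical labels


def possessed(stem, suffix, strategy="geminate"):
    """Attach a vowel-initial possessive suffix to a feminine noun.

    stem     -- the noun without its feminine -a, e.g. "jəfn" for jəfn-a
    suffix   -- a vowel-initial pronominal suffix, e.g. "i", "u", "ək"
    strategy -- which of the three observed patterns to apply
    """
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    if not suffix or suffix[0] not in VOWELS:
        raise ValueError("this sketch only covers vowel-initial suffixes")

    # The children's pattern: -ət surfaces as -at instead of losing its vowel.
    if strategy == "a_vowel":
        return f"{stem}-at-{suffix}"

    # Otherwise the schwa of -ət is dropped, leaving stem + t + suffix.
    ends_in_cc = (
        len(stem) >= 2 and stem[-1] not in VOWELS and stem[-2] not in VOWELS
    )
    if strategy == "geminate" and ends_in_cc:
        # Break up the three-consonant cluster: insert a schwa after the
        # first consonant of the cluster and geminate that consonant.
        c1, c2 = stem[-2], stem[-1]
        stem = stem[:-2] + c1 + c1 + "ə" + c2
    return f"{stem}-t-{suffix}"


print(possessed("jəfn", "i"))                  # jəffən-t-i
print(possessed("səlʕ", "i", "keep_cluster"))  # səlʕ-t-i
print(possessed("səlʕ", "i", "a_vowel"))       # səlʕ-at-i
print(possessed("xədm", "ək"))                 # xəddəm-t-ək
```

Consonant-initial suffixes are deliberately left out, since nothing above bears on them.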

Most probably, the next time I go to Dellys I'll find these two children using the normal forms and denying they ever spoke this way. Even now, they already use the normal form for body parts which almost always occur possessed: rəqb-a رقبة "neck" becomes rəqqəb-t-i رقّبتي "my neck". But what if this innovation instead spreads among their peers? Most likely it won't: there seems to be little evidence for children initiating language change, notwithstanding the idea's widespread adoption by generative historical linguists, and adults' innovations are much more likely to be maintained or copied (cf. Luraghi 2013, Foulkes and Vihman fc; for a potential counterexample, see Moyna 2009). For that very reason, however, it will be worth keeping an eye on them; potential counterexamples are always interesting.

Tuesday, August 04, 2015

Seaweed from Hell? A Qur'ānic hapax legomenon in a modern Arabic dialect

The Qur'ān contains many Arabic words obscure enough that even the earliest commentators (eg al-Ṭabarī), at a time when Arabs still natively spoke something quite close to Classical Arabic, considered them to need glossing. This fact has led to sometimes rather wild speculations; at the extreme, Luxenberg, whose attempt to reinterpret houris as grapes brought him some notoriety a decade ago, bases his entire project on the assumption that such words are mistranscriptions or misinterpretations of Syriac. A look at modern Arabic dialects, however, reveals that in some cases a Qur'ānic Arabic word that was evidently unfamiliar to the largely Levantine or Iraqi audience of early commentators nevertheless survives right up to the present in other regions, confirming its historical reality and showing how regionally variable the vocabulary was even within early Arabic.

One such case that I recently came across is ḍarīʕ ضريع, occurring in the Qur'ān only once, in verse 6 of Surat Al-Ghāshiyah:

They [the inhabitants of Hell] have no food except ḍarīʕ, which neither fattens nor takes away hunger.
The commentators' consensus is that this is a Ḥijāzī word, unfamiliar to Arabs from other regions, referring in this passage to dried shibriq – a thorny shrub with the Latin name of Zilla spinosa. The obscurity of this term outside the Qur'ān may be gauged by the fact that many early Arabic dictionaries omit it entirely; almost all occurrences in Alwaraq.net's rather large collection of classical Arabic literature are in quotes or explanations of this Qur'ānic verse.

So far, I am not aware of any Arabic dialect in which a reflex of the word has survived in the Qur'ānic sense. However, the names of land plants are very often extended to sea plants – for example, Ulva lactuca is “sea lettuce” in English, “laitue de mer” in French, and šḷađ̣a taʕ əlbħəṛ شلاظة تاع البحر in Dellys Arabic – and ḍarīʕ appears to be a case in point. Ibn al-Bayṭār (a 13th century botanist born in Málaga) glosses ḍarīʕ simply as a plant cast up by the salt sea from its bottom, found along the sea coast, not even bothering to mention the Qur'ānic usage of the word. The 13th century lexicographer Ibn Manđ̣ūr, born in Tunis, likewise gives as the primary meaning of ḍarīʕ “a green, stinking, light plant cast up by the sea”. [Addition: A more picturesque attestation occurs in al-Nuwayrī (Egypt, 13th-14th c.), who describes a Fatimid general's conquest of Morocco: "He continued until he reached the ocean, and ordered that some fish be caught, and put them in a jar of water and brought them to Mu`izz in the post, and put inside his letter ḍarīʕ of the sea."]

Unlike the terrestrial meaning, this sense has survived in colloquial usage right up to the present day: in Dellys (central Algeria), the keystone seagrass of the Mediterranean, Posidonia oceanica, is called ṭṛiʕ طريع. Just as Ibn al-Bayṭār describes, this seagrass is cast up by the sea in vast quantities along the coast. The change of ḍ > ṭ is not regular in Dellys, but is sporadically attested here (eg ṭəmm “gather together” < ḍamma), and is a good deal commoner in the very similar dialect of the Algiers Casbah; so the derivation is unproblematic.

What is the connection between Zilla spinosa and Posidonia oceanica? It would be hard to think of two plants which resemble one another less. Posidonia oceanica has no thorns, never forms a shrub, and isn't even the same shade of green. Rather than form, we must look to function. Zilla spinosa is (if we may trust the lexicographers and commentators) very bad fodder; Ibn Manđ̣ūr comments that camels fed on it gain neither fat nor meat. Posidonia oceanica has many functions reported for it in Mediterranean cultures, but that of fodder is conspicuous for its absence, suggesting that it too does not make good fodder. This suggests an etymology: the meanings of the root ḍrʕ include “be humble, weak”, and cattle fed on ḍarīʕ presumably become weak.

The obvious follow-up question is whether either sense of ḍarīʕ has survived elsewhere in Arabic, so I'll close by asking my readers: is the word ḍarīʕ used in an Arabic dialect that you know? If so, what does it mean?

Wednesday, July 15, 2015

How Berber is the Arabic of the Chaamba?

As unfortunately foreshadowed by my last post, violence broke out again in Ghardaia recently between Chaamba and Mozabites. At least 22 were killed, most of them Mozabite. As far as I can tell, not one newspaper has ventured to report what specifically triggered this episode of violence - which probably means that the details are thought to be embarrassing or inflammatory. (I suspect that this vicious yet incompetent piece of anti-Mozabite propaganda provides an explanation in reverse - that a rumour circulated among the Chaamba that the Mozabites were celebrating the assassination of Ali or something like that - but that's only a guess.) Instead, they're presenting a flurry of supposedly deeper explanations, vaguely alluding to drug trading, smuggling, religious extremism, and foreign meddling.

One news item that recently made waves came from a Facebook post by Ahmed Ben Naoum, a professor of sociology at the University of Perpignan, who, as reported by El Watan, insists that the Chaamba (properly šʕanba) are not Arabs but rather Zenati Berbers. The ancestry of the Chaamba is not something I can comment on professionally - if that mattered, which it shouldn't, a look at their Y-chromosomes would be the way to go. Nor can I say much on their historical self-identification, though at present it's extremely clear that the Chaamba consider themselves Arab (more specifically, a branch of Banu Sulaym). However, the article also touches on their language:

«Les Cha’anba font partie de la majorité zénète de ce pays. Ils n’ont aucun mythe fondateur les rattachant aux ‘‘Arabes’’ ! Eux-mêmes ont été arabisés comme l’ont été les autres Zénètes, sauf à dire qu’ils expriment leur culture dans une des langues arabes qu’ils ont largement ‘‘zénétisée’’ dans la morphologie et la syntaxe.»
[The Sha'anba are part of the Zenati majority of this country. They have no foundation myth attaching them to the "Arabs"! They themselves have been Arabised like the other Zenatis, but that is only to say that they express their culture in one of the Arabic languages which they have extensively "Zenatified" in morphology and syntax.]

This is not correct. The dialect of the Chaamba is one of the few dialects of the Algerian Sahara for which a grammatical description has been published (Grand'Henry 1976), and its morphology, at least, is pretty well studied. Judging by this material, there is no discernible Zenata (or other Berber) influence on the morphology or syntax of the dialect at all. In this respect, it agrees with Algerian Arabic more generally. Very few dialects of Algerian Arabic show significant morphological influence from Berber; only a few areas, such as Jijel or Adrar, even have Berber plurals for nouns borrowed from Berber, and no dialect anywhere is reported to have borrowed Berber verbal morphology. Many dialects have a few abstract nouns in ta-...-t - usually with negative meanings - but this formation is hardly productive. Syntactic influence is plausible a priori, but has not been adequately demonstrated anywhere in Algerian Arabic (except Jijel), much less for the dialect of the Chaamba.

A better place to look for Berber influence in Algerian dialects, generally speaking, is phonology and vocabulary. In phonology, the phoneme and the merger of the short vowels can both plausibly - although not certainly - be attributed to Berber influence; however, it is unclear from Grand'Henry's rather poor description of the phonology whether even these apply in the Chaamba dialect. The vocabulary listed by Grand'Henry includes very few Berber loans, and most of the latter are pan-Algerian, eg həžžala "widow", atay "tea" (the latter ultimately from Chinese); the only rarer ones noted are two types of date, taqərbŭšt and tantmŭšt, which would naturally be easily borrowed from Berber-speaking oasis dwellers. On the basis of the available data, it's safe to say that the Zenati influence in the dialect of the Chaamba, like the Zenati influence in most Algerian dialects whether spoken by people of Berber ancestry or not, is very limited. It would be very interesting to study the extent of Berber influence in the Arabic spoken in different regions of Algeria, and how it varies. But such a study should not be expected to provide proof that Algerians in general, or any specific group of Algerians in particular, are of Amazigh ancestry. If for some reason you want to know about ancestry, ask a geneticist, not a linguist (nor, I would suggest, a sociologist).

Friday, July 03, 2015

Nasheed in Tumzabt

In honour of the month - and of the harmonious coexistence in Algeria of different branches of Islam, threatened in recent years - here's a rather well-produced bilingual Ramadan nasheed in Arabic and Tumẓabt, the Berber language of the Mzab region far to the south of Algiers:

Apart from its linguistic interest, it's rather interesting semiotically. The first half, in Arabic, presents life in a Saharan oasis as idealised by an oasis-dweller rather than a tourist - no dunes, not much picturesque architecture, just well-watered, well-shaded palm groves, traditional picnic blankets, and lots of happy children. The second half, in Tumẓabt with Arabic subtitles, focuses more on religious life - mosques and prayer at odd hours and pages of the Qur'an. Someone put a lot of money into this clip; I don't know anything about its background, but I get the impression that it was intended not just to edify fellow speakers of Tumẓabt but also to show the best possible image of the Mzab to outsiders - perhaps a precautionary PR effort in case of further problems in the region?

Some linguistic features of interest include:

  • The Latin loanword i-bekkaḍ-en "sins", from peccatum;
  • The non-borrowed Berber word Yuc "God";
  • The curious metathesis in dessat < s dat "before, in front of" (I have no explanation for the gemination here either);
  • The coinage ɣiṛu, based on the inherited root "call", for the time before dawn when the first call to prayer is traditionally made, about an hour before the actual time of prayer (thanks to Banouh Nouh-Mefnoune for the details). Similar forms are paralleled sporadically in a number of Berber varieties, but which prayer they refer to depends on the region;
  • The varying forms of the 1st person plural object clitic (if indeed it can still be called a clitic): -aɣen when placed before the verb, as in the first line, but -aneɣ when placed after it;
  • The addition of meaningless -i at the end of the line to make it fit the metre, paralleled in Tashelhiyt. (see comments)

Here's my best effort to transcribe it, minus some of the repetition; corrections welcome.

Yus-ed yur n uẓumi, a-ɣen yerr f etcetmi; [corrected following comments]
The month of fasting has come, let it take us away from sin;
Eṛbeḥ-ed si-s a memmi arrazen n etzeɛmi.
Win from it, my son, the reward of goodness.
Eččer-t fissaɛ ɣiṛu, dessat ma ɣad yedden,
Get up quick before dawn, before the call to prayer,
Esserr n elxiṛ eğrew, a-c reẓmen ibriden;
Gather secret good deeds, roads will open for you;
Yus-əd yur n uẓumi.
The month of fasting has come.

Yus-ed yur n uẓumi, a-ɣen yerr f etcetmi;
The month of fasting has come, let it take us away from sin;
Eṛbeḥ-ed si-s a memmi arrazen n etzeɛmi.
Win from it, my son, the reward of goodness.
S tala-s seṛwa ul-eč, tfarrid-t s ibekkaḍen, [corrected following comments]
Fill your heart from its fount, purify it from sins,
Ezdey i tawwat-eč; a-c yexs Yuc ed midden,
Reconcile with your relatives, God and people will love you;
Yus-əd yur n uẓumi.
The month of fasting has come.

Monday, June 29, 2015

Anomalous gender agreement in Algerian Arabic

In Algerian Arabic (here, Dellys dialect), the feminine singular form of an adjective is formed just by adding a suffix -a, with almost no exceptions. In two of the exceptions, a full look at the paradigm suggests that it's really the masculine form rather than the feminine which is irregular (though the situation is less clear-cut in other dialects - in traditional Algiers, for example, the plural of "beautiful" is شبّان šəbban):
            m. sg.          f. sg.          pl.
beautiful   شباب šbab       شابّة šabba      شابّين šabbin
other       آخُر axŭṛ        أُخرى ŭxṛa       أُخرين ŭxṛin

A third case is rather different. "Such-and-such (a person), so-and-so" is expressed by the noun m. sg. فلان flan, f. sg. فلانة flana, with no known plural. (This originally Arabic form is rather widely borrowed; you may be familiar with it from Spanish fulano). From this we can derive an adjective "such-and-such a" by adding a nisba suffix -i: m. sg. فلاني flani, but f. sg. فلانتية flantiyya. To make matters worse, we suddenly find ourselves with a gender distinction in the plural, something otherwise absent from adjectival agreement in this dialect: m. pl. فلانيين flaniyyin, f. pl. فلانتيين flantiyyin.

What's going on, though anomalous, is pretty clear (recall that feminine -a regularly becomes -t in the construct state): this adjective is displaying double agreement, gender agreement alone on the nominal root flan, and normal gender+number agreement on the adjectival derivational suffix -i. Can you think of any comparable cases elsewhere?
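To make the double agreement concrete, here is a minimal Python sketch that builds the four forms from their parts. The function name and the representation of gender and number as short strings are purely illustrative assumptions; the forms themselves are just the ones given above.

```python
def such_and_such(gender, number):
    """Build the adjective "such-and-such a" for gender "m"/"f", number "sg"/"pl"."""
    if gender not in ("m", "f") or number not in ("sg", "pl"):
        raise ValueError("gender must be 'm' or 'f', number 'sg' or 'pl'")

    root = "flan"
    # Inner agreement: gender only, on the nominal root
    # (feminine -a shows up as -t before a further suffix).
    inner = "t" if gender == "f" else ""
    # Outer agreement: the nisba suffix -i with the ordinary adjectival
    # endings, which distinguish gender in the singular only.
    outer = {"sg": {"m": "i", "f": "iyya"},
             "pl": {"m": "iyyin", "f": "iyyin"}}[number][gender]
    return root + inner + outer


for number in ("sg", "pl"):
    for gender in ("m", "f"):
        print(gender, number, such_and_such(gender, number))
```

The point of splitting it this way is that the gender distinction in the plural comes entirely from the inner slot; the outer suffix behaves exactly like that of any other adjective.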

Saturday, June 27, 2015

How Korandje made "with" agree it-with its subject

Korandje, the language of Tabelbala in southwestern Algeria, requires the comitative preposition "with" to agree in person and number, not with its object, but with its subject (strictly speaking, with its external argument):
ʕa-ddər ʕ-indza xaləd, I-went I-with Khaled.
nə-ddər n-indza xaləd, you-went you-with Khaled.
This seems to be vanishingly rare worldwide. The nearest parallels I have encountered are ones in which the comitative is expressed using a serial verb, but a closer look at the syntax and morphology of Korandje shows that indza is indeed a preposition, not a verb or a noun. Perhaps most strikingly, when you relativise on its object, you pied-pipe not only the preposition but the agreement marker on it too:
ʕan bạ-yu ʕ-indz uɣudz əgga ʕa-b-yəxdəm
my friend-s I-with whom PAST I-IMPF-work
"my friends with whom I was working"
Its historical source, proto-Songhay *ndá "with, and, if", was also a preposition, and did not display agreement. Comparative data makes it possible to reconstruct how this change took place: it developed out of a strategy, common in Berber and found in some Songhay languages, of expressing "I went with Khaled" as "I went, I and Khaled", which seems to be the result of reinterpretation of a postverbal subject as part of the adjacent comitative phrase. This development in turn provides the first attested way to reverse the well-known grammaticalisation chain "with" > "and". If you want to know more, read my article, which has just been published:

"How to make a comitative preposition agree it-with its external argument: Songhay and the typology of conjunction and agreement". In Paul Widmer, Jürg Fleischer, and Elisabeth Rieken (eds.), Agreement from a diachronic perspective, Berlin: De Gruyter, pp. 75-100, 2015. (offprints available on request - just email me.)

Here's the abstract:

This article describes two hitherto unreported comitative strategies exemplified in Songhay languages of West Africa – external agreement, and bipartite – and demonstrates their wider applicability. The former strategy provides the first clear-cut example of a previously unattested agreement target-controller pair. Based on comparative evidence, this article proposes a scenario for how these could have developed from the typologically unremarkable comitative and coordinative strategies reconstructible for proto-Songhay, in a process facilitated by contact with Berber. The grammaticalisation chain required to explain this has the unexpected effect of reversing a much better-known one previously claimed to be unidirectional, the development COMITATIVE > NP-AND.

Sunday, June 21, 2015

Comparative Siouan Dictionary

A key document in Native American philology which has been circulating in samizdat form for decades is finally online and searchable: the multi-authored Comparative Siouan Dictionary (as noted by Guillaume Jacques). Named for the last of its speakers to resist colonization, the Sioux or Lakota, the Siouan family was spread over a vast section of North America, covering much of the Missouri and Mississippi valleys but with old outliers as far east as Tutelo in Virginia. The names of several Midwestern states derive from Siouan languages, so they make a convenient starting point for exploring the database. Minnesota is from Dakota mni sota "cloudy water", both elements of whose history you can trace back here to proto-Siouan: *waRé• "lake, water" and *(a)só•tE "hazy, bluish, cloudy". *waRé• also yields Chiwere ñį, which in combination with the Chiwere reflex of *parás-ka "spread > flat (1)" yields the name of Nebraska. Dakota, from a name of the Sioux, has a less venerable history, being traceable only back to proto-Mississippi Valley Siouan *hkota/*hkoRa/*hkora "friend", with unexplained internal variation and similar forms in other families suggesting the possibility of a loan. (The la- element might have something to do with fire; see John Koontz's discussion.) Kansas, Arkansas, and Iowa also have names of Siouan origin, but I can't find them in here; much work remains to be done, after all... For the relevant correspondences, a good starting point is Rankin et al. 1997, available from the same site.

The more adventurous may note that there are good prospects for going beyond proto-Siouan. It is generally accepted that Catawban is Siouan's nearest relative, and the database sometimes includes Catawba cognates (as under "lake, water" above), but makes no attempt at Proto-Siouan-Catawban reconstructions. (Work on Catawba continues, but some older materials are available online, eg Lieber 1858, Gatschet 1900). Beyond that, some work suggests that Siouan-Catawban is in turn related to what would otherwise be an isolate language - Yuchi, originally spoken in Tennessee and later forcibly relocated to Oklahoma. Efforts to find etymologies at that level have barely gotten off the ground (cf. eg Rudes 1974), but there are some promising ones, notably proto-Siouan *isá•pE "black" vs. Yuchi ispí (Elmendorf 1964). Even more implausible proposals, like the idea of a special relationship with the small Yukian family of California (Elmendorf 1963), could at any rate be reexamined in the light of this work.

Tuesday, June 02, 2015

The irrelevance of the standard in Algeria

I recently came across a nice little study of language attitudes among Kabyles in Oran, inheriting Kabyle from their parents and kin but living in an overwhelmingly Arabic-speaking context: Ait Habbouche 2013. The results will not come as a huge surprise to anyone familiar with Algeria, but they stand in stark contrast to a curiously widespread idea about Berber language endangerment: the notion that Berber is under threat from the government-imposed hegemony of Standard Arabic. What the survey answers reveal, time after time, is in fact the utter failure of government policies to create any meaningful space for Standard Arabic in daily life. It is no surprise to see that Standard Arabic is used by 0% of respondents with other Kabyles in the cafe or at home. But seeing that only 4% speak it even at work, and 0% in university, should be a shock to anyone who still imagines that Standard Arabic occupies a position analogous to, say, Standard German. The taboo on speaking Standard Arabic in any but the most formal quasi-academic conversation remains nearly absolute; 73% rated it as the language they used least. The only topics surveyed for which this option was selected by any significant number were religion and politics, and actual usage in both cases would probably reveal a mix of Standard words into a basically dialectal matrix. There are absolutely no signs that this group is shifting to Standard Arabic, or even sees this as a viable possibility. The language that has attained a large usage among these speakers, even with other Kabyles, is not Standard Arabic but Algerian Arabic - a language with no official status taught in no school, which was the least likely (2%) of any of the available languages to be rated as most beautiful or richest, and was rated by 42% as the language they liked least (nearly tied with Standard Arabic). 
Yet this little-loved language, dismissed as much by its speakers as by their rulers, is not only the main language they use with non-Kabyles but is extensively used even with fellow Kabyles (42% with their own siblings).

The utterly marginal status of Standard Arabic in conversation within this group (and elsewhere in Algeria) contrasts sharply with that of French. 22% of the sample claimed to address Kabyle strangers in French, and 26% to speak it with their friends. More tellingly, 38% chose it as the language they spoke at work, and no less than 68% for speaking about science. It's interesting to find an official language that doesn't dominate even in contexts like that! In short, while Standard Arabic is taboo for conversation, French is not. There are of course circumstances where it could be inappropriate, but there is no blanket ban as with Standard Arabic.

What does this imply for language policy? I'm no policy analyst, but here are my thoughts...

As far as the linguistic majority goes, only a spoken language can hope to displace French from the spoken domain, and long-standing efforts to break the taboo on speaking Standard Arabic have been utterly futile. Maybe it's time for those who want Arabic to be official in practice and not just in theory to acknowledge and support the existing complementary distribution of functions between Standard and Algerian Arabic, rather than treating the latter as some kind of unfortunate necessity. Demanding that officials consistently speak to the public in Standard Arabic instead of French is not always realistic, but demanding that they speak in a high register of Algerian Arabic could be. But that will only happen if people learn to value the language they speak, rather than dismissing it.

For the minority, it suggests that the main threat to Berber comes not from school, but rather from daily life in non-Berber-speaking environments. If so, solutions should focus less on making sure that Berbers can study Berber at school (though that is certainly desirable for other reasons), and more on getting non-Berbers in linguistically mixed contexts to study Berber and use it in conversation - almost the opposite of existing policy.

Friday, May 22, 2015

Old Arabic in Greek letters, in 3rd/4th century Jordan

An article published this year (Al-Jallad and Al-Manaser 2015) reveals the oldest known fully vocalised Arabic inscription by far - written in Greek letters in northeastern Jordan, probably in the 3rd or 4th century AD. Here it is: New Epigraphica from Jordan I: a pre-Islamic Arabic inscription in Greek letters and a Greek inscription from north-eastern Jordan. The inscription's author describes himself as "al-'Idāmī" - probably to be interpreted as "the Edomite" - a nisba featuring the definite article al-, unique within Semitic to Arabic.

There are a fair number of Arabic names transcribed in Greek at this period in various sources, but this seems to be the only known attempt to write Arabic text in Greek letters until much later. Most contemporary Arabic inscriptions were instead written in the Safaitic script, which does not indicate vowels. A text like this thus enables us to see much more clearly how the Arabic of the nomads of 3rd/4th century Jordan was pronounced. It confirms two crucial points. In Arabic, case is usually indicated only by final vowel choice; in this inscription, accusative case (-a) is clearly marked, but the Classical nominative and genitive (-u, -i) are not transcribed, suggesting that this dialect had dropped final short high vowels and thus developed a case system like that of Geez. Also reminiscent of Geez is the fact that intervocalic semivowels elided in Classical Arabic were unambiguously pronounced - thus 'atawa rather than 'atā for "he came". There may well be more material like this out there in the deserts on the Syrian-Jordanian border; let's hope research on the Syrian side becomes possible again soon...

Incidentally, next week I'll be at Bucharest for AIDA - if you're there, come to my talk on Wednesday!

Sunday, May 10, 2015

How to remember numerals better

In all the debate around "Whorfian" effects of language on cognition, one relatively well-known case has received oddly little attention among linguists, despite being widely discussed by psychologists and popularised by Malcolm Gladwell: the effect of word length on short-term memory (Baddeley et al. 1975). Basically, all other things being equal, it's easier to remember a sequence of short words than a sequence of long words. This suggests that our short-term memory for words (what psychologists confusingly call phonological memory) has a capacity limited by length - specifically, the amount that can be pronounced in about 2 seconds (Schweickert & Boruff 1987). That should suggest, in particular, that numbers presented orally will be easier to remember in a language with short numerals than in one with long numerals. (Note that this affects, among other things, IQ test results, since IQ tests typically include tests of numeral recall.)
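To make the arithmetic concrete: if capacity is roughly whatever can be pronounced in a 2-second window, the predicted span is just that window divided by the average time it takes to say one item. Here is a minimal sketch of that idealised calculation. The per-digit durations are made-up round numbers for illustration only, not measurements from any of the studies cited, and the function name is my own.

```python
def predicted_span(seconds_per_item: float, window_seconds: float = 2.0) -> float:
    """Idealised span: how many items fit into the rehearsal window.

    window_seconds defaults to the roughly 2-second figure mentioned above;
    real recall depends on other factors too, so treat this as a rough bound.
    """
    if seconds_per_item <= 0:
        raise ValueError("seconds_per_item must be positive")
    return window_seconds / seconds_per_item


# Hypothetical durations, chosen only to show the direction of the effect.
short_numerals = predicted_span(0.25)  # a language with quick digit names
long_numerals = predicted_span(0.5)    # a language with slow digit names
print(short_numerals, long_numerals)
```

On this simple model, halving the time it takes to say a digit doubles the predicted span.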

Psychologists followed up on this by attempting to test this hypothesis with a number of language pairs (for an overview, see Baddeley (1997). Disclaimer: I'm not a psycholinguist, and the following references are certainly not exhaustive). The best-tested and most consistent result concerns Chinese. Mandarin and Cantonese numerals take less time to say than English ones, and a number of psychologists have accordingly confirmed that Chinese speakers can remember longer numerals than English speakers (Stigler, Lee, & Stevenson (1986), Hoosain & Salili (1987)), even at 4 years old (Chen and Stevenson (1988)), and that this applies even when bilinguals are tested across their two languages (Hoosain 1979). It goes further than that, in fact: Chincotta & Underwood (1997) find that, out of Cantonese, English, Greek, Finnish, Swedish, and Spanish, only Cantonese speakers remember significantly more digits than speakers of other languages - and that this difference disappeared if the subjects were prevented from rehearsing the numbers auditorily by being asked to keep repeating "la-la" while being tested, proving its linguistic nature. The difference ranges around 2 digits, with the exact figure depending on the experiment.

Data for other languages is less clearcut. Welsh numerals take longer to say in isolation than English ones, and Ellis & Hennelly (1986) accordingly found that English-Welsh bilinguals can on average remember longer numerals in English than Welsh. Naveh-Benjamin & Ayres (1986) simultaneously tested the hypothesis for university students in Israel speaking English, Spanish, Arabic, and Hebrew natively (but excluding the digits "seven" and "zero"). They found that the average number of digits recalled was highest in English (7.21), followed by Hebrew (6.51), then Spanish (6.37), and lowest in Arabic (5.77); the ordering by average number of syllables per digit, or by average time taken to read a digit, was English, Spanish, Hebrew, Arabic. However, the difference in number of digits recalled was smaller than predicted by the time taken to read a digit in each language, suggesting that other factors were also relevant.

A proviso is necessary: some recent work, without disputing the differences observed, has made a strong case that they relate not simply to length (Lovatt, Avons, & Masterson 2000), but crucially to phonological factors (Service 2010, Lethbridge, Hinton & Nimmo 2002). This has been argued for Welsh numerals vs. English ones by Murray & Jones (2002), who find that Welsh digits take longer to say in isolation but actually take less time to say in connected speech than English ones, and that changes of place of articulation at word boundaries negatively affect memory.

The research is curiously selective in terms of languages examined, and many of the experiments don't control for all possible confounding factors, such as diglossia and social status in the case of Welsh or Arabic. Nevertheless, it does at least seem well-established that speaking Chinese gives a short-term digit memory advantage over speaking major European or Semitic languages. So, if for some reason you regularly need to remember long numerals, and your preferred language doesn't happen to be Chinese, how do you compensate for this handicap?

There are two obvious ways to get around this (assuming you care enough about remembering numerals to want to, which depends very much on your tastes and circumstances). One is to remember the number visually (as a sequence of written digits) or even kinesthetically (as a sequence of typing actions), in which case this particular constraint no longer applies (cf. e.g. Olsthoorn, Andringa, & Hulstijn 2012). This only helps, however, if you remember numerals better visually or kinesthetically than auditorily, and my impression is that most people don't.

A probably more helpful alternative is to establish a code that lets you turn long numerals into much shorter words by identifying digits with single letters or single phonemes. This solution has a very long history in Arabic and Hebrew, in which each letter of the alphabet can be used to represent a number: 'a is 1, b is 2, etc. (the first 9 letters are units, the next 9 are tens, and the rest are hundreds). Since short vowels are not letters, the resulting word can be given whatever vowels the user sees fit to give it. A common game of later poets using the Arabic script was to encode the date of their poem within the poem as a chronogram; more practically, Moroccan schoolchildren used to memorise the multiplication tables as a series of meaningless words formed by this encoding (Meakin 1905). Chronograms have been formed using Roman numerals, but for memorisation, at least, they are rather ill-adapted to such a system - think how much padding would be required to turn a number like MDCCCLXXXIII into words.
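For the curious, the letter-value scheme just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: I use the Eastern (Mashriqi) abjad ordering, whereas the traditional Maghrebi ordering assigns several of the later letters differently, and the function names are my own.

```python
# Arabic letters in Eastern abjad order (assumption: Mashriqi ordering).
ABJAD_ORDER = "ابجدهوزحطيكلمنسعفصقرشتثخذضظغ"

# 1-9, then 10-90, then 100-900, with the 28th letter standing for 1000.
VALUES = [unit * 10 ** power for power in range(3) for unit in range(1, 10)] + [1000]
ABJAD = dict(zip(ABJAD_ORDER, VALUES))


def abjad_value(word: str) -> int:
    """Sum the letter values; anything outside the table (e.g. vowel marks) counts as 0."""
    return sum(ABJAD.get(ch, 0) for ch in word)


def abjad_encode(n: int) -> str:
    """Turn a number from 1 to 1999 into a consonant skeleton, largest value first."""
    if not 0 < n < 2000:
        raise ValueError("this sketch only handles 1..1999")
    letters = []
    for letter, value in sorted(ABJAD.items(), key=lambda kv: -kv[1]):
        if value <= n:
            letters.append(letter)
            n -= value
    return "".join(letters)


skeleton = abjad_encode(1883)  # the MDCCCLXXXIII of the Roman-numeral example
print(skeleton, len(skeleton), abjad_value(skeleton))
```

Under these assumptions 1883 comes out as just four consonants, to be vowelled however the user likes, against twelve Roman numeral signs.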

However, the spread of Hebrew studies in Western Europe following the Renaissance, and the increasing importance of memorising statistics there, encouraged European mnemonists to look for ways of emulating this encoding without having to learn a Semitic language. Doing so at a time when place notation was widely used, they introduced a crucial improvement: each consonant represented a digit in a place notation system, rather than a number in an additive notation system. After various cumulative efforts at improvement, this culminated in the early 19th century with the so-called Major system: 0=s/z, 1=t/d, 2=n, 3=m, 4=r, 5=l, 6=š/ž/č/j, 7=k/g, 8=f/v, 9=p/b, with vowels, semivowels, and laryngeals ignored. To remember 94801 (LACITO's zip code), for example, one would turn it into "professed". This system apparently remains in use among professional mnemonists to this day, despite being virtually unknown to wider society.
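The Major system mapping is simple enough to write down as a lookup table. Below is a minimal sketch in Python, using the consonant symbols exactly as listed above. It works on a rough phonemic transcription rather than on spelling (so the double s of "professed" counts once, and its final -ed counts as t); the transcription and the function name are my own assumptions, not part of any standard tool.

```python
# Consonant sound -> digit, following the list above.
# Keys are phonemic symbols, not letters of English spelling.
MAJOR = {
    "s": "0", "z": "0",
    "t": "1", "d": "1",
    "n": "2",
    "m": "3",
    "r": "4",
    "l": "5",
    "š": "6", "ž": "6", "č": "6", "j": "6",
    "k": "7", "g": "7",
    "f": "8", "v": "8",
    "p": "9", "b": "9",
}


def major_decode(phonemes: str) -> str:
    """Read off the digits of a transcribed word; vowels and other sounds are skipped."""
    return "".join(MAJOR[p] for p in phonemes if p in MAJOR)


# "professed", transcribed roughly as it is pronounced (my own transcription).
print(major_decode("prəfɛst"))
```

Going the other way, from digits to a memorable word, is the hard part, since it means searching a pronouncing dictionary for words whose consonants decode to the target number.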

Perhaps this is why linguists haven't paid more attention to the word-length effect in the context of the Whorfian debate: it's a clear-cut effect of language on cognition, but not a very profound one, in that it should be fixable by some very simple hacks (or even just by borrowing someone else's numerals). But I'm not aware of any experimental work testing the effect of this particular hack on digit recall...

Monday, May 04, 2015

Foucauld's Tuareg (Tamahaq) dictionary on Wikisource

A reader of this blog, Julian Jarosch, wrote in to announce a collaborative project that will very likely interest some other readers:
The purpose of writing to you is to ‘promote’ a project I started some years ago: digitizing Charles de Foucauld’s Dictionnaire touareg – français on Wikisource. You probably already know that Wikisource, like Wikipedia, is an open and collaborative project. So far, I’m working on this alone. I’ve transcribed 13% of the text, almost all of which is not yet proofread. Wikisource provides quality management tools, so each page is marked and colour-coded for proofreading status.
The digital text has some cross-references as links; more could be added once it is complete. All Berber words and phrases are marked as such in the html code. I’ve appended an ebook version of the digital text, generated automatically from the online version, to demonstrate just one derived usage. Deriving a print edition or an enriched structured XML version should be feasible as well. I also experimented with automatically ‘updating’ Foucauld’s mode of transcription, but this proved to be too complicated, due to the ambiguities in his use of 〈i〉 and 〈ou〉.
I hope you find this useful and solid work. In principle, I’d like to spread word about this project in Berber linguistics; I just hardly get around to it, since I pursue this in my spare time.

For anyone who wants to help with this project, the link is: Livre:Foucauld, Dictionnaire touareg.djvu.

Monday, April 20, 2015

Archaic and innovative Islamic prayer names around the Sahara, finally out

Just a quick alert: my article about Islamic prayer time names that I discussed here almost two years ago (post) is finally out! If your institution has a subscription, you can view it at the following link:
Archaic and innovative Islamic prayer names around the Sahara
Or you could email to ask me for a copy.

Saturday, April 18, 2015

Dreams and tales in Siwa and Ouargla

Valentina Schiattarella, who recently finished her PhD thesis on some aspects of Siwi grammar, has gathered the first serious collection of Siwi folk tales recorded in Siwi (forthcoming from Köppe some time soon). Like most languages, Siwi has opening and closing formulae to mark the beginning and end of a tale. The commonest closing formula in the stories she's recorded seems to be:
ħattuta ħattuta qaṣṣaṛ ʕṃəṛha, akəṃṃus n xer i ənšni, akəṃṃus n šaṛ i ntnən
Hattuta hattuta its span has shortened[Ar], bundle of good for us, bundle of bad for them
The first part of this is in Arabic, and is not too different from what you might hear elsewhere in Egypt: ħattuta ħattuta is a corruption of حدوتة ħadduta, Egyptian Arabic for "story". (For similar formulae in Palestinian tales, such as tūtū tūtū faraɣat il-ħaddūtu, see Sirhan 2014.) The second part is in Berber, and hence presumably has an older history within Siwi; it is precisely paralleled in an opening formula used at Ouargla (Algeria):
Ṛəbbi yəttamən f lxiṛ ụhụ f ššəṛṛ, lxiṛ nn-iw, ššəṛṛ nn-əs, ini yiwi-tən gaɛ
God believes(?) in good not in bad; the good for me, the bad for him, or may He take them both
Basset (1920:107) places this formula in a wider context; throughout the Berber world, opening or closing formulae commonly take the form of "propitiatory formulae or formulae for the expulsion of evil", which he takes to indicate that the act of storytelling must have been viewed as potentially dangerous. Alongside Ouargla, he cites Kabyle examples blessing the group and cursing the jackal, and Shilha ones wishing the teller the meat and the others the tripes. The Siwi formula, however, is far closer to the Ouargla one than to anything else Basset mentions. And whereas the Kabyle formula invokes an animal whose importance in Berber folklore and mythology is obvious, and the Shilha one remains close to everyday life, all the key words of the Ouargli and Siwi formulae are specifically Arabic and religious (Rabbī "my Lord", khayr "good", sharr "bad"). This suggests that, while the idea may be Berber, the formulation itself might be taken from Arabic.

As it happens, the early Islamic period furnishes us with just such a formula in Arabic, in a similar but curiously different context. The still widely used Interpretation of Dreams, attributed to Ibn Sirin, explains in its introduction that a dream interpreter who does not want to reveal his interpretation to his client should instead tell him the following: "May good be for you and bad be for your enemies; may you receive good and avoid bad" (خير لك وشر لأعدائك، خير تؤتاه وشر تتوقاه), or, if the interpretation concerns the interpreter too: "May the good be for us and the bad be for our enemies (etc.)" This expression is also found in an unmistakeably related context in some dubious hadiths reporting Umar ibn al-Khattab as saying "Learn to read the Qur'an in Arabic, and the interpretation of dreams, and say: May good (khayr) be for us and bad (sharr) for our enemy", and: "If one sees a vision and recounts it to one's brother, let him say: May good be for us and bad for our enemy".

The obvious interpretation is that, at some point in the early history of these Saharan oases, the act of telling tales was locally assimilated to the act of recounting dreams, allowing the Arabic formula for the latter to be adopted for the former. It would be interesting to know why this happened; was the idea that a tale, no less than a dream, somehow contained cryptic clues about the future? Or did Saharan Berbers in late antiquity make a habit of recounting dreams to one another on winter evenings, as well as folktales? Unfortunately, we'll probably never know for sure, but it can be interesting to speculate...

Saturday, April 04, 2015

Improving language?

In a natural segue from Ibn Khaldun, I've been reading Ernest Gellner - specifically, Words and Things, his attack on Linguistic Philosophy (that is, on Wittgenstein and his followers at Oxford). As he presents it, Linguistic Philosophy amounted to, essentially, the descriptive study of lexicography and semantics. Since meaning is defined by usage, any statement that would be accepted as true in ordinary language is ipso facto true, and any philosophical argument suggesting otherwise can only be the result of some semantic misunderstanding; a philosopher's only legitimate goal is to figure out how words are used in ordinary language to prevent such misunderstandings. The key weak point of this view, for Gellner, is its underlying assumption that ordinary language is unimprovable:
To "observe how we use words" is to make statements, in ordinary language, about the role, function, effects, and context of expressions. But in doing this, the concepts and presuppositions of that ordinary language are taken for granted and insinuated as the only possible view [...] It is true that certain things may be said in favour of ordinary language. It would not be in use, and it would not have survived were it not wholly without merit. But this argument, as in politics where it is often used to buttress conservatism, proves fairly little. Very silly and undesirable things often survive, and neither society nor language is such a tightly integrated whole as would disastrously suffer from alteration of some one part. (pp. 195-197)
For Gellner, contra Wittgenstein, ordinary language can be improved upon by the very activity of reflecting on it, leaving a positive role for philosophy after all:
[T]here are many language games which become unworkable when properly understood: where self-consciousness not merely does not "leave everything as it is" but simply necessitates change. Many "conceptual systems", in primitive societies and in advanced ones, contain confusions and absurdities which are essential for their functioning. To lay them bare is to make such a framework unworkable. (p. 206)
The notion of improving language (my paraphrase) would need a lot more working out than I see in this book, but presumably means something like "make the concepts and presuppositions underlying language use more internally coherent and in better accord with non-linguistic experience."

Such a standard would not necessarily imply that one language can be superior to another. For one thing, while such concepts and presuppositions certainly play a role in language use, they don't seem to be critical to the definition of a language; you can change them and leave the language sufficiently intact to be mostly understood by speakers who have retained the old ones. A single language has room for many different kinds of language use.

However, it would suggest a potentially interesting alternative to a purely descriptive approach to linguistics. If Linguistic Philosophy was the effort to identify ways in which attention to ordinary native speakers' usage might correct misunderstandings embedded in philosophical thought, would Philosophical Linguistics be the effort to identify ways in which attention to philosophical thought might correct misunderstandings embedded in ordinary native speakers' usage?

Saturday, March 14, 2015

Sapir-Whorf is no shortcut

Lately the Sapir-Whorf hypothesis - that the language you speak influences the way you think - has had a bit of a revival; investigators such as Boroditsky or Levinson have finally managed to demonstrate small Whorfian effects on colour perception and sense of direction. Unfortunately, these successes only underscore how difficult it would be to make a convincing case for the version of this idea that perennially fascinates the public: the idea that language determines aspects of our worldview. Well before Sapir or Whorf, Nietzsche summarises it in Beyond Good and Evil:
"The strange family resemblance of all Indian, Greek, and German philosophizing is explained easily enough. Where there is affinity of languages, it cannot fail, owing to the common philosophy of grammar - I mean, owing to the unconscious domination and guidance by similar grammatical functions - that everything is prepared at the outset for a similar development and sequence of philosophical systems; just as the way seems barred against certain other possibilities of world-interpretation. It is highly probable that philosophers within the domain of the Ural-Altaic languages (where the concept of the subject is least developed) look otherwise "into the world", and will be found on paths of thought different from those of the Indo-Germanic peoples and the Muslims [...]" (Walter Kaufmann's translation)
If a community's grammar really does affect its worldview, two centuries of speculation have hardly brought us any nearer to proving it, much less figuring out how. The commonsense converse, that a community's worldview affects its grammar, is rather better supported. But this idea's attraction for intellectuals, I think, is basically technological: it holds out the promise of being able to change the way people think "just" by changing the way they talk, as envisioned for Newspeak and Láadan. Ironically, it's observably true that imposing a new language on a previously monolingual community usually implies major changes in the way they think - that's what happens when you introduce compulsory schooling - but that has less to do with the language than with the institutions diffusing it.

The technological question remains, then: can we redesign some aspects of our language to help us think more effectively?

For grammar, the answer is not obvious. For the lexicon, however, the answer is yes, and we do it all the time. If something seems to need a name, we give it one - "mouse" or "selfie". Sometimes we choose a name that transparently encodes a property of this item that's particularly important to remember - "henbane" or "fool's gold". Ask any taxonomist whether the existence and form of a name matters, or any mathematician whether all notations are equal.

But this isn't actually the shortcut that some science fiction would have us believe. Many readers probably know that "henbane" is some kind of plant, but couldn't identify it if it was sitting in front of them, much less take advantage of knowing the name to prevent some unfortunate fowl's death. Understanding a given domain requires you to have words for the items signified by its technical vocabulary, but the most important part of that is learning to identify and think about the referents. Hundreds of New Age texts attest to the fact that you can use the vocabulary of quantum mechanics without understanding the first thing about it.

This points the way towards a solution, but not a very linguistic one: If you want to make your language better for thinking with, then first learn to perceive and think about the world more clearly yourself, and then share what you learn (and the labels you've given to it) with other interested speakers. Make a point of spotting and labelling relevant differences between things or situations, and involve yourself in a wider range of situations than you're used to. A sign is a link between word and world - between the set of all possible combinations of phonemes, meaningless in themselves, and the set of everything the speaker has some idea how to recognise. Expanding the former is meaningless unless you're expanding the latter.

Saturday, March 07, 2015

Ibn Khaldun: Arabic dialects are independent languages

In Part 39 of the Muqaddimah, written in 1377, Ibn Khaldun discusses Arabic dialectology and language contact, reaching substantially correct conclusions marred only by the lack of attention to the role of purely internal developments in language change. The section is worth reading, if you haven't already come across it; it gives some idea of just how divergent the different Arabic "dialects" already were in his time. Like a lot of his work, if he had written it today, it would get many Arab nationalists up in arms! The translation is my own, and needs double-checking - appropriately, the Arabic of Ibn Khaldun is often difficult for modern Arabic readers.

"That the language of the city dwellers and townsmen is a language independent of the language of Mudar [Classical Arabic]

Know that the customary medium of discourse in the towns and among the city-dwellers is not the old language of Mudar, nor the language of the people of the generation (of Arabs). Rather, it is a different language, independent, and far from the language of Mudar and of this generation of Arabs in our time. Indeed, it is further from the language of Mudar (than the language of modern Arabs is).

The fact that it is an independent language is obvious; witness how many changes it has which grammarians consider as solecisms. Nevertheless, it varies in its expressions depending on the town. The language of the Mashriq is somewhat different from that of the Maghreb, and likewise that of Andalus from both. Yet each succeeds, with his own language, in realising his purpose and expressing what is within him. That is what is meant by "tongue" and "language". The loss of case-/mood-suffixes is not a problem for them, as we have already said regarding the Arabs of the present day.

As for the fact that it is further than the language of this generation (of Arabs) from the original language, that is because distance from the language depends on mixing with non-Arabness. The more one mixes with non-Arabs, the further one gets from the original tongue, because habits are acquired by learning, as we have said, and this (linguistic) habit is a mixture of the original habits which the Arabs had and the secondary habits which the non-Arabs had. So the more they hear it from non-Arabs and grow up with it, the further they get from the original habit.

You may observe this in the towns of Ifriqiya and the Maghreb and Andalus and the Mashriq:

  • As for Ifriqiya and the Maghreb, the Arabs there mixed with the non-Arab Berbers as they spread their civilisation among them. Hardly a town or a generation was isolated from them. Thus non-Arabness came to predominate over the Arab tongue which they had had. It became a different, mixed language, within which non-Arabness predominated for the reasons outlined. So it is further from the original tongue.
  • Likewise the Mashriq. When the Arabs prevailed over its nations, the Persians and the Turks, they mixed with them. Their languages then spread among them through the labourers and farmers and captives whom they took as servants and nannies and wet-nurses. As a result, their own language was corrupted by corruption of their (linguistic) habits, until it became a different language.
  • Likewise the people of Andalus, with the non-Arab Galicians and Franks.

All the people of the towns from these regions came to have a different language, specific to them and distinct from that of Mudar [=Classical Arabic], and distinct each from the other - as we shall recall. It is as if it were a different language due to their generations' mastery of the linguistic habit of it. And God creates and decrees what He will."

Thursday, December 11, 2014

A Mexican colony in Louisiana before Columbus?

In the latest issue of the International Journal of American Linguistics, Cecil Brown, Soren Wichmann, and David Beck announce a rather interesting finding: that Chitimacha [is] A Mesoamerican Language in the Lower Mississippi Valley. I don't know much about any of the languages involved, but insofar as I can judge it, it strikes me as quite convincing. They find 91 cognates between Chitimacha, a language of southern Louisiana, and Totozoquean, a language family of southern Mexico consisting of Totonacan and Mixe-Zoquean. Most of these cognates are very straightforward, with identical meanings and obviously similar, regularly corresponding sounds, and 36 of them involve words basic enough to be on the 100-word Swadesh or Leipzig-Jakarta lists. The grammatical similarities are rather less extensive, but there are a few. So, pending other specialists' comments, it looks like Chitimacha was brought to Louisiana by a migration across the Gulf of Mexico, from somewhere around the Isthmus area.

There is some useful shared cultural vocabulary, including "paper", "to write", "lime", "maize (corn)", "leached corn", and "to shell corn", and it looks like Caddo - spoken just upriver - in turn borrowed much of its maize-related vocabulary from Chitimacha. In combination with archeological evidence, this leads the authors to favour a migration date either some time around 850 AD, when the Caddo began low-level maize cultivation, or sometime around 1200-1450 AD, when they intensified it. Such a late date seems a little troubling, given how few cognates are to be found; Korandje separated from Songhay around 1200 AD, and there are well over 200 shared items there, mostly belonging to basic vocabulary. The ancestor of Chitimacha would have to have already been rather different from any other Totozoquean language even before they reached Louisiana; but then why did they apparently leave no trace in Mexico itself? Perhaps a study of southern Mexican place names could shed some light on the question.

This looks like historical linguistics at its best: a surprising long-distance connection affecting both language and culture. Now it's up to the historians and archeologists to fill in the gaps: why did southern Mexicans find it worth while to cross the Gulf to Louisiana in significant numbers?

Sunday, November 30, 2014

Good prescriptivism?

People tend to enter their first linguistics classes with a vague but strongly felt idea, instilled by English teachers or by society at large, that some ways of speaking are bad, illogical, sloppy, rule-breaking, etc. One of our first tasks is thus to explain to them that, actually, such ways of speaking are just as logical and law-governed as standard English; they're simply obeying a different set of rules. Not infrequently, we follow that up by telling them everything that's wrong with the prescriptive rules of Standard English, based ironically on a very similar set of tropes: they're illogical (stop splitting infinitives because you can't do that in Latin), they're historically inaccurate (don't use singular they even though the King James Bible does), they're incompatible with the rules of modern spoken English (e.g. "it is I") to the point of confusing speakers into gross solecisms ("they gave it to John and I"). Unless we're careful, the students end up walking away from all that with the impression that linguists think prescriptivism is bad, full stop. That, however, would be a mistake. As irritating as these problems and misconceptions are, they don't affect the case for having a prescriptive standard language - just the extent of its ambitions and the details of its usage.

Prescriptivism, of course, is all about power: who gets to talk how where, and who gets to say how they should talk. As good libertarians, our first reflex might be to say that this is all unnecessary: let everyone decide for themselves! That has two different problems. The first is that, when people decide for themselves, what they end up with is in fact a set of implicit rules for what's appropriate in which circumstances, and if you want to make life easier for visitors from other cultures, the least you can do is make those rules explicit somewhere. The other is that, in the event of any clashes, it's the more powerful individual that gets to decide, which is a particular problem in the case of public services. You want a driver's license, and you only speak English? Sorry, our local transport officials aren't really comfortable with English, so you'd better brush up on your Russian.

The latter example may sound like fantasy to American or English readers (not so much to the Irish or Welsh), but it's rather close to reality in a lot of the world. If you understand Arabic, have a look at this video of Moncef Marzouki, one of the two current presidential candidates in Tunisia, having a go at his Tunisian interviewer for using too many French words: "Respect the Arabic language! Plutôt, what does plutôt mean? You say plutôt, what's that? My sister in Douz won't understand plutôt. [...] [Interviewer: It's a chance for her to learn...] No, she needn't learn - you learn the language of Tunisians!"

It's populism, of course - but, like a lot of populism, it makes a good point. Why the heck should the average citizen have to speak a foreign language to deal with officials and other elites in his/her own country? (Especially in one as close to monolingual as Tunisia?) In such a situation, if the populace doesn't prescriptively impose their language preferences through concerted action, the bureaucracy will simply impose their own in one-to-one interactions.

Thursday, November 27, 2014

Berber subclassification: Reading Nait-Zerrad

Kamal Nait-Zerrad's 2001 article "Esquisse d'une classification linguistique des parlers berbères" presents a good deal of useful data, but does so in a manner that I find makes it rather difficult to figure out what's going on without plenty of pencil work. In case anyone else has the same experience, here's my take on it. I will not focus on, or even necessarily present, his interpretation here - read the article for that; rather, I'm more interested in figuring out the implications of the data he presents in the light of other work before and since, and in the light of accepted principles of historical-comparative linguistics.

First, he looks at a number of morphological and phonetic isoglosses:

1. The 3rd person singular preterite of CC verbs: yərra vs. yərru. Following Kossmann (2001), we now know that these are actually CC+glottal stop, so the data exemplifies two different sound changes: the relatively trivial *-aʔ > -a, and the more surprising *-aʔ > o > u. The former is the commonest outcome; the latter is exemplified by: Ait Seghrouchen, Figuig, Beni Snous, Bissa, Timimoun, Mzab, Ouargla, Nefusa. (Ghadames still has o).

2. The proximal demonstrative suffix: -a vs. -u. Again, -a is the default, but -u appears in the same set of varieties as seen in 1, plus one more: Iznasen.

3. The 3rd person singular aorist of CCV verbs: ad yəbḍu vs. ad yəbḍa. Here, -u is the default, and is closer to the original, while -a has spread from the preterite. The innovative -a is found in the same set of varieties as 2 (excluding Nefusi), plus several more: Rif, Metmata, Chaoui, Jerba.

4. Initial vowel dropping: a- vs. 0-. A number of *(t)a-CV-initial nouns drop the original vowel of the prefix in the same set of varieties as 3, plus Nefusi, Chenoua, and Siwa.

5. Velar softening: in many varieties, in many words, what would elsewhere be k/kk/g/gg corresponds to c/čč/j/ǧǧ. The latter outcome is observed in the same set of varieties as 4, minus Nefusi.

6. Final *-əv: this is retained as such in Ghadames and Awjila, and as contrastive length in Zenaga. Otherwise, it becomes -u in most varieties, but -i in the same varieties as listed in 4, plus El-Fogaha (with a few question marks where the author had insufficient data). Cf. Kossmann (1995).

All of 1-6 pick out Zenati varieties, but the exact set differs: 1-2 pick out a core Zenati consisting almost entirely of northern Saharan varieties, while 3-6 pick out a broader Zenati including the semi-arid mountainous lands stretching from the Rif to southern Tunisia, and vary in their inclusion of varieties further east (Nefusi, El-Fogaha, Siwi). Chaker (1972) cites 1-2 and 5 as possibly justifying a Zenati subgrouping, while Kossmann (1999) defines Zenati in terms of 3, 4, and one other morphological innovation, and then cites 5 and 6 as common phonological innovations.
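Since each of 1-6 is defined relative to the previous one, the pencil work can be reduced to a few set operations. The following is only a sketch of my reading of the lists above: it assumes that "Nefusa" and "Nefusi" name the same variety, it ignores the question marks in 6, and the variable names are my own.

```python
from collections import Counter

# Varieties listed under isogloss 1, minus Nefusi (added back below).
base = {"Ait Seghrouchen", "Figuig", "Beni Snous", "Bissa",
        "Timimoun", "Mzab", "Ouargla"}

# Each set holds the varieties showing the "Zenati" outcome of that isogloss,
# built up exactly as the prose above describes it.
iso = {}
iso[1] = base | {"Nefusi"}
iso[2] = iso[1] | {"Iznasen"}
iso[3] = (iso[2] - {"Nefusi"}) | {"Rif", "Metmata", "Chaoui", "Jerba"}
iso[4] = iso[3] | {"Nefusi", "Chenoua", "Siwa"}
iso[5] = iso[4] - {"Nefusi"}
iso[6] = iso[4] | {"El-Fogaha"}

# Varieties that take part in every one of the six isoglosses.
shared_by_all = set.intersection(*iso.values())

# How many of the six isoglosses each variety takes part in.
participation = Counter(v for members in iso.values() for v in members)

for variety, n in sorted(participation.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{n}/6  {variety}")
```

On this reading, the varieties sharing all six features are exactly the seven mainly northern Saharan ones, and the others fall away gradually rather than at a single sharp boundary.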

7. Negative intensive theme: retention/loss. The negative intensive is retained in northwestern Morocco (Rif, Iznasen, Senhaja, Ait Seghrouchen, Figuig); in Bissa; in Tuareg and in the nearby oases of Mzab, Ouargla, and Ghadames; and in Jerba. Its loss everywhere else (according to his data, which should be re-checked) shows no prominent genetic patterning, and hence is probably relatively recent.

Then, he moves on to vocabulary, examining 11 lexical variables which I would summarise as follows:

Several forms appear specifically Zenati: irəḍ in the sense of "be dressed" (though it is more widespread in other senses), igur for "go", əɣs for "want", azəgrar for "long", anilti for "shepherd". Of these, El-Fogaha and Siwa share only əɣs for "want", whereas Nefusi shares all except "go in". adəf "go in" is Zenati-specific in the west, but more confusing in the east, being attested in Ghadames and (as an alternative to əggəz) in Air Tuareg.

Several forms appear specifically Tuareg: răgăz for "go", amaḍan for "shepherd", əggəz in the sense of "go in" (elsewhere "go down"), zəgrət (with the extra t) for "long".

One form unites southern/central Morocco with Kabyle: awtul "hare" (vs. pan-Berber a-yərẓiẓ.)

A couple of forms unite Libyan varieties with Tuareg, contrasting with Algerian and Moroccan varieties, in defiance of any plausible genetic classification, reminding us that a tree does not tell the whole story here:

  • iziḍ "donkey" (Tuareg, Ghadames, Nefusi, Siwa, Awjila) vs. aɣyul (everywhere else except El-Fogaha)
  • tufat/tifut/tafyi "tomorrow" (Tuareg, El-Fogaha, Siwa respectively) vs. azəkka (everywhere else except El-Fogaha, Awjila, and Zenaga)

Based somehow on all this, he proposes the following very odd tree:

  1. Group 1
    1. Senhaja, Middle Atlas, Shilha, Kabyle, Zenaga
    2. Tuareg
    3. El-Fogaha
    4. Awjila
    5. Siwa
  2. Nefusa
  3. Ghadames
  4. Group 4 ("Zenati")
    1. Ait Seghrouchen, Beni-Snous, Timimoun, Figuig, Bissa, Mzab, Ouargla
    2. Iznasen, Jerba
    3. Rif, Metmata, Aures
    4. Chenoua

Apparently, to get this he operated by successively applying at each stage the criterion from his list that divided the data into the lowest number of groups possible, without attempting to distinguish innovations from retentions, much less judge the relative likelihood of independent innovation. The fact that even such a crude method was still able to produce a recognisable Zenati subgroup says something either about the robustness of this distinction or about the selection of features. What this data set actually tells us, bearing in mind that shared retentions have no implications for subgrouping and that Zenaga fails to participate in a number of innovations that otherwise seem pan-Berber or nearly pan-Berber, is something quite different:

  • There is definitely a Zenati subgroup, as has been known at least since Destaing (1915), but its boundaries are a bit fuzzy. (If this reminds you of the situation of "Hilalian" g-dialects, that's probably not a coincidence.)
    • Western Zenati:
      • Core (mainly Northern Saharan): Ait Seghrouchen, Figuig, Beni Snous, Bissa, Timimoun, Mzab, Ouargla
      • Transitional (the High Plateau and its edges): Rif, Metmata, Chaoui, Jerba
      • Peripheral:
        • Chenoua (north-central Algeria)
        • Nefusi (northwestern Libya)
    • Eastern Zenati (Libya/Egypt): El-Fogaha, Siwa
  • There is definitely a Tuareg subgroup, as has always been known: Ahaggar, Iwellemmeden, Air, Taneslemt.
  • There just might be a subgroup combining Kabyle with Senhaja, Central Morocco and Shilha: they share the innovation *-əv > -u, and the word awtul "hare". The evidence for it is very weak, though, especially since *-əv > -u is also found in some Tuareg varieties.

The rest of the common features almost all look like shared retentions.
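For comparison, here is a minimal sketch of the kind of splitting procedure that I guessed above lies behind Nait-Zerrad's tree. It is my reconstruction of the general idea, not his stated algorithm; the varieties A-D, the features f1 and f2, and their values are entirely hypothetical.

```python
def partition(varieties, feature, data):
    """Group varieties by the value they show for one feature."""
    groups = {}
    for v in varieties:
        groups.setdefault(data[v][feature], []).append(v)
    return list(groups.values())


def split(varieties, features, data):
    """Greedy divisive tree: at each stage, apply the unused feature that
    divides the current group into the fewest (but at least two) subgroups.
    No distinction is made between innovations and retentions."""
    candidates = []
    for f in features:
        parts = partition(varieties, f, data)
        if len(parts) > 1:
            candidates.append((len(parts), f, parts))
    if not candidates:
        return sorted(varieties)  # leaf: nothing left that divides this group
    _, best, parts = min(candidates, key=lambda c: c[0])
    remaining = [f for f in features if f != best]
    return {best: [split(p, remaining, data) for p in parts]}


# Hypothetical toy data: four varieties, two features.
toy = {
    "A": {"f1": "x", "f2": "p"},
    "B": {"f1": "x", "f2": "q"},
    "C": {"f1": "y", "f2": "r"},
    "D": {"f1": "y", "f2": "r"},
}
tree = split(["A", "B", "C", "D"], ["f1", "f2"], toy)
print(tree)
```

Note that nothing in the procedure asks which value of a feature is the older one, so a group held together only by a shared retention comes out looking just as solid as one defined by a shared innovation.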

Sunday, November 16, 2014

Out now: The development of dative agreement in Berber

After about two years in the pipeline, an article summarising the results of my British Academy research on agreement in Berber has just come out in Transactions of the Philological Society. If you have access to Wiley Online Library, you can read it online: The development of dative agreement in Berber: beyond nominal hierarchies. If you're interested but don't have access, email me to ask for a copy. Here's the abstract:

Diachronically, agreement commonly emerges from clitic doubling, which in turn derives from topic shift constructions (Givón 1976) – a grammaticalisation pathway termed the Agreement Cycle. For accusatives, at the intermediate stages of this development, doubling constitutes a form of Differential Object Marking, and passes towards agreement as the conditions for its use are relaxed to cover larger sections of the Definiteness and Animacy Scales. Berber, a subfamily of Afroasiatic spoken in North Africa, shows widespread dative doubling with substantial variation across languages in the conditioning factors, which in one case has developed into inflectional dative agreement. Examination of a corpus covering eighteen Berber varieties suggests that low Definiteness/Animacy datives are less likely to be doubled. However, since most datives are both definite and animate, these factors account for very little of the observed variation. Much more can be accounted for by an unexpected factor: the choice of verb. “Say” consistently shows much higher frequencies of doubling, usually nearly 100 per cent. This observation can be explained on the hypothesis that doubling derives from afterthoughts, not from topic dislocation.

Sunday, November 02, 2014

Linguistics for high schools: what would a syllabus look like?

Today, just for fun, I'd like to invite you to discuss a topic a little off the beaten track for this blog: how much linguistics should a high school graduate know? The question may seem bizarre - there have been occasional efforts to introduce linguistics courses into high schools (MIT, Milwaukee), but you don't expect to see "linguistics" on a high school curriculum. Still, let's not get confused by labels. Linguistics is inextricably woven into language teaching, and even the most resolutely monolingual curriculum includes at least the school's own language. (I recently happened to come across an 8th grade final exam from 1895 from Kansas; no foreign languages were featured, but no less than two out of the six subjects tested, Grammar and Orthography, rely heavily on linguistic concepts.)

One useful way of separating linguistic education from language education is to look at universality. Some of what you learn in English class is useful across practically all languages, like the idea of a verb or of a vowel. Some of it is much more parochial; the fact that the plural of "child" is "children" is a historical accident relevant only to English and, at best, its closest relatives. Such parochial facts can be vital, of course; if you're going to grow up in an English-speaking country, you'd better be able to form your English irregular plurals correctly. But the more general concepts have a deeper interest; they help you analyse what you're saying, and make it easier to learn new languages. Unfortunately, those concepts are precisely the ones that have suffered most in recent decades. In the UK, at least, my own experience suggests that most high school graduates can't even reliably tell a noun from a verb. In theory, the latest changes to the English syllabus should change that - but given that many of the teachers were hardly taught any grammar either, one wonders how successful the reform will be.

In any case, if I were designing a syllabus, here is what I would suggest to start with. I'd be interested to see what other linguistically oriented people think:

Phonetics has never been a focus of early education, apart from the minimum necessary for teaching a child to read and write (and even that gets de-emphasised in some approaches). This is a shame, because the younger you are, the easier it is to learn to hear and pronounce unfamiliar sounds. Why not learn:
- The IPA, or at least the most commonly used symbols in it; be able to pronounce and recognise them. This should include tone if at all possible.
- Basic articulatory phonetics: how the configuration of your vocal organs relates to the sound produced, and how to use this knowledge to pronounce unfamiliar sounds. (If your language uses Devanagari, you should have an advantage, as this is practically built in to the alphabet anyway; students of tajweed too will come across this issue at some point.)
- Phonology: the concepts of the phoneme and of conditioned allophones. That way when you learn another language you'll at least know why some sounds give you so much more trouble than others.
- Metric structure: syllable, foot, etc. (Yes, I know the concept of syllable is controversial, but you'll need this to be able to study poetry anyway.)

Morphology is a lot more language-specific than the other topics here, but one should at least know:
- How to decompose a word into its component morphemes (prefixes, suffixes, templates, roots...), and guess its meaning from them if necessary.

Syntax: Unlike phonology, this has traditionally been deliberately taught, and you should certainly know:
- The parts of speech: noun, verb, adjective, preposition, etc... and how to tell them apart.
- Argument structure and case: subject, direct object, nominative, accusative, etc.
- How to break down a sentence into its phrase structure: what modifies what? What is a phrase, and what is its head? For best results, try being able to diagram it.

Unfortunately, it's not quite so simple: all three of those - especially the latter - are the subject of major controversies between different syntactic theories... (Two good Language Log posts on this issue: parts of speech and sentence diagramming.) If you teach whatever theory happens to be traditional where you're from, you may not make any friends in academia, and you risk perpetuating some old misconceptions; but you will certainly leave your students much better prepared to learn any more current theory - or any language - than if they had studied no grammar at all.

Historical linguistics and sociolinguistics: The language you speak most likely has relatives, and certainly contains words borrowed from other languages. You should understand:
- That there is normally variation inside a single language, which people often use to signal their social position and to identify the social position of others, and over which people's control is limited.
- That languages change over time as some variants become obsolete and others emerge, and in what ways they change - sound shift, semantic shift, borrowing, morphological and syntactic change...
- That different changes accumulating in different areas can split what used to be one language into several, and that people can abandon one language and start speaking another one instead.
- That sound shifts are usually regular, and that this regularity can be used to identify potential cognates (making it easier to learn languages related to ones you know.)
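To make the last point concrete, here is a toy illustration of how recurring correspondences show up when you line up related words. It is only a sketch: it uses a handful of well-known Latin/English pairs (the Grimm's Law textbook cases), compares spellings rather than actual sounds, looks only at the start of each word, and the function names are my own.

```python
from collections import Counter

# Well-known Latin/English cognate pairs, compared in ordinary spelling.
PAIRS = [
    ("pater", "father"), ("pes", "foot"), ("piscis", "fish"),
    ("tres", "three"), ("tu", "thou"),
    ("cornu", "horn"), ("centum", "hundred"),
]


def initial(word, digraphs=("th",)):
    """Return the word's first consonant symbol, treating 'th' as one unit."""
    for d in digraphs:
        if word.startswith(d):
            return d
    return word[0]


def initial_correspondences(pairs):
    """Count how often each (Latin initial, English initial) pairing occurs."""
    return Counter((initial(a), initial(b)) for a, b in pairs)


counts = initial_correspondences(PAIRS)

# A pairing that recurs across several words is a candidate regular
# correspondence; a one-off match could easily be chance.
recurrent = {pair for pair, n in counts.items() if n >= 2}
print(sorted(recurrent))
```

A student who has noticed that Latin p regularly lines up with English f is in a much better position to guess what an unfamiliar word in a related language might mean.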

There should certainly also be some semantics and pragmatics in this list, but I'm not feeling especially inspired on either subject at the moment - any thoughts?

Thursday, October 30, 2014

Some Tuareg-Songhay loans

I'm almost three-quarters of the way through Heath's Grammar of Tamashek (Tuareg of Mali). The main interest lies in its efforts to reduce the bewildering complexity of Tuareg morphology to some sort of order, an impossible task which it accomplishes more successfully than any other Tuareg grammar I've looked at so far. Aside from this, however, it's raised some interesting etymological issues.

I've wondered for years where the Korandjé verb wəy "gather (firewood)" comes from. It normally appears in the idiom a-wwəy-ts skudzi [3Sg-gather-hither wood] "she gathered in firewood". On p. 333 of Heath's grammar, I found the explanation, in the following example:

i-wwáy=ədd i-sǽɣer-æn
3MaSgS-bring.Reslt-Centrip Pl-firewood-MaPl
[He] has brought firewood here.

The Tamasheq verb in question, awəy in the imperative, is simply the normal Berber word for "take, bring" (which in Korandjé is expressed with a Songhay verb, zəw), so I would have hesitated to connect them based on a dictionary entry alone. But given this attested usage with "firewood", the semantic specialisation poses no problems. What does surprise me is that it was borrowed as a bare stem, rather than with a fossilised 3rd person prefix y/i - contrast yəf (Tashelhiyt y-arf "roast", not attested in Tamasheq), ikna "make" (Tamasheq i-kna). Usually, only stems that start with a syllabic onset are borrowed into Korandjé without the y/i.

Another probable loan into Korandjé that I noticed going through the grammar is Korandjé ləwləw "shine, gleam" - cp. Tamasheq m̀ələwləw "shine".

However, a number of words have gone the other way - from Songhay into Tuareg. Heath comments on many of these in his dictionary (eg kə̀rikəw "practice sorcery"), but not all. One that struck me is the verb ḍùkr-æt "become angry at", obviously related to Gao Songhay dukur "be angry"; I don't recall seeing this verb elsewhere in Berber (not even in Alojaly's dictionary of Tamajeq), whereas it's widespread in Songhay.

Obviously cognate are Tamasheq é-tæqq "male ostrich" and widespread Songhay forms such as Gao taatagey, Fulan Kirya taataɣey "ostrich" (the shift of g to ɣ next to non-high back vowels is regular in several Songhay varieties, and in Tamasheq qq is the geminate equivalent of ɣ). The word is generic in Songhay but specific in Tuareg - the opposite of what we saw with "bring" - which suggests to me that it was borrowed into the latter, as does the fact that I don't find the term in Alojaly's Tamajeq dictionary. However, since ostriches are extinct in most Berber-speaking areas, it's difficult to prove the direction of borrowing.

Thursday, October 23, 2014

Berber: classification, Tasahlit, roots vs. stems

Today seems to be a good day for comparative Berber linguistics - the day's haul is worth sharing:

Maarten Kossmann has uploaded his preliminary classification of Berber varieties based on shared innovations: Berber subclassification (preliminary version). He divides Berber into seven blocks:

  1. Zenaga block (Zenaga of Mauritania, Tetserrét in Niger)
  2. Tuareg block
  3. Western Moroccan block (SW Morocco, Central Morocco, i.e. Tashelhiyt and most of Tamazight)
    possibly including NW Moroccan Berber (Ghomara, Senhadja de Sraïr)
  4. Zenatic block (Eastern Morocco, Western Algeria, Saharan oases, Tunisia, Zuara) extending towards the east with Sokna, Elfoqaha, Siwa
  5. Kabyle (N Algeria), possibly linked to the western Moroccan block
  6. Ghadames (Libya), probably to be linked to Djebel Nefusa (Libya)
  7. Awdjilah (Libya)
By and large, this appears very plausible, although it should be noted that Tunisian Berber and Zuwara are already somewhat peripheral to Zenati, not sharing western Zenati's innovative distribution of initial vowel dropping, and El-Fogaha is even more so than Siwa or Sokna. (As he notes, the much greater homogeneity and clearer boundaries of Zenati in the west imply that this group arrived in Algeria and Morocco from the east.) But, in principle, it is still necessary to identify specific innovations characteristic of each of these groups. It is also clear that the Zenaga block is by far the first split on the tree, and the list ought ideally to reflect that. But the moderately high degree of mutual intelligibility poses serious obstacles to applying the family tree model to Berber, as he discusses.

The most interesting Kabyle varieties for historical reconstruction are the little-known ones of the extreme east, "Tasahlit". As it happens, Abdelaziz Berkai has just uploaded his recent thesis, a dictionary and sketch grammar of the Tasahlit of Aokas: Essai d’élaboration d’un dictionnaire Tasaḥlit (parler d’Aokas)-français. The quality of his work appears excellent, and this will no doubt be a very useful resource. The choice of dialect, however, is not entirely ideal. It is clear from Basset's dialect atlas, and from the all too rare comments in Rabdi's grammar on neighbouring varieties, that the vocabulary of Aokas is still quite close to that of Bejaia; the really divergent varieties seem to be those of the Babor Mountains and Oued el Bared, approaching Jijel, and those are the ones most likely to give an insight into the dialect of the now largely Arabised Kutama.

I haven't yet had time to properly look at Samir Ben Si Said's thesis, De la nature de la variation diatopique en kabyle: étude de la formation des singulier et pluriel nominaux, but it tackles the synchronically as well as diachronically thorny problem of Berber non-concatenative morphology, and argues for an approach based more on roots than on stems, contrasting with another important study I've been working through lately, Heath's Grammar of Tamashek (Tuareg of Mali).

Tuesday, October 21, 2014

Subject-verb order in Tumzabt

Going through Brahim and Bekir Abdessalam's brief grammar of Tumzabt Berber (الوجيز في قواعد الكتابة والنحو الأمازيغية "المزابية": الجزء الأول) recently, I was struck by their discussion of the problem of subject-verb order. Berber in general allows both verb-subject and subject-verb order, with the case ("state") of the subject depending on which order is used. Determining which order is used under which circumstances, however, poses some difficulties; the same language may be described as VSO or SVO, depending on who you ask, and the determining factors certainly differ from one variety to another (cf. eg Mettouchi fc for Kabyle). Their take on the problem combines information structure with pragmatics and verbal mood. The latter two factors can very likely be reduced to information structure too, but that would require testing; in any case, the observation that VS order is required for serialization is interesting. Here's what they had to say, translated into English (pp. 129-130):

We observe that in the first set of examples, the subject precedes the verb; this is the usual form in an Amazigh clause consisting of a verb and a subject.

In the second set of examples, the subject follows the verb. This happens in the following cases:

  1. The subject may follow the verb when it is specific and known to the speaker and listener because there is a connection between speaking of it and a previous expression involving speaking of the same subject. For instance:

    twelleh! afunas-nni yetthaḍa - Watch out, that bull rampages.

    After the two parties have parted, they meet again the next day, and one says to the other:
    yak yhaḍ ufunas ay-tessečned asennaṭṭ! - Indeed that bull you showed me yesterday really did rampage!

    Here, the subject - the bull - is specific for both parties to the conversation in the second usage, since it had been spoken of earlier.

  2. For the sake of irony, which can only be deduced from the context surrounding this expression and from the circumstances of discourse, eg if we say:

    tiɣawsiwin-ess tqimant-edd ɣel wezğen, drus mi yefra igget, ay-tinid : yebṛem werğaz ! - His affairs stay half-done, rarely does he resolve even one, and you tell me: he's a careful man!

  3. The subject may follow the verb obligatorily in the serial aorist, eg:

    yuli tazdayt yuḍa-y-as wemjer - He climbed the date palm and the sickle fell from him [and dropped the sickle].

    It may also occur directly following the verb in the future tense aorist, eg:

    ad tatef teğrest ad yireḍ isemmuṛa n tḍuft or tağrest ad tatef ad yireḍ isemmuṛa n tḍuft - When winter comes, woolen clothes are worn.

They follow this up with an observation that seems quite astonishing from a comparative Berber perspective (p. 131):

A subject following the verb is put in the construct state if definite, this being the normal case for the postverbal subject, and is put in the free state if indefinite without any need for the [indefinite] article iggen / igget ["one"].

Unfortunately, they provide no examples to illustrate this claim.