Friday, July 03, 2015

Nasheed in Tumzabt

In honour of the month - and of the harmonious coexistence in Algeria of different branches of Islam, threatened in recent years - here's a rather well-produced bilingual Ramadan nasheed in Arabic and Tumẓabt, the Berber language of the Mzab region far to the south of Algiers:

Apart from its linguistic interest, it's rather interesting semiotically. The first half, in Arabic, presents life in a Saharan oasis as idealised by an oasis-dweller rather than a tourist - no dunes, not much picturesque architecture, just well-watered, well-shaded palm groves, traditional picnic blankets, and lots of happy children. The second half, in Tumẓabt with Arabic subtitles, focuses more on religious life - mosques and prayer at odd hours and pages of the Qur'an. Someone put a lot of money into this clip; I don't know anything about its background, but I get the impression that it was intended not just to edify fellow speakers of Tumẓabt but also to show the best possible image of the Mzab to outsiders - perhaps a precautionary PR effort in case of further problems in the region?

Some linguistic features of interest include:

  • The Latin loanword i-bekkaḍ-en "sins", from peccatum;
  • The non-borrowed Berber word Yuc "God";
  • The curious metathesis in dessat < s dat "before, in front of" (I have no explanation for the gemination here either);
  • The coinage ɣiṛu, based on the inherited root "call", for the time before dawn when the first call to prayer is traditionally made, about an hour before the actual time of prayer (thanks to Banouh Nouh-Mefnoune for the details). Similar forms are paralleled sporadically in a number of Berber varieties, but which prayer they refer to depends on the region;
  • The varying forms of the 1st person plural object clitic (if indeed it can still be called a clitic): -aɣen when placed before the verb, as in the first line, but -aneɣ when placed after it;
  • The addition of meaningless -i at the end of the line to make it fit the metre, paralleled in Tashelhiyt.

Here's my best effort to transcribe it, minus some of the repetition; corrections welcome.

Yus-ed yur n uẓum-i, a-ɣen yerfed cetm-i;
The month of fasting has come, let it take away our sin;
Eṛbeḥ-ed si-s a memmi arrazen n etzeɛmi.
Win from it, my son, the reward of goodness.
Eččer-t fissaɛ ɣiṛu, dessat ma ɣad yedden,
Get up quick before dawn, before the call to prayer,
Esserr n elxiṛ eğrew, a-c reẓmen ibriden;
Gather secret good deeds, roads will open for you;
Yus-əd yur n uẓum-i.
The month of fasting has come.

Yus-ed yur n uẓum-i, a-ɣen yerfed cetm-i;
The month of fasting has come, let it take away our sin;
Eṛbeḥ-ed si-s a memmi arrazen n etzeɛmi.
Win from it, my son, the reward of goodness.
S tala-s eṛwa ul-eč, tferred s ibekkaḍen,
Fill your heart from its fount, purify it from sins,
Ezdey i tawwat-eč; a-c yexs Yuc ed midden,
Reconcile with your relatives, God and people will love you;
Yus-əd yur n uẓum-i.
The month of fasting has come.

Monday, June 29, 2015

Anomalous gender agreement in Algerian Arabic

In Algerian Arabic (here, Dellys dialect), the feminine singular form of an adjective is formed just by adding a suffix -a, with almost no exceptions. In two of the exceptions, a full look at the paradigm suggests that it's really the masculine form rather than the feminine which is irregular (though the situation is less clear-cut in other dialects - in traditional Algiers, for example, the plural of "beautiful" is شبّان šəbban):
              m. sg.          f. sg.          pl.
beautiful     شباب šbab       شابّة šabba      شابّين šabbin
other         آخُر axŭṛ        أُخرى ŭxṛa       أُخرين ŭxṛin

A third case is rather different. "Such-and-such (a person), so-and-so" is expressed by the noun m. sg. فلان flan, f. sg. فلانة flana, with no known plural. (This originally Arabic form is rather widely borrowed; you may be familiar with it from Spanish fulano). From this we can derive an adjective "such-and-such a" by adding a nisba suffix -i: m. sg. فلاني flani, but f. sg. فلانتية flantiyya. To make matters worse, we suddenly find ourselves with a gender distinction in the plural, something otherwise absent from adjectival agreement in this dialect: m. pl. فلانيين flaniyyin, f. pl. فلانتيين flantiyyin.

What's going on, though anomalous, is pretty clear (recall that feminine -a regularly becomes -t in the construct state): this adjective is displaying double agreement, gender agreement alone on the nominal root flan, and normal gender+number agreement on the adjectival derivational suffix -i. Can you think of any comparable cases elsewhere?

Saturday, June 27, 2015

How Korandje made "with" agree it-with its subject

Korandje, the language of Tabelbala in southwestern Algeria, requires the comitative preposition "with" to agree in person and number, not with its object, but with its subject (strictly speaking, with its external argument):
ʕa-ddər ʕ-indza xaləd, I-went I-with Khaled.
nə-ddər n-indza xaləd, you-went you-with Khaled.
This seems to be vanishingly rare worldwide. The nearest parallels I have encountered are ones in which the comitative is expressed using a serial verb, but a closer look at the syntax and morphology of Korandje shows that indza is indeed a preposition, not a verb or a noun. Perhaps most strikingly, when you relativise on its object, you pied-pipe not only the preposition but the agreement marker on it too:
ʕan bạ-yu ʕ-indz uɣudz əgga ʕa-b-yəxdəm
my friend-s I-with whom PAST I-IMPF-work
"my friends with whom I was working"
Its historical source, proto-Songhay *ndá "with, and, if", was also a preposition, and did not display agreement. Comparative data makes it possible to reconstruct how this change took place: it developed out of a strategy, common in Berber and found in some Songhay languages, of expressing "I went with Khaled" as "I went, I and Khaled", which seems to be the result of reinterpretation of a postverbal subject as part of the adjacent comitative phrase. This development in turn provides the first attested way to reverse the well-known grammaticalisation chain "with" > "and". If you want to know more, read my article, which has just been published:

"How to make a comitative preposition agree it-with its external argument: Songhay and the typology of conjunction and agreement". In Paul Widmer, Jürg Fleischer, and Elisabeth Rieken (eds.), Agreement from a diachronic perspective, Berlin: De Gruyter, pp. 75-100, 2015. (offprints available on request - just email me.)

Here's the abstract:

This article describes two hitherto unreported comitative strategies exemplified in Songhay languages of West Africa – external agreement, and bipartite – and demonstrates their wider applicability. The former strategy provides the first clear-cut example of a previously unattested agreement target-controller pair. Based on comparative evidence, this article proposes a scenario for how these could have developed from the typologically unremarkable comitative and coordinative strategies reconstructible for proto-Songhay, in a process facilitated by contact with Berber. The grammaticalisation chain required to explain this has the unexpected effect of reversing a much better-known one previously claimed to be unidirectional, the development COMITATIVE > NP-AND.

Sunday, June 21, 2015

Comparative Siouan Dictionary

A key document in Native American philology which has been circulating in samizdat form for decades is finally online and searchable: the multi-authored Comparative Siouan Dictionary (as noted by Guillaume Jacques). Named for the last of its speakers to resist colonization, the Sioux or Lakota, the Siouan family was spread over a vast section of North America, covering much of the Missouri and Mississippi valleys but with old outliers as far east as Tutelo in Virginia. The names of several Midwestern states derive from Siouan languages, so they make a convenient starting point for exploring the database. Minnesota is from Dakota mni sota "cloudy water", both elements of whose history you can trace back here to proto-Siouan: *waRé• "lake, water" and *(a)só•tE "hazy, bluish, cloudy". *waRé• also yields Chiwere ñį, which in combination with the Chiwere reflex of *parás-ka "spread > flat (1)" yields the name of Nebraska. Dakota, from a name of the Sioux, has a less venerable history, being traceable only back to proto-Mississippi Valley Siouan *hkota/*hkoRa/*hkora "friend", with unexplained internal variation and similar forms in other families suggesting the possibility of a loan. (The la- element might have something to do with fire; see John Koontz's discussion.) Kansas, Arkansas, and Iowa also have names of Siouan origin, but I can't find them in here; much work remains to be done, after all... For the relevant correspondences, a good starting point is Rankin et al. 1997, available from the same site.

The more adventurous may note that there are good prospects for going beyond proto-Siouan. It is generally accepted that Catawban is Siouan's nearest relative, and the database sometimes includes Catawba cognates (as under "lake, water" above), but makes no attempt at Proto-Siouan-Catawban reconstructions. (Work on Catawba continues, but some older materials are available online, eg Lieber 1858, Gatschet 1900). Beyond that, some work suggests that Siouan-Catawban is in turn related to what would otherwise be an isolate language - Yuchi, originally spoken in Tennessee and later forcibly relocated to Oklahoma. Efforts to find etymologies at that level have barely gotten off the ground (cf. eg Rudes 1974), but there are some promising ones, notably proto-Siouan *isá•pE "black" vs. Yuchi ispí (Elmendorf 1964). Even more implausible proposals, like the idea of a special relationship with the small Yukian family of California (Elmendorf 1963), could at any rate be reexamined in the light of this work.

Tuesday, June 02, 2015

The irrelevance of the standard in Algeria

I recently came across a nice little study of language attitudes among Kabyles in Oran, inheriting Kabyle from their parents and kin but living in an overwhelmingly Arabic-speaking context: Ait Habbouche 2013. The results will not come as a huge surprise to anyone familiar with Algeria, but they stand in stark contrast to a curiously widespread idea about Berber language endangerment: the notion that Berber is under threat from the government-imposed hegemony of Standard Arabic. What the survey answers reveal, time after time, is in fact the utter failure of government policies to create any meaningful space for Standard Arabic in daily life. It is no surprise to see that Standard Arabic is used by 0% of respondents with other Kabyles in the cafe or at home. But seeing that only 4% speak it even at work, and 0% in university, should be a shock to anyone who still imagines that Standard Arabic occupies a position analogous to, say, Standard German. The taboo on speaking Standard Arabic in any but the most formal quasi-academic conversation remains nearly absolute; 73% rated it as the language they used least. The only topics surveyed for which this option was selected by any significant number were religion and politics, and actual usage in both cases would probably reveal a mix of Standard words into a basically dialectal matrix. There are absolutely no signs that this group is shifting to Standard Arabic, or even sees this as a viable possibility. The language that has attained a large usage among these speakers, even with other Kabyles, is not Standard Arabic but Algerian Arabic - a language with no official status taught in no school, which was the least likely (2%) of any of the available languages to be rated as most beautiful or richest, and was rated by 42% as the language they liked least (nearly tied with Standard Arabic). 
Yet this little-loved language, dismissed as much by its speakers as by their rulers, is not only the main language they use with non-Kabyles but is extensively used even with fellow Kabyles (42% with their own siblings).

The utterly marginal status of Standard Arabic in conversation within this group (and elsewhere in Algeria) contrasts sharply with that of French. 22% of the sample claimed to address Kabyle strangers in French, and 26% to speak it with their friends. More tellingly, 38% chose it as the language they spoke in at work, and no less than 68% for speaking about science. It's interesting to find an official language that doesn't dominate even in contexts like that! In short, while Standard Arabic is taboo for conversation, French is not. There are of course circumstances where it could be inappropriate, but there is no blanket ban as with Standard Arabic.

What does this imply for language policy? I'm no policy analyst, but here are my thoughts...

As far as the linguistic majority goes, only a spoken language can hope to displace French from the spoken domain, and long-standing efforts to break the taboo on speaking Standard Arabic have been utterly futile. Maybe it's time for those who want Arabic to be official in practice and not just in theory to acknowledge and support the existing complementary distribution of functions between Standard and Algerian Arabic, rather than treating the latter as some kind of unfortunate necessity. Demanding that officials consistently speak to the public in Standard Arabic instead of French is not always realistic, but demanding that they speak in a high register of Algerian Arabic could be. But that will only happen if people learn to value the language they speak, rather than dismissing it.

For the minority, it suggests that the main threat to Berber comes not from school, but rather from daily life in non-Berber-speaking environments. If so, solutions should focus less on making sure that Berbers can study Berber at school (though that is certainly desirable for other reasons), and more on getting non-Berbers in linguistically mixed contexts to study Berber and use it in conversation - almost the opposite of existing policy.

Friday, May 22, 2015

Old Arabic in Greek letters, in 3rd/4th century Jordan

An article published this year (Al-Jallad and Al-Manaser 2015) reveals the oldest known fully vocalised Arabic inscription by far - written in Greek letters in northeastern Jordan, probably in the 3rd or 4th century AD. Here it is: New Epigraphica from Jordan I: a pre-Islamic Arabic inscription in Greek letters and a Greek inscription from north-eastern Jordan. The inscription's author describes himself as "al-'Idāmī" - probably to be interpreted as "the Edomite" - a nisba featuring the definite article al-, unique within Semitic to Arabic.

There are a fair number of Arabic names transcribed in Greek at this period in various sources, but this seems to be the only known attempt to write Arabic text in Greek letters until much later. Most contemporary Arabic inscriptions were instead written in the Safaitic script, which does not indicate vowels. A text like this thus enables us to see much more clearly how the Arabic of the nomads of 3rd/4th century Jordan was pronounced. It confirms two crucial points. In Arabic, case is usually indicated only by final vowel choice; in this inscription, accusative case (-a) is clearly marked, but the Classical nominative and genitive (-u, -i) are not transcribed, suggesting that this dialect had dropped final short high vowels and thus developed a case system like that of Geez. Also reminiscent of Geez is the fact that intervocalic semivowels elided in Classical Arabic were unambiguously pronounced - thus 'atawa rather than 'atā for "he came". There may well be more material like this out there in the deserts on the Syrian-Jordanian border; let's hope research on the Syrian side becomes possible again soon...

Incidentally, next week I'll be at Bucharest for AIDA - if you're there, come to my talk on Wednesday!

Sunday, May 10, 2015

How to remember numerals better

In all the debate around "Whorfian" effects of language on cognition, one relatively well-known case has received oddly little attention among linguists, despite being widely discussed by psychologists and popularised by Malcolm Gladwell: the effect of word length on short-term memory (Baddeley et al. 1975). Basically, all other things being equal, it's easier to remember a sequence of short words than a sequence of long words. This suggests that our short-term memory for words (what psychologists confusingly call phonological memory) has a capacity limited by length - specifically, the amount that can be pronounced in about 2 seconds (Schweickert & Boruff 1987). That should suggest, in particular, that numbers presented orally will be easier to remember in a language with short numerals than in one with long numerals. (Note that this affects, among other things, IQ test results, since IQ tests typically include tests of numeral recall.)
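As a rough back-of-the-envelope illustration of that prediction, the sketch below divides a roughly 2-second rehearsal window by the average time it takes to say one digit. The function name and the two example durations are hypothetical values chosen purely for illustration, not measurements from any of the studies cited here, and real digit spans depend on other factors as well.

```python
def predicted_digit_span(seconds_per_digit, window_seconds=2.0):
    """Estimate how many digits fit into the rehearsal window.

    Assumes (as a simplification) that span is simply the number of
    digits that can be pronounced within about `window_seconds`.
    """
    if seconds_per_digit <= 0:
        raise ValueError("seconds_per_digit must be positive")
    return window_seconds / seconds_per_digit


# Hypothetical durations, for illustration only (not experimental data).
short_numerals = predicted_digit_span(0.25)  # a language with short digit words
long_numerals = predicted_digit_span(0.40)   # a language with longer digit words
print(short_numerals, long_numerals)
```

Under these made-up inputs the shorter numerals fit about three more digits into the same window; the point is only that the predicted span scales inversely with articulation time.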

Psychologists followed up on this by attempting to test this hypothesis with a number of language pairs (for an overview, see Baddeley (1997). Disclaimer: I'm not a psycholinguist, and the following references are certainly not exhaustive). The best-tested and most consistent result concerns Chinese. Mandarin and Cantonese numerals take less time to say than English ones, and a number of psychologists have accordingly confirmed that Chinese speakers can remember longer numerals than English speakers (Stigler, Lee, & Stevenson (1986), Hoosain & Salili (1987)), even at 4 years old (Chen and Stevenson (1988)), and that this applies even when bilinguals are tested across their two languages (Hoosain 1979). It goes further than that, in fact: Chincotta & Underwood (1997) find that, out of Cantonese, English, Greek, Finnish, Swedish, and Spanish, only Cantonese speakers remember significantly more digits than speakers of other languages - and that this difference disappears if the subjects are prevented from rehearsing the numbers auditorily by being asked to keep repeating "la-la" while being tested, which points to its linguistic nature. The difference is around 2 digits, with the exact figure depending on the experiment.

Data for other languages is less clear-cut. Welsh numerals take longer to say in isolation than English ones, and Ellis & Hennelly (1986) accordingly found that English-Welsh bilinguals can on average remember longer numerals in English than Welsh. Naveh-Benjamin & Ayres (1986) simultaneously tested the hypothesis for university students in Israel speaking English, Spanish, Arabic, and Hebrew natively (but excluding the digits "seven" and "zero"). They found that the average number of digits recalled was highest in English (7.21), followed by Hebrew (6.51), then Spanish (6.37), and lowest in Arabic (5.77); the ordering by average number of syllables per digit, or by average time taken to read a digit, was English, Spanish, Hebrew, Arabic. However, the difference in number of digits recalled was smaller than predicted by the time taken to read a digit in each language, suggesting that other factors were also relevant.

A proviso is necessary: some recent work, without disputing the differences observed, has made a strong case that they relate not simply to length (Lovatt, Avons, & Masterson 2000), but crucially to phonological factors (Service 2010, Lethbridge, Hinton & Nimmo 2002). This has been argued for Welsh numerals vs. English ones by Murray & Jones (2002), who find that Welsh digits take longer to say in isolation but actually take less time to say in connected speech than English ones, and that changes of place of articulation at word boundaries negatively affect memory.

The research is curiously selective in terms of languages examined, and many of the experiments don't control for all possible confounding factors, such as diglossia and social status in the case of Welsh or Arabic. Nevertheless, it does at least seem well-established that speaking Chinese gives a short-term digit memory advantage over speaking major European or Semitic languages. So, if for some reason you regularly need to remember long numerals, and your preferred language doesn't happen to be Chinese, how do you compensate for this handicap?

There are two obvious ways to get around this (assuming you care enough about remembering numerals to want to, which depends very much on your tastes and circumstances). One is to remember the number visually (as a sequence of written digits) or even kinesthetically (as a sequence of typing actions), in which case this particular constraint no longer applies (cf. eg Olsthoorn, Andringa, & Hulstijn 2012). This only helps, however, if you remember numerals better visually or kinesthetically than auditorily, and my impression is that most people don't.

A probably more helpful alternative is to establish a code that lets you turn long numerals into much shorter words by identifying digits with single letters or single phonemes. This solution has a very long history in Arabic and Hebrew, in which each letter of the alphabet can be used to represent a number: 'a is 1, b is 2, etc. (the first 9 letters are units, the next 9 are tens, and the rest are hundreds). Since short vowels are not letters, the resulting word can be given whatever vowels the user sees fit to give it. A common game of later poets using the Arabic script was to encode the date of their poem within the poem as a chronogram; more practically, Moroccan schoolchildren used to memorise the multiplication tables as a series of meaningless words formed by this encoding (Meakin 1905). Chronograms have been formed using Roman numerals, but for memorisation, at least, they are rather ill-adapted to such a system - think how much padding would be required to turn a number like MDCCCLXXXIII into words.
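To make the additive letter-number encoding concrete, here is a minimal sketch using the Arabic letters in the eastern (Mashriqi) abjad order; note that this is an assumption for illustration, since the Maghrebi order that Moroccan users would have known assigns some of the higher values differently. The function names are my own, and the greedy encoder is just one simple way to pick letters, not a reconstruction of how anyone actually composed chronograms.

```python
# Eastern (Mashriqi) abjad order: units 1-9, tens 10-90, hundreds 100-900, then 1000.
ABJAD_ORDER = "ابجدهوزحطيكلمنسعفصقرشتثخذضظغ"
VALUES = [unit * 10 ** power for power in range(3) for unit in range(1, 10)] + [1000]
LETTER_VALUE = dict(zip(ABJAD_ORDER, VALUES))


def word_value(word):
    """Sum the values of the letters in a word; anything else (e.g. vowel marks) counts as 0."""
    return sum(LETTER_VALUE.get(ch, 0) for ch in word)


def encode(number):
    """Greedily pick letters, largest value first, whose values add up to `number`."""
    letters = []
    for letter, value in sorted(LETTER_VALUE.items(), key=lambda kv: -kv[1]):
        while number >= value:
            letters.append(letter)
            number -= value
    return "".join(letters)


# MDCCCLXXXIII = 1883 needs twelve Roman numeral letters but only four abjad letters.
skeleton = encode(1883)
print(skeleton, len(skeleton), word_value(skeleton))
```

Because the system is additive, the order of the letters does not affect the total, which is what leaves room for rearranging them (and adding short vowels) to get something pronounceable.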

However, the spread of Hebrew studies in Western Europe following the Renaissance, and the increasing importance of memorising statistics there, encouraged European mnemonists to look for ways of emulating this encoding without having to learn a Semitic language. Doing so at a time when place notation was widely used, they introduced a crucial improvement: each consonant represented a digit in a place notation system, rather than a number in an additive notation system. After various cumulative efforts at improvement, this culminated in the early 19th century with the so-called Major system: 0=s/z, 1=t/d, 2=n, 3=m, 4=r, 5=l, 6=š/ž/č/j, 7=k/g, 8=f/v, 9=p/b, with vowels, semivowels, and laryngeals ignored. To remember 94801 (LACITO's zip code), for example, one would turn it into "professed". This system apparently remains in use among professional mnemonists to this day, despite being virtually unknown to wider society.
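The digit-to-consonant table above is easy to turn into a small lookup, sketched here. The table itself is taken from the text; the function names and the rough one-symbol-per-sound transcription of "professed" are my own assumptions, and a real tool would need a pronunciation dictionary, since the system works on sounds rather than spelling.

```python
# Major system as listed in the text: each digit maps to one or more consonant sounds.
# Here "j" stands for the English "j" sound, as in the table, not a semivowel.
MAJOR = {
    "0": ("s", "z"),
    "1": ("t", "d"),
    "2": ("n",),
    "3": ("m",),
    "4": ("r",),
    "5": ("l",),
    "6": ("š", "ž", "č", "j"),
    "7": ("k", "g"),
    "8": ("f", "v"),
    "9": ("p", "b"),
}
SOUND_TO_DIGIT = {sound: digit for digit, sounds in MAJOR.items() for sound in sounds}


def decode(sounds):
    """Read digits off a sequence of sounds; vowels and other unlisted sounds are skipped."""
    return "".join(SOUND_TO_DIGIT[s] for s in sounds if s in SOUND_TO_DIGIT)


def consonant_choices(number):
    """List, digit by digit, the consonant sounds a mnemonic word for `number` may use."""
    return [MAJOR[d] for d in str(number)]


# "professed", transcribed roughly as p-r-ə-f-ɛ-s-t (one symbol per sound).
print(decode("prəfɛst"))
print(consonant_choices(94801))
```

Going from a number to an actual word is the creative step: `consonant_choices` only gives the skeleton, and the user still has to find a word whose consonants fit it.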

Perhaps this is why linguists haven't paid more attention to the word-length effect in the context of the Whorfian debate: it's a clear-cut effect of language on cognition, but not a very profound one, in that it should be fixable by some very simple hacks (or even just by borrowing someone else's numerals). But I'm not aware of any experimental work testing the effect of this particular hack on digit recall...