Jabal al-Lughat

Sunday, March 26, 2017

Why it's Siwi, not Tasiwit

In English- and French-language discussions of the languages of Egypt, the Berber language of Siwa Oasis in the Western Desert is more and more often called "Tasiwit". Please, don't do this.

In Moroccan and Algerian Berber, as in the Sahel, language names are feminine, and are formed with the feminine circumfix t-...-t: Taqbaylit, Tarifit, Tamazight... In Siwi, however, languages are masculine, as in Egyptian Arabic. Ordinarily, Siwis simply call their language Siwi. When they want to specify the language as opposed to anything else from the oasis, they call it Jlan n Isiwan, "speech of Siwa/Siwis".

If you're writing in a more westerly Berber language, it's quite appropriate to nativise this term into Tasiwit. But if you do so when writing in a Western language, you're just imposing a Moroccan/Algerian convention on a language whose speakers are even less familiar with it than your readers are. On top of that, the feminine of Siwi in Siwi is Tsiwett, not Tasiwit as it would be further west. So just stick with Siwi, OK?

Wednesday, March 15, 2017

Getting from "Hey you!" to "If only"

A well-known Algerian proverb has it that:

لي عندهُ مية يقول يا ميتين
li `andu mya yqul ya mitin
who has hundred says oh two.hundred
He who has a hundred says "If only it were two hundred!" (literally: "Oh two hundred!")

The ya here is not a general-purpose interjection. Unlike English "oh", it's normally used as a vocative, followed by the name of the person you're addressing. That's its primary function in Classical Arabic too. But in Classical Arabic, you can't use it on its own to mean "if only..."; in fact, that usage isn't very common in Algerian Arabic either. Yet the same extension of function from vocative to wish-marker is found in Algerian Berber. In an 18th century Kabyle poem recorded by Mouloud Mammeri in his Poèmes kabyles anciens (p. 132), an aspiring poet, Muh At Lemsaawd, begs the better-established Yusef u Qasi to accept him as an apprentice:

Ul-iw fellak d amaalal
A wi k-isâan d ccix is
My heart is sick for you
If only I had you as my teacher (literally: "Oh he who has you as his teacher!")

You can't do this in Classical Arabic, nor in English: a vocative followed by a noun phrase is going to be interpreted as an act of addressing, not of wishing. But in Arabic you do find an otherwise unexpected vocative particle showing up in some wish constructions, notably يا ليت yaa layta "if only". And in (slightly archaic) English you have a very similar construction with an infinitive in "to" or a prepositional phrase in "for", instead of with a noun phrase: "Oh to be young again!", "Oh for a thousand tongues to sing!" That suggests that the connection between vocative and wishing reflects some general feature of human cognition, or at least of a rather large culture area.

The obvious connection would be through requests. One reason to address someone is to ask them to bring you something. It's not such a big step from "Hey kid, get me a glass of water" to "Hey, a glass of water!", with the addressee and the verb erased, and the vocative particle effectively serving as much to mark the wish as to get the addressee's attention. But that doesn't really predict forms like the Kabyle one, where the state wished for takes the form of a relative clause, nor even the old-fashioned English constructions discussed, so I'm not really happy with this explanation. Any ideas? And can you think of any parallels in other languages?

Sunday, February 26, 2017

On Olathe

A few days ago, two unarmed young engineers from India were shot in a bar in Olathe, Kansas by a man yelling "Get out of my country!", as was a heroic bystander who tried to stop the shooter. As this contemptible crime put a normally quiet suburb of Kansas City into the international news, journalists and readers worldwide must have been wondering, as I wondered the first time I heard of it a couple of years ago: "How do you pronounce Olathe, and what sort of a name is that anyway?"

The way the locals pronounce it is /ou'leɪθʌ/, as you can hear early in the Mayor's speech. This is remarkably irregular: I can't think offhand of any other word in the English language in which a final e is pronounced /ʌ/, except occasionally "the". You might expect the etymology to provide an explanation, but it turns out to complicate the story further.

The town of Olathe was founded in 1857 by one John Barton, a doctor from Virginia, who - by his own account - got it into his head that "beautiful" would be a good name for the town he envisaged, and:

... meeting Capt. Joseph Parks, head chief of the Shawnees, he said: "Captain, what in the Shawnee language would you call two quarters of land, all covered with wild flowers? In English we would say it was beautiful." Parks replied: "We would say it was 'Olathe,'" giving it the Indian pronunciation Olaythe, with an explosive accent on the last syllable. Barton made the same inquiry of the official interpreter, an educated Indian, who made the same reply, adding that for English use it would be best to pronounce it "Olathe," with the accent on the second syllable. So it came to pass that the new town was named "Olathe," the city beautiful. (History of Johnson County, Kansas)

In Shawnee, an Algonquian language, (h)oleθí is indeed documented as meaning "pretty" (Gatschet II:2, II:6, III:5); the root also seems to mean "good", judging from its occurrences (spelled <lafi>) in Alford's Shawnee New Testament translation, eg in Matthew 5:45, 19:16, 20:15. One might assume the Shawnees had their own name for the place, but that is not necessarily true, considering they had gotten there barely a generation earlier. Originally from Ohio, they were induced to sign a treaty to move to Kansas in 1831, onto land originally belonging to the Kaws (Kanzas). A few years after the foundation of Olathe, they were pushed out again, to Oklahoma.

It thus seems pretty clear that the original pronunciation of the town's name was /ou'leɪθi/, corresponding better with the spelling (cp. "synecdoche"). How did that turn into /ou'leɪθʌ/? I think the answer lies in English sociolinguistic variation. In the 19th century, standard English word-final /ʌ/ was often pronounced dialectally as /i/, yielding forms like "Americkee" for America or "Canadee" for Canada. In more recent times this pronunciation seems to show up mainly in caricatures of rural or Appalachian speech. The current pronunciation of Olathe as if it were Olatha can thus best be understood as a hypercorrection by people who didn't want to sound uneducated.

Update: A very helpful article linked by Y below, The Pronunciation of Missouri, reveals that the phenomenon is more systematic in the area than I had realised: it extends not only to placenames like Missouri, but even to words like spaghetti, macaroni, or prairie. This makes hypercorrection seem a less likely explanation. Instead, it looks as though final /ɪ/, which becomes /i/ in standard American English, was instead reduced to schwa in parts of the Midwest, including the area surrounding Kansas City. Andrews' (1994) Shawnee Grammar indicates that Shawnee /i/ was often realised as [ɪ], so this fits together nicely.

Friday, February 24, 2017

The Origin of Mid Vowels in Siwi

How does a language with a relatively small vowel system react to pressure from a language with a larger one?

Most northern Berber varieties have a simple four-vowel system: tense /a/, /i/, /u/, vs. lax schwa (/ə/, written e in the official orthography), the latter being mostly predictable and limited to closed syllables. In the eastern and southern Sahara, however, we tend to find slightly larger vowel systems, and it looks very much as though proto-Berber had a rather asymmetrical six-vowel system, close to modern Tuareg but missing /o/: it had tense /a/, /e/, /i/, /u/ vs. lax /ɐ/, /ə/.

Siwi Berber, in western Egypt, has a more symmetrical six-vowel system: tense /a/, /e/, /i/, /o/, /u/ vs. lax /ə/. All of these vowels occur in inherited vocabulary as well as in Arabic loanwords. It is obvious by inspection that, in almost all contexts, *ɐ merged into /ə/. But the distribution of /e/ shows little connection with that of *e: in fact, most instances of proto-Berber *e correspond to Siwi /i/. And the origin of /o/ is not immediately clear at all. How did this happen?

My latest article - written together with Marijn van Putten - proposes some answers. It turns out that proto-Berber */e/ was retained in Siwi only before word-final /n/. Most instances of /e/ and /o/ are found in Arabic loanwords. Within inherited vocabulary, almost all instances of /e/ - and all instances of /o/ - are phonetically conditioned innovations, arising from at least three distinct regular sound changes and one sporadic one. The net effect of this "conspiracy" of sound changes is to extend phonemes otherwise almost entirely restricted to Arabic loans into inherited Berber vocabulary.
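The correspondences described so far can be put schematically. What follows is a minimal Python sketch, not the analysis from the article: the function name siwi_reflex is invented for illustration, "C" stands for any consonant rather than a real segment, and the inputs are schematic templates, not actual reconstructions. It encodes only the three tendencies stated above (*ɐ > ə; *e > i; *e kept before word-final n) and deliberately ignores the exceptions implied by "almost all" and "most", as well as the conditioned sound changes that create new /e/ and /o/.

```python
def siwi_reflex(proto: str) -> str:
    """Map a schematic proto-Berber vowel template to its expected Siwi shape.

    Simplifying assumptions (illustrative only):
      - one character = one segment
      - *ɐ always merges into ə
      - *e survives only when immediately followed by word-final n,
        and otherwise becomes i
    """
    last = len(proto) - 1
    out = []
    for i, seg in enumerate(proto):
        if seg == "ɐ":
            out.append("ə")
        elif seg == "e":
            before_final_n = (i + 1 == last) and proto[last] == "n"
            out.append("e" if before_final_n else "i")
        else:
            out.append(seg)  # consonants and other vowels pass through unchanged
    return "".join(out)


# Schematic templates only; "C" is a placeholder consonant.
for template in ["CɐC", "CeC", "CeCen"]:
    print(template, ">", siwi_reflex(template))
```

Under these assumptions the only /e/ that comes out of inherited material is the one before final n, which is why the other instances of /e/ in inherited vocabulary, and all instances of /o/, need a separate explanation.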

If you want the full story, go read our article: The Origin of Mid Vowels in Siwi (published in Studies in African Linguistics 45:1-2 (2016), pp. 189-208).

Sunday, February 19, 2017

A real-life subjacency problem sentence

There are some kinds of questions and relative clauses that you just can't form without resorting to a resumptive pronoun, even in languages - like English - that otherwise don't allow resumptive pronouns to begin with. Ever since Ross (1967) came up with a typology of "island constraints", syntacticians have hotly debated both which ones these are and how to account for them.

Unfortunately, real-life examples of people trying to say such things are very scarce on the ground. As a result, discussion of this phenomenon tends to be dominated by artificial examples. Much of the literature on subjacency inadvertently demonstrates how unsatisfactory the result can be (as discussed here: 1, 2). Every once in a long while, however, you find a completely spontaneous case of someone running up against such constraints - and here's today's, courtesy of some person on Reddit:

Step zero: find a couple million complete and utter morons, who it's a miracle they can breathe in and out without f***ing it up, to support you.

Normally, a relative clause starting in "who" would have no overt subject within the clause itself apart from "who", as in:

Step zero: find a couple million complete and utter morons, who in all honesty Ø can barely breathe in and out without f***ing it up, to support you.

But that's impossible here: note the ungrammaticality of:

*Step zero: find a couple million complete and utter morons, who it's a miracle Ø can breathe in and out without f***ing it up, to support you.

Instead, you end up having to fill the subject position to which "who" refers with a resumptive pronoun "they".

Thursday, February 09, 2017

Romance languages in 17th century North Africa

In 1609, 117 years after conquering Granada, the Spanish state decreed the expulsion of all "Moriscos" - that is, everyone descended from Muslims forcibly converted to Christianity, numbering in the hundreds of thousands. In the 1720s, a century later, two separate travellers - Jean-André Peyssonel and Francisco Ximenez - found that a number of towns in Tunisia, including Testour, Bizerte, and Tebourba, were Spanish-speaking, inhabited by the descendants of these refugees (as I was surprised to learn from Vincent 2004). According to Peyssonel, for example, "the inhabitants of Tebourba practically all speak Spanish there, a language which they have conserved from father to son"; referring to the same town, Ximenez adds "immediately after their arrival from Spain, they had schools in our language. They were insultingly told they were not real Moors, and the Bey took away their books and their schools; after that, they little by little forgot Spanish and learnt Arabic." All in all, the reports seem compatible with a three-generation pattern of language shift: the people they met still spoke Spanish, but most were unlikely to pass it on to their children, as they became more closely integrated into the wider society of their new home.

In 1627, a couple of decades after the expulsion of the Moriscos, a corsair ship from Algiers raided Iceland, capturing a couple of hundred unfortunate villagers, one of whom left a description of his experiences. While the distance travelled in this raid was unusual, the practice itself was less so: the capitals of the Barbary states were full of European slaves captured by state-sponsored pirates, waiting for ransoms that might never come. Likewise, many North Africans were captured and held as slaves in Europe (see eg Wettinger 2002 on Malta): describing Algiers in 1612, Diego de Haedo comments that "there are many Muslims who have been captives in Spain, Italy and France" and hence speak those countries' languages (Vincent 2004:107). To further complicate matters, not all immigration from Europe was involuntary: Haedo adds that "There are also an infinite number of renegades [converts to Islam] from these countries and a large number of Jews who have been there, who speak polished Spanish, French, or Italian. The same holds for all the children of renegades who, having learned their national language from their parents, speak it as well as those born in Spain or in Italy."

In brief, 17th-century North Africa contained plenty of European immigrants - some refugees, some captives, and even some voluntary - learning the language spoken around them while maintaining, for a while, the language they had arrived with. What impact did this have on Maghrebi Arabic and Berber? Unfortunately, it's not easy to date Romance loans into either, but we can safely assume that some of the precolonial loans arrived in this period. A good dialect map, in combination with historical data on where these groups ended up, might help identify such loans more precisely - but that doesn't really exist yet, except to some extent for Morocco (Heath 2002).

References:

Vincent, Bernard. 2004. In Jocelyne Dakhlia (ed.), Trames de langues. Usages et métissages linguistiques dans l’histoire du Maghreb. Tunis-Paris: IRMC, Maisonneuve & Larose. 561 p.

Saturday, February 04, 2017

Why the sun really does rise

In response to someone comparing "alternative facts" to science fiction, the eminent science fiction writer Ursula Le Guin recently wrote:

The test of a fact is that it simply is so - it has no "alternative." The sun rises in the east. To pretend the sun can rise in the west is a fiction, to claim that it does so as fact (or "alternative fact") is a lie.

The comments (never read the comments!) include several people trying to be smart by pointing out that, actually, "the truth of the matter is that the sun does not rise, but rather that the Earth turns". This apparent conflict is worth unpacking from a descriptive linguistic perspective.

All fluent speakers of English use phrases like "The sun rises in the east". They also use phrases like "Hot air rises." The commenter quoted previously seems to be applying something like the following reasoning:

When something (eg hot air) rises, it moves upwards away from the earth.
When the sun "rises", it's not moving upwards away from the earth - rather, the earth is turning relative to it.
Therefore, the sun does not actually rise.

A lexicographer will immediately see at least one ironclad way to vitiate such an argument: identify two distinct senses for "rise". Rise₁ means "to move upward away from the ground", while rise₂ means "for a celestial body's apparent position to come closer to the zenith" (or something along those lines). The sun rises₂, but it doesn't rise₁.

But not so fast! It's perfectly plausible that someone could believe the earth is stationary and the sun physically moves upwards when it rises. For someone holding that belief (or even just using that mental model without necessarily believing it), "rise" could easily have a single sense, not two different ones. Is there any language-internal evidence that "rise" has two senses?

As it happens, there is: look at antonyms. We say "The sun sets in the west", but "Hot air sinks" (and "Empires fall", but that's another story); you can't say "*Hot air sets". "Set" is the antonym of rise₂, but not of rise₁. That seems like a pretty good reason to assume that, even for flat-earther speakers of English, the two senses are lexically distinct. So it looks like Ursula Le Guin wins this one, as you might expect.
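The antonym test can be spelled out as a tiny lexicon lookup. This is only an illustrative Python sketch: the dictionary layout and the function names (distinct_by_antonym, acceptable_opposite) are made up here, and the entries contain nothing beyond the glosses and example sentences given above.

```python
# Two candidate senses of "rise", each with the antonym and the example
# subject cited in the post. The structure itself is a hypothetical toy lexicon.
SENSES = {
    "rise_1": {
        "gloss": "to move upward away from the ground",
        "example_subject": "hot air",
        "antonym": "sink",
    },
    "rise_2": {
        "gloss": "for a celestial body's apparent position to come closer to the zenith",
        "example_subject": "the sun",
        "antonym": "set",
    },
}


def distinct_by_antonym(sense_a: str, sense_b: str) -> bool:
    """Two senses count as lexically distinct if they take different antonyms."""
    return SENSES[sense_a]["antonym"] != SENSES[sense_b]["antonym"]


def acceptable_opposite(subject: str, verb: str) -> bool:
    """True if `verb` is the recorded antonym for the sense exemplified by `subject`."""
    return any(
        s["example_subject"] == subject and s["antonym"] == verb
        for s in SENSES.values()
    )


print(distinct_by_antonym("rise_1", "rise_2"))   # the two senses pattern differently
print(acceptable_opposite("the sun", "set"))      # "The sun sets"
print(acceptable_opposite("hot air", "set"))      # "*Hot air sets"
```

The point of the test is that it relies only on what speakers accept or reject, not on what they believe about astronomy.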

Wednesday, January 25, 2017

Tigre between ejectives and pharyngealization

There is some debate over the original pronunciation of the "emphatic" consonants (Arabic ط ض ظ ص ق) in Semitic and more generally in Afroasiatic: were they ejective as in Amharic, or pharyngealized/uvular as in Arabic? For a number of reasons, such as that in proto-Semitic they did not show a voicing contrast, the general opinion is that they were glottalized. Yet pharyngealized consonants show up not just in Arabic and neo-Aramaic but even in Berber, which would on the face of it suggest that the feature predates proto-Semitic. Either we have to suppose independent parallel development, or we must assume that Berber ejectives turned into pharyngealized consonants under the influence of Arabic. The latter seems more probable, but only if we can show that it is indeed plausible for a language to make such a change as a result of widespread bilingualism in Arabic.

It turns out that Tigre, the main language of northern Eritrea, offers a concrete example of just that. The inland plateau dialect of the Mansa`, commonly considered as standard, is described by Raz (1983) as having four ejectives k' (usually [ʔ]), t', s', and č̣, and no pharyngealized or uvular consonants. You can hear an example of standard Tigre here, which seems consistent with his description. The coastal Hirgigo dialect spoken around Massawa, however - as heard in these Learn Tigre YouTube videos - shows a rather different situation: ḳ is simply [q] (as in "elbow", "neck", "thigh"), ṭ is [tˤ] (as in "goat"), ṣ is [sˤ] (as in "white", "black", "back"); only for č̣ can you occasionally hear a slightly ejective realization [tʃ] ~ [tʃ'] (as in "fingers" or "fingernails"). The result is a good deal easier for an Arabic speaker to pronounce! This should not be too surprising: the port of Massawa has had extensive contact with Arabic speakers for many centuries. In fact, it's said to be the place where some of the first Muslims, seeking refuge from the persecution they were suffering in Mecca, landed on their way to the Abyssinian court. Such a diversity of emphatic consonant realizations within a single language confirms in turn that it is plausible for the habit of pharyngealizing emphatic consonants to be transferred from a language to its neighbors.

Saturday, January 21, 2017

Semitic languages in two Arabic novels

I've been reading two novels in Arabic lately. Frankenstein in Baghdad, by Ahmad Saadawi, reimagines Baghdad's descent into chaos in the mid-2000s, blending gritty realism with semi-allegorical horror. Samraweet, by Hajji Jaber, is an altogether gentler but still cutting narrative of the Eritrean diaspora, interleaving scenes from the narrator's life in Jeddah with ones from his first visit to Asmara as he gradually realizes the difficulty of being part of either place. Both turned out to share a feature I hadn't been expecting to find: dialogue in other Semitic languages.

In Frankenstein in Baghdad, one of the main characters is an elderly Assyrian woman, Elishawa "Umm Daniel". All her relatives have long since moved abroad, and keep begging her to come live with them where it's safe, so there are few occasions for her to speak anything but Arabic. However, the Assyrians of northern Iraq traditionally speak a variety of Neo-Aramaic, and when she meets her grandson from Melbourne, they have the following fairly elementary conversation (pp. 276-277), which I hope I've transcribed correctly:

"داخي إيوَت؟" (Dāx īwat?) "How are you?"
"سباي إيْوَن باسيما" (Spāy īwan basīmā) "I am fine, thanks."

The author of the book seems to be from southern Iraq, so I found it remarkable that he took the trouble to get some Neo-Aramaic dialogue - especially since the copula is appropriately put in the feminine form both times (in Assyrian Neo-Aramaic, even the 1st person singular copula agrees in gender). Probably he felt it would enhance her symbolic status as a reminder of what Iraq once was. Unfortunately, while Aramaic has been spoken in Iraq for almost three millennia, its prospects there are dim: after all these years of war and frequent persecution, most speakers live in Western cities, and unless they're exceptionally good at remaining a distinct, cohesive immigrant group, their descendants seem more likely to speak English or Swedish than Aramaic.

In Eritrea, unlike Iraq, most people have as their first language a Semitic language other than Arabic: Tigre in the north, Tigrinya in the south. So it was less surprising to find an Asmaran waiter on the first page saying "سنّي ما سيام" (?Senni mā syām), which I assume from context means something like "Good afternoon!" However, the occasional glimpses provided into Eritrean sociolinguistics were more eye-opening. The narrator and most of his friends are from a Tigre-speaking background and know how to speak it, but Tigre per se seems to play little part in their linguistic identity. They grew up not only speaking Arabic in the street, but feeling that Arabic is an Eritrean national language, and resenting the government's treatment of it as less central than Tigrinya. When an Eritrean in Jeddah speaks Tigre with him, the narrator assumes it's because he only arrived recently until he finds out, to his surprise, that this person simply "enjoys speaking it, even in Jeddah" (p. 76). It would be interesting to see how this compares to the attitudes of Tigre speakers living in Eritrea: between the prestige of Arabic and the status of Tigrinya, what are the long-term prospects for Tigre?

Saturday, January 07, 2017

Of words and pens

In Algerian Arabic, this is a stilu ستيلو - a word instantly recognizable as a borrowing from French stylo:

In Standard Arabic, on the other hand, as any Algerian learns in primary school, it's a qalam قَلَمٌ. This, as it happens, may also be a borrowing, though a much older one; compare ancient Greek kálamos κάλαμος "reed, reed-pen", which apparently has an Indo-European etymology. Clearly, either pre-modern Algerians were so sunk in illiteracy as to have forgotten the word for a pen altogether, or they replaced a pre-existing word for pen with a French borrowing - right?

Well, no. In the Middle Ages, there weren't too many fountain pens or biros around. Classical Arabic qalam referred to something more like these:

Any Algerian who went to Qur'anic school up to the 1960s or so will remember this - a simple reed pen anyone can make using nothing more complicated than a sharp knife. (The Algerian version was a bit different than those in the picture, as it happens - usually people would use a quarter-circumference of a large reed, not the whole circumference of a small one.) More than that, they will remember what it's called: qləm قلم. There are probably people in Algeria who still use these, and very likely they still call them that.

But no one calls a modern industrial pen qləm. When industrial pens were introduced, sometime in the 19th century, ordinary Algerians ended up classing them as a new object, quite distinct from the reed pen despite their similar function, and deserving of an unrelated name. The guardians of Standard Arabic, on the other hand, decided to extend the reference of qalam to cover both. It may be no coincidence that French distinguishes calame from stylo, like Algerian Arabic, whereas English, like Standard Arabic, treats both as different types of pen.

Historical linguists regularly use lexical reconstruction to shed light on technological history, an approach called "Wörter und Sachen". This approach has been very fruitful in many cases. But, as this case illustrates, there are some pitfalls to watch out for: whether something counts as the same object or as a new one is a rather culture-bound question, and if investigators impose their own ideas about this on the situation they are investigating, they will get the wrong answer.

Tuesday, December 27, 2016

Too strong to get out

At four, my nephew speaks English (his dominant language) very well. He still shows some interesting divergences from the standard of those around him, though. Some are influenced by German (a close second): he uses "mine" as a determiner in English (like German "mein") rather than "my", saying things like "mine house". Others seem to result from language-internal overgeneralization, as when he said:

If I push the Lego box then the carpet will destroy. [intended meaning: be destroyed]

Presumably, he's interpreted "destroy" as a labile verb, like "open" or "burn".

At first blush, I thought the following sentence was another example of overgeneralization:

I'm too strong to get out, so you can't. [intended meaning: I'm too strong for anyone to get me out]

However, reflection suggests that this ought to be perfectly grammatical in English, since "get out" is already labile. "This stump is too heavy to pull out" works fine, so why not "I'm too strong to get out"? Yet, for me at least, the clause immediately receives a pragmatically absurd interpretation with "I" as the subject of "get out", and the obviously intended interpretation is barely accessible even when I've consciously concluded that it should be grammatically acceptable.

In terms of the classic Chomskyan analysis of control, the two interpretations correspond to different unpronounced pronouns PRO:

I_i'm too strong [PRO_i to get out]
I_i'm too strong [PRO_arb to get PRO_i out]

A lot of linguists really dislike the idea of an unpronounced pronoun. Whatever its psychological merits, though, this analysis has the advantage of suggesting why the first interpretation comes more easily than the second here: it only involves one empty pronoun, whereas the desired interpretation needs two. So if anything is going wrong in this sentence, it's not so much the syntax as the pragmatics: an adult speaker might be more aware that listeners could have trouble processing a clause of this form, and avoid it in favour of something less ambiguous. That would need empirical checking though.
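To make the "one empty pronoun versus two" comparison concrete, here is a small Python sketch that simply counts the PRO elements in the two bracketed analyses above. The dictionary keys and function names are invented for illustration, and treating "fewer empty pronouns" as "easier to access" is just the suggestion made in this post, not an established processing metric.

```python
import re

# The two bracketed analyses from the post, keyed by an informal paraphrase
# of the reading each one yields (the keys are labels made up for this sketch).
ANALYSES = {
    "I get out": "I_i'm too strong [PRO_i to get out]",
    "someone gets me out": "I_i'm too strong [PRO_arb to get PRO_i out]",
}


def count_empty_pronouns(analysis: str) -> int:
    """Count unpronounced PRO elements such as PRO_i or PRO_arb in a bracketing."""
    return len(re.findall(r"\bPRO_\w+", analysis))


def easiest_reading(analyses: dict) -> str:
    """Pick the reading whose analysis posits the fewest empty pronouns."""
    return min(analyses, key=lambda reading: count_empty_pronouns(analyses[reading]))


for reading, analysis in ANALYSES.items():
    print(reading, "->", count_empty_pronouns(analysis))
print("predicted default:", easiest_reading(ANALYSES))
```

On this crude measure the pragmatically absurd reading wins, which matches the intuition reported above; whether real listeners behave this way is exactly the empirical question.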

Thursday, December 08, 2016

How Tunisia ruined its PISA performance

PISA 2015 is an OECD-run survey intended to evaluate education systems worldwide by giving the same test to a sample of 15-year-old students across a large number of countries and comparing the results. This year's results have gotten a lot of coverage, notably for the dismal performance of all the Arabic-speaking countries participating. The UAE did least badly in terms of combined scores, managing 48th place out of 70; it was trailed by Qatar (59th), Jordan (61st), Lebanon (65th), Tunisia (66th), and, most ignominiously, Algeria at 69th place, barely beating the Dominican Republic.

Laudably, PISA have made their science tests publicly available online in many languages, including four Arabic versions labelled Israel, Qatar, Tunisia, and the UAE - don't ask me what happened to Algeria, Jordan, and Lebanon. Browsing through these, one immediately notices that the Tunisian translation (unlike the Gulf ones) has a remarkable number of grammatical errors, typos, and phrasings so awkward as to be barely comprehensible. For instance:

Bird Migration 1: "يستعملون العدّ الذي يقوم به المتطوّعين" - wrong case: should be المتطوّعون
Bird Migration 1: extremely awkward phrasing: "هجرة الطيور هي حركة موسمية كبيرة، يتنقل أثناءها الطيور نحو أماكن تكاثرها أو هي تعود منها." ("Bird migration is a great seasonal movement, during which birds move to the places of their reproduction or they come back from them.") Contrast the clearer phrasing in the Qatar version: "هجرة الطيور الموسمية هي انتقال واسع النطاق للطيور من وإلى مناطق تكاثرها. وفي كل عام يتولى متطوعون إحصاء عدد الطيور المهاجرة في مواقع محددة."
Bird Migration 3: the bird's name is "الزقزوق الذهبي" in the text, but in the question it turns into "الزقزاق الذهبي".
Running in Hot Weather 1: Garden path title: anyone looking at "العدو في الطقس الحار" is going to read it as "the enemy in hot weather", at least until the context is established. Contrast the Qatari translation "الجري في الجو الحار", using a better known, graphically unambiguous term for "running".
Running in Hot Weather 1: Grammatical error in "يدل على ذلك {كمية العرق | ضياع الماء | درجة حرارة الجسم} العداء بعد ساعة من السباق": for the sentence to make sense (even in dialectal Arabic!), none of the alternatives should contain the definite article, since they form part of an idafa genitive. Contrast the Qatari version, which avoids the problem by putting "للعداء".
Running in Hot Weather 2: Garden path sentence: "شرب الماء خلال السباق يمكن أن يكون له تأثير على حصول تجفّف وضربة حرارة بالنسبة إلى العداء. أيّهما؟ " Anyone reading this will start by reading the first word as šariba "he drank", giving "he drank water during the race, it can have an effect..." and only after the fifth word will they be in a position to read it, as intended, as "Drinking water during the race can have an effect on the occurrence of dehydration and heatstroke for the runner. Which of the two?" Having gotten that far, they'll still be given pause by the need to decide the intended referents of "Which of the two?" Contrast, yet again, the much easier to read Qatari version: " ماهو تأثير شرب المياه خلال الجري على تعرض العداء للجفاف وضربة الشمس ؟ " ("What is the effect of drinking water during the race on the runner's exposure to dehydration and heatstroke?")

I could keep going, and no doubt more fluent Arabic speakers can find problems I haven't even noticed, but the pattern is clear: Compared to Qatari students, to say nothing of Western ones, Tunisian students were systematically disadvantaged in the PISA 2015 science tests by bad translation.

Whose fault is this? Clearly there was a failure at the level of PISA's international verification, which should have eliminated such problems. But the translations themselves are carried out at the national level (PISA 2012 Technical Report, Ch. 5). In other words, this mess was produced by Tunisian translators under the direction of the Tunisian government.

How is that possible? Simple: in Tunisia, appallingly enough, science is taught in French from the start of secondary school onwards. Science teachers have little need to keep up their Standard Arabic proficiency. Which raises the question of why this test, targeted at 15-year-olds, was administered in Arabic there to begin with.

Wednesday, November 30, 2016

Siwi vocabulary for addressing animals

Probably every language has a certain number of forms used especially for addressing animals, especially domestic animals. In response to a recent query by Mark Dingemanse, I gathered together all the ones I happened to have recorded for Siwi - the list below is definitely not exhaustive, but should at least be suggestive. Note the sounds used - clicks do not usually form part of Siwi phonology!

To chicks:
didididididi: eat!

To cats:
ərrrr: come!
ǀǀǀǀǀ: come!
pss: move!

To dogs:
ʘʘʘʘʘʘʘ: follow me!

To goats:
əšš: go!
ħəww: go!
xətt: go!
kškškškškš: eat!

To donkeys:
ǁǁǁǁ: giddy-ap! (?)

The interesting question here is: to what extent are these arbitrary, reflecting an emergent cross-species convention just as most human lexemes do, versus to what extent do they reflect innate properties of animal perception and communication? How do they compare to those you've encountered, if any?

Tuesday, November 08, 2016

Some Dellys etymologies via Andalus

Looking through Corriente's etymological dictionary of Andalusi Arabic, I keep coming across explanations for obscure Dellys words whose origins had been a mystery to me. Corriente's etymologies are not always to be trusted - I've found several errors, most egregiously the attribution of kurānah كُرانة "frog" to Romance rather than to Berber - but the work remains very valuable. Here are a few etymologies that struck me.

l-ənjbaṛ لنجبار "maize" was originally anjibār أنجبار "snake-weed" (Persicaria bistorta), whose flowers look vaguely similar. This in turn comes from Persian angbār انگبار, which Corriente seems to derive from rang-bār رنگبار "many-coloured".
skənjbir سكنجبير "ginger" derives from some sort of popular confusion between two Arabic words: zanjabīl زنجبيل "ginger" and sakanjabīn سكنجبين "oxymel" (a mixture of honey and vinegar used medicinally). I assume the connection is that both are good for colds, but a quick search didn't turn up any actual evidence that oxymel was used for that purpose. Sakanjabīn is apparently from Persian سرکه انگبین serke angabin (Corriente gives the form sik angubēn) "vinegar honey", while zanjabīl is apparently, again via Persian, from Sanskrit शृङ्गवेर ‎śṛṅgavera.
fərnəħ فرنح "smile, laugh (of a baby)": cp. Andalusi farnas فرنس, Moroccan fərnəs فرنس; possibly, Corriente suggests, from Greek euphrosynē εὐφροσύνη "joy".
bu-mnir بومنير "seal" was very hard to elicit, since seals have been locally extinct for decades (they've nearly disappeared from the entire Mediterranean, in fact). However, it turns out to be correct after all: cf. Andalusi bul marīn بل مرين "sea lion", Maltese bumerin "seal". Corriente seems to take this as Romance *pollo marino "sea-chicken", but the first part of that at least is clearly implausible in light of the comparative evidence as well as of common sense; the second might be tenable, but I'm not sure.

On a not entirely unrelated note: for anyone who wants to explore the maritime terminology of Dellys in greater depth than I've ever been able to elicit, El-Bahri.net is a wonderful and unexpected resource.

Friday, November 04, 2016

Lingua Franca and Sabir in "Four Months in Algeria" (1859)

I recently finished reading Four Months in Algeria, a travel diary by the English Rev. J. W. Blakesley published in 1859. It's mostly rather superficial - he couldn't speak Arabic, and spent most of his time with French soldiers and German settlers - but enlivened by occasional insights. It contains little content of linguistic interest, but it does contain two brief passages in the pidgin still used for communication between North Africans and Europeans when neither spoke the other's language - call it Lingua Franca, or Sabir. Since it would take a brave creolist to plough through the whole thing just in the slender hopes of finding such material, I reproduce them here.

The first passage (p. 340) comes from the author's description of his journey from El Aria to a place called Embadis, both in the east of Algeria, during the month of Ramadan; it shows a curious combination of French, Arabic, and "classic" Lingua Franca:

The poor muleteers had not tasted food during the whole day ; and as soon as ever the sun dipped, they produced one or two flat cakes, and ate them with avidity, not however without first offering me a share. I of course declined to diminish their scanty store, and reminded them that I had breakfasted at El Aria. "Toi makasch tiene carême ; toujours mangiaria," said one of the poor fellows, in the polyglot dialect which is growing up out of the intercourse between the natives and the illiterate European settlers of the interior.*
* There are a few Arabic words which the European children habitually make use of at Guelma, even when playing with each other. Makasch, no, shuiya, gently, I found invariably took the place of the corresponding French terms. On the other hand the Arabs constantly use the words ora, hour, and buono or bueno, good, to one another. Iauh, yes, a Kabyle word, pronounced exactly like the German affirmation, is also very common among the lower orders of Europeans.

In this passage, "toi" (you), "carême" (fast), and "toujours" (still) are French, while "tiene" (have) is Spanish, and "mangiaria" (eat, or perhaps food?) is Lingua Franca (from Italian), and "makasch", being used as a simple negator, is Algerian Arabic makaš ماكاش "there is no" (I discuss the latter's history here). Despite the diversity of the lexical sources drawn on, however, the grammar - simple SVO with no subject-verb agreement - matches better with Lingua Franca than with any of the lexifiers.

The second (p. 419), from a country as yet unconquered by the French, shows no such admixture, corresponding perfectly to earlier descriptions of Lingua Franca in which it often appears as little more than Italian minus the morphology:

More than once have I found in Algeria the conventional civility of the Arab to an European change into an unmistakeable expression of goodwill, when it appeared that I was an Englishman ; and in Tunis a notification of the fact at once drew forth a "Buono Inglese ; non buono Francese," from the mouth of a native.

Tuesday, September 27, 2016

Two funny adjectives (?) in Algerian Arabic

In Algerian Arabic, as in any other Arabic variety, adjectives follow the noun. However, there is one exception to this rule: invariant quja قوجا or qŭjna قُجنا, "a huge". Thus we say ṛajəl kbir راجل كبير "a big man", but quja ṛajəl قوجا راجل "a great big man". Not only does this "adjective" precede the noun it modifies, it requires it to be made indefinite: you can say šrit quja ktab شريت قوجا كتاب "I bought a huge book", but if you want to say "I bought the huge book", there's nothing you can do but use a different adjective. *šrit quja l-ktab or *šrit əl-quja ktab or *šrit əl-quja l-ktab are all impossible. You can make quja قوجا follow the noun, but you have to use a different construction, equally unique to this "adjective": ṛajəl quja mən huwwa راجل قوجا من هو "a great big man", daṛ quja mən hiyya دار قوجا من هي "a huge house".

The origin of quja قوجا is clear: it comes from Turkish koca "large; husband", which in turn is apparently an early adaptation of Persian xɑje خواجه "master, gentleman". In Turkish, all adjectives are prenominal, so one could take that to explain its position in Algerian Arabic; but a quick search suggests that Turkish koca has no problem combining with definite nouns (one finds phrases like bu koca dünya "this huge world"), so Turkish cannot explain the indefiniteness requirement. However, it looks like Algerian quja has followed a trajectory very similar to Iraqi and Khaliji xôš خوش. It is not obvious to me why obligatorily indefinite prenominal adjectives should even be possible in a language that otherwise strictly requires adjectives to be postposed, much less why they should have to be indefinite in order to stay prenominal - but that's what it looks like....

The word məskin مسكين "poor (pitiable)" is not so unusual, lexically speaking; it's just about pan-Arabic. It combines just fine with definite nouns, and takes normal agreement (f. məskina مسكينة, pl. msakən مساكن). However, it has almost the opposite idiosyncrasy: it doesn't take the definite article, which would be obligatory with any normal adjective whose head is definite (and, if it comes to that, with a noun in apposition to a definite phrase as well). Thus we say bwəʕlam məskin maqdərš yji بوعلام مسكين ماقدرش يجي "poor Boualem couldn't come", even though we would say bwəʕlam əṭ-ṭwil بوعلام الطويل for "tall Boualem" (Boualem the-tall). Why? No idea. Suggestions are welcome!

Monday, August 15, 2016

Microvariation in Dellys Arabic

There are plenty of factors that one naturally expects to condition linguistic variation: age, sex, location, class, ethnicity, religion - in short, any variable such that people are more likely to talk with those who match their value for it than with those who don't. Dellys offers clear examples of several of these:

Age: There's an obvious gap between the generation born before Independence and those born since then, the latter having had much greater freedom of movement and access to media as well as education. Within my extended family, my father's generation all negate verbs indifferently with ma... ši ما...شي or ma... š ما...ش, whereas their children and grandchildren uniformly use only the latter. Similarly, the older generation use mazəlt مازلْت for "I am still...", conjugating it as a verb, while the younger ones consistently use mazalni مازالني; many of the older generation use -ayən ـاين for the dual (eg يوماين yumayən "two days"), while the younger generation all use -in ـين.
Sex: Only women use the exclamation a məħħənti أ محّنتي "oh my goodness!"; only men, as far as I've noticed, use the quasi-expletive jədd جدّ "grandfather" (eg nəħħi jəddu نحّي جدّهُ, approximately "remove the damn thing"). In less integrated French loans, women of my generation or younger use a uvular R, whereas almost all men (and older women) substitute a trill ṛ; this sex differentiation is acquired well before the age of ten.
Location: The most salient distinction at a local level is classic in Maghreb dialectology: urban (more or less pre-Hilalian) vs. rural (Hilalian). People from Dellys proper say qal قال "he said" and ṣab "he found"; people from the villages and small towns around it instead say gal and lga.

Such variation is easily understood. But a lot of variation I'm noticing seems to show no such patterning. Out of three brothers, fairly close together in age and all currently working in the same family business:

Two have baš باش for "so that"; the third - unlike anyone else I know - uses li baš لي باش.
All use lukan لوكان for "if (hypothetical)", but one also uses lakun لاكون and another yakun ياكون.

Maybe this is somehow explained by their earlier backgrounds - the one who uses li baš لي باش and yakun ياكون had more education; perhaps he picked it up where he went to school, or where he used to work when he was younger? But there are many other variables like this. I similarly don't see any pattern to the choice between bəṛk برْك and kan كان for "only", or yəsħaq and yəsħaj يسحاج for "he needs", or yʊɣləq يُغلق and yəʕləq يعلق for "he closes", or (at least for older speakers) yəqdər يقدر and yənjəm ينجم for "he can". People of the same age and gender, living all their lives less than a kilometer from each other and sometimes even in the same household, consistently use one or the other. Presumably something must explain the difference, but it looks like it would require a pretty intensive social network analysis to find out...

This is actually fairly similar to what Nancy Dorian found for the Scots Gaelic of East Sutherland fisherfolk: "Surprises in Sutherland: Linguistic Variability amidst Social Uniformity". She observes that this kind of variation usually tends to be ignored: "Oftedal, my immediate predecessor in Gaelic dialect studies, noted that the Gaelic of his single source and that of the man’s wife differed in a number of respects, despite the fact that the two had grown up as next-door neighbors; but after noting the existence of such differences in an early footnote, he never referred to the wife’s Gaelic again." While Algerian Arabic is far from endangered, the two situations are not as different as you might think: in both cases, small towns were substantially expanded over the 19th century by rural refugees fleeing land confiscations and wider upheavals, and left to sort out the resulting mess of dialect variation among themselves without that much pressure towards standardization. Perhaps such variables would have correlated more clearly with speakers' background a century ago, and have been left today as relics too scattered by later changes to be assigned a social meaning any longer.

Do these examples of variation seem familiar to you? What kind of individual-level variation have you noticed between friends and family?

Friday, August 12, 2016

Berber feminine nouns in Dellys Arabic: an update

In Dellys, Berber nouns borrowed into Arabic are not very common, and ones that preserve the Berber nominal affixes are even rarer, so I'm always on the lookout for them. A few days ago, listening to my eldest aunt, I heard one that was completely new to me, in an old idiom:

xəlləṭ tazalt u bəḷḷuṭ
خلّط تازالْت وبلّوط
mix up tazalt and oak/acorns (ie mix good with bad)

Tazalt was described as a vine with white flowers; probably the reference is to Cistus (rockrose), whose Kabyle name is tuzzalt, "little iron". Why that would be particularly easy to confuse with an oak tree is beyond me. There are a few other plant and animal names retaining the Berber feminine circumfix t(a)-...-(t), including tirẓəẓt تيرززت (a kind of small wasp), tubrint توبرينْت (a kind of seaweed), taɣanim تاغانيم (a variety of fig, from Berber taɣanimt "small reed"), and originally plural timəlwin تيملْوين (another variety of fig). Otherwise, this circumfix seems to be almost exclusively reserved for abstract nouns referring to negatively judged character traits (see previous posts): eg taɣənnant تاغنّانْت "stubbornness", taklufit تاكلوفيت "meddling", tayhudit تايهوديت "malice", tastutit تاستوتيت "malicious trickiness". An amusing variant on this theme came up recently: taṭnuhist تاطنوهيست "open-mouthed stupidity", presumably a blend of unrecorded *taṭnuhit تاطنوهيت and French -iste. (This in turn derives from ṭnəh "mooring-post", as in "dumb as a post".)

Tuesday, August 09, 2016

Phonics and whole word teaching in Algeria

Just about every parent I've spoken to in Dellys is concerned one way or another about the direction the educational system has been going – over-complex curricula, excessively heavy backpacks, extramural tutoring, discipline, class sizes... How children are taught to read and write looms relatively small among these concerns, except for parents who find their own child having serious difficulties. The more I've learned about this issue, though, the more worrying it seems.

During my brief, unpleasant experience with Algerian education in the late 1980s, reading and writing were taught in much the same way as in my American home school. We learned how to build up letters into words and break down words into letters – in brief, a variant of phonics. Arabic spelling is almost perfectly regular, so this stage is actually significantly easier in Arabic than in English (although this advantage is no doubt more than offset later on by diglossia). Today's Algerian children, however, are taught to memorise words and texts as wholes, and are only exposed to individual letters well after having memorised words containing them – in other words, a rather extreme version of the whole language method. This change of method – imposed not by the controversial current Minister of Education, but by her well-connected predecessor – is enforced by teaching inspectors, who are empowered to penalize efforts to teach in the older way.

This would be all very well if the whole language method were more effective. Unfortunately, as far as I can tell from a quick literature meta-review (and notwithstanding some conspicuously sketchy political exploitation of the issue), the evidence seems to be pretty clear-cut (eg [1], [2], [3]) that including phonics makes reading instruction more effective even in a language as irregularly spelled as English, and tends to favour a primary (if not exclusive) focus on phonic methods in early teaching. In other words, Benbouzid's "modernizing" educational reforms seem likely to have deprived Algerian children of one of the very few advantages they enjoyed over English-speaking children.

A question especially for any readers with a wider background in education: do you know of any good studies of the effectiveness of different methods of teaching Arabic early literacy, preferably carried out within Arabic-speaking countries?

Tuesday, August 02, 2016

More Darja notes: oath complementisers, free choice indefinites, kids' morphology, finger rhymes

Oath complementisers

In North Africa, the oath wəḷḷah والله, literally "by God", is used so frequently to emphasize statements - religious scruples notwithstanding - that a more appropriate synchronic translation might be "seriously". (It can even be used with imperatives, which can hardly be read as committing the speaker to the truth of any given statement.) Perhaps as a result of their high frequency, constructions with wəḷḷah have a number of unique morphosyntactic characteristics. Negation after wəḷḷah uses ma ما alone, whereas in most other contexts negation is bipartite ma... š(i) ما... شي. Positive sentences after wəḷḷah are introduced by what seems to be a complementiser, ɣir غير or la لا, which in other contexts mean "just, only". What struck me this time is that in certain syntactic contexts this complementiser systematically shows up twice, once right after the oath and once at the start of the main clause proper; I've come across this in topics:

wəḷḷah la lyum la sxana والله لا اليوم لا سخانة
by.God just today just heat
By God, today, it's hot.
wəḷḷah ɣir anaya ɣir dərt-ha والله غير أنايا غير درتها
by.God just I.EMPH just did.1sgPf-3FSgAcc
By God, me, I did it.

and in conditionals with the condition preposed:

wəḷḷah ɣir lukan t-dir-ha ɣir nə-ʕṭi-k ṭṛayħa والله غير لوكان تديرها غير نعطيك طرايحة
by.God just if 2Sg-do-3FSgAcc just 1Sg-give-2SgAcc beating
By God, if you do that I'll give you a beating.

In generative grammar, it is generally supposed that sentences are complementiser phrases. The complementiser is unpronounced in normal declarative sentences here, as in many languages, but is pronounced overtly in specific circumstances such as, here, oaths. A popular hypothesis in the cartographic approach to generative grammar proposes that the complementiser phrase needs to be split into a more fine-grained set of projections: Force > Topic > Focus > Topic > Finiteness, following Rizzi 1997. Prima facie, this complementiser-doubling data suggests otherwise: it looks very much as though left-adjunction of both topics and conditions is being handled by embedding the CP within another CP.

Free choice indefinites

In traditional Algerian Arabic, it seems pretty clear that the function of free choice indefinites ("anyone could do that", "take anything (you want)") isn't very strongly grammaticalised. In French, however, it's expressed using a relatively frequent, dedicated series of forms based on "no matter" plus the interrogative pronouns: n'importe qui/quoi/quel "anyone, anything, any..." Younger speakers of Algerian Arabic have borrowed the morpheme n'importe, but not the construction as a whole; instead, they simply prefix n'importe to existing indefinite nominals, in which interrogative pronouns play no role. Thus the phrase I heard today:

fə-z-zit wəlla f næ̃mpoṛt ħaja في الزيت ولا في نامبورت حاجة
in-the-oil or in any thing
in oil or in anything

More children's morphology

Algerian Arabic has very few native bisyllabic words ending in the vowel u, but in loanwords it's not so unusual; for instance, it uses French triku تريكو (ie tricot) for "T-shirt". The first person singular possessive has two allomorphs: -i after consonants, -ya after vowels. I caught the younger of the two kids mentioned in the last post saying trikuww-i تريكوّي "my T-shirt" and trikuww-ək تريكوّك "your T-shirt"; his father (and everyone else, as far as I've noticed) says triku-ya تريكويَ and triku-k تريكوك. So it would seem that this kid has reanalysed the word as phonologically /trikuw/. Further inquiries are called for.
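
For readers who like to see such rules spelled out mechanically, here is a minimal Python sketch of the first person singular allomorphy just described. It is only an illustration under assumptions of my own: the function name, the simplified vowel inventory, and the treatment of the child's stem as a consonant-final string "trikuww" are all hypothetical conveniences, not a phonological analysis.

```python
# Illustrative sketch only; the vowel set and stem spellings are assumptions.
VOWELS = set("aiuəe")  # hypothetical simplified inventory for this transcription


def first_person_possessive(stem: str) -> str:
    """Attach the 1sg possessive: -ya after a vowel, -i after a consonant."""
    suffix = "ya" if stem[-1] in VOWELS else "i"
    return f"{stem}-{suffix}"


# Adult form: the stem ends in a vowel, so the rule selects -ya.
print(first_person_possessive("triku"))
# Child's form: if the stem is stored with a final consonant, the same rule selects -i.
print(first_person_possessive("trikuww"))
# An ordinary consonant-final noun for comparison.
print(first_person_possessive("ktab"))
```

The point of the sketch is that the child need not have a different suffixation rule: the ordinary rule applied to a consonant-final stem gives the observed form.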

This little piggy...

I've encountered two finger rhymes in Algerian Arabic around Dellys; compare them to a Kabyle version below from Hamid Oubagha:

Dellys A	Dellys B	Kabyle
hađa ʕaẓẓi məskin هاذا عزّي مسكين This one is a robin, poor thing	hađa sɣiṛ u ʕaqəl هاذا سغير وعاقل This one is small and gentle	Wa meẓẓiy, meẓẓiy meskin ! This one is small, poor thing!
u hađa ṣbəʕ əssəkkin وهاذا صبع السكّين And this one is the knife-finger	u hađa ləbbas əlxwatəm وهاذا لبّاس الخواتم And this one is the ring-wearer	Wa d Ɛebḍella bu sekkin ! This one is Abdallah of the Knife!
u hađa ṭwil bla xəsla وهاذا طويل بلا خسلة And this one is long without function	u hađa ṭwil u məhbul وهاذا طويل ومهبول And this one is tall and crazy	Wa meqqer, meqqer bezzaf ! This one is big, very big!
u hađa ləħħas əlgəṣʕa وهاذا لحّاس القصعة And this one is the dish-licker	u hađa ləħħas ləqdur وهاذا لحّاس القدور And this one is the licker of pots	Wa d ameccaḥ n teṛbut ! This one is the dish-licker!
u hađa dəbbuz əlgəmla وهاذا دبّوز القملة And this one is the louse-club	u hađa dəbbuz ənnəmla وهاذا دبّوز النملة And this one is the ant-club	Wa d adebbuz n telkin ! And this one is the lice-club
u yəmma tqul: mʕizati, mʕizati, mʕizati! ويمّا تقول: معيزاتي، معيزاتي، معيزاتي And mother says: my little goats, my little goats, my little goats!	dəbb əđđib, dəbb ənnəmla, dəbb əđđib, dəbb ənnəmla... دبّ الذّيب، دبّ النملة، دبّ الذّيب، دبّ النملة... Debb the wolf, Debb the ant, Debb the wolf, Debb the ant...	(n/a?)

All three clearly share a common background. Obviously, Dellys B has been deliberately made more posh - ants substituted for lice, pots (with urban q) for dishes (with villagers' g), ring-finger for knife-finger... Dellys A remains defiantly unrefined, but shows at least one sign suggesting an original in Kabyle: ʕaẓẓi məskin "a robin, poor thing" makes a lot less sense for referring to the little finger than meẓẓi meskin "small, poor thing", but sounds almost the same. On the other hand, Dellys A shows a near-rhyme between verses 3, 4, and 5 which doesn't work at all in the attested Kabyle version. It would be interesting to compare more versions in both languages.