Jabal al-Lughat

Saturday, February 09, 2019

Abdurehim Heyit's "Mother Tongue"

While I was doing my PhD at SOAS, I found myself one term helping teach a field methods class focusing on Uyghur, a Turkic language closely related to Uzbek spoken in Xinjiang in far western China (textbook here). At the time, as far as I gathered, it was a sleepy borderland region; these days, it's best known for the massive reeducation camps into which the Chinese government has thrown a substantial proportion of the population, in what appears to be an ambitious effort to eradicate their language, culture, and religion. ("Kill the Indian to save the man" was the American version.) Today, it's being reported that the talented Uyghur musician Abdurehim Heyit (ئابدۇرېھىم ھېيىت, equivalent to Arabic عبد الرحيم عيد), from Kashgar, died in detention at the age of 55, after two years in the camps. [UPDATE: It now seems that he's alive and still being imprisoned without trial.]

One of his best-known songs, originally a poem by Qutluq Shewqi, is a good fit for this blog: Ana til (ئانا تىل), "Mother Tongue" (lyrics, translation). When he sang it, language was still a relatively politically acceptable element of Uyghur identity to emphasise; traditional Communist Party policy for officially recognised ethnic minorities emphasised development of their languages. Now, with hundreds of thousands of people arbitrarily imprisoned, the rapid loss of language rights is the least of anyone's worries.

ئانا تىل بىلگەن كىشىنىڭ ئىززىتىن قىلغۇم كەلۇر،	I salute the people who speak my mother tongue,
ئانا تىلنى ئاغزىدىن ئالتۇن بەرىپ ئالغۇم كەلۇر.	I am willing to pay in gold for the words they speak.
بۇ ئانا تىل بولسا گەر ئامەرىكا-يۇ ئافرىقىدا،	Wherever my mother tongue is found, be it Africa or America,
سەرپ ئەتىپ مىڭلارچە تىللا ئاندى مەن بارغۇم كەلۇر.	I would go there, whatever the cost and expense.
ئانا تىل بىلگەن كىشىنىڭ ئىززىتىن قىلغۇم كەلۇر،	I salute the people who speak my mother tongue,
ئانا تىلنى ئاغزىدىن ئالتۇن بەرىپ ئالغۇم كەلۇر.	I am willing to pay in gold for the words they speak.
ئەي ئانا تىل بىزگە سەن قالغان ئۇلۇغلاردىن نىشان،	Oh, my mother tongue, you are the sacred bequest to us from our great ancestors,
سەن بىلەن روھىي زىمىندا ئىپتىخارلانغۇم كەلۇر.	With you, I desire to share my pride in you in the spiritual world.

Sunday, January 27, 2019

Hausa in Tamanrasset

On a recent trip to Tamanrasset, Algeria's southernmost significant city, I was not surprised to see lots of signs in Arabic and French, and not too surprised to see a significant minority of signs with Tamahaq (Tuareg) content; if I have the time I'll post later on the Tifinagh alphabet they used, closer to traditional Tifinagh than the version used in the north but still quite conspicuously modernised. But I hadn't fully appreciated how much immigration Tamanrasset attracts from the south these days, and even allowing for that I wasn't expecting to see Hausa signs as well. There was much more Hausa spoken than written, of course - on our brief trip through Tafsit market, I heard probably as much Hausa as Arabic, and even in the upmarket souvenir shops Hausa music was playing some of the time. But one Hausa expression had clearly made its way into the visual linguistic landscape of the town: over and over again, I saw little unpretentious-looking restaurants labelled with various spellings, in both Latin and Arabic script, of the Hausa phrase mai nama, "meat owner" (ie meat seller). Most of my pictures were blurry, but one came out - here it is.

Friday, December 21, 2018

We're all related: a calque from Kabyle into Darja

Algerian Arabic (or at least Dellys Arabic) has a verb for "be related to" (as family): kul كول, taking the dative, as in waš y-kul-lek? واش يكوللك "what relation is he to you?" In the reciprocal form, this yields tkawel تكاول "be related to each other"; "we're related to each other" is ne-tkawl-u نتكاولوا. These only seem to be used in the (present) imperfective; I've never heard anyone say *kal كال.

This verb clearly derives from an Arabic word still used in its own right in Algerian Arabic: kun كون "be", with regular assimilation of n+l to ll and reinterpretation of the root. waš y-kul-lek واش يكوللك "what relation is he to you?" was originally waš y-kun-lek واش يكونلك "what is he to you?" But that construction seems rather odd and unidiomatic from a Classical Arabic perspective. You don't normally use an equational verb "to be" in the indicative present tense like that, in Classical Arabic or even in Algerian Arabic; you would rather expect something with a pronoun, like *wašen huwwa lik واشن هو ليك (which you don't hear). What's going on here?
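The n+l assimilation described above can be sketched in a few lines of Python. This is only a toy illustration over the romanised forms cited here; the function name `attach_dative` is hypothetical, and the rule is deliberately limited to the single change at issue (stem-final n becoming l before a clitic beginning with l).

```python
def attach_dative(verb_form: str, clitic: str) -> str:
    """Join a verb form to a dative clitic, assimilating final n to a following l."""
    if verb_form.endswith("n") and clitic.startswith("l"):
        # n + l > ll: the nasal takes on the quality of the following lateral
        verb_form = verb_form[:-1] + "l"
    return verb_form + clitic


# y-kun + lek, the original "what is he to you?" form
print(attach_dative("ykun", "lek"))  # ykullek
```

Once the output is reanalysed as y-kul-lek, the stem kul can be treated as a root in its own right, which is the reinterpretation the paragraph above describes.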

Flipping through Dallet's (1982) enormous dictionary of Kabyle as spoken by the Ait Menguellet, I came across the answer. The Kabyle verb ili "to be" (imperfective ttili) matches Arabic kun كون fairly well in its usage. In the imperfective, with the dative, it means "be related to" (his gloss: "être parent avec, avoir relation de parenté à"): d acu i-m tettili? "what relation is she to you?" It likewise has a reciprocal myili (imperfective ttemyili) "have in common; be related to each other", which in the latter sense only seems to show up in the imperfective: nettemyili "we're related to each other".

It seems clear that the Algerian Arabic verb derives from an excessively literal translation - a calque - of the Kabyle expression, probably by people whose first language was Kabyle. But since then it's taken its own path; whereas in Kabyle the meaning "be related to" remains a context-specific sense of the verb "be", in Algerian Arabic the change of n to l has allowed it to become an independent lexeme in its own right with no one-to-one Kabyle translation equivalent. Contact catalyses change, but the resulting change follows its own path.

Monday, December 03, 2018

Language attitudes around Paris: a vignette

As we reached the stop by the supermarket the other day, I told my son in English "Now we're getting off the bus." This caught the attention of an elderly man sitting near us, who, as we got off, told him with a smile in accented English "Hello. You speak English - very good!". Turning to me, he asked "Est-ce qu'il parle français aussi ? [Does he also speak French?]"

I assured him that he does, and my son piped up with "Moi je parle trois langues : français, anglais, et arabe [I speak three languages: French, English, and Arabic]". Not to be outdone, the old man replied "Comme moi ; je parle français, anglais, allemand, arabe, et hébreu. [Like me; I speak French, English, German, Arabic, and Hebrew.]" I was duly impressed, and he continued "J'ai grandi à Oran, et j'ai fait mes études à la Sorbonne. [I grew up in Oran, then studied at the Sorbonne.]"

"Ô, moi aussi je suis algérien [Oh, I'm Algerian too]", I replied.

His response: "Ah, est-ce que vous êtes français ou israélite ? [So are you French, or Jewish?]"

My answer "Ni l'un ni l'autre [Neither one]" seemed to come as a surprise... The conversation ended about there, as we went our separate ways, with him saying "تهلّا في روحك thəḷḷa fi ṛuħək [Take care]".

Wednesday, September 19, 2018

Algerian Sign Language

According to Glottolog, the least documented language in Algeria is neither Korandje nor some Berber variety, but rather one that might not immediately leap to mind: Algerian Sign Language. If you have some idea of what to look for, though, there turns out to be a lot more available than might be expected; here's a brief bibliography gleaned from online:

Boutaleb, Djamila. 1987. Les enfants sourds en Algérie : Problèmes d'acquisition de la langue écrite [Deaf children in Algeria: Problems of written language acquisition]. Thèse de doctorat 3e cycle, Université Sorbonne Paris. 408pp. [Abstract: This thesis deals with the problems of deafness in Algeria, more particularly in schools where an attempt is made to pin down the causes of failure in the learning of language by deaf children. In order to understand the difficulties, it had seemed appropriate to examine the problem of deafness itself and its consequences on schooling and social life. This will be the subject of the first part. The emphasis will be on this "difference" which affects primarily the development of language and which may cause schooling delays and create psychoaffective problems and social problems. The current conflict of methods, oralism sign language, makes it possible to reconsider the status of deaf children thanks to the findings of linguistics and the works of psycholinguists and sociolinguists, of whom some current ideas will be presented in this work. In the second part, the deaf community in Algeria will be illustrated with some historical and socio-educational characteristics, for, to know the past and present living conditions of the deaf gives us the means to understand their actual level in the practice of the written language, which will be examined in the third part. The observed difficulties lie at the syntactic level, as well as the lexical, grammatical, and orthographic levels. The choice of deaf francophones, deaf arabophones, and hearing pupils benefits our analysis. This study is made in a pedagogical prospect but is integrated in a set of psycho-sociolinguistic views.]
مديرية النشاط الإجتماعي (الجزائر) [Direction des Affaires Sociales (Algérie)]. n.d. قاموسي الأول في لغة الإشارة : الجزء الاول [My First Dictionary of Sign Language: Volume 1]. Algiers. 50pp.
Djama, Amal. 2016. Les points communs entre la Langue des Signes Algérienne (LSA) - dialecte de Laghouat, Sud de l’Algérie - et la Langue des Signes Française (LSF) [Commonalities between Algerian Sign Language (LSA) - dialect of Laghouat, southern Algeria - and French Sign Language (LSF)]. Dossier, licence SCL « Acquisition et dysfonctionnement » (SCL F14), Licence 3, AMU, Faculté ALLSHS d’Aix-en-Provence. 5pp. [Comparison of 25 signs].
Guiroub, Mustapha. 2010-09-27. «La langue des signes algérienne est une revendication des sourds» [Algerian Sign Language is a demand of the deaf]. El Watan. [Notes that Algerian Sign Language is descended from French Sign Language (LSF), but that about 50% of the vocabulary is different; that there are many differences within Algeria between the North and the South; and that efforts at standardization are being undertaken.]
Lakhfif, Abdelaziz. 2009. Un Environnement de Traduction Automatique du Texte Arabe vers la Langue des Signes Algérienne (LSA) [An Automatic Translation Environment from Arabic Text to Algerian Sign Language (LSA)]. Mémoire de Magistèr en Informatique, Université Badji Mokhtar - Annaba. 134pp. [The only specific information about Algerian Sign Language given is a brief discussion of its legal status, pp. 16-17; as far as I can see, the author seems to have no contact with Algerian signers.]
Mansour, Mohamed Seghier. 2007. Langage et surdité, Description de la langue des signes des sourds oranais [Language and Deafness. Description of Oranais Sign Language]. Mémoire de magistère, Université d'Oran Es-Sénia. 124pp. [An analysis of sign formation in Algerian Sign Language as spoken in Oran, with a brief discussion of syntax, and some background on the language's history taken mainly from Boutaleb (1987).]
Ministère de la Solidarité nationale, de la Famille et de la Condition féminine (Algérie). 2017. Dictionnaire de la langue des signes algérienne : 1560 mots signés les plus usités. Trilingue Arabe - français - langue des signes. 29 thèmes de la vie quotidienne / قاموس لغة الإشارة الجزائرية : 1560 كلمة الأكثر استعمالا. ثلاثي اللغة : عربي - فرنسي - لغة الإشارة. 29 موضواعا من الحياة اليومية [Dictionary of Algerian Sign Language: 1560 most used signs. Trilingual Arabic-French-Sign language. 29 themes from daily life].
Ministère de la Solidarité Nationale (Algérie). 2008. Langue des signes algerienne : Guide de recherche et de recueil des signes [Algerian Sign Language: Guide for research and sign collection]. Algiers. 50pp.
Ministère de la Solidarité Nationale (Algérie). 2008. La langue des signes [Sign language]. Algiers. 14pp.

And - perhaps more usefully - a brief videography:

[Anonymous]. 2010-2018. SOURD ALGERIENNE. YouTube.
Daham, Abdelrazek. 2017. abdelrazek daham. YouTube.
Djama, Amal. 2016. LSA : Langue des Signes Algérienne (LSA), dialecte de Laghouat, Sud de L’Algérie. Aix-Langue des signes, Aix-Marseille Université.
Kahal, Zoheir. 2014-2015. Kahal Zoheir. YouTube.
[Anonymous]. 2015-2018. Langue des signes algérienne. Facebook.

Let's round this off with a school:

Ecole Algérienne de la Langue des Signes. 08 rue Arezki Hamani (ex-Charris) Alger-Centre, 16000 Algiers, Algeria.

Of course, one obvious question remains open: is there really just one sign language in Algeria?

Thursday, September 06, 2018

Baghrir as a battleground

I really don't have the time to post these days, but I couldn't resist letting you all know about a strong contender for the most ridiculous language-related controversy I've ever seen: the Moroccan baghrir scandal. "Baghrir" بغرير, as North Africans will know, is a kind of delicious pancake typically served with honey. A recent Moroccan primary school textbook incorporates (presumably for the first time) a picture of this regional delicacy, captioned with its name: الْبَغْرِيرُ. Pretty banal, right?

Apparently not. A furore erupted on social and traditional media, as ordinary people and self-styled experts lined up to lambast the Ministry of Education for this shocking betrayal of the Arabic language. Fouad Bouali of the National Coalition for the Arabic Language fulminated: "Citizens don't need "baghrir" or "slou" in their school texts... The use of folk expressions in school texts caps the tendency towards dialectalization". Prof. Mohamed Nabil Esrifi of Ibn Tofail University wrote a letter to the head of government:

"In shock, a shock whose bitterness I share with millions of Moroccan citizens, at the insertion of "popular Darija" expressions in approved school texts, which has spoiled our appetite for our favourite dishes such as "briwat" and "baghrir" and "ghribia", and raised our blood pressure and body temperature... I address you this letter to put an end to the rise in idleness and neglect of the career of a whole generation, the dissolution of educational content, and the stultification of the act of education."

Naturally, some took a more sensible view, such as Dr. Mohsen Akramine:

"We cannot reject a group of names that we commonly use in our social life on a regular basis, such as "briwat bellouz" and "baghrir" and "ghribia". so what problem does the term "baghrir" create in nurturing learning in class?... We cannot restrict the Arabic language to the language of "the lion and the blade" [two archaic terms are used, but English is rather poor in synonyms for "lion"]; the Arabic language cannot remain stuck in the classroom and the lecture hall, completely isolated from our daily business."

As far as I can see, none of the shocked citizens complaining about this has ventured to suggest an acceptable Standard Arabic synonym for "baghrir". Nor have they ever objected to the word's routine presence in cookbooks - which, along with schoolbooks and religious texts, are basically what keep most North African bookshops alive. So is the idea simply that this impure term must be kept out of the sacred space of the classroom, or what? My mind is boggled - or, as they say in Algeria, مُخّك يحبس.

Sunday, August 19, 2018

Bilingual suppletion comes from selection conflicts: Supporting evidence from Pichi

Azeb Amha recently directed my attention to a very interesting passage in Kofi Yakpo's grammar of Pichi, the English-based creole spoken in Equatorial Guinea. In this language, English-based numerals are used up to seven (and known in theory by some speakers up to ten), while Spanish numerals are familiar to all speakers and are used consistently above seven. The English numerals are followed by singulars (plural marking in Pichi is handled with postposed dɛ̀n, and occasionally the suffix -s):

So à dɔn gɛt tri nacionalidad nà dis wɔl.

so 1SG.SBJ PRF get three nationality LOC this world

‘So I have three nationalities in this world.’ [fr03ft 102]

When Spanish numerals are used, however (p. 545), we get "bilingual suppletion" (Matras 2012) - i.e., a grammatical rule of one language that seems to require switching into another one:

The attributive use of Spanish numerals goes along with the insertion of Spanish head nouns – there is no instance of a mixed combination of a Spanish numeral and a Pichi noun:

Lɛf=àn mek è rich a los quinze años.

leave=3SG.OBJ SBJV 3SG.SBJ reach to the.PL fifteen years

‘Leave her, let her reach [the age of] fifteen years.’ [ab03ay 138]

In Spanish, of course, numerals other than 1 select for plural nouns.

Now I would prefer to see a wider range of examples before reaching any firm conclusions, because counters like "years" are inherently more likely to cause borrowing of numeral+noun units. But, as described, this language precisely fits the explanation proposed for bilingual suppletion in Souag & Kherbache (2016), based on Myers-Scotton's Embedded Language Island Hypothesis:

[W]here bilingual suppletion in numeral+noun combinations emerges, it will occur only following borrowed numerals whose noun selectional requirements in the source language differ from those in the recipient language.

I was, of course, unaware that Pichi displayed bilingual suppletion when I proposed this generalization, so I take this as corroborating evidence. I would be interested to hear of any further examples.
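To make the logic of the generalization explicit, here is a minimal sketch in Python. The class and function names are hypothetical, and the function encodes only the necessary condition stated in the quotation (suppletion "will occur only following" such numerals); it does not claim that suppletion must occur whenever the condition holds. The Pichi values are taken from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Numeral:
    form: str
    borrowed: bool          # treated as a loan into the recipient language?
    source_number: str      # noun number the numeral selects in its source language
    recipient_number: str   # noun number numerals select in the recipient language


def suppletion_possible(numeral: Numeral) -> bool:
    """Necessary condition only: a borrowed numeral with a conflicting noun requirement."""
    return numeral.borrowed and numeral.source_number != numeral.recipient_number


# Pichi's own numerals take singular nouns; Spanish numerals above 1 select plurals.
# For a non-borrowed numeral the source field is simply set equal to the recipient one.
tri = Numeral("tri", borrowed=False, source_number="singular", recipient_number="singular")
quinze = Numeral("quinze", borrowed=True, source_number="plural", recipient_number="singular")

for n in (tri, quinze):
    print(n.form, suppletion_possible(n))
```

Run on these two numerals, the check is False for tri and True for quinze, matching the distribution reported for Pichi.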

Tuesday, June 26, 2018

Yaqṭīn as substratum vocabulary?

A strong contender for the most obviously ridiculous etymology in Jeffery's The Foreign Vocabulary of The Quran is his attempt to derive yaqṭīn "gourd" from a "garbled form" of Hebrew qîqāyôn (p. 309). Is it possible to do better?

Like ḍarīʕ, yaqṭīn is barely attested in early Islamic-era literature apart from Qur'ānic allusions and botanical texts. However, in this case the grammarians also take an interest, due to the word's slightly unusual form. Sībawayh (d. 796) notes it as one of two nouns of the form yaCCīC (the similar pattern yaCCūC, mainly for animal names, is more productive), along with a yellow-flowered desert plant called yaʕḍīd (Launaea mucronata). The latter word is well-attested in modern Arabic dialects, eg Najdi ʕaḍīd - and has passed into Korandje, the Songhay language of an oasis in southwestern Algeria, as yaʕḍud; I first heard it there, in a chant from a children's story:

aɣ a išən kadda, I'm a little goat,
aɣ a nɣa tantərama, I eat tantərama,
aɣ a nɣa lyaʕḍud, I eat Launaea.

Now, yaʕḍīd is presumably derived from the root ʕḍd, "support" (etc.); despite its scrawniness, the plant holds itself well above the ground. A Hebrew or Aramaic origin is obviously out of the question, given the ḍ. Ibn Durayd (d. 933) cites a third word of this form whose origin is clearer: yaʕqīd "thickened (crystallized?) honey", related to 'aʕqada "thicken (a liquid)" (ويَعْقيد: عسل يُعقد حتى يَخْثُر). By analogy, one would expect yaqṭīn to be derived from the root qṭn, and this is exactly what al-Zamakhsharī (d. 1144) not unreasonably proposes:

واليقطين: كل ما ينسدح على وجه الأرض ولا يقوم على ساق كشجر البطيخ والقثاء والحنظل، وهو يفعيل من قطن بالمكان إذا قام به. وقيل هو: الدباء.
Yaqṭīn is anything that sprawls on the surface of the earth and does not stand on a stalk, like the melon and the snake cucumber and the colocynth. It is (of the form) yaCCīC, from qṭn, "it dwells/settles" in a place if it comes up there. It is also said to be the gourd.

However, the fact that Arabic has only three words of this form - two of them plant names, and one related to honey extraction - should arouse our suspicions. If a language has a small class of morphologically anomalous nouns all relating to wild food-gathering activities, the hypothesis that should immediately spring to mind is: this is substratum vocabulary. In other words, these three words - especially yaqṭīn and yaʕḍīd - should be suspected of being borrowings, not from some garbled Hebrew source, but from the indigenous Semitic languages spoken in the Arabian peninsula before the spread of Arabic. If so, Western Qur'ān studies' excessive focus on written sources seems more likely to obscure linguistic history than to reveal it.

(Yes, you didn't misread that - epigraphic evidence suggests that Arabic expanded from northwestern Arabia into the rest of the peninsula within historic times. Ahmad Al-Jallad has been doing some interesting work on this issue, summarized briefly on this Twitter thread.)

Sunday, June 24, 2018

Yūnus/Jonah viewed through hapaxes

The Qur'ān is not intended as an account of events. Rather than being organised around narratives, it typically brings up apparently familiar narratives in support of points being made. Yūnus/Jonah, for example, is mentioned by name 4 times, and by epithet another 2 times. Two of these mentions give no details of his story at all (4:163, 6:86). 10:98, 21:87-88, and 68:48-49 only briefly summarise specific aspects of the story. 37:139-148 recounts the story as a whole, but in such an abbreviated form as to presuppose that at least part of the audience had already heard a fuller version. Can anything about that version be determined from the text of the Qur'ān?

Hapaxes - words that occur very rarely or only once in the text - offer an interesting window on the problem (see also previous posts: ضريع, قسورة). Apart from the name Yūnus (Jonah) itself, four words are attested in the Qur'ān only within accounts of Jonah. The oldest attested form of his name is Yônāh, which in Greek yields Iônas (ιωνας) in the nominative (the -s is a widespread Indo-European nominative singular suffix); the final s in Yūnus thus suggests that the audience's knowledge of Jonah came in part via Greek intermediaries at some remove. "Fish" is normally ḥūt حوت in the Qur'ān, including in accounts of Jonah (Standard Arabic samak سمك is unattested in the text), but in 21:87 Jonah is alluded to as ḏā n-nūn ذا النون "he of the fish", the only occurrence in the Qur'ān of the Aramaic loanword nūn. The fish swallowed (iltaqamat التقمت) Jonah in 37:142; the only other mention of swallowing in the Qur'ān uses a word much better attested in modern Arabic dialects, balaʕa بلع (11:44:3). After praying to God for release, he is then cast out onto the shore, for which both 37:145 and 68:49 use the more specific term ʕarā' عراء, ie barren land. Eventually God causes a gourd - yaqṭīn يقطين - to grow over his head; this is the only Qur'ānic mention of the plant in question.

Compare the relevant terms in various early Semitic versions of the Book of Jonah:

	Jonah	fish	swallow	land sp.	plant sp.
Arabic	Yūnus يونس	ḥūt حوت / nūn نون	iltaqama التقم	ʕarā' عراء (barren land)	yaqṭīn يقطين (gourd)
Hebrew	Yônāh יוֹנָה	dāḡ דָּג	bālaʕ בָּלַע	yabbāšāh יַבָּשָׁה (dry land)	qîqāyôn קִיקָיוֹן
Babylonian Jewish Aramaic	Yônāh יוֹנָה	nūnā נוּנָא	blaʕ בְּלַע	yabbeštā יַבֶּשׁתָּא (dry land)	qîqāyôn קִיקָיוֹן
Syriac	Yawnān ܝܘܢܢ	nūnā ܢܘܢܐ	blaʕ	yaḇšā ܝܒܫܐ (dry land)	qar'ā ܩܪܐܐ (gourd)
Geez	Yonas ዮናስ	ʕanbari ዐንበሬ (whale)	wəxṭä ውኅጠ	mədər ምድር (land)	ḥamḥam ሐምሐም (gourd)

One immediately notices that none of them match the Qur'ān as a whole at all well. For "Jonah", only Geez (Ethiopic) offers a similar Greek-influenced term, contrasting with the obvious Aramaic source of nūn for "fish". For "swallow", the Hebrew and Aramaic/Syriac versions all use a word whose direct cognate - balaʕa - is attested elsewhere in the Qur'ān, and is very familiar in Arabic; why then does the more vivid term iltaqama (something like "take in as a morsel") appear? For the land onto which Jonah is cast, the Qur'ān twice uses a specific term incorporating a detail absent from any of these versions of the Book of Jonah, all of which use a generic term for "dry land" or even just "land"; why is this used rather than 'arḍ or even the cognate yābisah?

The conclusion seems obvious: none of these translations were at all prominent for the Arab audience to whom the Qur'ān was first addressed. Whatever its distant roots may have been, the account of Jonah they knew best was something orally transmitted in Arabic, and not directly based on any one of these.

Sunday, June 10, 2018

fatta: a loan from Chadic into Songhay?

The Proto-Chadic word for "go out" was reconstructed by Newman and Ma (1966) as *p-t-, with attested reflexes in all primary subgroups of the family; the best known of these is of course (West Chadic A.1) Hausa fìtā. The vowels vary across languages, and there is often no final vowel. Only one subgroup, as far as I can see on a quick check, shows the consistent vocalisation *patā: the Bole languages (West Chadic A.2), spoken in Nigeria's Yobe State along the boundary between Hausa and Kanuri. Thus Bole pàtā, Ngamo hàtâ, Karekare fàtā.

Most Songhay varieties have reflexes of two near-synonyms for "go out": *hùnú and *fáttá. Usually, the distinction seems to be roughly "leave (a place or event)" vs. "go out of (an enclosed or concealed space)". In Northern Songhay - the subgroup most isolated from the rest for longest, spoken in the Sahara - only reflexes of *hùnú seem to be attested, covering both senses (eg Korandje hnu). This could be interpreted as reflecting Northern Songhay's general tendency to reduce its inherited vocabulary by widening the usage of generic terms. In light of the Chadic data, however, it is tempting to interpret it the other way around: did Northern Songhay preserve the original situation, while a West Chadic borrowing spread throughout the rest of the family via the Niger River?

Saturday, June 09, 2018

Songhay glosses in Djenne manuscripts

Djenne, in central Mali, is one of the oldest cities in West Africa; it also happens to be the westernmost Songhay-speaking town, isolated in a predominantly Bozo area. As an old regional centre of Islamic learning, it has rather a lot of manuscripts, most still in the hands of local families rather than taken over by official heritage-keepers. 56 family collections of manuscripts in Djenne have recently been digitised and made available online, at the Djenne Manuscript Library Collection. Searching through this amazing resource is a bit of an adventure, since a lot got lost in the translation of the metadata (for instance, this manuscript labelled as Intercession is actually a list of tribe names). But doing so has potential rewards for the historical linguist as well as for the historian: scattered through the manuscripts are very occasional marginalia in local languages.

The first examples I've managed to find come from a late 19th or early 20th century manuscript of 8 pages, belonging to the family of Alphamoye Baber Djenepo, to which the cataloguers gave the title مكتوب في اللغة "writing on language" (which, after passing through a layer or two of translation, ended up in English as "Philology"). It's an obviously incomplete part of an alphabetical poem (unknown to Google) recounting the life of the Prophet, which gives for each letter of the Arabic alphabet in order a section rhyming in that letter. The language is somewhat obscure, and is copiously annotated - mainly in Arabic, but every so often in Songhay.

On p. 8, for instance, we see the Arabic word تَعَوُّذِ "seeking God's protection" glossed with the Songhay word sumburku "holy formula, spell":

On p. 9 of the same, we see Arabic نَادِ "caller" glossed with Songhay kaati "call, shout":

This particular example is too recent to contribute much to Songhay philology, but it at least proves that Songhay was used to gloss manuscripts in Djenne, and suggests that it would be worth looking through the collection for other examples.

(Added after posting): On p. 5, we find Arabic تمساح "crocodile" glossed with Songhay kaarey "small crocodile sp.":

(PPS): And in this undated fragment of Maqamat al-Hariri, p. 4, we find another identifiable Songhay gloss (or at least a word found in Djenne Chiini): tangara for قضيب "rod, staff", followed by عجم "non-Arab" to make its status clearer:

Friday, June 01, 2018

Drawing water in Songhay and Zenaga

Almost every attested Songhay variety (Tasawaq is perhaps the only exception) has a reflex of the proto-Songhay word *gúrú "draw water" (from the river, from a pond, from a well, etc.). To express this concept, most Berber varieties (including Tashelhiyt, Kabyle, Tumzabt, Ghadames, Awjila, Tamajeq...) use reflexes of a verb *āgum "draw water", which is thus equally securely reconstructible for proto-Berber. Zenaga, however, has a rather different verb: ägur "puiser l'eau d'un puits, remonter le delou, tirer la corde du seau; faire parvenir qqc (à qqn)" and "se lever (astre)", with an irregular corresponding noun tgäʔrih "eau tirée du puits". It seems to be distinct from äggur "pull".

The only Berber cognates Taine-Cheikh suggests for ägur are reflexes of a verb that may be reconstructed as *agir "throw; rise (of sun)" (eg Tashelhiyt gr, Kabyle gər, Chaoui gər). Presumably the semantic shift of "throw" to "draw water" would be explained via the idea of throwing the bucket down the well. If the comparison is accepted, then the verb shows an innovative semantic shift specific to Zenaga. (It would be interesting to see if Tetserrét shares this, but unfortunately the relevant term doesn't seem to have been recorded.)

If the Zenaga word is indeed cognate to the suggested Berber forms, then it seems reasonable to draw the conclusion that proto-Songhay borrowed *gúrú "draw water" from an early relative of Zenaga. This would fit well with the evidence for a Western Berber language having played an important role in the history of at least northern Mali. If not, then it would become tempting to draw a conclusion much harder to fit with what is known of the region's history: that Zenaga borrowed the word from proto-Songhay.

Tuesday, May 29, 2018

Zenaga dialectal reflexes of ʔ, :

For the purposes of Berber historical linguistics, arguably the most important thing about Zenaga is its thoroughgoing retention of the glottal stop. Some Zenaga glottal stops derive from *q, corresponding to ɣ elsewhere in Berber, but many derive from *ʔ, lost without trace in most Berber varieties. When a rather carefully transcribed new source of dialectal Zenaga data comes to light, it thus seems logical to start by seeing how the glottal stop is reflected there. For convenience, I restrict this first pass to two of Ahmadou Ismail's wordlists: body parts, and herding vocabulary. The results are fairly clear.

In general, Taine-Cheikh's Vʔ corresponds regularly to Ismail's V:, with the length clearly marked, as distinct from Taine-Cheikh's short V, which Ismail consistently transcribes short. Thus:

	Ismail	Taine-Cheikh
young camel	awāra	äwaʔräh
waterbag	āga	äʔgäh
moustache	āya	aʔyäh
donkey m.	ājji	aʔž(ž)iy
donkey f.	tājil	taʔž(ž)əL
beard	tāmmart	taʔmmärt
camels	īyman	iʔymän
cows	tiššīđan	ətšiʔđaʔn / ətšiʔđän
lamb	hīmmar	iẕ̌iʔmär
donkey foal	īgiyu	iʔgiyi
shoulder(blade)	tūṛiḍ	toʔṛuḌ
donkeys	ūjjayan	uʔž(ž)äyän
shoulder(blade)s	tūrdin	tuʔṛäđän
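A crude way to check this correspondence mechanically is to count glottal stops in Taine-Cheikh's form and long vowels in Ismail's, and flag any pair where the counts differ. The Python sketch below does this for a handful of the pairs in the table; the helper names are hypothetical, and the check ignores position and vowel quality, so it is a sanity test rather than a real alignment.

```python
import unicodedata

MACRON = "\u0304"  # combining macron: marks length once a form is decomposed (NFD)
GLOTTAL = "ʔ"


def long_vowels(form: str) -> int:
    """Count macron-marked vowels in a transcription."""
    return unicodedata.normalize("NFD", form).count(MACRON)


def glottal_stops(form: str) -> int:
    """Count glottal stops in a transcription."""
    return form.count(GLOTTAL)


# gloss: (Ismail, Taine-Cheikh), a subset of the pairs in the table above
PAIRS = {
    "young camel": ("awāra", "äwaʔräh"),
    "waterbag": ("āga", "äʔgäh"),
    "beard": ("tāmmart", "taʔmmärt"),
    "camels": ("īyman", "iʔymän"),
    "donkeys": ("ūjjayan", "uʔž(ž)äyän"),
}

mismatches = [
    gloss
    for gloss, (ismail, taine_cheikh) in PAIRS.items()
    if long_vowels(ismail) != glottal_stops(taine_cheikh)
]
print(mismatches)  # []
```

Pairs with variant forms, such as the word for "cows", would need a decision about which variant to compare before being added to the dictionary.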

There are only two contexts where this correspondence does not hold. In the context / _C#, if C is a stop or fricative, Ismail retains the glottal stop; if C is a sonorant, it disappears without affecting vowel length. (More examples of this context would be useful to confirm the exact conditioning.)

spring	taniʔđ	täniʔḏ
cow	taššiʔđ	täšši
head	iʔf	iʔf
camel	ayyim	äyiʔm
camel f.	tayyimt	täyi(ʔ)mt

Word-finally, the variety Taine-Cheikh describes has no overtly realised glottal stops (*ʔ > Ø / _#); the contrast, however, is maintained, since all originally vowel-final words now end in h (*V > Vh / _#). In Ismail's dialect, the latter change never happened:

waterbag	āga	äʔgäh
moustache	āya	aʔyäh
young camel	awāra	äwaʔräh
stomach	taxṣa	taḫs(s)äh
goat	tikši	təkših
ewe	tīyyi	tīyih

Nevertheless, the two classes have not completely merged; final *i remains i, but final *iʔ becomes u:

billy-goat	ahayu	äẕ̌äyi
mouth	immu	əmmi
tooth	awkšu	äwkši
tongue	itšu	ətši
donkey foal	īgiyu	iʔgiyi
calf	īrku	īrki

In the variety Taine-Cheikh describes, long vowels derive not from *Vʔ but from *Vh (ultimately *Vβ). Given that vowel length can be a reflex of a former glottal stop in Ismail's dialect, the next thing we need to check is what happens to *Vh there; it turns out that there too it yields long vowels:

small cattle	tākšin	tākšən
calf	īrku	īrki
ewe	tīyyi	tīyih
nostril	tīnhart	tīnẕ̌ärt
nose	tīnharin	tīnẕ̌ärän

The regularity of these correspondences is a testimony to the accuracy of both parties' work, and confirms the value of Zenaga as a data source for Berber historical phonology.
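The two main correspondences above (Vʔ matching a long vowel, and the absence of final h) can be sketched as a pair of rewrite rules. This is only a rough illustration: the function name is made up, the treatment of ä as equivalent to Ismail's a is an assumption about transcription rather than anything established above, and the / _C# context and the final *iʔ > u development are deliberately left out.

```python
import re
import unicodedata

VOWELS = "aäiuə"
LONG = {"a": "ā", "ä": "ā", "i": "ī", "u": "ū"}

def predict_ismail(form):
    """Rough prediction of Ismail's form from Taine-Cheikh's transcription."""
    form = unicodedata.normalize("NFC", form)
    # Ismail's dialect lacks *V > Vh / _#, so drop Taine-Cheikh's final h
    if form.endswith("h"):
        form = form[:-1]
    # Vʔ corresponds to a long vowel, except before a single word-final
    # consonant (the / _C# context), which this sketch leaves untouched
    form = re.sub(
        "([aäiu])ʔ(?![^" + VOWELS + "]$)",
        lambda m: LONG[m.group(1)],
        form,
    )
    # Assumption: Taine-Cheikh's ä is simply written a by Ismail
    return form.replace("ä", "a")

# (Taine-Cheikh, Ismail) pairs taken from the tables above
pairs = [("äʔgäh", "āga"), ("aʔyäh", "āya"), ("äwaʔräh", "awāra"),
         ("taʔmmärt", "tāmmart"), ("iʔymän", "īyman"), ("iʔf", "iʔf")]
for tc, ismail in pairs:
    print(tc, predict_ismail(tc), ismail)
```

Forms involving other differences between the two transcriptions (ə vs. i, ẕ̌ vs. h, and so on) would need further rules.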

Monday, May 28, 2018

A "crazy rule" in Zenaga

As part of what seems to be a solo documentation effort, Ahmadou Ismail has been posting some very interesting tidbits on Zenaga (in Arabic). The dialect reflected differs in some ways from the one reflected in Catherine Taine-Cheikh's publications. One of the more conspicuous differences is in the fate of proto-Berber *z. For Taine-Cheikh, *z > ẕ̌ in general (a slightly lowered ž), but *zt > Z (a tautosyllabic geminate zz). In Ahmadou Ismail's dialect, *zt > zz as with Taine-Cheikh, but otherwise *z > h, eg tihigrarin "tarawih prayers" vs. Taine-Cheikh's təẕ̌əgrärən, hīmmar "lamb" vs. Taine-Cheikh's iẕ̌iʔmär, awahiđ̣ "rooster" vs. äwäẕ̌uđ̣, yahinha "he sold" vs. yäžžənẕ̌äh. This leads to systematic alternations between h and zz; synchronically, Ismail's dialect of Zenaga has the "crazy rule" ht > zz. This is nicely illustrated by "he knew" (Taine-Cheikh: yuʔgäẕ̌) plus the direct object personal pronoun clitics:

"he knew me": yūgah-i
"he knew you m.": yūgah-ku
"he knew you f.": yūgah-kam
"he knew him": yūgaz-zu
"he knew her": yūgaz-zað
"he knew us": yūgah-ānag
"he knew you m.pl.": yūgah-kūn
"he knew you f.pl.": yūgah-kimmið
"he knew them m.": yūgaz-zin
"he knew them f.": yūgaz-zincað (maybe; not quite sure how چَّٰ is supposed to be read)

For forms without assimilation, compare, as posted by someone else on the same group (Omar Sidi Mohamed), "he was owned by" (Taine-Cheikh yənšäg):

"he was owned by me": yiššag-i
"he was owned by you m.": yiššak-ku
"he was owned by you f.": yiššak-kam
"he was owned by him": yiššak-tu
"he was owned by her": yiššak-tað
"he was owned by us": yiššag-ānag
"he was owned by you m.pl.": yiššak-kūn
"he was owned by you f.pl.": yiššak-kamað
"he was owned by them m.": yiššak-tan
"he was owned by them f.": yiššak-tinyað
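For concreteness, the ht > zz rule can be sketched in a few lines of Python. The underlying clitic shapes (-tu, -tað) are an assumption, inferred from the unassimilated yiššak- forms above; the function name is purely illustrative, and nothing else about the morphophonology (such as the g/k alternation in yiššag) is modelled.

```python
def attach_clitic(stem, clitic):
    """Attach an object clitic, applying the synchronic rule ht > zz."""
    if stem.endswith("h") and clitic.startswith("t"):
        # Stem-final h plus clitic-initial t surface together as zz
        return stem[:-1] + "z-z" + clitic[1:]
    return stem + "-" + clitic

# Assumed underlying clitics (hypothetical, inferred from the yiššak- forms)
for clitic in ["i", "ku", "kam", "tu", "tað"]:
    print(attach_clitic("yūgah", clitic))
```

Under those assumptions this reproduces the first five forms of the "he knew" paradigm; a stem not ending in h simply takes the clitic unchanged.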

Tuesday, May 22, 2018

Pougetoux

Ever since she got interviewed on TV ten days ago, the 19-year-old president of the student union at Université Paris-Sorbonne, Maryam Pougetoux, has been making headlines - not for anything she said, but simply for wearing a hijab while she said it. In the name of defending freedom and feminism, the Minister of the Interior himself had the gall to criticise this brave young Frenchwoman as "marking her difference from French society". But as a historical linguist watching all this, I found myself wondering: where does the name "Pougetoux" come from? It turns out it can be traced several thousand years back:

Pougetoux is a diminutive of:
Pouget, which is a diminutive (in -et) of:
Occitan puech / pueg / puog / poujhë "hill", which comes from:
Latin podium "balcony", which comes from:
Greek πόδιον "foot of a vase", a diminutive (in -ion) of:
Greek πούς "foot", which comes from:
Proto-Indo-European *pod-s "foot"

In the course of this long history, no less than three different diminutive suffixes have been accreted on to the original root (although I'm not quite sure about the identity of that -oux). I wonder whether that generalizes; do words meaning "hill" tend to accrete more and more diminutive suffixes as they develop over time?

Tuesday, May 08, 2018

Songhay viewed through PCA

Playing around a bit more with PCA, I decided to apply the method* to a dataset I've worked with more extensively: Songhay, a compact language family spoken mainly in Niger and Mali. On a hundred-word list (Swadesh with a few changes), randomly choosing one form in cases of synonymy and including borrowings, I get the following table of lexical cognate proportions:

	Tabelbala	Tadaksahak	Tagdal	In-Gall	Timbuktu	Djenne	Kikara	Hombori	Zarma	Djougou
Tabelbala	1	0.678	0.67	0.687	0.636	0.667	0.625	0.622	0.616	0.602
Tadaksahak	0.678	1	0.857	0.8	0.63	0.635	0.567	0.576	0.58	0.586
Tagdal	0.67	0.857	1	0.857	0.632	0.649	0.579	0.588	0.582	0.588
In-Gall	0.687	0.8	0.857	1	0.65	0.667	0.598	0.606	0.6	0.606
Timbuktu	0.636	0.63	0.632	0.65	1	0.979	0.773	0.808	0.79	0.778
Djenne	0.667	0.635	0.649	0.667	0.979	1	0.753	0.789	0.771	0.768
Kikara	0.625	0.567	0.579	0.598	0.773	0.753	1	0.835	0.814	0.823
Hombori	0.622	0.576	0.588	0.606	0.808	0.789	0.835	1	0.838	0.867
Zarma	0.616	0.58	0.582	0.6	0.79	0.771	0.814	0.838	1	0.808
Djougou	0.602	0.586	0.588	0.606	0.778	0.768	0.823	0.867	0.808	1

Running this through R again to get its eigenvectors, the first two principal components are easily interpretable:

PC1 (eigenvalue=7.3) separates Songhay into three low-level subgroups - Western, Eastern, and Northern, in that order - with an obvious longitude effect: it traces a line eastward all the way down the Niger river, jumps further east to In-Gall, and then proceeds back westward through the Sahara.
PC2 (eigenvalue=1.1) measures the level of Berber/Tuareg influence.

All the other eigenvectors have eigenvalues lower than 0.4, and are thus much less significant.
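For readers without R to hand, the dominant eigenvalue of the table above can be approximated with plain power iteration. This is only a sketch: it swaps in a different technique from the R eigenvector extraction used for the figures above, recovers only the first component, and the function name is made up for illustration.

```python
names = ["Tabelbala", "Tadaksahak", "Tagdal", "In-Gall", "Timbuktu",
         "Djenne", "Kikara", "Hombori", "Zarma", "Djougou"]
# Similarity matrix copied from the table above
S = [
    [1, 0.678, 0.67, 0.687, 0.636, 0.667, 0.625, 0.622, 0.616, 0.602],
    [0.678, 1, 0.857, 0.8, 0.63, 0.635, 0.567, 0.576, 0.58, 0.586],
    [0.67, 0.857, 1, 0.857, 0.632, 0.649, 0.579, 0.588, 0.582, 0.588],
    [0.687, 0.8, 0.857, 1, 0.65, 0.667, 0.598, 0.606, 0.6, 0.606],
    [0.636, 0.63, 0.632, 0.65, 1, 0.979, 0.773, 0.808, 0.79, 0.778],
    [0.667, 0.635, 0.649, 0.667, 0.979, 1, 0.753, 0.789, 0.771, 0.768],
    [0.625, 0.567, 0.579, 0.598, 0.773, 0.753, 1, 0.835, 0.814, 0.823],
    [0.622, 0.576, 0.588, 0.606, 0.808, 0.789, 0.835, 1, 0.838, 0.867],
    [0.616, 0.58, 0.582, 0.6, 0.79, 0.771, 0.814, 0.838, 1, 0.808],
    [0.602, 0.586, 0.588, 0.606, 0.778, 0.768, 0.823, 0.867, 0.808, 1],
]

def power_iteration(matrix, steps=200):
    """Return (eigenvalue, unit eigenvector) for the dominant eigenpair."""
    n = len(matrix)
    v = [1.0 / n ** 0.5] * n
    for _ in range(steps):
        # Multiply by the matrix, then rescale back to unit length
        w = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v.Sv gives the eigenvalue (v has unit length)
    Sv = [sum(row[j] * v[j] for j in range(n)) for row in matrix]
    eigenvalue = sum(v[i] * Sv[i] for i in range(n))
    return eigenvalue, v

eigenvalue, loadings = power_iteration(S)
print(round(eigenvalue, 2))
for name, x in sorted(zip(names, loadings), key=lambda p: p[1]):
    print(f"{name:12s}{x:.3f}")
```

The printed eigenvalue should land close to the 7.3 reported for PC1. Getting PC2 and beyond this way would require deflating the matrix first, which is where a proper eigen-solver earns its keep.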

The resulting cluster patterns have a strikingly shallow time depth; as in the Arabic example in my last post, this method's results correspond well to criteria of synchronic mutual intelligibility (Western Songhay is much easier for Eastern Songhay speakers to understand than Northern is), but the method completely fails to pick up on the deeper historical tie between Northern Songhay and Western Songhay (they demonstrably form a subgroup as against Eastern). It's nice how the strongest contact influence shows up as a PC, though; it would be worth exploring how good this method is at identifying contact more generally.

* Strictly speaking, this may not quite count as PCA - I'm starting from a similarity matrix generated non-numerically, rather than turning the lexical data into binary numeric data and letting that produce a similarity matrix.

Update, following Whygh's comment below: here's what SplitsTree gives based on the same table:

Monday, May 07, 2018

Some notes on PCA

(Exploratory notes, written to be readable to linguists but posted in the hope of feedback from geneticists and/or statisticians - in my previous incarnation as a mathmo, I was much more interested in pure than applied....)

Given the popularity of Principal Component Analysis (PCA) in population genetics, it's worth a historical linguist's while to have some idea of how it works and how it's applied there. This popularity might also suggest at first glance that the method has potential for historical linguistics; that possibility may be worth exploring, but it seems more promising as a tool for investigating synchronic language similarity.

Before we can do PCA, of course, we need a data set. Usually, though not always, population geneticists use SNPs - single nucleotide polymorphisms. The genome can be understood as a long "text" in a four-letter "alphabet"; a SNP is a position in that text where the letter used varies between copies of the text (ie between individuals). For each of m individuals, then, you check the value of each of a large number n of selected SNPs. That gives you an m by n data matrix of "letters". You then need to turn this from letters into numbers you can work with. As far as I understand, the way they do that (rather wasteful, but geneticists have such huge datasets they hardly care) is to pick a standard value for each SNP, and replace each letter with 1 if it's identical to that value, and 0 if it isn't. For technical convenience, they sometimes then "normalize" this: for each cell, subtract the mean value of its (SNP) column (so that the column mean ends up as 0), then rescale so that each column has the same variance.
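As a toy illustration of that encoding step, here is a minimal sketch with four invented individuals and five invented SNPs. The letters, the choice of "standard" values, and the function name are all made up, and real pipelines differ in the details; the point is just the letters-to-numbers conversion, followed by centring each SNP on its mean across individuals and rescaling it to unit variance.

```python
# Toy data: 4 individuals (rows) x 5 SNPs (columns); letters are invented
genotypes = ["ACGTA", "ACGTT", "GCGAT", "GTGAT"]
standard = "ACGTA"  # hypothetical "standard" value chosen for each SNP

# 1 where the letter matches the standard for that SNP, 0 where it doesn't
X = [[1 if letter == ref else 0 for letter, ref in zip(row, standard)]
     for row in genotypes]

def normalise(matrix):
    """Centre each SNP (column) on 0 and rescale it to unit variance."""
    m = len(matrix)
    out = [[0.0] * len(matrix[0]) for _ in range(m)]
    for j in range(len(matrix[0])):
        col = [row[j] for row in matrix]
        mean = sum(col) / m
        var = sum((x - mean) ** 2 for x in col) / m
        for i in range(m):
            # A SNP with no variation carries no information; leave it at 0
            out[i][j] = (col[i] - mean) / var ** 0.5 if var > 0 else 0.0
    return out

Z = normalise(X)
for row in X:
    print(row)
for row in Z:
    print([round(x, 2) for x in row])
```

Note that the third SNP, where everyone has the same letter, ends up contributing nothing at all, which is why only positions that actually vary get selected in the first place.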

Using this data matrix, you then create a covariance matrix by multiplying the data matrix by its own transpose, divided by the number of markers: in the resulting table, each cell gives a measure of the relationship between a pair of individuals. Assuming simple 0/1 values as described above, each cell will in fact give the proportion of SNPs for which the two individuals both have the same value as the chosen standard. Within linguistics, lexicostatistics offers fairly comparable tables; there, the equivalent of SNPs is lexical items on the Swadesh list, but rather than "same value as the standard", the criterion is "cognate to each other" (or, in less reputable cases, "vaguely similar-looking").
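To see what that product does with plain 0/1 values, here is a small sketch on an invented matrix (four individuals by five SNPs; the numbers and the function name are made up for illustration):

```python
# Invented 0/1 data: rows are individuals, columns are SNPs
# (1 = same letter as the chosen standard, 0 = different)
X = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

def relatedness(matrix):
    """Data matrix times its own transpose, divided by the number of SNPs."""
    n = len(matrix[0])
    return [[sum(a * b for a, b in zip(r1, r2)) / n for r2 in matrix]
            for r1 in matrix]

R = relatedness(X)
for row in R:
    print(row)
```

The last two individuals agree with each other on four SNPs out of five, yet their cell comes out as only 0.2: shared mismatches with the standard count for nothing, which is part of what makes the 0/1 encoding wasteful.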

Now, there is typically a lot of redundancy in the data and hence in the relatedness matrix too: in either case, the value of a given cell is fairly predictable from the value of other cells. (If individuals X and Y are very similar, and X is very similar to Z, then Y will also be very similar to Z.) PCA is a tool for identifying these redundancies by finding the covariance matrix's eigenvectors: effectively, rotating the axes in such a way as to get the data points as close to the axes as possible. Each individual is a data point in a space with as many dimensions as there are SNP measurements; for us 3D creatures, that's very hard to visualise graphically! But by picking just the two or three eigenvectors with the highest eigenvalues - ie, the axes accounting for most of the variation in the data - you can graphically represent the most important parts of what's going on in just a 2D or 3D plot. If two individuals cluster together in such a plot, then they share a lot of their genome - which, in human genetics, is in itself a reliable indicator of common ancestry, since mammals don't really do horizontal gene transfer. (In linguistics, the situation is rather different: sharing a lot of vocabulary is no guarantee of common ancestry unless that vocabulary is particularly basic.) You then try to interpret that fact in terms of concepts such as geographical isolation, founder events, migration, and admixture - the latter two corresponding very roughly to language contact.

The most striking thing about all this, for me as a linguist, is how much data is getting thrown away at every stage of the process. That makes sense for geneticists, given that the dataset is so much bigger and simpler than what human language offers comparativists: one massive multi-gigabyte cognate per individual, made up of a four-letter universal alphabet! Historical linguists are stuck with a basic lexicon rarely exceeding a few thousand words, none of which need be cognate across a given language pair, and an "alphabet" (read: phonology) differing drastically from language to language - alongside other clues, such as morphology, that don't have any immediately obvious genetic counterpart but again have a comparatively small information content.

Nevertheless, there is one obvious readily available class of linguistic datasets to which one could be tempted to apply PCA, or just eigenvector extraction: lexicostatistical tables. For Semitic, someone with more free time than I have could readily construct one from Militarev 2015, or extract one from the supplemental PDFs (why PDFs?) in Kitchen et al. 2009. Failing that, however, a ready-made lexicostatistical similarity matrix is available for nine Arabic dialects, in Schulte & Seckinger 1985, p. 23/62. Its eigenvectors can easily be found using R; basically, the overwhelmingly dominant PC1 (eigenvalue 8.11) measures longitude, while PC2 (eigenvalue 0.19) sharply separates the sedentary Maghreb from the rest. This tells us two interesting things: within this dataset, Arabic looks overwhelmingly like a classic dialect continuum, with no sharp boundaries; and insofar as it divides up discontinuously at all, it's the sedentary Maghreb varieties that stand out as having taken their own course. The latter point shows up clearly on the graphs: plotting PC2 against PC1, or even PC3, we see a highly divergent Maghreb (and to a lesser extent Yemen) vs. a relatively homogeneous Mashriq. (One might imagine that this reflects a Berber substratum, but that is unlikely here; few if any Berber loans make it onto the 100-word Swadesh list.) All of this corresponds rather well to synchronic criteria of mutual comprehensibility, although a Swadesh list is only a very indirect measure of that. But it doesn't tell us much about historical events, beyond the null hypothesis of continuous contact in rough proportion to distance; about all you need to explain this particular dataset is a map.

(NEW: and with PC3:)

Wednesday, April 04, 2018

Songhay crows and Korandje ravens

In Niamey, where I went last week for a workshop on Songhay as a cross-border language, the crows do something I've never seen them do in any other country: they come to the window and start tapping on the glass, like something out of Edgar Allan Poe. The reaction of my fellow attendees taught me a new Songhay word - gaaru-gaaru "pied crow" (Heath 1998) - which in turn revealed a new Korandje etymology. In Korandje, "raven" is gạḍi. The shift of intervocalic *d to r in mainstream Songhay is well-established (Nicolaï 1981). But the vowels are more interesting.

Korandje ạ usually derives from *ar or *or. In several inherited Songhay words, however, ạ seems to derive from *a not followed by *r: thus kạṣ-əw "rough" < *kas-ow, bạzu "skin bucket, waterbag" < *baasu, hạmu "meat" < *hamu, kə̣kkạbu "key" < *karkabu. Yet *a otherwise usually yields a in similar contexts: contrast gani "louse" < *gani, akama "wheat" < *alkama, dzam-a "do it" < *dam-a. It looks as though the vowel in the following syllable is what makes the difference: if it's rounded, you get ạ, otherwise you get a (though one or two exceptions suggest that the story may be more complicated: notably, "difficult" is gab-ə̣w < *gab-ow). Assuming this rule, *gaadu should regularly have yielded gaaru in mainstream Songhay and gạḍu in Korandje.

What we actually get, however, is gạḍi. Why? Well, Korandje has a rule of final high vowel deletion phrase-internally: if a word ends in i or u, its final vowel will be deleted unless it comes before a pause, ie most of the time. (Basically the opposite of Classical Arabic.) In a number of words, this seems to have led to confusion between original -i, -u, and consonant-final words. For instance, ạṣạnkri "skink" comes from Berber asrmkal, which should regularly have yielded ạṣạmkər; the i is unetymological (Souag 2015). In effect, speakers must have been hypercorrecting final high vowels - a fact which suggests that, if Korandje survives, it may be on its way towards phonologically losing them altogether, much as colloquial Arabic did with the final short vowels of Classical Arabic.

Monday, March 19, 2018

English spelling traces in Algerian placenames

Going east of Algiers along the coast, the names of two little port towns stand out. Their inhabitants know them as جنّات /d͡ʒənnat/ (sometimes جنّاد /d͡ʒənnad/) and دلّس /dalləs/ (or الدّلّس /ddalləs/). Those names would normally be transcribed in French as *Djennat (if not *Djennette) and *Delless. Yet in French - and hence, given the region's colonial history, in most Western languages - they are in fact written as Djinet and Dellys; the latter at least is very often even (mis)pronounced accordingly as /dɛlis/. French i and y are both normally pronounced /i/; why on earth would Frenchmen write the schwa /ə/ of these names in this way, when French has a schwa and normally writes it as e?

The most likely answer is that they didn't. Rather, they adopted or adapted these placenames' spelling from English - specifically, from the widely translated work of Thomas Shaw, an English reverend and Oxford fellow who spent several years in Algeria in the early 1700s, a century before France occupied Algiers. He spelt the two towns' names as Jinnett and Dellys respectively - a spelling which, in English, yields the almost exactly correct pronunciations /d͡ʒɪnɛt/ and /dɛlɪs/.

Shaw's book was translated into French by 1743, and the translator retained the English spellings of both names. In a later edition no doubt prompted by the French invasion (1830), Jinnett got amended to Djinnett - someone had finally got around to noticing that English j is pronounced like French dj, not like French j. The doubled letters, useful for indicating vowel quality in English but serving no purpose in French, were lost within a decade, as seen in Eyriès (1839). But the i of Djinet, and the y of Dellys, remained to testify to a period when French geographers relied on an English traveller to tell them about Algeria - and to confirm most colonists' lack of interest in how the locals pronounced these names.

Saturday, March 17, 2018

Good speaking is not good writing

There's an article by Nathan Robinson that's been going around recently titled "Jordan Peterson: The Intellectual We Deserve". After pages of apparently reasonable criticisms of his subject, the author delivers what he seems to think is his coup de grâce:

Even now, however, I am being too generous to Jordan Peterson’s intellect. I have been presenting him at his most comprehensible and polished. I have not been giving you the full experience of actually listening to him talk. Sitting through a Jordan Peterson lecture is very different to watching a rapid-fire television interview. Below, please find a fully-transcribed portion of 17 minutes of Peterson’s speech.[...] (NOTE: UNDER NO CIRCUMSTANCES ATTEMPT TO READ THE ENTIRETY OF THE FOLLOWING PASSAGE. READ AS MUCH AS YOU CAN BEFORE YOU BEGIN TO FEEL WEARY, THEN SCROLL QUICKLY TO THE END.)

Just to stack the scales a bit further, the transcription features no paragraphing. Nevertheless, I did read it - much quicker than watching some random video for 17 minutes! - and, rather anticlimactically, found a perfectly coherent and reasonably entertaining (if very likely unfair) parenting anecdote, obviously intended to illustrate the importance of setting boundaries. I rubbed my eyes and thought "How is it that an intelligent, well-educated native speaker of English can apparently not only see this transcript as an incoherent mess but also assume all his readers will? Am I crazy, or is he?"

The answer is simple: good speaking is not the same thing as good writing. Take a great talk, one that keeps a non-academic audience riveted, and transcribe it verbatim; it will almost always look rambling and repetitive on the page, unless you're already accustomed to reading such transcripts (part of the job for a descriptive linguist, but a rare experience for most people). That's simply the nature of the medium, and adequately explains the expected audience reaction. Maybe it even explains the author's reaction, if the only context he ever encounters long talks in is academia.

One of the author's main points - a valid one, I think - is that academics need to communicate better with the public for everyone's sake:

[...] he is popular partly because academia and the left have failed spectacularly at helping make the world intelligible to ordinary people, and giving them a clear and compelling political vision.

If so, the first step is to learn appropriate discourse strategies. You don't talk to confused young people on YouTube as if you were addressing a learned seminar, much less writing an article. Nathan Robinson surely realises this himself - but, by going for cheap laughs at the expense of a perfectly ordinary example of spoken language, he's not only weakening his main point but encouraging the very blindness to orality that makes it difficult for many academics to communicate with the public. Academics can surely do better - let a thousand learned YouTube channels bloom! - but not without (re)learning how to talk to the people they want to talk to.

So	à	dɔn	gɛt	tri	nacionalidad	nà	dis	wɔl.
so	1SG.SBJ	PRF	get	three	nationality	LOC	this	world
‘So I have three nationalities in this world.’ [fr03ft 102]

Lɛf=àn	mek	è	rich	a	los	quinze	años.
leave=3SG.OBJ	SBJV	3SG.SBJ	reach	to	the.PL	fifteen	years
‘Leave her, let her reach [the age of] fifteen years.’ [ab03ay 138]