No word for heLLo?

It's no great surprise to find words in another language that have no English equivalent, if what they refer to is an object that's unfamiliar to most English speakers. For example, it's scarcely surprising if English has no word for "dates that aren't quite ripe yet, but that already ooze honey if you bruise them" (Kwarandzyey azMamweg); only a very small number of English speakers are familiar with date maturation stages, whereas practically all Belbalis are. It's a bit more interesting when you find that a phenomenon equally common in both cultures can be described by a fixed word or phrase only in one of them. Here's a case in point that came up in my latest fieldwork.

One of the basic states of mind in Kwarandzyey (and among the few to be retained from Songhay) is being heLLo. Songhay cognates (from *hollo) mean "crazy, possessed", which in Kwarandzyey is bA; the Kwarandzyey meaning of heLLo is quite different. This word is used (usually with a smirk) of people acting happy (leaping around, singing, dancing, etc.) or showing inordinate confidence, with no thought for consequences or respectability - Har ndza ghar ana hell-a bA ddzunets ka, "as if he was the only person in the world". Being full, or intoxicated, helps make people heLLo, but isn't essential. A heLLo person is generally said not to praise his Lord (asbayHemd an mulana si), ie not to appreciate that the causes of his happiness are contingent. Arabic translations suggested include colloquial SameT (literally "bad-tasting", but as a mental state more like "inconsiderate" or "silly") and classical Taaghii (as in "Nay, but verily man is rebellious (yaTghaa) That he thinketh himself independent!"). Here's a nice example of people acting heLLo (apologies to football fans - the example I was looking for was South Africans celebrating in the streets after Mandela's release, video of which was described to me as showing people being heLLo, but I couldn't find it):

Obviously, the mental state is at least as present in English-speaking cultures as in Tabelbala - in fact, it might be reasonable to say that regularly achieving heLLo-ness is an important and widely socially accepted goal for British youth. But is there a word or fixed phrase corresponding to the concept in English? If you can think of one, feel free to suggest it!

(PS: Pardon the transcription - my computer is broken, and I can't be bothered to do all the cut-and-pasting it would take to fix the diacritics.)

Back to the Sahara

In the near future I plan to do some further travel in Algeria in the Sahara, to study more Kwarandzyey of course but also other languages of the region. Any readers of this blog in the area that want to meet up, feel free to email me - let's see if we'll be in the same place... For obvious reasons, posting will continue to be sparse until I get back.

A note on Azer

In the unlikely event that you've heard of Azer, a northern dialect of Soninke formerly spoken in the now Arabic-speaking region of Tichit and Walata in southeastern Mauritania, you may well have formed the impression - as I did initially - that it was heavily influenced by Berber, like the Northern Songhay languages are. If you know anything about Berber, a look at Monteil's article on Azer is sufficient to dispel this idea. If you don't, then chapter 3 of Long's thesis on Northern Mande, which I just came across, clarifies the issue nicely. This rather highlights the Northern Songhay problem: if centuries of close contact with Berber left Azer so little changed, why is Northern Songhay so full of Berber words?

Reporting language "discovery"

Turning on the BBC yesterday, I was surprised to hear a descriptive linguistics story, about the "discovery" by linguists on the Enduring Voices project of a previously unknown Tibeto-Burman language called Koro in Arunachal Pradesh: Indian language is new to science. Insofar as one can judge from the report, it sounds like Koro is clearly distinct from its neighbours rather than being an ambiguous dialect-continuum case, so this should be interesting for comparative Tibeto-Burman.

What struck me most is that this made it into the news! There have been a couple of discoveries of new languages in Africa over the past decade or so - Baka, for example, and Tondi Songhay Kiini. And the belated realisation that Bangime is a clear isolate, rather than a dialect of "Dogon", actually reshapes our picture of West African linguistic history much more than finding any of these languages has. Where was the news coverage of these? Have media attitudes towards the newsworthiness of "new" languages changed? Is it because they're in Africa? Or did the linguists in question simply not issue any handy press releases? Publicity is a hassle, frankly, and no one wants to sound like they're playing Indiana Jones. But stories like these are a big part of what gets people interested in linguistics in the first place, and the general public who fund most linguistic work, whether through taxes or donations, need to know what they're getting for their money.

Small vocabularies, or lazy linguists?

In Guy Deutscher's new book The Language Glass (which I'll be reviewing on this blog sometime soon) he claims (p. 110) that "Linguists who have described languages of small illiterate societies estimate that the average size of their lexicons is between three thousand and five thousand words." This would be rather interesting, if verified - but this statement is not sourced at the back, and is in any case too vague (what counts as "small"?) to be relied on as it stands. Does anyone have any idea where he might have got this figure?

I haven't found his source, but Bonny Sands et al.'s paper "The Lexicon in Language Attrition: The Case of N|uu" gives a nice table of Khoisan dictionaries' sizes, ranging from 1,400 for N|uu to < 6,000 for Khwe and 24,500 for Khoekhoegowab. She prudently concludes "The correlation between linguist-hours in the field and lexicon size is so close that no conclusions about lexical attrition can be drawn" - the outlier, Khoekhoegowab, is not only the biggest of the lot (with over 250,000 speakers), but had its dictionary written by a team including a native speaker over the course of twenty years. Given that "2,000 - 5,000 word forms (in English) may cover 90-97% of the vocabulary used in spoken discourse (Adolphs & Schmitt 2004)", it is not surprising that it should take disproportionately long to move beyond the 5,000 word range. However, she also points out that "Gravelle (2001) reports finding only 2,300 dictionary entries in Meyah (Papuan) after 16 years of study", suggesting that some languages may simply have unusually small vocabularies. Along similar lines, Gertrud Schneider-Blum's talk "Don’t waste words – some aspects of the Tima lexicon" suggested that the Tima language of Kordofan had an unusually small number of nouns due to extensive polysemy and use of idioms (I can't remember any figures, nor indeed whether she gave any.)

I'd be interested to see other discussions of the issue of differences in lexicon size and explanations for them. My Kwarandzyey dictionary (in progress) so far stands at about 2000 words - it would be encouraging to think that I might already have done more than half the vocabulary, but I very much doubt it!

I finally got my hands on an article I had been looking for for a while about the "Kouriya" language of Gourara (around Timimoun, Algeria): Rachid Bouchemit, 1951. Le Kouriya du Gourara, Bulletin de Liaison Saharienne 5, p.46-47. While short, it's significantly more informative than the vague rumours to be found in other sources. "Kouriya", it turns out, was the general-purpose name given locally to any Black African language - "L'unité du terme cache la pluralité des idiomes: Haoussa, Bambra, Foullan, Mouchi, Songhai, Bornou, Boubou, Gouroungou, Minka, Sarnou, Nourma, Kanembou, Karkawi, etc...", in particular as spoken by ex-slaves in the region. Following the abolition of slavery, these languages, no longer reinforced by the arrival of new slaves, rapidly fell into disuse; the new generation learned Arabic and Taznatit instead. By 1951, the author could find only seven or eight speakers of a "Kouriya" in Timimoun, and only two of them spoke the same language, namely Bambara.

While the author leaves the etymology unexplained, I would add that the term "Kouriya", and the corresponding ethnonym kuri, probably derive from Songhay koyra "town, village", used to form the Songhays' own name for themselves, koyra-boro "townsman"; Songhay is, after all, the nearest major ethnic group in the Sahel to the Gourara region.

Arabic right-hemispheric WEIRDness

Recently Language Hat asked for informed reactions to a BBC report claiming that Reading Arabic 'hard for brain'. The papers under discussion are to be found at Eviatar's home page, in particular the 2009 paper "Language status and hemispheric involvement in reading: Evidence from trilingual Arabic speakers tested in Arabic, Hebrew, and English" but also clearly the 2004 paper "Orthography and the hemispheres: Visual and linguistic aspects of letter processing". Now I'm no psycholinguist, but obviously this story smells fishy, so I had a closer look.

At least one glaring mistake seems to be clearly the BBC's fault: it wrongly claims "When the Arabic readers saw similar letters with their right hemispheres, they answered randomly - they could not tell them apart at all." In fact, this seems to conflate two different experiments. Telling letters apart was the first task in the 2004 paper, and the Arabic readers' error rates for similar letters were only 8% (Table 6) - worse than with the left hemisphere, but not nearly so bad. The claim that "there is a specific RH deficit in reading Arabic, because that is the only condition (with bilateral presentation), where these native Arabic speakers responded at chance" comes from the 2009 paper - but the task referred to there was substantially more complicated. They were looking at words/nonwords, not letters; they were presented with two words, one for each hemisphere, one of which was underlined; and they had to decide whether the underlined "word" was a real word or not. Other issues are not so much wrong as stupid: talking as though students could choose which hemisphere to learn with, for example.

However, the BBC cannot be blamed for drawing excessively sweeping conclusions from this experiment. The authors themselves talk of their results as applicable to Arabic in general, which rather overstates the case. In both papers, the Arabic speakers were all also fluent speakers of Hebrew, which they had studied since second grade, and were living in a state where Hebrew is the dominant language. In the 2004 test, at least, they were also all undergraduates studying degrees taught in Hebrew. Obviously, this is a rather unusual situation for Arabic speakers! In particular, it is one where pragmatic (and status-related) motivations to study Hebrew, and opportunities to familiarise oneself with it, are likely to be much greater than for Arabic (especially given the big difference between spoken and written Arabic.) In some types of tests, these speakers' right hemispheres seem to read Hebrew more easily than Arabic. The authors take this to mean that there is a "specific difficulty of the RH with Arabic orthography". But, without further testing elsewhere, it can equally well be taken to reflect the sociolinguistic situation of Palestinian citizens of Israel. This is, in fact, a special case of a much wider problem: most psychology experiments focus on "WEIRD" populations (read the link - it's a concept very much worth remembering when you read the science news.)

Doctorate done

Eid Mubarak everyone! I am now Dr. Souag. (As of a couple of weeks ago, actually, but I've been doing other stuff instead of being online.) You can read my thesis online, for the moment: Grammatical Contact in the Sahara. My examiners were Prof. Jeffrey Heath and Dr Martin Orwin. Thanks once again to everyone in Tabelbala or Siwa that helped me learn their languages, and to my supervisors, teachers, friends, and family. I'm currently working out future plans, but rest assured that they include plenty more research.

Linguistic purism in 19th century Libyan Berber

Looking through Richardson's (1850) vocabulary of Sokna Berber today, I came across a wonderful little piece of sociolinguistic history. The vocabulary in question was written by a Sokni, Ali ben El-Haj Abd et-Tawil, with English translations added by Richardson. He wrote, among other things, the numerals. 1-3 are Berber (əjjin اجين, sən سن, šaṛəṭ شارط), while 4 is Arabic (أربعة arb`a). But when he reached 5 there was a moment of indecision:
Do you see what's going on there? He started out by writing خمسة xəmsa, the Arabic loanword meaning "five" - which, if other languages of the region are any guide, was the usual word for "five" in everyday Sokni. But then he had a thought - xəmsa is just Arabic, it's not proper Sokni, and I ought to be giving this stranger proper Sokni - and he overwrote the word with فوس fus "hand", used by Berber and Songhay groups through much of the Sahara (eg Siwi fus=hand, Kwarandzyey kəmbi=hand) as a substitute for "five" to prevent Arabic speakers from understanding, as they would if the normal numerals, borrowed from Arabic, were used. What at first sight looks like just a piece of messy handwriting turns out to bear witness to a moment of linguistic purism.

The unreliability of Afroasiatic etymologies

The fact that Semitic, Egyptian, Berber, Cushitic, and Chadic all belong to a single family - Afroasiatic - is fairly secure, based on striking correspondences in basic morphology. However, it is often not appreciated just how difficult it is to find reliable lexical comparisons between these families, and just how primitive the current state of AA reconstruction is. The easiest source of AA etymologies online is Militarev's database on Starling, so I'm going to pick on it for this post (Orel & Stolbova and Ehret reveal similar issues, but the latter doesn't even include Berber, and I'm focusing mainly on Berber entries here for convenience.)

Suspiciously many entries are listed as having a cognate in only one Berber language (eg earth, hide, skin, run away); given the general closeness of different Berber varieties, you would expect valid proto-Berber terms to be reflected in more than one place. However, these could always be right. Other issues are more serious.

In several cases, a single proto-Berber root is split across several AA ones, due to mistaken sound correspondences. For example:
  • Proto-Berber *i-qăs "bone, (fruit) pit" is split between PAA *ʔayš/ʔawš- "ripened grain, corn" with Zenaga iʔssi (quoted without the glottal stop) "os; grain, graine, baie; comprimé, pilule, cachet, pastille; perle" (Taine-Cheikh), and *ḳ(ʷ)as "bone", with all other reflexes of *iqăs, even though Berber γ (<*q) commonly corresponds to Zenaga ʔ.
  • Proto-Berber *ta-Hăli (> *ti-Həli) "sheep" is split between PAA *ʔayl "ram" and *bawil "ram", although Ghadames-Awjila v corresponds regularly to Tuareg h and other Berber Ø. (A couple of forms, like Figuig tili mistakenly glossed as "ram", have even somehow found their way into a third etymon, "proto-Berber" *laH!) The issue is alluded to in a cryptic comment under the Berber section of PAA *waʔil "wild goat/ram; antelope": "Pr. H No. 220 (and Kössm. 193): Ghdm., Audj. Hgr etc. te-hele < *tiHeli, which, on the contrary, is to be connected with *ʔayl- 'ram' 3061 (together with Brb. forms of the t-ili type), as *ʔ > h in Hgr, while *ʕ > Hgr 0".

  • Most reflexes of pan-Berber ikərri / akrar "ram" are assigned to PAA *kar(w)- "ram, goat; lamb; kid". (The Semitic parallels listed for this word are rather interesting.) But Zenaga ǝgrǝrh, pl. gurănh 'bélier' (Nic. 156), on its own, is given a supposed proto-Berber form *gur- "ram", corresponding to an AA form *(ʔa-)gʷar "kind of antelope; ram; goat". In fact, however, there is a common correspondence of Zenaga g followed by a sonorant to proto-Berber k (eg ägärgur "chest" = Siwi ikərkər, əməgyih "dine" = Kabyle iməkli etc), and this word is obviously related to the other Berber forms.
Another case is listed as doubtful, eg:
  • Most reflexes of Proto-Berber *a-lăqŭm "camel" are under PAA *ʕalVḳ/g- ˜ *lVḳ/gum- ˜ *ḳalVm- "camel"; but the Zenaga one äyiʔm, with regular *l > y (in his source's transcription ǯ) and common *γ > ʔ as seen previously, ends up as PAA *gam-al- (?).
Similarly, unrelated forms may be grouped together due to accidental similarity, eg:
  • Under PAA *kʷay(-t)- "hen; partridge; dove; chick" is listed a "proto-Berber" form *i-kaHi; but the Ahaggar form listed corresponds regularly to Niger Tuareg tekažit, Mali Tuareg tekazzit, Awjila təkažit "hen" (see Kossmann 2005:60), and as such is unrelated to the Ayr and Tawellemmet forms takəyya quoted.
Another problem is undetected loans; this applies especially in sub-Saharan Africa, where little work has been done on their impact. PAA *ʔa/iw / *waʔ "bull, cow" is supported by Tawellemmet hawu "cow", isolated in Berber and obviously borrowed from Songhay, cp. Zarma haw, Tadaksahak hawú; removing this from the etymology leaves only pan-Tuareg iwan "cows", with no evidence for the desired *H. PAA *bar "cereal, corn" is supported by Zenaga būru "bread"; but this word is isolated in Berber and widespread in West Africa (eg Wolof mbuuru, Soninke buuru, Bambara nbuuru, Peul mbuuru, Zarma buuru), and is more likely a loan from Wolof or Pulaar.

Interestingly, most of the problem cases I've noticed in this quick skim are related to agricultural terminology. I wonder if that has anything to do with the particular interest of such terms for archeologists motivating a more intense search for cognates.

Why they thought the Berbers came from Yemen

A long-standing tradition in North Africa, convincingly rejected by Ibn Khaldūn but perpetuated by poets and curricula alike, claims that some major Berber tribes descend from Yemeni Arabs through semi-mythical pre-Islamic kings and their wholly mythical vast conquests. This idea has little to support it, and probably became popular because it allowed these tribes to claim prestigious connections in the context of a high culture dominated by Arab ideas; but why should the connection be specifically Yemeni, rather than, say, North Arabian or perhaps Persian? Linguistics suggests a possible answer.

In southern Arabia live several groups, most famously the Mehri tribe, whose languages, though Semitic, are only distantly related to Arabic, and quite incomprehensible to other Arabs. (You can hear recordings of Mehri at SemArch.) I recently borrowed a copy of the newly published Mehri Language of Oman, by Aaron Rubin; looking through it, I could see several points where Mehri resembles Berber but not Arabic that a traveller might seize on, notably:
  • -s ـس "her", -sən ـسن "their (f.)"; compare Siwi -nn-əs ـنّس "his/her", -n-sən ـنسن "their (m/f)". A 3rd person in -s was found in proto-Semitic, as shown by Akkadian, but was replaced in Arabic.
  • əl ال "not" (preverbal first element of negative); compare Tumzabt ul أُل. Again, this is found in Akkadian and hence must be proto-Semitic.
  • -ət ـت feminine singular; compare Siwi -ət ـت (feminine singular in Arabic borrowings.) Again, the connection is real, but dates back to proto-Semitic rather than indicating any special relationship between the two.
  • -tən ـتن feminine plural; compare Berber -tən ـتن (plural of some masculine nouns)
  • a- أَ used as a definite article for some nouns; compare Berber a- أَ (masculine singular noun prefix). A striking case is Mehri a-məsge:d أَمسجيد vs. Siwi a-məzdəg أمزدج "the mosque". However, in Mehri this indicates definiteness, and does not depend on gender; this is probably a coincidence.
  • tə-...-əm تـ...ـم second person plural imperfective, eg təkə́tbəm تكتبم "you (pl.) write"; compare Berber t-...-m تـ...ـم. The t- is cognate; not sure about the history of the -m offhand.
  • 'ār آر "except, but"; compare Tuareg ar.
  • ā آ "oh" (vocative); compare pan-Berber a أ. (This is actually found in Classical Arabic as well, أ, but is not widely used.)
None of these similarities in fact imply any close relationship between Berber and Mehri, of course; some are coincidental, while others can be traced back to proto-Semitic, and hence constitute evidence connecting Berber with Semitic, not specifically with Mehri. However, a medieval traveller between Yemen and North Africa would not have known that, and could easily have observed similarities like these and leapt to the seemingly plausible conclusion that Berber was connected to the language of these Yemeni tribes, who, like many Berbers, seemed to live just like Arabs yet speak totally differently.

The Berber language of Sokna (Libya)

Thank you SOAS library - I finally got a copy of Il dialetto berbero di Sokna! Sokna (they even have a Facebook group) is a small oasis south of Sirt in Libya, whose dialect of Berber, along with that of nearby El-Fogaha, is Siwi's closest relative. There were several surprises inside, including unusual vocabulary like amerru "mountain" or imeγri "Dhuhr (the midday prayer)", and some striking features shared with Siwi; one of the main ones is an unexpected bit of allomorphy. Across Berber, the second person plural ("you guys") is expressed on the verb with t-...-m, except in the imperative; Sokna does the same, so for example "you have" is t-la-m. In the imperative, you have a suffix -t; Sokna again does the same, eg sag-it-ten iyi-leḥbes "(you guys,) take them to prison!" But if you add an indirect object pronoun ("to him" etc.) to the imperative, you replace this t with an m, like the m in the second half of the non-imperative forms: eḍbeḥ-im-as a-na-dd y-used "(you guys) tell him to come to us!" The same thing happens in Siwi, except that in Siwi the prefixed t- of the non-imperative forms has disappeared. I'm doing a paper on the development of indirect object agreement in Siwi for the Berberologie conference in July, and this is a useful pointer to its history. Amazigh readers - have you come across anything like this?

Sadly, Berber is probably no longer spoken in Sokna. When this article was written in 1911, the shaykh of the oasis reported that only 4 or 5 Isuknan could still speak it, although many more could understand a bit. I don't know whether the people of Sokna today regret the loss of their language or are glad of it - but its disappearance destroys a key not just to Sokna's history but to that of Libya, Egypt, and the whole of North Africa, leaving only this article's fairly short wordlist (and a few even shorter older sources) as evidence for migrations between central Libya and Siwa and early contact with vanished pre-Sulaymi Arabic dialects.

Religious origins of the "Welsh Not"?

A well-known weapon in the arsenal deployed by educational systems the world over against local languages was what in the UK used to be called the Welsh Not - a piece of wood hung around the neck of a student caught speaking their own language, and passed on through the day to anyone that student heard speaking their language, so that whoever was wearing it at the end of the day would be punished. At a talk yesterday I heard that the same idea was implemented in Japan (against Ryukyuan languages) and Sudan (against Nubian.) Coincidentally, I just came across an account that gives interesting insight into the origins of this oppressive practice:
"With a general consent of all our company, it was ordained that there should be a palmer or ferula which should be in the keeping of him who was taken with an oath; and that he who had the palmer should give to every one that he took swearing, a palmada with it and the ferula; and whosoever at the time of evening or morning prayer was found to have the palmer, should have three blows given him by the captain or the master; and that he should still be bound to free himself by taking another, or else to run in danger of continuing the penalty, which, being executed a few days, reformed the vice, so that in three days together was not one oath heard to be sworn." - The Observations of Sir Richard Hawkins, Knt in his voyage into the South Sea in the year 1593
Hard to imagine a ship full of sailors submitting to such a practice! But was this the original purpose of the Welsh Not? It would be interesting to find out. If anyone has an older citation to compare, I'd love to see it.

Endangered languages on Aljazeera

Aljazeera English is doing an interesting series on language endangerment and revitalisation:
* Language on the brink, talking with the last speaker of Wichita.
* Saving the language of the Cherokee, in Tahlequah
* French region aims to save language, on Breton
* Turkey's fading linguistic heritage and Saving Turkey's Laz language, on Laz (a close relative of Georgian, not "an ancient tongue that bears no resemblance to any other language in the region".)
* Circassians in bid to save language in Jordan - at a talk this week by Enam al-Wer I heard that, at the start of the twentieth century, the only permanent population in Amman was Circassian.

Manatees and bilingual compounds

In Djenné Chiini, the Western Songhay dialect of Djenné in Mali, the word for "manatee" is ayuumaa. This is clearly a compound of two elements: ayuu, the word for manatee throughout the rest of Songhay (as well as in Hausa), and maa from Bozo máa, which also means "manatee" (Bozo being the original language of the Djenné region.) It's as if the American English word for an elk were "elk-moose". I can't think of any other examples of this kind of half-borrowing, where a native word is "expanded" by adding on its translation into another language; can you?

(Sources: Daget 1953, La langue bozo; Heath 1998, Dictionnaire songhay-anglais-français, tome II: Djenné chiini.)

More on the WOLD Kanuri entry

The World Loanword Database is a great resource, and the Hausa/Kanuri team deserve congratulations for undertaking the Herculean labour of putting together two sets of etymologies. However, there are some issues with the Arabic etymologies in the Kanuri entry. The transcription is inconsistent and sometimes incorrect; more seriously, a few entries give incorrect meanings or impossible etymologies, as in the following cases:

3.592 àkú parrot: the quoted Arabic form is almost impossible as a Classical Arabic noun (and not in the Lisan al-Arab; the Arabic word is babγā’), and parrots are known in the Arab world only as an exotic import. Assuming the form exists in some Arabic dialect, it must be a loan from a sub-Saharan African language, not vice versa.
9.24 mágàsù scissors: the g and the u both suggest that this word entered directly from (Bedouin) Arabic, not via Hausa.
11.12 hàláltə́ own: if this is correctly transcribed, surely it comes from Arabic ħalāl “licit; one’s lawful property”. Arabic halak means “perish”.
11.79 ríwà dìò to earn: “ribā” means usury, and is strongly condemned in Islam; it is unlikely that this would be adopted as a neutral word “earn”. The more plausible source for both the Kanuri and the Hausa is Arabic ribħ “profit, gain”.
11.78 àlwúsùr wages: Perhaps < Arabic al-`ušr "tithe (< one-tenth)"; surely not from ma`āš.
14.451/6 kàjílí evening: “kajir” is not a possible native Classical Arabic word, and is not attested in Classical Arabic. If it’s in Shuwa, it must come from Kanuri, not vice versa.
16.34 tə́wə́rítə́ regret: Hausa tuubaa does come from Arabic, but clearly from Arabic tūb “repent”; it has nothing to do with Arabic ta’assaf (not *tāssaf) “regret”.
16.69 gàfə̀rtə́ forgive: the connection to Arabic γafar- is obviously correct, but Arabic yaʕfū is equally obviously not relevant; even if ʕ were normally reflected as g in Kanuri, it would leave the r unexplained.
18.33 kàsàttə́/àrdìtə́ admit: the Arabic form “kasat” does not exist. yarḍā means “may He hope/approve” (as noted), not “admit”, making the connection rather tenuous.
18.45 áwúlò dìò boast: there is no Classical Arabic word “awulo”.
19.47 àmàrtə́ permit: Arabic ʔamar- means “he ordered”, not “permission”.
20.31 súlwé armor: Arabic silāħ means “weapons”, not “armor”.
21.24 àlàptà swear < ħalaf "swear" (not < allāh "the god")
21.37 àzáwù punishment: from Arabic ʕađāb “punishment, torment” rather than jazā’.
21.47 perjury: by what chain of semantic changes could “perjury” derive from “lawful”? And why would l > k?

Probable Arabic loanwords not listed as such include:
11.54 bàyîl stingy: from Arabic baxīl.
4.89 sûm poison: surely from Arabic samm?
4.93 sə̀lé bald: surely from Arabic ‘aṣla`?
5.26 kóló pot: perhaps cp. Arabic qullah (or onomatopoeic?)
7.58 kábbì arch: surely from Arabic qubbah?
14.25 bàdìtə́ begin: surely from Arabic bada’?
11.29 lòrùtə́ damage: from Arabic ḍarr (impf. -ḍurr-). Cp. “judge” for ḍ > l.
24.02 wàltà become: perhaps from Maghrebi Arabic wəlli “become, return”.

In some cases, looking more widely allows the etymologies to be improved:
3.11 lə̀mân animal: < al-māl- "livestock, money", rather than al-mann "favor, benefit". For the dissimilation, compare the common Maghrebi Arabic change of n...n to n...l, eg badənjal < bāđinjān, fənjal < finjān.
2.34 lòrúsà wedding: probably from al-`arūs “bride” (Maghrebi Arabic l-aʕṛuṣa), rather than direct from ʕurs. Cp. Siwi aʕṛus “wedding”, with the same semantic shift.

There are also a few cases, many probably originally formatting issues, where the correct form is given in comments, but contradicted elsewhere:

3.25 sheep: the source cited, Kossmann 2005 (67), points out that the form quoted by Skinner, *adaman, is unattested. The correct form, adəmman, is found in Arabic as well as Berber, and refers to a type of sheep said to come from sub-Saharan Africa. Given that it refers to a specifically sub-Saharan sheep breed, 5 would seem a better classification than 4, though 4 is understandable.
3.78 camel: Kossmann 2005, cited, makes it rather clear that an Arabic origin for this word is very improbable. Moreover, there is no such Arabic word as “ləγəmal”; only the form jamal is correct.
4.87 physician: If Shuwa Arabic or some such variety has a term liktaay, there can be little doubt that it is a loan into Shuwa, not from Shuwa. As the comment indicates, this comes from English, not from Arabic.
7.422 blanket: The comments indicate a Berber form abroγ, but the field gives abrok. The Arabic etymology is less implausible than it appears, since the semantic shift to “full body covering” is well-attested, as in English “burka” from the same source.
12.081 above: here it is called areal and probably not Arabic, but under “sky” and “heaven” the same word is listed as “clearly borrowed”. One of these statements must be wrong.
13 zero: the Hausa form is transcribed correctly in comments, but wrongly under “Source words”.
18.51 write: rubuta is Hausa, not Berber, as the sources quoted make clear. The proto-Berber form had no suffix -t (as Kossmann indicates), and neither do any of the equivalent modern Berber verbs.
19.62/20.11 quarrel: If it’s related to “alhilaafu”, the Arabic form is al-xilāf. If it’s related to “judge”, that form is irrelevant. In either case, there is no Arabic word “alwalaʔ” with appropriate meaning.

Identify the language of this manuscript

A scan of much of the manuscript MS Leiden Or. 14.052 is available online. The main text of this manuscript is in a rather poor Arabic. The marginal and interlinear notes, however, are "in one or more West African languages", as yet unidentified. My best guess is that they're in Mandinka, based on the orthography's use of tanwīn and on the frequent word-initial a/i (suggestive of Mande's 3rd person subject pronouns), but I'm not sure; I haven't been able to decipher any phrases. Anyone else feel like having a look?

Subjacency: The judgements

Thank you very much for your responses, everybody! (If you haven't answered yet and want to, please do it before reading the rest of this post.)

Chomsky's intuitions were as follows (* marks ungrammaticality as usual):
  1. * That's the boy who they intercepted John's message to.
  2. * That's the boy who he believed the claim that John tricked.
  3. * That was a lecture that for him to understand was difficult.
  4. * Which book did John wonder why Bill had read?
  5. √ Which book did John think that Bill had read?
  6. √ What would you approve of John's drinking?
  7. * What would you approve of John's excessive drinking of?
Mine were that 1, 4, 5, 7, and (only after some thought) 6 were good, while 2 and 3 were wrong - but I exclude those judgements here, since I was reading the book and might have been swayed by my reactions to the arguments. My sister found 1, 2, and 4 wrong, 3 "weird but comprehensible", and 5-7 good - so even within a single family judgements vary significantly. Your 11 collective judgements (plus some friends and family, and excluding non-native speakers) add up as follows (grading "uncertain" as 0.5):

The discrepancy, and the level of individual variation, are striking - not a single reader agrees with all of Chomsky's judgements, and the only consistent judgements are 2 (always wrong) and 5 (always right). Most of Chomsky's judgements also happen to be predicted by his theories (and those of others in the generative tradition); your judgements therefore often pose problems for those theories. According to Chomsky, 1 and 2 should both be ungrammatical for the same reason - they involve movement past more than one "barrier" (boundary of a 'noun phrase' (DP) or clause excluding the complementiser (IP)) at a time. Yet more than half the people here (including me) accept 1, while nobody accepts 2; one could argue that 2 should be less acceptable than 1 because it crosses three barriers rather than two, but why should 1 be acceptable at all? 4 should be ungrammatical because "why" is occupying a position that "which book" should have to move through - but about half of you (including me) think it's fine. And most readers of this blog find 7 to be better than 6 - the opposite of Chomsky's judgements and of the predictions of the "A-over-A" principle he was working with then (although the latter is obsolete).
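For anyone who wants to repeat the count, the tallying method mentioned above (grading "uncertain" as 0.5) can be sketched in a few lines of Python. This is only a rough illustration: I am assuming that "good" counts as 1 and "wrong" as 0, the names `SCORES` and `tally` are made up for the sketch, and the sample responses are invented, not the actual answers readers gave.

```python
# Assumed scoring scheme: 1 = acceptable, 0 = unacceptable, 0.5 = uncertain.
SCORES = {"good": 1.0, "bad": 0.0, "uncertain": 0.5}


def tally(responses):
    """Sum the scores for each sentence number across all respondents."""
    totals = {}
    for response in responses:
        for sentence, judgement in response.items():
            totals[sentence] = totals.get(sentence, 0.0) + SCORES[judgement]
    return totals


# Invented responses for illustration only; NOT the readers' real judgements.
example_responses = [
    {1: "good", 2: "bad", 5: "good"},
    {1: "uncertain", 2: "bad", 5: "good"},
    {1: "bad", 2: "bad", 5: "good"},
]

totals = tally(example_responses)
for sentence in sorted(totals):
    print(sentence, totals[sentence], "out of", len(example_responses))
```

A sentence that everyone accepts scores the full number of respondents, and one that everyone rejects scores zero; anything in between reflects disagreement or uncertainty.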

Chomsky (1968:51) said of sentences like these: "In some unknown way, the speaker of English devises the principles of [wh-movement etc.] on the basis of data available to him; still more mysterious, however, is the fact that he knows under what formal conditions these principles are applicable... The sentences of [1-3] are as 'unfamiliar' as the vast majority of those that we encounter in daily life, yet we know intuitively, without instruction or awareness, how they are to be treated by the system of grammatical rules which we have mastered." This seems to be false; individually we often find it difficult to decide the grammaticality of sentences like these, and collectively we routinely disagree on them. Certainly such knowledge cannot be construed as belonging to that part of the "knowledge of language" that is, in the words of Chomsky (1968:64), "independent of intelligence and of wide variations in individual experience".

If it did, then that would be rather interesting: it has been claimed that the principles of Subjacency must be innate, because children aren't exposed to enough evidence to deduce them otherwise. But given the level of variation actually observed, it is tempting to reverse the reasoning: children don't deduce most of the principles of Subjacency, so they must neither be exposed to enough evidence for them nor have innate knowledge of them. Rather than postulating arbitrary rules hard-wired into the brain and specific to the language faculty, a more promising way to explain Subjacency phenomena might be to try to derive them from processing difficulties, as suggested by Sag et al.

Subjacency intuitions

I've been reading an old Chomsky book, Language and Mind, lately. As usual, the moment he starts discussing what would eventually be called subjacency I find my intuitions are systematically different from his, and I'm curious: how common is this? By way of testing, here's a few sentences in English: which ones would you consider ungrammatical/unacceptable as phrased?
  1. That's the boy who they intercepted John's message to.
  2. That's the boy who he believed the claim that John tricked.
  3. That was a lecture that for him to understand was difficult.
  4. Which book did John wonder why Bill had read?
  5. Which book did John think that Bill had read?
  6. What would you approve of John's drinking?
  7. What would you approve of John's excessive drinking of?
Chomsky's grammaticality judgements will be provided later - they're on pp. 50-54 of the book.

Berber manuscripts in Arabic script online

A major collection of early Tashelhiyt manuscripts from the 16th century onwards has gone online: Manuscrits arabes et berbères du Fonds Roux. It includes a copy of al-Hilali's Berber-Arabic lexicon. The Lmuhub Ulaḥbib library of Bejaia has also put a number of works online, including an 18th/19th century manuscript on theology in Kabyle: العقيدة السنوسية. Both collections are also of interest for their many Arabic books, but the Berber ones are particularly significant due to the serious paucity of materials for the study of precolonial Berber writing traditions.

Friday, February 05, 2010

World Loanword Database

I shouldn't really be blogging at this stage of my thesis-writing, but this I had to share: the World Loanword Database has come online. Vocabularies likely to be of particular interest include Tarifiyt, Hausa, Kanuri, and Iraqw, but there are plenty more, all carefully analysed for loanwords... Have fun, and feel free to discuss any mistakes you think you spot in it here :)

Language endangerment: thoughts from Igli

I recently found a forum for the town of Igli, about 150 km north of Tabelbala as the crow flies. Igli's traditional language is a Berber variety called "Tabeldit", or in Arabic "Shelha" شلحة, reasonably close to the better-documented dialect of Figuig across the border but with significant differences (such as the first person singular in -ɛ rather than -γ.) In Igli, it is at least as endangered as Kwarandzyey, and is likely to disappear in another couple of generations - although I was told that it is doing better in the small neighbouring town of Mazzer. I think the reason, as in Tabelbala, is that parents started speaking only Arabic to their kids in the hope of giving them a head start in school, but all I know about Igli I heard from Glaouis in other towns. In situations like this, speakers inevitably see their language's disappearance with mixed feelings, and the following pair of posts forms a microcosm of the global language preservation debate:
The "Xiṭ Azugar" Project (posted by Shayma)

"Tabeldit Shelha is part of the fragrance of the Saoura region... a treasure inherited from our ancestors. Shall we preserve it, or let it disappear before our eyes?.... A secret weapon that saved some of us from death. How long will we remain with our hands tied as our language disappears before our eyes? Until when, until when?

I hope that these words have awakened your sleeping hearts and moved your sentiments. Therefore I present to you today this project, consisting of the establishment of an "Arabic-Shelha" dictionary to preserve our language. Therefore I ask the director and administrators and even the members to study this project; if you accept the idea, then let's start to lay down precise plans to overcome difficulties... and if you don't accept the suggestion, then we will do our ancestors an injustice... I urge you to take the matter seriously. To the administration, and all the members, let us put hand in hand. No more lamentation over Shelha, that doesn't help. What helps is effective work.

Forgive me for my harsh words, and I hope you accept the idea. The project is called "Xiṭ azugar" for historical reasons, because these words have saved a person from certain death."
This suggestion was acclaimed and adopted, and there is now a small Arabic-Shelha Dictionary forum. However, there was also some scepticism - the following post started a vigorous debate:
What would we lose if Shelha becomes extinct? (posted by igliab)

Following the increased concern with the local dialect "Shelha" from the brother members, for which thanks are due, I decided to pose the following question: What would we lose if this dialect became extinct?

It's not a language of civilisation, nor a language of science. And supposing we are able to make an "Arabic-Shelha" dictionary and lay down the rules for this language, will our sons agree to learn it? What would the motive be? It's not used at home, nor in public places. Or do we want to put it in museums and say we have "saved" it?

Moreover, by my reckoning those who speak it today are:
90% old men - 8% middle-aged men - 1.5% youths - 0.5% children. Admittedly I haven't made a study to come up with these figures but it could be worse than I anticipate, so it can be said that Shelha has no future in Igli.

I also told myself that if everyone thought the way I think then they would put down their pens and wait for the demise of Shelha, the way an ill man who has despaired of his state waits for death. But I rethought the issue, this time positively, and realised the need to put together a plan for its preservation. But what is the point of solutions if there is no logical, powerful reason? So the first question we have to answer is: why should we preserve Shelha? I urge the brothers to think deeply about this issue and put sentiments aside.
What would your thoughts be? Have you had a parallel experience?

Ajami in Boston

The Boston Globe has an article today about Ajami, the tradition of transcribing African languages in the Arabic script. It focuses particularly on the efforts of Fallou Ngom, whose work has been mainly on Wolof Ajami in Senegal, the subject of one of my first posts here. In the article he emphasises the potential historical significance of such work in opening up neglected sources on African history. While most African manuscripts are in Arabic, some historically rather interesting Ajami sources are known; for Mandinka, published historical manuscripts include the Pakao Book and the Bijini manuscript, the latter outlining regional history over the past 500 years. There are undoubtedly more out there that have gone uninvestigated simply for lack of enough historians who can read them. My work on Ajami has focused more on issues of orthography, however: most African languages have rather different sound systems to Arabic, and it's quite interesting to see what kind of devices they developed to make the alphabet fit better.

Saturday, January 09, 2010

Earliest Kwarandzyey source online (also Tarifit of Arzew)

It turns out that the earliest and most extensive published source on Kwarandzyey (Korandje), the language of Tabelbala in southwestern Algeria which I am studying, is downloadable online:

* Cancel, Lt. 1908. "Etude sur le dialecte de Tabelbala". Revue Africaine 52.

Readers may also be interested in Biarnay's study of the probably extinct Tarifit dialect that was then spoken at Arzew, in volumes 54 and 55 of the same publication.

Siwi Scarborough Fair

Over the dinner mentioned in the last post I was also shown a Siwi poem sent as a text message - it's a rather below average example of the genre, but interesting as a representative illustration of Siwis' orthographic preferences.
كان تازمرت تجبد تيني
كان تفكت تعمار تازيري
كان اتغت تيرو اغي
كان امان نلبحورا يسقلبن اخي
كان الغم ينسخط ايزي
بردو شك غوري (غالي)
Or in Latin Berber orthography:
Kan tazemmurt tejbed tayni
Kan tfukt teɛmaṛ taziri
Kan tγatt tiṛew aγi,
Kan aman n lebḥuṛa yesqelben axi,
Kan alγem yensxeṭ izi,
Beṛdu cek γuṛi "γali".

So I decided to render it into English, taking a few liberties to reproduce the rhyme (for added faithfulness, change "flea" to "fly", and eliminate "someday" and "or three"):

If dates can come from an olive tree,
If the sun someday a moon shall be,
If a goat gives birth to a calf or three,
If milk fills the waters of every sea,
If a camel can turn itself into a flea -
Then only will you be dear to me.