Monday, July 14, 2014

Northern Songhay comparative wordlists

Linguistically, the northern and southern shores of the Sahara have remained surprisingly distinct, and most Saharan groups are easily identifiable as outposts of one or the other. Occasionally, however, a greater degree of language mixture is found. Nowhere is trans-Saharan language mixture more prominent than in Northern Songhay, a group of languages spoken in Niger, Mali, and Algeria that combine a Songhay base with an enormous Berber superstratum. The group includes Korandjé, a southwestern Algerian language I've been working on for a few years now.

Following an inquiry I recently received, I've been comparing Korandjé data to the Northern Songhay comparative wordlist in Rueck and Christiansen (1999). In the spirit of open data, you can view the wordlist (with a few remaining gaps to be filled) here: Korandjé 380-word list for Northern Songhay lexical comparison. Draft version, 14 July 2014. The results should be treated as provisional, since the Tasawaq part of this wordlist in particular appears a bit unreliable and since a few gaps remain in the Korandjé and even Tadaksahak lists, but are nevertheless interesting.

Counting cognates makes it very clear that Korandjé is the outlier, as might be expected based on geography:

            Korandjé  Tadaksahak  Tagdal  Tabarog  Tasawaq
Korandjé           -         139     140      141      152
Tadaksahak       139           -     242      238      214
Tagdal           140         242       -      304      237
Tabarog          141         238     304        -      229
Tasawaq          152         214     237      229        -
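
One rough way to make "outlier" concrete is to average each variety's shared-cognate counts with the other four. The sketch below uses only the counts from the table above. The averaging itself is an illustrative choice, not the method used to draw the conclusion in this post, and the function names are made up for the example.

```python
# Shared-cognate counts copied from the table above (symmetric, no diagonal).
VARIETIES = ["Korandjé", "Tadaksahak", "Tagdal", "Tabarog", "Tasawaq"]
SHARED = {
    ("Korandjé", "Tadaksahak"): 139,
    ("Korandjé", "Tagdal"): 140,
    ("Korandjé", "Tabarog"): 141,
    ("Korandjé", "Tasawaq"): 152,
    ("Tadaksahak", "Tagdal"): 242,
    ("Tadaksahak", "Tabarog"): 238,
    ("Tadaksahak", "Tasawaq"): 214,
    ("Tagdal", "Tabarog"): 304,
    ("Tagdal", "Tasawaq"): 237,
    ("Tabarog", "Tasawaq"): 229,
}


def shared(a, b):
    """Look up the count for an unordered pair of varieties."""
    return SHARED[(a, b)] if (a, b) in SHARED else SHARED[(b, a)]


def mean_shared(variety):
    """Average number of cognates a variety shares with each of the others."""
    others = [v for v in VARIETIES if v != variety]
    return sum(shared(variety, o) for o in others) / len(others)


means = {v: mean_shared(v) for v in VARIETIES}
outlier = min(means, key=means.get)  # variety with the lowest average

for v in VARIETIES:
    print(f"{v:<11} {means[v]:.2f}")
print("lowest average:", outlier)
```

On these figures Korandjé averages 143 shared items, against 208 or more for every Azawagh variety.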

The other three Northern Songhay varieties (treating Tagdal+Tabarog as one variety) form a linkage, which, following Wolff and Alidou's suggestion, we might label Azawagh Songhay - from west to east: Tadaksahak, Tagdal+Tabarog, then Tasawaq. On this wordlist Korandjé is clearly closest to Tasawaq, but that's only because Korandjé and Tasawaq have both kept more Songhay vocabulary, a fact irrelevant for subgrouping. The only innovation in vocabulary that Korandjé and Tasawaq share to the exclusion of the rest is the borrowing of numerals from 5 up from Arabic, and if you look at the sound correspondences it's clear that Tasawaq and Korandjé each borrowed their current numerals separately from different dialects of Arabic. Tadaksahak, Tagdal, and Tabarog all show almost the same number of items shared with Korandjé due to common borrowing from Berber, and most of that is due to shared borrowings of widespread Berber words that could easily have happened independently. The use of a Berber form originally meaning "weaver" for "spider" in Korandjé and Tadaksahak alone is striking, but very likely coincidental.

Another way to look at this is to note that 188 of the 332 items are shared across all of Azawagh Songhay, whereas only 108 are shared across all of Azawagh Songhay plus Korandjé. Of the latter, only 9 are Berber or Arabic loans, while 99 are Songhay retentions:

eye, ear, mouth, head, hair, neck, milk, belly, foot, hand, skin, blood, urine, liver, person, man, woman, owner, name, dog, cow, donkey, (venomous) snake, louse, meat, fat, stick, grass, rope, salt, pot, pit (hole), iron, fire, smoke, ashes, night, sun, day, yesterday, wind, water, stone, one, two, hot, cold, long, old, lots, red, black, white, dry, full, what, where, near, far, and, sit down, stand up, lie down, sleep, bite, eat, drink, suck, laugh, cry, see, hear, know, love, give, steal, hide, give birth, die, kill, walk, run, fall, wash, pierce, hit, tie, do, sew, bury, sandals, horse, truth, falsehood, finish, dig, stand, find.
This list is dominated by basic, rarely loaned words: nearly half of it overlaps with the Leipzig-Jakarta list. However, more culturally specific shared retentions such as "iron", "owner", "cow", "donkey", "horse", "pot", "sew", and "sandals" remind us that the split of Northern Songhay is after all rather recent (much more so, in fact, than these words alone might suggest).
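
For anyone wanting to redo the "shared across all" counts on their own coding of the wordlist, the logic can be sketched as follows. It assumes each list item has been given a cognate-class label per variety (same label means cognate forms), with a missing entry for a gap. The item names, labels, and the `shared_by_all` helper are all hypothetical and do not reproduce the actual data.

```python
AZAWAGH = ["Tadaksahak", "Tagdal", "Tabarog", "Tasawaq"]
ALL_NORTHERN = ["Korandjé"] + AZAWAGH

# Hypothetical toy coding: item -> {variety: cognate-class label}.
# A variety missing from a row stands for a gap in the wordlist.
TOY_CODES = {
    "item_a": {"Korandjé": "x", "Tadaksahak": "x", "Tagdal": "x",
               "Tabarog": "x", "Tasawaq": "x"},
    "item_b": {"Korandjé": "z", "Tadaksahak": "y", "Tagdal": "y",
               "Tabarog": "y", "Tasawaq": "y"},
    "item_c": {"Korandjé": "q", "Tadaksahak": "p", "Tagdal": "q",
               "Tabarog": "q", "Tasawaq": "q"},
    "item_d": {"Tadaksahak": "r", "Tagdal": "r",
               "Tabarog": "r", "Tasawaq": "r"},
}


def shared_by_all(codes, varieties):
    """Items for which every listed variety has the same cognate class."""
    result = []
    for item, row in codes.items():
        classes = [row.get(v) for v in varieties]
        # A gap (None) in any variety means the item cannot be counted.
        if None not in classes and len(set(classes)) == 1:
            result.append(item)
    return result


azawagh_shared = shared_by_all(TOY_CODES, AZAWAGH)
northern_shared = shared_by_all(TOY_CODES, ALL_NORTHERN)
print(len(azawagh_shared), "items shared across Azawagh Songhay")
print(len(northern_shared), "items shared once Korandjé is added")
```

Treating gaps as "not shared" is a conservative choice. With gaps still in the Korandjé and Tadaksahak lists, the real counts could only go up as they are filled.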

These pan-Northern retentions, however, by no means exhaust the Songhay lexicon of Northern Songhay. Korandjé alone retains some 183 list items of Songhay origin, at least 135 of them shared with Tasawaq, while for many words (e.g. "four", "green"), only Tasawaq has kept Songhay forms. Well over 227 items have Songhay equivalents in at least one Azawagh Songhay variety, and more than 241 have equivalents either in Azawagh Songhay or in Korandjé. If the even more conservative (but extinct) Emghedesie variety were added to the list, that number would no doubt be even larger. Proto-Northern Songhay certainly had a significantly larger Songhay lexicon than any of its descendants does.


[Later addendum]: Removing all words with Arabic-derived Korandjé forms from the list makes no difference to the classification; the table ends up like this:

            Korandjé  Tadaksahak  Tagdal  Tabarog  Tasawaq
Korandjé           -         135     136      138      142
Tadaksahak       135           -     188      186      174
Tagdal           136         188       -      231      188
Tabarog          138         186     231        -      181
Tasawaq          142         174     188      181        -
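
A quick, admittedly crude, check of "no difference" is to ask whether each variety's closest neighbour (the one it shares most items with) is the same in both tables. The sketch below copies the upper triangles of the two tables in this post. Nearest-neighbour comparison is a stand-in chosen for illustration, not a full subgrouping method, and the helper names are invented for the example.

```python
VARIETIES = ["Korandjé", "Tadaksahak", "Tagdal", "Tabarog", "Tasawaq"]

# Upper triangles of the two tables, row by row (diagonal omitted).
FULL_LIST = [[139, 140, 141, 152], [242, 238, 214], [304, 237], [229]]
NO_ARABIC = [[135, 136, 138, 142], [188, 186, 174], [231, 188], [181]]


def to_matrix(upper):
    """Expand an upper triangle into a full symmetric matrix."""
    n = len(VARIETIES)
    matrix = [[None] * n for _ in range(n)]
    for i, row in enumerate(upper):
        for k, count in enumerate(row):
            j = i + 1 + k
            matrix[i][j] = matrix[j][i] = count
    return matrix


def nearest_neighbours(matrix):
    """Map each variety to the variety it shares the most items with."""
    result = {}
    for i, name in enumerate(VARIETIES):
        candidates = [j for j in range(len(VARIETIES)) if j != i]
        best = max(candidates, key=lambda j: matrix[i][j])
        result[name] = VARIETIES[best]
    return result


before = nearest_neighbours(to_matrix(FULL_LIST))
after = nearest_neighbours(to_matrix(NO_ARABIC))
for name in VARIETIES:
    print(f"{name:<11} full list: {before[name]:<11} without Arabic: {after[name]}")
print("same nearest neighbours in both tables:", before == after)
```

In both tables Tagdal and Tabarog are each other's closest neighbour, and Korandjé's closest neighbour is Tasawaq.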

Saturday, June 28, 2014

Grammatically analysing "Sahha Ramdankoum!"

Sahha Ramdankoum صحّة رمضانكم! This Darja phrase, which might be rendered as "happy Ramadan!", is familiar to any Algerian. It groups with a few others - notably Sahha Ftourkoum صحة فطوركم "happy fast-breaking dinner!" and Sahha Eidkoum صحة عيدكم "happy Eid!" - as an example of a not very productive template "Sahha X+2nd person possessive" expressing good wishes on the occasion of X. But what is "sahha" doing in such forms?

In many contexts, "sahha" is a noun meaning "health"; we can be sure it is a noun, since it can be the object of a preposition and take personal possessive endings, as in b-sahht-ek بصحتك "good for you" (with your health). But there is also a defective verb, taking 2nd person perfective endings: sahhit صحيت (to a man), sahhiti صحيتي (to a woman), sahhitou صحيتو (to a group) "thanks / well done" (a little stronger than sahha "thanks"). The expected 3rd person masculine singular form of this verb would be sahh صح or sahha صحى; sahh actually is attested as an impersonal verb (ysahh-lek يصحلك "it is appropriate for you"), but its meaning is sufficiently distant that it's not necessarily part of the same paradigm. So in principle, "sahha" in "Sahha Ramdanek" could be interpreted as a noun, or a verb. Is there any way to decide which?

If it's a noun, then the phrase's syntax is bizarre - the literal interpretation would then be "Health is your Ramadan", whereas to make it fit the actual meaning we want at least something like "Your Ramadan is health", which would be the opposite order (?Ramdanek Sahha رمضانك صحة). If it's a verb, on the other hand, the syntax is fine - subjects in Algerian Arabic routinely follow the verb, and perfective verbs are routinely used to express states, so we could interpret it as something like "Healthy is your Ramadan!" or even, if we allow the perfective to be optative as in Classical Arabic, "May your Ramadan be healthy!"

On the other hand, if it's a verb, then it should agree in gender and number with what follows it, with feminine "sahhat" صحات and plural "sahhaw" صحاو. This can't actually be tested directly: in all such expressions that I can think of, the noun happens to be masculine and singular, and this expression cannot normally be extended to congratulate people on other occasions. But if we imagine using this formula to congratulate someone on their happiness, I for one would much sooner say "Sahha Farhatkoum" صحة فرحتكم than "Sahhat Farhatkoum" صحات فرحتكم, which suggests that my mind, at least, is not analysing it as a verb.

Perhaps it's neither noun nor verb, then? There are a few words in Algerian Arabic that form predicates and come at the start of the clause, but do not take verbal morphology - for instance, makash ماكاش "there is no" or oulah ولاه "no need (for)". Putting it in this class would take care of the problem, but just leads us to a different one: can this class of non-verbal predicators be given a coherent positive definition, or is it just whatever happens to be left over from defining the major word classes?

Be that as it may, best wishes to all readers for this coming month, and, for those fasting it, Sahha Ramdankoum!

Tuesday, June 24, 2014

From Figuig to Igli: Berber in the Algerian-Moroccan borderland

The number of good Berber descriptive dictionaries has been slowly but steadily increasing in recent years, but Hassane Benamara's new Dictionnaire amazigh-français : Parler de Figuig et ses régions (Rabat: IRCAM, 2013), which I was lucky enough to be lent a copy of lately, is surely one of the best. Apart from being quite unusually large (800 pages), it incorporates examples, multiple senses, pictures of items difficult to describe, and an appendix with encyclopedic information on culturally specific topics such as festivals and children's games. It incorporates a few neologisms useful for schooling, but takes a fairly inclusive attitude towards Arabic loanwords. There are barely 15,000 people in Figuig, but, astonishingly enough, this is actually the second dictionary of Figuig Berber published by a native speaker; the first, Ali Sahli's معجم أمازيغي-عربي (خاص بلهجة أهالي فجيج) (Oujda: Al Anwar Al Maghribia, 2008), was a good effort, but is substantially shorter and uses a less accurate transcription. (There's even another linguist from Figuig, Mohamed Yeou, threatening to make a third dictionary – if he goes ahead with the project, he'll have a high hurdle to clear.)

Across the border in Algeria, the situation is rather different. A number of towns across a wide area around Bechar and Ain Sefra speak Berber varieties closely related to that of Figuig, collectively, if imprecisely, termed "Shelha". Some of them seem to be shifting to Arabic (on my latest trip, I was told that in Lahmar they had stopped speaking Berber with their children, and for Igli I had heard the same much earlier). But little effort – and no official effort, as far as I know – is being made to document them. The only (very) partial exceptions of which I am aware are Igli and Boussemghoun.

For Igli (population 7000), I have already described the local Scouts' efforts to put together an online dictionary. More recently, however, I came across a laudable local attempt at approaching the problem academically: Fatima Mouili's The Berber Speech of Igli, Language towards Extinction. After a very brief summary of Igli grammar and phonology, unfortunately made frequently illegible by font problems, the author discusses the reasons for language shift. Corresponding to my impressions for the region, including Tabelbala, she cites emigration and the desire to ensure educational success as important drivers; others are more surprising, including the immigration of refugees expelled by the French from a nearby village during the Algerian War of Independence. Apparently, her thesis discusses similar issues, for those with 59€ to spare...

For Boussemghoun (population 4000), a few articles and a book by Mohamed Benali may be cited, all focusing – as far as I can see – exclusively on the sociolinguistic situation of Berber in the town. A local Berber-language poet billed as "the Ait Menguellet of Boussemghoun", Bashir Oulhaj, has a considerable presence on YouTube, e.g. here; he's even been interviewed by Figuig News. Boussemghoun seems to be treated as the centre for Amazigh identity in the region; the HCA has even organised a symposium there. Nevertheless, little if any descriptive work has been published on its variety of Berber.

Taken together, there are probably more speakers of Berber in southwestern Algeria than in and around Figuig. Why the difference, then? Is it because linguistics is better represented in Moroccan universities than in Algerian ones? (Notwithstanding some interesting work coming out of Algeria, I think that is fair – it would be hard to think of any linguist working in Algeria with a profile comparable to Abdelkader Fassi Fehri, for example.) Or is it because the Amazigh movement in Morocco is less closely associated with one side in the "culture war"? (Benali observes that, while most Semghounis wanted Berber to be taught in schools, they rejected the installation of an HCA office because they distrusted its politics.) Or are there more specific, purely local factors explaining the difference? That would be worth a study in itself – though perhaps not as much so as the Berber varieties in question!

Tuesday, June 17, 2014

Why Yiddish is not Slavic, and language families are not families

Recently I came across a popular article, Where Did Yiddish Come From?, discussing Paul Wexler's eccentric claim that Yiddish is a "relexified" Slavic language (and Modern Hebrew, in turn, "relexified" Yiddish). To make any sense of this claim, we have to stop and consider what historical linguists mean when they talk about language origins.

If you want to learn a language perfectly, the best way to start is to pick it up as a child from your family and the community they're part of. That way, you and your generation end up speaking the same language as your parents and their generation, modulo a few little innovations you threw in just to annoy them. As those little innovations pile up, generation on generation, sooner or later you end up speaking something that the first generation wouldn't have been able to understand. In such a scenario, everyone agrees, the latest generation's language – let's call it B – is descended from the first generation's (A). If some of the children of that first generation moved far away early on and went through the same process of gradual change, their descendants speak another language, C, which speakers of B can't understand, but which is also descended from A. So we say that B and C belong to the same language family, just as their speakers belong at some remove to the same extended family.

If you're reading this, it's probably too late to learn a language that way. (Sorry.) You can still learn another language, say B, but the odds are that, at best, you'll always speak it with a bit of a foreign accent, and keep using expressions that make sense in English but sound weird to native speakers. If you're just an individual migrant learning it to fit in, that won't matter in the long run – your kids will learn the language in the playground and come back speaking it better than you do. But what if it's not just you that's learning it, but also your spouse, and your brothers, and almost everyone you know? What if your whole community is starting to prefer to speak this language with their kids, instead of the one they grew up with? In that case, the kids will still end up speaking it – but instead of speaking it like natives, they'll probably end up speaking it with your foreign accent and all those expressions of yours that native speakers laugh at. In that scenario, does the kids' language (let's call it D) belong to the same language family as B and C, or not? That's the ambiguity that Wexler is playing with.

The obvious answer – and the one most linguists would give – is yes*. For one thing, assuming you did a half-decent job of learning B, it's the same language – speakers of D can understand speakers of B, and vice versa, even if they laugh at each other's crazy accents. The influence of Gaelic may pervade Irish English, but Irish English is still English, not some Celtic language. It's the vocabulary and the morphology that really make English understandable – a weird accent or a funny way of putting things is just not that big an obstacle on its own. Wexler proposes exactly the opposite criterion: "Yiddish – in contrast to its massive German vocabulary – has a native Slavic syntax and sound system – and thus must be classified as a Slavic language" (1993:5). The origins of Yiddish syntax and phonology I can't comment on, but there's a good reason why historical linguists normally prioritise the vocabulary and the morphology over the syntax and phonology, even apart from the one just given. Vocabulary and morphology are eminently reconstructible, using the comparative method. Phonology, on the other hand, can only be reconstructed from vocabulary, and syntax is notoriously hard to reconstruct at all. If language families were to be defined based on phonology and syntax, it would hardly be possible to define them, much less reconstruct them or state regular correspondences between them.

In short, saying that Yiddish (much less Modern Hebrew) belongs to the Slavic language family is just a word game – in the sense that historical linguists normally use the concept of "language family", it doesn't, and wouldn't even if every last Yiddish speaker happened to be of Slavic ancestry and to speak Yiddish with a heavy Slavic accent. But such word games do not vitiate Wexler's work. After a large enough community has shifted to a different language, it is usually possible to find traces of their former language – although identifying them as such, rather than as later borrowings, may be hard. That's what Wexler is trying to do for Yiddish, and that's how he supports his claim that Yiddish speakers' ancestors used to speak a Slavic language.


* However, the question can easily be made more controversial. Suppose you and your community didn't learn it that well to start with, and aren't trying to imitate native speakers anyway? In that case, the kids will end up speaking something that sounds utterly ridiculous to native speakers; the basic words are recognisable, but the way they're put together seems all wrong. Whatever Tok Pisin is, most people would agree that it's not English. A few people would defend the claim that Tok Pisin belongs to the same family as English, on the basis that that's where the vocabulary comes from, but most would say that it doesn't belong to a language family. The language family model presupposes that the language is being passed on reasonably well as a whole, including not just vocabulary but also some amount of grammar; if all that's learned is a bunch of words, the model breaks down. The border must be drawn somewhere between the extremes of Irish English and Tok Pisin, but linguists can and do disagree on where exactly to draw it.

Tuesday, June 10, 2014

The Subclassification of Songhay, now online

After more than a year, I can now finally put a PDF of my article The Subclassification of Songhay and its Historical Implications online for whoever may be interested. The abstract follows:
This paper seeks to establish the first cladistic subgrouping of Songhay explicitly based on shared arbitrary innovations, a prerequisite both for distinguishing recent loans from valid extra-Songhay comparanda and for determining how Songhay spread. The results indicate that the Northern Songhay languages of the Sahara form a valid subfamily, even though no known historical records link Tabelbala to the others, and that Northern Songhay and Western Songhay (spoken around Timbuktu and Djenné) together form a valid subfamily, Northwestern Songhay. The speakers of Proto-Northern Songhay practised cultivation and permanent architecture, but were unfamiliar with date palms. Proto-Northwestern Songhay was already in contact with Berber and probably (perhaps indirectly) with Arabic, and was spoken along the Niger River. Proto-Songhay itself appears likely to have been in contact with Gur languages, confirming its relatively southerly location. This result is compatible with two scenarios for the northerly spread of Songhay. On Hypothesis A, Northern Songhay spread out from an oasis north-east of Gao, probably Tadmakkat or Takedda, and Northwestern Songhay had been spoken in areas west of Gao which now speak Eastern Songhay. On Hypothesis B, Northern Songhay spread out from the Timbuktu region, and Western Songhay derives from heavy “de-creolising” influence by Eastern Songhay on an originally Northern Songhay language. To choose between these hypotheses, further fieldwork will be required.
Comments welcome!

Beni-Snous Berber

I have the pleasure of announcing that my article with Fatma Kherbache, Syntactically conditioned code-switching? The syntax of numerals in Beni-Snous Berber, has just been published online in the International Journal of Bilingualism. Long-term readers may recall that, five years ago now, I noticed an astonishing claim in Destaing's grammar of Beni-Snous Berber (spoken near Tlemcen, in western Algeria): that, with numerals above ten, they only used Arabic nouns. In this article, I finally try to get to the bottom of this, based both on Destaing's corpus and on data gathered by my co-author from the half-dozen or so last speakers; the real situation turns out to be a little more complicated than Destaing described, but his claim is correct as a statistical generalisation. Syntactically conditioned code-switching as a systematic part of otherwise monolingual discourse has rarely been described, but one other instance is reported – numeral+noun combinations in the Jerusalem dialect of Domari, an Indic language of the Levant spoken by the Dom "gypsies". Comparing the circumstances of switching in both languages supports the generalisation, building on Myers-Scotton's work, that syntactically conditioned code-switching (Matras' "bilingual suppletion") can only happen when a word shared by both languages has different selectional requirements in each language.

Sunday, June 08, 2014

Standard Arabic and cartoons

In the Arab world, practically all cartoons are dubbed in Standard Arabic, rather than in the different countries' spoken varieties. Until recently, Disney was the exception, using Egyptian Arabic; its decision to use Standard Arabic like everyone else has attracted some controversy (New Yorker, Language Log, Arabic Literature, MEI), although it will be very welcome in the Maghreb, where kids don't understand Egyptian anyway. In general, however, there's a strong consensus in favour of Standard Arabic in cartoons; it's seen as a good way to get children used to Standard Arabic, and thus prepare them for school. What are the effects of this?

Let's start by looking around us. We see that younger generations understand Standard Arabic rather well, and have a much larger Standard Arabic vocabulary than earlier generations did at the same age. A cursory search suggests that cartoons have played a role in this; for example, Weyers 1999 shows that American students of Spanish improved their listening comprehension and used a larger vocabulary after watching a Spanish-language telenovela, and Blosser 1988 that Hispanic children, once they've mastered the basics of English, improve their English by watching more TV (although this does not seem to work below the age of 2). So parents are probably right to think that Standard Arabic cartoons are helping their kids learn Standard Arabic.

However – let's be honest – those same younger generations remain largely unable to write a grammatically correct paragraph in it, and normally speak in Standard Arabic only to quote prestigious texts or to parody TV presenters or politicians. This suggests that what they're gaining from it is limited to what Weyers 1999 identified for learners of Spanish: better comprehension and a larger vocabulary, but not better production. I don't think it's an exaggeration to say that, in Algeria at least, Standard Arabic is effectively a read-only language: everyone under a certain age can understand it and read it, but practically no one can express themselves in it correctly or confidently. So, as an educational tool, cartoons have their limits.

But education isn't everything. Cartoons are one of the most secure domains of spoken Standard Arabic, right up there with news broadcasts, documentaries, and historical soap operas, and well ahead of teaching, sermons, political speeches, and interviews, all of which often use varying amounts of dialect. For younger children, unless their parents read to them, cartoons may well be the only context other than school and prayer where they regularly hear Standard Arabic (cp. Hamzaoui 2014), and in any case are one of the first contexts they learn to associate with Standard Arabic. Shouldn't we be asking how this affects their feelings about the language?

Sunday, April 27, 2014

Speaking in Oran

It's a bit last minute, but I'm glad to announce that I will be giving two talks in Oran over the next few days: It would be a pleasure to see some readers of this blog there.