Thursday, December 26, 2013

Does Arabic have the most words? Don't believe the hype.

For some time, I've been hearing rumours (from Arabs, of course) that Arabic has the largest number of words of any language. Recently I found one vector for this rumour: Comparison of the Number of Words in Languages of the World, a poster put together by Azzam Aldakhil which has the merit of at least giving the sources for its figures, namely Muʕjam ʕAjā'ib al-Lughah by Shawqī Ḥamādah, 2000. (In a follow-up comment he gives the page numbers, 83-84.) This poster claims that "Arabic has 25 times as many words as English".

Unfortunately for this claim, if you go to the book cited, what you actually find is a calculation of the number of possible roots in Arabic, without regard to whether or not the root actually has a meaning. Such a count includes huge numbers of unused roots such as بزح bzḥ or قذب qḏb, while at the same time lumping together all words derived from the same root; كتاب book, كاتب writer, and مكتب office are three words, but only one root. The result of such a calculation might tell us something about the potential for expanding Arabic, but absolutely nothing about the state of the Arabic language. And since in practice both Arabic and the languages it is being compared to on that poster allow arbitrarily long words without real roots, if only in loanwords, it doesn't even tell us much about its potential.
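To make the difference concrete, here is a rough Python sketch of the two kinds of count. The restriction to three-consonant roots over the 28 letters of the Arabic alphabet is a simplifying assumption of mine, not the book's actual method, and the toy lexicon holds only the three words mentioned above.

```python
# Naive upper bound on "possible" triliteral roots: any three of the
# 28 Arabic consonant letters, in any order, repeats allowed.
# (Assumption for illustration; not the calculation used in the book.)
CONSONANT_LETTERS = 28
possible_triliteral_roots = CONSONANT_LETTERS ** 3

# Toy lexicon mapping each word to its root (hypothetical data structure).
toy_lexicon = {
    "كتاب": "ktb",  # book
    "كاتب": "ktb",  # writer
    "مكتب": "ktb",  # office
}


def count_words_and_roots(lexicon):
    """Return (number of words, number of distinct roots) for a word-to-root mapping."""
    return len(lexicon), len(set(lexicon.values()))


print(possible_triliteral_roots)
print(count_words_and_roots(toy_lexicon))
```

Counting combinations yields a figure in the tens of thousands no matter what is actually in the dictionary, while the toy lexicon shows three words collapsing into a single root.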

Both the number of Classical Arabic roots with actual meanings and the number of words can be estimated from the classic dictionaries: according to Sakhr's statistics, there seem to be around 10,000 roots, and up to 200,000 distinct words. Roots don't play such a major role in the lexicography of most non-Semitic languages, so it's difficult to compare the number of roots cross-linguistically. But in terms of words, that would be slightly fewer than English (250,000 in the OED, although the poster cites 600,000) and slightly more than French (over 100,000 excluding proper nouns, according to the Académie Française).

However, such comparisons can hardly fail to be misleading. For one thing, English is much more hospitable towards dialectal and colloquial usages than Arabic is – the OED is full of words marked as Scottish or Northern or slang or whatnot, the equivalents of which would never be accepted by an Arabic dictionary. For another thing, the whole enterprise of counting words across languages runs into apparently insuperable problems, especially when it comes to compounds, which Arabic dictionaries do not normally treat as words. If you include compounds, then compound-friendly languages like German or Turkish or Inuktitut are automatically going to beat all the rest – and all the available statistics that I've seen for, say, English happen to include compounds.

So the best answer is that we don't really know, and that word count, even if we could measure it better, is not a very good measure of a language's expressive power anyway. Some missing words make a genuine difference, as I've discussed here before. But is English really missing out by not having distinct words for male camels (جمل) vs. female camels (ناقة)? Is Arabic really missing out by not having a special word for cornpone, or for scones?

Wednesday, December 11, 2013


Tadaksahak, a heavily Berber-influenced Northern Songhay language spoken in northern Mali and Niger and closely related to Korandjé, is a remarkable example of how far language mixture can go. While the core grammar remains Songhay, causatives and passives can only be formed using Berber morphology attached to Berber stems, so every non-Berber verb in the language has a suppletive causative and passive. (There are only a couple of hundred of those left, though, so it's not that impossible to learn.) I have finally finished a review of Regula Christiansen-Bolli's Grammar of Tadaksahak (you can read the review here). For various reasons, I ended up taking the opportunity to write an overview of the general problem of how the language came into being. I don't have a final answer, but I did find that it was even more complicated than it looks.

You see, Tadaksahak speakers are currently mostly bilingual in Tuareg, and well integrated into Tuareg culture. Most of the Berber loanwords in Tadaksahak are from one or another Tuareg variety. But quite a few – including some of those irregular causatives and most numerals up to 20 – are demonstrably not from Tuareg, but from some other Berber language, closely related to Tetserrét (Niger). Today, Tetserrét is nearly extinct, and nobody speaks it as a second language; obviously things must have been different in the past. It looks like most Tadaksahak speakers are visibly of Berber descent, so probably they shifted from Tetserrét to Northern Songhay and then came under Tuareg influence. But why would anyone want to adopt Northern Songhay, currently barely hanging on in one or two remote towns of northern Niger, as a first language? Again, obviously things must have been different, but it's not easy to see how. My best guess for the moment is that they did so in order to reinforce their identity as religious specialists (ineslemen, "marabouts"), since Songhay was the language of the urban centres where advanced religious studies could be pursued, but there are a lot of question marks over that. To confuse matters further, their neighbours like to claim that Tadaksahak speakers are of Jewish descent – probably just to undermine their religious specialist status, but possibly reflecting some more complex history.

Oral tradition isn't much help; there is no firm consensus within the group on their history, and such genealogies as have been circulated, by themselves or by their neighbours, look very much like efforts to push self-serving agendas. About the only common theme across them is that they came from the west. Genetic testing might give firmer data, but the results could be politically sensitive. More lexical data, both for Tadaksahak and for other minority languages of the region, would certainly help, but the problem is ultimately cross-disciplinary - historians, archeologists, anthropologists, etc. take note! Any ideas?

Monday, December 09, 2013

wləd/wlid- "boy, son": An irregular development

There's a curious feature I recently noticed about the Arabic of Dellys in Algeria (I can't imagine what took me so long, since it's in my own idiolect as well!). In Morocco and western Algeria "boy" and "son" are both ولْد wəld, corresponding regularly to Classical Arabic وَلَد walad. In Dellys, "boy" is ولد wləd, again corresponding regularly (in Morocco, CaLaC and CaLC, where C is any consonant and L is a sonorant, both end up as CəLC; in central Algeria, the former becomes CLəC, the latter CəLC). But with a possessor – ie, in the sense of "son" – the form is not wləd, but وليد wlid. You can say وليد خويا wlid xu-ya "my brother's son" or وليدك wlid-ək "your son", but not *wləd xuya or *wəld-ək. It's not obvious how to explain this historically; on the face of it, it looks like a completely irregular development. There are a few other nouns derived from the pattern CaCaC – for instance حنش ħnəš "snake", حبق ħbəq "basil" – but I can't think of any cases offhand which frequently occur in the construct state (that is, with a possessor directly following them). It might be compared to the diminutive, but in present-day Dellys Arabic anyway, the diminutive is وليّد wliyyəd, not wlid.
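The regular correspondences for CaLaC and CaLC described above can be sketched as a small Python function. The sonorant set (l, r, n, m), the one-character-per-consonant transliteration, and the function and region names are assumptions made for illustration only.

```python
import re

# Assumption: CaL(a)C shapes only, with L one of l, r, n, m and every
# consonant written as a single character in the transliteration.
SHAPE = re.compile(r"([^aiu])a([lrnm])(a?)([^aiu])")


def maghrebi_reflex(classical, region):
    """Map a Classical CaLaC or CaLC form to its regular regional outcome."""
    match = SHAPE.fullmatch(classical)
    if match is None:
        raise ValueError(f"not a CaLaC/CaLC form: {classical!r}")
    c1, sonorant, second_a, c2 = match.groups()
    if region == "morocco":
        # Both CaLaC and CaLC end up as CəLC.
        return f"{c1}ə{sonorant}{c2}"
    if region == "central_algeria":
        # CaLaC becomes CLəC; CaLC becomes CəLC.
        if second_a:
            return f"{c1}{sonorant}ə{c2}"
        return f"{c1}ə{sonorant}{c2}"
    raise ValueError(f"unknown region: {region!r}")


print(maghrebi_reflex("walad", "morocco"))
print(maghrebi_reflex("walad", "central_algeria"))
```

Nothing in this rule produces wlid, which is exactly why the possessed form looks irregular.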

Has anyone come across a similar phenomenon in any other Arabic variety?

Friday, December 06, 2013

19th c. Songhay sources from Tanzania and the US

A while ago, I posted about the earliest European source for Songhay. Shuichiro Nakao, who's been doing some interesting work on the 19th-century development of Arabic-based creoles, recently sent me a link to an early record of Songhay from an even more surprising source: the journal Tanganyika Notes and Records. The article in question is a summary autobiography of Adrien Atiman, who spent most of his life working as a Catholic missionary in central Africa. Apparently, as a child he was taken (sold or kidnapped? he would never know which) as a slave from Tindirma (modern Mali) and brought north to Metlili, where he was "ransomed" by a Catholic priest in 1876, converted, trained for the priesthood, and finally sent off to a completely different part of Africa to be a missionary. He gives a few words, the only ones he could still remember of his native language after so many years: "'Coro' meaning lion, 'Boro' man, 'Elham' meat, 'Bri' bone, and 'Kunduhari' beer." These are easily recognisable as the Koyra Chiini forms (after Heath 2005): kooro hyena, boro person, ham meat (crossed here with Algerian Arabic lħəm "meat"), biri bone, and kundu "bourgou grass" + hari "water" (a syrup is traditionally made from bourgou grass). But it is striking that, even for these last holdouts, the meanings are not remembered exactly. Your first language is not necessarily the language you are most fluent in!

As it happens, another Songhay-speaking slave also left us his biography, from slightly earlier in the nineteenth century (1854): Mahommah Gardo Baquaqua. A native of Djougou (modern Benin), he was taken prisoner while visiting a different town and sold south to the coast, ending up as a slave in Brazil, but eventually managed to escape while passing through New York, which had already abolished slavery. He gives the numbers from one to a thousand in Dendi, as well as a few vocabulary items scattered throughout the book. (Not all the latter are Dendi – some are Hausa, eg "cofa" (properly ƙofa) for "gate".)

I've managed to trace a few Songhay loanwords in North Africa, but as far as I know no one has ever reported a Songhay loanword in the Americas. That is probably to be expected, since most slaves there would have come from regions closer to the sea – but it would be interesting to look more closely...

Propaganda and grammatical gender

I try my best to avoid reading products of the propaganda wars currently raging in the Middle East, but today I found that they had managed to leak into the usually apolitical world of linguistics blogging. In a recent post about the way grammatical gender affects how we imagine anthropomorphic characters, Asya Pereltsvaig alludes to a fatwa supposedly arguing that "the word for ‘sea’ is grammatically masculine in Arabic, and so when a woman goes swimming and “the water touches the woman’s private parts, she becomes an ‘adulteress’ and should be punished”." This is sourced to an article in India Today, based on Al Masry Al Youm, which in turn cites a report by Dr. Sayyid Zayed of Al-Azhar titled "The Errant Fatwas of the Muslim Brotherhood and the Salafis" (الفتوى الضالة عن الإخوان والسلفيين). This report is not online, and none of the links identify the author of the fatwa in question, but Google provides an answer - an article from 17 September 2012 gives a screenshot of a Tweet allegedly posted on 11/5/2011 by @AliAlirabieeii saying "It is one of the greatest sins for a woman to go down into the sea, even covered, since the sea is masculine, and when the water goes into her private parts she thus becomes an adulterer and liable to the stipulated punishment." There is in fact a Dr. Ali Al-Rabiei, a vocal Saudi imam, and he does have a Twitter account - @DrAliAlrabieei. On this Twitter account, he tweeted on 28 May 2012 that "The Shia are counterfeiting a sixth fake account in my name - @AliAlirabieeii - to display smears and fakery; we call upon you to inform about it and get it closed."

In some ways, this brief odyssey through the sad world of Twitter warfare was superfluous. The slightest knowledge of Middle Eastern politics should be enough to tell anyone that a story run by Al Masry Al Youm, or a report by Al-Azhar, published not long after the ouster of Morsy and explaining how the Brotherhood are completely crazy, might need to be taken with a pinch of salt. In the current political battles of the Middle East, attributing horrifying fake quotes to leaders from the other side has become a rather popular tactic. I don't know what the background is for the Iraqi fatwa cited later in the same post (a slightly different account is sourced by the Daily Telegraph to the observations of a Sunni leader from Anbar), but common sense tells us it's more likely to be hostile propaganda than to be anybody's actual belief, no matter how crazy. Salafis are known for being especially strict about the need to separate men and women; whoever was behind these stories must have decided that the idea of extending this to separating grammatically masculine things from feminine things would be just plausible enough to fool ordinary people while at the same time ridiculous enough to horrify them. Apparently, he was right.

[Addendum: Looking at this post again, it occurs to me that it's missing the human dimension; you can probably reconstruct it from the facts, but just in case, here are the basics. The Twitter accounts were very likely intended as satire, notwithstanding Alrabieei's furious response – and he may well have deserved satire, if his positions on the Shia are as extremist as they seem to be. A number of sketchy Arabic news sources picked it up as if it were real; this might have been an honest mistake, but much more likely they were simply looking for any opportunity, honest or dishonest, to embarrass someone on the opposite side of the current culture wars. The Egyptian media then picked it up because what they wanted to do was paint opponents of the current government as insane fanatics, but left out his name and identity because he's Saudi, and the Saudi government is strongly on the side of the current Egyptian government. That's dishonesty any way you spin it.]