Thursday, December 26, 2013

Does Arabic have the most words? Don't believe the hype.

For some time, I've been hearing rumours (from Arabs, of course) that Arabic has the largest number of words of any language. Recently I found one vector for this rumour: Comparison of the Number of Words in Languages of the World, a poster put together by Azzam Aldakhil which has the merit of at least giving the sources for its figures, namely Muʕjam ʕAjā'ib al-Lughah by Shawqī Ḥamādah, 2000. (In a follow-up comment he gives the page numbers, 83-84.) This poster claims that "Arabic has 25 times as many words as English".

Unfortunately for this claim, if you go to the book cited, what you actually find is a calculation of the number of possible roots in Arabic, without regard to whether or not the root actually has a meaning. Such a count includes huge numbers of unused roots such as بزح bzḥ or قذب qḏb, while at the same time lumping together all words derived from the same root; كتاب book, كاتب writer, and مكتب office are three words, but only one root. The result of such a calculation might tell us something about the potential for expanding Arabic, but absolutely nothing about the state of the Arabic language. And since in practice both Arabic and the languages it is being compared to on that poster allow arbitrarily long words without real roots, if only in loanwords, it doesn't even tell us much about its potential.
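To make the difference between the two kinds of count concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that a "possible root" is any ordered sequence of three of the 28 letters of the Arabic alphabet; the book's own method may well differ. The toy lexicon is just the three-word example above, in transliteration, with the root assigned by hand; the function names are made up for this sketch.

```python
# Illustrative assumption: a "possible root" is any ordered sequence of
# three consonants drawn from the 28 letters of the Arabic alphabet.
ALPHABET_SIZE = 28
ROOT_LENGTH = 3


def possible_roots(alphabet_size: int = ALPHABET_SIZE,
                   length: int = ROOT_LENGTH) -> int:
    """Count every combinatorially possible root, meaningful or not."""
    return alphabet_size ** length


# Hand-built toy lexicon mapping each word to its root (hypothetical data
# structure; a real dictionary would need morphological analysis).
TOY_LEXICON = {
    "kitāb": "k-t-b",   # كتاب book
    "kātib": "k-t-b",   # كاتب writer
    "maktab": "k-t-b",  # مكتب office
}


def count_words_and_roots(lexicon: dict[str, str]) -> tuple[int, int]:
    """Return (number of distinct words, number of distinct attested roots)."""
    return len(lexicon), len(set(lexicon.values()))


if __name__ == "__main__":
    print("possible triliteral roots:", possible_roots())
    words, roots = count_words_and_roots(TOY_LEXICON)
    print("words:", words, "attested roots:", roots)
```

Under that assumption the combinatorial count is 28³ = 21,952 whether or not any of those roots is in use, while the toy lexicon yields three words but a single attested root. Neither figure can be converted into the other, which is why a count of possible roots says nothing about the number of words.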

Both the number of Classical Arabic roots with actual meanings and the number of words can be estimated from the classic dictionaries: according to Sakhr's statistics, there seem to be around 10,000 roots, and up to 200,000 distinct words. Roots don't play such a major role in the lexicography of most non-Semitic languages, so it's difficult to compare the number of roots cross-linguistically. But in terms of words, that would be slightly fewer than English (250,000 in the OED, although the poster cites 600,000) and more than French (over 100,000 excluding proper nouns, according to the Académie Française).

However, such comparisons can hardly fail to be misleading. For one thing, English is much more hospitable towards dialectal and colloquial usages than Arabic is – the OED is full of words marked as Scottish or Northern or slang or whatnot, the equivalents of which would never be accepted by an Arabic dictionary. For another thing, the whole enterprise of counting words across languages runs into apparently insuperable problems, especially when it comes to compounds, which Arabic dictionaries do not normally treat as words. If you include compounds, then compound-friendly languages like German or Turkish or Inuktitut are automatically going to beat all the rest – and all the available statistics that I've seen for, say, English happen to include compounds.

So the best answer is that we don't really know, and that word count, even if we could measure it better, is not a very good measure of a language's expressive power anyway. Some missing words make a genuine difference, as I've discussed here before. But is English really missing out by not having distinct words for male camels (جمل) vs. female camels (ناقة)? Is Arabic really missing out by not having a special word for cornpone, or for scones?


PhoeniX said...

It's interesting these 'vocabulary sizes'. Disregarding any form of counting, I'm often confronted in English with a much broader range of choice in certain basic vocabulary which I would never have in my native language Dutch.

That is mostly because a lot of the basic vocabulary in English comes in pairs of at least two words, one of Germanic origin and one of Romance origin.

Pairs like "to eat/to consume" are quite rare to find in Dutch (although technically, consumeren might be a word in Dutch... but well... can't think of a better example right now).

Another pair would be "drink/beverage".

That kind of 'richness' in options, which differ semantically, if at all, only in how sophisticated they sound, is almost absent in Dutch. It really does sometimes give me the impression that the English vocabulary is much larger than the Dutch one.

Impressionistic measurements probably make a lot more sense for evaluating these kinds of things than numerical evaluations. Haha.

Lameen Souag الأمين سواق said...

Of course Arabic doesn't have an equivalent of English's quasi-systematic Germanic/Romance doublets (as I discussed a while back). On the other hand, it does have quite a lot of other doublets or near-doublets which I take to result from dialect mixing at various periods, encouraged by the demands of rhetorical style – eg "be able" yaqdiru vs. yastaṭīʕu, or all the famous synonyms for "lion" – 'asad, sabʕ, ghaḍanfar... and these have sometimes been supplemented by pairs regionally borrowed from different languages, eg bandūra vs. ṭamāṭim for "tomato", or quraydis vs. jambarī vs. rubyān for "shrimp". Impressionistically I would also say Arabic tends to make fine distinctions of meaning more often than many languages, but that would be difficult to demonstrate. In practice, of course, there's a very good reason why learning the Arabic lexicon should be harder than most other languages: so much of it is much more than doubled by diglossia!

Anonymous said...

Dear Lameen,
I have sent you a message on Facebook; please check your inbox (the "Other" box for messages from unknown members).

Piotr Gąsiorowski said...

In my native Polish (and in many other languages, e.g. Italian and Spanish) it's easy to inflate the noun inventory by the liberal use of expressive derivatives (diminutives, augmentatives, pejoratives). Standard dictionaries normally ignore them unless they have a really distinct meaning or are very frequent. Thus, from żaba 'frog' we get żabka, żabcia, żabusia, żabeńka, żabeczka, żabunia, żabuńka, żabiątko, żabula, żabulka, żabuleńka (dim., often used as terms of endearment), żabsko, żabisko (aug.), etc.

David Marjanović said...

English doesn't have a special word for the opposite of "loud" (German leise – "quiet" has to cover it, which must result in unintended implications sometimes. French even extends a word with an even wider range of meanings (doucement) that include "slow down!".

German, in turn, lacks a word for "turd". In cases of emergency "sausage" or its diminutives are stretched to cover it, disgusting everyone. :-)

David Marjanović said...

Oops. Forgot to close the parenthesis after leise.

qqq said...

Regardless of the number of words as a possible measure of a language's expressive capabilities, I'd say one of the most amazing features of Arabic is its flexibility in sentence construction. Not only that, but also its ability to convey a very distinct meaning and focus the sentence on a single, unique meaning. This is hardly achievable in English. Moreover, I only came to like English more after I read an English translation of the Qur'an, through which I saw how Arabic construction gave a much more eloquent tone to traditional English. Finally, I have also seen how difficult it is to translate the Qur'an into English, and how translators struggle to carry the whole meaning, with all its included possibilities and shades, over into English. Arabic is a very clean, logical and expressive language compared to English.

Anonymous said...

That is correct. I have seen so many translators suffer.

Anthonie said...

The Greek language is officially the richest language, with 5.000.000 words and 70.000.000 word types. This is even stated in the Guinness Book of Records. The distant number 2 is the English language, with 490.000 words, of which 54.000 are of Greek origin.

1). Guinness Book of Records: The Greek language is the richest language in the world

2). The Greek language is the first world language in the world and was spoken throughout the entire ancient world.

3). On top of that, the entire medical, mathematical, scientific, and astronomical world uses Greek words, which do not exist in ANY other language.

4). The Greek language has been in use for at least 3600 years and is the longest continuous still living language in the world.
"Greek has been spoken in the Balkan Peninsula since around the late 3rd millennium BC. The earliest written evidence is a Linear B clay tablet found in Messenia which dates to between 1450 and 1350 BC, making Greek the world's oldest recorded living language."

5). Homer wrote some of the most complex poetry in the entire ancient world in 850 BC, 2850 years ago.

6). The Greek language is the basis of all European languages, and 54.000 words in English are of Greek origin. The total number of words of Greek origin in all European languages (German, French, Spanish, English, etc) is 500,000.

Not only is there no other language in the world that is the root of, and has influenced, 60% of all world languages (East European, French, Latin, German, English, Spanish, Celtic, Russian, Asian, etc), but Greek is also the oldest living language still in use. And with the entire scientific, medical, and astronomical world using the Greek language, it can only be Greek that is the richest language in the world, which is logically and officially recognized as such.

-Greek language: 5,000,000 words, and approximately 70,000,000 words including derivatives, medical terms and scientific expressions.

Lameen Souag الأمين سواق said...

Aaand thank you for demonstrating that it's not just Arabs who feel the need to bolster their self-worth with made-up statistics about numbers of words. For a more serious examination of the question for Greek, see Nick Nicholas:

Lerna I

Lerna II

khalid said...

Well said

Osama Al-Ghamdi said...

Yeah, English is a poor language if it has to be compared to Arabic. A small example of that is how you refer to relatives:
Aunt (covers both parents' sisters)
But in Arabic it's more specific:
خاله which refers to the mother's sister
عمه which refers to the father's
And the list goes on and on.
So we could say that Arabic is more specific and English is more general.

Lameen Souag الأمين سواق said...

Kinship terms are certainly more specific in Arabic than in English; in that domain, English is relatively impoverished. But picking a single semantic domain tells us nothing about how the two languages' vocabularies compare overall.