Thursday, December 26, 2013

Does Arabic have the most words? Don't believe the hype.

For some time, I've been hearing rumours (from Arabs, of course) that Arabic has the largest number of words of any language. Recently I found one vector for this rumour: Comparison of the Number of Words in Languages of the World, a poster put together by Azzam Aldakhil which has the merit of at least giving the sources for its figures, namely Muʕjam ʕAjā'ib al-Lughah by Shawqī Ḥamādah, 2000. (In a follow-up comment he gives the page numbers, 83-84.) This poster claims that "Arabic has 25 times as many words as English".

Unfortunately for this claim, if you go to the book cited, what you actually find is a calculation of the number of possible roots in Arabic, without regard to whether or not the root actually has a meaning. Such a count includes huge numbers of unused roots such as بزح bzḥ or قذب qḏb, while at the same time lumping together all words derived from the same root; كتاب book, كاتب writer, and مكتب office are three words, but only one root. The result of such a calculation might tell us something about the potential for expanding Arabic, but absolutely nothing about the state of the Arabic language. And since in practice both Arabic and the languages it is being compared to on that poster allow arbitrarily long words without real roots, if only in loanwords, it doesn't even tell us much about its potential.
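To make the arithmetic concrete, here is a minimal sketch of what a "possible roots" count looks like, and of how it differs from a word count. The assumptions are mine, not taken from the cited book: an alphabet of 28 consonant letters, roots of 2 to 5 letters, and no consonant repeated within a root. The names count_possible_roots, ROOT_OF and count_words_and_roots are hypothetical, and the tiny lookup table stands in for a real dictionary.

```python
from math import perm

ALPHABET_SIZE = 28           # assumption: 28 consonant letters
ROOT_LENGTHS = (2, 3, 4, 5)  # assumption: roots of 2 to 5 letters


def count_possible_roots(alphabet_size=ALPHABET_SIZE, lengths=ROOT_LENGTHS):
    """Count ordered letter sequences with no repeated letter.

    This is pure combinatorics: it never asks whether a sequence
    is an attested root with a meaning.
    """
    return sum(perm(alphabet_size, k) for k in lengths)


# Hypothetical word-to-root lookup covering only the three words
# mentioned above; a real one would come from a dictionary.
ROOT_OF = {
    "كتاب": "كتب",  # book
    "كاتب": "كتب",  # writer
    "مكتب": "كتب",  # office
}


def count_words_and_roots(words, root_of=ROOT_OF):
    """Return (number of distinct words, number of distinct roots)."""
    distinct_words = set(words)
    distinct_roots = {root_of[w] for w in distinct_words}
    return len(distinct_words), len(distinct_roots)


if __name__ == "__main__":
    print(count_possible_roots())
    print(count_words_and_roots(["كتاب", "كاتب", "مكتب"]))
```

Under these assumptions the first function returns a figure in the millions (12,305,412) without consulting a single dictionary, while the second shows three words collapsing into one root. Neither number is a count of the words actually in use.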

Both the number of Classical Arabic roots with actual meanings and the number of words can be estimated from the classic dictionaries: according to Sakhr's statistics, there seem to be around 10,000 roots, and up to 200,000 distinct words. Roots don't play such a major role in the lexicography of most non-Semitic languages, so it's difficult to compare the number of roots cross-linguistically. But in terms of words, that would be slightly fewer than English (250,000 in the OED, although the poster cites 600,000) and somewhat more than French (over 100,000 excluding proper nouns, according to the Académie Française).

However, such comparisons can hardly fail to be misleading. For one thing, English is much more hospitable towards dialectal and colloquial usages than Arabic is – the OED is full of words marked as Scottish or Northern or slang or whatnot, the equivalents of which would never be accepted by an Arabic dictionary. For another thing, the whole enterprise of counting words across languages runs into apparently insuperable problems, especially when it comes to compounds, which Arabic dictionaries do not normally treat as words. If you include compounds, then compound-friendly languages like German or Turkish or Inuktitut are automatically going to beat all the rest – and all the available statistics that I've seen for, say, English happen to include compounds.

So the best answer is that we don't really know, and that word count, even if we could measure it better, is not a very good measure of a language's expressive power anyway. Some missing words make a genuine difference, as I've discussed here before. But is English really missing out by not having distinct words for male camels (جمل) vs. female camels (ناقة)? Is Arabic really missing out by not having a special word for cornpone, or for scones?


PhoeniX said...

It's interesting these 'vocabulary sizes'. Disregarding any form of counting, I'm often confronted in English with a much broader range of choice in certain basic vocabulary which I would never have in my native language Dutch.

That is mostly because a lot of the basic vocabulary in English comes in pairs, one word of Germanic origin and one of Romance origin.

Pairs like "to eat/to consume" are quite rare to find in Dutch (although technically, consumeren might be a word in Dutch... but well... can't think of a better example right now).

Another pair would be "drink/beverage".

That kind of 'richness' in options, which, if anything, differ semantically only in how sophisticated they sound, is almost absent in Dutch. That really does give me the impression sometimes that the English vocabulary is much greater than the Dutch one.

Impressionistic measurements probably make a lot more sense for evaluating these kinds of things than numerical evaluations. Haha.

Lameen Souag الأمين سواق said...

Of course Arabic doesn't have an equivalent of English's quasi-systematic Germanic/Romance doublets (as I discussed a while back). On the other hand, it does have quite a lot of other doublets or near-doublets which I take to result from dialect mixing at various periods, encouraged by the demands of rhetorical style – eg "be able" yaqdiru vs. yastaṭīʕu, or all the famous synonyms for "lion" – 'asad, sabʕ, ghaḍanfar... and these have sometimes been supplemented by pairs regionally borrowed from different languages, eg bandūra vs. ṭamāṭim for "tomato", or quraydis vs. jambarī vs. rubyān for "shrimp". Impressionistically I would also say Arabic tends to make fine distinctions of meaning more often than many languages, but that would be difficult to demonstrate. In practice, of course, there's a very good reason why learning the Arabic lexicon should be harder than most other languages: so much of it is much more than doubled by diglossia!

Anonymous said...

Dear Lameen,
I have sent you a message on Facebook. Please check your inbox (the other box, for unknown members).

Piotr Gąsiorowski said...

In my native Polish (and in many other languages, e.g. Italian and Spanish) it's easy to inflate the noun inventory by the liberal use of expressive derivatives (diminutives, augmentatives, pejoratives). Standard dictionaries normally ignore them unless they have a really distinct meaning or are very frequent. Thus, from żaba 'frog' we get żabka, żabcia, żabusia, żabeńka, żabeczka, żabunia, żabuńka, żabiątko, żabula, żabulka, żabuleńka (dim., often used as terms of endearment), żabsko, żabisko (aug.), etc.

David Marjanović said...

English doesn't have a special word for the opposite of "loud" (German leise – "quiet" has to cover it, which must result in unintended implications sometimes. French even extends a word with an even wider range of meanings (doucement) that include "slow down!".

German, in turn, lacks a word for "turd". In cases of emergency "sausage" or its diminutives are stretched to cover it, disgusting everyone. :-)

David Marjanović said...

Oops. Forgot to close the parenthesis after leise.

qqq said...

Regardless of the number of words as a possible measure of a language's capacity for expression, I'd say one of the most amazing features of Arabic is its flexibility in sentence construction. Not only that, but also the ability to convey a very distinct meaning and to focus the sentence on a single unique meaning. This is hardly achievable in English. Moreover, I came to like English more only after I read an English translation of the Qur'an, through which I saw how Arabic constructions gave a much more eloquent tone to traditional English. Finally, I have also seen how difficult it is to translate the Qur'an into English, and how translators suffer to transfer the whole meaning, with all its possibilities and shades, into English. Arabic is a very clean, logical and expressive language compared to English.

Anonymous said...

That is correct. I have seen so many translators suffer.

Anthonie said...

The Greek language is officially the richest language, with 5.000.000 words and 70.000.000 word types. This is even stated in the Guinness Book of Records. A distant number 2 is the English language, with 490.000 words, of which 54.000 are of Greek origin.

1). Guinness book of Records: The Greek language is the richest language in the world

2). The Greek language is the first world language in the world and was spoken throughout the entire ancient world.

3). On top of that, the entire medical, mathematical, scientific and astronomical world uses Greek words, which do not exist in ANY other language.

4). The Greek language has been in use for at least 3600 years and is the longest continuous still living language in the world.
"Greek has been spoken in the Balkan Peninsula since around the late 3rd millennium BC. The earliest written evidence is a Linear B clay tablet found in Messenia which dates to between 1450 and 1350 BC, making Greek the world's oldest recorded living language."

5). Homer wrote some of the most complex poetry in the entire ancient world in 850 BC, 2850 years ago.

6). The Greek language is the basis of all European languages, and 54.000 words in English are of Greek origin. The total number of words of Greek origin in all European languages (German, French, Spanish, English, etc) is 500,000.

Not only is there no other language in the world that is the root of, and has influenced, 60% of all world languages (East European, French, Latin, German, English, Spanish, Celtic, Russian, Asian, etc), but it's also the oldest living language still in use. And with the entire scientific, medical and astronomical world using the Greek language, it can only be Greek that is the richest language in the world, which is logically why it is officially recognized as such.

-Greek language: 5,000,000 words, and approximately 70,000,000 words including derivatives, medical terms and scientific expressions.

Lameen Souag الأمين سواق said...

Aaand thank you for demonstrating that it's not just Arabs who feel the need to bolster their self-worth with made-up statistics about numbers of words. For a more serious examination of the question for Greek, see Nick Nicholas:

Lerna I

Lerna II

Anonymous said...

Well said

Unknown said...

Yeah, English is a poor language if it has to be compared to Arabic. A small example of that is when you refer to relationships:
Aunt (covers both parents' sisters)
But in Arabic it's more specific:
خاله which refers to the mother's sister
عمه which refers to the father's
And the list goes on and on.
One could say that Arabic is more specific and English is more general.

Lameen Souag الأمين سواق said...

Kinship terms are certainly more specific in Arabic than in English; in that domain, English is relatively impoverished. But picking a single semantic domain tells us nothing about how the two languages' vocabularies compare overall.

Ignat831 said...

Please don't compare recent colloquialisms which are not included in major Arabic dictionaries such as Lisan AlArab and use that to show how Arabic borrows from other languages. One commenter, playing on the ignorance of the reader, has claimed that Arabic words have no real root. In fact, any word which does not have an etymology is clearly a non-Arabic word. Each Arabic word can not only be traced to its root, but to the circumstances under which it was created thousands of years back. Arabic has a known estimated number of between 90 to 500,000,000 words. As I am a teacher of English linguistics as well as Arabic and Hebrew, which is in fact a diminutive form of Canaanite Arabic, both acquired languages, I can attest to the stark inferiority of all Indo-European languages to Semitic languages, and to Arabic in particular, which has a grammatical feature known as "i'raab". I'raab in the Arabic language gives it clarity and order to such an extent that it is nearly impossible to have a misinterpretation, which is unique in the Muslim Holy book, the Qur'an. Many Western Islamophobic apologetics attempt to diminish the overwhelming superiority of a language from which most of their axiomatic expressions have been borrowed, as well as the very concept of poetry as was borne out of the Islamic inspired European Renaissance. The attempt to present Indo-European languages as superior to Arabic is as futile as comparing Greek mathematics with its stick numbers to algorithms of Arabic, both terms of which have roots in Arabic.

Lameen Souag الأمين سواق said...

The comment of "Ignat831" is thoroughly confused. Arabic certainly does not have 500 million words, as discussed in detail above, nor is i3raab a particularly special feature - it's just the morphological marking of case and mood, both of which are similarly morphologically marked in many other languages, such as Latin or Sanskrit or Japanese. And, needless to say, poetry long predates both Islam and the Renaissance.

Unknown said...

This comment has been removed by a blog administrator.

Lameen Souag الأمين سواق said...

Unknown: "You should've probably site that economist article "the biggest vocabulary""

You mean the one specifically linked in my post under "apparently insuperable problems"?

Anyway, Godwin's Law violation + swearing at me = deletion. If you have a point to make, try reposting and making it in a more venue-appropriate fashion - this is not Reddit.

Chris said...

The question seems to have moved from word count to expressiveness, where one could argue that English wins, since far more people will understand it. Not that I would assert such a specious argument.

qahwagi said...

You pointed out that, according to Sakhr's statistics, Arabic has "up to 200,000 distinct words". However, according to an archived version of the page you linked to (which is currently not operable), the figure shown under عدد الكلمات is 2,000,000 for the dictionary of الغني and 3.948.160 for the dictionary تاج العروس. Where do you get the figure 200,000? Thank you for your attention.

qahwagi said...

P.S. The link I posted to Sakhr's archived page did not show up correctly. It should be

Lameen Souag الأمين سواق said...

Thanks for the archive link. I understand عدد المشتقات as referring to the number of words defined in the dictionary, and عدد الكلمات as referring to the total number of words in the dictionary including those in the definitions and examples. عدد المشتقات is 195,000, so I rounded up slightly.

qahwagi said...

Thanks for the reply. I should have looked at that chart more carefully; I totally missed the عدد المشتقات column.

Anonymous said...

Arabic has 12 million words plus, more than all European languages added together.

qahwagi said...

@Anonymous John ---- and what is the source of your claim?

Anonymous said...

bro you got this wrong

arabic does have the largest amount of words
it has 12,302,912

without roots or something like that

if i were to say with the roots of the word it will be

from 90,000,000 to 500,000,000 words

we have like a 1000 name for lion only

Anonymous said...

Languages such as Greek have similar flexibility in sentence construction.
As for the difficulty of translation, that holds true when translating between any 2 languages that are from different language families. Try translating a complex English novel into Arabic, and you will likewise find that Arabic doesn't have the words to express the original idea as eloquently and accurately. But you can derive the same meaning.

Anonymous said...

Prove that it has 12,302,912 words. Where is that list?

flashdrive said...

Referring to the comments on Greek being the oldest continuous language: are we discounting Chinese, which as a written language is over 6,000 years old?