Jabal al-Lughat: Sapir-Whorf hypothesis

Sunday, May 10, 2015

How to remember numerals better

In all the debate around "Whorfian" effects of language on cognition, one relatively well-known case has received oddly little attention among linguists, despite being widely discussed by psychologists and popularised by Malcolm Gladwell: the effect of word length on short-term memory (Baddeley et al. 1975). Basically, all other things being equal, it's easier to remember a sequence of short words than a sequence of long words. This suggests that our short-term memory for words (what psychologists confusingly call phonological memory) has a capacity limited by length - specifically, the amount that can be pronounced in about 2 seconds (Schweickert & Boruff 1987). That should suggest, in particular, that numbers presented orally will be easier to remember in a language with short numerals than in one with long numerals. (Note that this affects, among other things, IQ test results, since IQ tests typically include tests of numeral recall.)
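As a rough illustration of the arithmetic this implies, the sketch below treats digit span as the number of digits that fit into a fixed rehearsal window. The function name, the 2-second default, and the per-digit durations are illustrative assumptions, not measurements from any of the studies cited here.

```python
# Back-of-the-envelope model: span = rehearsal window / time to say one digit.
# The durations used below are made-up illustrative values, not data.

def predicted_span(seconds_per_digit, window_seconds=2.0):
    """Number of digits that can be pronounced within the rehearsal window."""
    if seconds_per_digit <= 0:
        raise ValueError("seconds_per_digit must be positive")
    return window_seconds / seconds_per_digit

if __name__ == "__main__":
    # Hypothetical durations for a "short numeral" and a "long numeral" language.
    for label, duration in [("shorter numerals", 0.25), ("longer numerals", 0.4)]:
        print(label, predicted_span(duration))
```

On this simple model, any difference in how long the digits take to say translates directly into a difference in predicted span; as discussed below, the observed differences are not always that large.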

Psychologists followed up on this by attempting to test this hypothesis with a number of language pairs (for an overview, see Baddeley (1997). Disclaimer: I'm not a psycholinguist, and the following references are certainly not exhaustive). The best-tested and most consistent result concerns Chinese. Mandarin and Cantonese numerals take less time to say than English ones, and a number of psychologists have accordingly confirmed that Chinese speakers can remember longer numerals than English speakers (Stigler, Lee, & Stevenson 1986; Hoosain & Salili 1987), even at 4 years old (Chen & Stevenson 1988), and that this applies even when bilinguals are tested across their two languages (Hoosain 1979). It goes further than that, in fact: Chincotta & Underwood (1997) find that, out of Cantonese, English, Greek, Finnish, Swedish, and Spanish, only Cantonese speakers remember significantly more digits than speakers of the other languages, and that this difference disappears if the subjects are prevented from rehearsing the numbers auditorily by being asked to keep repeating "la-la" while being tested, confirming its linguistic nature. The difference is around 2 digits, with the exact figure depending on the experiment.

Data for other languages is less clear-cut. Welsh numerals take longer to say in isolation than English ones, and Ellis & Hennelly (1980) accordingly found that English-Welsh bilinguals can on average remember longer numerals in English than in Welsh. Naveh-Benjamin & Ayres (1986) simultaneously tested the hypothesis for university students in Israel speaking English, Spanish, Arabic, and Hebrew natively (but excluding the digits "seven" and "zero"). They found that the average number of digits recalled was highest in English (7.21), followed by Hebrew (6.51), then Spanish (6.37), and lowest in Arabic (5.77); the ordering by average number of syllables per digit, or by average time taken to read a digit, was English, Spanish, Hebrew, Arabic. However, the difference in number of digits recalled was smaller than predicted by the time taken to read a digit in each language, suggesting that other factors were also relevant.

A proviso is necessary: some recent work, without disputing the differences observed, has made a strong case that they relate not simply to length (Lovatt, Avons, & Masterson 2000), but crucially to phonological factors (Service 2010; Lethbridge, Hinton & Nimmo 2002). This has been argued for Welsh numerals vs. English ones by Murray & Jones (2002), who find that Welsh digits take longer to say in isolation but actually take less time to say in connected speech than English ones, and that changes of place of articulation at word boundaries negatively affect memory.

The research is curiously selective in terms of languages examined, and many of the experiments don't control for all possible confounding factors, such as diglossia and social status in the case of Welsh or Arabic. Nevertheless, it does at least seem well-established that speaking Chinese gives a short-term digit memory advantage over speaking major European or Semitic languages. So, if for some reason you regularly need to remember long numerals, and your preferred language doesn't happen to be Chinese, how do you compensate for this handicap?

There are two obvious ways to get around this (assuming you care enough about remembering numerals to want to, which depends very much on your tastes and circumstances). One is to remember the number visually (as a sequence of written digits) or even kinesthetically (as a sequence of typing actions), in which case this particular constraint no longer applies (cf. e.g. Olsthoorn, Andringa, & Hulstijn 2012). This only helps, however, if you remember numerals better visually or kinesthetically than auditorily, and my impression is that most people don't.

A probably more helpful alternative is to establish a code that lets you turn long numerals into much shorter words by identifying digits with single letters or single phonemes. This solution has a very long history in Arabic and Hebrew, in which each letter of the alphabet can be used to represent a number: 'a is 1, b is 2, etc. (the first nine letters are units, the next nine are tens, and the rest are hundreds or, for the last letter of the Arabic alphabet, a thousand). Since short vowels are not letters, the resulting word can be given whatever vowels the user sees fit to give it. A common game of later poets using the Arabic script was to encode the date of their poem within the poem as a chronogram; more practically, Moroccan schoolchildren used to memorise the multiplication tables as a series of meaningless words formed by this encoding (Meakin 1905). Chronograms have been formed using Roman numerals, but for memorisation, at least, they are rather ill-adapted to such a system - think how much padding would be required to turn a number like MDCCCLXXXIII into words.
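To make the additive letter-number scheme concrete, here is a minimal sketch. It assumes the eastern (Mashriqi) abjad order; the Maghrebi order traditionally used in Morocco assigns some of the higher values differently. The function names are made up for illustration, and the encoder only covers 1 to 1999.

```python
# Additive letter-number values, assuming the eastern (Mashriqi) abjad order.
ABJAD_ORDER = "ابجدهوزحطيكلمنسعفصقرشتثخذضظغ"

# 1..9, then 10..90, then 100..900, then 1000 for the 28th letter.
VALUES = [unit * 10 ** power for power in range(3) for unit in range(1, 10)] + [1000]
LETTER_VALUE = dict(zip(ABJAD_ORDER, VALUES))
VALUE_LETTER = {value: letter for letter, value in LETTER_VALUE.items()}

def abjad_value(text):
    """Sum the values of the abjad letters in text; other characters count as 0."""
    return sum(LETTER_VALUE.get(ch, 0) for ch in text)

def abjad_letters(number):
    """Encode a number additively: one letter per non-zero decimal place."""
    if not 1 <= number <= 1999:
        raise ValueError("this sketch only handles 1..1999")
    letters = []
    for place in (1000, 100, 10, 1):
        digit = number // place % 10
        if digit:
            letters.append(VALUE_LETTER[digit * place])
    return "".join(letters)
```

With this table, 1883 comes out as four consonants, against the twelve symbols of MDCCCLXXXIII, and `abjad_value` recovers the number from any word that contains exactly those letters, whatever vowels are added. Letter variants outside the basic 28 (such as alif with hamza) are simply skipped by this sketch.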

However, the spread of Hebrew studies in Western Europe following the Renaissance, and the increasing importance of memorising statistics there, encouraged European mnemonists to look for ways of emulating this encoding without having to learn a Semitic language. Doing so at a time when place notation was widely used, they introduced a crucial improvement: each consonant represented a digit in a place notation system, rather than a number in an additive notation system. After various cumulative efforts at improvement, this culminated in the early 19th century with the so-called Major system: 0=s/z, 1=t/d, 2=n, 3=m, 4=r, 5=l, 6=š/ž/č/j, 7=k/g, 8=f/v, 9=p/b, with vowels, semivowels, and laryngeals ignored. To remember 94801 (LACITO's zip code), for example, one would turn it into "professed". This system apparently remains in use among professional mnemonists to this day, despite being virtually unknown to wider society.
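The mapping can be sketched in a few lines. Note the assumptions: the Major system works on sounds, whereas this sketch works on English spelling, so it is only an approximation. The function name `major_digits` is made up for illustration, the digraphs "sh", "ch" and "ph" are assumed spellings of š, č and f, doubled letters are collapsed into one sound, and ambiguous letters such as c, q and x are simply ignored.

```python
# Spelling-based approximation of the Major system. The real system works on
# sounds, so letters with several pronunciations (c, g, x, ...) are handled
# only crudely here: c, q and x are ignored, and g is always treated as 7.
CONSONANT_GROUPS = ["sz", "td", "n", "m", "r", "l", "j", "kg", "fv", "pb"]
DIGIT_FOR = {letter: str(digit)
             for digit, group in enumerate(CONSONANT_GROUPS)
             for letter in group}
DIGRAPHS = {"sh": "6", "ch": "6", "ph": "8"}  # assumed spellings of š, č, f

def major_digits(word):
    """Return the digit string a word stands for under this approximation."""
    word = word.lower()
    digits = []
    previous = ""  # last letter or digraph consumed, to collapse doubled letters
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            digits.append(DIGRAPHS[pair])
            previous = pair
            i += 2
            continue
        letter = word[i]
        if letter in DIGIT_FOR and letter != previous:
            digits.append(DIGIT_FOR[letter])
        previous = letter
        i += 1
    return "".join(digits)

print(major_digits("professed"))  # 94801
```

Going the other way, from digits to a memorable word, is the hard part in practice, since it means searching a word list for entries whose consonants match; the sketch above only checks a candidate.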

Perhaps this is why linguists haven't paid more attention to the word-length effect in the context of the Whorfian debate: it's a clear-cut effect of language on cognition, but not a very profound one, in that it should be fixable by some very simple hacks (or even just by borrowing someone else's numerals). But I'm not aware of any experimental work testing the effect of this particular hack on digit recall...

Saturday, March 14, 2015

Sapir-Whorf is no shortcut

Lately the Sapir-Whorf hypothesis - that the language you speak influences the way you think - has had a bit of a revival; investigators such as Boroditsky or Levinson have finally managed to demonstrate small Whorfian effects on colour perception and sense of direction. Unfortunately, these successes only underscore how difficult it would be to make a convincing case for the version of this idea that perennially fascinates the public: the idea that language determines aspects of our worldview. Well before Sapir or Whorf, Nietzsche summarises it in Beyond Good and Evil:

"The strange family resemblance of all Indian, Greek, and German philosophizing is explained easily enough. Where there is affinity of languages, it cannot fail, owing to the common philosophy of grammar - I mean, owing to the unconscious domination and guidance by similar grammatical functions - that everything is prepared at the outset for a similar development and sequence of philosophical systems; just as the way seems barred against certain other possibilities of world-interpretation. It is highly probable that philosophers within the domain of the Ural-Altaic languages (where the concept of the subject is least developed) look otherwise "into the world", and will be found on paths of thought different from those of the Indo-Germanic peoples and the Muslims [...]" (Walter Kaufmann's translation)

If a community's grammar really does affect its worldview, two centuries of speculation have hardly brought us any nearer to proving it, much less figuring out how. The commonsense converse, that a community's worldview affects its grammar, is rather better supported. But this idea's attraction for intellectuals, I think, is basically technological: it holds out the promise of being able to change the way people think "just" by changing the way they talk, as envisioned for Newspeak and Láadan. Ironically, it's observably true that imposing a new language on a previously monolingual community usually implies major changes in the way they think - that's what happens when you introduce compulsory schooling - but that has less to do with the language than with the institutions diffusing it.

The technological question remains, then: can we redesign some aspects of our language to help us think more effectively?

For grammar, the answer is not obvious. For the lexicon, however, the answer is yes, and we do it all the time. If something seems to need a name, we give it one - "mouse" or "selfie". Sometimes we choose a name that transparently encodes a property of the item that's particularly important to remember - "henbane" or "fool's gold". Ask any taxonomist whether the existence and form of a name matters, or any mathematician whether all notations are equal.

But this isn't actually the shortcut that some science fiction would have us believe. Many readers probably know that "henbane" is some kind of plant, but couldn't identify it if it was sitting in front of them, much less take advantage of knowing the name to prevent some unfortunate fowl's death. Understanding a given domain requires you to have words for the items signified by its technical vocabulary, but the most important part of that is learning to identify and think about the referents. Hundreds of New Age texts attest to the fact that you can use the vocabulary of quantum mechanics without understanding the first thing about it.

This points the way towards a solution, but not a very linguistic one: If you want to make your language better for thinking with, then first learn to perceive and think about the world more clearly yourself, and then share what you learn (and the labels you've given to it) with other interested speakers. Make a point of spotting and labelling relevant differences between things or situations, and involve yourself in a wider range of situations than you're used to. A sign is a link between word and world - between the set of all possible combinations of phonemes, meaningless in themselves, and the set of everything the speaker has some idea how to recognise. Expanding the former is meaningless unless you're expanding the latter.

Sunday, August 25, 2013

Why having "no word for X" can matter

The nice thing about French, from an English speaker's perspective, is that its lexical structure is so much like that of English that you can often translate a sentence without having to think much about what it means. Let's try this sentence, for example:

"Process and Reality presents a system of speculative philosophy which is based on a categorical scheme of investigation designed to explain how concrete aspects of human experience can provide a foundation for our understanding of reality."

Without seriously contemplating whatever it is that the author of this sentence is trying to say, I can render this in French as:

"Procès et Réalité présente un système de philosophie spéculative qui ~~est fondé~~ s'appuie sur un plan catégorique d'investigation ~~destiné~~ qui vise à expliquer comment des aspects concrets de l'expérience humaine peuvent fournir une base pour notre compréhension de la réalité."

~~No doubt there are some issues with this translation – my French has a long way to go.~~ (fixed) But producing it was a relatively easy, almost mechanical task. Translating it into Standard Arabic, I have to think a good deal more about the sense of each word (and also have less confidence in the results, since I don't own a philosophy-focused dictionary), but I can still readily make it nearly word-for-word:

"كتاب السيْر والواقع يقدم نظام فلسفة نظرية مبني على مشروع فحص تصنيفي معمول ليفسر كيف يمكن لبعض الجوانب الملموسة لتجربة الإنسان أن تعطينا أساسا لفهم الواقع.
("kitābu s-sayri wa-l-wāqiʕ yuqaddimu niđ̣āma falsafatin nađ̣ariyyatin mabniyyun ʕalā mašrūʕi faħṣin taṣnīfiyyin li-yufassira kayfa yumkinu li-baʕđ̣i l-jawānibi l-malmūsati li-tajribati l-'insāni 'an taʕṭiyanā 'asāsan li-fahmi l-wāqiʕi.")

Now suppose I want to translate this into Algerian Arabic. What am I going to do about words like "process", "reality", "speculative", "concrete"? Plenty of Algerians have studied such notions, but they've done so in French or in Standard Arabic. What I would normally do in such cases is simply substitute a Standard Arabic word wherever I can't think of one that would count as Algerian Arabic, yielding something like this:

"كتاب السير والواقع يقدّم واحد النظام تاع الفلسفة النظرية اللي مبنية على مشروع تصنيفي تاع الفحص، خدمُه باش يفسّر كيفاش الجوانب الملموسة نتاع تجربة الإنسان تقدر تعطيلنا أساس باش نفّهمو الواقع."
("ktab əs-sayr w-əl-wāqiʕ yqəddəm waħəd ən-niđ̣am taʕ əl-fəlsafa n-nađ̣aṛiyya lli məbniyya ʕla məšṛuʕ təṣnifi taʕ əl-fəḥṣ, xədmu baš yfəssər kifaš əl-jawanib əl-məlmusa ntaʕ təjribt-əl-'insan təqdər təʕṭi-lna 'asas baš nəffəhmu əl-wāqiʕ.")

On the other hand, what a lot of other educated Algerians would do is something more like this, filling in all the gaps from French:

"كتاب بروسي إي رياليتي يقدّم واحد السيستام تاع لا فيلوزوفي تيوريك اللي مبنية على أن پلان كاتيڤوريك دانفيستيڤاسيون خدمُه باش يفسّر كيفاش ليزاسپي كونكري نتاع ليكسبيريانس إيمان يقدرو يعطولنا إين باز باش نفّهمو لا رياليتي."
("ktab pRose e Reạlite yqəddəm waħəd əs-sistam taʕ lạ-filozofi teoRik əlli məbniyya ʕla ãn plõ kạtegoRik d-ãvestigasyõ xədmu baš yfəssər kifaš lizạspe konkRe ntaʕ l-ekspeRyõs üman yəqqədru yəʕṭu-lna ün bạz baš nəffəhmu lạ-Reạlite.")

Neither of these rather macaronic passages would be comprehensible to any monolingual speaker of Algerian Arabic; they're essentially parasitic on the speaker's knowledge of Standard Arabic or French. Granted, probably most Algerian Arabic speakers are not really monolingual; but even then, there is no guarantee that a speaker who understands one version will understand the other. If you really wanted to produce a consensus-friendly Algerian Arabic version that a monolingual speaker would understand, then, basically, you would need to completely rephrase the whole sentence to explain these notions in advance. And before I can do that, I need a clearer notion of what the writer means by things like "concrete aspects of human experience". My job has morphed into something that's not so much translation as total rewriting, and frankly, for a sentence like this I'm not even willing to try it.

Now suppose you're dealing with a language none of whose speakers have ever studied academic philosophy, or for that matter gotten into high school. You can no longer expect to get away with the dodge of code-switching at appropriate moments. How much effort do you think it would take to translate this sentence, compared with the amount of effort it takes to translate it into French? What effect do you think this would have in practice on the cross-cultural transmission of such ideas?

That's one reason why having "no word for X" can matter. The absence of the word – or more precisely, of a fixed expression for it – impedes translation, and hence impedes the transmission of foreign ideas to monolingual speakers. And fixing the problem isn't just a matter of inventing or borrowing a word; to be able to do either, you need to have formulated the corresponding concept, and, in the case of abstract words like these, that presupposes putting a lot of speakers into an originally foreign system of education, with a lot of associated time and expense and all-round hassle.

(Chain of thought prompted by "How would you say that in Derja?".)

Saturday, May 06, 2006

Whorf meets warmongering

Pop Whorfianism (usually in forms that Whorf would have been the first to laugh at) is something I usually associate with a slightly hippy-ish multiculturalism. However, it seems to have a certain appeal to Islamophobes as well.

The thesis they find so appealing is summarized in one James Coffman's question: "Does the Arabic Language Encourage Radical Islam?". Apparently, he did a survey in 1988 in Algiers which confirmed a number of fairly obvious facts - notably, that the younger students that year, who were the first cohort of students whose secondary education had been mainly in Arabic, were more "Islamist" than their predecessors who had gone through a partly or wholly Francophone educational system. From this, he concluded that the Arabic language encouraged "radical Islam" - not, for example, that Arabic-literate students had much easier access to "Islamist" literature (and Islamic literature in general), or that the transition to Arabic had been accompanied by a vast expansion of the school system to cover more conservative rural areas, or that many of the imported Arabic teachers who helped tide Algeria over the transition period were Muslim Brotherhood members fleeing crackdowns in Egypt, or indeed (most importantly) that the collapse of the Algerian economy in the late 1980s was encouraging the growth of anti-government ideologies. It's an old, old saw, but one that apparently still bears repeating: correlation does not equal causation.

Mind you, like most people who cite the Sapir-Whorf hypothesis, he doesn't seem to have a very clear idea of its content. On my reading of Whorf, his core idea is (plausibly enough) that a language might make its speakers more conscious of some grammaticalized categories by forcing its speakers to mark them, or less conscious of them by not providing any simple way to describe them; it would thus render some ideas more intuitive than others. For this sort of deep influence to be plausible, the speaker has to do most of his/her thinking in the language in question. But both classical Arabic and French in Algeria are only ever used by most speakers in writing, or in highly formal contexts - scarcely the sort of situation Whorf had in mind...

(PS: It seems Language Log have also just done another post on "No word for X" fallacies. Another example of ham-handed anti-Arab efforts at Whorfian analysis is alluded to on Linguistic Life.)