Friday, May 22, 2015

Old Arabic in Greek letters, in 3rd/4th century Jordan

An article published this year (Al-Jallad and Al-Manaser 2015) reveals the oldest known fully vocalised Arabic inscription by far - written in Greek letters in northeastern Jordan, probably in the 3rd or 4th century AD. Here it is: New Epigraphica from Jordan I: a pre-Islamic Arabic inscription in Greek letters and a Greek inscription from north-eastern Jordan. The inscription's author describes himself as "al-'Idāmī" - probably to be interpreted as "the Edomite" - a nisba featuring the definite article al-, unique within Semitic to Arabic.

There are a fair number of Arabic names transcribed in Greek at this period in various sources, but this seems to be the only known attempt to write Arabic text in Greek letters until much later. Most contemporary Arabic inscriptions were instead written in the Safaitic script, which does not indicate vowels. A text like this thus enables us to see much more clearly how the Arabic of the nomads of 3rd/4th century Jordan was pronounced. It confirms two crucial points. In Arabic, case is usually indicated only by final vowel choice; in this inscription, accusative case (-a) is clearly marked, but the Classical nominative and genitive (-u, -i) are not transcribed, suggesting that this dialect had dropped final short high vowels and thus developed a case system like that of Geez. Also reminiscent of Geez is the fact that intervocalic semivowels elided in Classical Arabic were unambiguously pronounced - thus 'atawa rather than 'atā for "he came". There may well be more material like this out there in the deserts on the Syrian-Jordanian border; let's hope research on the Syrian side becomes possible again soon...

Incidentally, next week I'll be in Bucharest for AIDA - if you're there, come to my talk on Wednesday!

Sunday, May 10, 2015

How to remember numerals better

In all the debate around "Whorfian" effects of language on cognition, one relatively well-known case has received oddly little attention among linguists, despite being widely discussed by psychologists and popularised by Malcolm Gladwell: the effect of word length on short-term memory (Baddeley et al. 1975). Basically, all other things being equal, it's easier to remember a sequence of short words than a sequence of long words. This suggests that our short-term memory for words (what psychologists confusingly call phonological memory) has a capacity limited by length - specifically, the amount that can be pronounced in about 2 seconds (Schweickert & Boruff 1987). That should suggest, in particular, that numbers presented orally will be easier to remember in a language with short numerals than in one with long numerals. (Note that this affects, among other things, IQ test results, since IQ tests typically include tests of numeral recall.)
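The "about 2 seconds" rule of thumb amounts to a simple division, sketched below. The per-digit durations are made-up placeholders chosen only to show the arithmetic, not measurements from any of the studies cited, and the function name is my own.

```python
# Rule of thumb from the text: phonological memory holds roughly what can be
# pronounced in about 2 seconds (Schweickert & Boruff 1987).
REHEARSAL_WINDOW_S = 2.0


def predicted_span(seconds_per_digit: float) -> float:
    """Number of digits that fit in the rehearsal window at a given speaking rate."""
    if seconds_per_digit <= 0:
        raise ValueError("seconds_per_digit must be positive")
    return REHEARSAL_WINDOW_S / seconds_per_digit


# Hypothetical durations (assumptions, not data): a language whose digits take
# a quarter of a second each versus one whose digits take half a second each.
for label, duration in [("short numerals", 0.25), ("long numerals", 0.5)]:
    print(label, predicted_span(duration))
```

On this crude model, halving the time it takes to say a digit doubles the predicted span; as the studies discussed below suggest, real differences tend to be smaller than such a model predicts.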

Psychologists followed up on this by attempting to test this hypothesis with a number of language pairs (for an overview, see Baddeley (1997). Disclaimer: I'm not a psycholinguist, and the following references are certainly not exhaustive). The best-tested and most consistent result concerns Chinese. Mandarin and Cantonese numerals take less time to say than English ones, and a number of psychologists have accordingly confirmed that Chinese speakers can remember longer numerals than English speakers (Stigler, Lee, & Stevenson (1986), Hoosain & Salili (1987)), even at 4 years old (Chen and Stevenson (1988)), and that this applies even when bilinguals are tested across their two languages (Hoosain 1979). It goes further than that, in fact: Chincotta & Underwood (1997) find that, out of speakers of Cantonese, English, Greek, Finnish, Swedish, and Spanish, only the Cantonese speakers remember significantly more digits than the rest - and that this difference disappears if the subjects are prevented from rehearsing the numbers auditorily by being asked to keep repeating "la-la" while being tested, which points to its linguistic nature. The difference is around 2 digits, with the exact figure depending on the experiment.

Data for other languages is less clear-cut. Welsh numerals take longer to say in isolation than English ones, and Ellis & Hennelly (1986) accordingly found that English-Welsh bilinguals can on average remember longer numerals in English than Welsh. Naveh-Benjamin & Ayres (1986) simultaneously tested the hypothesis for university students in Israel speaking English, Spanish, Arabic, and Hebrew natively (but excluding the digits "seven" and "zero"). They found that the average number of digits recalled was highest in English (7.21), followed by Hebrew (6.51), then Spanish (6.37), and lowest in Arabic (5.77); the ordering by average number of syllables per digit, or by average time taken to read a digit, was English, Spanish, Hebrew, Arabic. However, the difference in number of digits recalled was smaller than predicted by the time taken to read a digit in each language, suggesting that other factors were also relevant.

A proviso is necessary: some recent work, without disputing the differences observed, has made a strong case that they relate not simply to length (Lovatt, Avons, & Masterson 2000), but crucially to phonological factors (Service 2010, Lethbridge, Hinton & Nimmo 2002). This has been argued for Welsh numerals vs. English ones by Murray & Jones (2002), who find that Welsh digits take longer to say in isolation but actually take less time to say in connected speech than English ones, and that changes of place of articulation at word boundaries negatively affect memory.

The research is curiously selective in terms of languages examined, and many of the experiments don't control for all possible confounding factors, such as diglossia and social status in the case of Welsh or Arabic. Nevertheless, it does at least seem well-established that speaking Chinese gives a short-term digit memory advantage over speaking major European or Semitic languages. So, if for some reason you regularly need to remember long numerals, and your preferred language doesn't happen to be Chinese, how do you compensate for this handicap?

There are two obvious ways to get around this (assuming you care enough about remembering numerals to want to, which depends very much on your tastes and circumstances). One is to remember the number visually (as a sequence of written digits) or even kinesthetically (as a sequence of typing actions), in which case this particular constraint no longer applies (cf. e.g. Olsthoorn, Andriga, & Hulstijn 2012). This only helps, however, if you remember numerals better visually or kinesthetically than auditorily, and my impression is that most people don't.

A probably more helpful alternative is to establish a code that lets you turn long numerals into much shorter words by identifying digits with single letters or single phonemes. This solution has a very long history in Arabic and Hebrew, in which each letter of the alphabet can be used to represent a number: 'a is 1, b is 2, etc. (the first 9 letters are units, the next 9 are tens, and the rest are hundreds or, for the last letter in Arabic, a thousand). Since short vowels are not letters, the resulting word can be given whatever vowels the user sees fit to give it. A common game of later poets using the Arabic script was to encode the date of their poem within the poem as a chronogram; more practically, Moroccan schoolchildren used to memorise the multiplication tables as a series of meaningless words formed by this encoding (Meakin 1905). Chronograms have been formed using Roman numerals, but for memorisation, at least, they are rather ill-adapted to such a system - think how much padding would be required to turn a number like MDCCCLXXXIII into words.
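To make the additive letter-number scheme concrete, here is a minimal Python sketch. It assumes the eastern (Mashriqi) letter order; the Maghrebi order, which is presumably the one the Moroccan schoolchildren used, assigns different values to a few letters. The function name is my own, and hamza-bearing alifs, tā' marbūṭa and the like are simply ignored here.

```python
# Letter values in the eastern (Mashriqi) abjad order. Assumption: the Maghrebi
# order differs for a handful of letters and is not modelled here.
ABJAD = {
    "ا": 1, "ب": 2, "ج": 3, "د": 4, "ه": 5, "و": 6, "ز": 7, "ح": 8, "ط": 9,
    "ي": 10, "ك": 20, "ل": 30, "م": 40, "ن": 50, "س": 60, "ع": 70, "ف": 80, "ص": 90,
    "ق": 100, "ر": 200, "ش": 300, "ت": 400, "ث": 500, "خ": 600, "ذ": 700, "ض": 800, "ظ": 900,
    "غ": 1000,
}


def abjad_value(text: str) -> int:
    """Sum the letter values; spaces, vowel marks and unlisted characters count as 0."""
    return sum(ABJAD.get(ch, 0) for ch in text)


print(abjad_value("ابجد"))                    # 1 + 2 + 3 + 4
print(abjad_value("بسم الله الرحمن الرحيم"))  # the basmala, traditionally reckoned as 786
```

Because the scheme is additive, letter order does not affect the total, which is what gives a chronogram's author room to choose words that also make sense.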

However, the spread of Hebrew studies in Western Europe following the Renaissance, and the increasing importance of memorising statistics there, encouraged European mnemonists to look for ways of emulating this encoding without having to learn a Semitic language. Doing so at a time when place notation was widely used, they introduced a crucial improvement: each consonant represented a digit in a place notation system, rather than a number in an additive notation system. After various cumulative efforts at improvement, this culminated in the early 19th century with the so-called Major system: 0=s/z, 1=t/d, 2=n, 3=m, 4=r, 5=l, 6=š/ž/č/j, 7=k/g, 8=f/v, 9=p/b, with vowels, semivowels, and laryngeals ignored. To remember 94801 (LACITO's zip code), for example, one would turn it into "professed". This system apparently remains in use among professional mnemonists to this day, despite being virtually unknown to wider society.
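The digit table above can be sketched as a small decoder in Python. This is a spelling-based approximation of my own, not a standard tool: the real system works on sounds, so the sketch assumes that a doubled consonant letter stands for one sound, writes š/ž/č as "sh"/"zh"/"ch", and ignores any letter it does not know (including ambiguous ones like "c" and "x").

```python
# Digit values as given in the text, keyed by spelling rather than by phoneme.
MAJOR = {
    "s": 0, "z": 0,
    "t": 1, "d": 1,
    "n": 2,
    "m": 3,
    "r": 4,
    "l": 5,
    "sh": 6, "zh": 6, "ch": 6, "j": 6,
    "k": 7, "g": 7,
    "f": 8, "v": 8,
    "p": 9, "b": 9,
}


def major_decode(word: str) -> str:
    """Turn a word into its digit string under the simplified spelling rules above."""
    word = word.lower()
    digits = []
    prev = None  # last consonant unit seen, used to collapse doubled letters
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        unit = pair if pair in MAJOR else word[i]  # prefer digraphs like "sh"
        if unit in MAJOR:
            if unit != prev:
                digits.append(str(MAJOR[unit]))
            prev = unit
        else:
            prev = None  # vowels and unknown letters break a doubled run
        i += len(unit)
    return "".join(digits)


print(major_decode("professed"))  # 94801
```

Going the other way, from a number to a memorable word, is the hard part: it needs a word list to search, which this sketch leaves out.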

Perhaps this is why linguists haven't paid more attention to the word-length effect in the context of the Whorfian debate: it's a clear-cut effect of language on cognition, but not a very profound one, in that it should be fixable by some very simple hacks (or even just by borrowing someone else's numerals). But I'm not aware of any experimental work testing the effect of this particular hack on digit recall...

Monday, May 04, 2015

Foucauld's Tuareg (Tamahaq) dictionary on Wikisource

A reader of this blog, Julian Jarosch, wrote in to announce a collaborative project that will very likely interest some other readers:
The purpose of writing to you is to ‘promote’ a project I started some years ago: digitizing Charles de Foucauld’s Dictionnaire touareg – français on Wikisource. You probably already know that Wikisource, like Wikipedia, is an open and collaborative project. So far, I’m working on this alone. I’ve transcribed 13% of the text, almost all of which is not yet proofread. Wikisource provides quality management tools, so each page is marked and colour-coded for proofreading status.
The digital text has some cross-references as links; more could be added once it is complete. All Berber words and phrases are marked as such in the html code. I’ve appended an ebook version of the digital text, generated automatically from the online version, to demonstrate just one derived usage. Deriving a print edition or an enriched structured XML version should be feasible as well. I also experimented with automatically ‘updating’ Foucauld’s mode of transcription, but this proved to be too complicated, due to the ambiguities in his use of 〈i〉 and 〈ou〉.
I hope you find this useful and solid work. In principle, I’d like to spread word about this project in Berber linguistics; I just hardly get around to it, since I pursue this in my spare time.

For anyone who wants to help with this project, the link is: Livre:Foucauld, Dictionnaire touareg.djvu.

Monday, April 20, 2015

Archaic and innovative Islamic prayer names around the Sahara, finally out

Just a quick alert: my article about Islamic prayer time names that I discussed here almost two years ago (post) is finally out! If your institution has a subscription, you can view it at the following link:
Archaic and innovative Islamic prayer names around the Sahara
Or you could email me to ask for a copy.

Saturday, April 18, 2015

Dreams and tales in Siwa and Ouargla

Valentina Schiattarella, who recently finished her PhD thesis on some aspects of Siwi grammar, has gathered the first serious collection of Siwi folk tales recorded in Siwi (forthcoming from Köppe some time soon). Like most languages, Siwi has opening and closing formulae to mark the beginning and end of a tale. The commonest closing formula in the stories she's recorded seems to be:
ħattuta ħattuta qaṣṣaṛ ʕṃəṛha, akəṃṃus n xer i ənšni, akəṃṃus n šaṛ i ntnən
Hattuta hattuta its span has shortened[Ar], bundle of good for us, bundle of bad for them
The first part of this is in Arabic, and is not too different from what you might hear elsewhere in Egypt: ħattuta ħattuta is a corruption of حدوتة ħadduta, Egyptian Arabic for "story". (For similar formulae in Palestinian tales, such as tūtū tūtū faraɣat il-ħaddūtu, see Sirhan 2014.) The second part is in Berber, and hence presumably has an older history within Siwi; it is precisely paralleled in an opening formula used at Ouargla (Algeria):
Ṛəbbi yəttamən f lxiṛ ụhụ f ššəṛṛ, lxiṛ nn-iw, ššəṛṛ nn-əs, ini yiwi-tən gaɛ
God believes(?) in good not in bad; the good for me, the bad for him, or may He take them both
Basset (1920:107) places this formula in a wider context; throughout the Berber world, opening or closing formulae commonly take the form of "propitiatory formulae or formulae for the expulsion of evil", which he takes to indicate that the act of storytelling must have been viewed as potentially dangerous. Alongside Ouargla, he cites Kabyle examples blessing the group and cursing the jackal, and Shilha ones wishing the teller the meat and the others the tripes. The Siwi formula, however, is far closer to the Ouargla one than to anything else Basset mentions. And whereas the Kabyle formula invokes an animal whose importance in Berber folklore and mythology is obvious, and the Shilha one remains close to everyday life, all the key words of the Ouargli and Siwi formulae are specifically Arabic and religious (Rabbī "my Lord", khayr "good", sharr "bad"). This suggests that, while the idea may be Berber, the formulation itself might be taken from Arabic.

As it happens, the early Islamic period furnishes us with just such a formula in Arabic, in a similar but curiously different context. The still widely used Interpretation of Dreams, attributed to Ibn Sirin, explains in its introduction that a dream interpreter who does not want to reveal his interpretation to his client should instead tell him the following: "May good be for you and bad be for your enemies; may you receive good and avoid bad" (خير لك وشر لأعدائك، خير تؤتاه وشر تتوقاه), or, if the interpretation concerns the interpreter too: "May the good be for us and the bad be for our enemies (etc.)" This expression is also found in an unmistakeably related context in some dubious hadiths reporting Umar ibn al-Khattab as saying "Learn to read the Qur'an in Arabic, and the interpretation of dreams, and say: May good (khayr) be for us and bad (sharr) for our enemy", and: "If one sees a vision and recounts it to one's brother, let him say: May good be for us and bad for our enemy".

The obvious interpretation is that, at some point in the early history of these Saharan oases, the act of telling tales was locally assimilated to the act of recounting dreams, allowing the Arabic formula for the latter to be adopted for the former. It would be interesting to know why this happened; was the idea that a tale, no less than a dream, somehow contained cryptic clues about the future? Or did Saharan Berbers in late antiquity make a habit of recounting dreams to one another on winter evenings, as well as folktales? Unfortunately, we'll probably never know for sure, but it can be interesting to speculate...

Saturday, April 04, 2015

Improving language?

In a natural segue from Ibn Khaldun, I've been reading Ernest Gellner - specifically, Words and Things, his attack on Linguistic Philosophy (that is, on Wittgenstein and his followers at Oxford). As he presents it, Linguistic Philosophy amounted to, essentially, the descriptive study of lexicography and semantics. Since meaning is defined by usage, any statement that would be accepted as true in ordinary language is ipso facto true, and any philosophical argument suggesting otherwise can only be the result of some semantic misunderstanding; a philosopher's only legitimate goal is to figure out how words are used in ordinary language to prevent such misunderstandings. The key weak point of this view, for Gellner, is its underlying assumption that ordinary language is unimprovable:
To "observe how we use words" is to make statements, in ordinary language, about the role, function, effects, and context of expressions. But in doing this, the concepts and presuppositions of that ordinary language are taken for granted and insinuated as the only possible view [...] It is true that certain things may be said in favour of ordinary language. It would not be in use, and it would not have survived were it not wholly without merit. But this argument, as in politics where it is often used to buttress conservatism, proves fairly little. Very silly and undesirable things often survive, and neither society nor language is such a tightly integrated whole as would disastrously suffer from alteration of some one part. (pp. 195-197)
For Gellner, contra Wittgenstein, ordinary language can be improved upon by the very activity of reflecting on it, leaving a positive role for philosophy after all:
[T]here are many language games which become unworkable when properly understood: where self-consciousness not merely does not "leave everything as it is" but simply necessitates change. Many "conceptual systems", in primitive societies and in advanced ones, contain confusions and absurdities which are essential for their functioning. To lay them bare is to make such a framework unworkable. (p. 206)
The notion of improving language (my paraphrase) would need a lot more working out than I see in this book, but presumably means something like "make the concepts and presuppositions underlying language use more internally coherent and in better accord with non-linguistic experience."

Such a standard would not necessarily imply that one language can be superior to another. For one thing, while such concepts and presuppositions certainly play a role in language use, they don't seem to be critical to the definition of a language; you can change them and leave the language sufficiently intact to be mostly understood by speakers who have retained the old ones. A single language has room for many different kinds of language use.

However, it would suggest a potentially interesting alternative to a purely descriptive approach to linguistics. If Linguistic Philosophy was the effort to identify ways in which attention to ordinary native speakers' usage might correct misunderstandings embedded in philosophical thought, would Philosophical Linguistics be the effort to identify ways in which attention to philosophical thought might correct misunderstandings embedded in ordinary native speakers' usage?

Saturday, March 14, 2015

Sapir-Whorf is no shortcut

Lately the Sapir-Whorf hypothesis - that the language you speak influences the way you think - has had a bit of a revival; investigators such as Boroditsky or Levinson have finally managed to demonstrate small Whorfian effects on colour perception and sense of direction. Unfortunately, these successes only underscore how difficult it would be to make a convincing case for the version of this idea that perennially fascinates the public: the idea that language determines aspects of our worldview. Well before Sapir or Whorf, Nietzsche summarises it in Beyond Good and Evil:
"The strange family resemblance of all Indian, Greek, and German philosophizing is explained easily enough. Where there is affinity of languages, it cannot fail, owing to the common philosophy of grammar - I mean, owing to the unconscious domination and guidance by similar grammatical functions - that everything is prepared at the outset for a similar development and sequence of philosophical systems; just as the way seems barred against certain other possibilities of world-interpretation. It is highly probable that philosophers within the domain of the Ural-Altaic languages (where the concept of the subject is least developed) look otherwise "into the world", and will be found on paths of thought different from those of the Indo-Germanic peoples and the Muslims [...]" (Walter Kaufmann's translation)
If a community's grammar really does affect its worldview, two centuries of speculation have hardly brought us any nearer to proving it, much less figuring out how. The commonsense converse, that a community's worldview affects its grammar, is rather better supported. But this idea's attraction for intellectuals, I think, is basically technological: it holds out the promise of being able to change the way people think "just" by changing the way they talk, as envisioned for Newspeak and Láadan. Ironically, it's observably true that imposing a new language on a previously monolingual community usually implies major changes in the way they think - that's what happens when you introduce compulsory schooling - but that has less to do with the language than with the institutions diffusing it.

The technological question remains, then: can we redesign some aspects of our language to help us think more effectively?

For grammar, the answer is not obvious. For the lexicon, however, the answer is yes, and we do it all the time. If something seems to need a name, we give it one - "mouse" or "selfie". Sometimes we choose a name that transparently encodes a property of this item that's particularly important to remember - "henbane" or "fool's gold". Ask any taxonomist whether the existence and form of a name matters, or any mathematician whether all notations are equal.

But this isn't actually the shortcut that some science fiction would have us believe. Many readers probably know that "henbane" is some kind of plant, but couldn't identify it if it was sitting in front of them, much less take advantage of knowing the name to prevent some unfortunate fowl's death. Understanding a given domain requires you to have words for the items signified by its technical vocabulary, but the most important part of that is learning to identify and think about the referents. Hundreds of New Age texts attest to the fact that you can use the vocabulary of quantum mechanics without understanding the first thing about it.

This points the way towards a solution, but not a very linguistic one: If you want to make your language better for thinking with, then first learn to perceive and think about the world more clearly yourself, and then share what you learn (and the labels you've given to it) with other interested speakers. Make a point of spotting and labelling relevant differences between things or situations, and involve yourself in a wider range of situations than you're used to. A sign is a link between word and world - between the set of all possible combinations of phonemes, meaningless in themselves, and the set of everything the speaker has some idea how to recognise. Expanding the former is meaningless unless you're expanding the latter.