Saturday, December 23, 2017

Tokenistic Tifinagh #fail

In Oran (Algeria) when I was there a few days ago, political party posters were everywhere, advertising the recent local elections. Oran is nowhere near any major Berber-speaking region (though it has attracted a significant Kabyle Berber minority), and such posters – along with a few telecom ads – were almost the only publicly visible mark of Berber on its linguistic landscape. Their bilingualism is a token gesture towards the government's pious aspiration to make Tamazight (Berber) a national language, emanating from the centre rather than from the regions where it's actually spoken.

Among these, the FLN posters in particular caught my attention. Right under the Arabic name of the party, they included a line in Tifinagh (the Berber “heritage” script) that I couldn’t make head or tail of: ⵔⴵⵏⵜⴷⴻ ⵉⴱⵔⴰⵜⵉⴵⵏ ⴰⵜⵉⴵⵏⴰⵍⴻ. Transcribed, this reads rğntde ibratiğn atiğnale – which makes no sense; it’s not even possible in Berber to have e (schwa) at the end of a word.

It wasn’t until I started looking at my pictures on the flight back that the penny dropped. Just substitute o for ğ, and you get rontde ibration ationale. Restoring the capital and accented letters (neither of which Tifinagh has), you get Front de Libération Nationale. When the order came from on high to add Tamazight to the poster, some supremely indifferent functionary in the local FLN office must have literally downloaded a Tifinagh keyboard, typed in the French name of the party, and stuck it on the poster.
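The decoding step can be replayed mechanically. The sketch below is only an illustration: the key assignments in `KEY_FOR_GLYPH` are inferred from the poster text quoted above (each Tifinagh letter paired with the Latin key that apparently produced it), not taken from any actual keyboard layout, and the names are made up for the example.

```python
# Hypothetical key assignments, inferred only from the poster text above:
# each Tifinagh letter is paired with the Latin key that apparently produced it.
KEY_FOR_GLYPH = {
    "ⵔ": "r", "ⴵ": "o", "ⵏ": "n", "ⵜ": "t", "ⴷ": "d",
    "ⴻ": "e", "ⵉ": "i", "ⴱ": "b", "ⴰ": "a", "ⵍ": "l",
}


def keys_typed(tifinagh: str) -> str:
    """Recover the Latin keystrokes behind a Tifinagh string.

    Characters with no entry in the table (such as spaces) pass through unchanged.
    """
    return "".join(KEY_FOR_GLYPH.get(ch, ch) for ch in tifinagh)


POSTER_LINE = "ⵔⴵⵏⵜⴷⴻ ⵉⴱⵔⴰⵜⵉⴵⵏ ⴰⵜⵉⴵⵏⴰⵍⴻ"
print(keys_typed(POSTER_LINE))
```

Run on the poster line, this gives back the French name minus its capitals and accented letters, which is exactly what is left once the keys with no Tifinagh output are dropped.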

Most likely, this functionary was an Arabic speaker. In fairness, though, plenty of Kabyle speakers would have little idea how to render “National Liberation Front” into Kabyle. The officially acceptable way of doing so relies on neologisms developed by activists and familiar mainly to other activists, despite the gradually expanding efforts of teachers and broadcasters – and such activists are especially unlikely to be members of the FLN, given its general reluctance to promote Tamazight. What everyone actually calls it in practice (in Kabyle and in Arabic alike) is “FLN”.

I didn’t notice any similarly clearcut fails on other parties’ posters – though some didn’t bother with Tamazight at all, and at least one, the PT, opted for Latin characters instead. I did see a similar case on a jewelry shop, though, which prominently advertises ⴰⵔⴳⴻⵏⵜ argent, next to a picture of a recognizably Kabyle earring:

It's striking that both cases are based on French, rather than Arabic - even though the normal Kabyle word for "silver" is actually an Arabic loan, lfeṭṭa from الفضة. For some, apparently, the only really important thing about Tamazight is that it's not Arabic...

Sunday, December 10, 2017

Jerusalem's suppletive gentilic

Jerusalem stands out among Arab cities today not only culturally and religiously, but morphologically as well. In Modern Standard Arabic, the city of Jerusalem is al-Quds القدس, and the gentilic suffix is -ī (properly -iyy), but "Jerusalemite" is Maqdisī مقدسي rather than the expected *Qudsī (though the latter is attested as a personal name). As a general cross-linguistic rule of thumb, morphological irregularities are most likely with older, more basic words. Yet this type of irregularity is rather unusual, even among the region's oldest and most prominent cities: Dimashq (Damascus) yields Dimashqī (Damascene), Baghdād yields Baghdādī, Makkah (Mecca) yields Makkī... How did it arise?

It turns out that, in the early Muslim era, it was formed in a perfectly regular way. In his masterwork, the medieval geographer Al-Maqdisī (d. 991) calls his hometown Bayt al-Maqdis بيت المقدس ("house of holiness"), a title now largely supplanted by al-Quds ("the holy"). It survives to the present in certain religious contexts or as a poetic synonym, not only in Arabic but in Kabyle Berber as well: H. Genevois ("Croyances") notes a traditional popular belief that the souls of the dead gather in Bit Elmeqdes, corresponding exactly to Al-Maqdisī's boast that Jerusalem is "the site of the Day of Judgement, and from it is the Resurrection, and to it is the Gathering" (عرصة القيامة ومنها النشر وإليها الحشر).

A quick search of Alwaraq's heritage library suggests that the shorter name "al-Quds" became popular around the period of the Crusades, when Jerusalem was as much a subject of dispute as now. The earliest attestation I can spot on a cursory search (excluding a work falsely attributed to al-Wāqidī) is a mention by the Andalusi traveller Ibn Jubayr (1185), who notes that "between [Kerak] and al-Quds is a day's march or so, and it is the best location in Palestine" (بينه وبين القدس مسيرة يوم أو اشف قليلاً، وهو سرارة أرض فلسطين). Very likely a longer search would yield slightly older attestations. By the time of the next major Palestinian writer I notice in the collection - Al-Ṣafadī (d. 1363) - al-Quds had clearly become the unmarked term for the town; it recurs constantly in his work.

The name Bayt al-Maqdis was thus replaced in practice by the shorter and catchier name al-Quds a good 800 years ago, yet the corresponding gentilic continues to preserve the older name. Since 1967, the Israeli government has imposed a third name as its official term for the city in Arabic: Ūrshalīm, a transcription of the Syriac name used in Christian liturgical contexts, which provoked "furious ridicule" from residents (Segev 2007:492). Since this usage remains entirely unknown to most Arabic speakers, it is unlikely to have much impact on Arabic usage. Yet the timing of the shift from Bayt al-Maqdis to al-Quds reminds us that political upheaval impacts placenames as well as people's lives.

Monday, December 04, 2017

Tifinagh and place of articulation

The order of the Latin alphabet we use is a matter of historical chance; if it ever made sense, the reasons behind it were lost millennia ago. Many other writing systems, however, have tried to order their letters in a less arbitrary fashion. The most prominent successes for this approach are found in and around India, where scripts are usually ordered by place of articulation - ie, by how far back in the mouth they are pronounced - as in Devanagari: a..., ka kha ga gha ŋa, ca cha ja jha ña, ṭa ṭha ḍa ḍha ṇa... (After a couple of sound changes, this order ultimately also yields that of the Japanese kana: a, ka, sa (< ca), ta na, ha (< pa) ma, ya ra wa n.) In Arabic, the normal order of letters reflects a partial reordering by shape rather than by sound (thus ب ت ث are all grouped together, whereas in the older order they were far apart from one another). However, for technical purposes such as traditional phonetics and Qur'an recitation, one occasionally also finds the place-of-articulation order: indeed, the earliest Arabic dictionary (Kitāb al-`Ayn) used it (ع ح هـ خ غ ق ك ج ش ض ص س ز ط ت د ظ ذ ث ر ل ن ف ب م و ي ا ء).
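To make the idea of a place-of-articulation order concrete, here is a minimal Python sketch that sorts letters by the Kitāb al-`Ayn sequence exactly as listed above (with the tatweel dropped from هـ). The names `AYN_ORDER`, `RANK` and `ayn_sort` are made up for the illustration.

```python
# Letter order as listed in the post for Kitāb al-`Ayn (tatweel dropped from هـ).
AYN_ORDER = "ع ح ه خ غ ق ك ج ش ض ص س ز ط ت د ظ ذ ث ر ل ن ف ب م و ي ا ء".split()

# Position of each letter in that order: back-of-the-mouth sounds get the lowest ranks.
RANK = {letter: position for position, letter in enumerate(AYN_ORDER)}


def ayn_sort(letters):
    """Sort single letters by their position in the place-of-articulation order."""
    return sorted(letters, key=RANK.__getitem__)


# The three look-alike letters grouped together in the usual alphabet
# end up spread out once they are ranked by where they are pronounced.
print(ayn_sort(["ب", "ت", "ث"]))
print([RANK[letter] for letter in ["ب", "ت", "ث"]])
```

The same rank table could be used as a sort key for whole words, letter by letter, which is essentially what an articulation-ordered dictionary does.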

Tifinagh, the traditional script of the Tuareg people of the Sahara, seems not to have any established traditional ordering. However, if you organize its letters by place of articulation, an obvious pattern emerges:

This table represents Tifinagh as used at Imi-n-Taborăq in Mali, as recorded by Elghamis (2011:64-65). (Note that w is a labio-velar sound; for obvious reasons, I've chosen to place it in the velar column rather than the labial one. Also, the letter put in the laryngeal plosive slot actually just indicates the presence of a final vowel, although there are reasons to suspect that it once represented a glottal stop.) There is a lot of regional variation in Tifinagh, but one thing stands out: in every variety, everything on the right side of the thick line - ie, everything velar or further back - is consistently formed exclusively out of dots, except for g - and even that is often composed of a combination of dots and lines. Throughout much of Tuareg, original g tends to be palatalized to [ɟ], and some dialects - like this one - have lost the distinction altogether.

How this distribution emerged is unclear for the moment. It is noteworthy, however, that dot letters did not exist in Tifinagh's ancestor, Libyco-Berber as used in the pre-Roman and early Roman periods (with rare, doubtful exceptions). Two of the dot letters have clear Libyco-Berber origins; ⴾ (k, three dots in a triangle) was originally ⥤ (k, a rightwards open arrow), while : (w) was originally =. Based on these two alone, one might suppose a sort of regular form shift of = to :, in which case the development might simply be coincidental. ⵗ (ɣ) may derive from the rarely attested ÷, whose value (q?) is speculative, while ... (x) is simply a rotation of ɣ. :: (q) had no Libyco-Berber equivalent, and is perhaps historically a visual "ligature" of ɣ and + (t) - the word-final cluster *ɣt becomes qq in Tuareg. The final vowel sign · might derive from classical ☰, which had the same function; alternatively, one might derive it from or the dot occasionally used to separate words, and suppose that classical ☰ actually yielded ⵂ (h), in which case the extra dot needs to be explained.

It's not impossible that Tifinagh users at some stage made a conscious link between back consonants and dots. But even if the distribution is just a coincidence, it should still be useful for anyone seeking to memorise the script.

Sunday, October 29, 2017

Butterfly-collecting: the history of an insult

Chomsky's barb about butterfly-collecting has echoed in the ears of descriptive linguists for decades, and is sometimes blamed for the withering away of field linguistics over the late 20th century. The earliest published version I could track down via Google is:
"You can also collect butterflies and make many observations. If you like butterflies, that’s fine; but such work must not be confounded with research, which is concerned to discover explanatory principles of some depth and fails if it does not do so." (Chomsky 1979:57)
So I was surprised to find a similar statement attributed to the eminent early 20th century physicist Ernest Rutherford, quoted by Dyson (2006:179) as saying "Physics is the only real science; the rest are butterfly-collecting." How did this metaphor make its way into linguistics?

For a start, it appears that Dyson's version is somewhat inexact. The Rutherford quote appears to belong to the oral tradition of physics, rather than deriving from any publication of his; the earliest version that I can find on Google Books is from Baker (1942:96):

"These ideas are crystallized in the statement, attributed to Rutherford, that science consists of physics and stamp- collecting. This is an epigram intended to mean that particular objects are uninteresting : it is the extreme view-point of a general analytical scientist."
The shift from stamps to butterflies came decades later, first attested only in 1974. In fact, the derisive comparison to butterfly collecting seems likely to have seeped into linguistics not from physics but from, of all subjects, anthropology. Edmund Leach (1961:2) makes it the central metaphor of his assault on Radcliffe-Brown:
"Radcliffe-Brown maintained that the objective of social anthropology was the 'comparison of social structures'. [...] Comparison is a matter of butterfly collecting — of classification, of the arrangement of things according to their types and subtypes. The followers of Radcliffe-Brown are anthropological butterfly collectors and their approach to their data has certain consequences."
Anthropologists would reuse the metaphor in debates over the distinction between different types of comparison in linguistics itself, whether endorsing it like Lehman (1964:387) or rebutting the criticism like Sarana (1965:29). From there it seems to have been taken up by Chomskyan linguists as an argument against Bloomfield's "discovery procedures", if I am correctly interpreting the incomplete fragment of Ferber and Lynd (1971) that I can find on Google Books:
"These procedures, which are largely a matter of classification, have been uncharitably called "butterfly-collecting" in the manner of pre-Darwinian biology: they account for a detailed "external" description of each language (what Chomsky [...]"
Geoffrey Leech (1969:4) deploys the same metaphor against rhetoric:
"Connected to this is a second weakness of traditional rhetoric - what I am tempted to call its 'train-spotting' or 'butterfly-collecting' attitude to style. This is the frame of mind in which the identification, classification and labelling of specimens of given stylistic devices becomes an end in itself [...]"
The redeployment of this argument to belittle descriptive work in general, rather than particular approaches, seems to be attributable to David DeCamp (1971:158), criticizing sociolinguistics from a Chomskyan perspective:
"The weakest theory is a 'functional' model, which only relates outputs from the black box to inputs, e. g. a grammar which would generate all and only the sentences of a language; the goal of much scientific research is to replace such a functional model with a 'structural' model, one that makes the stronger claim of describing what is actually in the black box. Mendel's 'genes' were only a functional model of genetics; the research on the DNA and RNA molecules has yielded a model that is much more nearly structural. Thus one branch of biology has at last become a true science; general linguistics is approaching that status; sociolinguistics is still in the pre-theoretical, butterfly-collecting stage, with no theory of its own and uncertain whether it has any place in general linguistic theory."
He then clarifies (ibid:170) that:
"'Butterfly collecting' is simply the collection of a whole lot of information toward the day when somebody can produce a formal theory. Now this is valuable, this is useful. We need a lot of empirical data collection also. I certainly would not want to imply by this that in this I'm saying that there is not an importance to the kinds of things that the Urban Language Survey is doing at CAL, or Bill Labov's work in New York. This is immensely important. What I am saying is that although it is necessary, it is not sufficient. We've got enough data now; it is about time to guide further research by means of some sort of a theory."
So, if we have to blame one person for reducing descriptive linguistics to butterfly collecting, it looks like it would be David DeCamp, at least until someone tracks down an earlier citation. But that misses a broader point: the disparaging comparison of data gathering to butterfly collecting seems to have become rather pervasive across a variety of disciplines in the late 20th century - including biology itself, which may well be part of where DeCamp got it from. All the way back in 1964, Theodosius Dobzhansky - who had been an ardent butterfly collector before becoming a prominent evolutionary biologist - comments sarcastically that:
"The notion has gained some currency that the only worthwhile biology is molecular biology. All else is "bird watching" or "butterfly collecting." Bird watching and butterfly collecting are occupations manifestly unworthy of serious scientists!" (Dobzhansky 1964:443)
Had he lived to see molecular biology turn to such quintessentially descriptive, list-making pursuits as the Human Genome Project, he would surely have enjoyed having the last laugh.

(If you have any earlier citations bearing on the history of this metaphor in linguistics, please tell me below!)

Tuesday, October 24, 2017

Siwi on Wikipedia

I am not a big fan of Wikipedia, despite its usefulness. To contribute good material to it - and there is a lot of wonderful material there - is to make an article look reassuringly reliable. That appearance of reliability then makes the article prime prey for anybody with an ideological or even commercial agenda to push: one little edit, and their propaganda is integrated into the same text, gaining credibility from its context, and getting copied over and over and over. Nevertheless, the insistent niggling itch of knowing that "someone is wrong on the internet" eventually got to me, and last month I ended up massively expanding the article Siwi language - including a fairly extensive section on Siwi oral literature. Suggestions or comments are welcome, although I make no promises.

Thursday, October 12, 2017

Shoes in Songhay and West Chadic: towards an etymology

The proto-Songhay word for "(pair of) shoes, sandals" is *tàgmú (Zarma tà:mú, Kandi tà:mú, Gao taam-i, Hombori tà:mí, Kikara tă:m, Djenne taam, Tadaksahak taɣmú, Korandje tsaɣmmu). It is evidently related to a less widely attested verb *tàgmá "step on" (Zarma tà:mú, Gao taama, Hombori tà:mà, Djenne taam). (Velar stop codas are lost in all of Songhay except the Northern branch, leaving behind either compensatory lengthening or a w; see Souag 2012.)
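The coda rule mentioned in the parenthesis can be tried out on the reconstructed forms. The following is a rough sketch, not a full model of Songhay historical phonology: it only covers the compensatory-lengthening outcome (not the w outcome), treats just g and k as velar stops, and the function name is invented for the example.

```python
import re
import unicodedata

VOWELS = "aeiou"
# A vowel (plus any combining tone marks) followed by g/k that is not
# itself followed by a vowel, i.e. a velar stop closing the syllable.
CODA_VELAR = re.compile(rf"([{VOWELS}][\u0300-\u036f]*)[gk](?![{VOWELS}])")


def lose_velar_coda(form: str) -> str:
    """Drop a syllable-final velar stop and lengthen the preceding vowel (written ':')."""
    decomposed = unicodedata.normalize("NFD", form)   # split tone marks off the vowels
    changed = CODA_VELAR.sub(r"\1:", decomposed)
    return unicodedata.normalize("NFC", changed)      # recompose for display


print(lose_velar_coda("tàgmú"))   # reconstructed "shoe"
print(lose_velar_coda("tàk"))     # bare verb root with a final velar
print(lose_velar_coda("tá:kà:"))  # Hausa verb: k is an onset here, so nothing changes
```

Applied to *tàgmú this yields the long-vowel shape seen in Zarma and Kandi, while the Northern forms with ɣ are untouched because the rule never sees a g or k there.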

In Hausa, the word for "shoe, boot, sandal" is tà:kàlmí: (borrowed directly into the Songhay (Dendi) variety of Djougou as tàkăm). Within Hausa, this likewise corresponds to a verb tá:kà: "step on". The two-way similarity is striking, but if there was borrowing, which way did it go? A cognate set in Schuh (2008) casts some light on the question.

Hausa belongs to the West Chadic family, in which the best comparison to Hausa "shoe" seems to be Bole tàkà(:), with no obvious cognates within its own subgroup, Bole-Tangale. (Ngamo tà:hò looks similar, but Ngamo h seems normally to correspond to Bole p, not k.) For "step on", however, Schuh points to a potential cognate set in a slightly more distantly related West Chadic subgroup, Bade. In this subgroup, we have Gashua Bade tà:gɗú, Western Bade tàgɗú, Ngizim tàkɗú, which Schuh analyses as *tàk- plus an unproductive verbal extension -ɗu supported by Bade-internal evidence, eg tə̀nkùku "press" vs. tə̀nkwàkùɗu "massage". Within Bole-Tangale, one might speculate that Gera tàndə̀- is cognate, but Gera seems to be known only from short wordlists, so that would be difficult to show.

So the comparative evidence provides some support for the idea that Hausa tá:kà: "step on" goes back to proto-West Chadic. If tà:kàlmí: "shoe" could be regularly derived from this verb within Chadic, then the answer would appear clear: Songhay borrowed it from Chadic. However, while Hausa frequently forms deverbal nouns with a suffix -i: (Newman 2000:157), there seems to be no plausible language-internal explanation for the -lm-. In Songhay, on the other hand, a suffix -mi forming nouns from verbs (sometimes -m-ey with a former plural suffix stuck on) is reasonably well-attested: Gao (Heath 1999:97) dey "buy" vs. dey-mi "purchase (n.)", key "weave" vs. key-mi "weaving", Kikara (Heath 2005:97-98) kà:rù "go up" vs. kàr-mɛ̂y "going up", húná "live" vs. hùnà-mɛ̀y "long life". A shift *-mi to *-mu seems natural enough, especially since a few Songhay varieties actually have reflexes of "shoe" with a final -i in any case; so the Songhay form looks kind of like it could be **tàg "step on" plus deverbal -mí. To top it off, deverbal noun-forming suffixes in -r- are widely attested in Songhay, and Zarma attests a combined suffix -àr-mì: zànjì "break" vs. zànjàrmì "shard", bágú "break" vs. bàgàrmì "piece of debris" (Tersis 1981:244). If we treat the Hausa form as a borrowing from Songhay, we can then analyse it as **tàg "step on" plus deverbal -àr-mí. But before we get carried away, we should note that within Songhay there's no motivation for analysing the -mu / -mi in "shoe" as a suffix; the verb and the noun differ (if at all) only in the final vowel.

So what to make of all this? So far, the scenario that suggests itself is something like the following:

  1. Songhay borrows a verb *tàk "step on" from West Chadic (or vice versa?).
  2. Songhay internally forms a deverbal noun *tàk-mí "shoe" (there is no reconstructible contrast between *k and *g in coda position in proto-Songhay), alongside a variant *tàk-àr-mí.
  3. Hausa borrows this as tà:kàlmí:.
  4. Songhay replaces *tàk with a denominal verb formed from "shoe" (which becomes internally unanalysable): *tàgm-á. This step has possible internal motivations: in most of Songhay, final velar stops disappeared leaving behind only compensatory lengthening on the preceding vowel, and the resulting form tà: would have been homophonous with the much commoner verb "receive, take".
  5. Djougou Dendi, a heavily Hausa-influenced, somewhat creolized Songhay variety spoken in Benin, borrows the Hausa form as tàkăm.

Further Chadic comparative data may yet turn out to bear upon this etymology, but one thing seems clear: these two families have been affecting each other for a long time.

Friday, September 15, 2017

Berber and not so Berber words in Tunisian Arabic

Not too long ago I finished reading Lotfi Sayahi's Diglossia and Language Contact: Language Variation and Change in North Africa. The book is a valuable contribution to the study of synchronic language contact between Tunisian Arabic, Standard Arabic, and French in Tunisia, with some coverage of the rest of the region as well. Unfortunately, when it briefly looks at Berber lexical influence on Arabic (pp. 135, 187), reflecting joint work with Zouhir Gabsi, its conclusions are rather over-hasty. Since this book is likely to become a standard point of departure for English speakers studying language contact in North Africa, I think it's worth correcting the record here even at the risk of being pedantic:
  • fakru:n "turtle" and ferzazzu "wasp" really are Berber, though the -u:n suffix in the former was first added in dialectal Arabic (almost all Berber varieties have forms similar to Kabyle ifker/ikfer).
  • garžu:ma "throat" is a very difficult word to etymologize, but may ultimately be Berber (compare Tuareg a-gurzăy), although it does bring to mind Romance forms such as French gorge.
  • karmu:s "fig" is clearly derived from karm-a "fig tree", which is definitely not Berber, and seems to come from a narrowing of the meaning of Classical Arabic كرم karm "orchard" (see the brief discussion in Behnstedt & Woidich 2011:491). The suffix -u:s might theoretically be Berber, I suppose, but probably not; it's not widely attested across Berber, and it fits well with the widespread dialectal Arabic pattern of augmentatives in -u:-.
  • sebsi: "pipe" is from Turkish sipsi.
  • bu-telli:s "monster/nightmare" ("sleep paralysis", to be precise) is a compound involving bu- "possessor of" (originally "father of") plus telli:s (a kind of rug). The latter is well-attested within Arabic in the Middle East as well as in North Africa; its etymology is controversial, but it may derive from Latin trilicium "triple-twilled fabric".
  • ḍabbu:ṭ "axilla" (ie "armpit") is evidently an expressive formation from Arabic إبط 'ibṭ. The widespread Berber word for this is rather taddeɣt (from which we get Maghrebi Arabic dəɣdəɣ "tickle").
  • dagdag "to shatter" is a reduplicated form from Arabic دقّ daqqa "pulverize".

I don't have the time to check the rest of the reduplicated verbs he cites (tartar "to mutter", dardar "to muddy", maxmax "to nibble", maṣmaṣ "to rinse", sɛksɛk "to flow", tɛftɛf "to graze", and wɛdwɛd "to talk nonsense"), but maxmax and maṣmaṣ include phonemes with no regular proto-Berber sources, and I doubt any of them is really Berber in origin.

I don't mean to pick on the authors; notwithstanding this brief lapse, it's a good book, and worth reading. But I do want to hammer home to every linguist the message that etymology needs to be done properly. If you want to do etymology in a North African dialect, don't just assume that any word you don't recognize from Modern Standard Arabic or French is a Berber loanword; check other regional languages (especially Turkish), check existing publications on the subject, check the distribution of the word across different Berber and Arabic varieties. Etymology may not be a very trendy subject, but that doesn't mean it's easy.

Monday, August 28, 2017

Street math and diglossia

In "Mathematics in the streets and in schools" (Carraher et al. 1985), child street vendors were given paper and pencil and asked to calculate multiplications that they had, in fact, already done in their heads in the course of selling their wares. The results were often sobering, as in the following case:
Informal test
Customer: OK, I'll take three coconuts (at the price of Cr$ 40.00 each). How much is that?
Child: (Without gestures, calculates out loud) 40, 80, 120.

Formal test
Child solves the item 40 x 3 and obtains 70. She then explains the procedure 'Lower the zero; 4 and 3 is 7'.

As you can see, the children were perfectly capable of doing (some!) multiplication their own way, but when faced with school-style problems, this ability frequently deserted them. Confronted with a piece of paper, they attempted to apply the algorithm they had learned at school, without so much as checking their answers against the algorithm they had mastered as part of their daily life. In daily life, conversely, they presumably weren't getting much out of the multiplication algorithm they had learnt at school, even though it would let them tackle a much wider range of multiplication problems. School-learning that stays at school, and never affects real life despite having an obvious potential to be useful there: it's an educator's nightmare.
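The contrast between the two procedures in the coconut example can be spelled out in a few lines of Python. This is only a sketch: the second function is one possible reading of the child's explanation ("Lower the zero; 4 and 3 is 7"), and both function names are invented for the illustration.

```python
def street_total(price: int, quantity: int) -> list[int]:
    """Running totals from repeated addition, as in the child's '40, 80, 120'."""
    totals, running = [], 0
    for _ in range(quantity):
        running += price
        totals.append(running)
    return totals


def misapplied_column_method(price: int, multiplier: int) -> int:
    """One reading of 'Lower the zero; 4 and 3 is 7' (an assumption, not the study's analysis).

    The units digit is copied down, and the tens digit is added to the
    multiplier instead of being multiplied by it.
    """
    tens, units = divmod(price, 10)
    return (tens + multiplier) * 10 + units


print(street_total(40, 3))              # the informal, out-loud calculation
print(misapplied_column_method(40, 3))  # the written answer the child obtained
```

The last running total from the street method is the correct product, whereas the misapplied written procedure lands on 70; nothing in the written routine prompts a comparison between the two.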

What this immediately reminded me of is diglossia. In a schoolroom or an essay, you obediently attempt to use Standard Arabic, and all the grammatical rules and vocabulary you learned for it. Almost anywhere else, you carefully avoid it, even while claiming to accept that Standard Arabic is correct and that what you actually make very sure to speak is wrong. To me, that seems to send a fundamentally problematic message: that what you learn in school is not supposed to be useful outside of some limited institutional contexts. I hope that's not the message most people get from it, but it would be great to know for sure. I don't suppose anyone knows of a study addressing the question?

Thursday, August 24, 2017

*-min-: an Algonquian morpheme that went global

American English was born in the clearing of the eastern woodlands, where British settlers encountered native Americans mostly speaking Algonquian languages. The same is true, mutatis mutandis, of Canadian French. If either language can be said to have a native American substratum at all, it's Algonquian. This substratum is hardly conspicuous, manifesting itself almost exclusively in loanwords. If the Algonquian languages had vanished without record, as most of the pre-Indo-European languages of Europe did, could anything at all be said about their morphology on the basis of this influence?

It turns out that there's at least one bound morpheme that shows up in quite a few loanwords: *-min- "berry, fruit". But it manifests itself more clearly in French than in English, where it has been obscured by a number of irregular developments.

Today, French barely survives in the upper Midwest; but before Jefferson's purchase of the Louisiana Territory, France claimed the whole of this vast area, and attempted to back up its ambitions with a handful of missionaries and settlers. There, up among the Illinois near Peoria, French speakers encountered two quite unfamiliar fruits, and adopted their names from the Myaamia-Illinois language:

English missed the chance to borrow a local term for the pawpaw - the English word derives from papaya, a fruit originating much further south - but adopted a reflex of the same word for "persimmon", along with several other terms containing this morpheme. Unfortunately, most are fairly obscure (although no more so than "asimine"), and no two show the same form of the morpheme:
  • persimmon; cf. Virginia Algonquian putchamins (Smith), pushenims (Strachey), apparently reconstructed by Siebert as pessi:min (cf. Skeat 1908; although that looks rather implausible given the Illinois form).
  • hominy (because it's made from corn); cf. Virginia Algonquian ustatahamen (Smith), vshvccohomen (Strachey) and other forms.
  • chinquapin (a kind of chestnut); cf. Virginia Algonquian chechinquamins (Smith), checinqwamins (Strachey).
  • saskatoon (a berry); cf. Cree misâskwatômin ᒥᓵᐢᑲᐧᑑᒥᐣ.
  • pembina (a kind of cranberry); cf. Cree nîpiniminân ᓃᐱᓂᒥᓈᐣ.
The prospects are not that encouraging, but combining the English and French evidence, an alert etymologist just might be able to spot the *-min- morpheme, and hence guess that Algonquian had head-final compounds. Thankfully, in North America, such hyper-speculative substrate chasing is hardly necessary; Algonquian is a fairly well-documented family. In other parts of the world, though, such approaches may occasionally prove effective.

Tuesday, August 22, 2017

What's wrong with the obvious analysis of waš bih واش بيه?

In the Algerian Arabic dialect I grew up speaking, "what's wrong with him?" is waš bi-h? واش بيه. (Further west, in Oran and in Morocco, it's the more classical sounding ma-leh? ما له.) When the object is a pronoun, as it usually is, waš bi-h? can readily be understood as waš "what?" and bi-, the form of "with" (otherwise b) used before pronominal suffixes (in this case, -h "him"). But substitute a noun, and this historically correct interpretation becomes synchronically untenable: we say waš bi jedd-ek? "what's wrong with you (lit. your grandfather)?" واش بي جدّك, whereas "with your grandfather" would be b-jedd-ek بجدّك. Nor can we cleft it with the relative/focus marker lli اللي: *waš lli bi jedd-ek? (*"what is it that's wrong with you?") is totally ungrammatical, while *waš lli b-jedd-ek? does not have the appropriate meaning (in fact, out of context, it makes no sense at all). This tells us that, whatever its origins, waš bi- can no longer be analysed as "what?" plus a preposition "with"; it has to be treated as a morphosyntactic unit in its own right. In particular, this bi- cannot be used to form an adverbial - it only forms a predicate - so it can hardly be treated as a preposition. Nevertheless, it continues to take the prepositional pronominal suffixes: "what's wrong with me?" is waš bi-yya? واش بيَّ, not *waš bi-ni.

The independent unity of waš bi-? becomes a lot clearer when the construction is borrowed into another language, as has happened in the Berber variety of Tamezret in southern Tunisia. The stories recorded there by Hans Stumme shortly before 1900 are a bit hard to read, but provide probably the single most extensive published corpus of material in Tunisian Berber. These texts furnish many examples of aš bi-, although Tamezret Berber has neither aš to mean "what?" (that would be matta) nor bi- to mean "with" (that would be s). Many of these look just like Arabic: aš bi-k "what's wrong with you? (m.)" (p. 14, l. 11); aš bi-kum "what's wrong with you (pl.)?" (p. 27, l. 26), aš bi-h "what's wrong with him?" (p. 14, l. 3); and even, with a noun, aš bi iryazen "what's wrong with men?" (p. 41, l. 5). But the similarity is somewhat deceptive; in some cases, this construction takes Berber rather than Arabic pronominal suffixes, as illustrated by aš bi-ṯ "what's wrong with her?" (p. 25, l. 21) instead of Arabic aš bi-ha, and aš bi-m "what's wrong with you (f.)?" (p. 10, l. 5). Unfortunately, the texts do not provide a complete paradigm - further documentation is needed! But judging by the available data, all cells but the third person masculine singular match well with the Berber paradigm:

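Algerian Arabic | Tamezret  | Tamezret, direct objects | Tamezret, objects of prepositions
----------------|-----------|--------------------------|----------------------------------
š bi-k          | aš bi-k   | -ak                      | -k
š bi-k          | aš bi-m   | -am                      | -m
š bi-kum        | aš bi-kum | -akum / -awem            | -kum
š bi-h          | aš bi-h   | -ṯ                       | -s
š bi-ha         | aš bi-ṯ   | -ṯ                       | -s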

The second person masculine singular and second person plural suffixes are quasi-identical between Tamezret Berber and Arabic, facilitating the borrowing; for the second person, neither language clearly distinguishes direct object forms from objects of prepositions. The third person, however, distinguishes the two in Berber but not in Arabic, and suggests that the object in this construction is treated as a direct object, not as the object of a preposition, contrary to the situation seen for Arabic. This fits Berber-internal patterns; throughout Berber, nonverbal predicators (Aikhenvald's "semi-verbs") typically take the direct object pronominal paradigm, and assign absolutive case to their arguments. The perfect agreement of the most frequently used cells in this paradigm between Arabic and Berber surely facilitated the borrowing of this item, but within Berber the paradigm got rebuilt on a largely Berber basis. In morphology, etymology is not destiny!

Saturday, July 22, 2017

Can slur avoidance be taken too far?

I was rather flabbergasted to read an otherwise good post on Language Log seriously suggesting that racial slurs are so painful they should be coyly asterisked out even in careful lexicographical explanations of why they should not be used. I do not pretend to any expertise on the impact of the specific slur in question there - I'd prefer to hear more black linguists' comments on that - but much of the argument they make is general, not specific:
If you take the standard linguistic analysis of slurs, though, the word’s power does not come from mere taboo [...] The word literally has as part of its semantic content an expression of racial hate, and its history has made that content unavoidably salient. It is that content, and that history, that gives this word (and other slurs) its power over and above other taboo expressions. It is for this reason that the word is literally unutterable for many people, and why we (who are white [...]) avoid it here.

Yes, even here on Language Log. There seems to be an unfortunate attitude — even among those whose views on slurs are otherwise similar to our own — that we as linguists are somehow exceptions to the facts surrounding slurs discussed in this post. In Geoffrey Nunberg’s otherwise commendable post on July 13, for example, he continues to mention the slur (quite abundantly), despite acknowledging the hurt it can cause. We think this is a mistake. We are not special; our community includes members of oppressed groups (though not nearly enough of them), and the rest of us ought to respect and show courtesy to them.

Anglo culture has a long tradition of scrupulously avoiding certain words in order to respect and show courtesy towards, in particular, women and children - people who were thought of as weaker and more emotional than adult men, and in need of their protection. Politeness is great, but if you treat people like they're made of glass, you're not only patronizing them, you're excluding them - you're implying that there are some discussions they just can't handle. (The term "white knight" comes to mind.)

This is ironic in general - people who have made it through serious oppression tend to be pretty tough, though everyone has their vulnerabilities. It's doubly ironic within an academic context, in that a core academic skill is the ability to confront and (if necessary) rebut personally threatening arguments without getting carried away by one's immediate reactions. In order to master North African historical linguistics, I've had to read works by colonial generals and OAS terrorists who fought and killed to subjugate my ancestors, and whose attitudes often colour their work; most people working on marginalized languages will have had similar experiences. If I can deal with that, do you really expect me to be incapacitated by some professor's cautious mention of, say, the word "raghead"? Words certainly can hurt, but slurs have enough power as they stand without adding the power of absolute taboo on top.

Wednesday, June 14, 2017

Sticks and stones and value inversion

In the Western world over the past few years, freedom of speech seems to be becoming a matter not just of human rights but of cultural identity. While many threats to this principle are routinely ignored, some are singled out for a great deal of attention. In particular, legions of columnists stand firm against the efforts of ungrateful foreigners and degenerate youths – suicide bombers and special snowflakes – to undermine our liberal traditions. Such whiners, apparently, have forgotten one of the first proverbs an Anglo child learns:
Sticks and stones may break my bones, but words can never hurt me.
I am not aware of any close equivalent of this saying among the other cultures I know best; in that sense, it can indeed be seen as reflecting a distinctive characteristic of Anglo culture, if not necessarily Western culture. However, this saying is also much more recent than you might expect; its first appearance in print seems to be in mid-19th century America. This timing coincides well with the rise of classical liberalism, and its form seems to be a deliberate inversion of earlier proverbs, reversing the original meaning. Medieval Englishmen used to say precisely the opposite:
Malicious tongues, though they have no bones,
Are sharper than swords, sturdier than stones. (Skelton, Against Venemous Tongues, ed. Dyce, i. 134)
Tongue breaketh bone, all if the tongue himself have none. (Wyclif, Works, ed. Arnold, ii. 44)
Rhyming proverbs to the same effect can be found all over northern Africa, in Algerian Arabic (of Oran):
əḷḷahumma ḍəṛba bdəmmha wala kəlma bsəmmha.
اللهم ضربة بدمها ولا كلمة بسمها.
O God, better a blow drawing blood than a word dripping poison.
or Kabyle Berber:
Ljerḥ yeqqaz iḥellu, yir awal yeqqaz irennu.
A wound digs deep and heals, a bad word digs deep and keeps digging.
or even Zarma (Songhay), down in Niger:
Yaaji me ga daray, amma sanni futo me si daray.
A lance’s edge goes away, but a bad word’s edge doesn’t go away.
Both contrasting sets of proverbs are, of course, gross exaggerations, false if taken literally. Words certainly can hurt, and wounds can certainly hurt worse than words; no one in any culture is likely to deny either fact. What they represent in each case is a cultural consensus – robust, but subject to change – on how seriously to take the hurt that words can cause, and by implication on how sharp a response is justified.

The most compelling by far of the classical liberal arguments for freedom of speech is that it deepens our understanding of the truth. An opinion left unchallenged starts to seem like intuitive common sense; it becomes something people adhere to out of habit rather than out of conviction. Freedom of speech, ironically, is a case in point. Ideally, we are exposed to the arguments for its value at some point, in university if not in high school. But long before that, we’ve already had a weak version of it inculcated by elements of everyday life, like “Sticks and stones...” Such an early exposure makes it seem like universal common sense, like something that should be instinctively obvious to everyone. It’s not; even Englishmen assumed the opposite not too long ago. If you want everyone to believe it, you have to be able to make a good argument for it – and to do that effectively, you need to understand something of where they’re coming from.

How does this compare with cultures you've lived in? Are you familiar with any other proverbs on the relative harmfulness of words and weapons?


Sunday, May 21, 2017

Latin-speaking Muslims in medieval Africa

In the Middle Ages as today, Christians and Jews regularly called God "Allah" when speaking Arabic, just as Muslims did. It is perhaps not as well known that the converse was often also true: from a very early period, North African Muslims called God "Deus" when speaking Latin. This can clearly be seen on the 8th century Umayyad coins of Tunisia and Spain, which include statements such as:
  • Non deus nisi Deus solus - There is no god but God alone (لا إله إلا الله)
  • Deus magnus omnium creator - God is great, the creator of all things (الله أكبر خالق كل شيء)

I had always assumed it more or less stopped there, as Latin-speaking Muslims shifted to Arabic. But in the towns of southern Tunisia, the former Bilad ul-Jarid, Latin was still being spoken well into the 12th century. In his recent book La langue berbère au Maghreb médiéval (p. 313), Mohamed Meouak uncovers a short recorded example of spoken African Latin from between these two periods, which otherwise seems to have escaped notice so far.

The 11th-century Ibadi history of Abu Zakariyya al-Warjlani gives a brief biography of the Rustamid governor Abu Ubayda Abd al-Hamid al-Jannawni (d. 826), who lived in the Nafusa Mountains of northwestern Libya. Before assuming his position, this future governor swore an oath:

Bi-llaahi (by God) in Arabic, and bar diyuu in town-language (بالحضرية), and abiikyush in Berber, I shall entrust the Muslims' affairs only to a person who says: "I am only a weak being, I am only a weak being."
In al-Shammakhi's later retelling, the languages are named as Arabic, Ajami, and Berber (بلغة العرب وبلغة العجم وبلغة البربر). As Mohamed Meouak correctly though hesitantly notes, diyuu must be Deo; he leaves bar uninterpreted, but it is equally clearly Latin per, making the expression an exact translation of Arabic bi-llaahi. The Berber form is probably somewhat miscopied, but seems to include the medieval Berber word for God, Yuc / Yakuc.

The earliest Romance text is the Old French part of the Oaths of Strasbourg, made in 842 and opening Pro Deo amur... "for the love of God". The Ibadi phrase recorded above curiously echoes this, although it predates it by several decades.

Saturday, May 13, 2017


In English, "re-" is a moderately productive derivational prefix - reboot, remake, redo... In French, though, it seems more like an incorporated adverb - it's practically the main way you say "again": remanger (eat again), repleuvoir (rain again), redire (say again) are all perfectly normal. It's even possible to say ravoir (have again), although it seems to be less and less frequent.

Now a number of states are expressed in French with the verb avoir "to have" plus a bare noun: avoir faim "to be hungry", avoir peur "to be afraid", avoir besoin "to need" etc. Given the preceding remarks, you would naturally assume that "need again" should be ravoir besoin - and, indeed, it is possible to find this expression at least in 19th century texts, eg:

Rentré dans le journalisme, cet esprit capable, mais aride et paresseux va ravoir besoin de moi. (1856)

It appears to be very little used in the 20th century, though. Instead we hear avoir rebesoin: j'ai rebesoin de ça, I need this again. The only Italian I asked said this is quite impossible in Italian, but even there ho ribisogno gets a few dozen hits on Google (though for all I know they're all second language speakers).

The fact that besoin appears bare, with no article, already makes it unusual among nouns. The ability to take the prefix re- makes it stand out even more: you certainly can't say *revoiture (car again) or *repain (bread again). So maybe it's not a noun any more? It certainly looks like it's become kind of verby; but what can we label it? In an Australian context, the uninflected element of a complex verb would be called a preverb, but apart from suggesting the wrong order of elements, this term has way too many different meanings depending on which part of the world you're in. Perhaps, as in Japanese, we could call besoin a verbal noun - although that, too, is all too potentially ambiguous. Any better terminological suggestions are welcome.

Wednesday, May 03, 2017

Translating the comedy of diglossia

Even in English, you can sometimes get a laugh by inappropriately mixing high and low registers - gangster slang in blank verse*, or discussions of medieval agriculture in Cockney. In a diglossic language such as Arabic, this trick is both easier and more effective. An excellent example is provided by Message to the Parliamentarians, a recent political satire by Algerian YouTuber Anes Tina. Apart from its primary themes - the offensive meaninglessness of Algerian elections and the hopelessness of abstention - this video is a spectacular send-up of the bombastic period dramas that occupy such a significant role in Arab TV schedules. In such shows, often set in the pre-Islamic period, the characters speak intimidatingly classical Arabic, case endings and all, as a matter of course. (This is, incidentally, somewhat anachronistic: no attempt is ever made to reproduce even the substantial inter-tribal dialectal variation that early Arabic grammarians explicitly tell us about, much less the substandard non-Bedouin varieties they preferred to ignore.) In this video, the characters speak accordingly - but with carefully planted intrusions from the world of everyday speech. Consider the opening scene:
lam yabqaa lanaa 'illaa Hallun waHid.
wamaa huwa lHall?
falnaktub irrisaalah.
wayHak! ma lladhii taf3aluh?
uktub: wilaayatu banuu qaynuqaa3, firraabi3i min shubaaTi l'awwal. risaalatun min ibnu taynah, annaaTiqu rrasmiyy walmukallifu l'i3laamiyy liqabiilati shsha3b, 'ilaa lfaasiq alfaajir almunaafiqi lla3iin addaa3ir alxabiithu ssaaqiTu lmaariq azzindiiq quzaaHah 'amiiru qabiilati lxarlamaaniyyiin. ammaa ba3d. la3natu l'aalihati 3alaykum. la3natu l3uzzaa wa hubal 3alaa Hamlatikumu l'intikhaabiyya. waHaqqi 'aalihati lwaay waay, waHaqqi 'aalihati shshiita, naHnu lan nuHallibakum fil'intikhaabaat. lan nashtarii sila3akum, walan natazawwaja minkum, walla3natu 3alaykum 'ilaa yawmi ddiin.
hal bu3itha lmiisaaJ? hal hum 'on liin?
Sabran ya bna taynah, fa'inna la koneksyoona thaqiila.
tabban littiSaalaati quraysh. faltuxbirnii idhaa xarajati lvüü firrisaalah.
How on earth are we to translate this? The "letter" itself is not so hard - the inflated rhetoric is easy to render into olde English, and the occasional dialectal intrusions (bolded) correspond pretty well to English slang, producing a roughly similar effect. The allusions to pre-Islamic religion and early Islamic history are unlikely to make much sense to most English speakers, but corresponding names with appropriate resonances can be substituted without much damage; thus:

Only one solution remains before us.
What, then, is the solution?
Let us... write the letter.
Perdition! What are you doing?
Write! Province of Idumaea, on the 4th of Zivim. A letter from Taenaus, the official spokesman and media officer of the tribe of The People, to the evildoer [cymbals!], the sinner [!], the accursed hypocrite [!], the debauched [!], the malignant degraded renegade [!], the miscreant Cuzahah, prince of the tribe of the Charlamentarians. May the gods' curses be upon you. May the curses of Ashtoreth and Moloch be upon your electoral campaign. By the gods of canned applause, and the gods of brown-nosing, we shall not suck up to you in the elections. We shall not buy your goods, nor shall we marry from among you. And curses be upon you until the Day of Judgement.
But what can an English speaker possibly do to reproduce the comic effect of the dialogue that follows it?
Has the message been sent? Are they online?
Patience, O Taenaus, for the connection is slow.
Damnation unto Quraysh Telecom. Inform me when the message gets a view.
All the bolded words are from French except "slow"; but it would be a mistake to treat them as switches into French. Each of them is the normal, well-established way to refer to its referent in spoken Algerian Arabic. In daily conversations, the corresponding Standard Arabic synonyms (if known at all) would be used only by an insufferable pedant, or - more likely - as a joke. Conversely, in a school composition - almost the only context where the average Algerian child is expected to actually produce Standard Arabic - such terms would be strictly banned. No dialect of English that I know of has non-standard words for telecommunication technology (if it comes to that, I can't think of one offhand that has its own word for "slow" either.) The problem rears its head again soon after, as the protagonist attempts to buy a mobile phone in the marketplace. Suggestions are welcome, but it looks to me like this is one gag that simply can't be translated into English. Among their many other effects, it appears that sociolinguistic situations limit what kind of jokes you can make!
* I think John Cowan will have the link for this one?

Friday, April 14, 2017

Languages in 2117

Charlie Stross, a Scottish science fiction writer, recently posted some speculations on predictions for 2117 that touch rather heavily on the domain of linguistics. Linguists who like science fiction may want to consider commenting over there; he's got some good ideas, but some elements are clearly off. The basic conclusions are:
[B]y 2117, [t]here's [g]oing to be a decline in the number of languages spoken: the main world languages will be down to English, Mandarin, Spanish, and some dialect of Arabic (Arabic is highly fragmented), plus surviving secondary languages with large bodies of adherents (over a hundred million each: for example German, Russian, Japanese).

We're also going to see the widespread deployment of deep learning driven machine translation and, most importantly, near-real-time interpretation. There'll be less reason for a native speaker of an apex language to learn other tongues [...]

And the apex languages will have changed considerably [...]

I suspect that over the next century (assuming we don't lose our technological infrastructure) current mechanisms for writing will be supplanted by newer ones--e.g. the replacement of discrete mechanical keys on keyboards with multitouch keyboards and then with gestural/swipe interfaces, where each dictionary word is replaced by a directional ideogram swiped across a QWERTY keymap, until eventually the ideogram replaces the alphabetic word or is auto-replaced by a corresponding emoji.

So: gradual obsolescence of some grammatical forms, appearance of entire new writing systems, unforseen changes due to the vagaries of machine translation, assimilation of loan words from other cultures, and the 2117 equivalent of "don't drone me, bro" (new shorthand to describe stuff that has become the new normal).

What am I overlooking?

My immediate thoughts would be:
  • Actually, a lot of languages with less than 100 million speakers each will still be around 100 years from now. Even if the Netherlands decided overnight to stop teaching, broadcasting, or providing government services in Dutch - and it won't, quite the opposite - it would take more than 100 years for the language to die out. If anything, the fragmentation of mass media into social media already makes it easier to maintain small languages, and to the extent that e-learning becomes a thing, it will have similar effects. On the other hand, only a handful of Native North American or Australian Aboriginal languages seem likely to make it as far as 2117: right now most of them are already down to elderly speakers only, and revitalization efforts are not likely to succeed without a really drastic rethinking of the school system. This is because of grossly coercive educational policies inflicted on them decades earlier. Chinese educational policy has become significantly less tolerant of minority languages over the past few years, and if that trend continues, I suspect many currently viable languages of China are likely to be in a similar situation by 2117: not yet extinct, but reduced to the point that they seem doomed. More broadly, what to predict about language survival worldwide 100 years from now depends fundamentally on two factors: how compulsory education changes, and how much of the population ends up in big cities. The former, at least, is more than anything else about political decisions.
  • Adequate machine translation does seem likely - not good enough for contexts where precision counts, but easily sufficient for casual conversation or listening to speeches. I wouldn't expect this to have any really major effects on languages, but it might allow literal translations of new idiomatic expressions to spread faster between languages.
  • Emoji are basically discourse markers: they won't become ideograms, they'll become punctuation. If they really catch on, our descendants may be as puzzled by how we get by with just half a dozen punctuation marks as we are by how people used to read with no punctuation at all.
Finally, a line that's calculated to get a lot of linguists up in arms: "[L]anguages are vanishing, and to the extent that we can only reason about things we have words for, this may be a subtle but far-reaching loss." Obviously we can reason about things we don't have words for, and equally obviously not having words for them makes it more cumbersome to talk about them. But more to the point, even where languages are in rude health, words for certain things are vanishing at a rapid pace in them. Algerian Arabic isn't going anywhere, but the vocabulary it used to have for wild plants, for traditional farming technologies, for family relationships that are only relevant in a three-generation household? I don't even think most people my age know them, much less their grandchildren in 2117. Large written languages with sufficiently developed institutions can maintain such vocabulary precariously at the margins by having specialists use it - botanists, agricultural experts, historians, etc. Most languages can't.

Sunday, April 09, 2017

Code-switching as a teaching method?

I haven't done much language teaching in my life, but as a person who likes learning new languages, I've seen a fair range of different teaching methods applied, from only speaking the target language to saying almost everything in English. But the approach used in Simon Bird's "#LilMoshom" series of Cree-teaching videos was new to me, and very interesting. Take a moment to watch some of them before reading further (especially "Respecting pisiskowak" and "Survival tips on the Rez"):

There are a lot of strong points one could comment on - the CGI, the subtitles, and the humour, for instance - but what particularly draws my attention is the way he combines the two languages. To introduce the words he's teaching, he usually speaks in English - but he doesn't just gloss, much less lecture (contrast, say, the more conventional approach used in this Ojibwe video series). When speaking in English, he throws in Cree discourse particles and sometimes even content words, gives the sentence a distinctly non-mainstream English intonation pattern which I assume reflects Cree, and even pronounces the English with a Cree accent. In different contexts, the maker of these videos speaks English like any other Canadian academic, so this appears to be a deliberate teaching strategy. The beauty of this is that, before the learner can even formulate a full sentence, they're already getting a chance to acquire some aspects of language - discourse structure and intonation - that are super-important for actually making yourself understood, yet play a minor role or get left out entirely in many traditional curriculums and textbooks (not to mention grammars!).

Have you ever encountered such a teaching method? If so, did you find it effective?

Sunday, March 26, 2017

Why it's Siwi, not Tasiwit

In English- and French-language discussions of the languages of Egypt, the Berber language of Siwa Oasis in the Western Desert is more and more often called "Tasiwit". Please, don't do this.

In Moroccan and Algerian Berber, as in the Sahel, language names are feminine, and are formed with the feminine circumfix t-...-t: Taqbaylit, Tarifit, Tamazight... In Siwi, however, languages are masculine, as in Egyptian Arabic. Ordinarily, Siwis simply call their language Siwi. When they want to specify the language as opposed to anything else from the oasis, they call it Jlan n Isiwan, "speech of Siwa/Siwis".

If you're writing in a more westerly Berber language, it's quite appropriate to nativise this term into Tasiwit. But if you do so when writing in a Western language, you're just imposing a Moroccan/Algerian convention on a language whose speakers are even less familiar with it than your readers are. On top of that, the feminine of Siwi in Siwi is Tsiwett, not Tasiwit as it would be further west. So just stick with Siwi, OK?

Wednesday, March 15, 2017

Getting from "Hey you!" to "If only"

A well-known Algerian proverb has it that:
لي عندهٌ مية يقول يا ميتين
li `andu mya yqul ya mitin
who has hundred says oh two.hundred
He who has a hundred says "If only it were two hundred!" (literally: "Oh two hundred!")

The ya here is not a general-purpose interjection. Unlike English "oh", it's normally used as a vocative, followed by the name of the person you're addressing. That's its primary function in Classical Arabic too. But in Classical Arabic, you can't use it on its own to mean "if only..."; in fact, that usage isn't very common in Algerian Arabic either. Yet the same extension of function from vocative to wish-marker is found in Algerian Berber. In an 18th century Kabyle poem recorded by Mouloud Mammeri in his Poèmes kabyles anciens (p. 132), an aspiring poet, Muh At Lemsaawd, begs the better-established Yusef u Qasi to accept him as an apprentice:

Ul-iw fellak d amaalal
A wi k-isâan d ccix is

My heart is sick for you
If only I had you as my teacher (literally: "Oh he who has you as his teacher!")

You can't do this in Classical Arabic, nor in English: a vocative followed by a noun phrase is going to be interpreted as an act of addressing, not of wishing. But in Arabic you do find an otherwise unexpected vocative particle showing up in some wish constructions, notably يا ليت yaa layta "if only". And in (slightly archaic) English you have a very similar construction with an infinitive in "to" or a prepositional phrase in "for", instead of with a noun phrase: "Oh to be young again!", "Oh for a thousand tongues to sing!" That suggests that the connection between vocative and wishing reflects some general feature of human cognition, or at least of a rather large culture area.

The obvious connection would be through requests. One reason to address someone is to ask them to bring you something. It's not such a big step from "Hey kid, get me a glass of water" to "Hey, a glass of water!", with the addressee and the verb erased, and the vocative particle effectively serving as much to mark the wish as to get the addressee's attention. But that doesn't really predict forms like the Kabyle one, where the state wished for takes the form of a relative clause, nor even the old-fashioned English constructions discussed, so I'm not really happy with this explanation. Any ideas? And can you think of any parallels in other languages?

Sunday, February 26, 2017

On Olathe

A few days ago, two unarmed young engineers from India were shot in a bar in Olathe, Kansas by a man yelling "Get out of my country!", as was a heroic bystander who tried to stop the shooter. As this contemptible crime put a normally quiet suburb of Kansas City into the international news, journalists and readers worldwide must have been wondering, as I wondered the first time I heard of it a couple of years ago: "How do you pronounce Olathe, and what sort of a name is that anyway?"

The way the locals pronounce it is /ou'leɪθʌ/, as you can hear early in the Mayor's speech. This is remarkably irregular: I can't think offhand of any other word in the English language in which a final e is pronounced /ʌ/, except occasionally "the". You might expect the etymology to provide an explanation, but it turns out to complicate the story further.

The town of Olathe was founded in 1857 by one John Barton, a doctor from Virginia, who - by his own account - got it into his head that "beautiful" would be a good name for the town he envisaged, and:

... meeting Capt. Joseph Parks, head chief of the Shawnees, he said: 'Captain, what in the Shawnee language would you call two quarters of land, all covered with wild flowers? In English we would say it was beautiful." Parks replied: "We would say it was 'Olathe,' "giving it the Indian pronunciation Olaythe, with an explosive accent on the last syllable. Barton made the same inquiry of the official interpreter, an educated Indian, who made the same reply, adding that for English use it would be best to pronounce it "Olathe," with the accent on the second syllable. So it came to pass that the new town was named "Olathe," the city beautiful. (History of Johnson County, Kansas)

In Shawnee, an Algonquian language, (h)oleθí is indeed documented as meaning "pretty" (Gatschet II:2, II:6, III:5); the root also seems to mean "good", judging from its occurrences (spelled <lafi>) in Alford's Shawnee New Testament translation, eg in Matthew 5:45, 19:16, 20:15. One might assume the Shawnees had their own name for the place, but that is not necessarily true, considering they had gotten there barely a generation earlier. Originally from Ohio, they were induced to sign a treaty to move to Kansas in 1831, onto land originally belonging to the Kaws (Kanzas). A few years after the foundation of Olathe, they were pushed out again, to Oklahoma.

It thus seems pretty clear that the original pronunciation of the town's name was /ou'leɪθi/, corresponding better with the spelling (cp. "synecdoche"). How did that turn into /ou'leɪθʌ/? I think the answer lies in English sociolinguistic variation. In the 19th century, standard English word-final /ʌ/ was often pronounced dialectally as /i/, yielding forms like "Americkee" for America or "Canadee" for Canada. In more recent times this pronunciation seems to show up mainly in caricatures of rural or Appalachian speech. The current pronunciation of Olathe as if it were Olatha can thus best be understood as a hypercorrection by people who didn't want to sound uneducated.

Update: A very helpful article linked by Y below, The Pronunciation of Missouri, reveals that the phenomenon is more systematic in the area than I had realised: it extends not only to placenames like Missouri, but even to words like spaghetti, macaroni, or prairie. This makes hypercorrection seem a less likely explanation. Instead, it looks as though final /ɪ/, which becomes /i/ in standard American English, was instead reduced to schwa in parts of the Midwest, including the area surrounding Kansas City. Andrews' (1994) Shawnee Grammar indicates that Shawnee /i/ was often realised as [ɪ], so this fits together nicely.

Friday, February 24, 2017

The Origin of Mid Vowels in Siwi

How does a language with a relatively small vowel system react to pressure from a language with a larger one?

Most northern Berber varieties have a simple four-vowel system: tense /a/, /i/, /u/, vs. lax schwa (/ə/, written e in the official orthography), the latter being mostly predictable and limited to closed syllables. In the eastern and southern Sahara, however, we tend to find slightly larger vowel systems, and it looks very much as though proto-Berber had a rather asymmetrical six-vowel system, close to modern Tuareg but missing /o/: it had tense /a/, /e/, /i/, /u/ vs. lax /ɐ/, /ə/.

Siwi Berber, in western Egypt, has a more symmetrical six-vowel system: tense /a/, /e/, /i/, /o/, /u/ vs. lax /ə/. All of these vowels occur in inherited vocabulary as well as in Arabic loanwords. It is obvious by inspection that, in almost all contexts, *ɐ merged into /ə/. But the distribution of /e/ shows little connection with that of *e: in fact, most instances of proto-Berber *e correspond to Siwi /i/. And the origin of /o/ is not immediately clear at all. How did this happen?

My latest article - written together with Marijn van Putten - proposes some answers. It turns out that proto-Berber */e/ was retained in Siwi only before word-final /n/. Most instances of /e/ and /o/ are found in Arabic loanwords. Within inherited vocabulary, almost all instances of /e/ - and all instances of /o/ - are phonetically conditioned innovations, arising from at least three distinct regular sound changes and one sporadic one. The net effect of this "conspiracy" of sound changes is to extend phonemes otherwise almost entirely restricted to Arabic loans into inherited Berber vocabulary.

If you want the full story, go read our article: The Origin of Mid Vowels in Siwi (published in Studies in African Linguistics 45:1-2 (2016), pp. 189-208).

Sunday, February 19, 2017

A real-life subjacency problem sentence

There are some kinds of questions and relative clauses that you just can't form without resorting to a resumptive pronoun, even in languages - like English - that otherwise don't allow resumptive pronouns to begin with. Ever since Ross (1967) came up with a typology of "island constraints", syntacticians have hotly debated both which ones these are and how to account for them.

Unfortunately, real-life examples of people trying to say such things are very scarce on the ground. As a result, discussion of this phenomenon tends to be dominated by artificial examples. Much of the literature on subjacency inadvertently demonstrates how unsatisfactory the result can be (as discussed here: 1, 2). Every once in a long while, however, you find a completely spontaneous case of someone running up against such constraints - and here's today's, courtesy of some person on Reddit:

Step zero: find a couple million complete and utter morons, who it's a miracle they can breathe in and out without f***ing it up, to support you.

Normally, a relative clause starting in "who" would have no overt subject within the clause itself apart from "who", as in:

Step zero: find a couple million complete and utter morons, who in all honesty Ø can barely breathe in and out without f***ing it up, to support you.

But that's impossible here: note the ungrammaticality of:

*Step zero: find a couple million complete and utter morons, who it's a miracle Ø can breathe in and out without f***ing it up, to support you.

Instead, you end up having to fill the subject position to which "who" refers with a resumptive pronoun "they".

Thursday, February 09, 2017

Romance languages in 17th century North Africa

In 1609, 117 years after conquering Granada, the Spanish state decreed the expulsion of all "Moriscos" - that is, everyone descended from Muslims forcibly converted to Christianity, numbering in the hundreds of thousands. In the 1720s, a century later, two separate travellers - Jean-André Peyssonel and Francisco Ximenez - found that a number of towns in Tunisia, including Testour, Bizerte, and Tebourba, were Spanish-speaking, inhabited by the descendants of these refugees (as I was surprised to learn from Vincent 2004). According to Peyssonel, for example, "the inhabitants of Tebourba practically all speak Spanish there, a language which they have conserved from father to son"; referring to the same town, Ximenez adds "immediately after their arrival from Spain, they had schools in our language. They were insultingly told they were not real Moors, and the Bey took away their books and their schools; after that, they little by little forgot Spanish and learnt Arabic." All in all, the reports seem compatible with a three-generation pattern of language shift: the people they met still spoke Spanish, but were mostly unlikely to pass it on to their children, as they became more closely integrated into the wider society of their new home.

In 1627, a couple of decades after the expulsion of the Moriscos, a corsair ship from Algiers raided Iceland, capturing a couple of hundred unfortunate villagers, one of whom left a description of his experiences. While the distance travelled in this raid was unusual, the practice itself was less so: the capitals of the Barbary states were full of European slaves captured by state-sponsored pirates, waiting for ransoms that might never come. Likewise, many North Africans were captured and held as slaves in Europe (see eg Wettinger 2002 on Malta): describing Algiers in 1612, Diego de Haedo comments that "there are many Muslims who have been captives in Spain, Italy and France" and hence speak those countries' languages (Vincent 2004:107). To further complicate matters, not all immigration from Europe was involuntary: Haedo adds that "There are also an infinite number of renegades [converts to Islam] from these countries and a large number of Jews who have been there, who speak polished Spanish, French, or Italian. The same holds for all the children of renegades who, having learned their national language from their parents, speak it as well as those born in Spain or in Italy."

In brief, 17th-century North Africa contained plenty of European immigrants - some refugees, some captives, and even some voluntary migrants - learning the language spoken around them while maintaining, for a while, the language they had arrived with. What impact did this have on Maghrebi Arabic and Berber? Unfortunately, it's not easy to date Romance loans into either, but we can safely assume that some of the precolonial loans arrived in this period. A good dialect map, in combination with historical data on where these groups ended up, might help identify such loans more precisely - but that doesn't really exist yet, except to some extent for Morocco (Heath 2002).


Vincent, Bernard. 2004. In Jocelyne Dakhlia ed., Trames de langues. Usages et métissages linguistiques dans l’histoire du Maghreb, Tunis-Paris, IRMC, Maisonneuve & Larose, 2004, 561 p.

Saturday, February 04, 2017

Why the sun really does rise

In response to someone comparing "alternative facts" to science fiction, the eminent science fiction writer Ursula LeGuin recently wrote:
The test of a fact is that it simply is so - it has no "alternative." The sun rises in the east. To pretend the sun can rise in the west is a fiction, to claim that it does so as fact (or "alternative fact") is a lie.
The comments (never read the comments!) include several people trying to be smart by pointing out that, actually, "the truth of the matter is that the sun does not rise, but rather that the Earth turns". This apparent conflict is worth unpacking from a descriptive linguistic perspective.

All fluent speakers of English use phrases like "The sun rises in the east". They also use phrases like "Hot air rises." The commenter quoted previously seems to be applying something like the following reasoning:

  • When something (eg hot air) rises, it moves upwards away from the earth.
  • When the sun "rises", it's not moving upwards away from the earth - rather, the earth is turning relative to it.
  • Therefore, the sun does not actually rise.
A lexicographer will immediately see at least one ironclad way to vitiate such an argument: identify two distinct senses for "rise". Rise1 means "to move upward away from the ground", while rise2 means "for a celestial body's apparent position to come closer to the zenith" (or something along those lines). The sun rises2, but it doesn't rise1.

But not so fast! It's perfectly plausible that someone could believe the earth is stationary and the sun physically moves upwards when it rises. For someone holding that belief (or even just using that mental model without necessarily believing it), "rise" could easily have a single sense, not two different ones. Is there any language-internal evidence that "rise" has two senses?

As it happens, there is: look at antonyms. We say "The sun sets in the West", but "Hot air sinks" (and "Empires fall", but that's another story); you can't say "*Hot air sets". "Set" is the antonym of rise2, but not of rise1. That seems like a pretty good reason to assume that, even for flat-earther speakers of English, the two senses are lexically distinct. So it looks like Ursula LeGuin wins this one, as you might expect.

Wednesday, January 25, 2017

Tigre between ejectives and pharyngealization

There is some debate over the original pronunciation of the "emphatic" consonants (Arabic ط ض ظ ص ق) in Semitic and more generally in Afroasiatic: were they ejective as in Amharic, or pharyngealized/uvular as in Arabic? For a number of reasons, such as the fact that they did not show a voicing contrast in proto-Semitic, the general opinion is that they were glottalized. Yet pharyngealized consonants show up not just in Arabic and neo-Aramaic but even in Berber, which would on the face of it suggest that the feature predates proto-Semitic. Either we have to suppose independent parallel development, or we must assume that Berber ejectives turned into pharyngealized consonants under the influence of Arabic. The latter seems more probable, but only if we can show that it is indeed plausible for a language to make such a change as a result of widespread bilingualism in Arabic.

It turns out that Tigre, the main language of northern Eritrea, offers a concrete example of just that. The inland plateau dialect of the Mansa`, commonly considered as standard, is described by Raz (1983) as having four ejectives k' (usually [ʔ]), t', s', and č̣, and no pharyngealized or uvular consonants. You can hear an example of standard Tigre here, which seems consistent with his description. The coastal Hirgigo dialect spoken around Massawa, however, as heard in these Learn Tigre YouTube videos, shows a rather different situation. ḳ is simply [q] (as in "elbow", "neck", "thigh"), ṭ is [tˤ] (as in "goat"), ṣ is [sˤ] (as in "white", "black", "back"); only for č̣ can you occasionally hear a slightly ejective realization [tʃ] ~ [tʃ'] (as in "fingers" or "fingernails"). The result is a good deal easier for an Arabic speaker to pronounce! This should not be too surprising: the port of Massawa has had extensive contact with Arabic speakers for many centuries. In fact, it's said to be the place where some of the first Muslims, seeking refuge from the persecution they were suffering in Mecca, landed on their way to the Abyssinian court. Such a diversity of emphatic consonant realizations within a single language confirms in turn that it is plausible for the habit of pharyngealizing emphatic consonants to be transferred from a language to its neighbors.

Saturday, January 21, 2017

Semitic languages in two Arabic novels

I've been reading two novels in Arabic lately. Frankenstein in Baghdad, by Ahmad Saadawi, reimagines Baghdad's descent into chaos in the mid-2000s, blending gritty realism with semi-allegorical horror. Samraweet, by Hajji Jaber, is an altogether gentler but still cutting narrative of the Eritrean diaspora, interleaving scenes from the narrator's life in Jeddah with ones from his first visit to Asmara as he gradually realizes the difficulty of being part of either place. Both turned out to share a feature I hadn't been expecting to find: dialogue in other Semitic languages.

In Frankenstein in Baghdad, one of the main characters is an elderly Assyrian woman, Elishawa "Umm Daniel". All her relatives have long since moved abroad, and keep begging her to come live with them where it's safe, so there are few occasions for her to speak anything but Arabic. However, the Assyrians of northern Iraq traditionally speak a variety of Neo-Aramaic, and when she meets her grandson from Melbourne, they have the following fairly elementary conversation (pp. 276-277), which I hope I've transcribed correctly:

"داخي إيوَت؟" (Dāx īwat?) "How are you?"
"سباي إيْوَن باسيما" (Spāy īwan basīmā) "I am fine, thanks."

The author of the book seems to be from southern Iraq, so I found it remarkable that he took the trouble to get some Neo-Aramaic dialogue - especially since the copula is appropriately put in the feminine form both times (in Assyrian Neo-Aramaic, even the 1st person singular copula agrees in gender). Probably he felt it would enhance her symbolic status as a reminder of what Iraq once was. Unfortunately, while Aramaic has been spoken in Iraq for almost three millennia, its prospects there are dim: after all these years of war and frequent persecution, most speakers live in Western cities, and unless they're exceptionally good at remaining a distinct, cohesive immigrant group, their descendants seem more likely to speak English or Swedish than Aramaic.

In Eritrea, unlike Iraq, most people have as their first language a Semitic language other than Arabic: Tigre in the north, Tigrinya in the south. So it was less surprising to find an Asmaran waiter on the first page saying "سنّي ما سيام" (?Senni mā syām), which I assume from context means something like "Good afternoon!" However, the occasional glimpses provided into Eritrean sociolinguistics were more eye-opening. The narrator and most of his friends are from a Tigre-speaking background and know how to speak it, but Tigre per se seems to play little part in their linguistic identity. They grew up not only speaking Arabic in the street, but feeling that Arabic is an Eritrean national language, and resenting the government's treatment of it as less central than Tigrinya. When an Eritrean in Jeddah speaks Tigre with him, the narrator assumes it's because he only arrived recently until he finds out, to his surprise, that this person simply "enjoys speaking it, even in Jeddah" (p. 76). It would be interesting to see how this compares to the attitudes of Tigre speakers living in Eritrea: between the prestige of Arabic and the status of Tigrinya, what are the long-term prospects for Tigre?

Saturday, January 07, 2017

Of words and pens

In Algerian Arabic, this is a stilu ستيلو - a word instantly recognizable as a borrowing from French stylo:

In Standard Arabic, on the other hand, as any Algerian learns in primary school, it's a qalam قَلَمٌ. This, as it happens, may also be a borrowing, though a much older one; compare ancient Greek kálamos κάλαμος "reed, reed-pen", which apparently has an Indo-European etymology. Clearly, either pre-modern Algerians were so sunk in illiteracy as to have forgotten the word for a pen altogether, or they replaced a pre-existing word for pen with a French borrowing - right?

Well, no. In the Middle Ages, there weren't too many fountain pens or biros around. Classical Arabic qalam referred to something more like these:

Any Algerian who went to Qur'anic school up to the 1960s or so will remember this - a simple reed pen anyone can make using nothing more complicated than a sharp knife. (The Algerian version was a bit different than those in the picture, as it happens - usually people would use a quarter-circumference of a large reed, not the whole circumference of a small one.) More than that, they will remember what it's called: qləm قلم. There are probably people in Algeria who still use these, and very likely they still call them that.

But no one calls a modern industrial pen qləm. When industrial pens were introduced, sometime in the 19th century, ordinary Algerians ended up classing them as a new object, quite distinct from the reed pen despite its similar function, and deserving of an unrelated name. The guardians of Standard Arabic, on the other hand, decided to extend the reference of qalam to cover both. It may be no coincidence that French distinguishes calame from stylo, like Algerian Arabic, whereas English, like Standard Arabic, treats both as different types of pen.

Historical linguists regularly use lexical reconstruction to shed light on technological history, an approach called "Wörter und Sachen". This approach has been very fruitful in many cases. But, as this case illustrates, there are some pitfalls to watch out for: whether something counts as the same object or as a new one is a rather culture-bound question, and if investigators impose their own ideas about this on the situation they are investigating, they will get the wrong answer.