Saturday, April 25, 2009

French among Algeria's elite

The key issue in Algerian linguistic politics - substantially overshadowing the question of the role of Berber - is what should be the language of bureaucracy and education: Standard Arabic (the official language, and the primary pre-colonial language of literacy for all Algeria) or French (the colonial language, and hence ironically the language in which most of the few educated Algerians at independence had studied). In practice, the country has settled on the one setup most certain to minimise social mobility: Standard Arabic is the primary language of education and symbolism, and French of bureaucracy and social climbing. On top of that, the language of everyday life is Algerian Arabic or Berber, and from either of these reaching fluency even in Standard Arabic, let alone in the far more distant French, is an uphill struggle.

I recently came across a very illustrative quote from a survey specifically focusing on minor political actors in Algeria - party cadres, journalists, bureaucrats, businessmen, trade unionists, etc:
"To a limited extent, the only space open to [political] actors with little or no knowledge of French were independent unions, independent NGOs, the Arabic press and Islamist parties. This tendency was illustrated by the fact that third-generation elites barely speaking French - only one out of ten interviewees - came from one of these domains. Most other interviewees were either Francophone or bilingual, the latter having difficulties determining which language they considered to be their mother tongue [a footnote suggests she means "primary language"]. The same interviewee often gave different answers depending on whether he filled in this author's questionnaire prior to the interview, or whether he was asked in the course of an interview what language he felt most comfortable speaking and writing. A huge majority of the third-generation interviewees according to their own assessment were better with written French than Standard Arabic. As far as oral skills went, a third of the interviewees said they spoke Standard Arabic as well as or better than French. Over half the interviewees put their oral French skills at the same level as their command of Algerian Arabic or Kabyle Berber dialect, and one out ten claimed to speak French better than anything else." (Isabelle Werenfels, Managing Instability in Algeria, pp. 85-6)
This kind of situation is a recipe for resentment. The government has spent years educating people to be better at Standard Arabic and telling them that it was everyone's duty to use it rather than French; but unfortunately its passion for reform, after creating legions of eager Standard Arabic-using job-seekers, stopped at the gates of the Civil Service. Check out Algerian government websites sometime - many of them don't so much as have Arabic versions (e.g. Energy, Health, CNRC, Finance), and most default to French.

As always, I think language skills should be a barrier only when they're necessary in themselves, not merely as a badge of class membership (and regionalism - people from Algiers or Kabylie are enormously more likely to speak good French than people from, say, the Sahara.) I'd certainly prefer Standard Arabic to French - it's much more like Algerian Arabic than French is, and more a part of Algeria's identity - but in the long run it would be better to create a situation where people could use their own mother tongue for official purposes.

Thursday, April 23, 2009

Healed by the right words

We all know that placebos can be surprisingly effective. But - though it's not exactly surprising - I hadn't realised that there is experimental evidence that simply saying the right thing can have a curative effect.

"Two hundred patients with abnormal symptoms, but no signs of any concrete medical diagnosis, were divided randomly into two groups. The patients in one group were told 'I cannot be certain what is the matter with you', and two weeks later only 39% were better; the other group were given a firm diagnosis, with no messing about, and confidently told they would be better within a few weeks. 64% of that group got better in two weeks." (Bad Science, p. 75, citing Thomas 1987)

I can imagine a lot of factors that could affect the effectiveness of the doctor's words here - mainly anthropological, but some of them would certainly fall within the domain of linguistics. For example, the intonation pattern will affect the patient's perception of the doctor's confidence; does that affect the efficacy? Likewise, the accent and the choice of vocabulary could both affect comprehension and perceived competence, and hence presumably the efficacy. Not really my field, but it could be a line of research with unusually clear-cut potential benefits. The obvious problem with this example is that it involves doctors lying to patients, but if the effect could be reproduced without that it would certainly be worth doing.

Thomas KB. General practice consultations: is there any point in being positive? BMJ (Clin Res ed) (9 May 1987); 294 (6581): 1200-2.

"Political complexity predicts the spread of ethnolinguistic groups"

An interesting paper: Political complexity predicts the spread of ethnolinguistic groups. Two basically unsurprising claims that it's good to have calculations supporting: "pastoralists were found to have larger language areas than agriculturalists" and "languages associated with more politically complex societies cover significantly larger areas than those of less complex societies". They also present arguments that "although regions of high biological and cultural diversity do overlap to a striking degree, it is unlikely that biological diversity has any direct effect on cultural diversity on a global scale." Surprisingly, mountainousness was found to correlate with larger language areas, not smaller ones - seems a little suspicious that, though some mountainous areas are pretty un-diverse. Flaws: well, it relies on Ethnologue data and GMI maps, both of which are often unreliable, and systematically more splittist in some areas than in others; but it's not obvious that that would substantially affect the result. Also, ethnic groups, languages, and political units very often don't match up, and their measure of political complexity is based on data for ethnic groups rather than for languages.

(Via GNXP.)

Thursday, April 16, 2009

A Fulani village in Algeria

Anyone acquainted with West African history will be aware of the remarkable extent of the Fulani diaspora, stretching from their original homeland in Senegal all the way to Sudan. However, I was surprised to read the following note in a history of the Tidikelt region of southern Algeria (around In-Salah):
"Le village actuel de Sahel a été créé en 1779 par Sidi Abd el Malek des Foullanes, venu à Akabli dans l'intention de se joindre à une pèlerinage, dont le départ n'eut pas lieu... Les Foullanes sont des Arabes originaires du Macena (Soudan); il y a encore des Foullanes au Sokoto; Si Hamza, le cadi d'Akabli appartient à cette tribu." (L. Voinot, Le Tidikelt, Oran:Fouque 1909, p. 63)
(The current village of Sahel was created in 1779 by Sidi Abd el Malek of the Fulani, who had come to Akabli with the intention of joining a pilgrimage whose departure never occurred... The Fulani are Arabs originating from Macina (Sudan [modern-day Mali]); there are still Fulani at Sokoto; Si Hamza, the qadi [judge] of Akabli, belongs to this tribe.)

I very much doubt there would be any traces of the language left - even assuming that Sidi Abd el Malek came with a large enough entourage to make a difference - but wouldn't it be interesting to check?

Sunday, April 12, 2009

How many words are there in a language?

In a recent discussion, the question came up of whether a language's vocabulary could be tallied (briefly addressed at Language Log a while back, and at FEL.) I have no firm answer to that (and it's logically independent of whether or not you can estimate the proportion of the vocabulary coming from a given language - that's a sampling problem.) But, notwithstanding the bizarre if occasionally entertaining acrimony of that discussion, it's actually a rather interesting question.

Clearly, any given speaker of a language - and hence any finite set of speakers - can know only a finite number of morphemes, even if you include proper names, nonce borrowings, etc. ("Words" is a different matter - if you choose to define compounds as words, some languages in principle have productive systems defining potentially infinitely many words. The technical vocabulary of chemists in English is one such case, if I recall rightly.) Equally clearly, it's practically impossible to be sure that you've enumerated all the morphemes known by even a single speaker, let alone a whole community; even if you trust (say) the OED to have done that for some subset of English speakers (which you probably shouldn't), you're certainly not likely to find any dictionary that comprehensive for most languages. Does that mean you can't count them?

Not necessarily. You don't always have to enumerate things to estimate how many of them there are, any more than a biologist has to count every single earthworm to come up with an earthworm population estimate. Here's one quick and dirty method off the top of my head (obviously indebted to Mandelbrot's discussion of coastline measurement):
  • Get a nice big corpus representative of the speech community in question. ("Representative" is a difficult problem right there, but let's assume for the sake of argument that it can be done.)
  • Find the lexicon size required to account for the 1st page, then the first 2 pages, then the first 3, and so on.
  • Graph the lexicon size for the first n pages against n.
  • Find a model that fits the observed distribution.
  • See what limit, if any, the lexicon size would approach as n tends to infinity according to this model.
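
The first three steps above can be sketched in a few lines of Python. This is only a rough illustration under loud assumptions: whitespace-separated tokens stand in for morphemes (real morpheme segmentation is much harder), the corpus is measured in tokens rather than pages, and the function name `vocabulary_growth` and the toy corpus are made up for the example.

```python
def vocabulary_growth(tokens, step=1):
    """Return (n, number of distinct types in the first n tokens) pairs.

    Assumption: each token is treated as one lexical item; this is a
    stand-in for proper morpheme segmentation.
    """
    seen = set()
    curve = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve

# Toy corpus, purely illustrative
corpus = "the cat sat on the mat and the dog sat on the cat".split()
curve = vocabulary_growth(corpus, step=3)
print(curve)  # → [(3, 3), (6, 5), (9, 7), (12, 7)]
```

Plotting the second element of each pair against the first gives the growth graph; the flattening at the end of this toy curve is just an artefact of a 13-word corpus.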

A bit of Googling reveals that this rather simplistic idea is not original. On p. 20 of An Introduction to Lexical Statistics, you can see just such a graph. An article behind a pay wall (Fan 2006) has an abstract indicating that for large enough corpora you get a power law.
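
For what it's worth, here is a rough sketch of how one might fit such a power law. It assumes the form V(n) = K·n^β (the form usually called Heaps' law; I haven't checked that this is exactly the parametrisation the paper uses), and fits it by ordinary least squares on the log-log values. The names `fit_power_law`, `K` and `beta` are my own labels, and the data points are synthetic, generated from a known curve just to check that the fit recovers it.

```python
import math

def fit_power_law(points):
    """Least-squares fit of log V = log K + beta * log n.

    points: iterable of (n, V) pairs with n > 0 and V > 0.
    Returns (K, beta).
    """
    xs = [math.log(n) for n, v in points]
    ys = [math.log(v) for n, v in points]
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                       # slope on the log-log plot
    K = math.exp(mean_y - beta * mean_x)   # intercept, back-transformed
    return K, beta

# Synthetic data from V = 10 * n**0.5 (not real corpus measurements)
points = [(n, 10 * n ** 0.5) for n in (100, 1000, 10000, 100000)]
K, beta = fit_power_law(points)
```

On real data the (n, V) pairs would come from a vocabulary growth curve, and a log-log regression like this is only a crude first pass at estimating the exponent.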

But if it's a power law, then (since the power obviously has to be positive) that would predict no limit as n tends to infinity. How can that be, if, for the reasons discussed above, the lexicon of any finite group of speakers must be finite? My first reaction was that that would mean the model must be inapplicable for sufficiently large corpus sizes. But actually, it doesn't imply that necessarily: any finite group of speakers can also only generate a finite corpus. If the lexicon size tends to infinity as the corpus size does, then that just means your model predicts that, if they could talk for infinitely long, your speaker community would eventually make up infinitely many new morphemes - which might in some sense be a true counterfactual, but wouldn't help you estimate what the speakers actually know at any given time. In that case, we're back to the drawing board: you could substitute in a corpus size corresponding to the estimated number of morphemes that all speakers in a given generation would use in their lifetimes, but you're not going to be able to estimate that with much precision.
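
To spell the argument out, suppose the fitted model has the form below, with V(n) the lexicon size for a corpus of size n and constants K > 0 and β > 0 (the symbols are mine, chosen for illustration):

```latex
\[
  V(n) = K\,n^{\beta}, \qquad K > 0,\ \beta > 0
  \quad\Longrightarrow\quad
  \lim_{n \to \infty} V(n) = \infty ,
\]
\[
  \text{whereas for any finite corpus size } N:\qquad V(N) = K\,N^{\beta} < \infty .
\]
```

So the model gives no finite limit, but it still gives a finite lexicon size for any finite lifetime output N; the trouble is only that N itself is hard to pin down.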

The main application for a lexicon size estimate - let's face it - is for language chauvinists to be able to boast about how "ours is bigger than yours". Does this result dash their hopes? Not necessarily! If the vocabulary growth curve for Language A turns out to increase faster with corpus size than the vocabulary growth curve for Language B, then for any large enough comparable pair of samples, the Language A sample will normally have a bigger vocabulary than the Language B one, and speakers of Language A can assuage their insecurities with the knowledge that, in this sense, Language A's vocabulary is larger than Language B's, even if no finite estimate is available for either of them. Of course, the number of morphemes in a language says nothing about its expressive power anyway - a language with a separate morpheme for "not to know", like ancient Egyptian, has a morpheme for which English has no equivalent morpheme, but that doesn't let it express anything English can't - but that's a separate issue.
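
To make "increases faster" concrete, here is one way it could work out, assuming (purely for illustration) that both growth curves follow power laws with constants K_A, K_B > 0 and exponents β_A > β_B > 0; none of these symbols come from the sources cited above.

```latex
\[
  V_A(n) = K_A\,n^{\beta_A}, \qquad V_B(n) = K_B\,n^{\beta_B}, \qquad \beta_A > \beta_B > 0
\]
\[
  \frac{V_A(n)}{V_B(n)} = \frac{K_A}{K_B}\,n^{\beta_A - \beta_B} \;\longrightarrow\; \infty
  \quad \text{as } n \to \infty ,
\]
\[
  \text{so}\quad V_A(n) > V_B(n) \quad \text{whenever} \quad
  n > \left(\frac{K_B}{K_A}\right)^{1/(\beta_A - \beta_B)} .
\]
```

Under these assumptions the larger exponent wins for every sufficiently large sample, whatever the constants are, even though neither curve has a finite limit.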

OK, that's enough musing for tonight. Over to you, if you like this sort of thing.

Houhou yentakheb rouhou

(Warning: this post contains no significant linguistic content.)

The results are in: Bouteflika has been “re-elected” as President of Algeria with a staggering 90.24% of votes cast. According to Government figures, 74.54% of eligible voters voted (although oddly enough, the polling booths looked deserted in all the main towns.) He had already served two terms, which had been the limit, so, to let himself run for re-election, he had had the constitution changed shortly beforehand. I would start mocking the guy, but why bother? With figures like that, he's making a fool of himself with no help from me. Time was when he was willing to settle for figures that naive observers might be capable of taking seriously; as he turns senile either his intelligence or his capacity for shame must be declining. The best measure of the glory of his achievements is the 50% of Algerian youths who intend to try to leave the country.

In case you were wondering how this result was achieved, here's my best somewhat informed guess: In the countryside, especially in areas like the Sahara where tribalism is still present, the local patriarchs simply tell everyone to vote en masse for the President, on the basis that he will stay in power no matter what they do and a conspicuous display of loyalty will earn them government investment (although even that wouldn't be enough to produce things like the 97% turnout in Tissemsilt without further fraud.) In the cities or the larger towns of the north, practically nobody bothers to vote apart from people on government payrolls, so they simply exaggerate the participation figures. In Kabylie, uniquely, we have a largely rural, somewhat tribal region fed up enough with the government that even the villages have organised themselves to refuse it legitimacy, so conspicuously that even government figures acknowledge a much lower turnout. If we assume that the government figures are broadly accurate regarding relative turnout (though certainly not absolute), then the situation shows up in the negative slope on this plot of population against turnout (participation); the two 30% wilayas are Tizi-Ouzou and Bejaia, the main Kabyle regions.

Another post on this worth looking at: Victory over the People.

Wednesday, April 08, 2009

When goals create blind spots

You're watching a ball game attentively. A person in a gorilla suit walks right through the middle, remaining visible for 5 seconds. Can you imagine not noticing the gorilla guy? Well, it turns out that nearly half of all undergraduate volunteers don't notice him, if they're busy trying to count passes - and the authors of that study cite 7 other experiments confirming the same principle.

It strikes me that there's a lesson there for linguists. Often linguists study a language for a specific theoretical goal - looking at Malagasy primarily to see what VOS syntax is like, or Oneida primarily to learn how polysynthesis works, or Songhay primarily to see whether it's related to Nilo-Saharan or not. That's fair enough; no one can focus on everything at once. But we can miss some really interesting stuff by focusing on one aspect of the language to the exclusion of others. For example, when Laoust studied Siwi, he was interested almost exclusively in its Berber origins - and as a result, his generally excellent study somehow ignored the vowels e and o (which are found even in Berber words, but are not phonemic in the Moroccan Berber varieties he was more familiar with), and mistakenly attributed the Arabic elements of Siwi to the adjacent Bedouin dialects, when in fact they show some very distinctive non-Bedouin characteristics. This is something we all need to watch out for.

Saturday, April 04, 2009

Flora of the Central Sahara and elsewhere

Ever found yourself trying to sort out a plant name you've elicited, not knowing any botany worth mentioning? Well, it turns out the botanists are a step ahead of the linguists on the digital libraries game, at least in Spain: the Digital Library del Real Jardín Botánico CSIC has a pretty remarkable array of books to browse online. The one that just saved my etymology of the Kwarandzyey plant name tsifəṛfəẓ is Études sur la flore et la végétation du Sahara central. Vol. III: Hoggar, which gives both Tamasheq and binomial names for each plant mentioned. Unfortunately it's clear that not all the works give translations of the names, but it's still worth a look.

On a similar note, I've found Sahara-Nature handy sometimes.