Sunday, October 12, 2008

Tifinagh at Leiden

There were two more talks at Leiden that I should have mentioned, on a subject I've always been interested in - Berber writing systems.

Ramada Elghamis is working on a thesis about Tuareg writing systems, and described the purpose of "ligatures" (a more appropriate term would be "conjuncts") in the Tifinagh of the Air region of Niger. Tuareg Tifinagh allows a number of letter pairs (rt, zt, nk...) to be combined into a single letter. It turns out that this is not artistic license, but an essential feature of the script. In traditional Tifinagh, no vowels are written - but if two letters are combined into a ligature, that means that there is no vowel between them, thus resolving a lot of ambiguities. For example (from memory, so details may be wrong), t-m-r-t is read "tamarit", a woman who is loved, whereas t-m-rt is read "tamart", beard; in unvocalised Arabic script, or in traditional Tifinagh minus the ligatures, there would be no way to distinguish the two.

Robert Kerr came up with a nice argument that Libyco-Berber, the pre-Roman script from which Tifinagh is descended, was adapted specifically from the Punic (early Carthaginian) variant of the Phoenician script, not the original Lebanese one and not the later Neo-Punic one. Basically, Old Phoenician marks no vowels at all; Punic marks a few vowels, almost always final ones; and Neo-Punic marks most vowels in all positions. Libyco-Berber (and traditional Tifinagh) also marks vowels only in final position; this rather odd idiosyncrasy is best interpreted as having been adopted from Punic rather than independently innovated.

Friday, October 10, 2008

Berberologie colloquium at Leiden

I've spent the past couple of days at the Berberologie colloquium in Leiden, and it's been great fun. There were plenty of very interesting speakers, but for me two languages stole the show: Tetserrét and Ghomara.

Tetserrét (discussed by Cécile Lux) is spoken by a Tuareg tribe, the Ayt-Tawari, in Niger. But it's not linguistically Tuareg at all - its closest relative is Zenaga, the Berber of Mauritania (not northern Berber, contrary to Wikipedia), and Tuaregs can't even understand it. It seems to be an isolated survival of the Berber language spoken in the region before the Tuareg got there. It's not in Ethnologue either. (Taine-Cheikh's new Zenaga dictionary is out, by the way, and was selling as fast as a book reasonably can in a conference of twenty people.)

But Ghomara, in northern Morocco, is something else. Across Berber, borrowed Arabic nouns typically behave as they do in Arabic (keeping their Arabic plurals, and not changing for case). In Ghomara (discussed by Jamal El Hannouche), Arabic adjectives take Arabic rather than Berber agreement marking - and even some Arabic verbs get conjugated fully in Arabic, not in chance code-switching but regularly by all speakers, up to and including pronominal object suffixes. It's not quite unprecedented worldwide, but that level of contact influence is pretty darn rare.

I didn't put Tadaksahak in the first paragraph because it's much less unfamiliar to me, but Regula Christiansen's paper on that had some interesting implications. Basically, Tadaksahak has all but lost the Songhay method of forming attributive adjectives; instead, it's substituted a simplified version of the Tuareg one (suffixing -an), which has become productive for Songhay adjectives too. The funny part is this: Songhay has a lot of CVC adjectives (stative verbs). Tuareg doesn't really do CVC adjectives; it prefers longer words. So when you add the -an to these, you typically reduplicate the adjective. For example, kan "be sweet" > kankanan "sweet". This comes worryingly close to invalidating a conjecture I had made on the borrowability of templatic morphology (but not quite!)

My own paper established that much of the Berber element of Kwarandzyey derives from an extinct close relative of Zenaga. In effect, the "Western Berber" genetic subgroup of Berber has four members: Zenaga itself (finally with a decent dictionary), Tetserrét (awaiting further publications), the large Berber element of Hassaniya, and part of the proportionally larger Berber element of Kwarandzyey.

Saturday, October 04, 2008

Translating from linguists' English to normal English

Machine translation between languages is hard, obviously. There are all sorts of reasons why just looking words up, constructing syntactic trees, and changing orders appropriately isn't enough to produce good output - mainly, the fact that resolving ambiguities often takes real-world knowledge, and that different vocabularies are not always organised in the same way. How much that matters is really brought home by thinking about a slightly different problem: translation from a technical vocabulary to a non-technical one within the same language.

Take the following sentences, pulled at random from a grammar on my shelf (Stroomer's Grammar of Boraana Oromo):
"Nouns ending in -ni (mostly -aani) have ultimate or penultimate stress in free variation."

"Verbs with the verb extension -ad'd'-, -at- have an AFF.IMPER.sg: -ád'd'i, -ád'd'u and a NEG.IMPER.sg: -atín(n)i, see 10.10." (p. 72)

If you are, say, a foreign worker about to be posted to northern Kenya, or a second-generation emigrant Oromo planning to go back and visit, you may well want to try and learn some Oromo from this book. But the odds are you will not know what either of these English sentences means, and that applies to quite a lot of the book.

How could you translate these sentences into terms a wider audience would understand? If you can assume a certain amount of basic knowledge (traditional parts of speech, consonants and vowels) then that makes things easier:
"Nouns ending in -ni (mostly -aani) get stressed on the last or second-to-last vowel, it doesn't matter which."

"Verbs with -ad'd'-, -at- added at the end have an imperative singular: -ád'd'i, -ád'd'u and a negative imperative singular: -atín(n)i, see 10.10."

Realistically, you can't assume that level of knowledge, certainly not in Britain at any rate (I still can't believe that what little grammar gets taught in schools here only ever seems to get taught in foreign language classes, not in English ones; that no doubt explains part of the country's comparatively low foreign language skills). So what does that leave you with? Something like:
"When you say a word that refers to a person, place, or thing* and ends in -ni (mostly -aani), you put the emphasis at the end or just before the end, it doesn't matter which."

"If you have a word that means doing something* that has -ad'd'-, -at- added at the end, then to order one person to do that you add -ád'd'i, -ád'd'u, and to order them not to do that you add -atín(n)i, see 10.10."

(*Yes, I know that syntactic tests, like whether a word can be the object of a preposition, yield more accurate definitions, but in practice these are a good first approximation, and the former does work even on gerunds: "Killing is a bad thing", so "killing" is a noun, but *"Kill is a bad thing", so "kill" isn't.)

Could this be done algorithmically? A simple substitution table would certainly not be enough. Just try it with any set of definitions you can think of:
"Words referring to a person, place, or thing ending in -ni (mostly -aani) have final or pre-final emphasis such that it doesn't matter which."

"Words that mean doing something with the words that mean doing something extension -ad'd'-, -at- have an agreeing order-giving one-entity: -ád'd'i, -ád'd'u and a denying order-giving one-entity: -atín(n)i, see 10.10." (p. 72)

Not terribly helpful, I think you'll agree... To come up with something a little more helpful (and I'm sure my renditions could be improved on) we had to change the whole structure of the sentence. Even then, at some point it's probably going to be more effective to just teach the person the grammatical notions and let them go forward from there than to keep giving brief explanations of the same notion over and over again.
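For the curious, the substitution-table approach is easy enough to sketch in Python. This is only a rough illustration: the glossary entries and the function name naive_translate are my own assumptions, chosen to reproduce the first clumsy rendition above, not any existing tool.

```python
import re

# Hypothetical glossary (an assumption for illustration):
# technical term -> plain-English gloss.
GLOSSARY = {
    "Nouns": "Words referring to a person, place, or thing",
    "ultimate": "final",
    "penultimate": "pre-final",
    "stress": "emphasis",
    "in free variation": "such that it doesn't matter which",
}


def naive_translate(sentence, glossary=GLOSSARY):
    """Replace each glossary term with its gloss, leaving the syntax untouched."""
    # Try longer terms first, and only match whole words, so that
    # "penultimate" is never treated as containing "ultimate".
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile("|".join(r"\b" + re.escape(t) + r"\b" for t in terms))
    # A single pass: text that has been substituted in is not rescanned.
    return pattern.sub(lambda m: glossary[m.group(0)], sentence)


SOURCE = ("Nouns ending in -ni (mostly -aani) have ultimate or "
          "penultimate stress in free variation.")
print(naive_translate(SOURCE))
```

The words get swapped but the sentence keeps the shape of the original, which is exactly the problem: nothing in a table like this can decide that "have ultimate or penultimate stress" ought to turn into "you put the emphasis at the end or just before the end".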

The problem is certainly not unique to linguistics. Medicine, law, ecology - most fields have technical vocabularies that pose an obstacle to non-specialists, who will often have good reason to be interested in trying to make sense of them. Is there any role for algorithms in this (apart from obvious things like hyperlinking technical terms to dictionary entries)? It's well outside my usual field, but it would be interesting to hear of any attempts.