Saturday, October 04, 2008

Translating from linguists' English to normal English

Machine translation between languages is hard, obviously. There are all sorts of reasons why just looking words up, constructing syntactic trees, and changing orders appropriately isn't enough to produce good output: mainly, that resolving ambiguities often requires real-world knowledge, and that different vocabularies are not always organised in the same way. How much that matters is really brought home by thinking about a slightly different problem: translation from a technical vocabulary to a non-technical one within the same language.

Take the following sentences, pulled at random from a grammar on my shelf (Stroomer's Grammar of Boraana Oromo):
"Nouns ending in -ni (mostly -aani) have ultimate or penultimate stress in free variation."

"Verbs with the verb extension -ad'd'-, -at- have an AFF.IMPER.sg: -ád'd'i, -ád'd'u and a NEG.IMPER.sg: -atín(n)i, see 10.10." (p. 72)

If you are, say, a foreign worker about to be posted to northern Kenya, or a second-generation emigrant Oromo planning to go back and visit, you may well want to try and learn some Oromo from this book. But the odds are you will not know what either of these English sentences means, and that applies to quite a lot of the book.

How could you translate these sentences into terms a wider audience would understand? If you can assume a certain amount of basic knowledge (traditional parts of speech, consonants and vowels) then that makes things easier:
"Nouns ending in -ni (mostly -aani) get stressed on the last or second-to-last vowel, it doesn't matter which."

"Verbs with -ad'd'-, -at- added at the end have an imperative singular: -ád'd'i, -ád'd'u and a negative imperative singular: -atín(n)i, see 10.10."
Realistically, you can't assume that level of knowledge, certainly not in Britain at any rate. (I still can't believe that what little grammar gets taught in schools here only ever seems to get taught in foreign language classes, not in English ones; that no doubt explains part of the country's comparatively low foreign language skills.) So what does that leave you with? Something like:
"When you say a word that refers to a person, place, or thing* and ends in -ni (mostly -aani), you put the emphasis at the end or just before the end, it doesn't matter which."

"If you have a word that means doing something* that has -ad'd'-, -at- added at the end, then to order one person to do that you add -ád'd'i, -ád'd'u, and to order them not to do that you add -atín(n)i, see 10.10."
(*Yes, I know that syntactic tests like whether they can be the object of a preposition yield more accurate definitions, but in practice these are a good first approximation, and the former does work even on gerunds: "Killing is a bad thing", so "killing" is a noun, but *"Kill is a bad thing", so "kill" isn't.)

Could this be done algorithmically? A simple substitution table would certainly not be enough. Just try it with any set of definitions you can think of:
"Words referring to a person, place, or thing ending in -ni (mostly -aani) have final or pre-final emphasis such that it doesn't matter which."

"Words that mean doing something with the words that mean doing something extension -ad'd'-, -at- have an agreeing order-giving one-entity: -ád'd'i, -ád'd'u and a denying order-giving one-entity: -atín(n)i, see 10.10." (p. 72)
Not terribly helpful, I think you'll agree... To come up with something a little more helpful (and I'm sure my renditions could be improved on) we had to change the whole structure of the sentence. Even then, at some point it's probably going to be more effective to just teach the person the grammatical notions and let them go forward from there than to keep giving brief explanations of the same notion over and over again.
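
For anyone who wants to try the substitution-table experiment, here is a minimal sketch in Python. The glossary entries and the name naive_translate are my own illustrative assumptions, not taken from Stroomer or from any existing tool; the point is only to show that word-for-word replacement leaves the sentence structure untouched.

```python
import re

# Hypothetical glossary: technical term -> plain-English paraphrase.
# These entries are illustrative assumptions, not an established resource.
GLOSSARY = {
    "nouns": "words referring to a person, place, or thing",
    "ultimate": "final",
    "penultimate": "pre-final",
    "stress": "emphasis",
    "in free variation": "such that it doesn't matter which",
}


def naive_translate(sentence, glossary=GLOSSARY):
    """Replace each glossary term with its paraphrase, and do nothing else."""
    # Try longer terms first so multi-word entries are matched as a unit.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )

    def substitute(match):
        original = match.group(0)
        replacement = glossary[original.lower()]
        # Keep sentence-initial capitalisation.
        if original[0].isupper():
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    return pattern.sub(substitute, sentence)


if __name__ == "__main__":
    print(naive_translate(
        "Nouns ending in -ni (mostly -aani) have ultimate or "
        "penultimate stress in free variation."
    ))
```

Run on the first example sentence, this reproduces the clumsy rendering quoted above: every term gets swapped, but nothing reorders the clause, which is exactly what a readable version needs.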

The problem is certainly not unique to linguistics. Medicine, law, ecology - most fields have technical vocabularies that pose an obstacle to non-specialists, who will often have good reason to be interested in trying to make sense of them. Is there any role for algorithms in this (apart from obvious things like hyperlinking technical terms to dictionary entries)? It's well outside my usual field, but it would be interesting to hear of any attempts.

3 comments:

bulbul said...

Interesting translation technique there: you translate "in free variation" as "it doesn't matter which" while I would prefer "and there is no way to tell when which one occurs". Functionally, of course, yours is better, since it's aimed at a learner. And it's shorter, too :)

Could this be done algorithmically?
I've seen something like that attempted here. There's a bunch of texts without translation, a morphemic key (a list of morphemes and possible combinations represented by codes) and a glossary where each entry is followed by allowed morphemic codes. I don't know if it's been tested on non-linguists, but you might want to check out Miltner's article on the theory behind the morphemic key in Archív Orientální 35/1967, pp. 607-612.
I also remember, way way way back, a 'revolutionary' method of teaching Slovak created by a linguist who promised to dispense with the jargon altogether and even claimed her method could be used to teach any language. It was a big hit for a while, then all of those walkman methods came in and that was all she wrote. Let me see if I still have the magazine clippings.

David Marjanović said...

you translate "in free variation" as "it doesn't matter which" while I would prefer "and there is no way to tell when which one occurs". Functionally, of course, yours is better, since it's aimed at a learner. And it's shorter, too :)

And it implies that both ways are correct. Yours implies that in each case one is correct and the other is wrong, and people have to learn each word by heart to avoid mistakes.

Panu said...

"Verbs with the verb extension -ad'd'-, -at- have an AFF.IMPER.sg: -ád'd'i, -ád'd'u and a NEG.IMPER.sg: -atín(n)i, see 10.10."

My proposal.

"Suppose that you have one of these words for doing things, and towards its end there is this -ad'd'-, -at-. Now, if you want to use the word for telling someone - just one person - to do this thing, then it must end in -ád'd'i, -ád'd'u. But when you want to say the opposite, i.e. to tell someone not to do this particular thing, you use the ending -atín(n)i."