Wednesday, July 25, 2007

Writing codas, from Sylhet to Winnipeg

In Greek-based scripts (like Latin or Cyrillic), unless a consonantal letter is followed by a vowel letter, it is assumed not to be followed by a vowel. This seems natural enough if you're used to it; but if you look at it differently, it's rather wasteful. The commonest sound to follow any given consonant is usually a vowel, not another consonant, so if you allow a single letter to represent a consonant plus a vowel you're saving space and effort.

But if you do that, then how do you represent the fact that a consonant is not followed by a vowel? Different writing systems use different solutions. In alphabets that have stuck more closely to their Canaanite prototype, like Arabic, Hebrew, Syriac, or (traditional) Tifinagh, you normally don't bother: a consonant may be followed by a vowel or may not, and you rely on the reader to figure it out. However, sometimes the reader needs additional cues: maybe the word you're writing is obscure, or two words have the same consonants, or it's very important that the text be read exactly right with no possibility of error. In that case, in Arabic, Hebrew, and Syriac, you mark what follows each consonant with a little sign above or below the letter - one sign for "a", say, another for "i", and another to indicate that nothing follows it. That last sign is necessary as long as writing without vowel marks remains the norm: an unmarked letter would not mean that no vowel follows, only that the reader must deduce from context which vowel, if any, does.
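To make the three-way distinction concrete, here is a minimal Python sketch for Arabic. It assumes Unicode text with the standard combining marks (fatha, damma, kasra, sukun); the tiny letter table and the function name read_marks are invented for illustration and cover only three letters.

```python
# Illustrative sketch only: the letter table and function name are assumptions.
FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
MARKS = {FATHA: "a", DAMMA: "u", KASRA: "i", SUKUN: ""}  # sukun = no vowel
LETTERS = {"\u0643": "k", "\u062A": "t", "\u0628": "b"}  # kaf, ta, ba


def read_marks(text):
    """Return (consonant, vowel) pairs.

    vowel is "a"/"i"/"u" for a marked vowel, "" for an explicit
    "nothing follows" mark, and None when the letter is unmarked,
    i.e. the reader has to work it out from context.
    """
    pairs = []
    for ch in text:
        if ch in LETTERS:
            pairs.append((LETTERS[ch], None))
        elif ch in MARKS and pairs:
            consonant, _ = pairs[-1]
            pairs[-1] = (consonant, MARKS[ch])
    return pairs


# k + fatha, t + sukun, b left unmarked
print(read_marks("\u0643\u064E\u062A\u0652\u0628"))
# [('k', 'a'), ('t', ''), ('b', None)]
```

The point of keeping None separate from the empty string is exactly the one made above: "unmarked" and "marked as vowelless" are different pieces of information.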

Typical Indic scripts, such as Devanagari (the script used for Hindi and Nepali), adopt a rather different solution. A consonant letter on its own is to be read with a default vowel, short a ([ʌ]); a consonant followed by a consonant is written as a single "conjunct" letter, formed in any of several ways, but usually by either putting the second letter underneath the first or taking away a line on the right of the first letter and joining it to the second. On the plus side, this yields much of the compactness of a vowel-optional system without any of the ambiguity, and means that each letter is pronounceable on its own; on the minus side, this means fonts have to include a much larger number of letter forms.
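The default-vowel rule can be sketched in a few lines of Python. This assumes Unicode-encoded Devanagari, where (as far as the encoding goes) a conjunct is not stored as its own letter but requested by placing a virama (U+094D) between two consonants, leaving the font to pick the joined form. The small letter table and the name romanize are illustrative assumptions, and the output is a letter-by-letter reading that ignores things like the dropping of final short a in modern Hindi.

```python
# Illustrative sketch only: a handful of letters, naive letter-by-letter reading.
VIRAMA = "\u094D"  # suppresses the default vowel; joins consonants into a conjunct
CONSONANTS = {
    "\u0915": "k",  # ka
    "\u092E": "m",  # ma
    "\u0932": "l",  # la
    "\u0924": "t",  # ta
}
VOWEL_SIGNS = {"\u093E": "ā", "\u093F": "i", "\u0941": "u"}


def romanize(text):
    """Read each consonant with default 'a' unless a sign or virama follows."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt == VIRAMA:
                i += 2  # no vowel: the next consonant follows directly
                continue
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])
                i += 2
                continue
            out.append("a")  # bare consonant letter: default vowel
        else:
            out.append(ch)
        i += 1
    return "".join(out)


print(romanize("\u0915\u092E\u0932"))        # kamala
print(romanize("\u0915\u094D\u092E"))        # kma
print(romanize("\u0915\u093F\u0924\u093E"))  # kitā
```

Note that nothing here is ambiguous: every consonant either has an explicit vowel sign, an explicit "no vowel" join, or the default.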

Sylheti Nagri is an Indic script formerly (up to the 1950s or so) in use in the district of Sylhet, in eastern Bangladesh. Like Devanagari, it represents consonant-consonant sequences using conjuncts. However, its users were often also familiar with the Arabic script, where letters could be combined into ligatures whether or not they had vowels between them. This may have inspired them to do something rather unusual for an Indic script: develop vowel-consonant conjuncts, such as a+m, a+l, i+n... and consonant-vowel-consonant conjuncts, like pi+r, mo+t... In fact, judging by the examples in the Unicode proposal, it seems that, for at least some historic users, Sylheti did not have a conjunct system at all, just a ligature system.

One very nice solution is that adopted in Canadian Syllabics, the family of writing systems used by a number of Native American tribes in Canada. The name is potentially misleading: I prefer to reserve the term "syllabary" for writing systems like hiragana, where different syllables differ from each other unpredictably. In Canadian Syllabics, for example Cree, the shape of a symbol represents the consonant, while its orientation represents the vowel that follows it, and length or labialisation may be represented by dots. If no vowel follows the consonant, then the base shape is simply written small and superscripted, using the a-orientation, or for labialised consonants the u-orientation.
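The rule just described is regular enough to write down as a tiny function. The Python sketch below models a symbol abstractly as shape plus orientation plus size rather than using real characters; the Glyph class and the write function are hypothetical names of my own, and the dots for length and labialisation before a vowel are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Glyph:
    shape: str          # which consonant the symbol stands for
    orientation: str    # which vowel the rotation stands for ("a", "i", "u", ...)
    small_raised: bool  # True when written small and superscripted (no vowel)


def write(consonant: str, vowel: Optional[str] = None,
          labialised: bool = False) -> Glyph:
    """Pick the symbol for a consonant, with or without a following vowel."""
    if vowel is not None:
        # Full-size symbol; the orientation carries the vowel.
        # (Dots marking length or labialisation are not modelled here.)
        return Glyph(consonant, vowel, small_raised=False)
    # No vowel: small raised symbol in the a-orientation,
    # or the u-orientation for a labialised consonant.
    return Glyph(consonant, "u" if labialised else "a", small_raised=True)


print(write("p", "i"))                # full-size p-shape, i-orientation
print(write("p"))                     # small raised p-shape, a-orientation
print(write("k", labialised=True))    # small raised k-shape, u-orientation
```

Because the consonant and the vowel are separate, predictable parameters, the whole system fits in one short function, which is the sense in which it differs from a syllabary like hiragana.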


Joel said...

Very interesting post. How would you classify Hangul? Rhyming and alliterative syllables show similarities, but shape and position of consonants in the written syllable (assemblage? morph? graph?) differ according to position. And of course, written syllables don't always equal spoken syllables.

jangari said...

This is fascinating!

I've never really paid much attention to the intricacies of writing systems; indeed, I can only read the one. It's certainly interesting to discover (it's obvious really, but it's something that needs to be pointed out nonetheless) that the way we do things in a Latin script isn't necessarily the absolute way.

Tibetan appears to operate similarly to Devanagari; vowel quality is marked with diacritics, the absence of a diacritic signals the default vowel /a/. Codas are marked using a diminutive version of the syllable, placed underneath the syllable it closes.

Lameen Souag said...

Thanks! Hangul and Tibetan are both interesting cases to examine. I hadn't realised that the shape as well as the position differed according to position in Hangul. The Tibetan method is rather sensible for a language with so many codas but rather a major departure from Devanagari; I wonder whether they invented it?

David Marjanović said...

In Hangul, does the shape of the consonant differ more than necessary to squeeze every syllable into a square, as happens with the parts of Chinese characters?

28481k said...

Well, in Hangul, the shape of a consonant letter inside a syllabic block differs as little as possible. Indeed, the changes in the glyphs have to do with calligraphic development aligned with Chinese.

In fact, in Hunmin jeong-eum "The Proper Sounds for the Education of the People", the shapes don't differ at all. (cf this)

Even today, simple cuttings as used by travel buses don't really care about calligraphic beauty and retain all shapes as they are: sometimes they don't even have the same size! Those with a trailing consonant appear longer vertically than those without.