Sunday, February 26, 2017

On Olathe

A few days ago, two unarmed young engineers from India were shot in a bar in Olathe, Kansas by a man yelling "Get out of my country!", as was a heroic bystander who tried to stop the shooter. As this contemptible crime put a normally quiet suburb of Kansas City into the international news, journalists and readers worldwide must have been wondering, as I wondered the first time I heard of it a couple of years ago: "How do you pronounce Olathe, and what sort of a name is that anyway?"

The way the locals pronounce it is /ou'leɪθʌ/, as you can hear early in the Mayor's speech. This is remarkably irregular: I can't think offhand of any other word in the English language in which a final e is pronounced /ʌ/, except occasionally "the". You might expect the etymology to provide an explanation, but it turns out to complicate the story further.

The town of Olathe was founded in 1857 by one John Barton, a doctor from Virginia, who - by his own account - got it into his head that "beautiful" would be a good name for the town he envisaged, and:

... meeting Capt. Joseph Parks, head chief of the Shawnees, he said: "Captain, what in the Shawnee language would you call two quarters of land, all covered with wild flowers? In English we would say it was beautiful." Parks replied: "We would say it was 'Olathe,'" giving it the Indian pronunciation Olaythe, with an explosive accent on the last syllable. Barton made the same inquiry of the official interpreter, an educated Indian, who made the same reply, adding that for English use it would be best to pronounce it "Olathe," with the accent on the second syllable. So it came to pass that the new town was named "Olathe," the city beautiful. (History of Johnson County, Kansas)

In Shawnee, an Algonquian language, (h)oleθí is indeed documented as meaning "pretty" (Gatschet II:2, II:6, III:5); the root also seems to mean "good", judging from its occurrences (spelled <lafi>) in Alford's Shawnee New Testament translation, eg in Matthew 5:45, 19:16, 20:15. One might assume the Shawnees had their own name for the place, but that is not necessarily true, considering they had gotten there barely a generation earlier. Originally from Ohio, they were induced to sign a treaty to move to Kansas in 1831, onto land originally belonging to the Kaws (Kanzas). A few years after the foundation of Olathe, they were pushed out again, to Oklahoma.

It thus seems pretty clear that the original pronunciation of the town's name was /ou'leɪθi/, corresponding better with the spelling (cp. "synecdoche"). How did that turn into /ou'leɪθʌ/? I think the answer lies in English sociolinguistic variation. In the 19th century, standard English word-final /ʌ/ was often pronounced dialectally as /i/, yielding forms like "Americkee" for America or "Canadee" for Canada. In more recent times this pronunciation seems to show up mainly in caricatures of rural or Appalachian speech. The current pronunciation of Olathe as if it were Olatha can thus best be understood as a hypercorrection by people who didn't want to sound uneducated.

Update: A very helpful article linked by Y below, The Pronunciation of Missouri, reveals that the phenomenon is more systematic in the area than I had realised: it extends not only to placenames like Missouri, but even to words like spaghetti, macaroni, or prairie. This makes hypercorrection seem a less likely explanation. Instead, it looks as though final /ɪ/, which becomes /i/ in standard American English, was instead reduced to schwa in parts of the Midwest, including the area surrounding Kansas City. Andrews' (1994) Shawnee Grammar indicates that Shawnee /i/ was often realised as [ɪ], so this fits together nicely.

Friday, February 24, 2017

The Origin of Mid Vowels in Siwi

How does a language with a relatively small vowel system react to pressure from a language with a larger one?

Most northern Berber varieties have a simple four-vowel system: tense /a/, /i/, /u/, vs. lax schwa (/ə/, written e in the official orthography), the latter being mostly predictable and limited to closed syllables. In the eastern and southern Sahara, however, we tend to find slightly larger vowel systems, and it looks very much as though proto-Berber had a rather asymmetrical six-vowel system, close to modern Tuareg but missing /o/: it had tense /a/, /e/, /i/, /u/ vs. lax /ɐ/, /ə/.

Siwi Berber, in western Egypt, has a more symmetrical six-vowel system: tense /a/, /e/, /i/, /o/, /u/ vs. lax /ə/. All of these vowels occur in inherited vocabulary as well as in Arabic loanwords. It is obvious by inspection that, in almost all contexts, *ɐ merged into /ə/. But the distribution of /e/ shows little connection with that of *e: in fact, most instances of proto-Berber *e correspond to Siwi /i/. And the origin of /o/ is not immediately clear at all. How did this happen?

My latest article - written together with Marijn van Putten - proposes some answers. It turns out that proto-Berber */e/ was retained in Siwi only before word-final /n/. Most instances of /e/ and /o/ are found in Arabic loanwords. Within inherited vocabulary, almost all instances of /e/ - and all instances of /o/ - are phonetically conditioned innovations, arising from at least three distinct regular sound changes and one sporadic one. The net effect of this "conspiracy" of sound changes is to extend phonemes otherwise almost entirely restricted to Arabic loans into inherited Berber vocabulary.
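As a rough illustration of what "phonetically conditioned" means here, the two correspondences stated above (*ɐ merging into /ə/, and *e surviving only before word-final /n/) can be written as a tiny rewrite function. This is only a sketch under loud assumptions: the function name `siwi_reflex` is made up, the input strings are invented sequences of segments rather than real reconstructions, the exceptions to the *ɐ merger are ignored, and the further changes that create new /e/ and /o/ are left out because they are not spelled out in this post.

```python
# Toy illustration only: the inputs used with this function are invented
# strings, not real proto-Berber reconstructions or attested Siwi words.

def siwi_reflex(proto_form: str) -> str:
    """Apply two of the vowel correspondences described in the post."""
    out = []
    last = len(proto_form) - 1
    for i, seg in enumerate(proto_form):
        if seg == "ɐ":
            # *ɐ merges into /ə/ (the post says "in almost all contexts";
            # the exceptions are not modelled here).
            out.append("ə")
        elif seg == "e":
            # *e is kept only when the next segment is a word-final /n/;
            # elsewhere it is treated as raising to /i/ (a simplification
            # of "most instances of *e correspond to Siwi /i/").
            before_final_n = i + 1 == last and proto_form[last] == "n"
            out.append("e" if before_final_n else "i")
        else:
            out.append(seg)
    return "".join(out)


if __name__ == "__main__":
    for form in ["tɐmen", "temɐn", "sen", "ne"]:
        print(form, ">", siwi_reflex(form))
```

The point of the sketch is simply that the outcome of *e depends on its position in the word, not on which word it occurs in; the article itself gives the actual data and the remaining changes.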

If you want the full story, go read our article: The Origin of Mid Vowels in Siwi (published in Studies in African Linguistics 45:1-2 (2016), pp. 189-208).

Sunday, February 19, 2017

A real-life subjacency problem sentence

There are some kinds of questions and relative clauses that you just can't form without resorting to a resumptive pronoun, even in languages - like English - that otherwise don't allow resumptive pronouns to begin with. Ever since Ross (1967) came up with a typology of "island constraints", syntacticians have hotly debated both which ones these are and how to account for them.

Unfortunately, real-life examples of people trying to say such things are very thin on the ground. As a result, discussion of this phenomenon tends to be dominated by artificial examples. Much of the literature on subjacency inadvertently demonstrates how unsatisfactory the result can be (as discussed here: 1, 2). Every once in a long while, however, you find a completely spontaneous case of someone running up against such constraints - and here's today's, courtesy of some person on Reddit:

Step zero: find a couple million complete and utter morons, who it's a miracle they can breathe in and out without f***ing it up, to support you.

Normally, a relative clause starting with "who" would have no overt subject within the clause itself apart from "who", as in:

Step zero: find a couple million complete and utter morons, who in all honesty Ø can barely breathe in and out without f***ing it up, to support you.

But that's impossible here: note the ungrammaticality of:

*Step zero: find a couple million complete and utter morons, who it's a miracle Ø can breathe in and out without f***ing it up, to support you.

Instead, you end up having to fill the subject position to which "who" refers with a resumptive pronoun "they".

Thursday, February 09, 2017

Romance languages in 17th century North Africa

In 1609, 117 years after conquering Granada, the Spanish state decreed the expulsion of all "Moriscos" - that is, everyone descended from Muslims forcibly converted to Christianity, numbering in the hundreds of thousands. In the 1720s, a century later, two separate travellers - Jean-André Peyssonel and Francisco Ximenez - found that a number of towns in Tunisia, including Testour, Bizerte, and Tebourba, were Spanish-speaking, inhabited by the descendants of these refugees (as I was surprised to learn from Vincent 2004). According to Peyssonel, for example, "the inhabitants of Tebourba practically all speak Spanish there, a language which they have conserved from father to son"; referring to the same town, Ximenez adds "immediately after their arrival from Spain, they had schools in our language. They were insultingly told they were not real Moors, and the Bey took away their books and their schools; after that, they little by little forgot Spanish and learnt Arabic." All in all, the reports seem compatible with a three-generation pattern of language shift: the people they met still spoke Spanish, but were mostly unlikely to pass it on to their children, as they became more closely integrated into the wider society of their new home.

In 1627, a couple of decades after the expulsion of the Moriscos, a corsair ship from Algiers raided Iceland, capturing a couple of hundred unfortunate villagers, one of whom left a description of his experiences. While the distance travelled in this raid was unusual, the practice itself was less so: the capitals of the Barbary states were full of European slaves captured by state-sponsored pirates, waiting for ransoms that might never come. Likewise, many North Africans were captured and held as slaves in Europe (see eg Wettinger 2002 on Malta): describing Algiers in 1612, Diego de Haedo comments that "there are many Muslims who have been captives in Spain, Italy and France" and hence speak those countries' languages (Vincent 2004:107). To further complicate matters, not all immigration from Europe was involuntary: Haedo adds that "There are also an infinite number of renegades [converts to Islam] from these countries and a large number of Jews who have been there, who speak polished Spanish, French, or Italian. The same holds for all the children of renegades who, having learned their national language from their parents, speak it as well as those born in Spain or in Italy."

In brief, 17th-century North Africa contained plenty of European immigrants - some refugees, some captives, and even some voluntary migrants - learning the language spoken around them while maintaining, for a while, the language they had arrived with. What impact did this have on Maghrebi Arabic and Berber? Unfortunately, it's not easy to date Romance loans into either, but we can safely assume that some of the precolonial loans arrived in this period. A good dialect map, in combination with historical data on where these groups ended up, might help identify such loans more precisely - but that doesn't really exist yet, except to some extent for Morocco (Heath 2002).


Vincent, Bernard. 2004. In Jocelyne Dakhlia (ed.), Trames de langues. Usages et métissages linguistiques dans l’histoire du Maghreb. Tunis-Paris: IRMC, Maisonneuve & Larose. 561 p.

Saturday, February 04, 2017

Why the sun really does rise

In response to someone comparing "alternative facts" to science fiction, the eminent science fiction writer Ursula Le Guin recently wrote:

The test of a fact is that it simply is so - it has no "alternative." The sun rises in the east. To pretend the sun can rise in the west is a fiction, to claim that it does so as fact (or "alternative fact") is a lie.

The comments (never read the comments!) include several people trying to be smart by pointing out that, actually, "the truth of the matter is that the sun does not rise, but rather that the Earth turns". This apparent conflict is worth unpacking from a descriptive linguistic perspective.

All fluent speakers of English use phrases like "The sun rises in the east". They also use phrases like "Hot air rises." The commenter quoted previously seems to be applying something like the following reasoning:

  • When something (eg hot air) rises, it moves upwards away from the earth.
  • When the sun "rises", it's not moving upwards away from the earth - rather, the earth is turning relative to it.
  • Therefore, the sun does not actually rise.

A lexicographer will immediately see at least one ironclad way to vitiate such an argument: identify two distinct senses for "rise". Rise1 means "to move upward away from the ground", while rise2 means "for a celestial body's apparent position to come closer to the zenith" (or something along those lines). The sun rises2, but it doesn't rise1.

But not so fast! It's perfectly plausible that someone could believe the earth is stationary and the sun physically moves upwards when it rises. For someone holding that belief (or even just using that mental model without necessarily believing it), "rise" could easily have a single sense, not two different ones. Is there any language-internal evidence that "rise" has two senses?

As it happens, there is: look at antonyms. We say "The sun sets in the west", but "Hot air sinks" (and "Empires fall", but that's another story); you can't say "*Hot air sets". "Set" is the antonym of rise2, but not of rise1. That seems like a pretty good reason to assume that, even for geocentrist speakers of English, the two senses are lexically distinct. So it looks like Ursula Le Guin wins this one, as you might expect.
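
For readers who like to see the reasoning laid out mechanically, the antonym test can be sketched as a lookup over a hand-built mini-lexicon. Everything here is an assumption for illustration: the names `ANTONYMS` and `antonym_test_splits` are made up, and the entries encode only the two judgments given above ("set" for rise2, "sink" for rise1), not a real dictionary.

```python
# Hypothetical mini-lexicon: sense labels follow the post, and the antonym
# sets contain only the judgments the post itself reports.
ANTONYMS = {
    "rise1": {"sink"},  # "Hot air rises" / "Hot air sinks"
    "rise2": {"set"},   # "The sun rises" / "The sun sets"
}


def antonym_test_splits(sense_a: str, sense_b: str) -> bool:
    """Return True if the two candidate senses share no antonym.

    Having no antonym in common is taken here as evidence that the senses
    are lexically distinct, whatever the speaker believes about astronomy.
    """
    return not (ANTONYMS[sense_a] & ANTONYMS[sense_b])


if __name__ == "__main__":
    print(antonym_test_splits("rise1", "rise2"))
```

The test looks only at which words speakers accept as opposites, so it gives the same answer whether or not the speaker thinks the sun physically moves.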