Friday, January 05, 2007

Executable articles

Mark Liberman recently made an excellent suggestion on Language Log: scientific and technical papers should include an explicit, executable recipe for generating their numbers, tables and graphs from published data. However, I think if anything he undersells the possibilities. Where executable algorithms could make a big difference to linguistics papers is in testing proposed rules - phonological principles; historical sound shifts; morphological rules...

If someone proposes a vocabulary of protoforms and a set of regular sound shifts, they are already writing an explicit algorithm. With an accompanying executable version of it, taking protoforms as input and outputting descendant forms (programs to do some of this have already been written, e.g. Geoff's Sound Change Applier, Phono), you would know exactly what the predicted forms in each descendant language would be, and could spot irregularities (in other words, prediction failures) with ease.
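To make the idea concrete, here is a minimal sketch in Python of what such an executable recipe might look like. Everything in it is invented for illustration: the three sound changes, the protoforms and the "attested" forms belong to no real language, and the names apply_changes and find_irregularities are hypothetical. It assumes each change can be written as a regular expression and that the changes apply in a fixed order.

```python
import re

# Hypothetical ordered sound changes (regex pattern, replacement).
# Invented for illustration; not taken from any real language family.
SOUND_CHANGES = [
    (r"p", "f"),           # p > f everywhere
    (r"k(?=[ie])", "s"),   # k > s before a front vowel
    (r"a$", ""),           # word-final a is lost
]

def apply_changes(protoform, changes=SOUND_CHANGES):
    """Apply each sound change in order and return the predicted form."""
    form = protoform
    for pattern, replacement in changes:
        form = re.sub(pattern, replacement, form)
    return form

def find_irregularities(lexicon, changes=SOUND_CHANGES):
    """Return (protoform, predicted, attested) for every prediction failure."""
    failures = []
    for proto, attested in lexicon.items():
        predicted = apply_changes(proto, changes)
        if predicted != attested:
            failures.append((proto, predicted, attested))
    return failures

# Made-up protoforms paired with made-up attested descendant forms.
lexicon = {"kipa": "sif", "pata": "fat", "kepa": "kef"}
for proto, predicted, attested in find_irregularities(lexicon):
    print(f"*{proto}: predicted {predicted}, attested {attested}")
```

With this toy lexicon the only form flagged is *kepa, where the rules predict "sef" but the attested form is "kef": exactly the kind of irregularity a reader would otherwise have to find by hand.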

Likewise, if someone proposes a phonological principle (whether conceived of as a rule, as a constraint, or otherwise), you could test its effect on arbitrary data and see if it makes the right predictions. Gaps in the theory (for example, the representation of clicks in Government Phonology) or non-computable theories (an accusation sometimes made against Optimality Theory) would be conspicuous: the input would be rejected, or the program would simply be unwritable. The phonology of a language would be accompanied as a matter of course by a program generating phonetic representations from phonemic inputs. In fact, a widespread and welcome trend in modern phonology is to regard individual languages' phonologies as specific instantiations of universal constraints or structures; so let each theory be fully specified as a programming language, and each individual language's phonology be represented in it (as an ordered list of constraints, or a set of parameter settings, or whatever your favorite theory does). Inconsistencies or ambiguities would stand out like sore thumbs as you played around with the program. In short, executable articles would make linguistic theories more accountable, and make it easier to spot gaps and places where further work is called for.
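As a rough illustration of "a language's phonology as an ordered list of constraints", here is a toy Python sketch in the spirit of Optimality Theory. It is heavily simplified and all of it is assumed for the example: the segment inventory, the three constraints (crude stand-ins for NoCoda, Max and Dep that only count codas and compare lengths), and the function name evaluate are hypothetical. In particular, the candidate set is supplied by hand rather than generated, which sidesteps the computability question mentioned above instead of answering it.

```python
VOWELS = set("aeiou")
INVENTORY = VOWELS | set("ptkmns")   # hypothetical segment inventory

# Candidates are written with "." between syllables, e.g. "pa.ta".
def no_coda(inp, cand):
    """One violation per syllable that ends in a consonant."""
    return sum(1 for syl in cand.split(".") if syl and syl[-1] not in VOWELS)

def max_io(inp, cand):
    """Crude stand-in for Max: one violation per segment deleted."""
    return max(0, len(inp) - len(cand.replace(".", "")))

def dep_io(inp, cand):
    """Crude stand-in for Dep: one violation per segment inserted."""
    return max(0, len(cand.replace(".", "")) - len(inp))

def evaluate(inp, candidates, ranking):
    """Pick the candidate whose violations are best under the given ranking."""
    unknown = set(inp) - INVENTORY
    if unknown:
        # A gap in the theory shows up as rejected input.
        raise ValueError(f"no representation for: {sorted(unknown)}")
    # Tuples compare left to right, so higher-ranked constraints dominate.
    return min(candidates, key=lambda c: tuple(con(inp, c) for con in ranking))

candidates = ["pat", "pa", "pa.ta"]
# Three hypothetical "languages" differ only in how they rank the constraints.
print(evaluate("pat", candidates, [no_coda, max_io, dep_io]))  # pa.ta
print(evaluate("pat", candidates, [no_coda, dep_io, max_io]))  # pa
print(evaluate("pat", candidates, [max_io, dep_io, no_coda]))  # pat
```

The same input yields epenthesis, deletion or a faithful output depending only on the ranking, and an input containing a symbol outside the inventory is rejected outright rather than silently mishandled.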

Incidentally, I have occasionally been known to practice what I'm preaching here: way back when I was studying mathematics, I wrote a program to generate Algerian Arabic broken plurals from singulars, and found the experience very informative. (It worked, too.) Unfortunately, this program was written in Visual Basic 3.0, not a programming language that has aged very well... which is, of course, another issue that would have to be considered.

2 comments:

Anonymous said...

One more sound change applier by the redoubtable Zompist:

http://zompist.com/sounds.htm

C source code available.

Anonymous said...

(My apologies for double commenting. Let me fix the link.)

One more sound change applier by the redoubtable Zompist:

Sounds: The Sound Change Applier

C source code available.