My Money’s on Hausa

Mark Liberman at Language Log has posted another language quiz. I love these things: he posts an audio clip of a random language, and you pull out all your linguistic stops trying to figure out what it is.

Warning… spoilers ensue.

Or at least, if my guess turns out to be correct, then a spoiler ensues. Otherwise it’s just me babbling nonsensically.

Here’s the (very) loose transcription I came up with:

dem angong LIti shin ti mutani tere su MUtu a su ku la tin doko U SIN ji kata loko tin de SORURISE ke harbitz kinta RO masa zanga zangar.

(I’m a firm believer in using the simplest transcription conceivable when one starts transcribing an unknown language—jumping into using exotic IPA detracts from the goal at hand, at least in my experience.)

I used capitalization to indicate what sounded like tonal variation to me. As soon as I listened to the recording I suspected it was a language from Africa, where tonal languages are widespread, but I couldn’t really tell you why. My first guess was “something Bantu” but now I think that was the wrong family.

The first bit that caught my ear was something like zanga zangar, which is spoken quite clearly. It struck me as pretty hard to transcribe incorrectly (assuming that the language was written in a Roman-letter alphabet, as many African languages are). Furthermore it seemed likely that there was some sort of, um, what’s the technical word, “process” going on there—it looked like reduplication.

So I just stuck it into Google as a phrase, like this.

Golly, that wasn’t too hard: sure looks like Hausa.

So how can we try to verify this theory? Well, usually I build a corpus of the language in question and start, uh, poking at it. I hacked together some code to build a corpus using the Yahoo search API, which is quite easy to use. (I’ll post it if anyone asks.)

So anyway, after looking at bigrams and such in the corpus I noticed that zanga zangar is preceded occasionally by masu, and sure enough, another Google search turns up 33 results with the string masu zanga zangar. I’d originally transcribed it as masa.
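For the curious, the bigram poking goes roughly like this. This is a minimal sketch in Python, not the code I actually ran: the function names (tokenize, preceding_words) are made up for illustration, and the three "documents" are placeholder strings (aaa, bbb and so on are filler, not Hausa) standing in for whatever text the corpus builder pulled down.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase, then keep runs of Unicode letters; digits and punctuation are dropped.
    return re.findall(r"[^\W\d_]+", text.lower())

def preceding_words(tokens, phrase):
    # Count the token that comes immediately before each occurrence of `phrase`.
    n = len(phrase)
    counts = Counter()
    for i in range(1, len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            counts[tokens[i - 1]] += 1
    return counts

# Placeholder corpus: filler tokens around the one phrase I was confident about.
documents = [
    "aaa masu zanga zangar bbb",
    "ccc zanga zangar ddd masu zanga zangar",
    "eee zanga zangar",
]

# Tally per document so no bigram straddles a document boundary.
before = Counter()
for doc in documents:
    before.update(preceding_words(tokenize(doc), ["zanga", "zangar"]))

for word, count in before.most_common():
    print(word, count)
```

On a real corpus you would just eyeball the top of that list; a word that keeps turning up right before the phrase you trust is a good candidate for correcting the transcription next to it, which is how masu beat out my masa.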

But whatever, it’s late, you know?

