infundibulum

Yahoo’s Cross-Language Search

July 15th, 2005

Yahoo! Search blog: Sprechen Sie Deutsch?

Machine translation was once a rather obscure field — until Babelfish hit the web, I suppose.

Wait, I take that back. There was a period back in the 1950s when MT was very much in the public eye, until it became clear that it wasn’t going to be useful (well, not for a few more decades, anyway). Check out this nice history of MT in a nutshell for details.

But I digress.

If you’ve nosed around in academic MT within the last decade or so (or even just poked dilettanteishly at its periphery, as I have), then you’ve surely been inundated by torrents of boffin-speak.

I find it interesting to watch public-facing search engine companies like Google and Yahoo being forced to find simple terminology to describe their work in MT. I often find myself mentally, uh, translating from these more verbose descriptions back into the terminology of academia. From the above link, for instance:

So what does this really mean? We apply our Yahoo! Search Translation Technology by taking your query, looking across the entire Web and across languages to assemble the most comprehensive set of relevant results, and then returning that information in your local language.

“Oh, you mean this thing does CLIR…”

People complain a lot about technical terminology, but of course it’s actually useful. It’s just that it’s more trouble than it’s worth, for most people. In any case, it’s great to see this kind of tech seeping out onto the web.
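For anyone curious what “CLIR” (cross-language information retrieval) boils down to, the Yahoo! description maps onto the classic query-translation recipe: translate the query, match documents in every language, then translate the hits back. Here’s a toy sketch of that recipe. The lexicon, the three-document collection, and every function name are hypothetical, invented for illustration; this is certainly not how Yahoo! actually implements it, and the word-by-word “translation back” is a crude stand-in for real MT.

```python
# Toy query-translation CLIR. Everything here is hypothetical: a real engine
# would use a statistical translation model and a web-scale index.

# Hypothetical English -> German lexicon (a word may have several translations).
EN_DE = {
    "bird": ["vogel"],
    "flu": ["grippe"],
    "news": ["nachrichten"],
}

# Hypothetical mixed-language collection of (language, text) pairs.
DOCS = [
    ("en", "bird flu news from qinghai"),
    ("de", "vogel grippe nachrichten aus qinghai"),
    ("de", "fussball nachrichten"),
]


def translate_query(terms, lexicon):
    """Step 1: add every dictionary translation of each query term."""
    expanded = set(terms)
    for term in terms:
        expanded.update(lexicon.get(term, []))
    return expanded


def clir_search(query, docs, lexicon):
    """Step 2: match documents in any language against the expanded query."""
    terms = translate_query(query.lower().split(), lexicon)
    scored = []
    for idx, (_lang, text) in enumerate(docs):
        overlap = len(terms & set(text.split()))
        if overlap:
            scored.append((overlap, idx))
    # Most overlapping terms first; ties keep collection order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [docs[idx] for _, idx in scored]


def gloss(text, lexicon):
    """Step 3: crude word-by-word translation back into English.

    Unknown words pass through unchanged; this stands in for real MT.
    """
    reverse = {de: en for en, translations in lexicon.items() for de in translations}
    return " ".join(reverse.get(word, word) for word in text.split())


if __name__ == "__main__":
    for lang, text in clir_search("bird flu", DOCS, EN_DE):
        shown = text if lang == "en" else gloss(text, EN_DE)
        print(lang, shown)
    # Prints:
    # en bird flu news from qinghai
    # de bird flu news aus qinghai
```

Translating the short query, rather than the whole collection, is usually the cheaper of the two classic CLIR strategies; the price is that a two-word query gives a translation system very little context to disambiguate with.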

China, Microsoft, and Translation

June 18th, 2005

I’ve been following the story about Microsoft’s latest adventure in China with some interest, but it really wasn’t until I read the latest post at Global Voices that I saw how directly this story relates to a topic I’ve been sort of obsessed with lately: what I think of as “the wall of translation.”

If you missed the story, basically what’s happened is that Microsoft is cooperating with China’s censorship of MSN Spaces blogs and blocking words like “democracy” and “human rights” in the way some blogging systems block words like “fuck.”

I’m glad mine doesn’t. :)
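At its simplest, the word blocking described here is a blocklist match. A minimal sketch, assuming a hypothetical hand-written list and whole-word, case-insensitive matching; MSN Spaces’ actual mechanism isn’t public, and the names `BLOCKED`, `find_blocked`, and `accept_title` are invented for illustration:

```python
import re

# Hypothetical blocklist for illustration only; the real lists used by any
# service are not public.
BLOCKED = {"democracy", "human rights"}


def find_blocked(text, blocked=BLOCKED):
    """Return the blocked words/phrases found in text, sorted.

    Matching is case-insensitive and on whole words, so "democracy" matches
    "Democracy" but not "democracyfest".
    """
    lowered = text.lower()
    hits = []
    for phrase in blocked:
        # \b anchors restrict the match to whole words; re.escape guards
        # against regex metacharacters in the phrase.
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            hits.append(phrase)
    return sorted(hits)


def accept_title(title):
    """A service using this filter would reject any title with a hit."""
    return not find_blocked(title)


if __name__ == "__main__":
    print(find_blocked("Notes on Democracy and Human Rights"))  # ['democracy', 'human rights']
    print(accept_title("My cat"))  # True
```

A filter like this matches exact strings only, so it knows nothing about synonyms, misspellings, or other languages.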

Anyway, what I mean by the “wall of translation” is this: here is a story that could spark a large-scale dialog between Chinese-speaking and English-speaking bloggers (or speakers of any language, really), if only there were an effective mechanism for the translation to happen.

But the conversation hits a wall, because the connections and routines that make translation happen aren’t public.

Presumably some day machine translation will solve that problem. But that day isn’t today. And despite what Google says, I don’t think it’s going to come within the next few years.

People need to think about this problem, a lot, because it must be solved.

I think about it.

A lot.

Machine Translation, Blogging, and Bird Flu

June 4th, 2005

With all the brouhaha over Google’s recent “demo” of their machine translation (MT) system, I’ve become interested in the way that MT is actually used in the blogosphere.

Here’s one interesting example:

Notes from the world of wildlife disease: Over 8000 Bird Flu Deaths in Gangcha County Qinghai China?

Dr. Niman is relying on a machine translation from a Chinese language website where anyone can post. As Crawford Kilian notes, this sounds like Nostradamus.

I think some perspective is necessary.

And so on. What interests me here is the fact that Dr. Niman, whoever he is, is making a medical statement based on machine translation. Pretty surprising.

So what we’re looking at here is someone who apparently has expertise in wildlife diseases criticizing another expert’s discussion of the output of an MT system (SYSTRAN, natch).

But Dr. Niman’s article says that the content has been edited. Here’s what we don’t know:

  • What editing was done
  • How much editing was done
  • Whether the editor has any expertise in Chinese

That said, I imagine that this translation is more or less okay. I don’t doubt that SYSTRAN can translate Chinese numerals accurately, or the names of animals (assuming that they’re in the database).

Nonetheless, it’s sort of surprising that this MT’d content is showing up, for instance, as “news” in Google News, without human expertise in the loop.

(Blogs have also begun picking up the story.)