infundibulum

Want to Know How to Make Some Money?

June 25th, 2005

Here, I’ll tell you.

News Sentinel | 06/24/2005 | Funding cut for translator service

Asterisk + Wireless network + Laptops + Webcams + Subscriptions + Nationwide (Worldwide?) network of on-call interpreters for lots of languages.

Well, go on.

Update: This probably wouldn’t work: I Bet You Didn’t Make Any Money…

China, Microsoft, and Translation

June 18th, 2005

I’ve been following the story about Microsoft’s latest adventure in China with some interest, but it really wasn’t until I read the latest post at Global Voices that I saw that this story is directly related to a topic I’ve been sort of obsessed with lately: what I think of as “the wall of translation.”

If you missed the story, basically what’s happened is that Microsoft is cooperating with China’s censorship of MSN Spaces blogs and blocking words like “democracy” and “human rights” in the way some blogging systems block words like “fuck.”

I’m glad mine doesn’t. :)

Anyway, what I mean by the “wall of translation” is that this is a story where a dialog could take place on a large scale between Chinese-speaking and English-speaking bloggers (or speakers of any language, really), if there were an effective mechanism for translating between them.

But the conversation hits a wall, because the connections and routines that make translation happen aren’t public.

Presumably some day machine translation will solve that problem. But that day isn’t today. And despite what Google says, I don’t think it’s going to come within the next few years.

People need to think about this problem, a lot, because it must be solved.

I think about it.

A lot.

Machine Translation, Blogging, and Bird Flu

June 4th, 2005

With all the brouhaha over Google’s recent “demo” of their machine translation (MT) system, I’ve become interested in the way that MT is actually used in the blogosphere.

Here’s one interesting example:

Notes from the world of wildlife disease: Over 8000 Bird Flu Deaths in Gangcha County Qinghai China?

Dr. Niman is relying on a machine translation from a Chinese language website where anyone can post. As Crawford Kilian notes, this sounds like Nostradamus.

I think some perspective is necessary.

And so on. What interests me here is the fact that Dr. Niman, whoever he is, is making a medical statement based on machine translation. Pretty surprising.

So what we’re looking at here is someone who apparently has expertise in wildlife diseases criticizing another expert’s discussion of the output of an MT system (SYSTRAN, natch).

But Dr. Niman’s article says that the content has been edited. Here’s what we don’t know:

  • What editing was done
  • How much editing was done
  • Whether the editor has any expertise in Chinese

That said, I imagine that this translation is more or less okay. I don’t doubt that Systran can translate Chinese numerals accurately, or the names of animals (assuming that they’re in the database).

Nonetheless, it’s sort of surprising that this MT’d content is showing up, for instance, as “news” in Google News, without any human expertise in the loop.

(Blogs have also begun picking up the story.)

The Education Equality Act

June 1st, 2005

Here’s a translation issue that will probably end up becoming a media circus in New York, if not elsewhere:

The Education Equality Act (Gotham Gazette. June, 2005)

Intro 464: The Education Equity Act was introduced in the City Council by council members Hiram Monserrate and David Yassky. The legislation requires the Department of Education to translate documents, such as report cards and notices, into the eight most widely spoken languages — Spanish, Chinese, Russian, Italian, French, Yiddish, Korean and Polish — and provide interpretation services for parents who don’t speak English.

Hmm, a quick look at Technorati already uncovers some indignation: Multicultural Madness in NYC implies that the “victims” of this legislation would be the students, except that it’s aimed at parents, who are attempting to help the students learn English. The example cited is a parent who doesn’t read English and doesn’t know that their child is skipping class.

Which is in English.

But whatever.

The conversation around this bill should prove interesting. (Granted, it is a pretty vague title, given what the legislation does.)

Volunteer Translation Banks

May 25th, 2005

I ran across an article from last year on something called a “language bank”: Volunteer translators break down barriers

It describes a program at the Seattle Red Cross that brings together translators for over 75 languages. They help with all kinds of needs that immigrants run into:

The bank and its volunteers negotiate with apartment managers, communicate with citizenship and immigration services, decipher cable bills, and even assist in emergency situations such as residential fires; it all adds up to about 4,000 cases a year.

I was unsurprised to find, after a little digging, that there’s a similar program in my own Montgomery County, Maryland: the Montgomery County, MD - Language Bank.

Cool!

I’ve done a tiny bit of interpreting and also some translation before, and lemme tell ya, it’s hard work. To do it under the kind of pressure that I’m sure these programs run into must be at least, uh, stressful.

The administrators and translators at these language banks deserve a lot of appreciation.

It seems like the only language-policy stories you ever read in the big media in the States are about the English-only movement. But language banks are also concrete reminders of the fact that the US is actually an incredibly multilingual society, probably one of the most multilingual societies in the world.

We should be proud of that.

Web equivalents to the OSX dictionary application

May 24th, 2005

I’ve heard mixed reviews of OSX Tiger, but the little dictionary widget seems to be universally popular. There’s actually a class of applications that do something similar on the web, usually through a proxy:

I’d be interested to know of any others!

Transliteration as Poor Man’s Translation

May 20th, 2005

Here’s a thought I’ve never gotten around to implementing or really trying out.

Transliteration is the process of converting text in one script into another script. Here’s an example from Wikipedia, going from Greek script to Latin script:

Greek Script: Ελληνική Δημοκρατία
Transcription: Ellēnikē Dēmokratia
Transliteration: Elliniki Dimokratia

The details of such conversion are pretty complex — there are two distinct systems of conversion here. The Wikipedia article tries to maintain a distinction between “transcription” and “transliteration,” but whatever, you get the idea: convert from one writing system to another.

Now, let’s suppose you have reason to believe, as blogger Ethan Zuckerman recently did, that there is an article written about you in a language you don’t know:

…two days ago, when ego-surfing Technorati, I discovered that a Saudi blogger had linked to me, mentioning that an interview with me had just been published in Al-Hayat. I can’t read Arabic, but the few English phrases in the piece connected to topics I’m deeply interested in. So hey, perhaps it was an interview with me.

Let’s imagine Ethan wasn’t fortunate enough to find an Arabic blogger to translate the article for him (which he in fact was in this instance). Is there some way that he might be able to determine if his name is in the thing at all?

Maybe so, using automated transliteration (or transcription, whatever!) and a bit of fuzzy matching.
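Here’s a minimal sketch of the fuzzy-matching half, assuming the article has already been transliterated into Latin letters somehow. It uses Python’s standard `difflib.get_close_matches` to pick out words that are close to, but not identical with, the name you’re looking for. The `find_name` helper and the sample text are hypothetical; the sample is a made-up romanization, not the actual Al-Hayat piece.

```python
import difflib

def find_name(name, transliterated_text, cutoff=0.7):
    """Return words in the transliterated text that look like `name`.

    Hypothetical helper: it lowercases everything and compares whole
    words, so it only catches names that come out of transliteration
    as a single token.
    """
    words = transliterated_text.lower().split()
    return difflib.get_close_matches(name.lower(), words, n=5, cutoff=cutoff)

# Made-up transliterated text. Vowels tend to get dropped or changed,
# so an exact search for "zuckerman" would find nothing here.
sample = "muqabala maa ithan zukirman hawla al-mudawwanat"
print(find_name("Zuckerman", sample))  # ['zukirman']
```

The `cutoff` is a guess; too low and every short word matches, too high and legitimate spelling variants slip through.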

When you get right down to it, the basic operation in transliteration is just making a bunch of substitutions. As with many tasks related to language processing, the best first step is often to simply think of what you’d do if you had to accomplish the task by hand.

Well, let’s say you were going to work with that Greek up there.

Ελληνική Δημοκρατία
Elliniki Dimokratia

(I picked the simpler transliteration system.)

Anyone can do a little inspection and make an educated guess as to which letter corresponds to which… something like this:

Ε E
λ l
λ l
η i
ν n
ι i
κ k
ή i

Δ D
η i
μ m
ο o
κ k
ρ r
α a
τ t
ί i
α a
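That inspection step can be roughly automated for the easy case above, where every Greek letter lines up with exactly one Latin letter. This sketch (the `collect_pairs` name is hypothetical) zips equal-length word pairs together, counts how often each correspondence shows up, and keeps the most common one. Pairs whose lengths differ are simply skipped, which is exactly where a real alignment method would be needed.

```python
from collections import Counter, defaultdict

def collect_pairs(word_pairs):
    """Guess letter correspondences from equal-length (source, target) pairs.

    Pairs whose lengths differ (e.g. because one letter maps to two)
    are skipped, since a naive position-by-position zip would misalign them.
    """
    counts = defaultdict(Counter)
    for source, target in word_pairs:
        if len(source) != len(target):
            continue
        for s_char, t_char in zip(source, target):
            counts[s_char][t_char] += 1
    # For each source letter, keep the target letter seen most often.
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}

pairs = [("Ελληνική", "Elliniki"), ("Δημοκρατία", "Dimokratia")]
table = collect_pairs(pairs)
print(table["λ"], table["η"], table["ί"])  # l i i
```

A pair like Αθήνα / Athína would be skipped here, because θ turns into two letters.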

And of course we’ll need more such pairs to cover the rest of the alphabet, but those aren’t hard to find. In fact, we could just cut and paste 30 or 40 words from Wikipedia. (Say, city names Αθήνα Athína; Θεσσαλονίκη Thessaloníki; Πελοπόννησος Peloponnesos, etc.)

Once we’ve done that, we can write a simple program which will make those substitutions, and go from one script to the other.
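A minimal version of that program might look like this. The table is a small hand-picked subset (the letters from our example plus θ and the digraph ου), assumed for illustration and not any official romanization standard. It tries longer source strings first, so a multi-letter rule like ου → ou wins over the single letter ο, and it passes through any character it has no rule for. Text is normalized to NFC first so that accented letters typed as base letter plus combining accent still match the precomposed keys.

```python
import unicodedata

# Illustrative substitution table, not a standard romanization.
TABLE = {
    "Ε": "E", "Δ": "D",
    "λ": "l", "η": "i", "ή": "i", "ν": "n", "ι": "i", "ί": "i",
    "κ": "k", "μ": "m", "ο": "o", "ρ": "r", "α": "a", "τ": "t",
    "θ": "th", "ου": "ou",
}

def transliterate(text, table=TABLE):
    """Replace substrings of `text` using `table`, longest match first."""
    text = unicodedata.normalize("NFC", text)
    longest = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        for size in range(longest, 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            # No rule for this character: copy it through unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)

print(transliterate("Ελληνική Δημοκρατία"))  # Elliniki Dimokratia
```

Greedy longest-match is the simplest reasonable choice; it ignores context entirely, which is one reason this won’t hold up on real text.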

And of course, this is all grossly simplified and won’t work very well at all.

More later…

“Môme du script???”

May 18th, 2005

This is so bizarre… the Office québécois de la langue française proposing official translations for “script kiddie”:

script kiddy / pirate adolescent

Note(s) : Les pirates adolescents utilisent des programmes de script conçus par d’autres au lieu d’en créer eux-mêmes. Généralement, ils laissent des traces pour marquer leur passage. Leur but est souvent la célébrité (ou tout au moins d’impressionner les copains). Ils constituent une menace pour tous les systèmes informatiques, puisqu’ils font habituellement une sélection aléatoire de leurs cibles.

Les termes "pirate adolescent" et "pirate ado" ont été proposés par l’Office de la langue française comme équivalents de script kiddy. Une traduction littérale de l’anglais donnerait : enfant du script, môme du script ou gamin du script.

My lousy translation (my French is really bad):

Note(s): Adolescent pirates use script programs written by others instead of creating their own. They generally leave traces to mark their passage. Their goal is often celebrity (or at least impressing their friends). They are a threat to all computer systems, since they usually select their targets at random.

The terms "adolescent pirate" and "teen pirate" have been proposed by the Office of the French Language as equivalents of script kiddie. A literal translation from the English would give: enfant du script, môme du script, or gamin du script (roughly "script child," "script kid," or "script urchin").

I don’t think that language academies are necessarily as pointless as most linguists would argue (few programmers would argue that the W3C is pointless, by way of comparison — standardization is sometimes necessary), but this is beyond absurd.

Official translations for slang?

Phraselator

May 3rd, 2005

If you’re interested in (machine) translation, check out Jeremy Faludi’s post on the Phraselator, a handheld real-time translation device.

It got its start, as many technologies do, in military use, but apparently the devices are beginning to become available to the public. The phrase data itself is stored on flash cards. I think there’s a niche for these devices, but one has to wonder how long they’ll survive in the face of the increasing ubiquity and processing power of cell phones.

Text and Meaning

February 25th, 2005

An interesting post at “The Translator’s Blog”: The translation of text vs. the translation of meaning.

A colleague raised the issue of translation at the beginner stage, when you basically just “run” through the text word by word to polish the style afterwards, again and again until it works for you.

The experienced translator, in contrast, will extract the meaning of a text and start from there rather than “copy” the actual word into his target language.

I’ve been experiencing this distinction firsthand of late. I’ve been using a new, open source CAT (Computer Aided Translation) tool called OmegaT (http://www.omegat.org/omegat/omegat.html) to do some translations of my own, from Portuguese (oi Jonas) and also from Welsh (hylo Nic). I definitely fall into the “polish the style afterwards” camp, although I’d have to say that “polish” may be too ambitious a word: my Portuguese is rusty, and my Welsh is… well, no comment. Give me a couple more years.

Translation is a strange game: in a weird way, I can imagine being a good translator of a language without being a terribly fluent speaker. Becoming fluent is a process of internalizing the language completely, to the point where you speak by intuition. Translation is more like being hyperaware of all the details of both languages at once: you have to know every possible rendering of a phrase in the target language in order to reflect the original text as idiomatically as possible. That’s the impression I’ve had, in any case, in my limited attempts at translation.

(By the way, I’ve been caught up in another project, but that little phrase-splitting script I mentioned in the previous post will be coming up soon. I promise.)