infundibulum

Dear Valentine’s day

February 14th, 2008

go away, kthx

Transliteration Project

November 9th, 2007

(Warning: I am way tired right now, but I wanted to get this down…)

I have long been interested in transliteration:

mundotype was my first stab at all that, and it kind of works, for a few languages. But let me tell you, building that transliteration map for Amharic was no walk in the park (and it was mostly thanks to my pals Daniel Yacob and Ephrem Menji that I got anywhere!)

My goal remains the same: I want to create (or see someone else create, whatevarr) a JavaScript-based transliteration input system that covers a WHOLE BUNCH of languages, with a consistent, easy-to-understand format for writing and editing rules.

But even better than all that would be coming up with an automated way to infer the rules in the first place.

That’s what I’ve been playing with.

Eventually this should end up on my “serious” blog, over at Blogamundo, but I’ve become a little self-conscious about just rambling there ever since Planet I18n came into being; I’d really rather post there when I have something that’s distributable.

It’s late right now but let me give you the 5-minute rundown of where I am:

What’s a transliteration?

Ask Wikipedia. For example: Epictetus is a transliteration of the Greek name Επίκτητος into the Roman alphabet.

I have some code that goes through Wikipedia dumps and extracts all the interwiki links and article titles, and spits out gigantic “lexicons.”

Here’s an example where I grepped out a Greek/English lexicon (the original has a bazillion languages):

http://ruphus.com/svn/translit/en2el.txt

Which has 2432 lines, with stuff like this:

Archaeology Αρχαιολογία
Austria Αυστρία
Australia Αυστραλία
ASCII ASCII
Africa Αφρική

Now, some of these are “transliterations” and some are “translations” — and in the case of ASCII (oh, the irony), straight out borrowing in the original script.

(By the way, the line between “translation” and “transliteration” is kind of blurry if you start thinking too hard… fortunately, I don’t.)
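Here is a rough sketch of reading a lexicon like the sample above and flagging the straight-out borrowings. The names `parse_lexicon` and `is_borrowing` are made up for this example, and the tab separator is only a guess about how the file is laid out; this is not the actual extraction code.

```python
def parse_lexicon(lines, sep="\t"):
    """Yield (source, target) title pairs from lexicon lines.

    Assumes one pair per line, with the two titles separated by `sep`.
    The tab default is an assumption about the file layout.
    """
    for line in lines:
        line = line.rstrip("\n")
        if sep not in line:
            continue  # skip blank or malformed lines
        source, target = line.split(sep, 1)
        yield source.strip(), target.strip()


def is_borrowing(source, target):
    """True when the title is identical in both languages (like ASCII)."""
    return source == target


# A few pairs from the sample, written with the assumed tab separator.
sample = [
    "Austria\tΑυστρία",
    "ASCII\tASCII",
    "Africa\tΑφρική",
]
pairs = list(parse_lexicon(sample))
borrowed = [pair for pair in pairs if is_borrowing(*pair)]
print(borrowed)
```

Telling translations from transliterations is the harder part, which is what the rest of this post pokes at.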

Having hacked thru the first few chapters of “Teach Yourself Greek,” I can surmise that the pairs Austria/Αυστρία and Australia/Αυστραλία look like “perfect” transliterations.

And by “perfect,” I mean:

  1. Both words in the pair are the same length
  2. Both words in the pair have the same “letter pattern”

(“Perfect” is just an arbitrary designation.)

It’s #2 that I’ve been thinking about, and getting some results with. It involves “patternizing” a word, and you do that like this:

Replace each letter in the word with the numeric index of the first occurrence of the letter in the word.

Examples:

cat → 012
asia → 0120
Ασία → 0123
Ωκεανός → 0123456
Βιόσφαιρα → 012345175

Get it?
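In Python, the patternizing step might look something like this. It is a minimal sketch, not necessarily the code in the repository: the names `patternize` and `as_digits` are mine, and the Unicode normalization step is an added assumption so that accented letters count as one letter each.

```python
import unicodedata


def patternize(word):
    """Map each letter to the index of its first occurrence in the word.

    Returns a tuple of ints, because joining the digits into a string
    becomes ambiguous once an index reaches 10.
    """
    # NFC keeps accented letters such as "ί" as a single code point.
    word = unicodedata.normalize("NFC", word)
    first_seen = {}
    pattern = []
    for position, letter in enumerate(word):
        first_seen.setdefault(letter, position)
        pattern.append(first_seen[letter])
    return tuple(pattern)


def as_digits(pattern):
    """Render a pattern the way the examples above are written."""
    return "".join(str(index) for index in pattern)


for word in ["cat", "asia", "Ασία", "Βιόσφαιρα"]:
    print(word, as_digits(patternize(word)))
```

Note that the comparison is case-sensitive, which is why “Ασία” comes out as 0123 while “asia” is 0120: the capital Α and the final α count as different letters.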

Interestingly, this simple trick is very good at helping to find transliterated words. All I do is go through the word pairs in that file at the top, and check to see if both words produce the same pattern.

Check out some results:

http://ruphus.com/svn/translit/matches-en2el.txt

Croatia
0123453
Κροατία

Cyclades
01234567
Κυκλάδες

Dance
01234
Χορός

Kilo
0123
Κιλό

Keflavík
01234567
Κεφλαβίκ

Methanol
01234567
Μεθανόλη

Montreal
01234567
Μόντρεαλ

For one thing, there are some mistakes. “Χορός” is no transliteration of “Dance,” it’s a translation. But mostly transliterated things come up — notice all the place and personal names?
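The matching step described above can be sketched like this. Again, the function names are made up for illustration, and this is only one way to do it.

```python
import unicodedata


def patternize(word):
    """Tuple of first-occurrence indexes, one per letter."""
    word = unicodedata.normalize("NFC", word)
    first_seen = {}
    # setdefault records a letter's position the first time it shows up
    # and returns the recorded position every time after that.
    return tuple(first_seen.setdefault(ch, i) for i, ch in enumerate(word))


def pattern_matches(pairs):
    """Keep only the pairs whose two words share a letter pattern."""
    return [(a, b) for a, b in pairs if patternize(a) == patternize(b)]


pairs = [
    ("Archaeology", "Αρχαιολογία"),
    ("Austria", "Αυστρία"),
    ("Croatia", "Κροατία"),
    ("Dance", "Χορός"),
]
for english, greek in pattern_matches(pairs):
    print(english, greek)
```

Archaeology gets filtered out, while Dance/Χορός slips through: any two words of the same length with no repeated letters share a pattern, so short words are the likeliest false positives.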

So from there, I zip up these pairs of words into pairs of letters, like this:

T Τ
r ρ
o ό
f φ
a α

And

K Κ
e ε
f φ
l λ
a α
v β
í ί
k κ
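The zipping is about as simple as it sounds. A small sketch, where `letter_pairs` is a hypothetical helper name and the NFC normalization is my own assumption to keep accented letters as single characters:

```python
import unicodedata


def letter_pairs(source, target):
    """Pair up letters by position; only meaningful for equal lengths."""
    source = unicodedata.normalize("NFC", source)
    target = unicodedata.normalize("NFC", target)
    if len(source) != len(target):
        raise ValueError("words must be the same length")
    return list(zip(source, target))


for latin, greek in letter_pairs("Keflavík", "Κεφλαβίκ"):
    print(latin, greek)
```

This prints the same eight letter pairs listed above for Keflavík.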

Rinse and repeat for every pair in the list, do a bit of frequency-based manipulatin’, and you get something that looks like this:

http://ruphus.com/svn/translit/schema-en2el.txt

Which is incomplete and imperfect, but pretty damn good for zero linguistic knowledge beforehand, aside from the lexicon.
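The “frequency-based manipulatin’” could be as simple as counting letter pairs and keeping the most common target for each source letter. That is an assumption about the approach, not a description of the real script, and `build_schema` is a made-up name.

```python
from collections import Counter, defaultdict


def build_schema(word_pairs):
    """For each source letter, pick the target letter it pairs with most."""
    counts = defaultdict(Counter)
    for source, target in word_pairs:
        if len(source) != len(target):
            continue  # positional pairing needs equal lengths
        for s, t in zip(source, target):
            counts[s][t] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


matches = [
    ("Croatia", "Κροατία"),
    ("Kilo", "Κιλό"),
    ("Dance", "Χορός"),
    ("Montreal", "Μόντρεαλ"),
]
schema = build_schema(matches)
print(schema["a"], schema["o"])
```

With these four pairs, “a” maps to α despite the bogus Dance/Χορός pair, because the real transliterations outvote it. “o” comes out as the accented ό, which is the kind of imperfection that needs more data or smarter handling.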

More soon.

(digraphs are a thorny problem, for one thing…)

i was thinking about transliteration

November 7th, 2007

again

and i wrote 30 lines of python about it.

$ svn co http://ruphus.com/svn/translit/

if you are bored and/or curious.

jQuery junk

October 30th, 2007

I put up a bunch of junk in a directory with jQuery stuff. It’s largely broken experiments. JUST FOR YOU. jQuery

Facebook Groups

October 25th, 2007

Back in gringolândia, I guess I’ll start speaking gringuês again. Man, I miss Brazil.

Tonight I went to Starbucks, where I was reading a book. I had a few conversations. But it’s sort of weird trying to start conversations with random people. Especially if they’re all face down in their laptops (and lattes).

Thing is, though, other people have to be thinking the same thing: “Why am I so damn popular on Facebook but have no one to talk to at Starbucks??”

Or something.

You know how Facebook groups mostly suck? They’re just like Orkut groups. Or Friendster groups. People go there to be identified, and then they’re like… uh, what now? Because being off-topic seems pretty retarded in a group that’s defined by having a ridiculously specific topic.

It makes a lot more sense to “be identified” with reference to a place that’s… you know… social.

What I’m getting at is, when I got home, I wished there was a Facebook group (or something) for that one particular Starbucks.

(Okay, mainly so I could have the huevos to message up that one girl with the German accent.)

Does anyone know what I’m trying to say? Why doesn’t every place in the world have a place online, that everyone knows about?

No, seriously

September 18th, 2007

Before reading this, you have to picture a 92-year-old uncle. (It also helps to know that “peru” is Portuguese for “turkey.”)

Me: Hey Uncle, know what I read in the paper today? A meteor fell on Peru.

Uncle: Poor turkey.

pretty much

August 26th, 2007

After the Sept. 11, 2001, attacks, Breyer said: “I began to see that the true division of importance in the world is not between different countries. The important division is between those who are committed to reason, to working out things, to understanding other people, to peaceful resolution of their differences … and those who don’t think that.”

GTD can kiss my ass: FTDB

August 19th, 2007

Yes children, GTD is waaay too complicated for yours truly.

Don’t these people understand that they are dealing with someone whose attention span has been shortened by the internet to slightly shorter than that of a gnat?

No no, no “steps” for me. No projects, no categories, no classification whatsoever.

No mahogany drawers that make nice encouraging snappy noises when you close them.

No fountain pens.

No paper.

As a matter of fact, no saving.

As a matter of fact, barely any justification for writing whatsoever.

Enforced throw-away-ness.

The ultimate in lack of self-respect.

Harnessing self-deprecation for better hallway vision!

I give you, FIVE THINGS DONE BITCH.

When you look at it, and think, “wait, it has no features…”

…That’s the point.
(And when you think, “it’s very rude,” that’s me, talking to myself.)

I swear, it works.

birds/aves

August 17th, 2007

blackbird       melro

canary  canário

crow    corvo

cuckoo  cuco

dove    pomba

duck    pato

eagle   águia

falcon  falcão

flamingo        flamingo

goose   ganso

seagull gaivota

hawk    gavião

jay     gralha

mallard pato-real

ostrich avestruz

owl     coruja

parakeet        periquito

parrot  papagaio

pelican pelicano

penguin pinguim

pheasant        faisão

raven   corvo

rooster galo

sparrow pardal

stork   cegonha

swallow andorinha

swan    cisne

turkey  peru

vulture abutre

woodpecker      pica-pau

wren    carriça

no particular reason whatsoever, except that i tried to learn them.

guantes

July 27th, 2007

Me: Excuse me, do you have any guantes?

Shop girl: Um… what?

Me: Guantes.

*Shop girl stares at me, confused*

Me: You know, those things you put on your hands when you’re cleaning…

Shop girl: Do you mean luvas [gloves]? Like these here?

YES, FRIENDS AND NEIGHBORS, IT’S TRUE! YOU CAN TRAVEL ALL OF SOUTH AMERICA SPEAKING NOTHING BUT PORTUNHOL!