<?xml version="1.0" encoding="utf-8"?>
<!-- generator="wordpress/2.0.5" -->
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>infundibulum</title>
	<link>http://ruphus.com/blog</link>
	<description>mostly stuff about language.</description>
	<pubDate>Fri, 15 Feb 2008 02:05:38 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.0.5</generator>
	<language>en</language>
			<item>
		<title>Dear Valentine&#8217;s day</title>
		<link>http://ruphus.com/blog/2008/02/14/dear-valentines-day/</link>
		<comments>http://ruphus.com/blog/2008/02/14/dear-valentines-day/#comments</comments>
		<pubDate>Fri, 15 Feb 2008 02:05:38 +0000</pubDate>
		<dc:creator>pat</dc:creator>
		
		<category>Whatever</category>

		<guid isPermaLink="false">http://ruphus.com/blog/2008/02/14/dear-valentines-day/</guid>
		<description><![CDATA[go away, kthx
]]></description>
			<content:encoded><![CDATA[<p>go away, kthx</p>
]]></content:encoded>
			<wfw:commentRss>http://ruphus.com/blog/2008/02/14/dear-valentines-day/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Transliteration Project</title>
		<link>http://ruphus.com/blog/2007/11/09/transliteration-project/</link>
		<comments>http://ruphus.com/blog/2007/11/09/transliteration-project/#comments</comments>
		<pubDate>Sat, 10 Nov 2007 06:50:02 +0000</pubDate>
		<dc:creator>pat</dc:creator>
		
		<category>Uncategorized</category>

		<category>Language</category>

		<guid isPermaLink="false">http://ruphus.com/blog/2007/11/09/transliteration-project/</guid>
		<description><![CDATA[(Warning: I am way tired right now, but I wanted to get this down&#8230;)
I have long been interested in transliteration:
mundotype was my first stab at all that, and it kind of works, for a few languages. But let me tell you, building that transliteration map for Amharic was no walk in the park (and it [...]]]></description>
			<content:encoded><![CDATA[<p><small>(Warning: I am way tired right now, but I wanted to get this down&#8230;)</small></p>
<p>I have long been interested in transliteration:</p>
<p><a href="http://ruphus.com/mundotype/">mundotype</a> was my first stab at all that, and it kind of works, for a few languages. But let me tell you, building that transliteration map for Amharic was no walk in the park (and it was mostly thanks to my pals Daniel Yacob and Ephrem Menji that I got anywhere!)</p>
<p>My goal remains the same: I want to create (or see someone else create, whatevarr) a JavaScript-based transliteration input system that covers a WHOLE BUNCH of languages, with a consistent, easy-to-understand format for writing and editing rules.</p>
<p>But even better than all that would be coming up with an automated way to <em>infer</em> the rules in the first place.</p>
<p>That&#8217;s what I&#8217;ve been playing with. </p>
<p>Eventually this should end up on my &#8220;serious&#8221; blog, <a title="Hacklog: Blogamundo" href="http://blogamundo.net/dev/">over at Blogamundo</a>, but I&#8217;ve become a little self-conscious about just rambling there ever since <a title="Planet I18n" href="http://www.w3.org/International/planet/">Planet I18n</a> came into being; I&#8217;d really rather post there when I have something that&#8217;s distributable.</p>
<p>It&#8217;s late right now but let me give you the 5-minute rundown of where I am:</p>
<p>What&#8217;s a transliteration?</p>
<p><a title="Transliteration - Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Transliteration">Ask Wikipedia</a>. For example: <em>Epictetus</em> is a transliteration of the Greek name <em>Επίκτητος</em> into the Roman alphabet. </p>
<p>I have some code that goes through <a title="Database dump progress" href="http://download.wikimedia.org/backup-index.html">Wikipedia dumps</a> and extracts all the <a title="Help:Interwiki linking - Meta" href="http://meta.wikimedia.org/wiki/Help:Interwiki_linking">interwiki links</a> and article titles, and spits out gigantic &#8220;lexicons.&#8221; </p>
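<p>To give a feel for the extraction step, here&#8217;s a minimal sketch in Python. It is not the actual code from the repository: it assumes the interwiki links sit in the article wikitext in the form <code>[[el:Title]]</code>, and the names <code>INTERWIKI</code> and <code>interwiki_links</code> are made up for illustration.</p>

```python
import re

# Assumption: an interwiki link looks like [[el:Title]] in the article wikitext,
# with a short lowercase language code before the colon.
INTERWIKI = re.compile(r"\[\[([a-z]{2,3}(?:-[a-z]+)*):([^\[\]|]+)\]\]")


def interwiki_links(wikitext, languages):
    """Return {language code: title} for the requested languages only."""
    links = {}
    for code, title in INTERWIKI.findall(wikitext):
        if code in languages:
            links[code] = title.strip()
    return links


# Hypothetical fragment of the English article on Austria.
sample_text = (
    "Austria is a country in Central Europe.\n"
    "[[Category:Austria]]\n"
    "[[de:Österreich]]\n"
    "[[el:Αυστρία]]\n"
)

links = interwiki_links(sample_text, {"de", "el"})
```

<p>Pair each article title with the titles found this way and you get one lexicon line per language.</p>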
<p>Here&#8217;s an example where I grepped out a Greek/English lexicon (the original has a bazillion languages):</p>
<p><a href="http://ruphus.com/svn/translit/en2el.txt">http://ruphus.com/svn/translit/en2el.txt</a></p>
<p>Which has 2432 lines, with stuff like this:</p>
<p>Archaeology	Αρχαιολογία<br />
Austria	Αυστρία<br />
Australia	Αυστραλία<br />
ASCII	ASCII<br />
Africa	Αφρική</p>
<p>Now, some of these are &#8220;transliterations&#8221; and some are &#8220;translations&#8221;, and in the case of <em>ASCII</em> (oh, the irony), a straight-out borrowing in the original script.</p>
<p>(By the way, the distinction between &#8220;translation&#8221; and &#8220;transliteration&#8221; is kind of blurry if you start thinking too hard&#8230; fortunately, I don&#8217;t.)</p>
<p>Having hacked thru the first few chapters of &#8220;Teach Yourself Greek,&#8221; I can surmise that the pairs <em>Austria/Αυστρία</em> and <em>Australia/Αυστραλία</em> look like &#8220;perfect&#8221; transliterations. </p>
<p>And by &#8220;perfect,&#8221; I mean: </p>
<ol>
<li>Both words in the pair are the same length</li>
<li>Both words in the pair have the same &#8220;letter pattern&#8221;</li>
</ol>
<p>(&#8220;Perfect&#8221; is just an arbitrary designation.)</p>
<p>It&#8217;s #2 that I&#8217;ve been thinking about, and getting some results with. It involves &#8220;patternizing&#8221; a word, and you do that like this:</p>
<p><strong>Replace each letter in the word with the numeric index of the <em>first</em> occurrence of the letter in the word.</strong> </p>
<p>Examples:</p>
<p>cat  →  012<br />
asia  →  0120<br />
Ασία  →  0123<br />
Ωκεανός  →  0123456<br />
Βιόσφαιρα  →  012345175</p>
<p>Get it?</p>
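<p>Here&#8217;s roughly what that looks like in Python. This is a minimal sketch, not the code from the repository; the names <code>patternize</code> and <code>pattern_string</code> are made up for illustration, and it assumes each letter is a single precomposed Unicode character.</p>

```python
def patternize(word):
    """Replace each letter with the index of its first occurrence in the word."""
    first_seen = {}
    # setdefault stores the index the first time a letter shows up and
    # returns the stored (first) index on every later occurrence.
    return [first_seen.setdefault(letter, index) for index, letter in enumerate(word)]


def pattern_string(word):
    """Same pattern, joined into a digit string like the examples above."""
    return "".join(str(index) for index in patternize(word))


for word in ["cat", "asia", "Ασία", "Ωκεανός", "Βιόσφαιρα"]:
    print(word, "→", pattern_string(word))
```

<p>The list form is the safer one to compare: once a word is longer than ten letters the digit string gets ambiguous, because index 10 prints as two digits.</p>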
<p>Interestingly, this simple trick is very good at helping to find transliterated words. All I do is go through the word pairs in that file at the top, and check to see if both words produce the same pattern. </p>
<p>Check out some results:</p>
<p><a href="http://ruphus.com/svn/translit/matches-en2el.txt">http://ruphus.com/svn/translit/matches-en2el.txt</a></p>
<p>Croatia<br />
0123453<br />
Κροατία</p>
<p>Cyclades<br />
01234567<br />
Κυκλάδες</p>
<p>Dance<br />
01234<br />
Χορός</p>
<p>Kilo<br />
0123<br />
Κιλό</p>
<p>Keflavík<br />
01234567<br />
Κεφλαβίκ</p>
<p>Methanol<br />
01234567<br />
Μεθανόλη</p>
<p>Montreal<br />
01234567<br />
Μόντρεαλ</p>
<p>For one thing, there are some mistakes. &#8220;Χορός&#8221; is no transliteration of &#8220;Dance,&#8221; it&#8217;s a translation. But mostly transliterated things come up &#8212; notice all the place and personal names?</p>
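<p>The filtering step can be sketched like this. It assumes each lexicon line is one tab-separated pair, as in the excerpt above; <code>matching_pairs</code> is a hypothetical name, and the real script may well do it differently.</p>

```python
def patternize(word):
    """Replace each letter with the index of its first occurrence in the word."""
    first_seen = {}
    return [first_seen.setdefault(letter, index) for index, letter in enumerate(word)]


def matching_pairs(lines):
    """Yield (source, target) pairs whose letter patterns are identical."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2:
            continue  # skip anything that is not a simple two-column line
        source, target = fields
        # Equal patterns imply equal length, so this covers both criteria.
        if patternize(source) == patternize(target):
            yield source, target


# A few lines in the assumed tab-separated format.
sample_lines = [
    "Archaeology\tΑρχαιολογία\n",
    "Austria\tΑυστρία\n",
    "Dance\tΧορός\n",
]

matches = list(matching_pairs(sample_lines))
```

<p>Note that <em>Dance/Χορός</em> still gets through: any two words of the same length with no repeated letters share a pattern, which is exactly where the false positives come from.</p>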
<p>So from there, I zip up these pairs of words into pairs of letters, like this:</p>
<p>T Τ<br />
r ρ<br />
o ό<br />
f φ<br />
a α</p>
<p>And </p>
<p>K Κ<br />
e ε<br />
f φ<br />
l λ<br />
a α<br />
v β<br />
í ί<br />
k κ</p>
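<p>In Python the zipping is basically one call to <code>zip</code>. A small sketch, where <code>letter_pairs</code> is a made-up name and the NFC normalization is my assumption, added so that accented letters stay one character each:</p>

```python
import unicodedata


def letter_pairs(source, target):
    """Pair up the letters of two words that have the same pattern."""
    # Normalize to NFC so an accented letter is one code point, not letter + accent.
    source = unicodedata.normalize("NFC", source)
    target = unicodedata.normalize("NFC", target)
    return list(zip(source, target))


pairs = letter_pairs("Keflavík", "Κεφλαβίκ")
for latin, greek in pairs:
    print(latin, greek)
```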
<p>Rinse and repeat for every pair in the list, do a bit of frequency-based manipulatin&#8217;, and you get something that looks like this:</p>
<p><a href="http://ruphus.com/svn/translit/schema-en2el.txt">http://ruphus.com/svn/translit/schema-en2el.txt</a></p>
<p>Which is incomplete and imperfect, but pretty damn good for <em>zero</em> linguistic knowledge beforehand, aside from the lexicon.</p>
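<p>The frequency step could be as simple as counting. The sketch below is one guess at it, not the method actually used to build the schema file: it lowercases everything, counts how often each source letter lines up with each target letter, and keeps the most frequent one. <code>infer_schema</code> is a hypothetical name.</p>

```python
from collections import Counter, defaultdict


def infer_schema(word_pairs):
    """Map each source letter to the target letter it is most often paired with."""
    counts = defaultdict(Counter)
    for source, target in word_pairs:
        # Assumption: case is ignored, so K/Κ and k/κ count as the same rule.
        for source_letter, target_letter in zip(source.lower(), target.lower()):
            counts[source_letter][target_letter] += 1
    return {
        source_letter: targets.most_common(1)[0][0]
        for source_letter, targets in counts.items()
    }


# Pairs taken from the matches listed above.
word_pairs = [
    ("Kilo", "Κιλό"),
    ("Keflavík", "Κεφλαβίκ"),
    ("Croatia", "Κροατία"),
]

schema = infer_schema(word_pairs)
```

<p>With more pairs, the occasional false match like <em>Dance/Χορός</em> tends to get outvoted by the real transliterations, which is the whole reason for counting.</p>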
<p>More soon. </p>
<p>(digraphs are a thorny problem, for one thing&#8230;)
</p>
]]></content:encoded>
			<wfw:commentRss>http://ruphus.com/blog/2007/11/09/transliteration-project/feed/</wfw:commentRss>
		</item>
		<item>
		<title>i was thinking about transliteration</title>
		<link>http://ruphus.com/blog/2007/11/07/i-was-thinking-about-transliteration/</link>
		<comments>http://ruphus.com/blog/2007/11/07/i-was-thinking-about-transliteration/#comments</comments>
		<pubDate>Wed, 07 Nov 2007 23:30:46 +0000</pubDate>
		<dc:creator>pat</dc:creator>
		
		<category>Translation</category>

		<category>Language</category>

		<category>Python</category>

		<guid isPermaLink="false">http://ruphus.com/blog/2007/11/07/i-was-thinking-about-transliteration/</guid>
		<description><![CDATA[again
and i wrote 30 lines of python about it.
$svn co http://ruphus.com/svn/translit/
if you are bored and/or curious.

]]></description>
			<content:encoded><![CDATA[<p>again</p>
<p>and i wrote 30 lines of python about it.</p>
<p><code>$svn co http://ruphus.com/svn/translit/</code></p>
<p>if you are bored and/or curious.
</p>
]]></content:encoded>
			<wfw:commentRss>http://ruphus.com/blog/2007/11/07/i-was-thinking-about-transliteration/feed/</wfw:commentRss>
		</item>
		<item>
		<title>jQuery junk</title>
		<link>http://ruphus.com/blog/2007/10/30/jquery-junk/</link>
		<comments>http://ruphus.com/blog/2007/10/30/jquery-junk/#comments</comments>
		<pubDate>Wed, 31 Oct 2007 02:38:03 +0000</pubDate>
		<dc:creator>pat</dc:creator>
		
		<category>Javascript</category>

		<category>jQuery</category>

		<guid isPermaLink="false">http://ruphus.com/blog/2007/10/30/jquery-junk/</guid>
		<description><![CDATA[I put up a bunch of junk in a directory with jQuery stuff. It&#8217;s largely broken experiments. JUST FOR YOU. jQuery

]]></description>
			<content:encoded><![CDATA[<p>I put up a bunch of junk in a directory with <a href="http://jquery.com">jQuery</a> stuff. It&#8217;s largely broken experiments. JUST FOR YOU. <a href="http://ruphus.com/code/jquery/">jQuery</a>
</p>
]]></content:encoded>
			<wfw:commentRss>http://ruphus.com/blog/2007/10/30/jquery-junk/feed/</wfw:commentRss>
		</item>
		<item>
		<title>Facebook Groups</title>
		<link>http://ruphus.com/blog/2007/10/25/facebook-groups/</link>
		<comments>http://ruphus.com/blog/2007/10/25/facebook-groups/#comments</comments>
		<pubDate>Fri, 26 Oct 2007 07:48:05 +0000</pubDate>
		<dc:creator>pat</dc:creator>
		
		<category>Uncategorized</category>

		<guid isPermaLink="false">http://ruphus.com/blog/2007/10/25/facebook-groups/</guid>
		<description><![CDATA[Back in gringolândia, I guess I&#8217;ll start speaking gringuês again. Man, I miss Brazil.
Tonight I went to Starbucks, where I was reading a book. I had a few conversations. But it&#8217;s sort of weird trying to start conversations with random people. Especially if they&#8217;re all face down in their laptops (and lattes).
Thing is, though, other [...]]]></description>
			<content:encoded><![CDATA[<p><em>Back in gringolândia, I guess I&#8217;ll start speaking gringuês again. Man, I miss Brazil.</em></p>
<p>Tonight I went to Starbucks, where I was reading <a title="Joe Torre and the psychology of persuasion - Boing Boing" href="http://www.boingboing.net/2007/10/22/joe-torre-and-the-ps.html">a book</a>. I had a few conversations. But it&#8217;s sort of weird trying to start conversations with random people. Especially if they&#8217;re all face down in their laptops (and lattes).</p>
<p>Thing is, though, other people have to be thinking the same thing: &#8220;Why am I so damn popular on Facebook but have no one to talk to at Starbucks??&#8221;</p>
<p>Or something.</p>
<p>You know how Facebook groups mostly suck? They&#8217;re just like Orkut groups. Or Friendster groups. People go there to be identified, and then they&#8217;re like&#8230; uh, what now? Because being off-topic seems pretty retarded in a group that&#8217;s defined by having a ridiculously specific topic.</p>
<p>It makes a lot more sense to &#8220;be identified&#8221; with reference to a place that&#8217;s&#8230; you know&#8230; social.</p>
<p>What I&#8217;m getting at is, when I got home, I wished there was a Facebook group (or something) for that one particular Starbucks.</p>
<p>(Okay, mainly so I could have the huevos to message up that one girl with the German accent.)</p>
<p>Does anyone know what I&#8217;m trying to say? Why doesn&#8217;t every <em>place</em> in the world have a <em>place</em> online, that everyone knows about?
</p>
]]></content:encoded>
			<wfw:commentRss>http://ruphus.com/blog/2007/10/25/facebook-groups/feed/</wfw:commentRss>
		</item>
		<item>
		<title>No, seriously</title>
		<link>http://ruphus.com/blog/2007/09/18/nao-e-serio/</link>
		<comments>http://ruphus.com/blog/2007/09/18/nao-e-serio/#comments</comments>
		<pubDate>Tue, 18 Sep 2007 19:07:06 +0000</pubDate>
		<dc:creator>pat</dc:creator>
		
		<category>Heh</category>

		<category>Gringo no Brasil</category>

		<guid isPermaLink="false">http://ruphus.com/blog/2007/09/18/nao-e-serio/</guid>
		<description><![CDATA[Before reading this, you have to imagine a 92-year-old uncle. 
Me: Hey Uncle, you know what I read in the paper today? A meteor fell in Peru.
Uncle: Poor turkey. (In Portuguese, peru also means &#8220;turkey.&#8221;)

]]></description>
			<content:encoded><![CDATA[<p><em>Before reading this, you have to imagine a 92-year-old uncle. </em><br />
<strong>Me: </strong>Hey Uncle, you know what I read in the paper today? A meteor fell in Peru.</p>
<p><strong>Uncle: </strong>Poor turkey. <small>(In Portuguese, <em>peru</em> also means &#8220;turkey.&#8221;)</small>
</p>
]]></content:encoded>
			<wfw:commentRss>http://ruphus.com/blog/2007/09/18/nao-e-serio/feed/</wfw:commentRss>
		</item>
		<item>
		<title>pretty much</title>
		<link>http://ruphus.com/blog/2007/08/26/pretty-much/</link>
		<comments>http://ruphus.com/blog/2007/08/26/pretty-much/#comments</comments>
		<pubDate>Mon, 27 Aug 2007 06:56:40 +0000</pubDate>
		<dc:creator>pat</dc:creator>
		
		<category>Whatever</category>

		<guid isPermaLink="false">http://ruphus.com/blog/2007/08/26/pretty-much/</guid>
		<description><![CDATA[After the Sept. 11, 2001, attacks, Breyer said: &#8220;I began to see that the true division of importance in the world is not between different countries. The important division is between those who are committed to reason, to working out things, to understanding other people, to peaceful resolution of their differences &#8230; and those who [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Breyer Says Last Term Was Difficult" href="http://www.wtopnews.com/?nid=116&#038;sid=1216224">After the Sept. 11, 2001, attacks, Breyer said: &#8220;I began to see that the true division of importance in the world is not between different countries. The important division is between those who are committed to reason, to working out things, to understanding other people, to peaceful resolution of their differences &#8230; and those who don&#8217;t think that.&#8221;  </a>
</p>
]]></content:encoded>
			<wfw:commentRss>http://ruphus.com/blog/2007/08/26/pretty-much/feed/</wfw:commentRss>
		</item>
		<item>
		<title>GTD can kiss my ass: FTDB</title>
		<link>http://ruphus.com/blog/2007/08/19/gtd-can-kiss-my-ass-ftdb/</link>
		<comments>http://ruphus.com/blog/2007/08/19/gtd-can-kiss-my-ass-ftdb/#comments</comments>
		<pubDate>Mon, 20 Aug 2007 01:49:17 +0000</pubDate>
		<dc:creator>pat</dc:creator>
		
		<category>Heh</category>

		<guid isPermaLink="false">http://ruphus.com/blog/2007/08/19/gtd-can-kiss-my-ass-ftdb/</guid>
		<description><![CDATA[Yes children, GTD is waaay too complicated for yours truly.
Don&#8217;t these people understand that they are dealing with someone whose attention span has been shortened by the internet to slightly shorter than that of a gnat?
No no, no &#8220;steps&#8221; for me. No projects, no categories, no classification whatsoever.
No mahogany drawers that make nice encouraging snappy [...]]]></description>
			<content:encoded><![CDATA[<p>Yes children, GTD is waaay too complicated for yours truly.</p>
<p>Don&#8217;t these people understand that they are dealing with someone whose attention span has been shortened by the internet to slightly shorter than that of a gnat?</p>
<p>No no, no &#8220;steps&#8221; for me. No projects, no categories, no classification whatsoever.</p>
<p>No mahogany drawers that make nice encouraging snappy noises when you close them.</p>
<p>No fountain pens.</p>
<p>No paper.</p>
<p>As a matter of fact, no saving.</p>
<p>As a matter of fact, barely any justification for writing whatsoever.</p>
<p>Enforced throw-away-ness.</p>
<p>The ultimate in lack of self-respect.</p>
<p>Harnessing self-deprecation for better hallway vision!</p>
<p>I give you, <a title="FTDB" href="http://ruphus.com/ftdb/">FIVE THINGS DONE BITCH</a>.</p>
<p>When you look at it, and think, &#8220;wait, it has no features&#8230;&#8221;</p>
<p>&#8230;That&#8217;s the point.<br />
(And when you think, &#8220;it&#8217;s very rude,&#8221;  that&#8217;s me, talking to myself.)</p>
<p>I swear, it works.
</p>
]]></content:encoded>
			<wfw:commentRss>http://ruphus.com/blog/2007/08/19/gtd-can-kiss-my-ass-ftdb/feed/</wfw:commentRss>
		</item>
		<item>
		<title>birds/aves</title>
		<link>http://ruphus.com/blog/2007/08/17/birdsaves/</link>
		<comments>http://ruphus.com/blog/2007/08/17/birdsaves/#comments</comments>
		<pubDate>Sat, 18 Aug 2007 04:36:05 +0000</pubDate>
		<dc:creator>pat</dc:creator>
		
		<category>Translation</category>

		<category>Gringo no Brasil</category>

		<guid isPermaLink="false">http://ruphus.com/blog/2007/08/17/birdsaves/</guid>
		<description><![CDATA[blackbird       melro
canary  canário
crow    corvo
cuckoo  cuco
dove    pomba
duck    pato
eagle   águia
falcon  falcão
flamingo        flamingo
goose   ganso
seagull gaivota
hawk    gavião
jay     gralha
mallard pato-real
ostrich avestruz
owl     coruja
parakeet        periquito
parrot  papagaio
pelican pelicano
penguin pinguim
pheasant        faisão
raven   corvo
rooster galo
sparrow pardal
stork   cegonha
swallow andorinha
swan    cisne
turkey  peru
vulture abutre
woodpecker      pica-pau
wren    carriça
no particular reason whatsoever, except that i tried to learn them.

]]></description>
			<content:encoded><![CDATA[<p>blackbird       melro</p>
<p>canary  canário</p>
<p>crow    corvo</p>
<p>cuckoo  cuco</p>
<p>dove    pomba</p>
<p>duck    pato</p>
<p>eagle   águia</p>
<p>falcon  falcão</p>
<p>flamingo        flamingo</p>
<p>goose   ganso</p>
<p>seagull gaivota</p>
<p>hawk    gavião</p>
<p>jay     gralha</p>
<p>mallard pato-real</p>
<p>ostrich avestruz</p>
<p>owl     coruja</p>
<p>parakeet        periquito</p>
<p>parrot  papagaio</p>
<p>pelican pelicano</p>
<p>penguin pinguim</p>
<p>pheasant        faisão</p>
<p>raven   corvo</p>
<p>rooster galo</p>
<p>sparrow pardal</p>
<p>stork   cegonha</p>
<p>swallow andorinha</p>
<p>swan    cisne</p>
<p>turkey  peru</p>
<p>vulture abutre</p>
<p>woodpecker      pica-pau</p>
<p>wren    carriça<br />
<em>no particular reason whatsoever, except that i tried to learn them.</em>
</p>
]]></content:encoded>
			<wfw:commentRss>http://ruphus.com/blog/2007/08/17/birdsaves/feed/</wfw:commentRss>
		</item>
		<item>
		<title>guantes</title>
		<link>http://ruphus.com/blog/2007/07/27/guantes/</link>
		<comments>http://ruphus.com/blog/2007/07/27/guantes/#comments</comments>
		<pubDate>Fri, 27 Jul 2007 21:36:48 +0000</pubDate>
		<dc:creator>pat</dc:creator>
		
		<category>Language</category>

		<category>Heh</category>

		<guid isPermaLink="false">http://ruphus.com/blog/2007/07/27/guantes/</guid>
		<description><![CDATA[Me: Excuse me, do you happen to have guantes?
Shop girl: Um&#8230; what?
Me: Guantes.
*Shop girl stares at me, confused*
Me: You know, those things you put on your hands when you&#8217;re cleaning&#8230;
Shop girl: Could you be talking about [...]]]></description>
			<content:encoded><![CDATA[<p><strong>Me:</strong> Excuse me, do you happen to have guantes?</p>
<p><strong>Shop girl:</strong> Um&#8230; <em>what?</em></p>
<p><strong>Me:</strong> Guantes.</p>
<p><strong>*Shop girl</strong> stares at me, confused*</p>
<p><strong>Me:</strong> You know, those things you put on your hands when you&#8217;re cleaning&#8230;</p>
<p><strong>Shop girl</strong>: Could you be talking about luvas? Like these here?</p>
<p>YES, FRIENDS AND NEIGHBORS, IT&#8217;S TRUE! YOU CAN TRAVEL ALL OF SOUTH AMERICA SPEAKING NOTHING BUT PORTUNHOL!
</p>
]]></content:encoded>
			<wfw:commentRss>http://ruphus.com/blog/2007/07/27/guantes/feed/</wfw:commentRss>
		</item>
	</channel>
</rss>
