i was thinking about transliteration
November 7th, 2007
again
and i wrote 30 lines of python about it.
$ svn co http://ruphus.com/svn/translit/
if you are bored and/or curious.
It suddenly occurred to me today that I really don’t know what the right way to install Python packages is, any more.
This state of affairs drives me inSANE.
I’m just a run of the mill guy who uses Python for lots of stuff. And increasingly, I find that I just don’t bother to figure out how to install things that I might like to try, because I really just don’t feel like taking the time to install yet another installation system.
Maybe I sound like a whiney bastard or something, I don’t know. Maybe there’s something utterly heinous about $ python setup.py install that my little brain just doesn’t get, I dunno.
What I DO know is that I now officially loathe installing Python stuff. That means that I’m installing less Python stuff. Which means that I’m increasingly looking for Ruby stuff.
Which bums me out, because personally I like Python way more than Ruby.
Sometimes you’re programming, and something spooky happens:
>>> os.listdir('.')
You see the face? OMG!
Interesting talk on statistics (no, really) from TED:
Peter Donnelly gives some evidence that people are bad at estimating probabilities. One example he gives is the following:
Given a fair coin, how many times do you need to flip the coin to produce the pattern HTH? How many times to produce HTT?
Seems like they should be equal, right?
They’re not.
from random import choice

"""
I watched this guy's talk:
http://tedblog.typepad.com/tedblog/statistics/index.html
And didn't believe him.
Now I believe him.
"""

def avg(seq): return float(sum(seq)) / len(seq)

def game(pattern):
    """count number of flips required to produce pattern"""
    s = ''
    i = 0
    while not s.endswith(pattern):
        i += 1
        s += choice(['h', 't'])
    return i

def tournament(pattern, numgames):
    """average many games"""
    return pattern, avg([game(pattern) for i in range(numgames)])

for pattern in ['hth', 'htt']: print(tournament(pattern, 100000))
Here’s the output:
$ python coins.py
('hth', 10.009219999999999)
('htt', 7.99892)
The explanation in the video is pretty good, but it still makes my eyes cross because for some reason I want ‘htt’ to be less common than ‘hth’, which is exactly wrong.
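For what it's worth, the simulated averages sit right next to exact values. This is not from the talk: it is a sketch of a standard result for fair coins, which says the expected number of flips is the sum of 2^k over every length k at which the pattern's first k letters equal its last k letters. The function name expected_flips is my own. 'hth' overlaps itself at k=1 (the leading and trailing 'h') as well as at k=3, which is exactly where the extra two flips come from.

```python
def expected_flips(pattern):
    """Exact expected number of fair-coin flips until pattern first appears.

    Sums 2**k over every k where the first k letters equal the last k letters.
    """
    n = len(pattern)
    return sum(2 ** k for k in range(1, n + 1) if pattern[:k] == pattern[-k:])

for pattern in ['hth', 'htt']:
    print(pattern, expected_flips(pattern))
# hth 10
# htt 8
```

So 'htt' really does show up sooner on average, 8 flips against 10, which matches the simulation.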
I mean, a way to explain them in terms of a real, meaningful distinction that is actually applicable while you’re coding. Rather than the operational sorts of explanations that seem to float around, like “You use them when you’re sorting a dictionary.”
Read all about it:
James Tauber : Python Tuples are Not Just Constant Lists
By the way, James Tauber writes killer Python, and if you’re looking for some stuff to study I suggest his work. This post, in particular, is brain-bendy.
Something I’ve never understood about “beginner” tutorials in programming:
LiveWires 2005 : Python for the Inexperienced
This is a really nice tutorial, with interesting examples. But check out this bit:
Ranges
As you can see, Python’s loops are a bit different from BASIC’s and C’s. (And from those in most other languages, too. Lisp has something similar.) Instead of giving a range of values (as in BASIC), or a recipe for getting from each value to the next one (as in C), you give a list of values. Obviously this is more flexible; but what if you want to get the same effect as BASIC’s FOR loops?
Well, no, actually, a beginner never has heard of BASIC. Or (!) C. Or Lisp.
The Python documentation on Python.org is filled with stuff like this. Consider the first sentence in the “tutorial” “introduction” to Classes:
Python’s class mechanism adds classes to the language with a minimum of new syntax and semantics. It is a mixture of the class mechanisms found in C++ and Modula-3.
I mean, whatever, I guess there are plenty of people out there who are already programmers who will appreciate info like that.
About Modula-3.
In an introduction to Python.
But there are an awful lot more people who are not, and who won’t.
Python seems to me to be the best choice for teaching programming, but outside of an O’Reilly book or something, I don’t think there’s much well-and-truly “for beginners” tutorial stuff around at all.
Not that I’m saying such stuff is easy to write — on the contrary, it’s mindbogglingly difficult. And I’ve tried.
Which is why I’m happy to discover an awesome wiki started by Fredrik Lundh:
((An Unofficial) Python Tutorial Wiki)
Here’s the intro to Classes, not a Modula in sight:
Classes (introduction) ((An Unofficial) Python Tutorial Wiki)
Rad.
Boing Boing: Nontransitive dice — how to win every time
This is sooo counterintuitive that I had to convince myself by writing a program (in Python, as it happens):
from random import choice
import sys

verbose = 1

# the four nontransitive dice, one list of faces per die
dice = {
    'A' : [0,0,4,4,4,4],
    'B' : [3,3,3,3,3,3],
    'C' : [6,6,2,2,2,2],
    'D' : [5,5,5,1,1,1]
}

# sucker's die -> the die the con picks to beat it
trick = {
    'B' : 'A',
    'C' : 'B',
    'D' : 'C',
    'A' : 'D'
}

scoreboard = {
    'matches' : 0,
    'sucker' : 0,
    'con' : 0
}

def match():
    scoreboard['matches'] += 1
    sucker = choice(list(dice.keys()))
    # roll once, so the roll that gets printed is the roll that gets compared
    sucker_roll = choice(dice[sucker])
    if verbose: print('sucker rolls %d on %s' % (sucker_roll, sucker))
    con = trick[sucker]
    con_roll = choice(dice[con])
    if verbose: print('con rolls %d on %s' % (con_roll, con))
    if sucker_roll > con_roll:
        scoreboard['sucker'] += 1
    else:
        scoreboard['con'] += 1

def score():
    # NB: these are fractions of matches (0.67 means 67%), despite the % sign
    print('\n\nTotal:')
    print("sucker wins: %.2f%% of matches" % (scoreboard['sucker'] / float(scoreboard['matches'])))
    print("con wins: %.2f%% of matches" % (scoreboard['con'] / float(scoreboard['matches'])))

for i in range(int(sys.argv[1])):
    match()
score()
Lo and behold:
$ python ntdice.py 1000
sucker rolls 5 on D
con rolls 6 on C
sucker rolls 2 on C
con rolls 3 on B
sucker rolls 5 on D
con rolls 2 on C
sucker rolls 1 on D
con rolls 6 on C
...
sucker rolls 1 on D
con rolls 2 on C
sucker rolls 3 on B
con rolls 0 on A
sucker rolls 2 on C
con rolls 3 on B

Total:
sucker wins: 0.33% of matches
con wins: 0.67% of matches
I suppose proving it to myself would involve some sigmas.
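Short of sigmas, there are only 36 face pairings per matchup, so the odds can simply be counted. Here is a minimal sketch that reuses the dice and trick tables from the program above. The helper name win_probability is my own, and it counts a win only when the con's face is strictly higher (with these dice a tie can never happen anyway).

```python
from fractions import Fraction
from itertools import product

dice = {
    'A': [0, 0, 4, 4, 4, 4],
    'B': [3, 3, 3, 3, 3, 3],
    'C': [6, 6, 2, 2, 2, 2],
    'D': [5, 5, 5, 1, 1, 1],
}

# sucker's die -> the die the con picks in response
trick = {'B': 'A', 'C': 'B', 'D': 'C', 'A': 'D'}

def win_probability(con, sucker):
    """Exact chance that the con's die shows a higher face than the sucker's."""
    pairs = list(product(dice[con], dice[sucker]))
    wins = sum(1 for c, s in pairs if c > s)
    return Fraction(wins, len(pairs))

for sucker in sorted(dice):
    con = trick[sucker]
    print('%s beats %s with probability %s' % (con, sucker, win_probability(con, sucker)))
```

Every matchup comes out to 2/3 for the con, which is what the simulated 0.67 was circling around.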
Python is my favorite language.
But I have an axe to grind, here.
from xml.sax import ContentHandler

class MyHandler(ContentHandler):
    def __init__(self):
This is the typical beginning to using Python’s standard xml.sax module, which implements a SAX parser.
But that’s sort of irrelevant. What I think is confusing is the class syntax:
class MyHandler(ContentHandler):
Call me crazy, but in that line ContentHandler looks like a parameter. Everywhere else in Python, and indeed in most languages on the planet, parens mean “this is a parameter.” It doesn’t mean that at all. It means “MyHandler is a kind of ContentHandler.” So if you want to compose a variation on the theme of ContentHandlers, you write MyHandler(ContentHandler).
As far as how the thing is actually called, it’s SIMPLE: you just look in the __init__ method. Because you see, the __init__ method is sort of the constructor. Not really, but sort of. (There is also a __new__ method, and I really haven’t figured out what the hell that does.)
So when you read as far as:
class MyHandler(ContentHandler):
    def __init__
You have to think “to see the way that my ContentHandler will be called, I have to look at what comes after the __init__.” Which is back in the realm of normal, right? Our old friends, the parentheses, who have changed their wayward ways and now really do mean something like “here comes a list of parameters.”
Except, not really.
Well okay yeah really, but the thing is, the first parameter isn’t something that gets passed in when you do the __init__ dance… it refers to “the thing itself.”
It’s “self.” (You can call it anything you want, actually, but you shouldn’t, because to call it anything else would be, you know, confusing.) So the list of parameters there is really only parameter-y from the second argument on. Assuming you have more than one parameter. If there is just self, well… it’s like the self isn’t really there. But you have to have it.
Plain as day.
class MyHandler(ContentHandler):
    def __init__(self):
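To make the moving parts concrete, here is a minimal sketch of the whole dance in current Python 3. The label argument, the tags list and the tiny XML string are all made up for illustration. ContentHandler, startElement and parseString are the real names from the standard xml.sax module.

```python
from xml.sax import ContentHandler, parseString

class MyHandler(ContentHandler):        # parens here mean "is a kind of", not "takes a parameter"
    def __init__(self, label):          # self is filled in by Python; label is the only real argument
        ContentHandler.__init__(self)   # let the parent class set itself up
        self.label = label              # hypothetical attribute, just for the demo
        self.tags = []

    def startElement(self, name, attrs):
        # the parser calls this once for every opening tag it meets
        self.tags.append(name)

# Instantiating: no __init__ in sight, and nothing is passed for self.
handler = MyHandler('demo')
parseString(b'<a><b/></a>', handler)
print(handler.label, handler.tags)
```

So the call is MyHandler('demo'): one argument, even though __init__ lists two names.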
Right, so, let’s review.
MyHandler is a kind of ContentHandler, even though it looks just as if MyHandler were a function and ContentHandler were a parameter. You must banish that intuition because it is wrong.
And then to instantiate the class, you need to have an __init__ (actually, come to think of it, I think I heard somewhere that __init__ is optional, did I mention that? Although people don’t actually seem to leave them out, well… ever). And actually when you initialize the thing, the init isn’t part of the call.
And please, people, two underlines, people, both sides. Kthx.
And you have to unbanish that bit about parameters, because now parentheses mean parameters again.
Except that the first parameter is not a parameter, it is self.
Behold, the self.
Behold, me kicking my SAX parser in the genitals.
Yes, I know that any programmer who is halfway decent will quickly get used to such details and move on with their lives. But I defy anyone to claim that they guessed that this is how all this ever-so-deceptively-simple-looking syntax worked when they first encountered it.
And besides, what good’s a weblog if you can’t whine in it.
Weird thing about Javascript arrays:
d = [['a','1'],['b','2'],['c','3']]
for (i in d) { alert(i) }
Will pop up 0, 1, and 2. That is, iterating through an array returns the index of each element, not the element itself. If you want the elements themselves, you have to dereference them:
for (i in d) { alert(d[i]) }
This is nutty, as far as I can tell. I don’t know of any other programming language that does that. Why would you want to loop through the indexes of an array as the default? Python’s approach for example is what seems normal to me:
>>> d = [['a','1'],['b','2'],['c','3']]
>>> for q in d:
...     print(q)
...
['a', '1']
['b', '2']
['c', '3']
If you want the indexes, of course, you can get them like this:
>>> for i in range(len(d)):
...     print(i)
...
0
1
2
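And if you happen to want the index and the element together, the built-in enumerate() hands back both at once. This is a small sketch written for a current Python 3. The variable name pairs is my own.

```python
d = [['a', '1'], ['b', '2'], ['c', '3']]

# enumerate() yields (index, element) pairs, so neither has to be looked up by hand
pairs = list(enumerate(d))
for i, item in pairs:
    print(i, item)
```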
That is all.
Mark Liberman at Language Log has posted another Language quiz… I love these things: the idea is that he posts an audio clip of a random language, and you pull out all your linguistic stops trying to figure out what it is.
Warning… spoilers ensue.
Or at least, if my guess turns out to be correct, then a spoiler ensues. Otherwise it’s just me babbling nonsensically.
Here’s the (very) loose transcription I came up with:
dem angong LIti shin ti mutani tere su MUtu a su ku la tin doko U SIN ji kata loko tin de SORURISE ke harbitz kinta RO masa zanga zangar.
(I’m a firm believer in using the simplest transcription conceivable when one starts transcribing an unknown language—jumping into using exotic IPA detracts from the goal at hand, at least in my experience.)
I used capitalization to indicate what sounded like tonal variation to me. As soon as I listened to the recording I suspected it was an African language, where tonal languages are rampant, but I couldn’t really tell you why. My first guess was “something Bantu” but now I think that was the wrong family.
The first bit that caught my ear was something like zanga zangar, which is spoken quite clearly. It struck me as pretty hard to transcribe incorrectly (assuming that the language was written in a Roman-letter alphabet, as many African languages are). Furthermore it seemed likely that there was some sort of, um, what’s the technical word, “process” going on there—it looked like reduplication.
So I just stuck it into Google as a phrase, like this.
Golly, that wasn’t too hard: sure looks like Hausa.
So how can we try to verify this theory… well, usually I build a corpus of the language in question and start, uh, poking at it. I hacked together some code to build a corpus using the Yahoo search API, which is quite easy to use. (I’ll post it if anyone asks.)
So anyway, after looking at bigrams and such in the corpus I noticed that zanga zangar is preceded occasionally by masu, and sure enough, another Google search turns up 33 results with the string masu zanga zangar. I’d originally transcribed it as masa.
But whatever, it’s late, you know?
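For the curious, the bigram poking mentioned above amounts to something like the sketch below. It is not the corpus-building code, and it does not touch the Yahoo API at all. The function name preceding_words is my own, and the sample string is invented filler (aaa, bbb, ccc) wrapped around the one phrase from this post, not real Hausa text.

```python
from collections import Counter

def preceding_words(text, phrase):
    """Count which word comes immediately before each occurrence of phrase."""
    words = text.lower().split()
    target = phrase.lower().split()
    n = len(target)
    counts = Counter()
    # start at 1 so there is always a previous word to look at
    for i in range(1, len(words) - n + 1):
        if words[i:i + n] == target:
            counts[words[i - 1]] += 1
    return counts

# invented toy corpus, only here to show the shape of the output
sample = "aaa masu zanga zangar bbb masu zanga zangar ccc zanga zangar"
print(preceding_words(sample, "zanga zangar").most_common())
```

On a real corpus, a word that keeps turning up in front of the phrase is a decent hint about where the transcription went wrong.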