infundibulum

wow

December 7th, 2006

HD 188753 Ab - Wikipedia, the free encyclopedia

Heads and Tails

November 30th, 2006

Interesting talk on statistics (no, really) from TED:

TED Blog: Statistics

Peter Donnelly gives some evidence that people are bad at estimating probabilities. One example he gives is the following:

Given a fair coin, how many times, on average, do you need to flip the coin to produce the pattern HTH? How many times to produce HTT?

Seems like they should be equal, right?

They’re not.

from random import choice

"""
I watched this guy's talk:
http://tedblog.typepad.com/tedblog/statistics/index.html
And didn't believe him.
Now I believe him.
"""

def avg(seq):
    return float(sum(seq)) / len(seq)

def game(pattern):
    """count number of flips required to produce pattern"""
    s = ''
    i = 0
    while not s.endswith(pattern):
        i += 1
        s += choice(['h', 't'])
    return i

def tournament(pattern, numgames):
    """average many games"""
    return pattern, avg([game(pattern) for i in range(numgames)])

for pattern in ['hth', 'htt']:
    print(tournament(pattern, 100000))

Here’s the output:

$ python coins.py
('hth', 10.009219999999999)
('htt', 7.99892)

The explanation in the video is pretty good, but it still makes my eyes cross because for some reason I want ‘htt’ to be less common than ‘hth’, which is exactly wrong.
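For what it's worth, the simulated averages agree with a standard closed-form result for waiting times on a fair coin: add up 2^k for every length k where the first k flips of the pattern are the same as its last k flips. Here is a minimal sketch of that rule. The function name expected_flips is my own, and this is not necessarily how the talk explains it.

```python
def expected_flips(pattern):
    """Expected number of fair-coin flips until `pattern` first appears.

    Uses the self-overlap rule: sum 2**k over every k where the
    length-k prefix of the pattern equals its length-k suffix.
    """
    total = 0
    for k in range(1, len(pattern) + 1):
        # k == len(pattern) always matches (the pattern overlaps itself).
        if pattern[:k] == pattern[-k:]:
            total += 2 ** k
    return total


for pattern in ['hth', 'htt']:
    print(pattern, expected_flips(pattern))
```

'hth' starts and ends with 'h', so it picks up an extra 2 on top of the 8 that every three-flip pattern gets; 'htt' has no such overlap. That gives 10 and 8, which is what the simulation hovers around.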

Two columns of text is not for teh intartubes. No, rly.

November 22nd, 2006

Look!

A brainy article about brainy stuff!

Put into a design made by morons!

Genetic breakthrough that reveals the differences between humans

Seriously, somebody please tell me how it makes sense to force readers on the internet to scroll back up to see a second column of text?

It’s bonkers.

Even the International Herald Tribune has switched to a single column format, after long using the weird three-column thing as a default (which at least doesn’t force you to needlessly scroll).

Emily Oster on death, etc.

November 21st, 2006

The Freakonomics Blog is always worth a read. I’m not really qualified to put their claims into a broad enough context to decide to believe them, but they sure as hell make you think.

A recent post on economist Emily Oster is a good example of the stories they tend to pick up.

Here are her main ideas:

Three Things You Don’t Know About Aids In Africa

  1. It’s the wrong disease to attack.
  2. It won’t disappear until poverty does.
  3. There is less of it than we thought, but it’s spreading as fast as ever.

I find the second reason to be the most compelling. Oster argues that the different reactions to AIDS in Africa, as compared to those in places such as the US, are a function of how people in Africa estimate the quality of their future lives (with or without AIDS). In other words, if you think your life is going to suck anyway, you just can’t work up the motivation to take care of yourself enough to prolong that life.

It’s a harsh assessment, but it seems to me that it’s not at all remote from my own experience. Think about young people and drugs: why did people mostly think that Nancy Reagan’s “Just Say No” campaign was so laughable?

Take the typical example of someone who’s strung out: they’re depressed. Their life sucks. They can imagine no way that it could improve. So, when someone in a frilly outfit comes along and says “just say no,” the rational reply is to say “what the fuck should I do that for?”

Of course, this is just me speculating, and I’m no economist. Good thing people like the Freakonomics twins and Oster are out there pushing buttons.

Things Lakota/Dakota/Sioux. And copyright.

November 14th, 2006

I spent a few hours tonight poking around in the American University library, and as usual I headed for the “P” section… “PM,” as it happened.

That would be languages… Hyperborean, Indian and artificial languages, according to the ever-aleatoric Library of Congress classification. (Ugh, Shirky was right; ontology is overrated.)

The one I ended up reading was Dakota Grammar: With Texts and Ethnography. I didn’t dig too deeply but it looked like a nice, competent, descriptive piece of work. There’s a text of a related language (Omaha) at Project Gutenberg with the interlinear text and everything, from an edition recorded by the same anthropologist: Illustration Of The Method Of Recording Indian Languages by James Owen Dorsey.

Now, here’s an honest question, one to which I don’t know the answer: that book is listed as having been published in 2004. But it was actually first published by the Government Printing Office in 1893. Now, doesn’t that mean that the book is in the public domain? Could I go and scan the whole thing and put it on the web, or would that be (by some reasoning unbeknownst to me) a violation of the Minnesota Historical Society edition?

Also interesting: Tampa, Follow the Stories: Lakota Dictionary

Physics for President

November 9th, 2006

Via the always-worth-a-read Open the Future: PffP syllabus, a whole course online. Good stuff.

Jamais Cascio suggests that the author, who wrote a good course, after all, might have benefitted from something along the lines of Googling for Future Physics Professors.

I’d humbly add a suggestion for Web Design for Future Physics Professors Because Let’s Just Go Ahead and Admit It, Nobody Should have to be Seen in Public inside a Frameset Anymore.

Pick nits? Who, me?

The Problem Is, Serendipity Works

October 18th, 2006

There are a million people in the world who want to tell you how to act. What the principles of effective life are, and crap like that.

Case in point.

The real work is happening in your brain and practically every other place that’s not an inbox. Stop allowing yourself to be brow-beaten by the latest, loudest, or most dramatic item that’s landed in your world.

The problem is, this is patently not true.

Randomly wandering around the internet, nay, pointlessly, obsessively, addictively wandering around the internet is productive. People who think that they will make themselves more efficient by not wandering around pointlessly on the internet are kidding themselves. People have an amazing ability to sort signal from noise.

But the thing is, the more noise there is, the more signal you get.

This is what the “efficiency” crowd doesn’t want to admit, because it means that their systems aren’t more productive than obsessive wandering and clicking through toplists.

A few hours ago (I really don’t even know what time it is), I was screwing around with some CSS to render parallel text; basically I was looking for good ways to mark up a source text and its translation with HTML and CSS.

In the process, I started randomly sticking in sample texts from the first article of the Universal Declaration of Human Rights in a bazillion languages.

One of those languages was Thai, and I saw that the Thai text wasn’t wrapping correctly.

Ah, that old problem. Thai doesn’t use spaces (well, it does, but… erm… I don’t exactly understand when and why), so browsers don’t know where to break long strings of text. (They usually rely on spaces.)
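One generic workaround is to slip a zero-width space (U+200B) in at each word boundary, which gives the browser somewhere to break without showing a visible gap. Here is a rough sketch of the idea in Python 3, using greedy longest-match against a word list. To be clear about the assumptions: the names segment, add_break_hints and TOY_WORDS are made up for illustration, the two-word dictionary is a toy, and this is the simplest dictionary baseline I can think of, not the method from any particular extension, bookmarklet or paper.

```python
ZWSP = "\u200b"  # zero-width space: invisible, but a legal line-break point


def segment(text, words):
    """Greedy longest-match split of `text` using the set `words`.

    Anything that matches no dictionary word falls out one character
    at a time, so unknown text is passed through rather than lost.
    """
    longest = max(len(w) for w in words)
    tokens, i = [], 0
    while i < len(text):
        size = 1  # fallback: a single character
        for candidate in range(min(longest, len(text) - i), 1, -1):
            if text[i:i + candidate] in words:
                size = candidate
                break
        tokens.append(text[i:i + size])
        i += size
    return tokens


def add_break_hints(text, words):
    """Join the segments with zero-width spaces so a browser can wrap them."""
    return ZWSP.join(segment(text, words))


# Toy dictionary (hypothetical): two Thai words, nowhere near enough for real text.
TOY_WORDS = {"สวัสดี", "ครับ"}
hinted = add_break_hints("สวัสดีครับ", TOY_WORDS)
print(segment("สวัสดีครับ", TOY_WORDS))
```

Greedy matching gets ambiguous strings wrong and is only as good as its word list, which is exactly why a statistical approach looks attractive.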

This was not too far from my mind, because a few nights ago, while randomly looking through the web pages of NLP courses, I found this one at Stanford, which had a really interesting paper on Mostly-Unsupervised Statistical Segmentation of Japanese Kanji Sequences (pdf). The problem is similar in Japanese, of course, so my experience with miswrapped Thai immediately made me wonder whether the (very successful) technique from that paper could be ported to Thai.

But before I started looking into that (probably by trying to implement the paper’s algorithm for Japanese, to start with), I figured I would… wander around aimlessly a bit more googling for anything related to text wrapping and Thai.

So I started thinking of terms to look up. One thing that popped into my mind was the name of a guy who goes by “bact” on Wikipedia. So, totally randomly, I googled: thai bact.

Look at the first hit:

Thai Words Separator :: Mozilla Add-ons :: Add Features to Mozilla …
Thai Words Separator is an extension to fit thai words in webpage layout without … This implementation developed from bact’ (http://bact.blogspot.com/) …

And a few clicks away from that, Bact’s public domain ThaiWrap bookmarklet.

That little piece of code has a very original and useful approach to solving my CSS text-wrapping problem. But that’s not all, it’s another piece of the puzzle that could play a role in the much more critical problem of probabilistically splitting Thai (and Japanese, and Khmer, and…) text into words.

That’s a serious problem for Blogamundo, a real problem for which we have to find a solution, or at least, an approach.

And I got closer to a solution by just wandering around aimlessly.

Yes, I stayed up all night. Yes, I ate half a box of Triscuits.

And you know what? It was pretty damn productive.

Bye Tower

October 14th, 2006

So Tower Records is closing.

You know, I’m surprised how bummed out that makes me. I have gone to Tower for years. That one and that one and that one and that one.

It’s sort of stupid to feel nostalgic about a retail chain. Retail chains don’t feel nostalgic about you. (Or your privacy — it always bugged the hell out of me when cashiers at Tower would ask for my zip code. I’d say “90210.” They didn’t like that.)

And yet, the fact that I can remember these places means that they have been part of my life like any other, I guess. And while Tower was a huge, corporate store peddling stuff sold by record companies that (on the whole) can only be described as clueless about recent changes in the way music is distributed, and terribly disrespectful of their customers… even then it makes me sad that Tower is going away.

I thought about it a little, and I think I put my finger on it.

Tower Records was the one holdout in suburbia that could make you think that somebody, somewhere, was still willing to invest in the idea that being a little unusual was alright.

They gave jobs to the pierced kids, the punks, the goths, and the classical wonks too. In fact, they gave the classical wonks a whole room of their own.

And, if you dug around in their book sections, you’d find stuff that you will not find at Barnes and Noble or Borders. Subversive stuff. Some stuff that I had zero interest in reading. But at least it was there, you know? At least you could still buy that stuff in suburbia. It existed in the brick and mortar world, and there was no massive catastrophe as a result of it.

As a business entity, I really won’t miss it. They didn’t figure out how to live in the age of the long tail. They never clued in to the iPod world, they never got creative, and that’s the way it goes.

But tonight I was talking to one of those hipster chicks that work there. You know her, the cute one with the slightly blue hair that you joke around with and wish you could keep talking to, but hey, transaction completed? The one who’s cooler than you. That one.

We were joking about what would be left at the bitter end — miles of NKOTB remixes? Would they turn it into a Costco-sized McDonalds? “Tower Records is the only place open at midnight for miles.” She said, “yeah.”

Then she told me not to say any more about it, it was depressing.

And it hit me: even someone who really values the idea that they’re outside of the system, and they say “screw you, system, my hair is blue and I’m putting tattoos on my eyelids!”…

They’re still stranded in suburbia too. And at quarter to midnight, where are they going to hang out now?

I have this icky feeling that a fundie somewhere is snickering.

Back to random topics.

October 10th, 2006

Okay, the language news thing was fun for a while. But, it started to feel like work (thus I slowed down precipitously), so I’m going to go back to blogging about whatever’s on my mind.

You know, the first blog I ever had was called “A blog with a name that keeps changing.”

I really cannot fathom how anyone can go on writing about something with a strict range of topics over a period of years. Yes, I have a short attention span.

My policy is to embrace that.

Oh, I started another blog: rivaldo, sai desse lago!

Note to self: Stop being lazy

July 16th, 2006

You know, there’s this programmer saying that goes “lazy is good.”

For instance, if you understand how to use libraries, and you use them, then your code will be shorter and easier to understand and maintain.

I’ve discovered repeatedly that that is in fact true, as far as it goes.

But it’s directly opposed to the other programming maxim: RTFM

The discovery, for me, is that RTFM’ing is often easier in the long run.

(And before I describe this, I should point out that I really don’t care who thinks I’m clueless about what. I only care about what I think I’m clueless about. Because if I think I am clueless about something, well, by the demiurge, I am.)

Case in point:

When you go digging around for tutorials on how web servers work, you eventually run into this thing called “CGI.” Which you later learn means “Common Gateway Interface.” Which fact you dutifully attempt to absorb:

Oh ho ho, yes, old chum, why, that’s the Common Gateway Interface! It’s so… why, it’s Common!

And then, you go and you bang your head against whatever tutorials you can find.

But see, you still haven’t RTFM’d. Because you still don’t know how a web server works, because you don’t know how HTTP works.

Now this can be a slippery slope, because you don’t know how TCP works, and networking, and hardware, and, and, and, well, and sooner or later you hit Heisenberg, right? (Then there is no F’ing M.)

Eheh.

Like, today for some godforsaken reason I tried firing up this weird little builtin Python module called CGIHTTPServer. Which is kind of surprising, really: it’s a web server that you can start up from your command line (if you have Python), like this:

$ python -c 'import CGIHTTPServer; CGIHTTPServer.test()'

Whammo, web server. You open up http://localhost:8000, and there’s a web server, which will actually run CGI scripts, which you put in a directory called cgi-bin, in the same place where you run that command. And you can put Python cgi scripts in there.

And it spits out log messages and everything.

It’s nutty, I tell you.

Now I mean, I’ve actually “known” how all this HTTP/CGI/blah blah works for a long time. But there’s something about finding a minimal arrangement that generates much grokkery:

Oh, that’s why there has to be a blank line after that Content-type: text/html; charset=utf-8\n thing… it really does separate the headers from the body… Oh yeah right, the difference between a GET and a POST is just obvious, you see the URL in the logs on a GET

And so on.
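For the record, here is roughly the kind of minimal script that makes the blank-line thing click. It's a sketch, not the exact script I ran: the file name hello.py and the function names are made up, and the escaping is hand-rolled so the same file runs under Python 2 or 3.

```python
#!/usr/bin/env python
"""Minimal CGI script sketch (hypothetical name: cgi-bin/hello.py)."""
import os
import sys


def escape(text):
    """Tiny HTML escape so the echoed query string can't inject markup."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def build_response(query_string):
    """Return the full CGI output: header lines, one blank line, then the body."""
    headers = "Content-type: text/html; charset=utf-8\n"
    body = "<p>query string: %s</p>\n" % escape(query_string or "(empty)")
    # The empty line is what tells the server where the headers stop.
    return headers + "\n" + body


if __name__ == "__main__":
    # On a GET, whatever follows "?" in the URL arrives in QUERY_STRING.
    sys.stdout.write(build_response(os.environ.get("QUERY_STRING", "")))
```

Assuming the server is started as shown earlier, dropping this into cgi-bin, making it executable, and opening http://localhost:8000/cgi-bin/hello.py?name=world should echo the query string back, and the same URL shows up in the log line for the GET.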

I think the programmers who are naturals are people who can look at some new problem, and they can say “okay, I know what I need to accomplish, I think I need to R this much of TFM in order to get to the point where I can start being lazy…”

In other news I’m a complete and utter insomniac.