infundibulum

Attention Deficit Trait

March 30th, 2005

So this CNET article “Why can’t you pay attention anymore?” shot up del.icio.us and it, uh, caught my attention.

It’s the same old story about how gadgets and information overflow don’t increase productivity:

When people find that they’re not working to their full potential; when they know that they could be producing more but in fact they’re producing less; when they know they’re smarter than their output shows; when they start answering questions in ways that are more superficial, more hurried than they usually would; when their reservoir of new ideas starts to run dry; when they find themselves working ever-longer hours and sleeping less, exercising less, spending free time with friends less and in general putting in more hours but getting less production overall.

Yeah, okay, I guess that is pretty much true. But this part is all wrong:

I assume that high-tech companies, which are themselves such avid consumers of tech gadgetry, are rife with ADT?

Yes, but they’re also–and this is why I love those people so much–able to say no to it. They’re playful. Play is one of the best antidotes to this. They’re able to rise above it and get around it. The ones who suffer the most in that field are the ones who don’t have the creative powers of the techies, and they just kind of slog along.

Hmm, he loves those people. Whatever. In my experience the windowing desktop system is exactly the right metaphor for “attention deficit trait,” and there is no one as prone to such a series of symptoms as a geek. Nobody needs to read all those bloglines posts. Or to obsessively reload del.icio.us.

But then, of course, if I hadn’t I wouldn’t be writing this.

Sorry, I have to go. I’m like 500 posts behind in my aggregator.

LUI LUI

March 28th, 2005

If you haven’t checked out IT Conversations before, you’re missing out. Well, most of the time… there are a few clunkers, but overall I’d say it’s one of the better sources of (geeky) audio programming on the web right now.

I just found an interview from November of last year called “Simulation, Agents and Accelerating Change.” I haven’t listened to it yet but the following tidbit caught my eye:

One of the most important accelerating transitions occurring today is the emergence of the Linguistic User Interface or LUI. The LUI is the natural language front end to our increasingly malleable, intelligent, and humanizing Internet. Primitive LUIs exist today in interfaces like Google, but will become dramatically more powerful over the next few decades.

In my experience to date, most “LUI”s serve only to drive people nuts. But whatever, maybe they will get better.

UPDATE: It’s random marketing speak. Don’t bother.

The Raven Paradox

March 25th, 2005

An interesting bit from Wikipedia, the Raven Paradox:

When numerous people over thousands of years observe something like the law of gravity, we tend to believe that it is true with very high probability.
This type of reasoning could be summarized by the principle of induction:

  • If an instance X is observed that is consistent with theory T, then the probability that T is true increases.

Hempel gives an example of the principle of induction. The theory is that all ravens are black. We go out and examine a million ravens, and observe that they are all black. After each observation, our belief in the theory “all ravens are black” will rise slightly. The principle of induction looks reasonable here.

Now comes the problem. The statement “all ravens are black” is logically equivalent to the statement “all non-black-things are non-ravens”. If we observe a red apple, that is consistent with that statement. A red apple is a non-black-thing, and when we examine it, we observe that it is a non-raven. So by the principle of induction, observing a red apple should increase our belief that all ravens are black!

Chew on that while you’re waiting at the bus stop.

Long live Unicode & whatnot.

March 23rd, 2005

Oh whatever, Cory Doctorow.

HACKERS <HEART> PLAIN TEXT

Geeks store what they do in text and spurn big apps, using plain
text editors. Simplicity and speed, ease of search and
extraction, cut and paste. All you need in a filing system.

Real hackers ❤ UTF-8.

And anyway, There Ain’t No Such Thing As Plain Text.

Random Thoughts on Compression and Linguistic Typology

March 22nd, 2005

Here’s a random thought I wanted to write down before I forgot it.

There has been discussion of using compression to identify languages. Here’s a neat little Python script by Dirk Holtwick that proves that the idea works. It’s based on this short paper, which was quite controversial when it came out (mostly because it was published in a physics journal, and the physicists got grumpy).
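Here is roughly how I understand the trick, as a minimal sketch rather than a reproduction of Dirk Holtwick’s script or the paper’s exact method: append the unknown text to a reference text in each language, and pick the language where the compressor needs the fewest extra bytes. The reference texts, the function names, and the choice of zlib are all my own assumptions for illustration.

```python
import zlib

# Toy reference texts, invented for this sketch. Real use needs far larger samples.
REFERENCES = {
    "english": ("the quick brown fox jumps over the lazy dog and the cat sat on "
                "the mat while the rain fell on the roof of the house"),
    "german": ("der schnelle braune fuchs springt über den faulen hund und die "
               "katze saß auf der matte während der regen auf das dach des "
               "hauses fiel"),
}


def compressed_size(data):
    """Length in bytes of the zlib-compressed data."""
    return len(zlib.compress(data, 9))


def extra_cost(reference, sample):
    """Extra compressed bytes needed once the sample is appended to the reference."""
    ref = reference.encode("utf-8")
    both = ref + b" " + sample.encode("utf-8")
    return compressed_size(both) - compressed_size(ref)


def guess_language(sample, references=REFERENCES):
    """Return the key of the reference text that 'explains' the sample most cheaply."""
    return min(references, key=lambda lang: extra_cost(references[lang], sample))


if __name__ == "__main__":
    print(guess_language("the cat sat on the roof of the house"))
    print(guess_language("die katze saß auf der matte"))
```

The samples here are rigged to overlap heavily with the toy references, so this only shows the mechanism. With realistic text you would want references of many kilobytes per language.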

I wonder what other uses compression could be put to in the linguistic sphere. The idea that came to my mind was typology. It seems to me that agglutinative languages would show detectably different compression patterns from isolating languages, for instance.

For example, one would expect many strings in Turkish text to show up as substrings of other words, because Turkish is agglutinating…
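A crude way to poke at that hunch without a compressor at all, sketched under some loud assumptions: I only look at prefixes rather than arbitrary substrings (Turkish is mostly suffixing, so stems tend to show up at the front of longer words), and the function name and the tiny word lists are made up for illustration. It measures the fraction of distinct words that are a proper prefix of some other word in the sample.

```python
def prefix_overlap(words):
    """Fraction of distinct word types that are a proper prefix of another type."""
    types = sorted(set(w.lower() for w in words))
    if not types:
        return 0.0
    hits = 0
    for current, following in zip(types, types[1:]):
        # In sorted order, a word that is a prefix of any other word
        # is always a prefix of its immediate successor.
        if following.startswith(current):
            hits += 1
    return hits / len(types)


if __name__ == "__main__":
    turkish = ["ev", "evler", "evlerde", "evlerimiz"]  # house, houses, in houses, our houses
    english = ["house", "houses", "in", "our"]
    print(prefix_overlap(turkish))
    print(prefix_overlap(english))
```

Four words per language prove nothing, of course. The point is just that the measurement is cheap enough to try on real corpora.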

I haven’t articulated this well, just wanted to get it out of my head before I forgot it.

Love, Your Friendly Neighborhood Outlying App Developer

March 22nd, 2005

Have you ever stopped to think about why Flash games are the way they are?

Thinking about webapps a lot lately has made me sensitive to this sort of thing. Flash apps are as capable as desktop apps, in many ways, but I can’t think of any “desktop-esque” Flash app that I use on a regular basis (aside from Flickr, a notable exception). I mean, when was the last time you used a Flash email client? Me neither. Flash is what it is, and what it is partially, now, is an aesthetic.

No one would ever, ever, ever write this in C. It just wouldn’t happen. Obviously bandwidth is a part of that, or at least it was when Flash first came out… but I think there’s more to it than that. There’s something about the web that encourages things to be “a la carte.” People talk a lot about how web apps are suddenly going to have practically all the functionality of desktop apps, and how that’s kind of dumb, because why start over from scratch?

To me the fact that web apps replace operating system woes and software download and installation with the comparatively manageable problems of cross-browser compatibility is justification enough.

But there’s also an element of the unknown in the webapp world. When Macromedia wrote Flash, they had no idea that it would spawn a world of cult comics and plastic bubble popping. Just so, no one really knows what web apps will look like.

The critical difference, in my opinion, is that on the web networking is available to everyone, and thus the nutty world of distributed effort and the long tail and all that. Nat Friedman’s blog is always interesting (er, when he updates it), and this post puts what I’ve been trying to say here in a 4 am blur much more clearly:

It’s interesting to watch the traditional application-and-platform developers (Apple, Microsoft, the Linux desktop projects) parade down the “web will never be good enough for real applications” path. Apple has Sherlock, MS has Avalon/XAML, we have Gtk/Qt/XUL/etc. Meanwhile you find more and more outlying app developers writing web apps. I wonder if, in a few years, we will look like withered old timers to the new armies of web application developers. Clinging to our dated ways. “I like my trackball just fine, sonny!”

Official title “more outlying app developer,” hmm, I’ll take that.

Unicode pr0n!!!

March 21st, 2005

Hawttest thing evar.

Why does Javascript have the Reputation of Sucking? A Lot?

March 19th, 2005

You know, after having read a fair amount about Javascript as a result of all this Ajax stuff, I have to say I feel sort of conned. Realization du jour: “Oh, Javascript is a language, not a scriptkiddie popup machine.”

I think one reason has to do with its association with web developers instead of “real” developers — people who write desktop applications, you know, in C or something. I freely admit that I have little interest in learning C or C++ or, frankly, any other compiled language. I’ve actually tried a bit of C++, and I hated it. Honestly, what good is learning a compiled language to me, when I can build good enough networked applications with XHTML, CSS, a little server side PHP or Python, and Javascript?

You tell me…

Del.icio.us is Crack

March 15th, 2005

It’s taken me a while to really figure it out, but you know what?

http://del.icio.us is huge.

I figure that anyone who happens upon this blog already knows what it is, but on the intensely outside chance that you haven’t, here’s an introduction.

I’ve been using del.icio.us for a long time — here’s my tag map doohickey provided by extisp.icio.us, and here’s my page on del.icio.us proper. A few stats: 1059 tags, 4695 posts.

So lately I’ve been thinking about what this thing really is. And when I thought about it, it occurred to me that it’s probably the biggest “classification” of anything I’ve ever made in my life.

Which is kind of funny, considering that I once had a job where I was officially a “technical lexicographer,” i.e., I was paid to classify terms in a hierarchy (it was for a natural language search engine — the product ended up being bought out by these guys).

What we did there all day long was decide which word should be a “subword” of which other word: “to jaunt” goes under “to run.” Except most of them weren’t that easy. Arguments were inevitable, and people would come up with pet plans for generalizing how to organize stuff (I did too). And we just weren’t fast, because we felt like we had to be right. The project didn’t start from scratch; it was based partially on Wordnet, a hierarchical lexical database that has been around for a long time. There’s a web interface where you can take a look at the sort of data structures it contains… heh, I see that they don’t call jaunting a kind of running.

So they have their stats up. Here’s the bottom line: “The total of all unique noun, verb, adjective, and adverb strings is actually 144309.” Now, if I myself have 1K tags in del.icio.us, it’s blatantly obvious that del.icio.us is bigger than Wordnet: there are clearly more than 100 users of del.icio.us (geeze I hate typing that).

One could argue that it’s apples and oranges: Wordnet has nothing to do with applying tags to URLs. But at least 80 percent of what I did in my job had to do with nouns. We avoided verbs as much as possible (as the example I mention shows, it’s really hard to classify verbs consistently), adjectives were just as difficult, and adverbs? Fuhgeddaboutit. If I recall correctly, the early versions of Wordnet didn’t even touch adverbs. (Hell, it’s hard to even define an adverb.)

Not to mention that those categories aren’t necessarily useful across languages.

I’m just thinking out loud here, but I think the scale of the success of delicious really makes one wonder about the wisdom of attempting to build a lexical database in any way that isn’t distributed. It’s Yahoo versus Google all over again. “Folksonomy” isn’t necessarily the most mellifluous, er, tag, for the tagging craze, but the tagging craze itself is definitely onto something.

Anyway, I’m babbling on. Joshua Schachter is a genius, and goodnight.

Programming in the browser…

March 10th, 2005

Getting Unicode straight across platforms has been a huge hangup for me in trying to get together some tutorials on doing language processing with Python. And then, there’s another barrier to cross: how to deal with markup?

Generally speaking, what I’m interested in dealing with is text, but most multilingual text on the web is HTML.

One weird observation that keeps occurring to me is that you could teach text processing without teaching people to deal with setting up a programming environment at all: use Javascript.

This seems a little weird, but I think the reason that it seems weird is because people who work with text processing have never thought of Javascript as a real language. But it is a real language. And the barriers to programming in Javascript are incredibly low. (Go type javascript:alert('hello world!') on your address bar to see what I mean.)

And then, I was reading through some stuff on Crockford.com, and I came across this:

String is a sequence of zero or more Unicode characters. There is no separate character type.

Good grief! Music to my ears!

And as for dealing with HTML, well, Javascript has that abstraction built in. Try explaining to a newbie how to extract the text from an HTML page in Python. “Well, you start by subclassing a parser and…” Javascript is designed for a browser; and browsers are where all that markup stuff comes from in the first place: to turn a css rule into “put this text in a blue box in the corner,” the “text” bit is a given.
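For the record, here is about what the “subclass a parser” routine looks like, as a minimal sketch. It assumes a current Python 3 and its standard html.parser module; the class and function names are mine, and skipping script and style blocks is my own choice about what counts as “the text.”

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text of a page, ignoring <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self.chunks.append(data)


def extract_text(html):
    """Return the page text with runs of whitespace collapsed to single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


if __name__ == "__main__":
    page = ("<html><head><style>p{color:blue}</style></head>"
            "<body><p>Merhaba <b>d&uuml;nya</b>!</p></body></html>")
    print(extract_text(page))
```

That is a couple of dozen lines before a newbie has done anything interesting with the text, which is exactly the complaint.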

Of course, it still looks like C (or at least, it’s certainly not as friendly as Python), but I have to say, combining these characteristics with Greasemonkey opens up some very interesting possibilities… input/output becomes “go to this url.” Process the text becomes “Paste this Greasemonkey script into the editor and run it,” and the result will be a way to investigate character distributions/statistical language id/sentence splitting/keyword extraction/blah blah blah….
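To make “character distributions” concrete, here is the sort of thing I have in mind, sketched in Python rather than as a Greasemonkey script. The function name is hypothetical, and counting only characters whose Unicode category starts with “L” (letters) is my own simplification.

```python
from collections import Counter
import unicodedata


def char_distribution(text):
    """Map each letter in the text to its relative frequency, most common first."""
    letters = [ch for ch in text.lower()
               if unicodedata.category(ch).startswith("L")]
    if not letters:
        return {}
    total = len(letters)
    return {ch: count / total for ch, count in Counter(letters).most_common()}


if __name__ == "__main__":
    for ch, share in char_distribution("Çay içer misiniz?").items():
        print(ch, round(share, 3))
```

Feed it the text of a page and the top few letters already say a fair amount about which language you are looking at.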

Is it crazy to think that such things can be done in a learnable way with Javascript? I don’t think it is…

I’m just thinking out loud. But lately I’ve been thinking about all that Ajax stuff (and rolling it into my present project), and it’s gotten me thinking about the browser as a place to do programming. Kind of blue sky, yes, but certainly a fun angle on the topic of processing natural language.