Programming in the browser…
March 10th, 2005
Getting Unicode straight across platforms has been a huge hangup for me in trying to get together some tutorials on doing language processing with Python. And then, there’s another barrier to cross: how to deal with markup?
Generally speaking, what I’m interested in dealing with is text, but most multilingual text on the web is HTML.
One weird observation that keeps occurring to me is that you could teach text processing without teaching people to deal with setting up a programming environment at all: use Javascript.
This seems a little weird, but I think the reason is that people who work with text processing have never thought of Javascript as a real language. But it is a real language. And the barriers to programming in Javascript are incredibly low. (Go type javascript:alert('hello world!') into your address bar to see what I mean.)
And then, I was reading through some stuff on Crockford.com, and I came across this:
“String is a sequence of zero or more Unicode characters. There is no separate character type.”
Good grief! Music to my ears!
And as for dealing with HTML, well, Javascript has that abstraction built in. Try explaining to a newbie how to extract the text from an HTML page in Python. “Well, you start by subclassing a parser and…” Javascript is designed for a browser, and browsers are where all that markup stuff comes from in the first place: to turn a CSS rule into “put this text in a blue box in the corner,” the browser already has to know what the “text” is, so the “text” bit is a given.
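To make the “text bit is a given” point concrete, here is a minimal sketch of pulling the text out of a page from inside the browser. The helper name pageText is made up for illustration, and it assumes a DOM that exposes textContent (with innerText as a fallback for browsers that only have that):

```javascript
// Hypothetical helper: return the text of a page, with the markup already
// parsed away by the browser. `doc` is assumed to be a DOM Document;
// in a browser you would pass the global `document`.
function pageText(doc) {
  const body = doc.body;
  if (!body) {
    return "";
  }
  // textContent is the standard DOM property. Note that it also includes
  // the contents of any <script> or <style> elements inside the body.
  if (typeof body.textContent === "string") {
    return body.textContent;
  }
  // Fallback for DOMs that only provide innerText.
  return typeof body.innerText === "string" ? body.innerText : "";
}
```

In a bookmarklet or a Greasemonkey script, something like pageText(document) would then hand you the raw text to work on, with no parser subclass in sight.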
Of course, it still looks like C, or at least it’s certainly not as friendly as Python. But I have to say, combining these characteristics with Greasemonkey opens up some very interesting possibilities… Input/output becomes “go to this URL.” Processing the text becomes “paste this Greasemonkey script into the editor and run it,” and the result could be investigating character distributions, statistical language ID, sentence splitting, keyword extraction, blah blah blah…
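As a rough idea of how small the “investigate character distributions” step could be, here is a sketch that counts characters in a string. The function names charCounts and topChars are invented for this example, and it assumes a Javascript engine recent enough to have Map and for...of (which walks a string by Unicode code point rather than by 16-bit unit):

```javascript
// Hypothetical helper: count how often each character occurs in `text`.
// Returns a Map from character to count.
function charCounts(text) {
  const counts = new Map();
  // for...of iterates a string by code point, so characters outside the
  // Basic Multilingual Plane are counted whole, not as surrogate halves.
  for (const ch of text) {
    counts.set(ch, (counts.get(ch) || 0) + 1);
  }
  return counts;
}

// Hypothetical helper: the `n` most frequent characters as
// [character, count] pairs, most frequent first.
function topChars(text, n) {
  return Array.from(charCounts(text).entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, n);
}
```

Run against a page, a call such as topChars(document.body.textContent, 10) would give a first crude look at which script or language you are dealing with.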
Is it crazy to think that such things can be done in a learnable way with Javascript? I don’t think it is…
I’m just thinking out loud. But lately I’ve been thinking about all that Ajax stuff (and rolling it into my present project), and it’s gotten me thinking about the browser as a place to do programming. Kind of blue sky, yes, but certainly a fun angle on the topic of processing natural language.
If you haven’t seen it yet, check out Jesse Ruderman’s excellent Javascript Shell:
http://www.squarefree.com/shell/
The bookmarklet lets you fire up a javascript interpreter from within the current page’s context. Out-freaking-standing.
Of course, there’s also Venkman.
https://addons.update.mozilla.org/extensions/moreinfo.php?id=216
More info on Venkman:
http://www.google.com/custom?hl=en&lr=&ie=ISO-8859-1&cof=AWFID%3A9262c37cefe23a86%3BL%3Ahttp%3A%2F%2Fwww.mozilla.org%2Fimages%2Fmlogosm.gif%3BLH%3A60%3BLW%3A174%3BBGC%3Awhite%3BT%3A%23000000%3BLC%3A%23990000%3BVLC%3A%23800080%3BALC%3A%230000ff%3BGALT%3A%23666633%3BGFNT%3A%23808080%3BGIMP%3A%23cc0000%3BDIV%3A%23990000%3BLBGC%3Awhite%3BAH%3Acenter%3B&domains=mozilla.org&q=venkman+firefox&btnG=Search&sitesearch=mozilla.org
Lots of people program within Firefox.
Ironically, I just found crockford.com a few days ago myself, and found your page because you referred to Greasemonkey.
- Jeremy Dunck @ 11 March 2005