Archive for Tech

I Bet You Didn’t Make Any Money…

Here’s an update to a random idea I had a while back: Want to Know How to Make Some Money?, where I babbled:

Want to Know How to Make Some Money? Here, I’ll tell you.

News Sentinel | 06/24/2005 | Funding cut for translator service

+ Wireless network + Laptops + Webcams + Subscriptions + Nationwide
(Worldwide?) network of on-call interpreters for lots of languages.

Well, go on.

The idea being that one could start a business capitalizing on the relatively cheap availability of video conferencing tools to sell distributed interpretation services.

Well, I talked to my sister about this idea. She’s a nurse.

The concept is D.O.A., and here’s why: there are strict rules about how the interaction between doctors, patients, and interpreters is to take place. Specifically, the interpreter is not allowed to be a “participant” in the conversation: the interpreter must not speak directly to the patient. The patient looks only at the doctor, never at the interpreter.

That’s a rule.

Which defeats the whole point of the webcam idea. Perhaps the VoIP aspect would still be doable, however.


Javascript Mailing List?

There’s a Rails mailing list and a bunch of Python mailing lists and a small industry of Perl mailing lists and so on and so forth.

So where’s the Javascript mailing list? Am I just missing it? Because it seems like something that would be useful, what with all the webappishness going around and unobtrusivity and all that.



In which I ask You

Why don’t word processors keep the cursor in the middle of the page?

Good grief.

Down arrow, down arrow, down arrow, up arrow, up arrow, up arrow, edit. Cursor goes down to the bottom of the page. Repeat. Get mad. Repeat.


Font Problems with Hindi in Firefox

Debugging font issues is a pain, in my experience. If something isn’t rendering correctly, my first reaction is usually “I have absolutely no idea why that’s happening.” Gentle reader, feel my pain.

I find myself working with an awful lot of languages (you’ll see why when Jonas and I launch our project), and I often have to learn just enough characters to determine that a particular script seems to be rendering correctly. We have to know if rendering problems are caused by some kind of configuration problem that we can fix, or if it’s something out of our control: “Sorry, no hieroglyphics in Unicode, not our problem!”

Debugging such stuff is not the same thing as actually being able to read in all these languages: in most cases it’s enough to learn just a bit about how the script is put together and how characters combine, and perhaps a few words for testing purposes.

So here’s an example of a typical problem that I face. Compare the two screenshot clips I took this morning. I added the red-bordered boxes to point out the differences:

Even if you don’t know Devanāgarī from a salad fork, it doesn’t take much to guess that something is askew in my Firefox’s rendering of that page. (Never mind the fact that the word “Hindi” is actually spelled incorrectly… Doh!) Opera seems to get it right.

Now I’m not going to get into the details of how Devanagari works in Hindi at the moment (primarily because I don’t know much, heheh). The main problem for me is that there are so many possible causes for any problem in text rendering. Is this a configuration problem on my end, or is it some pernicious software problem buried in a library underneath the text?

  1. The font could be bad.
  2. The browser?
  3. Is it the case that my operating system is missing some library? (Linux, in my case.) If so, what library? Can I upgrade something to fix it? Who ya gonna call?
  4. Or maybe it’s part of my desktop environment? I wonder if it works in that other desktop environment… blech, switching desktops is a pain…
  5. Could it be an encoding problem? Maybe the HTML page is encoded incorrectly in the first place.
  6. Or, maybe their server is futzing up the encoding somehow?
  7. Is it part of that “font shaping” thing, Pango? Am I even using Pango?
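One cheap way to narrow down a list like that, at least as far as the encoding suspects (5 and 6) go, is to dump the code points the page actually handed to the browser. What follows is a minimal sketch, not anybody’s official tool: codePoints is a name I made up, and the sample string is just the word “Hindi” in Devanagari spelled out with escapes. If the code points come out as expected and the page still looks wrong, the encoding probably isn’t the culprit and it’s time to glare at the font, the shaper, or the browser.

```javascript
// Hypothetical helper: list the Unicode code points in a string.
function codePoints(text) {
  // Array.from iterates by code point, so astral characters stay whole.
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// "Hindi" in Devanagari, written with escapes so the order is unambiguous.
const hindi = "\u0939\u093F\u0928\u094D\u0926\u0940";
console.log(codePoints(hindi).join(" "));
// prints "U+0939 U+093F U+0928 U+094D U+0926 U+0940"
```

Note that this says nothing about how the characters get drawn: six code points go in, but a correctly shaped rendering shows fewer visual clusters than that.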


update… Σμς suggests an eighth potential culprit in this situation: there could be a problem with CSS. He also found a relevant bug in the bug database for Firefox. (See the comments. Thanks, Simos!)

In this particular case, the comparison above leads me to suspect #2, of course. But you get the picture here: these kinds of problems are a mess. Particularly in the open source world, it’s hard to know what to do in this situation. And I’m moderately techie. Imagine what a run-of-the-mill user faces.

I was chatting with Chad Fowler and he made an interesting observation: for the development of any given application, in order to be sure, really sure, that everything is okay for every particular writing system, each development group would have to have someone who can read each language. Which, er, ain’t gonna happen.

And it shouldn’t really have to: the operating system is supposed to abstract the basic rendering of text away from coding.

OSX is pretty darn good at this. But then, it’s also a very closed system: it’s all tested, Apple owns and delivers a wide variety of high-quality (proprietary) fonts with its machines, and there are far fewer points of variation than you’ll see in your average Linux distribution.

Matters in Windows are less variable than Linux, but more complex than OSX, as Michael Kaplan can attest in great detail at his excellent blog.

I think these complexities are what make many programmers reticent about Unicode: they’ve been burned in the past with encoding matters, gotten a glimpse of the gruesome entrails underlying text rendering on their platform, and decided, “I just don’t have time to really learn how all these text rendering variables fit together.”

And quite frankly, despite being something of a Unicode zealot myself, I can sympathize.

Most developers accept that they need to know the absolute minimum about Unicode. They already know that Unicode is good. The thing is, as a previous commenter pointed out, and as this tiny example demonstrates, the “Unicode” part of handling text is only the tip of the iceberg.

And it’s a big iceberg.


Sitepoint’s CSS and DHTML Books

I’ve recently become a fan of Sitepoint’s books on programming. They’re very cleanly put together, and generally speaking seem to be quite up to date. Here are a couple of titles I went ahead and took the plunge on:

HTML Utopia: Designing Without Tables Using CSS
I like this book quite a bit. The CSS reference in the back is almost worth the price of admission… there are references online (duh) but I guess I’m just still a sucker for paper. There’s a lot of useful info on styling text, which turns out to have more tricks available than I’d ever heard of. One thing about this book that annoyed me intensely was in chapter 6, “Putting Things in Their Place,” when he gives a Javascript solution to the problem of getting columns to flow to equal heights. Admittedly, he gives an alternative, but there are a lot of pure CSS solutions to this problem out there, and one would think that if there’s a reliable one out there, that this would be the book to find it. So yeah, that bit rubbed me the wrong way.
DHTML Utopia: Modern Web Design Using JavaScript & DOM
I’ve been looking forward to this one for quite a while. At that link you can get the first four chapters for free. To be honest, I debated whether to buy the book, because judging from the table of contents, it seems that most of the stuff that I had doubts about was in the free sample chapters. But I’m a big fan of the author and editor: Stuart Langridge through the ridiculously awesome LugRadio (or listen on Odeo) and Javascript/Python guru Simon Willison. So in the end I felt pretty good about picking up a copy. Haven’t started digging in yet. One nit to pick: forty smackers is a lot to ask for a book that’s just 300 pages. Not saying it won’t turn out to be worth it in the end, but dag.
update… The sample chapters are available as HTML now: DHTML Utopia: Modern Web Design Using JavaScript & DOM. I can’t seem to get the example from this chapter to work, though, can you?

All this DHTML stuff is surprisingly fun. And I’ve mentioned before that Javascript has the right policy on Unicode, which makes me pretty happy.

Like this ☞ ☺

Especially considering the headache that is dealing with multibyte stuff in just about every other scripting language. Which makes me kind of sad.

☹ ☜ Like that.


Weird Interface Moment

I just had a weird interface moment.

I use Backpack constantly as an outboard brain, and drag and drop lists were recently added. So I’ve sort of become used to them.

Now, here’s the weird part: I was just trying to type up a list of stuff, in a run-of-the-mill text file (in my favorite text editor, gedit). And I had this urge to reorganize the list drag-and-drop style. Except, my word processor couldn’t do that. Kinda backwards, eh? Usually web interfaces are thought of as the relatively impoverished cousins of desktop apps…

Yes, friends and neighbors, the browser will eat the desktop, sooner or later.

on second thought…
I guess gedit sort of does have drag and drop: you can highlight a line and then drag and drop that. But it’s still not as simple as Backpack’s lists, because it’s really hard to grab or skip newlines just by highlighting–with an HTML list there are bullets.


On-the-fly ASCII to Unicode Transliteration with Javascript?

Here’s an interesting little script I found on the Reta Vortaro (that is, the Esperanto web dictionary).

anstataŭigu cx, gx, …, ux

Try typing the string jxauxdo in that box. And press “Trovu” if you like; that will search Google for ĵaŭdo (Esperanto for “Thursday”). Notice that jx → ĵ and ux → ŭ “on the fly,” as you type. (Come to think of it, maybe “transliteration” isn’t the right word for this process…)

So, backing up a bit, Esperanto has a few odd characters in its orthography:

Letter           Pronunciation (IPA)   Unicode   x-system
ĉ                [ʧ]                   U+0109    cx
ĝ                [ʤ]                   U+011D    gx
ĥ                [x]                   U+0125    hx
ĵ                [ʒ]                   U+0135    jx
ŝ                [ʃ]                   U+015D    sx
ŭ (as aŭ, eŭ)    [u̯]                   U+016D    ux

Even today those characters are relatively rare in fonts–if you can’t see them I imagine this post may not make too terribly much sense. 8^)

The good doktoro even got a little flak back in the day, for choosing to include such unusual characters in a supposedly universal language. Nowadays, however, they’re all in Unicode–here’s the full info for ŝ, for example:


But pragmatically speaking, there’s still a problem with input. Suppose you are a gold-star-wearing green-flag-waving Esperanto aficionado, and you want to post something on the internet. How do you actually type these characters? The “right” answer is that you install a keyboard layout for the language in question, and you memorize its layout.

This is a pain, of course.

And it’s nothing new: in the (typographical) bad old days of all-ASCII USENET, Unicode wasn’t widely available, and what people would generally do (for many languages, not just Esperanto) was come up with all-ASCII transliteration systems. The “x-system” added to the table above was probably the most popular. It so happens that there is no letter x in Esperanto, so it didn’t cause any massive problems with ambiguity.

So let’s look at the script in question; it’s quite simple:

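function xAlUtf8(t) {
  // Only convert when the "x" checkbox on the form is ticked.
  if (document.getElementById("x").checked) {
    // Lowercase letters; the trailing x may be typed in either case.
    t = t.replace(/c[xX]/g, "\u0109");
    t = t.replace(/g[xX]/g, "\u011d");
    t = t.replace(/h[xX]/g, "\u0125");
    t = t.replace(/j[xX]/g, "\u0135");
    t = t.replace(/s[xX]/g, "\u015d");
    t = t.replace(/u[xX]/g, "\u016d");
    // Uppercase letters.
    t = t.replace(/C[xX]/g, "\u0108");
    t = t.replace(/G[xX]/g, "\u011c");
    t = t.replace(/H[xX]/g, "\u0124");
    t = t.replace(/J[xX]/g, "\u0134");
    t = t.replace(/S[xX]/g, "\u015c");
    t = t.replace(/U[xX]/g, "\u016c");
  }
  // The excerpt is cut off at this point. The original presumably writes t
  // back into the input field; returning it here is only a stand-in.
  return t;
}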

Include it with something like:

<script type="text/javascript" src=""></script>

And the function gets called with an onkeyup="xAlUtf8(this.value)" inside the input tag.

(Using an inline onkeyup attribute is actually sort of verboten these days–it should be done unobtrusively, etc.)
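For what it’s worth, here’s roughly how I’d imagine the unobtrusive version looking. This is a sketch under my own assumptions, not the Reta Vortaro code: X_SYSTEM, xToUnicode and attachXSystem are names I made up, and the two arguments to attachXSystem are whatever input and checkbox elements the page looks up for itself. The conversion is split out into a pure function so that it can be tried without a browser.

```javascript
// x-system digraph -> Esperanto letter (same pairs as the script above).
const X_SYSTEM = {
  cx: "\u0109", gx: "\u011d", hx: "\u0125",
  jx: "\u0135", sx: "\u015d", ux: "\u016d",
  Cx: "\u0108", Gx: "\u011c", Hx: "\u0124",
  Jx: "\u0134", Sx: "\u015c", Ux: "\u016c",
};

// Hypothetical pure function: no DOM access, so it is easy to test.
function xToUnicode(text) {
  return text.replace(/[cghjsu]x/gi, (pair) => {
    // Normalize "cX" to "cx" and "CX" to "Cx" so one entry covers both.
    const key = pair[0] + "x";
    return X_SYSTEM[key];
  });
}

// Hypothetical wiring: `input` and `checkbox` are element references the
// page would find itself, e.g. with document.getElementById.
function attachXSystem(input, checkbox) {
  input.addEventListener("keyup", () => {
    if (!checkbox.checked) return;
    const converted = xToUnicode(input.value);
    // Only write back on change, so the caret is not reset needlessly.
    if (converted !== input.value) input.value = converted;
  });
}

console.log(xToUnicode("jxauxdo")); // prints "ĵaŭdo"
```

The markup then needs no event attributes at all; a script calls attachXSystem once the page has loaded.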

So anyway, that’s a pretty interesting way to enter some unusual characters. It’s interesting to muse on just how far one could take this approach. Would it be possible to create a script that would handle an entire writing system? Say, a script that would convert an entire textarea from an ASCII-based transliteration to Unicode characters, on the fly? Japanese and Chinese are definitely excluded from this approach (every Chinese character in RAM? Er, no.) but people who use those languages generally already have keyboard input taken care of.

That would be neat: you could, for instance, have textareas where users could input something in Amharic or Persian or whatever without having the keyboard layout actually installed.

But as it stands, it’s just simple substitution, and no string which is to be substituted can be a substring of another such string. In order to handle a more generalized set of substitutions, you’d probably need to use a trie structure. (There’s a nice trie implementation in Python by James Tauber.)
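To make the trie idea a bit more concrete, here’s a minimal sketch in Javascript of longest-match substitution. It’s my own toy, not a port of James Tauber’s code: buildTrie and transliterate are invented names, and the three-entry mapping at the bottom is a made-up scheme that exists only to show one key (“s”) being a prefix of another (“sh”).

```javascript
// Build a trie from { asciiSequence: replacement } pairs.
function buildTrie(mapping) {
  const root = { children: new Map(), value: null };
  for (const [key, value] of Object.entries(mapping)) {
    let node = root;
    for (const ch of key) {
      if (!node.children.has(ch)) {
        node.children.set(ch, { children: new Map(), value: null });
      }
      node = node.children.get(ch);
    }
    node.value = value;
  }
  return root;
}

// Replace the longest matching key at each position; copy anything else.
function transliterate(text, trie) {
  const chars = Array.from(text);
  let out = "";
  let i = 0;
  while (i < chars.length) {
    let node = trie;
    let best = null; // replacement for the longest match found so far
    let bestEnd = i; // index just past that match
    for (let j = i; j < chars.length; j++) {
      node = node.children.get(chars[j]);
      if (!node) break;
      if (node.value !== null) {
        best = node.value;
        bestEnd = j + 1;
      }
    }
    if (best === null) {
      out += chars[i]; // no key starts here, so pass the character through
      i += 1;
    } else {
      out += best;
      i = bestEnd;
    }
  }
  return out;
}

// Made-up scheme, for illustration only: "s" is a prefix of "sh".
const demo = buildTrie({ s: "\u0441", sh: "\u0448", h: "\u0445" });
console.log(transliterate("shsh s h", demo)); // prints "шш с х"
```

Because the walk always prefers the longest key, “sh” wins over “s” followed by “h”, which is exactly the case the chain of replace calls can’t express without careful ordering.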

I’m sure there are complications that would arise from what’s called “font shaping” — that is, how operating systems combine adjacent characters. In Arabic or Thai, for instance, characters vary depending on which characters they’re adjacent to. How does this process affect text in textareas, for instance, or text which is mushed around with Javascript?

I’ll be playing around with this.


What Font?

Neat trick if you use the DOM Inspector in Firefox:

If you right click on any text and choose “Inspect element,” the DOM Inspector will show you where that element is. Then if you choose the little dropdown as shown, and select “Computed style” like this:

…you can look up the value of font-family for whatever element it was that you selected. This is easier than trying to work it out by looking through reams of stylesheets and HTML, methinks.
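If you’d rather ask from a script than click around, the same value is available through the standard getComputedStyle call. A minimal sketch, with my own made-up helper name (computedFontFamily) and an optional second argument that only exists so a stand-in window object can be passed in:

```javascript
// Hypothetical helper: report the computed font-family of an element.
// `view` is optional and only there so a stand-in window can be passed.
function computedFontFamily(element, view) {
  const win = view || element.ownerDocument.defaultView;
  // getComputedStyle resolves the cascade for this one element.
  return win.getComputedStyle(element).fontFamily;
}

// In a browser console, something like:
//   computedFontFamily(document.querySelector("h1"))
```

One caveat: this is the list of fonts the stylesheets asked for, which is not necessarily the font that ended up drawing any particular character.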

Also, I never tire of saying “DOM Inspector.”

And now a special opinion piece from Teh Blog Hator.

He doesn’t mince words.


Hindi and Unicode

यूनिकोड क्या है?
What is Unicode? in Hindi

DIT gives push to language software:

The contents of the free CD will include Hindi language true type fonts with keyboard driver, Hindi Language Unicode Compliant Open Type Fonts, generic fonts code and storage code converter for Hindi, Hindi language version of Bharateeya OO, Firefox Browser in Hindi, Multi Protocol Messenger in Hindi, Email Client in Hindi among others.

This is forward-thinking on the part of the Indian government; for a long time it seemed to be the case that the only major website that encoded Hindi in UTF-8 was a foreign site, BBCHindi. Most news sites in Hindi use any of a bewildering array of proprietary encodings, each with a proprietary font to accompany it. (Intended, presumably, to lock in users.)

But India is a country which stands to benefit more than most from Unicode: not only does it have a huge variety of languages, it has a large number of scripts (which are already defined in Unicode). Standardizing on a single character set will make it much easier to localize software and spread digital literacy.

And literacy, period…

Whether these efforts will be officially extended to other languages and scripts in India remains to be seen, but the fact that it’s been done in Unicode for Hindi will make the path much easier.

Incidentally, all of this is related to other domains besides news: email, for instance. Consider one blogger’s criticism of Yahoo Mail (gaping void: Why Yahoo will not be my primary mail client?).

See also: वेब पर हिन्दी - हिन्दी - hindi A blog on the Hindi language, in Hindi and English.


Reading lots of Rails Source

If, like me, you happen to have been studying Ruby on Rails, here’s a silly trick for reading through the source to the applications awaiting judging at Rails Day Contest.

The source of the entries is here in Subversion repositories, but there really isn’t any way to navigate between projects. Each project has the typical Rails folder hierarchy, under URLs like:

So if you’re like me, you like to look at lots of code and compare stuff. In the case of Rails, I wanted to get a general feel, for instance, for the sorts of stuff that goes in /app/controllers or /app/models in various projects.

It so happens that Jesse Ruderman has written some navigation bookmarklets that work great for navigating around those Rails projects: if you drag those two linked words (“increment” and “decrement”) to your toolbar and then visit the first project, you can click them to navigate around.

increment: Increases the last number in the URL by 1.

decrement: Decreases the last number in the URL by 1.
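In case you’re curious what “the last number in the URL” boils down to, here’s a minimal sketch of the idea. To be clear, this is not Jesse Ruderman’s code: bumpLastNumber is a name I invented, the example.com URL is a placeholder, and clamping at zero is my own choice.

```javascript
// Hypothetical helper: add `delta` to the last run of digits in a URL.
function bumpLastNumber(url, delta) {
  // \d+ with no further digit anywhere after it is the last number.
  return url.replace(/(\d+)(?!.*\d)/, (digits) =>
    // Clamp at zero so decrementing never produces a negative number.
    String(Math.max(0, Number(digits) + delta))
  );
}

console.log(bumpLastNumber("http://example.com/projects/12/app/models", 1));
// prints "http://example.com/projects/13/app/models"

// As a bookmarklet, the call would be wrapped along these lines:
//   javascript:location.href = bumpLastNumber(location.href, 1)
```

A URL with no digits in it comes back unchanged, since the pattern simply doesn’t match.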

Okay, maybe that didn’t warrant an entire post.

