Font Problems with Hindi in Firefox
Debugging font issues is a pain, in my experience. If something isn’t rendering correctly, my first reaction is usually “I have absolutely no idea why that’s happening.” Gentle reader, feel my pain.
I find myself working with an awful lot of languages (you’ll see why when Jonas and I launch our project), and I often have to learn just enough characters to determine that a particular script seems to be rendering correctly. We have to know if rendering problems are caused by some kind of configuration problem that we can fix, or if it’s something out of our control: “Sorry, no hieroglyphics in Unicode, not our problem!”
Debugging such stuff is not the same thing as actually being able to read in all these languages: in most cases it’s enough to learn just a bit about how the script is put together and how characters combine, and perhaps a few words for testing purposes.
So here’s an example of a typical problem that I face. Compare the two screenshot clips I took this morning. I added the red-bordered boxes to point out the differences:
Even if you don’t know Devanāgarī from a salad fork, it doesn’t take much to guess that something is askew in my Firefox’s rendering of that page. (Never mind the fact that the word “Hindi” is actually spelled incorrectly… Doh!) Opera seems to get it right.
Now I’m not going to get into the details of how Devanagari works in Hindi at the moment (primarily because I don’t know much, heheh). The main problem for me is that there are so many possible causes for any problem in text rendering. Is this a configuration problem on my end, or is it some pernicious software problem buried in a library underneath the text?
1. The font could be bad.
2. The browser?
3. Is my operating system missing some library? (Linux, in my case.) If so, what library? Can I upgrade something to fix it? Who ya gonna call?
4. Or maybe it’s part of my desktop environment? I wonder if it works in that other desktop environment… blech, switching desktops is a pain…
5. Could it be an encoding problem? Maybe the HTML page is encoded incorrectly in the first place. (There’s a quick sanity check sketched just after this list.)
6. Or, maybe their server is futzing up the encoding somehow?
7. Is it part of that “font shaping” thing, Pango? Am I even using Pango?
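For #5 in particular, one quick sanity check is to look at the raw bytes rather than at whatever the browser draws. Here’s a minimal sketch in Python, just by way of illustration; the filename stands in for a locally saved copy of the page and isn’t anything from the actual site:

```python
def is_valid_utf8(raw_bytes):
    """Return True if the bytes decode cleanly as UTF-8.
    If they don't, the problem is upstream of fonts and shaping."""
    try:
        raw_bytes.decode("utf-8")
        return True
    except UnicodeDecodeError as err:
        print("Not valid UTF-8:", err)
        return False

if __name__ == "__main__":
    # "saved_page.html" is a hypothetical local copy of the page in question.
    with open("saved_page.html", "rb") as f:
        data = f.read()
    if is_valid_utf8(data):
        print("Bytes decode fine; next, compare against the declared charset "
              "in the Content-Type header or <meta> tag.")
```

If the bytes don’t even decode, no amount of font or shaping work will fix the display; if they do, the encoding layers can mostly be crossed off the list.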
In this particular case, the comparison above leads me to suspect #2, of course. But you get the picture here: these kinds of problems are a mess. Particularly in the open source world, it’s hard to know what to do in this situation. And I’m moderately techie. Imagine what a run-of-the-mill user faces.
I was chatting with Chad Fowler and he made an interesting observation: to be sure, really sure, that any given application handles every writing system correctly, its development group would have to include someone who can actually read each of those languages. Which, er, ain’t gonna happen.
And it shouldn’t really have to: the operating system is supposed to abstract the basic rendering of text away from application code.
OSX is pretty darn good at this. But then, it’s also a very closed system: it’s all tested, Apple owns and delivers a wide variety of high-quality (proprietary) fonts with its machines, and there are far fewer points of variation than you’ll see in your average Linux distribution.
Matters on Windows are less variable than on Linux, but more complex than on OSX, as Michael Kaplan can attest in great detail at his excellent blog.
I think these complexities are what make many programmers reticent about Unicode: they’ve been burned in the past by encoding matters, gotten a glimpse of the gruesome entrails underlying text rendering on their platform, and decided, “I just don’t have time to really learn how all these text rendering variables fit together.”
And quite frankly, despite being something of a Unicode zealot myself, I can sympathize.
Most developers accept that they need to know at least the absolute minimum about Unicode; they already know that Unicode is good. The thing is, as a previous commenter pointed out, and as this tiny example demonstrates, the “Unicode” part of handling text is only the tip of the iceberg.
And it’s a big iceberg.
Σίμος Said,
August 5, 2005 @ 11:23 am
Could you please add to your list above an eighth issue?
8. Could it be that an element in the CSS spoils the rendering? (https://bugzilla.mozilla.org/show_bug.cgi?id=240914)
Seriously, the issue of rendering Indic, Khmer, Burmese and similar scripts is a big one, and Firefox is almost there. There is integration work going on with Pango, and it appears it will take a bit more time to get there. For example, full support for Khmer has only just arrived in Pango, so it will take a while to propagate to Firefox and end users.
The reason it’s difficult to render Indic scripts and the like is that reordering can occur when you combine glyphs. In some cases this reordering depends on the context, sometimes even requiring a dictionary of words in order to display properly.
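To make the reordering concrete: in Devanagari the short-i vowel sign ि (U+093F) is stored after its consonant but drawn to its left, so codepoint order and visual order differ. A small illustrative sketch in Python (the sample word is just an example, not taken from the page being discussed):

```python
import unicodedata

word = "हिंदी"  # "Hindi", stored in logical (codepoint) order
for ch in word:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

# Printed in logical order:
#   U+0939  DEVANAGARI LETTER HA
#   U+093F  DEVANAGARI VOWEL SIGN I
#   U+0902  DEVANAGARI SIGN ANUSVARA
#   U+0926  DEVANAGARI LETTER DA
#   U+0940  DEVANAGARI VOWEL SIGN II
# A shaping engine has to draw the VOWEL SIGN I glyph to the LEFT of the
# HA glyph; when some layer in the stack skips that reordering, you get
# exactly the kind of breakage shown in the screenshots above.
```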
If affected people speak up and spend some time on Bugzilla, identifying bugs and providing support, the problem will get solved sooner, and once and for all. By then we will be able to take it for granted.
pat Said,
August 5, 2005 @ 5:37 pm
Hi Simos!
Thanks for the comment. I didn’t mean to pick on Firefox in particular; this was more of a general rant about how hard it can be to figure out where the specific causes of text rendering issues lie. But the Bugzilla suggestion is a good one, and I’ll add a comment.
I have to admit I’m sometimes wary of leaving bug reports on the Firefox Bugzilla — it seems so… uh… organized.
I don’t *think* it is actually a CSS problem in this case, however: on the page “Type in Hindi Devanagri Unicode Keyboard - िहंदी कीबोर्ड,” turning the CSS off doesn’t resolve the problem. I also made a simple test page with just the problematic first word and no CSS at all, and the problem persisted.
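For anyone who wants to reproduce that kind of isolation test, here’s a rough sketch of generating a one-word, CSS-free, explicitly UTF-8 page; the filename and the sample word are placeholders of my own, not the actual word from the page above:

```python
# Write a minimal UTF-8 test page: one word, explicit charset, no CSS.
TEST_WORD = "हिंदी"  # placeholder sample word

html = (
    "<!DOCTYPE html>\n"
    "<html>\n"
    '<head><meta charset="utf-8"><title>Devanagari test</title></head>\n'
    "<body><p>" + TEST_WORD + "</p></body>\n"
    "</html>\n"
)

with open("devanagari_test.html", "w", encoding="utf-8") as f:
    f.write(html)
```

If the glyphs are still broken on a page this bare, CSS and server-side encoding quirks are effectively ruled out, and the suspicion falls back on the browser, fonts, or shaping layer.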
In any case, though, point taken, and I’ll try to pay off my ranting by doing something productive in the future. ☺
By the way, your posts at Advogato made this GNOME user happy…