infundibulum

Font Problems with Hindi in Firefox

August 1st, 2005

Debugging font issues is a pain, in my experience. If something isn’t rendering correctly, my first reaction is usually “I have absolutely no idea why that’s happening.” Gentle reader, feel my pain.

I find myself working with an awful lot of languages (you’ll see why when Jonas and I launch our project), and I often have to learn just enough characters to determine that a particular script seems to be rendering correctly. We have to know if rendering problems are caused by some kind of configuration problem that we can fix, or if it’s something out of our control: “Sorry, no hieroglyphics in Unicode, not our problem!”

Debugging such stuff is not the same thing as actually being able to read in all these languages: in most cases it’s enough to learn just a bit about how the script is put together and how characters combine, and perhaps a few words for testing purposes.
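For that kind of spot check, a small script that names every code point in a test word goes a long way. Here is a minimal sketch using Python's standard `unicodedata` module; the function name `describe` and the choice of test word are mine, purely for illustration.

```python
import unicodedata


def describe(text):
    """Return one (code point, category, name) tuple per character in text."""
    rows = []
    for ch in text:
        rows.append((
            f"U+{ord(ch):04X}",
            unicodedata.category(ch),  # general category, e.g. Lo for a letter
            unicodedata.name(ch, "<unnamed>"),
        ))
    return rows


if __name__ == "__main__":
    # A five-code-point Devanagari test word, written with escapes so the
    # example does not depend on the editor's font or encoding.
    for row in describe("\u0939\u093f\u0902\u0926\u0940"):
        print(*row)
```

Seeing the stored sequence spelled out like this at least tells you what the text *is*, independently of how any particular browser chooses to draw it.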

So here’s an example of a typical problem that I face. Compare the two screenshot clips I took this morning. I added the red-bordered boxes to point out the differences:

Even if you don’t know Devanāgarī from a salad fork, it doesn’t take much to guess that something is askew in my Firefox’s rendering of that page. (Never mind the fact that the word “Hindi” is actually spelled incorrectly… Doh!) Opera seems to get it right.

Now I’m not going to get into the details of how Devanagari works in Hindi at the moment (primarily because I don’t know much, heheh). The main problem for me is that there are so many possible causes for any problem in text rendering. Is this a configuration problem on my end, or is it some pernicious software problem buried in a library underneath the text?

  1. The font could be bad.
  2. The browser?
  3. Is it the case that my operating system is missing some library? (Linux, in my case.) If so, what library? Can I upgrade something to fix it? Who ya gonna call?
  4. Or maybe it’s part of my desktop environment? I wonder if it works in that other desktop environment… blech, switching desktops is a pain…
  5. Could it be an encoding problem? Maybe the HTML page is encoded incorrectly in the first place.
  6. Or, maybe their server is futzing up the encoding somehow?
  7. Is it part of that “font shaping” thing, Pango? Am I even using Pango?

nd, but dag.
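Causes 5 and 6 in the list above are at least cheap to rule out. The sketch below assumes you have saved the page's raw bytes and know which charset the page or server declares; it decodes the bytes with that charset and checks whether any Devanagari characters come out the other end. Both function names are hypothetical, and this is only a rough heuristic, not a real charset detector.

```python
def devanagari_count(text):
    """Count characters in the Devanagari block (U+0900 to U+097F)."""
    return sum(1 for ch in text if "\u0900" <= ch <= "\u097f")


def decodes_to_devanagari(raw, declared):
    """True if raw bytes, decoded with the declared charset, contain Devanagari."""
    try:
        text = raw.decode(declared, errors="replace")
    except LookupError:
        # The declared charset name is not one Python knows about.
        return False
    return devanagari_count(text) > 0


if __name__ == "__main__":
    # Stand-in for real page bytes: a Devanagari word encoded as UTF-8.
    page = "\u0939\u093f\u0902\u0926\u0940".encode("utf-8")
    print(decodes_to_devanagari(page, "utf-8"))
    print(decodes_to_devanagari(page, "iso-8859-1"))
```

If the bytes only yield Devanagari under a charset other than the declared one, the page or server is the likelier culprit; if they decode fine and the text still looks wrong, the problem is further down the stack.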

Update: Σμς suggests an eighth potential culprit in this situation: there could be a problem with CSS. He also found a relevant bug in the bug database for Firefox. (See the comments. Thanks, Simos!)

In this particular case, the comparison above leads me to suspect #2, of course. But you get the picture here: these kinds of problems are a mess. Particularly in the open source world, it’s hard to know what to do in this situation. And I’m moderately techie. Imagine what a run of the mill user faces.

I was chatting with Chad Fowler and he made an interesting observation: for the development of any given application, in order to be sure, really sure, that everything is okay for every particular writing system, each development group would have to have someone who can read each language. Which, er, ain’t gonna happen.

And it shouldn’t really have to: the operating system is supposed to abstract the basic rendering of text away from coding.

OSX is pretty darn good at this. But then, it’s also a very closed system: it’s all tested, Apple owns and delivers a wide variety of high-quality (proprietary) fonts with its machines, and there are far fewer points of variation than you’ll see in your average Linux distribution.

Matters in Windows are less variable than Linux, but more complex than OSX, as Michael Kaplan can attest in great detail at his excellent blog.

I think these complexities are what make many programmers reticent about Unicode: they’ve been burned in the past with encoding matters, gotten a glimpse of the gruesome entrails underlying text rendering on their platform, and decided, “I just don’t have time to really learn how all these text rendering variables fit together.”

And quite frankly, despite being something of a Unicode zealot myself, I can sympathize.

Most developers accept that they need to know the absolute minimum about Unicode. They already know that Unicode is good. The thing is, as a previous commenter pointed out, and as this tiny example demonstrates, the “Unicode” part of handling text is only the tip of the iceberg.

And it’s a big iceberg.

Comments

  1. 1

    Could you please add to your list above an eighth issue?

    8. Could it be that an element in the CSS spoils the rendering? (https://bugzilla.mozilla.org/show_bug.cgi?id=240914)

    Seriously, the issue of rendering Indic, Khmer, Burmese and similar languages is a big one, and Firefox is almost there. There is integration work going on with Pango and it appears it will take a bit more time to get them. For example, just now there is full support in Pango for Khmer, meaning it will take a bit to propagate to Firefox and end-users.

    The reason why it’s difficult to render Indic languages, etc is because reordering can occur when you combine glyphs together. In some cases, this reordering depends on the context, requiring a dictionary of words in order to display properly.

    If affected people speak up and spend some time on Bugzilla, identifying bugs and providing support, the problem will get solved sooner, and once and for all. We will take it for granted by then.

    - Σμς @
  2. 2

    Hi Simos!

    Thanks for the comment. I didn’t mean to pick on Firefox in particular; this was more of a general rant about how hard it can be to figure out where the specific causes of text rendering issues lie. But the Bugzilla suggestion is a good one, I’ll add a comment.

    I have to admit I’m sometimes wary to leave bug reports on the Firefox bugzilla — it seems so… uh… organized.

    I don’t *think* it is actually a CSS problem in this case, however, since at the page “Type in Hindi Devanagri Unicode Keyboard - िहंदी कीबोर्ड,” turning the CSS off doesn’t resolve the problem. I also made a simple test page with just the first word with the problem and no CSS at all, and the problem persisted.

    In any case, though, point taken, and I’ll try to pay off my ranting by doing something productive in the future. ☺

    By the way, your posts at Advogato made this GNOME user happy…

    - pat @
  3. 3

    The bug in “Type in Hindi Devanagri Unicode Keyboard” would appear to be bad typing rather than the CSS bugs in Firefox. Looks like the author has placed the vowel before the consonant, instead of following the consonant.

    ie, typed िहंदी instead of हिंदी ?

    Andrew

    - andrewc @
  4. 4

    Hi Andrew,

    Interestingly enough, I’m told by a friend (Stephanie Booth), who is a student of Hindi, that either spelling is okay.

    I had the very same thought as you when I first looked at Joel Spolsky’s new wiki translation project, at the Hindi page.

    The spelling he has there is like the one on the keyboard page:

    ह DEVANAGARI LETTER HA
    ि DEVANAGARI VOWEL SIGN I
    ं DEVANAGARI SIGN ANUSVARA
    द DEVANAGARI LETTER DA
    ी DEVANAGARI VOWEL SIGN II

    The Hindi Wikipedia, on the other hand, uses (हिन्दी) :

    ह DEVANAGARI LETTER HA
    ि DEVANAGARI VOWEL SIGN I
    न DEVANAGARI LETTER NA
    ् DEVANAGARI SIGN VIRAMA
    द DEVANAGARI LETTER DA
    ी DEVANAGARI VOWEL SIGN II

    It seems that Hindi orthography (and other languages written with Devanagari, I presume?) allows a vowel+’n’ sequence to be written with ‘anusvara’ (the dot) or as vowel + na + virama.

    Neat. However, for what it’s worth, judging by Google, हिंदी is much less common (95,400 hits) than हिन्दी (2,770,000 hits).

    - pat @
  5. 5

    Hi pat ,

    The right order of appearance is this sequence:

    1. ि DEVANAGARI VOWEL SIGN I

    2. ह DEVANAGARI LETTER HA

    3. ं DEVANAGARI SIGN ANUSVARA

    4. द DEVANAGARI LETTER DA

    5. ी DEVANAGARI VOWEL SIGN II

    “Neat. However, for what it’s worth, judging by Google, हिंदी is much less common (95,400 hits) than हिन्दी (2,770,000 hits).” [Your judgment was wrong.]

    Both are right, but in written practice, in notebooks or newspapers, हिन्दी is never used; only हिंदी is used. हिन्दी makes sense only in being grammatically correct. Because of the software already available, हिन्दी is the most popular, but हिंदी is best.

    Open this page in Internet Explorer (IE) and you will get everything right. The order of appearance of हिन्दी, which was the problem raised by Mr. andrewc (to whom you replied), will be shown correctly in Internet Explorer.

    - ashvini @
  6. 6

    Firefox cannot show Unicode for Indian languages properly. Use IE7 or Opera. Both are better in all terms than Firefox.

    - Kudos @
  7. 7

    There is a patched version of firefox around. It may solve the problem.

    - Kerala People @
  8. 8

    The newest firefox release, 2.0.0.2, has fixed the Indic script rendering. Now I can view all the Hindi etc pages I want.

    - Aditya @
  9. 9

    Under Kubuntu linux, the KDE browser (Konqueror) works, but Mozilla 2.0.0.4 doesn’t. Hrmmm.

    - Rob @
  10. 10

    This Indic rendering problem is a 3 tier problem:
    1. Indic Unicode Encodings are done on wrong concept.
    2. Indic OT fonts contains only glyphs not complete syllables.
    3. The Unicode processor’s rendering engine is enabled or ignored in Firefox, especially in the tagged/indexed texts/headlines.

    - Hariram @
  11. 11

    About हिन्दी vs. हिंदी: both are equally correct versions of the same word. I can’t comment on the reason why हिंदी is more popular on the web.

    Well, even i was searching about why my browser (firefox) doesn’t correctly render devnaagari. for example, i opened http://www.tehelkahindi.com/ , firefox is not able to show everything finely (as pat mentioned in his first screenshot) whereas internet explorer or opera both do a great job. still finding the reason….

    - Jai Pandya @
  12. 12

    Hello
    I am using gopi’s unicode converter, and the website runs smoothly in Internet Explorer and Firefox, but it is not working in Mozilla. Can anybody suggest what I should do?
    Bye, and thanks for reading this message.
    With Regards

    - Nidhi @
  13. 13

    i landed up on this old thread because firefox can’t display unicode properly STILL!!

    a clarification to ashvini and others regarding which rendering is correct for the word HINDI:

    the word is written on paper as:
    i mAtrA + ha + n + da + ii mAtrA (in that sequence)

    the use of half na or the dot on top is the point of discussion.
    the dot on top (called anuswAra or bindu) is a GENERIC shortcut for all half-nasals (nasals that are without the vowel) including all the 5 groups (G from ka-group, J from ja-group, N from Ta-group, n from ta-group and m from pa-group) of consonants of devanAgarI.

    which nasal is used depends on the NEXT consonant. before ka, kha, ga, gha if there is a half-nasal it is only Ga (ITRANS format), before pa, pha, ba, bha it is always m etc.

    but the correct or original way of writing is to use the half consonant and not the dot. the dot is actually used only when the next consonant is not from these groups but is ya, ra, la, va, sha, Sha, sa or ha (and is then denoted by M in ITRANS notation), like in saMsAra, saMyoga etc.

    the use of dot has extended to other nasals as well, out of laziness and later in an act of standardizing this easy way out by the govt of india, which has also recommended not using some conjugates like dwa and to use instead da + halanta + wa

    ह DEVANAGARI LETTER HA
    ि DEVANAGARI VOWEL SIGN I
    न DEVANAGARI LETTER NA
    ् DEVANAGARI SIGN VIRAMA
    द DEVANAGARI LETTER DA
    ी DEVANAGARI VOWEL SIGN II

    is correct over:

    ह DEVANAGARI LETTER HA
    ि DEVANAGARI VOWEL SIGN I
    ं DEVANAGARI SIGN ANUSVARA
    द DEVANAGARI LETTER DA
    ी DEVANAGARI VOWEL SIGN II

    - Shashi @
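The rule Shashi describes in the last comment is mechanical enough to sketch in code: when an anusvara is followed by a consonant from one of the five groups, it can be rewritten as that group's nasal plus a virama. The code below is only an illustration of that rule as stated in the comment, with hypothetical names (`class_nasal`, `expand_anusvara`); it makes no claim about which spelling is preferable, and it ignores every other subtlety of Devanagari orthography.

```python
ANUSVARA = "\u0902"
VIRAMA = "\u094d"

# (first consonant, last consonant, group nasal) for the five consonant groups.
VARGAS = [
    ("\u0915", "\u0919", "\u0919"),  # ka-group -> NGA
    ("\u091a", "\u091e", "\u091e"),  # ca/ja-group -> NYA
    ("\u091f", "\u0923", "\u0923"),  # Ta-group (retroflex) -> NNA
    ("\u0924", "\u0928", "\u0928"),  # ta-group (dental) -> NA
    ("\u092a", "\u092e", "\u092e"),  # pa-group -> MA
]


def class_nasal(consonant):
    """Return the nasal of the group containing consonant, or None."""
    for first, last, nasal in VARGAS:
        if first <= consonant <= last:
            return nasal
    return None


def expand_anusvara(word):
    """Rewrite anusvara + grouped consonant as group nasal + virama + consonant."""
    out = []
    for i, ch in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else ""
        nasal = class_nasal(nxt) if ch == ANUSVARA and nxt else None
        # Leave the anusvara alone before ya, ra, la, va, sa, ha and so on.
        out.append(nasal + VIRAMA if nasal else ch)
    return "".join(out)


if __name__ == "__main__":
    # HA, VOWEL SIGN I, ANUSVARA, DA, VOWEL SIGN II
    print(expand_anusvara("\u0939\u093f\u0902\u0926\u0940"))
```

Applied to the five-code-point spelling with anusvara, this produces the six-code-point spelling with NA and VIRAMA that the Hindi Wikipedia uses, while a word like saMsAra is left untouched because SA belongs to none of the five groups.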
