Hindi and Unicode

यूनिकोड क्या है?
What is Unicode? in Hindi

DIT gives push to language software : HindustanTimes.com

The contents of the free CD will include Hindi language true type fonts with keyboard driver, Hindi Language Unicode Compliant Open Type Fonts, generic fonts code and storage code converter for Hindi, Hindi language version of Bharateeya OO, Firefox Browser in Hindi, Multi Protocol Messenger in Hindi, Email Client in Hindi among others.

This is forward-thinking on the part of the Indian government. For a long time it seemed that the only major website that encoded Hindi in UTF-8 was a foreign one, BBCHindi. Most Hindi news sites use one of a bewildering array of proprietary encodings, each with its own proprietary font (presumably intended to lock in users).

But India is a country which stands to benefit more than most from Unicode: not only does it have a huge variety of languages, it has a large number of scripts (which are already defined in Unicode). Standardizing on a single character set will make it much easier to localize software and spread digital literacy.
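As a small illustration of the "already defined in Unicode" point, the sketch below looks up the same letter, ka, in five Indian scripts using Python's standard `unicodedata` module. The choice of scripts and the variable names are mine, not anything from the CD described above.

```python
import unicodedata

# The letter "ka" in five Indic scripts, written as escapes so the
# example does not depend on the reader's fonts.
KA = {
    "Devanagari": "\u0915",
    "Bengali": "\u0995",
    "Gurmukhi": "\u0a15",
    "Gujarati": "\u0a95",
    "Tamil": "\u0b95",
}

# Official Unicode character name for each letter.
names = {script: unicodedata.name(ch) for script, ch in KA.items()}

# Number of bytes each letter occupies when encoded as UTF-8.
utf8_lengths = {script: len(ch.encode("utf-8")) for script, ch in KA.items()}

for script, ch in KA.items():
    print(f"U+{ord(ch):04X}  {names[script]}  ({utf8_lengths[script]} bytes)")
```

Each script gets its own range of code points, so one character set covers all of them and a single page can mix scripts freely.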

And literacy, period…

Whether these efforts will be officially extended to other languages and scripts in India remains to be seen, but the fact that it’s been done in Unicode for Hindi will make the path much easier.

Incidentally, all of this applies to domains besides news, such as email. Consider one blogger’s criticism of Yahoo Mail: gaping void: Why Yahoo will not be my primary mail client?

See also: वेब पर हिन्दी - हिन्दी - hindi A blog on the Hindi language, in Hindi and English.

10 Comments »

  1. Jeff Lindstrom Said,

    June 20, 2005 @ 11:15 am

    Thanks for your comments about Urdu you sent. I’ve added the links to my bookmarks sidebar and also the Hindi blog you mention in this post.

    I’ve just recently moved from an “ancient” Windows 98 machine to Mac OS X Tiger, and the jump in multilingual support is astonishing. However, I’m annoyed that Firefox on either platform is still very devanagari-resistant.

    As for Unicode Hindi sites, the more the merrier. One way to learn a new language is to swoop up lots of examples, say through Google. Because of the font encoding issues it’s very frustrating trying to find example sentences the way I can even for Urdu, which is more likely to show up on the Web in Unicode.

    Thanks again for the links.

    धन्यवाद शुक्रिया और थैंक्यू!
    जेफ़

  2. pat Said,

    June 20, 2005 @ 8:59 pm

    Hi Jeff,

    Glad to be of help. What kinds of problems are you having with nagari and Firefox? In an older version there seemed to be issues with the matras lining up with the consonants correctly, but either I’ve since gotten a better font somewhere along the line or Firefox corrected it.

    Er… whoops. Now that I look again, maybe not? Hindi-Urdu phrasebook - Wikitravel

    In any case, you’re right about the complexities of encodings and Google. I googled a few of the terms you wrote and, of course, got back a bunch of Hindi pages, but I noticed that some of them are not actually in UTF-8; they’re in legacy encodings. The little snippet on Google’s search results page is in UTF-8, however, which means that Google is converting the stuff behind the scenes: they’re indexing stuff in lots of encodings, and converting the input in the search box to UTF-8.

    Nutty.

    Encoding stuff can make one’s head spin.

    नमस्ते!

  3. Pankaj Narula Said,

    July 3, 2005 @ 6:27 pm

    Hi All

    It is a pleasure to read about Hindi over here. Hindi has been making some good progress on the web. The concept of weblogs has given it a good push. Here are a few good links:

    http://nirantar.org - World’s first Hindi Blogzine
    http://akshargram.com - A group Hindi weblog
    http://myjavaserver.com/~hindi Hindi blogs aggregator. Can you believe it is done in Java :D

    I am also on the editorial panel of Nirantar Magazine. I would love to hear back from you guys.

    पंकज नरुला उर्फ हाँ भाई
    http://hindi.pnarula.com/haanbhai
    http://pnarula.com

  4. pat Said,

    July 3, 2005 @ 10:58 pm

    Hi Pankaj!
    Thanks for dropping by. Looks like you’re doing some really interesting work in Hindi blogs! Wish I could read it, heheh.

    Of the new Hindi blogs you’re seeing and contributing to, what percentage would you say are encoded in UTF-8? The majority?

    By the way, “us guys” is just me. 8^)

    शुक्रिया and धन्यवाद too!

  5. Pankaj Narula Said,

    July 4, 2005 @ 12:30 pm

    I have never seen a Hindi blog which was not unicoded. I think most of it has to do with the easy and free availability of Blogger.

  6. pat Said,

    July 4, 2005 @ 1:15 pm

    Well that’s good news!

    What about newspapers? Do you think they’re starting to switch as well? It seems like every Hindi newspaper has its own encoding and font to go with it!

  7. Jeff Lindstrom Said,

    July 23, 2005 @ 6:00 pm

    “Glad to be of help. What kinds of problems are you having with nagari and Firefox? In an older version there seemed to be issues with the matras lining up with the consonants correctly, but either I’ve since gotten a better font somewhere along the line or Firefox corrected it.”

    I just finished upgrading Firefox from 1.0.4 to 1.0.6 and discovered the nagari problem continues. While searching Google yet again for a solution I came across my old remarks here, so I noticed your follow-up remarks.

    Even after this latest update Firefox continues to display nagari as a bunch of question marks, even when the browser encoding is specifically set to Unicode. Bizarrely, I can type हिन्दी in Unicode in the Google input box (although it shows up as question marks) and get what presumably are pages with the word हिन्दी in it, but the results will not be displayed correctly.

    The Safari browser on this Mac displays nagari, and I can type nagari to my heart’s delight in various programs (OpenOffice, etc.), so I don’t think there’s an obvious font problem.

    I do find it strange that the situation in some ways is worse with the Mac OS version than with Windows 98, which (as an example of what you cited for earlier versions of Firefox) at least showed the nagari characters, just out-of-step (with short i after the consonant, for instance).

    “Er… whoops. Now that I look again, maybe not? Hindi-Urdu phrasebook - Wikitravel”

    Again, I just get question marks. The Urdu shows up fine. Good link, though.

    Another browser curiosity: Safari (for me at least) has broken Arabic text flow (characters that are “joiners” are shown in their stand-alone form), but Firefox renders it fine!

    I don’t read Arabic so this is not a big deal for me. However, it would be nice if I could get Firefox to show nagari, as the browser has much better pop-up blocking and other features, although Safari is surprisingly no slouch.

    शुक्रिया । ख़ुदा हाफ़िज़ ।

    जैफ़
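The "out-of-step" short i described in the comment above follows from how Unicode stores Devanagari. The sketch below, using only Python's standard `unicodedata` module, lists the code points of the word हिन्दी in stored order. It illustrates the storage order only and is not a diagnosis of the Firefox problem.

```python
import unicodedata

# The word "Hindi" in Devanagari, written as escapes.
HINDI = "\u0939\u093f\u0928\u094d\u0926\u0940"

# Character names in the order the code points are stored.
stored_order = [unicodedata.name(ch) for ch in HINDI]

for ch, name in zip(HINDI, stored_order):
    print(f"U+{ord(ch):04X}  {name}")
```

The vowel sign i comes second in storage, after the letter ha, even though it is drawn to the left of that letter. A renderer that does no reordering simply paints the glyphs in stored order, which puts the short i after the consonant.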

  8. pat Said,

    July 23, 2005 @ 11:40 pm

    Hey Jeff,

    Yeah, those sound like pretty perplexing issues. I don’t really have any idea why the font would fail in FF but work in Safari. I couldn’t resist digging around a bit, and I found this: Simon Brown’s weblog - Hindi on Mac OS X, but I imagine you’ve seen that…

    To be honest the whole concept of “font shaping” is a black art to me. Today I was reading Unicode and fonts, and I came across the following:

    A rendering engine that can handle all of Unicode, unlike a universal font, is a perfectly reasonable component of modern operating systems, though there are still many legacy systems with old engines. Both Mozilla and Internet Explorer, for example, have built-in engines to compensate for deficiencies in the operating systems under which they may be run.

    So the place where “font shaping” takes place is not one “place” — both the browser and the OS have a hand in determining how fonts are rendered. I had never really tried to figure it out, but I always figured it was Pango that handled all the font stuff in Linux.

    Guess not.

    Head a splodes.

  9. Bobby Kanchan Said,

    August 1, 2005 @ 3:17 am

    The Hindi word processor ‘Madhyam’ is being made available to Hindi enthusiasts for free by its developer, Balendu Sharma Dadhich. This is a very simple yet effective Hindi tool. It can be downloaded here.

  10. pat Said,

    August 1, 2005 @ 5:31 am

    Hi Bobby,

    Thanks for the pointer, but I would point out that most of the links on that page seem to be dead ends. In any case, I dug around a bit and found the application you were referring to:

    Hindi Word Processor (Text Editor): Madhyam

    It looks like it’s Windows only, and doesn’t support Unicode yet, which is a bummer. But according to the blurb, version 2.0 will.

    Also interesting to see the keyboard layouts used for Hindi and info about “How Hindi input system of ‘Inscript’ works.”

    Cheers
