Unicode News
July 23rd, 2005
I thought it might be interesting to look through what a Technorati search for “Unicode” turns up recently. This may be of no interest to others… but I like whiling away hours reading about Unicode.
Heh, stop that snickering. I could have a crack habit.
- Urdu Blogging: Discussion on Urdu Content Management: “I have deliberately chosen only the sites that use Unicode Urdu.”
- A pretty long thread over at SitePoint Forums: how to differentiate between Unicode and plain text. This is a surprisingly complex task, especially when you’re talking about web apps (and isn’t everyone?). There is what looks to be a link to a pretty interesting reference at the end of the thread, but it was down when I checked…
- Interesting: a Chinese blogger explicitly requesting that users switch to Unicode. love is beautiful. Isn’t it though? Now first of all… I thought all Blogspot blogs were sent as UTF-8 in the first place. But my browser (Firefox) defaulted to ISO-8859-1 (also known as Latin-1, IIRC). So I had to heed the blogger’s request to see the Chinese: “change yr character coding of yr browser to unicode if u cant c e Chinese characters above.” Weird.
- Okay, doubly weird, another Blogspot blog with the same problem: fallen angel says: “*to view -> view -> encoding -> unicode (utf-8)” I’m not sure what’s going on here, and I’m too tired to venture a guess. Explanations welcome. It seems to be related to Blogspot, and it’s not just a Chinese thing; here’s the same problem on an Urdu blog. Ok, one more:
- Malayalam Related Topics: Oh, jackpot. A whole blog about Malayalam and Unicode. Yikes, according to the Malayalam Unicode font tester, neither Opera nor Firefox passes (under Linux, anyway). Here are some screenshots, see for yourself (I added the red boxes): part a, part b. Opera does slightly better. Man, Malayalam is one complex script.
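A rough sketch of what the SitePoint thread and the two Blogspot items are circling around, in Python. The function names are made up for illustration, and the check is only a heuristic: bytes that decode cleanly as UTF-8 are probably UTF-8, but pure ASCII is valid in both UTF-8 and ISO-8859-1, so it can't tell those two apart. The last lines reproduce what a browser does when it reads UTF-8 Chinese as ISO-8859-1.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Heuristic: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def is_plain_ascii(data: bytes) -> bool:
    """True if every byte is below 0x80 (valid in ASCII, Latin-1 and UTF-8 alike)."""
    return all(b < 0x80 for b in data)


# Two Chinese characters, encoded as UTF-8 (3 bytes each).
chinese = "\u4e2d\u6587".encode("utf-8")

# "cafe" with an acute accent, encoded as ISO-8859-1 (the accent is the single byte 0xE9).
latin1 = "caf\u00e9".encode("iso-8859-1")

# What a browser shows if it assumes ISO-8859-1: every byte becomes one character.
mojibake = chinese.decode("iso-8859-1")

# Switching the browser's encoding to UTF-8 amounts to redoing the decode correctly.
recovered = mojibake.encode("iso-8859-1").decode("utf-8")

if __name__ == "__main__":
    print(looks_like_utf8(chinese), looks_like_utf8(latin1))
    print(len(mojibake), recovered == "\u4e2d\u6587")
```

Note that ISO-8859-1 assigns a character to every possible byte, so decoding as ISO-8859-1 never fails. That is why the wrong guess produces garbage on screen rather than an error.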
Do you have Far East Languages installed? (Assuming Windows XP.)
Your computer can correctly interpret the page, even, but if you don’t have the right fonts installed, you still can’t see the (correctly interpreted) characters.
- Lion Kimbro @ 23 July 2005
Hey Lion,
Thanks for coming by. I believe we’ve chatted somewhere online before… but I can’t remember where. IRC, maybe? Oh no wait, that Python thing you were doing for a while, that’s it.
Anyway, yeah, it’s definitely not a font problem (talking here about the Blogspot blogs). If I set the encoding to Unicode, everything shows up fine.
I suspect that this is that notorious Apache configuration problem that people run across sometimes: you can set your web pages to the right (so-called) charset in the headers and/or with a meta tag, but if the server isn’t set correctly, it will send as the default iso-8859-1.
But I’m just guessing.
Oh, and I don’t have a Windows box, I only run Linux (Fedora Core 3, at the moment).
Cheers
- pat @ 23 July 2005
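The guess in the last comment can be sketched like this, assuming the usual rule that a charset in the HTTP Content-Type header wins over a meta tag in the page. The helper names and the sample page are hypothetical, and the regexes are a simplification of what browsers actually do.

```python
import re
from typing import Optional


def charset_from_header(content_type: Optional[str]) -> Optional[str]:
    """Pull the charset parameter out of a Content-Type header value, if any."""
    if not content_type:
        return None
    m = re.search(r'charset\s*=\s*"?([\w.:-]+)"?', content_type, re.IGNORECASE)
    return m.group(1).lower() if m else None


def charset_from_meta(html: bytes) -> Optional[str]:
    """Look for a charset declared in a meta tag near the top of the page."""
    m = re.search(rb'<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)',
                  html[:1024], re.IGNORECASE)
    return m.group(1).decode("ascii").lower() if m else None


def effective_charset(content_type: Optional[str], html: bytes) -> Optional[str]:
    """Header first, meta tag as fallback (assumed precedence)."""
    return charset_from_header(content_type) or charset_from_meta(html)


# Hypothetical page that declares UTF-8 in its own markup.
page = (b'<html><head><meta http-equiv="Content-Type" '
        b'content="text/html; charset=utf-8"></head><body></body></html>')

if __name__ == "__main__":
    # Server adds its own default charset: the page's meta tag is overridden.
    print(effective_charset("text/html; charset=ISO-8859-1", page))
    # Server sends no charset: the meta tag gets its say.
    print(effective_charset("text/html", page))
```

If the server tacks its own default charset onto the header, the page's meta tag never gets consulted, which would match the symptom of having to switch the browser to UTF-8 by hand.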