I've never seen ANY subject glaze eyes like i18n. However, it is REALLY
important. And, I'm not the best person to be proposing stuff since my
knowledge predates Unicode & ISO 10646. However, I've done a little
bit of homework and found these messages in the html-wg archives:
HTML Character Representation/Transmission - Model Glenn Adams
http://www.acl.lanl.gov/HTML_WG/html-wg-95q1.messages/0907.html
Some other interesting comments.
http://www.acl.lanl.gov/HTML_WG/html-wg-95q1.messages/0917.html
Dan Connolly, shepherding the (IETF) HTML working group, more or less agrees.
http://www.acl.lanl.gov/HTML_WG/html-wg-95q1.messages/0942.html
> It is also consistent with the proposal that everybody use Unicode.
> Using Unicode is a sufficient, but not necessary mechanism. In
> the MIME-SGML world, I don't believe "Everyone must use Unicode"
> is an acceptable solution. For HTML, it appears to be.
Unicode and ISO 10646 are the same thing in at least some dimensions;
however, I think Unicode is encoded in two bytes and ISO 10646 thinks
of everything as integers (no bytes needed) [my working assumption].
What would it mean to say we use ISO 10646? Well, ISO Latin 1 is a subset
(and ASCII is a subset of that). So, any characters in these character
sets are OK, but they are considered shorthand for the integers - the
code points - they represent. So, "A" is shorthand for 65 - an integer
in a byte. Any characters with code points greater than 255 must be
represented as integers with "&#" in front of them. (&#946; is the Greek
small beta char, for example.)
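To make the "&#" idea concrete, here is a minimal sketch in Python. The
function name to_numeric_refs and the cutoff of 256 are my own choices for
illustration, not anything from the HTML drafts. It leaves ISO Latin 1
characters alone and writes everything else as a decimal code point.

```python
def to_numeric_refs(text, limit=256):
    """Replace every character whose code point is >= limit with a
    decimal numeric character reference of the form &#N;.
    (Hypothetical helper; the cutoff of 256 assumes ISO Latin 1.)"""
    out = []
    for ch in text:
        code = ord(ch)
        out.append(ch if code < limit else "&#%d;" % code)
    return "".join(out)


# LATIN CAPITAL A, e-acute (in Latin 1), GREEK SMALL LETTER BETA
sample = "A \u00e9 \u03b2"
encoded = to_numeric_refs(sample)

print(ord("A"))   # 65
print(encoded)    # A é &#946;

# The result now fits in a plain ISO Latin 1 byte stream.
latin1_bytes = encoded.encode("latin-1")
```

The point is only that the file itself stays in a one-byte character set
while still being able to name any code point.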
Of course, that will get boring someday. People will want to have a 2 byte
encoding to save on space, but that raises byteswapping and various other
issues that we are blissfully ignorant of in our "ASCII" files.
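A small sketch of the byteswapping issue, in Python. I am using Python's
utf-16-be and utf-16-le codecs as a stand-in for "a 2 byte encoding"; that
is an assumption on my part, not something any of the proposals specify.

```python
beta = "\u03b2"  # GREEK SMALL LETTER BETA, code point 946 (0x03B2)

big = beta.encode("utf-16-be")     # most significant byte first
little = beta.encode("utf-16-le")  # least significant byte first

print(big.hex(), little.hex())     # 03b2 b203

# Reading big-endian bytes with the wrong byte order silently yields
# a different character instead of an error.
swapped = big.decode("utf-16-le")
print(hex(ord(swapped)))           # 0xb203

# Plain ASCII text doubles in size under a 2-byte encoding.
ascii_size = len("HTML".encode("ascii"))
wide_size = len("HTML".encode("utf-16-be"))
```

Same two bytes, two different characters, depending on which end you read
first. A one-byte "ASCII" file never has to answer that question.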
More complicated options exist: ISO-2022/EUC, being able to switch
character sets in midstring, or having 8859-7 (Greek) as the default.
If you think any of these options will be "right", sticking to option #1
now is the safest, since ASCII is a subset of most of these schemes.
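A quick way to check the "ASCII is a subset" claim, sketched in Python.
The list of codecs is just a sample I picked from the ones Python ships
with, and the test string deliberately avoids characters such as backslash
and tilde that some national variants reassign.

```python
# Bytes produced by plain ASCII text.
ascii_bytes = "Hello, HTML".encode("ascii")

# A sample of larger schemes (an arbitrary selection for illustration).
candidates = ["latin_1", "iso8859_7", "utf_8", "euc_jp", "iso2022_jp"]

# Decode the same bytes under each scheme.
decoded = {name: ascii_bytes.decode(name) for name in candidates}

for name, text in decoded.items():
    print(name, text)
```

Every one of them reads the bytes back as the same text, which is why
committing to ASCII now does not close off the fancier choices later.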
For us yanks, all of the above options are the same. For Western Europeans,
option #2 is better. For people wanting to put in kanji,
dingbats, Greek, or whatever, option #3 means they can do it now -
it may not be pretty, but it is possible.
This makes so little difference in what we will do, but what we say we
are doing will have ramifications for a long time.
I get the feeling no one cares about this and I am boring the list to
tears. I'm sorry; I'm done now.
YON, [email protected], Jan C. Hardenbergh, Oki Advanced Products 508-460-8655
http://www.oki.com/people/jch/ =|= 100 Nickerson Rd. Marlborough, MA 01776
Imagination is more important than knowledge - Albert Einstein (1879-1955)