>knowledge predates Unicode & ISO 10646. However, I've done a little
>bit of homework and found these messages in the html-wg archives:
I forget exactly where, but there are also some excellent papers
online about I18N. I think gatekeeper.dec.com will have them (search
for i18n).
Also, http://www.unicode.org/ is a good starting point.
>Unicode and ISO 10646 are the same thing in at least some dimensions,
I think Jans' intentions are very good, but one needs to be very very
careful here. Characters are *not* code points, nor are they
glyphs.
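
To illustrate the character/code point distinction, here is a small
sketch in Python. The example letter and the use of Python's standard
unicodedata module are my own choices for illustration, not anything
from the html-wg discussion. It shows one character, as a reader would
see it, represented by two different code sequences:

```python
import unicodedata

precomposed = "\u00e9"   # one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # two code points: "e" + COMBINING ACUTE ACCENT

# The code sequences differ, although both denote the same character.
same_codes = precomposed == decomposed

# Normalising to a composed form makes the two sequences compare equal.
same_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed

print(len(precomposed), len(decomposed), same_codes, same_after_nfc)
```

How either sequence is drawn on screen is a third matter again, decided
by the font and renderer.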
>What would it mean to say we use ISO 10646? Well, ISO Latin 1 is a
>subset
OK. Simple tutorial. For text, we basically need to be able to
identify characters. In order to do that, we need to identify the
coded character set, and the encoding of the text. The encoding allows
us to map from a bit stream to a code stream, and the coded character
set specification allows us to map from codes to
characters. To repeat: we need to know the coded character set and
encoding of the text *before* we can parse the bit stream.
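
To make that concrete, a rough Python sketch follows. The octet values
and the two encodings compared (ISO 8859-1 and UTF-8) are assumptions I
picked for illustration. The point is only that the same bit stream
yields different codes, and hence different characters, depending on
what you assume before parsing:

```python
# One bit stream, five octets.
raw = bytes([0x63, 0x61, 0x66, 0xC3, 0xA9])

# Assume ISO 8859-1: every octet is one code, every code one character.
as_latin1 = raw.decode("iso-8859-1")

# Assume UTF-8: the last two octets together encode a single code.
as_utf8 = raw.decode("utf-8")

# Bit stream -> code stream, under each assumption.
codes_latin1 = [ord(c) for c in as_latin1]
codes_utf8 = [ord(c) for c in as_utf8]

print(codes_latin1)  # five codes
print(codes_utf8)    # four codes
```

Nothing in the octets themselves says which reading is right, which is
why the label has to arrive out of band or ahead of the text.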
It is usually better to support *multiple* coded character sets, of
which one should be ISO 10646 and/or Unicode. HTML-WG is moving toward
ISO 10646 purely because it offers an abstract way to provide a
foundation for multilingual support, and because it unifies numeric
character references. The fact that it also requires no changes to
current browsers helps of course... ISO 10646 might be less suitable
for other applications, and I should emphasise the *abstract* nature
of that decision.
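
As a sketch of what "unifies numeric character references" buys us:
if a reference such as &#233; always names a code in ISO 10646, it
resolves to the same character whatever encoding the document was
shipped in. The Python below uses the standard html.unescape function
as a stand-in for a browser's reference handling, and the three
encodings are arbitrary examples of mine:

```python
import html

ref = "&#233;"  # numeric character reference, written in plain ASCII

resolved = {}
for enc in ("iso-8859-1", "utf-8", "iso-8859-5"):
    # Ship the document in this encoding, then parse it back.
    text = ref.encode(enc).decode(enc)
    # The number is read as a code in the document character set,
    # independent of the encoding used on the wire.
    resolved[enc] = html.unescape(text)

print(resolved)
```

Note that ISO 8859-5 has no code of its own for that letter, yet the
reference still resolves, because the number is never interpreted in
the transfer encoding.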
>More complicated options exist. ISO-2022/EUC, being able to switch
ISO-2022 is, shall we say, "interesting"...
>I get the feeling no one cares about this and I am boring the list to
>tears, I'm sorry, I'm done now.
I would hope that many people care. This issue is a very important
one.