> >I think we can do something that will work by sidestepping the
> >character set switching mechanism. Each "string" must be in a single
> >character set. We could reuse PEX terminology and refer to these as
> >"mono-encoded" strings.
>
> PEX is wrong.
At least the marketing of PEX was wrong. Let's spare the list and just
say that PEX is irrelevant in the PC space, and that is where the bulk
of the Web action is, OK?
> >A "character set" is a set of relations between "code points"
> >(integer indices) and generic notions of glyphs. For example, 0x41 is
> >an "A" in ASCII (& 8859/1 or ISO Latin-1). The character set has
> >nothing to do with what type face (appearance -Helvetica, Old English,
> >etc) the "A" is.
>
> A character has nothing to do with the glyph image either.
The term character is so general that it is not useful. From the Unicode
glossary (http://www.stonehand.com/unicode/glosscnt.html):
Character.
(1) an element of a computer character set;
(2) an element of an alphabet;
(3) an element of the Han script (see Hanzi). See also glyph.
Glyph.
An abstract form which represents one or more glyph images, and which is
used to visually depict encoded character data. In displaying Unicode
character data, one or more glyphs may be selected to depict a
particular character. These glyphs are selected by a rendering engine
during composition and layout processing. See also character.
> >A "font" is a set of specific glyphs which can be indexed.
>
> Sorry. Try messing with ligatures.
Font.
A collection of glyphs used for the visual depiction of character
data. A font is often associated with a set of parameters, e.g., size,
posture, weight, serifness, etc., which, when set to particular values,
generate a collection of imagable glyphs.
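
To make the ligature objection concrete, here is a rough sketch of why
"a font is a set of glyphs indexed by character" breaks down. The font
table, the glyph names, and the shape() helper are all invented for
illustration; no real font format or rendering engine works exactly
like this.

```python
# Toy font: glyph names keyed by the character sequence they depict.
# The table and the glyph names are made up for this example.
TOY_FONT = {
    "f": "glyph_f",
    "i": "glyph_i",
    "l": "glyph_l",
    "n": "glyph_n",
    "fi": "glyph_fi_ligature",
    "fl": "glyph_fl_ligature",
}


def shape(text, font=TOY_FONT):
    """Map characters to glyph names, preferring the longest match."""
    longest = max(len(key) for key in font)
    glyphs = []
    pos = 0
    while pos < len(text):
        # Try the widest character run first so "fi" wins over "f" + "i".
        for width in range(min(longest, len(text) - pos), 0, -1):
            chunk = text[pos:pos + width]
            if chunk in font:
                glyphs.append(font[chunk])
                pos += width
                break
        else:
            # No glyph covers this character at all.
            glyphs.append("glyph_missing")
            pos += 1
    return glyphs
```

With this table, shape("fin") yields two glyphs for three characters
(the "fi" ligature, then "n"), so the character index and the glyph
index no longer line up.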
> >Can anyone summarize what is going on with I18n in the HTML world? I
> >saw a link to some info the other day, but did not follow it.
This is not a summary, but here are a few URLs:
http://www.w3.org/hypertext/WWW/MarkUp/html-spec/charset-harmful.html
http://www.stonehand.com/unicode.html
http://www.ebt.com:8080/docs/multilingual-www.html (by Gavin Nicol)
> I would guess that my proposal for the document character set for HTML
> be ISO 10646, and we'll be using the MIME charset parameter to figure
> out encoding and perhaps character set. Don't discuss the above if you
> don't understand the full meaning of "document character set".
In other words, we should not discuss it on this list? The problem here
is not that the concepts are so hard (they are not trivial, either),
but that there are too many different terms for the melange of concepts.
Usually, tho, if you eschew obfuscation, you can agree on a set of
terms and solve as much of the larger problem as you want to chew off.
Besides, "document character set" is not even in your own glossary?
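
For what it is worth, my reading of the proposal is roughly the
following: the MIME charset parameter names the encoding of the bytes
on the wire, and decoding them gives code points in one common
repertoire. This is only a sketch of that reading. decode_body() is a
name I made up, the US-ASCII fallback is my assumption, and Python
strings are Unicode, which I am using as a stand-in for ISO 10646.

```python
from email.message import Message


def decode_body(content_type, body):
    """Decode raw bytes using the charset parameter of a Content-Type value."""
    header = Message()
    header["Content-Type"] = content_type
    # Assumption of this sketch: fall back to US-ASCII when no charset is given.
    charset = header.get_content_charset() or "us-ascii"
    text = body.decode(charset)
    # Return the charset label plus the resulting code points.
    return charset, [ord(ch) for ch in text]
```

The bytes differ by encoding but the code points come out the same:
decode_body("text/html; charset=ISO-8859-1", b"caf\xe9") and
decode_body("text/html; charset=UTF-8", b"caf\xc3\xa9") both end in
code point 0xE9.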
> Your basic ideas are OK though, even if they're not worded well. How
> about:
>
> Text3
> Fields
> MFString text
> MFString language
> MFString encoding
> MFString charset
If we make VRML own the language & encoding, it will take months just
to specify. This implies that the browser/VRML library knows everything
about rendering all languages, ne? Or, half seriously, we could have
text rendering servers hanging around. The browser would get the text,
decide if it can do it itself, if not, ship it out to a 3D text server
with a polygon budget...
It's been five years since I looked at this. At that point, it was good
enough to establish the character set of a string and a font that had a
1:1 mapping between indices and glyph images. This is all the X server
does. This shifts the burden out of VRML, but only onto the browser.
If "we" feel this is not good enough, we should present some handwaving
about how we can add it in 1.1.
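
Roughly what I mean by the 1:1 scheme, as a sketch. Every name here
(MonoEncodedString, LATIN1_FONT, glyph_images, the ".img" glyph names)
is invented for illustration, and it assumes one byte per code point,
which only holds for single-byte character sets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonoEncodedString:
    """A string whose bytes all belong to one declared character set."""
    charset: str
    data: bytes  # one byte per code point in this sketch


# Hypothetical font: the character set it was built for, plus a
# direct code point -> glyph image table.
LATIN1_FONT = {
    "charset": "iso-8859-1",
    "glyphs": {0x41: "A.img", 0x42: "B.img"},
}


def glyph_images(string, font):
    """Look up one glyph image per code point, with no shaping at all."""
    if string.charset != font["charset"]:
        raise ValueError("font does not cover character set " + string.charset)
    # Iterating over bytes yields integer code points.
    return [font["glyphs"].get(code, "missing.img") for code in string.data]
```

So glyph_images(MonoEncodedString("iso-8859-1", b"AB"), LATIN1_FONT)
gives ["A.img", "B.img"]. There is no layout and no ligatures; whatever
picks the font has to do the rest.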
Everyone's eyes glazed over yet?
-Jan