Re: Putting the "World" back in WWW...

HALLAM-BAKER Phillip ([email protected])
Tue, 4 Oct 1994 10:55:59 +0100


In article <[email protected]> you write:

|> From: [email protected] (HALLAM-BAKER Phillip)
|> Date: Mon, 03 Oct 94 20:36:56 +0100
|>
|> It is simply another content encoding to deal with.
|>
|> A charset module can easily be written to convert fairly arbitrary
|> encodings into UNICODE tokens. This can also do UTS, ASCII, ISO-8859,
|> JIS, and whacky Russian etc. encodings.
|>
|>I'm not sure I follow... excuse me if I missed the point ... but it sounds like
|>you are suggesting we put "ANY ENCODING" in the document and have each viewer
|>convert into UNICODE...

Yes, internally as part of the parser. There has to be some sort of uniform
encoding, and choosing UNICODE makes sense.
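
To make the idea concrete, here is the sort of charset module I have in mind,
taking ISO8859-1 as the trivial case (a rough sketch only; the names are my
own invention, not code from any existing library):

    #include <stddef.h>

    typedef unsigned short unicode_t;   /* one UNICODE (UCS-2) token */

    /*
     * Convert a buffer of ISO8859-1 bytes into UNICODE tokens.
     * ISO8859-1 is the trivial case: each byte maps directly onto the
     * code point of the same value.  Other charsets (JIS, KOI-8, ...)
     * would plug in here with their own mapping tables.
     */
    size_t
    iso8859_1_to_unicode(const unsigned char *in, size_t len, unicode_t *out)
    {
        size_t i;

        for (i = 0; i < len; i++)
            out[i] = (unicode_t) in[i];  /* identity mapping for Latin-1 */

        return len;
    }

A JIS or Russian module would have the same shape, only with a real mapping
table behind it.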

|>If so, this will cause MAJOR interoperability problems across the network.
|>Expecting every client to convert to and from every possible encoding will
|>never work - consider that Latin-1 alone has: PC 437, PC 850, EBCDIC,
|>ISO8859, UTF, UCS, other PC national code pages...

I was not planning to support EBCDIC, but the ISO8859 encodings and the
UNICODE ones are fine. The ISO8859 encodings are in any case part of the MIME
spec.

There are no interoperability problems: if a client cannot accept an encoding,
it should not send an accept line for it. There are, however, a number of
possible optimisations, such as requiring UNICODE browsers to support certain
stream encodings as well. But this is simply a matter of specification.
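
To illustrate the negotiation I mean (a sketch only; the accept line format
and the helper are illustrative, not a settled part of any spec), the server
simply checks the requested charset against whatever accept line the client
sent, and never ships an encoding the client did not ask for:

    #include <string.h>
    #include <strings.h>    /* strcasecmp */

    /*
     * Return 1 if `charset' appears in the comma-separated accept line
     * the client sent (e.g. "ISO-8859-1, UNICODE-1-1"), 0 otherwise.
     * A client that cannot handle an encoding simply never lists it.
     */
    static int
    charset_acceptable(const char *accept_line, const char *charset)
    {
        char buf[256], *tok;

        strncpy(buf, accept_line, sizeof(buf) - 1);
        buf[sizeof(buf) - 1] = '\0';

        for (tok = strtok(buf, ", \t"); tok != NULL;
             tok = strtok(NULL, ", \t"))
            if (strcasecmp(tok, charset) == 0)
                return 1;

        return 0;
    }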

|>Rather, the document should be supplied in a canonical encoding, i.e. UCS,
|>so that each client need provide at most one conversion.

Completely impossible; we already have an installed base of ISO8859-1.

|> On the other side I am looking into a scheme of `multifonts' which allows
|> several X11 fonts to be compounded into a single UNICODE mapping.
|>
|>If this is so, then storing them with UNICODE makes more sense since such
|>fonts will exist... and there is no conversion at view time.

This is not currently possible without operator intervention: the X11 font
scheme requires the fonts to be loaded onto the server, and we cannot
guarantee that a font will be available. Sysops most often have a delay of
three weeks or so between being asked to do something and it happening. The
low level of sysop intervention required is a major reason for the growth of
the web.

If anyone has a UNICODE font for X11 we would like it. However, we have to
allow people to use the fonts they choose.
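
That is exactly what the multifont scheme is for: rather than requiring one
UNICODE font, several ordinary fonts are compounded into a single UNICODE
mapping. As a rough sketch (the structure and names here are my own
invention, not working code from the library), the compound font is little
more than a table of UNICODE ranges, each pointing at an already-loaded X11
font:

    #include <X11/Xlib.h>

    /*
     * A `multifont': several ordinary X11 fonts compounded into a single
     * UNICODE mapping.  Each entry covers a contiguous range of UNICODE
     * code points and says which loaded font renders it and how to get
     * from the code point to a glyph index in that font.
     */
    struct multifont_range {
        unsigned short  lo, hi;     /* inclusive UNICODE range           */
        XFontStruct    *font;       /* an already-loaded X11 font        */
        int             offset;     /* glyph index = code point + offset */
    };

    /*
     * Translate one UNICODE character into (font, glyph) at display time,
     * character by character, so no converted copy of the document is kept.
     */
    static XFontStruct *
    multifont_lookup(const struct multifont_range *ranges, int nranges,
                     unsigned short c, int *glyph)
    {
        int i;

        for (i = 0; i < nranges; i++)
            if (c >= ranges[i].lo && c <= ranges[i].hi) {
                *glyph = (int) c + ranges[i].offset;
                return ranges[i].font;
            }

        return NULL;   /* no loaded font covers this character */
    }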

|> Because the display module is directly engaged we can translate into the
|> target font character by character. This scheme means that the UNICODE
|> stuff does not cause increased internal storage requirements.
|>
|>But this causes a nightmare for system administrators that need to provide
|>conversions from any other encoding to UNICODE... and puts the burden of
|>conversion on the clients each time the document is accessed rather than on
|>the supplier one time.

It has no implications for the system manager; it is an internal conversion,
transparent to everyone. X11 and PostScript fonts have encoding vector
descriptors. In the case of X11 a set of heuristics is required to obtain the
encoding from the font name. In any case it is a user setup feature, and
providing defaults for the standard X11 fonts is sufficient for now. If people
want to use non-standard font encodings then they will have to do some work,
but that is inevitable. The hieroglyph people would prefer that to not being
able to use a font at all.
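
To give a feel for the sort of heuristic I mean (my own sketch, not code from
the browser), the X11 logical font name already carries the charset registry
and encoding in its last two fields, so a default can usually be guessed
straight from the name:

    #include <stdio.h>

    /*
     * Guess the character encoding of an X11 font from its XLFD name.
     * The last two '-'-separated fields are CHARSET_REGISTRY and
     * CHARSET_ENCODING, e.g.
     *   -adobe-helvetica-medium-r-normal--12-120-75-75-p-67-iso8859-1
     * yields "iso8859-1".  Returns NULL if the name is not a full XLFD.
     */
    static const char *
    font_encoding_from_name(const char *xlfd)
    {
        const char *p;
        int dashes = 0;

        for (p = xlfd; *p != '\0'; p++)
            if (*p == '-' && ++dashes == 13)
                return p + 1;        /* registry-encoding tail */

        return NULL;                 /* fewer than 14 fields: not an XLFD */
    }

    int main(void)
    {
        const char *name =
            "-adobe-helvetica-medium-r-normal--12-120-75-75-p-67-iso8859-1";
        printf("%s -> %s\n", name, font_encoding_from_name(name));
        return 0;
    }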

|>I fully realize we can't convert over to a single canonical form overnight.
|>But we should provide conventions that reinforce simple administration
|>and enhance interoperability for all systems.

We will never have a canonical form; the idea is completely antithetical to
the web. The whole principle is that clients and servers should adapt to
support established standards and be adaptable enough to support arbitrary
ones.

--
Phillip M. Hallam-Baker

Not Speaking for anyone else.