Frames and WWW

Gavin Nicol ([email protected])
Thu, 17 Nov 1994 06:50:36 -0500


>|>>> I would also humbly submit the same scheme to be used as an extension to
>|>>> the anchor label scheme in HTML proper ie say
>|>>>
>|>>> http://bongo.cern.ch/fred.html#H1:2/H2:4/H3:3/P:4/10,15
>|>>
>|>>This would be a great idea were there only some real containers to be used.
>|>
>|>I seem to be forever repeating myself. The syntax above is not
>|>correct.
>
>Not correct by whose definition? Are you referring to W3O or W3C draft specs or
>an RFC from the IETF? Only the latter is definitive with respect to the Web
>and even then only if it doth not contain egregious lossage.

Is the above in either the W3O, the W3C or the IETF? Does anyone use
this? As you so nicely point out, standards mean nothing if they are
not used. It just so happens that the TEI people have some very large
archives, and some very large SGML documents, and they needed a way to
retrieve only parts of the documents, and so they invented the 2
methods I outlined, among others. You seem to suffer from the awful
"not invented here" malady. Why not make use of something that is
already accepted by quite a few people in academia? I will repeat:
Phil proposed this:

http://bongo.cern.ch/fred.html#H1:2/H2:4/H3:3/P:4/10,15

while I proposed long ago the use of the TEI invented naming schemes.

http://bongo.cern.ch/fred.html/section=2/subsection=4/subsubsection=3/P=4
http://bongo.cern.ch/fred.html/2/4/3/4

(Note that the second uses the child number of the element, whereas
the first uses the occurrence of the element name within the child
list.)
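
To make the difference between the two forms concrete, here is a rough
Python sketch. The Node class, the two resolve functions, and the sample
tree are all invented for illustration; this is not DynaWeb code and not
the TEI definition, just one reading of the two schemes described above.
Every step in the first form is assumed to carry a number.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A hypothetical element in a parsed document tree."""
    name: str
    children: List["Node"] = field(default_factory=list)
    text: str = ""


def resolve_by_name(root: Node, path: str) -> Optional[Node]:
    """Resolve 'section=2/para=1': the Nth child that has the given name."""
    node = root
    for step in filter(None, path.split("/")):
        name, _, count = step.partition("=")
        matches = [c for c in node.children if c.name == name]
        index = int(count) - 1  # path numbers are 1-based
        if not 0 <= index < len(matches):
            return None
        node = matches[index]
    return node


def resolve_by_child_number(root: Node, path: str) -> Optional[Node]:
    """Resolve '3/1': the Nth child, whatever its name is."""
    node = root
    for step in filter(None, path.split("/")):
        index = int(step) - 1  # path numbers are 1-based
        if not 0 <= index < len(node.children):
            return None
        node = node.children[index]
    return node


# Sample tree: a title followed by two sections.
doc = Node("doc", [
    Node("title", text="Fred"),
    Node("section", [Node("para", text="first")]),
    Node("section", [Node("para", text="second")]),
])

# The second section is the third child, so the two forms use
# different numbers to reach the same paragraph.
by_name = resolve_by_name(doc, "section=2/para=1")
by_number = resolve_by_child_number(doc, "3/1")
print(by_name.text, by_number.text)
```

In this sketch an address that runs off the end of a child list simply
resolves to nothing rather than raising an error.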

>Code ? Implementation? Rough consensus and working code!

As I pointed out, the DynaWeb server (which I wrote) can use
these. DynaWeb uses the structure of the SGML document tree to
generate TOC's on the fly, and (as a configuration item) one can use
either the above 2, or element ID addressing to retrieve a single
element, or a single subtree from the document. Even in multimegabyte
documents, retrieval takes milliseconds. This is converted to
HTML on the fly. I should point out that managing a single document
and having navigational links mostly generated on the fly is a lot
more convenient than managing a number of little HTML files, and
worse, all the links between them.

>What relation do these `sections' have to HTML elements. Is H1 a section?
>Is H2 a subsection? What is a H3???

Well, now we come to the crux of the matter. HTML was very poorly
designed because it ignored the inherent structure of documents, so in
fact we don't have many containers... if we want to address
something using the TEI stuff, it will be very "flat"
(fred.html/P=14).

>Is this an SGML standard or a Web standard? Who has commented on it? Dave
>Raggett? Tim B-L?

This is a *humanities* standard. The people found that using SGML was
of great benefit, because they have *huge* data repositories, and so I
guess one could also say it's an SGML standard. Now, while I have
every respect for Tim B-L, I'm not sure he knows all that much about
document processing on a large scale. Dave Raggett certainly knows SGML
well. Ask him for comments.

I should note that no-one now controls the WWW. The genie is out of
the bottle.

>If its an SGML standard don't imagine that it has any relationship to
>HTML.

Well, HTML *is* SGML (which of course you know), but it is a
particularly poor form of it. As I noted, these are not SGML specific
(see below).

>|>I should note that HTML has few containers as noted above, but we can
>|>just look upon this as a degenerate case of deeply structured
>|>documents. In fact, even things like RTF and LaTeX can have their
>|>components addressed by such schemes.
>
>It is possible to create containers by associating sections of text with the
>preceding headers and nesting Hn+1 elements within Hn elements. This may be
>hard to express in SGML lossage but that is SGML for you.

This has got to be the funniest thing I have read all week! Probably
all year! SGML's primary purpose is to define the structure of a document
explicitly by defining containers and content models. One cannot define
containers by associating Hn with the following text for 2 reasons:

1) Many people use Hn for font effects
2) One cannot find the boundaries

So I guess your idea works on tag occurrence within a document, in
which case, making it look like a path is a mistake because you imply
a hierarchy where there is none.

Tim B-L didn't define the containers (even *LaTeX* has them...) and
now you blame SGML for not being used correctly? When your programs
crash do you swear at the language design?

----
For people who are not interested in some SGML evangelism, you can
skip the rest.

As I said earlier, unless one is working in the abstractions that the
author uses, one is missing the boat entirely. One of the nice things
about SGML is that it gives one the flexibility to define the
abstractions (if one so wishes), or to simply use the abstractions of
others if they happen to fit. Better, it allows you to define the
information structure explicitly, and then check that the document
actually fits the model you defined.

Now that by itself is useful, but what I think is most useful is that
by marking up a document using well structured SGML, one can actually
turn the document into a database. SGML allows searches to be limited
to one particular region within the document. Think again of the TEI
path:

/section=1/subsection=2/para=3

This names one element uniquely. We can limit searches to just this
one element if we wish. Now we can make this less specific simply by
removing the numeric qualifiers:

/section=1/subsection/para=3

This will specify all 3rd paragraphs within all subsections of
section 1.

/section/subsection/para

This is all paragraphs within all subsections within sections. These
have obvious benefits for searches, or any kind of document
transformation/analysis.
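
As a rough illustration of how dropping the numeric qualifiers widens
the selection, here is a small Python sketch. The tuple-based tree, the
select function, and the sample document are all hypothetical; this
shows the idea only and is not how any particular server implements it.

```python
from typing import Dict, List, Tuple

# A hypothetical tree: each node is (element name, list of child nodes).
Tree = Tuple[str, list]


def select(node: Tree, pattern: str) -> List[str]:
    """Return the fully qualified path of every element matching pattern.

    A step such as 'para=3' matches only the third 'para' child;
    a bare 'para' matches every 'para' child.
    """
    steps = [s for s in pattern.split("/") if s]
    found: List[str] = []

    def walk(current: Tree, depth: int, prefix: str) -> None:
        if depth == len(steps):
            found.append(prefix)
            return
        name, _, wanted = steps[depth].partition("=")
        seen: Dict[str, int] = {}  # occurrence count per element name
        for child in current[1]:
            seen[child[0]] = seen.get(child[0], 0) + 1
            if child[0] != name:
                continue
            if wanted and int(wanted) != seen[name]:
                continue
            walk(child, depth + 1, f"{prefix}/{name}={seen[name]}")

    walk(node, 0, "")
    return found


def para() -> Tree:
    return ("para", [])


# Sample document: two sections, three subsections, seven paragraphs.
doc: Tree = ("doc", [
    ("section", [
        ("subsection", [para(), para(), para()]),
        ("subsection", [para(), para(), para()]),
    ]),
    ("section", [
        ("subsection", [para()]),
    ]),
])

for pattern in ("/section=1/subsection=2/para=3",
                "/section=1/subsection/para=3",
                "/section/subsection/para"):
    print(pattern, "->", select(doc, pattern))
```

A search engine could then run its query only over the elements that
select returns, instead of over the whole document.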

Another area where well structured SGML wins is in automatic TOC
generation. In a well structured SGML document, there is a hierarchy
explicitly defined by the DTD. Using that, one can generate TOC's
simply by telling the system which elements should appear in the
TOC. In the example above, we might say that all sections,
subsections, and paragraphs should appear, so when we get something like:

http://foobar.baz.org/foo.html/

we generate a list of all sections. Then when we get something like

http://foobar.baz.org/foo.html/section=1

we generate a list of subsections within that section.
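
The TOC behaviour just described can be sketched in a few lines of
Python. Everything here (the tuple-based tree, the TOC_ELEMENTS setting,
the toc function) is made up for illustration and is not DynaWeb's
actual code; the function takes only the part of the URL that follows
the document name.

```python
from typing import Dict, List, Tuple

# A hypothetical tree: each node is (element name, list of child nodes).
Tree = Tuple[str, list]

# Assumed configuration: which elements should appear in the TOC.
TOC_ELEMENTS = {"section", "subsection", "para"}


def toc(root: Tree, path: str) -> List[str]:
    """List the TOC entries directly below the element that path names."""
    node = root
    for step in filter(None, path.split("/")):
        name, _, count = step.partition("=")
        same_name = [c for c in node[1] if c[0] == name]
        node = same_name[int(count) - 1]  # path numbers are 1-based
    entries: List[str] = []
    counts: Dict[str, int] = {}  # occurrence count per element name
    for child in node[1]:
        counts[child[0]] = counts.get(child[0], 0) + 1
        if child[0] in TOC_ELEMENTS:
            entries.append(f"{path.rstrip('/')}/{child[0]}={counts[child[0]]}")
    return entries


# Sample document: a title, then two sections with subsections.
doc: Tree = ("doc", [
    ("title", []),
    ("section", [
        ("subsection", [("para", []), ("para", [])]),
        ("subsection", [("para", [])]),
    ]),
    ("section", [
        ("subsection", [("para", [])]),
    ]),
])

print(toc(doc, "/"))           # the sections of the document
print(toc(doc, "/section=1"))  # the subsections of section 1
```

Each entry is itself a path, so it can be appended to the document URL
to form the link for the next level down.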

In DynaWeb, we do precisely this. One can define the depth to which
the TOC's should be generated, and a TOC generation threshold. DynaWeb
takes this even further and allows one to specify multiple TOC
views. For example, you can generate a list of figures, or tables, or
have an "expert" and "amateur" TOC generated automatically *from the
same dataset*.

Now SGML *is* coming to the WWW. There are many large corporations and
academic sites that *require* SGML, and the structure it contains. In
fact, you told me that *you* had plans to do an SGML aware browser...

HTML has been tested, and found lacking. There is a consensus that
SGML is needed, and even more, that stylesheets are needed. A MIME
type for SGML is almost ready for the real world, and SGML Open is
committed to creating a stylesheet format based on (the soon to become
ISO standard) DSSSL spec. This spec will be made freely available, as
will an implementation of the stylesheet library. It should be
completed in the next month or so.

Beyond this, I will be *very* surprised if we do not see a full SGML
browser appear within the first half of next year; probably from some
commercial entity. We already have Panorama, but that is just a first
step toward full SGML browsers. Full SGML browsers will be able to
read HTML as well as any other DTD. When that happens (and providing
the browsers are of sufficient usefulness that they are widely used)
the need for HTML will be diminished. It will continue to have a
place, a large place, within the WWW, but it will not define the WWW
as it does now.

If you think you could possibly stop the many large corporations
around the world who want this, you are very much mistaken. As you so
rightly pointed out, the bottom line is in the RFC's, the IETF, and
most of all, in the hands of users (who vote with their feet shall we
say). The genie is out of the bottle. One cannot cork it again. To try
to do so is arrogance.