RE: SGML Capacities

Claude L. Bullard ([email protected])
Fri, 2 Jun 1995 14:42:46 -0400


[Paul Burchard]

| The mere existence of these limits is what I find troublesome...it
| doesn't matter how "adjustable" they are.

OK. To some, it matters. Typically they are concerned with
variant syntaxes and with how much space the SGML parser
reserves for storing compound document entities.
That is a complexity consideration for applications in which
the complexity of a document is not known until the instance
is received. Again, SGML (developed originally on early PCs,
not mainframes) has a very very very (enuff?) wide span of
applicability.

An application such as HTML can overlook this because its
location conventions eventually force one to adopt the style
of creating "lotsa little files" instead of document entities
with logical framing, as in most frame-based hypermedia.
HTML is non-complex both in the model space it defines and
in the typical instance. That is, at this point, it is
*glorified email*. While the HTML app_sys can overlook
capacities by convention, that reflects the non-extensibility
of a design arising from weak design concepts, not inherent
strengths.

| In the case of a scripting extension to the Web (such as Java),
| what you really need is an arbitrary-length URI, in order to specify
| both the location of the script and the initialization data that
| should be passed to this invocation; e.g.

|   http://www/java/script.class#x=1;y=2;name=hello-world;...

This is an example of what I consider weak design. Why? How
can I register in advance the servers that are required? In SGML,
the NOTATION declaration can be used to do that. This can be combined
(as you rightly note) with an <!ENTITY declaration that points
to the notation. Some options are:

1. Attach an attribute list to the NOTATION or ENTITY declaration.
The NOTATION is not a good choice; the entity is better, as you
may have multiple entities of the same notation. The NOTATION declaration
is also used to indicate which specification applies, or in other
words, which public flavor of the notation is expected. Use an
IDREF to point to the entity. That way, many links in the document
can point to this location and maintenance is eased. Link
validation (does the target exist, is it a valid reference
type, etc.) also becomes possible.

2. Write an element type (ilink-derived perhaps) in which the
property values are separated by attribute types, e.g.

<note>what follows isn't exactly kosher HyTime but is legal
SGML. The full example takes more explanation.</>

<!ELEMENT goJava - - (somecontent) >
<!ATTLIST goJava
    MyLink   NAME   ilink      -- declares base class of element type --
    NOTATION NAME   'java'
    target   CDATA  #REQUIRED  -- holds the location without using entity --
    xarg     NUMBER #IMPLIED
    yarg     NUMBER #IMPLIED
    name     CDATA  #IMPLIED >

There are several variants on this. For example, if you want the arguments
to be in a string, simply declare

args CDATA #IMPLIED

and put them together in a single string, separated as you desire. Of course,
the SGML parser then does little validation work for you, and if the sender
mixes these up, it is up to the application parser to disambiguate them.
It is the intent of SGML to let the application designer
decide how much work must be done by the application and how
much can be offloaded to the SGML parser, particularly in the
validation phase of transmission and receipt. Since many apps use
the SGML parser to direct activities beyond validation, this is
strictly a design issue.
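
To make that trade-off concrete, here is a minimal Python sketch of the
work an application would take on if the arguments travel in one CDATA
string. The `;` separator and `key=value` form follow the URL example
quoted above; the names `parse_args` and `EXPECTED`, and the choice of
which errors to reject, are assumptions for illustration, not anything
SGML or HyTime defines.

```python
# Hypothetical application-side check for an `args` CDATA value.
# The SGML parser sees only an opaque string; everything below is
# work the application has taken on itself.

# Assumed argument types, mirroring the xarg/yarg/name attributes.
EXPECTED = {"x": int, "y": int, "name": str}


def parse_args(args):
    """Split 'x=1;y=2;name=hello-world' into a typed dict.

    Raises ValueError on unknown keys, duplicates, or bad numbers,
    the kind of error the SGML parser could have reported had each
    argument been declared as its own attribute.
    """
    result = {}
    for pair in args.split(";"):
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing separator
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError("missing '=' in %r" % pair)
        key = key.strip()
        if key not in EXPECTED:
            raise ValueError("unknown argument %r" % key)
        if key in result:
            raise ValueError("duplicate argument %r" % key)
        # int() raises ValueError for a non-numeric value
        result[key] = EXPECTED[key](value.strip())
    return result


if __name__ == "__main__":
    print(parse_args("x=1;y=2;name=hello-world"))
```

With separate attributes, checks of this kind move back into the
validation pass; with a single string, they stay in the application.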

Again, it is the SGML Declaration that one looks to for
capacities, not the DTD. These are separable and
configurable... as you wish. As to entity resolution for
URIs, there are other approaches. One to be considered
is the SGML Open catalog. As your applications grow
in size and complexity and more link validation is required, you
are going to discover that inline links with local location
values are very difficult to maintain. The issues of server
(i.e., notation) registration, protocols, link typing, separation
of database and presentation layer, behavioral vs. style specification,
and type cataloging are all important to the WWW.
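
As a rough illustration of why indirection helps with maintenance and
validation, the Python sketch below keeps each location in one declared
entity and lets links refer to it by name. The tables `NOTATIONS`,
`ENTITIES`, and `LINKS` and the function `check_link` are hypothetical
stand-ins for what NOTATION and ENTITY declarations would give a
validating system; only the script URL comes from the example earlier
in this thread.

```python
# Hypothetical registry mirroring NOTATION and ENTITY declarations.
# All names here are illustrative only.

NOTATIONS = {"java"}  # notations (servers) registered in advance

# entity name -> (location, notation); each location lives in ONE place
ENTITIES = {
    "script": ("http://www/java/script.class", "java"),
}

# Links carry an entity name, never a raw location.
LINKS = ["script", "script", "applet"]


def check_link(entity_name):
    """Return a list of problems for one link; empty means valid."""
    if entity_name not in ENTITIES:
        return ["no entity named %r is declared" % entity_name]
    location, notation = ENTITIES[entity_name]
    if notation not in NOTATIONS:
        return ["entity %r uses unregistered notation %r"
                % (entity_name, notation)]
    return []


if __name__ == "__main__":
    for name in LINKS:
        problems = check_link(name)
        print(name, "->", problems or "ok")
```

Moving the script then means changing one entry in `ENTITIES`; the
links themselves do not change, and a dangling name such as `applet`
is caught before anything is fetched.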

I apologize to the VRML list members for taking this
much of your bandwidth for this discussion. Paul, we
should probably go offline with this as it is orthogonal
for the moment to VRML.

Len Bullard