PROPOSAL: An extension mechanism for HTML

Joe English ([email protected])
Sat, 30 Sep 1995 12:28:52 PDT


An Extension Mechanism for HTML
Version: 1.0
J. English
30 September 1995

ABSTRACT

HTML currently lacks a well-defined mechanism for developing and
deploying new features. This proposal addresses a small part of
this problem at the SGML level by adding a general-purpose
``alternate representation'' element. Content providers may use
this element to supply an alternate representation for browsers
that cannot present or do not understand extended HTML
features.

A new scheme for handling unrecognized elements in HTML user
agents is defined, and a brief list of guidelines for designing
HTML extensions is presented.

Issues of media type parameters for extended versions of HTML
and mechanisms for actually extending the HTML DTD are
_expressly not considered or addressed_ in this proposal.

STATUS OF THIS MEMO

This is a working draft, being circulated for comment only.

If there is sufficient support for this proposal it will be
submitted as an Internet-Draft. Please send comments and
suggestions to the author <[email protected]>, the <html-wg> mailing
list, or the <www-html> mailing list.

CONTENTS

1 Statement of the problem
2 Proposed Solution
3 Changes to DTD
4 Impact on existing browsers and tools
5 Impact on existing documents
6 Deployment and interoperability
7 Format negotiation
8 Guidelines for extension elements
9 Potential problems
10 Acknowledgments and history
A Other solutions
A.1 ALT attribute instead of element
A.2 ALTSRC attribute
A.3 NOxxx elements
A.4 Conditional Element
A.5 Marked Sections
A.6 No tags
A.7 Omissible tags

1. STATEMENT OF THE PROBLEM

_How do we teach current browsers to understand elements that
haven't been invented yet?_

The HTML document type definition is still far from complete.
There are several widely deployed new features which are not
represented in the HTML 2.0 DTD, several more which have been
proposed, and there will no doubt be even more in the future.

At the same time, there is a large installed base of HTML user
agents which (by definition) do not support newly-invented HTML
extensions. It is not feasible for developers or users to
simultaneously update all software every time a new extension is
developed.

Therefore a mechanism or mechanisms for providing backward
compatibility with the installed base is desperately needed.

2. PROPOSED SOLUTION

A new, general-purpose ``alternate representation'' element is
defined as follows:

<!ELEMENT ALT - - (%body.content;)>

That is, ALT may contain anything that is legal inside the BODY
element, the start- and end-tags are required, and it has no
attributes.

The ALT element is not allowed in the content of any current
HTML level 2 elements. Instead, it is intended to be used inside
_new_ elements which are not part of the current standard.

The ALT element contains an `alternate representation' of its
parent element (no matter what that parent element is). The
alternate representation should be presented if the user agent
is not able to present the rest of the containing element. If
the user agent is able to present the containing element, the
content of the ALT element should be ignored.

3. CHANGES TO DTD

This proposal entails no changes to the HTML 2.0 DTD, as it
addresses HTML extensions only.

In future extensions to HTML, any newly-defined elements which
can appear as direct children of current level 2 elements
(hereafter, `extension elements') may include the ALT element in
their content model as an optional first subelement.

Note: For the purpose of this proposal, new elements
which appear only inside extension elements are not
considered extension elements themselves.

For example, the definition of the TABLE extension element would
be changed from:

<!ELEMENT table - - (caption?, col*, thead?, tbody+)>

to:

<!ELEMENT table - - (alt?, caption?, col*, thead?, tbody+)>

Since TR, THEAD, and CAPTION are only allowed inside TABLE, they
are not considered extension elements and need not include ALT
in their content models.

See below (8. "Guidelines for extension elements") for other
guidelines in designing extensions.

4. IMPACT ON EXISTING BROWSERS AND TOOLS

For cases where an extension element contains no other textual
content (such as the proposed EMBED and FRAMESET elements), no
change to existing browsers is required since the ``ignore
unrecognized tags'' rule provides automatic backward
compatibility. (In fact, for such cases there is no need to use
a standardized name for the alternate representation element at
all except possibly for uniformity.)

(HTML 2.0 spec, 4.2.1 "Undeclared Markup Error Handling"
[5])

To facilitate experimentation and interoperability
between implementations of various versions of HTML, the
installed base of HTML user agents supports a superset
of the HTML 2.0 language by reducing it to HTML 2.0:
markup in the form of a start-tag or end-tag, whose
generic identifier is not declared is mapped to nothing
during tokenization. [...]

To support other extensions such as TABLE which _do_ contain
content that cannot be presented by user agents which do not
understand the extension, this guideline shall be amended as
follows:

[...] When encountering markup in the form of a
start-tag whose generic identifier is not recognized by
the user agent, if it is immediately followed by an
<ALT> start tag, then the content of the ALT element
should be presented, and all content between the </ALT>
end-tag and the end-tag of the unrecognized element
should be discarded. If no ALT subelement is present,
then the content of the unrecognized element is treated
as if its start- and end-tags were not present.

Note that under this proposal, browsers are expected to keep
track of the element hierarchy instead of simply discarding
unrecognized tags. Ideally this will be accomplished by
employing a true SGML parser with an extended DTD supplied by
the document provider. However, even heuristic parsers should be
able to accomplish this.
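
As an illustration only, the amended rule might be sketched for a
heuristic (non-SGML) parser as follows. This is a minimal sketch
in Python, not part of the proposal: the names KNOWN, TOKEN, and
present are hypothetical, the set of recognized elements is
arbitrary, attributes are skipped rather than parsed, and
comments and markup declarations are not handled. White space
between the unrecognized start-tag and <ALT> is assumed to be
permitted.

```python
import re

# Hypothetical set of element names this user agent can present.
KNOWN = {"p", "b", "i", "em", "pre", "a", "h1"}

# Start-tags, end-tags, and runs of text; attributes are skipped, not parsed.
TOKEN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9.-]*)[^>]*>|([^<]+)")


def present(source, known=KNOWN):
    """Return the markup a user agent following the amended rule would present."""
    out = []
    # One [name, mode] frame per open unrecognized element. Modes:
    #   "start" just opened, nothing but white space seen yet
    #   "alt"   inside its ALT subelement: present the content
    #   "skip"  past </ALT>: discard until the element's end-tag
    #   "plain" no leading ALT: treat the tags as if they were absent
    stack = []

    def emit(text):
        # Output is suppressed while any open unrecognized element is skipping.
        if not any(mode == "skip" for _, mode in stack):
            out.append(text)

    def settle():
        # Anything other than <ALT> after the start-tag means "no ALT present".
        if stack and stack[-1][1] == "start":
            stack[-1][1] = "plain"

    for match in TOKEN.finditer(source):
        closing, name, text = match.group(1), match.group(2), match.group(3)
        if text is not None:
            if text.strip():
                settle()
            emit(text)
            continue
        name = name.lower()
        if not closing:
            if name == "alt" and stack and stack[-1][1] == "start":
                stack[-1][1] = "alt"
                continue
            settle()
            if name in known:
                emit(match.group(0))
            else:
                stack.append([name, "start"])
        elif name in known:
            emit(match.group(0))
        else:
            # Re-synchronize: find the nearest frame this end-tag can close.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i][0] == name:
                    del stack[i:]
                    break
                if name == "alt" and stack[i][1] == "alt":
                    del stack[i + 1:]
                    stack[i][1] = "skip"
                    break
    return "".join(out)
```

Under these assumptions, present() applied to
"<table><alt><pre>a | b</pre></alt><tr><td>a<td>b</table>"
yields "<pre>a | b</pre>": the TR and TD subelements have no
end-tags, but the required </table> end-tag lets the stack
re-synchronize.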

User agents may also present the alternate content for
individual instances of _supported_ extension elements, at their
discretion or the user's instructions. For example, in the case
of EMBED, a user may have disabled object embedding, or a
particular embedded object may be unavailable; the user agent
may use the alternate representation in these cases as well.

5. IMPACT ON EXISTING DOCUMENTS

This proposal does not impact existing documents, except
possibly for those which are already using extended HTML
features. The authors of such documents may wish to take
advantage of the proposed ALT element if and when sufficient
browser support has been deployed.

6. DEPLOYMENT AND INTEROPERABILITY

The current proposal places a large part of the responsibility
for backward compatibility on document providers. (Of course, so
does any scheme that requires multiple representations of an
element to be provided. I feel that the current proposal does
more to assist document providers in doing so than other
schemes do.)

Use of this feature is entirely discretionary, much like the ALT
attribute on IMG. It will not place any extra mandatory burden
on authors who wish to use extended or experimental HTML
features; however, should they choose to supply an alternate
representation, it will make it easier to do so.

The alternate representation can be nearly anything, including a
preformatted plain text rendering of the primary content, a
hyperlink to a bitmapped image, or the ever-popular ``click here
to download a more advanced browser'' message.

This proposal is also amenable to automatic processing. For
example, a preprocessor could scan for TABLE elements which do
not contain an author-supplied ALT representation and insert a
plaintext rendering of the table.
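
A minimal sketch of such a preprocessor, in Python, is given
below for illustration. It is not part of the proposal. The
names add_alt and plaintext are hypothetical, and the sketch
assumes that tables are not nested, that rows and cells are
marked with TR and TD/TH start-tags, and that a crude
"cell | cell" layout inside PRE is an acceptable rendering.

```python
import re

# Hypothetical patterns; a real preprocessor would use an SGML parser.
TABLE = re.compile(r"(<table\b[^>]*>)(.*?)(</table\s*>)", re.I | re.S)
ROW = re.compile(r"<tr\b[^>]*>(.*?)(?=<tr\b|\Z)", re.I | re.S)
CELL = re.compile(r"<t[dh]\b[^>]*>(.*?)(?=<t[dh]\b|</tr|\Z)", re.I | re.S)
TAG = re.compile(r"<[^>]+>")


def plaintext(table_body):
    """Render each table row as one line of ' | '-separated cell text."""
    lines = []
    for row in ROW.findall(table_body):
        # Strip any markup inside the cell and collapse white space.
        cells = [" ".join(TAG.sub("", cell).split()) for cell in CELL.findall(row)]
        if cells:
            lines.append(" | ".join(cells))
    return "\n".join(lines)


def add_alt(document):
    """Insert an ALT subelement into every TABLE that does not start with one."""
    def fix(match):
        start, body, end = match.groups()
        if re.match(r"\s*<alt\b", body, re.I):
            return match.group(0)  # the author already supplied an ALT
        alt = "<ALT><PRE>\n" + plaintext(body) + "\n</PRE></ALT>"
        return start + alt + body + end
    return TABLE.sub(fix, document)
```

Because tables that already begin with an ALT subelement are left
alone, running the sketch twice over the same document should
produce the same result as running it once.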

7. FORMAT NEGOTIATION

It has been suggested on numerous occasions that Web user agents
advertise which HTML features they support, and that servers
provide a ``down-translated'' version of documents when
necessary.

At present, there is no clear definition of how this should work
at the protocol level. There have been several proposals,
notably Dan Connolly's paper ``Toward Graceful Deployment of
Tables in HTML'' [1], but it has not been widely implemented.

Note: Several Web sites are known to use the HTTP
User-Agent header to determine which version of a
document to send. This practice is questionable: it is
error-prone and hard to maintain.

The current proposal has several advantages over format
negotiation schemes:

Format negotiation only works for HTTP and other transport
protocols which support it. The current proposal will work for
any transport protocol, including none (e.g., local file system
access). No modifications to server software are necessary.

Format negotiation does not provide any solution to the
inherently complex problem of maintaining or generating multiple
versions of a document. Including alternate representations in
the document itself takes advantage of SGML to manage this
complexity.

The current proposal provides more flexibility than automatic
down-translation based on format negotiation, since it allows
authors to choose a suitable alternate representation for each
element instance. It also gives more control to information
consumers, who might have no indication that an alternate
representation is even available if automatic format negotiation
were in use.

8. GUIDELINES FOR EXTENSION ELEMENTS

In order to support heuristic parsers, end-tag omission shall
not be allowed for any extension element, nor shall any
extension element have EMPTY declared content or content
reference attributes.

Note: Again, new elements which are only legal inside
extension elements are not themselves extension
elements, so this rule does not apply to them. In
particular, the current Tables, Frames, and EMBED
proposals all satisfy this requirement.

Requiring end-tags on extension elements will allow heuristic
parsers to ``re-synchronize'' the element hierarchy even in the
presence of subelements without end-tags.

It is not anticipated that all or even most extension elements
will require an alternate representation. For example, the HTML
3 / Netscape 2.0 BIG and SMALL tags can safely be ignored by
browsers without losing information, so an alternate
representation for these elements would not be necessary.

To support ``on the fly'' formatting, an ALT element, if
present, should be the first subelement of the element to which
it applies.

9. POTENTIAL PROBLEMS

The user community may be confused by the dual use of the name
ALT as an element name and as an attribute name (on the IMG
element) [7]. This is further exacerbated by the widespread (and
incorrect) practice of referring to all syntactic constructs as
``tags'' instead of distinguishing between element names,
attribute names, markup declarations, delimiters, and actual
tags.

If this is felt to be a serious problem, ALT could be renamed to
ALTERNATE or something else.

[[ See also [8]; I believe this has been addressed, by requiring
user agents to keep track of the element hierarchy instead of
discarding tags. ]]

10. ACKNOWLEDGMENTS AND HISTORY

The idea of including an alternate representation in the
document was first introduced with the ALT attribute on the IMG
element. This was further refined in HTML 3 with the FIG
element, which directly contains its alternate representation.
The proposed FRAMESET and EMBED extensions took this a step
further, by introducing explicit container elements for this
purpose. The current proposal simply generalizes and formalizes
this basic idea.

Discussion on the html-wg mailing list has provided invaluable
input exploring all the issues involved.

A. OTHER SOLUTIONS

A number of other approaches to this problem have been
suggested.

[[ This section is a bit of a mess right now... -JE ]]

A.1. ALT ATTRIBUTE INSTEAD OF ELEMENT

It has been suggested that the alternate representation might
appear on an attribute, as it is with IMG [9].

This approach has severe limitations (for example, an attribute
value cannot contain markup), so it is not advisable [10].

A.2. ALTSRC ATTRIBUTE

Another approach is to supply the URI of a document containing
an alternate representation on an attribute of extension
elements. The attribute would have a standardized name, say
ALTSRC. For example:

<!-- in the DTD -->
<!ATTLIST TABLE ...
ALTSRC %URL; #IMPLIED
...>
<!-- in the document instance -->
<TABLE altsrc="table1.txt">
<CAPTION> Table 1 </CAPTION> ... </TABLE>

where table1.txt contains a preformatted, plain text rendering
of the table.

Under this scheme user agents would check for an ALTSRC
attribute on start-tags with an unrecognized element name
instead of completely ignoring them. If such an attribute is
found, the user agent would discard the content of the
unrecognized element and display the referenced URI either
inline or as a hyperlink.
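
The check described above might look like the following minimal
Python sketch, offered for illustration only. The names KNOWN
and altsrc_for are hypothetical, the attribute parsing is
deliberately simplistic (it does not cope with ">" inside a
quoted value), and what the user agent does with the returned
URI is left open.

```python
import re

# Hypothetical set of element names this user agent recognizes.
KNOWN = {"p", "b", "i", "em", "pre", "a", "h1"}

START_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)([^>]*)>")
# ALTSRC value in double quotes, single quotes, or unquoted.
ALTSRC = re.compile(r"""\baltsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""", re.I)


def altsrc_for(start_tag, known=KNOWN):
    """Return the ALTSRC URI of an unrecognized element's start-tag, else None."""
    match = START_TAG.fullmatch(start_tag.strip())
    if match is None or match.group(1).lower() in known:
        return None  # not a start-tag, or an element the agent understands
    attr = ALTSRC.search(match.group(2))
    if attr is None:
        return None  # fall back to the ordinary handling of unrecognized tags
    # Exactly one of the three alternatives matched.
    return next(value for value in attr.groups() if value is not None)
```

With the document instance shown earlier in this section,
altsrc_for('<TABLE altsrc="table1.txt">') returns "table1.txt",
and the user agent would then discard the TABLE content.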

This has the advantage of only transmitting the alternate
representation if it is actually needed, saving transmission
time. It would also help keep source documents less
``cluttered,'' since it would not be necessary to duplicate
information in the main document.

Note: This solution could be used in addition to the
current proposal; the two are mutually compatible.

A.3. NOXXX ELEMENTS

Another approach is to define a new alternate representation
element for each new feature (e.g., NOFRAMES [2] and NOEMBED
[3]), instead of using a standardized element name.

This works when the extension element has no other textual
content (as is the case with FRAMESET and EMBED), but not for
extension elements with primary content.

For example, if a user agent does not know about the TABLE
element, it will not know that a (hypothetical) NOTABLES element
contains an alternate representation either, and would still
attempt to display the TABLE content under the ``ignore
unrecognized tags'' rule.

Note: A naming convention for generic identifiers -- for
example, assuming that an unrecognized element name
NOxxx is an alternate representation of a new xxx
element -- is dangerous and ill-advised.

A.4. CONDITIONAL ELEMENT

It has been suggested that the ALT element take a FEATURE
attribute, which would be used to determine whether or not the
ALT content should be displayed. Under this scheme, the ALT
element may appear before instead of inside the extended
element.

[[ Citation? ]]

A similar proposal calls for an OPTION element, with

<!AttList Option
PRESENT NAMES #IMPLIED
ABSENT NAMES #IMPLIED
>

PRESENT and ABSENT would be a list of ``feature keywords''; the
content should only be displayed if the feature is supported or
unsupported, respectively. [7]

Both of these schemes work on a per-feature basis instead of a
per-element-instance basis, so they are more coarse-grained and
hence less flexible than the current proposal. I feel they are
also more error-prone and less intuitive.

The current proposal uses containment to express the
relationship between an element and its alternate
representation. In a conditional inclusion scheme, this
information is lost.

A.5. MARKED SECTIONS

Another suggestion is to ``modularize'' the DTD, and include
parameter entities for each module. These would be defined by
the user agent to either INCLUDE or IGNORE, depending on whether
or not the module is supported, and authors could use them as
status keywords in marked section declarations [7]:

<![ %present.embed; [
<embed stuff here>
]]>
<![ %absent.embed; [
if you see this, your browser does not support HTML level 23 version 29.
]]>

This would require browsers to support marked sections (which
they ought to anyway), and it would demand a much greater
familiarity with SGML (also not a bad idea).

On the down side, it requires a greater implementation effort
and, like the conditional element scheme, obscures the
relationship between the primary and alternate representations
of an element. It is also likely to be confusing to the user
community.

A.6. NO TAGS

In the HTML 3 draft, the FIG element's _content_ was the
alternate representation.

It has also been suggested that EMBED work this way:

(<[email protected]>)

There is no need for redundant NOEMBED tags. Each EMBED
is an implied choice between fetching the URL in
question or rendering the enclosed content.

[[ Full citation? ]]

I find this less intuitive than supplying explicit start- and
end-tags for the alternate content. Also, it does not allow
extension elements to contain primary (non-alternate) content;
this could be detrimental to future enhancements. (For example,
EMBED may eventually include subelements to be used as
parameters for processing the embedded object.)

A.7. OMISSIBLE TAGS

The start- and end-tags for ALT could be made omissible:

<!ELEMENT ALT O O (%body.content;)>

This would allow current HTML 3 documents which use FIG to
remain valid without being updated.

Omitting the ALT start- and end-tags would defeat heuristic
parsers in some cases, so providers would need to take care to
include them where they might be necessary. This would apply
only to extension elements which have textual primary content;
current uses of FIG would still work.

REFERENCES

[[ Fill this in... Tables draft, Netscape's FRAMES and EMBED
proposals, FIG discussions. ]]

[1] Toward Graceful Deployment of Tables in HTML
(<URL:http://www.w3.org/pub/WWW/MarkUp/table-deployment.html>)

Dan Connolly <[email protected]>, 13-Mar-1995

[2] A Proposed Extension to HTML: Frames
(<[email protected]>)

Eric Bina <[email protected]>, 17-Sep-1995

[3] The REAL proposal for addition to HTML 3.0: EMBED
(<[email protected]>)

Alex Edelstein <[email protected]>, John Giannandrea,
19-Sep-1995

[4] HTML3 Tables
(<URL:http://www.w3.org/pub/WWW/TR/WD-tables-950925.html>)

Dave Raggett <[email protected]>, 25-Sep-1995

[5] HTML 2.0
(<URL:ftp://ds.internic.net/internet-drafts/draft-ietf-html-spec-06.txt>)

Dan Connolly and Tim Berners-Lee.

[6] HTML-WG Mailing List Archives
(<URL:http://www.acl.lanl.gov/HTML_WG/>)

HyperMail archive of the HTML Working Group mailing list.

[7] html-wg-95q3: Re: A proposal for addition to HTML 3.0: EMBED
(<URL:http://www.acl.lanl.gov/HTML_WG/html-wg-95q3.messages/1167.html>)

Liam Quin, <[email protected]>

[8] html-wg-95q3: Re: A proposal for addition to HTML 3.0: EMBED
(<URL:http://www.acl.lanl.gov/HTML_WG/html-wg-95q3.messages/1166.html>)

Alexei Kosut, <[email protected]>

[9] html-wg-95q3: ALTs for EMBED, etc
(<URL:http://www.acl.lanl.gov/HTML_WG/html-wg-95q3.messages/1177.html>)

Terry Allen, <[email protected]>

[10] html-wg-95q3: ALTs for EMBED, etc
(<URL:http://www.acl.lanl.gov/HTML_WG/html-wg-95q3.messages/1178.html>)

Mike Meyer, <[email protected]>