>If we keep HTML down to a context-free language composed of regular
>tokens, then folks can write little 20-line ditties in perl, elisp,
>lex, yacc, etc. and get real work done.
Can you write a little 20-line perl program that lists the variables
of a C program?
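(Just to illustrate the problem, a rough sketch in modern perl of the
obvious naive approach, not something from this thread: grep for lines
that look like declarations. It goes wrong on comments, strings, typedef
names, parameter lists, pointers and "int a, b;" lines, which is exactly
the point; even C wants a real parser, not a 20-line pattern matcher.)

    #!/usr/bin/perl
    # Naive sketch: report identifiers that follow a basic type keyword.
    # Misses or misreads comments, strings, typedefs, "int *p;", "int a, b;".
    while (<>) {
        if (/^\s*(?:unsigned\s+|signed\s+)?
              (?:int|char|long|short|float|double)\s+([A-Za-z_]\w*)/x) {
            print "maybe a variable: $1\n";
        }
    }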
>If we require real-time processing of all legal SGML documents,
>we buy nothing in terms of functionality, and we render almost
>all current implementations broken.
I don't think it has been suggested that browsers need to be able to
process *all* legal SGML documents. HTML is, after all, a specific DTD
and a specific SGML declaration.
>>| <!-- this: <A HREF="abc"> looks like a link too! -->
>>
>>How so? It's in a comment, and so will be ignored by a parser.
>
>Yes, by an SGML-compliant parser, but not by any parser built
>out of standard parsing tools like regular expressions, lex, and yacc.
>(well, actually, you could do it with lex, but it's a pain...)
Recognising a comment can be done with regular expressions. If you
have trouble making lex and yacc handle this, I don't think it is
because of limitations in lex and yacc.
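To make it concrete, a minimal sketch in modern perl (my simplification:
it handles only the common <!-- ... --> form, not the full SGML comment
declaration with several "--" pairs): strip comments first, then scan for
tags, and the <A HREF="abc"> inside the comment never shows up.

    #!/usr/bin/perl
    my $doc = do { local $/; <STDIN> };   # slurp the whole document
    $doc =~ s/<!--.*?-->//gs;             # drop comments before anything else
    while ($doc =~ /<([A-Za-z][^>]*)>/g) {
        print "start tag: <$1>\n";        # the "link" in the comment is gone
    }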
>>| And this: a < b > c has no markup at all, even though it
>>| uses the "magic" < and > chars.
>>
>>But not in the magic combinations <[A-Za-z] etc.
>
>Right. The famous "delimiter in context". Contrast this with the
>vast majority of "context free" languages in use.
I will compare this with C. In C, "/" is a token used for the division
operator and "*" is a token used for the multiplication operator, but
when "/" is followed by "*" it starts a comment. This is consistent
with a "context free" language, and so is recognising "<" as a start
tag opener only when it is followed by a letter.
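The same one-character lookahead is all a pattern-based scanner needs for
HTML. A sketch of mine again (start and end tags only, attributes handled
crudely): "<" followed by a letter opens markup, any other "<" is plain
character data, just as a C scanner peeks one character past "/" to see
whether a comment starts.

    #!/usr/bin/perl
    my $doc = 'a < b > c and <A HREF="abc">a real link</A>';
    while ($doc =~ /(<\/?[A-Za-z][^>]*>|<)/g) {
        if (length($1) > 1) { print "markup: $1\n" }
        else                { print "plain '<' in character data\n" }
    }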
>You say "crippled", I say "expedient". Remember: the documents are
>still conforming. It's just the WWW client parser that's non-standard.
It is harder to make SGML tools produce correct HTML if HTML has a lot
of arbitrary restrictions.
-- Lennart Staflin <[email protected]>