I'm trying to put together a lex-style specification of
the lexical elements of HTML. It will almost certainly
conflict with current usage.
But I think the reason current usage is broken is that
the SGML standard is so abstruse.
I believe that if I write up a lex specification of
exactly which characters mean what, and when, and it's
only a couple of pages of lex code, folks will
implement it faithfully.
As it is now, everybody just writes their own ad hoc
finite state machine. That's too error-prone.
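
To give a rough idea of the shape such a specification
might take, here is a minimal sketch in Python rather
than lex. The rule names, the patterns, and the
tokenize() helper are all made up for illustration and
cover only a toy subset; this is not the actual HTML or
SGML lexical grammar. Note also that lex picks the
longest match, while this sketch simply takes the first
rule that matches, so the order of the rules matters.

```python
import re

# Hypothetical rule table: (token name, pattern), tried in order.
# Toy subset only; NOT the real HTML/SGML lexical grammar.
RULES = [
    ("COMMENT",   r"<!--.*?-->"),
    ("END_TAG",   r"</[A-Za-z][A-Za-z0-9.-]*\s*>"),
    ("START_TAG", r"<[A-Za-z][A-Za-z0-9.-]*(?:\s[^<>]*)?>"),
    ("ENTITY",    r"&[A-Za-z][A-Za-z0-9]*;"),
    ("DATA",      r"[^<&]+"),
    ("STRAY",     r"[<&]"),  # a delimiter character that starts no markup
]

# One combined scanner: each rule becomes a named alternative.
SCANNER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in RULES),
    re.DOTALL,
)

def tokenize(text):
    """Return (token name, lexeme) pairs covering the whole input."""
    return [(m.lastgroup, m.group()) for m in SCANNER.finditer(text)]

if __name__ == "__main__":
    for token in tokenize("<p>a &amp; b</p>"):
        print(token)
```

The point of the table is that every character of the
input falls under exactly one stated rule, so two
implementations that follow it should agree on where
each token begins and ends.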
Dan