But why bother?
Parsing SGML with a top-down recursive descent parser based on an FSR is
by far the simplest approach to implement, and it also produces correct code.
Why would anyone want to use an inappropriate tool which does the job less
well and is more difficult to use?
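To illustrate why top-down parsing of nested markup is so simple, here is a minimal sketch in Python. It assumes a hypothetical toy subset of SGML: no attributes, no omitted start or end tags, no entities and no DTD. Real SGML needs all of those. Each grammar rule becomes one method (`element`, `content`), and the nesting is handled by ordinary recursion. The names `tokenize`, `Parser` and `parse` are illustrative only.

```python
import re
from dataclasses import dataclass, field
from typing import List, Union

# Toy token syntax: <NAME>, </NAME>, or a run of text containing no '<'.
TOKEN_RE = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)>|([^<]+)")


@dataclass
class Element:
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


def tokenize(text):
    """Split text into ("START", name), ("END", name) and ("TEXT", s) tokens."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        if m.group(3) is not None:
            tokens.append(("TEXT", m.group(3)))
        elif m.group(1):
            tokens.append(("END", m.group(2).upper()))
        else:
            tokens.append(("START", m.group(2).upper()))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.i = tokens, 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else ("EOF", None)

    def expect(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {kind} {value or ''}, got {tok}")
        self.i += 1
        return tok

    def element(self):
        # element ::= START content END   (end tag must match the start tag)
        _, name = self.expect("START")
        node = Element(name, self.content())
        self.expect("END", name)
        return node

    def content(self):
        # content ::= (TEXT | element)*
        children = []
        while True:
            kind, value = self.peek()
            if kind == "TEXT":
                children.append(value)
                self.i += 1
            elif kind == "START":
                children.append(self.element())
            else:
                return children


def parse(text):
    p = Parser(tokenize(text))
    root = p.element()
    p.expect("EOF")
    return root
```

For example, `parse("<p>Hello <b>world</b></p>")` yields `Element("P", ["Hello ", Element("B", ["world"])])`. A misnested document such as `<p><b>x</p></b>` raises `SyntaxError` at the exact point where the expected end tag is missing. Error reporting therefore falls out of the structure of the parser, with no extra machinery.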
Yacc is OK if you actually have an LR(1) grammar, but it's best to steer well
clear of it otherwise. In addition, error handling was never really thought out
properly in yacc. I've never seen anyone successfully use the error
productions without coming a cropper.
HTML 2.0 is just about parsable with yacc, but HTML3 is pretty awful, especially
the maths extensions, since they use some of the character-set shifting
functions. This part is distinctly non-LR(1), and the best, most compact
definition of the grammar is produced using a push-down automaton.
I think the problem lies in comp sci classes being taught that bottom up
parsing is `better' and the students not asking why. Goldfarb would not know
an LR(1) grammar if one bit him on the nose. If he had, SGML might not fall
into the "much wailing and gnashing of teeth" category that it does.
PS: I have discovered that the correct pronunciation of "ASN.1" is "assassin 1".
-- Phillip M. Hallam-Baker
Not speaking for anyone else.