Firstly, hello - I'm a fan and user of Dillo, as well as being a web developer, but I am in no way a coder and I don't have any experience in building web browsers or any other programs - so bear with me if I am talking rubbish. I do know a bit about HTML and XHTML, though, and I was wondering about your doctype-sniffing patch. As I understand it, you're trying to distinguish between HTML 4.01 and XHTML 1.x by sniffing the doctype and giving appropriate warnings for invalid markup. Are you also going to alter the rendering for XHTML? I know that the big browsers such as Gecko or IE6 do doctype-sniffing to switch between a "quirks" mode or a "standards-compliant" mode - are you thinking of doing this, or is it just for showing an error dialog?
* First of all, it evaluates the <!doctype> tag to find out whether the document is HTML or XHTML. If the tag is wrong or missing, an error is raised.
Parsing <!doctype ...> is a good idea.
There's a good article at http://www.hixie.ch/advocacy/xhtml which might be of interest to you: among other things, it discusses why it is impossible to reliably identify an XHTML document.
You see that there is still work to do in html.c, all the more because now one could add error messages based on the distinction between HTML and XHTML. (E.g., "@" is illegal in XHTML because of the uppercase "X".) Would this kind of change be welcome?
In my personal and certainly very humble opinion, if an XHTML 1.x document is served with the MIME type text/html (as virtually all are, and Dillo doesn't handle application/xhtml+xml anyway), it should simply be parsed as HTML 4.01, precisely because the MIME type is a clear indication that it is supposed to be an HTML 4.01 compatible document.
If you are doing doctype sniffing the Gecko way to switch rendering modes, then I'm sure you'll do it better than IE6 and not assume that the doctype can only occur on the first line (IE6 messes up if there's an XML prolog or even a comment before it). Hey, it'll just be another reason why Dillo is better than IE6! Of course, the currently accepted convention in other browsers is that HTML 4.01 doctypes which include a full W3C DTD URL are treated as standards-compliant, but 4.01 without the URL, and HTML 4.0 and earlier, are not. XHTML doctypes are always treated as standards-compliant, whether or not a URL is present.
Finally, and most importantly, I'd like to add my word of thanks to the developers of this really excellent little browser. Having subscribed recently to this list, I can see the level of dedication that goes into making Dillo even better.
Richard Page-Wood
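P.S. To make the sniffing convention described above a bit more concrete, here is a rough sketch in C. It is not taken from Dillo's html.c or from the patch: the names DocType, RenderMode, find_doctype and sniff_doctype are all hypothetical, and the matching rules are only my simplified reading of the convention (XHTML doctype means standards mode; HTML 4.01 with a W3C DTD URL means standards mode; everything else means quirks mode). It skips an XML prolog and any leading comments before looking for the doctype, which is the case IE6 gets wrong.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result types; not part of Dillo. */
typedef enum { DOC_NONE, DOC_HTML, DOC_XHTML } DocType;
typedef enum { MODE_QUIRKS, MODE_STANDARDS } RenderMode;

/* Case-insensitive test: does s begin with prefix? */
static int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Case-insensitive search for needle within the first len bytes of s. */
static int contains_ci(const char *s, size_t len, const char *needle)
{
    size_t n = strlen(needle), i;

    for (i = 0; i + n <= len; i++)
        if (starts_with_ci(s + i, needle))
            return 1;
    return 0;
}

/* Skip whitespace, "<?...?>" prologs and "<!--...-->" comments, then
 * return a pointer to "<!doctype" if that is what comes next, else NULL. */
static const char *find_doctype(const char *p)
{
    for (;;) {
        const char *end;

        while (isspace((unsigned char)*p))
            p++;
        if (strncmp(p, "<?", 2) == 0) {
            if ((end = strstr(p, "?>")) == NULL)
                return NULL;
            p = end + 2;
        } else if (strncmp(p, "<!--", 4) == 0) {
            if ((end = strstr(p + 4, "-->")) == NULL)
                return NULL;
            p = end + 3;
        } else {
            break;
        }
    }
    return starts_with_ci(p, "<!doctype") ? p : NULL;
}

/* Classify the document and pick a rendering mode (simplified rules).
 * Any doctype that is not recognisably XHTML is reported as DOC_HTML. */
static DocType sniff_doctype(const char *doc, RenderMode *mode)
{
    const char *p = find_doctype(doc);
    const char *end;
    size_t len;

    *mode = MODE_QUIRKS;
    if (p == NULL || (end = strchr(p, '>')) == NULL)
        return DOC_NONE;              /* missing or unterminated doctype */
    len = (size_t)(end - p);

    if (contains_ci(p, len, "//DTD XHTML")) {
        *mode = MODE_STANDARDS;       /* XHTML: standards, URL or not */
        return DOC_XHTML;
    }
    if (contains_ci(p, len, "//DTD HTML 4.01") &&
        contains_ci(p, len, "http://www.w3.org/"))
        *mode = MODE_STANDARDS;       /* 4.01 with a W3C DTD URL */
    return DOC_HTML;
}
```

A caller would pass the start of the page buffer and a RenderMode to fill in, for example sniff_doctype(buf, &mode), and could raise the "missing doctype" warning when DOC_NONE comes back. Whether a document sniffed as XHTML but served as text/html should then be parsed any differently from HTML 4.01 is exactly the question I raised above; the sketch only reports what the doctype says.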