Matthias,

Please excuse me for the delayed answer. I had a hard time fixing release candidates for the dillo-0.8.3 release...

On Tue, Oct 19, 2004 at 12:03:17PM +0200, Matthias Franz wrote:
Hi Richard,
thanks for your comments!
Yes, very interesting.
On Fri, Oct 15, 2004 at 04:35:29PM -0400, Richard Page-Wood wrote:
I do know a bit about HTML and XHTML, though, and I was wondering about your doctype-sniffing patch. As I understand it, you're trying to distinguish between HTML 4.01 and XHTML 1.x by sniffing the doctype and giving appropriate warnings for invalid markup. Are you also going to alter the rendering for XHTML?
Certainly not as part of my patch. Its origin was simply the observation that Dillo refused anchor names like "Dürst", which are allowed in HTML if defined with the "name" attribute (see Section 12.2.3 of the HTML 4.01 spec).
There's a good article here:
http://www.hixie.ch/advocacy/xhtml
which talks, amongst other things, about the impossibility of correctly identifying an XHTML document, and which might be of interest to you.
Having looked at this article and the references given therein, I no longer feel that it would be a good idea to try and figure out whether the document type is HTML or XHTML.
I still like the idea of supporting XHTML in some way, mostly because XML lacks many of the strange features of SGML that make parsing difficult. For example, "<" and "&" are not allowed as ordinary characters in XML. But this has nothing to do with anchor names, so I will remove the XHTML parts of the patch (unless someone complains).
Jorge: Are you still interested in evaluating <!DOCTYPE> to figure out the HTML version? Maybe it would be ok for a small browser like Dillo to stick to HTML 4.01.
Could be... As the suggested document explains, there's not a big gain in serving XHTML as such, and nowadays most of it is served as "text/html". BTW, it's hard to find a site that serves XHTML as "application/xhtml+xml" and is not intended for testing.

In our case the "detection" was just an attempt to provide a helpful HTML/XHTML warning. The easy solution is not to raise a warning, or to send it to extra warnings ;).

What worries me a bit more is what to do with XHTML served with the proper MIME type. Currently it's not rendered at all, though Dillo can perfectly cope with it. The reason is that the XHTML SPEC requires a validating client, and as Dillo doesn't include a formal XML parser this is not possible. Today this is not a problem because such sites are very rarely found.

Maybe a partial validation can serve the standards compliance objective. I mean, for instance: proper nesting, lowercase tags, tag names in the XHTML namespace. Not much more than that.

Perhaps MIME type detection, plus some doctype sniffing (to have "an idea" of whether we are dealing with HTML 2.0, 3.2, 4.0, 4.01 or XHTML), and having that information in a structure like the one suggested in the former mail, could serve to fine-tune the warning messages (or parser behaviour) a bit.

For instance, having that info in the case of anchor names can lead to something as simple as:

   if (!isalpha(val[0]) && doctype == DOCTYPE_XHTML)
      MSG_HTML("first character of '%s' value is outside"
               " the [A-Za-z] set\n", attrname);

Just make the patch with a comment where these messages should go. With the fuzzy detection code in place it'll be a matter of binding.

Not the highest priority, but easy to merge.

--
Cheers
Jorge.-
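
For illustration only, the "MIME type detection plus some doctype sniffing" idea from the message above could be sketched roughly as follows. This is not Dillo code: the names DocInfo, sniff_doc_info(), find_nocase() and anchor_name_warns() are hypothetical, the version is stored as an integer (401 for HTML 4.01) purely for convenience, and the matching is deliberately fuzzy, looking only for well-known fragments of the W3C/IETF public identifiers.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types; not taken from the Dillo sources. */
typedef enum { DT_UNKNOWN, DT_HTML, DT_XHTML } DocType;

typedef struct {
   DocType type;
   int version;   /* 401 = HTML 4.01, 100 = XHTML 1.0, 0 = unknown */
} DocInfo;

/* Case-insensitive substring search (portable stand-in for strcasestr). */
const char *find_nocase(const char *hay, const char *needle)
{
   size_t n = strlen(needle);

   for (; *hay; hay++) {
      size_t i = 0;
      while (i < n && hay[i] &&
             tolower((unsigned char)hay[i]) ==
             tolower((unsigned char)needle[i]))
         i++;
      if (i == n)
         return hay;
   }
   return NULL;
}

/* Fragments of public identifiers. "4.01" must precede "4.0",
 * because the latter is a prefix of the former. */
static const struct {
   const char *marker;
   DocType type;
   int version;
} known_dtds[] = {
   { "DTD XHTML 1.1", DT_XHTML, 110 },
   { "DTD XHTML 1.0", DT_XHTML, 100 },
   { "DTD HTML 4.01", DT_HTML,  401 },
   { "DTD HTML 4.0",  DT_HTML,  400 },
   { "DTD HTML 3.2",  DT_HTML,  320 },
   { "DTD HTML 2.0",  DT_HTML,  200 },
};

/* Guess the document flavour from the MIME type and the <!DOCTYPE>
 * declaration; either argument may be NULL. */
DocInfo sniff_doc_info(const char *mime_type, const char *doctype_decl)
{
   DocInfo info = { DT_UNKNOWN, 0 };
   size_t i;

   if (doctype_decl) {
      for (i = 0; i < sizeof(known_dtds) / sizeof(known_dtds[0]); i++) {
         if (find_nocase(doctype_decl, known_dtds[i].marker)) {
            info.type = known_dtds[i].type;
            info.version = known_dtds[i].version;
            break;
         }
      }
   }
   /* Assumption of this sketch: the XHTML MIME type overrides the doctype. */
   if (mime_type && find_nocase(mime_type, "application/xhtml+xml")) {
      if (info.type != DT_XHTML)
         info.version = 0;
      info.type = DT_XHTML;
   }
   return info;
}

/* Nonzero when the anchor-name warning from the message would fire. */
int anchor_name_warns(const DocInfo *info, const char *val)
{
   return info->type == DT_XHTML && !isalpha((unsigned char)val[0]);
}
```

With something like this in place, the parser would call sniff_doc_info() once per document and consult the resulting structure wherever a warning depends on the document flavour; the anchor-name check then reduces to a single predicate, as in the snippet in the message.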