Re: [Dillo-dev]Re: dillo patch: anchor names

Oct. 28, 2004

      Matthias,

  Please  excuse  me  for  the  delayed answer. I had a hard time
fixing release candidates for the dillo-0.8.3 release...

On Tue, Oct 19, 2004 at 12:03:17PM +0200, Matthias Franz wrote:
...
Hi Richard,
thanks for your comments!
Yes, very interesting.
...
On Fri, Oct 15, 2004 at 04:35:29PM -0400, Richard Page-Wood wrote:
...
I do know a bit about HTML and XHTML, though, and I was wondering
about your doctype-sniffing patch. As I understand it, you're trying
to distinguish between HTML 4.01 and XHTML 1.x by sniffing the doctype
and giving appropriate warnings for invalid markup. Are you also going
to alter the rendering for XHTML?
Certainly not as part of my patch. Its origin was simply the
observation that Dillo refused anchor names like "Dürst" which are
allowed in HTML if defined with the "name" attribute (see Section 12.2.3
of the HTML 4.01 spec).
...
There's a good article here:
http://www.hixie.ch/advocacy/xhtml
which talks amongst other things about the impossibility of correctly
identifying an XHTML document which might be of interest to you.
Having looked at this article and the references given therein, I don't
feel anymore that it would be a good idea to try and figure out whether
the document type is HTML or XHTML.
I still like the idea of supporting XHTML in some way, mostly because XML
lacks many of the strange features of SGML that make parsing difficult.
For example, "<" and "&" are not allowed as ordinary characters in XML.
But this has nothing to do with anchor names, so I will remove the XHTML
parts of the patch (unless someone complains).
Jorge: Are you still interested in evaluating <!DOCTYPE> to figure out
the HTML version? Maybe it would be ok for a small browser like Dillo
to stick to HTML 4.01.
Could be...

  As  the  suggested document explains, there's not a big gain in
serving  XHTML  as  such,  and  nowadays  most of it is served as
"text/html".

  BTW,   it's   hard   to  find  a  site  that  serves  XHTML  as
"application/xhtml+xml", that's not intended for testing.

  In  our  case  the  "detection"  was  just  to try to provide a
helpful  HTML/XHTML warning. The easy solution is either not to
raise a warning, or to send it to the extra warnings ;).

  What worries me a bit more is what to do with XHTML served with
the  proper MIME type. Currently it's not rendered at all, though
Dillo  can  perfectly  cope with it. The reason is that the XHTML
SPEC requires a validating client, and as Dillo doesn't include a
formal XML parser this is not possible.

  Today  this is not a problem because such sites are very seldom
found.

  Maybe  a  partial validation can serve the standards compliance
objective.  I mean, for instance: proper nesting, lowercase tags,
tag names in the XHTML namespace. Not much more than that.
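
  A rough sketch of what such a partial check might look like, covering
only proper nesting and lowercase tag names (namespace checks are left
out). All names here (check_xhtml_lite, XhtmlCheck, the limits) are
hypothetical and not taken from the Dillo source; comments, CDATA
sections and attribute values containing '<' or '>' are not handled.

```c
#include <ctype.h>
#include <string.h>

#define MAX_DEPTH 64   /* hypothetical limit on open elements */
#define MAX_NAME  32   /* hypothetical limit on tag name length */

typedef enum {
   XCHK_OK,
   XCHK_UPPERCASE_TAG,  /* tag name contains an uppercase letter */
   XCHK_BAD_NESTING,    /* close tag does not match, or tags left open */
   XCHK_MALFORMED       /* unterminated tag or nesting too deep */
} XhtmlCheck;

static XhtmlCheck check_xhtml_lite(const char *s)
{
   char stack[MAX_DEPTH][MAX_NAME];
   int depth = 0;

   while ((s = strchr(s, '<')) != NULL) {
      char name[MAX_NAME];
      size_t n = 0;
      int closing, self_closing;
      const char *end;

      s++;
      if (*s == '!' || *s == '?')
         continue;               /* doctype, comment start or PI: skipped */
      closing = (*s == '/');
      if (closing)
         s++;

      /* Collect the tag name, rejecting uppercase letters. */
      while (n < MAX_NAME - 1 &&
             (isalnum((unsigned char) *s) ||
              *s == ':' || *s == '-' || *s == '_')) {
         if (isupper((unsigned char) *s))
            return XCHK_UPPERCASE_TAG;
         name[n++] = *s++;
      }
      name[n] = '\0';
      if (n == 0)
         continue;               /* a stray '<', not a tag */

      end = strchr(s, '>');
      if (end == NULL)
         return XCHK_MALFORMED;
      self_closing = (end > s && end[-1] == '/');

      if (closing) {
         /* A close tag must match the most recently opened element. */
         if (depth == 0 || strcmp(stack[depth - 1], name) != 0)
            return XCHK_BAD_NESTING;
         depth--;
      } else if (!self_closing) {
         if (depth == MAX_DEPTH)
            return XCHK_MALFORMED;
         strcpy(stack[depth++], name);
      }
      s = end + 1;
   }
   return depth == 0 ? XCHK_OK : XCHK_BAD_NESTING;
}
```

  With this sketch, "<p><b>x</p></b>" is reported as badly nested and
"<P>x</P>" as using an uppercase tag, while "<p>line<br/>two</p>"
passes.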

  Perhaps  MIME  type  detection,  plus some doctype sniffing (to
have "an idea" of whether we are dealing with HTML 2.0, 3.2, 4.0,
4.01 or  XHTML),  and having that information in a structure like
the  one  suggested in the former mail could serve to fine-tune
the warning messages (or parser behaviour) a bit.
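
  As an illustration only, the fuzzy detection could be sketched along
these lines. The DocTypeInfo structure, sniff_doctype() and the enum
values are assumptions made for this example; they are not the
structure from the former mail nor anything in the Dillo source. The
MIME type is treated as the stronger hint, and the doctype is matched
against the public identifier text only.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types, for illustration only. */
typedef enum { DOCTYPE_NONE, DOCTYPE_HTML, DOCTYPE_XHTML } DocTypeKind;

typedef struct {
   DocTypeKind kind;
   const char *version;   /* "" when no known version was recognised */
} DocTypeInfo;

/* Case-insensitive substring search (strcasestr is not in ISO C). */
static const char *find_nocase(const char *hay, const char *needle)
{
   size_t n = strlen(needle);

   for (; *hay; hay++) {
      size_t i = 0;
      while (i < n && hay[i] &&
             tolower((unsigned char) hay[i]) ==
             tolower((unsigned char) needle[i]))
         i++;
      if (i == n)
         return hay;
   }
   return NULL;
}

/* Return the first entry of `versions` that `p` starts with, else "". */
static const char *match_version(const char *p,
                                 const char *const *versions, size_t count)
{
   size_t i;

   for (i = 0; i < count; i++)
      if (strncmp(p, versions[i], strlen(versions[i])) == 0)
         return versions[i];
   return "";
}

static DocTypeInfo sniff_doctype(const char *mime, const char *decl)
{
   /* "4.01" must be tried before "4.0", which is a prefix of it. */
   static const char *const html_v[]  = { "4.01", "4.0", "3.2", "2.0" };
   static const char *const xhtml_v[] = { "1.0", "1.1" };
   DocTypeInfo info = { DOCTYPE_NONE, "" };
   const char *p;

   if (mime && find_nocase(mime, "application/xhtml+xml"))
      info.kind = DOCTYPE_XHTML;

   if (!decl || !find_nocase(decl, "<!DOCTYPE"))
      return info;

   if ((p = find_nocase(decl, "DTD XHTML ")) != NULL) {
      info.kind = DOCTYPE_XHTML;
      info.version = match_version(p + 10, xhtml_v, 2);  /* skip 10 chars */
   } else if (info.kind == DOCTYPE_NONE &&
              (p = find_nocase(decl, "DTD HTML ")) != NULL) {
      info.kind = DOCTYPE_HTML;
      info.version = match_version(p + 9, html_v, 4);    /* skip 9 chars */
   }
   return info;
}
```

  The result is only "an idea": a document served as text/html with an
XHTML doctype still comes back as DOCTYPE_XHTML, which is good enough
for choosing a warning text but not for anything stricter.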

  For  instance,  having that info could, in the case of anchor
names, lead to something as simple as:

   /* cast: isalpha() is undefined for negative char values (e.g. 'ü') */
   if (!isalpha((unsigned char) val[0]) && doctype == DOCTYPE_XHTML)
      MSG_HTML("first character of '%s' value is outside"
               " the [A-Za-z] set\n", attrname);

  Just  make  the  patch with comments where these messages should
go.  With the fuzzy detection code in place it'll be a matter of
binding.

  Not the highest priority, but easy to merge.

-- 
  Cheers
  Jorge.-

Jorge Arellano Cid