[Dillo-dev]Re: dillo patch: anchor names

Oct. 15, 2004

      Matthias,

  Maybe the most important guideline in this answer is that we're
trying to provide good hint-messages for common HTML bugs, not to
be as picky (or as correct) as the W3C's validator.

  The two main reasons behind this are, first, that we do not want
to (nor can we) complicate the code inside Dillo too much (some
big browsers have several parsers inside), and second, that we want
to help fix the most problematic HTML bugs (mainly nesting), not all
of them.

  BTW, inside Dillo all the HTML-like content is currently parsed
as  HTML-4.01  with  a  few minor exceptions. HTML-4.01 is a good
default because it tries hard to be backwards compatible.

  The third reason is that if the need for formal validation
arises, the W3C validator does a great job at it! :)

On Wed, Oct 13, 2004 at 06:20:36PM +0200, Matthias Franz wrote:
...
Dear Jorge,
here is the anchor name patch I promised you a long time ago.
It does the following:
* First of all, it evaluates the <!doctype> tag to find out whether
the document is HTML or XHTML. If the tag is wrong or missing, an
error is raised.
Parsing <!doctype ...> is a good idea.

  Putting that info in a structure like this one:

     typedef enum {
        DT_NONE,
        DT_HTML,
        DT_XHTML
     } DocumentType;

     typedef struct {
        DocumentType Type;
        float Version;
     } DocumentInfo;

  allows having all the information in one place, and deciding
later whether to take some action or not.

  e.g. DT_NONE + DT_HTML + 4.01 means no doctype was given and
       that HTML-4.01 is assumed as default.

       DT_HTML + 4.01 means it was stated explicitly in doctype.
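  A minimal sketch of how the <!doctype> text could be mapped onto
the DocumentInfo structure above. The helper name
Html_parse_doctype_sketch() is hypothetical, it only recognizes W3C
public identifiers such as "-//W3C//DTD HTML 4.01//EN" and
"-//W3C//DTD XHTML 1.0 Strict//EN", and it assumes the C locale (so
strtof() reads '.' as the decimal point). A missing or unrecognized
doctype yields DT_NONE with 4.01 as the assumed version:

```c
#include <stdlib.h>
#include <string.h>

typedef enum {
   DT_NONE,
   DT_HTML,
   DT_XHTML
} DocumentType;

typedef struct {
   DocumentType Type;
   float Version;
} DocumentInfo;

/* Hypothetical helper (not part of Dillo): fill 'info' from the text of a
 * <!doctype ...> tag. Anything unrecognized leaves DT_NONE + 4.01, i.e.
 * "no usable doctype, HTML-4.01 assumed". Assumes the C locale. */
static void Html_parse_doctype_sketch(const char *tag, DocumentInfo *info)
{
   const char *fpi = "-//W3C//DTD ";
   const char *p;

   info->Type = DT_NONE;
   info->Version = 4.01f;

   if (tag == NULL || (p = strstr(tag, fpi)) == NULL)
      return;
   p += strlen(fpi);

   if (strncmp(p, "XHTML ", 6) == 0) {
      info->Type = DT_XHTML;
      p += 6;
   } else if (strncmp(p, "HTML ", 5) == 0) {
      info->Type = DT_HTML;
      p += 5;
   } else {
      return;               /* some other DTD: keep the defaults */
   }
   /* The version number directly follows, e.g. "4.01//EN" or "1.0 Strict". */
   info->Version = strtof(p, NULL);
}
```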
...
* Dillo now distinguishes more carefully between head and body section
There was a bug in Dillo (up to rc1); a patch is now in CVS.
When the HTML meta refresh warning was issued, the parser switched
from IN_HEAD to IN_BODY.

  Note that for HTML-4.01:

    BODY: Start tag: optional, End tag: optional
    HEAD: Start tag: optional, End tag: optional
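  Because both tags are optional, the parser has to infer the switch
from HEAD to BODY from the elements it sees. A minimal sketch of that
inference, assuming tag names are already lowercased; the Section enum
and both function names are hypothetical (Dillo tracks this with its
IN_HEAD/IN_BODY flags). Note that a <meta> keeps the state in HEAD:

```c
#include <string.h>

/* Hypothetical section states for this sketch only. */
typedef enum { SEC_NONE, SEC_HEAD, SEC_BODY } Section;

/* Elements HTML-4.01 allows inside HEAD. */
static int Is_head_element(const char *tag)
{
   static const char *const head_tags[] = {
      "title", "base", "script", "style", "meta", "link", "object"
   };
   size_t i;

   for (i = 0; i < sizeof head_tags / sizeof head_tags[0]; i++)
      if (strcmp(tag, head_tags[i]) == 0)
         return 1;
   return 0;
}

/* Section after seeing start tag 'tag' while in section 'cur'. */
static Section Next_section(Section cur, const char *tag)
{
   if (cur == SEC_BODY)
      return SEC_BODY;          /* never go back to HEAD */
   if (strcmp(tag, "html") == 0)
      return cur;               /* the root element changes nothing */
   if (strcmp(tag, "head") == 0 || Is_head_element(tag))
      return SEC_HEAD;          /* explicit or implied HEAD */
   return SEC_BODY;             /* "body" or any other element implies BODY */
}
```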
...
* The errors "<...> not allowed in body section" are now centralised
in Html_process_tag
Could be.
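  Centralising the check means the "head-only" knowledge can live in
the tag table instead of in each tag handler. A minimal sketch; the
TagInfoSketch table and Check_not_allowed_in_body() are hypothetical
(not Dillo's real TagInfo or Html_process_tag), and the head-only list
follows HTML-4.01, where TITLE, BASE, META, LINK and STYLE may appear
only in HEAD:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of a tag table: only the flag this check needs. */
typedef struct {
   const char *name;
   int head_only;   /* 1 if the element may only appear inside HEAD */
} TagInfoSketch;

static const TagInfoSketch Tags_sketch[] = {
   { "title", 1 }, { "base", 1 }, { "meta", 1 }, { "link", 1 },
   { "style", 1 }, { "script", 0 }, { "object", 0 }, { "p", 0 }
};

/* The single place for the check: returns 1 (and prints the message)
 * when 'tag' is a head-only element seen while in the body section. */
static int Check_not_allowed_in_body(const char *tag, int in_body)
{
   size_t i;

   if (!in_body)
      return 0;
   for (i = 0; i < sizeof Tags_sketch / sizeof Tags_sketch[0]; i++) {
      if (strcmp(tag, Tags_sketch[i].name) == 0 && Tags_sketch[i].head_only) {
         printf("HTML warning: <%s> not allowed in body section\n", tag);
         return 1;
      }
   }
   return 0;
}
```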
...
Moreover, errors are raised in the following situations:
(After all, this was the goal!)
* if an anchor name (defined by "name" or "id") is already defined
OK.
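  In HTML-4.01 "name" and "id" share one namespace within a
document, so both kinds of definition can go into the same set. A
minimal sketch with hypothetical names (AnchorSet, AnchorSet_add);
since <a name="x" id="x"> defines a single anchor, the caller is
assumed to record each distinct value only once per element. The
linear search keeps the sketch short; a hash table would suit large
pages better:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical set of anchor names seen so far in one document. */
typedef struct {
   char **names;
   size_t count, capacity;
} AnchorSet;

/* Returns 1 if 'name' was new and has been recorded, 0 if it was already
 * defined (the caller would emit the "already defined" error), and -1 on
 * allocation failure. Names are compared exactly (case-sensitive). */
static int AnchorSet_add(AnchorSet *set, const char *name)
{
   size_t i, len;
   char *copy;

   for (i = 0; i < set->count; i++)
      if (strcmp(set->names[i], name) == 0)
         return 0;

   if (set->count == set->capacity) {
      size_t cap = set->capacity ? 2 * set->capacity : 16;
      char **grown = realloc(set->names, cap * sizeof *grown);
      if (grown == NULL)
         return -1;
      set->names = grown;
      set->capacity = cap;
   }
   len = strlen(name) + 1;
   copy = malloc(len);
   if (copy == NULL)
      return -1;
   memcpy(copy, name, len);
   set->names[set->count++] = copy;
   return 1;
}

/* Release all stored names and reset the set to empty. */
static void AnchorSet_free(AnchorSet *set)
{
   size_t i;

   for (i = 0; i < set->count; i++)
      free(set->names[i]);
   free(set->names);
   set->names = NULL;
   set->count = set->capacity = 0;
}
```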
...
For performance reasons, I have changed (very) few lines in dw_page.c
and dw_gtk_viewport.c.
(Pending, given the latest bugs found...)
...
* if (in HTML mode) the "name" and "id" attributes of <a> differ
OK.
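  HTML-4.01 requires that when both attributes appear on one A
element their values be identical, and anchor names are matched
exactly, so a plain strcmp() is enough. A minimal sketch; the function
name is hypothetical and the caller is assumed to invoke it only in
HTML mode:

```c
#include <string.h>

/* Hypothetical check: returns 1 when an <a> carries both "name" and "id"
 * with different values; a missing attribute is not an error here. */
static int Anchor_name_id_differ(const char *name, const char *id)
{
   return name != NULL && id != NULL && strcmp(name, id) != 0;
}
```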
...
* if <a> tags are nested
OK.
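  HTML-4.01 forbids an A element inside another A. A minimal sketch
that uses a depth counter rather than a single flag, so that an inner
or stray </a> cannot hide a later nesting error; the struct and
function names are hypothetical. The same counter pattern would also
cover the nested-button case IN_BUTTON was meant for:

```c
/* Hypothetical parser state for this sketch only. */
typedef struct {
   int a_depth;   /* number of currently open <a> elements */
} AnchorNesting;

/* Called for <a>: returns 1 if this <a> is nested inside another one. */
static int Anchor_open(AnchorNesting *st)
{
   int nested = st->a_depth > 0;

   st->a_depth++;
   return nested;
}

/* Called for </a>; stray close tags are ignored. */
static void Anchor_close(AnchorNesting *st)
{
   if (st->a_depth > 0)
      st->a_depth--;
}
```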
...
* extra_warning if an anchor name (defined by "name") was illegal for "id"
OK.
...
NOT DONE:
* warning if in XHTML <a> is used with "name" and no "id"
(according to the spec, this has no effect, which is probably not intended)
OK.
...
* the "refresh" warning causes (like before) an error if further
head elements follow the <meta>
Fixed in CVS now (Björn Brill).
...
* I've discovered that some parts of the TagInfo structure are not used
any more, for example TagLevel and bits 2^0 = 1 and 2^2 = 4 of Flags.
TagLevel is used extensively by the W3C+heuristics mode. Look at
Html_tags_get_taglevel() calls.

  Yes, bits 0 and 2 are not yet used, but they are there just in
case they're needed.
...
In particular, I didn't know how to define them for <!doctype> on line 4281.
HTML elements can be of type 'block' or 'inline' (well, also
'flow'). And they can be containers of 'inline' elements or
containers of 'block' elements.

  This is what the flags encode. I'll document that in the code.

  For instance, <address> is a 'block' element, and a container
of 'inline' elements.

  address  B8(0110)
              |||`- inline element
              ||`-- block element
              |`--- inline container
              `---- block container
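  The bit layout above can be decoded as in the following sketch.
The B8() definition is a stand-in in the style of the well-known
binary-literal macro (it token-pastes the digits into a hex constant,
so B8(0110) is not read as octal); Dillo's own definition may differ.
The TF_* names and Tag_may_contain() are hypothetical, and the
containment rule ignores 'flow' containers:

```c
/* Stand-in binary-literal macro for four binary digits. */
#define HEX__(n) 0x##n##LU
#define B8__(x) ((((x) & 0x000FLU) ? 1 : 0) + (((x) & 0x00F0LU) ? 2 : 0) + \
                 (((x) & 0x0F00LU) ? 4 : 0) + (((x) & 0xF000LU) ? 8 : 0))
#define B8(d) ((unsigned char)B8__(HEX__(d)))

/* Hypothetical names for the four bits in the diagram. */
#define TF_INLINE_ELEM      0x1   /* bit 0: inline element   */
#define TF_BLOCK_ELEM       0x2   /* bit 1: block element    */
#define TF_INLINE_CONTAINER 0x4   /* bit 2: inline container */
#define TF_BLOCK_CONTAINER  0x8   /* bit 3: block container  */

/* 1 if an element with flags 'child' may appear inside 'parent'. */
static int Tag_may_contain(unsigned char parent, unsigned char child)
{
   if ((child & TF_INLINE_ELEM) && (parent & TF_INLINE_CONTAINER))
      return 1;
   if ((child & TF_BLOCK_ELEM) && (parent & TF_BLOCK_CONTAINER))
      return 1;
   return 0;
}
```

  With address = B8(0110), an inline element such as <em> (B8(0101))
is accepted, while a block element such as <p> (B8(0110)) is not.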

  This is well defined here:
    http://www.cs.tut.fi/~jkorpela/html/nesting.html

  Now, as !doctype isn't covered there, it can be made an inline
element that is also an inline container (i.e. B8(0101)), so that it
can appear almost anywhere, which helps to tackle the issue.
...
* IN_BUTTON in html.h is also not used any more; I've replaced it by the
new IN_A.
Leave IN_BUTTON as it is. As buttons can't be nested, it was meant
to catch that case (not implemented yet).
...
* One change in Html_process_tag is more of a hack; I didn't want to
start rewriting everything without contacting you first.
You can see that there is still work to do in html.c, all the more
because now one could add error messages based on the distinction
between HTML and XHTML. (E.g., "@" is illegal in XHTML because of the uppercase "X".)
Would this kind of change be welcome?
Hmmm, I think this is too much for now.
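  As an illustration of an error that only makes sense once HTML and
XHTML are told apart: XHTML 1.0 requires element and attribute names
to be lowercase. A minimal sketch; the function name is hypothetical
and the caller is assumed to apply it only when the document type is
XHTML:

```c
/* Hypothetical XHTML-only check: returns 1 if 'name' contains an ASCII
 * uppercase letter, which XHTML does not allow in element or attribute
 * names. */
static int Xhtml_name_has_uppercase(const char *name)
{
   for (; *name != '\0'; name++)
      if (*name >= 'A' && *name <= 'Z')
         return 1;
   return 0;
}
```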
...
I hope this patch can still make it into rc2. If you have comments or
questions, please let me know.
As explained before, it had better not go into rc2: only bug fixes.

-- 
  Regards
  Jorge.-

Jorge Arellano Cid