[Dillo-dev]Parser orthogonalization

Nov. 21, 2004

      Hi there,

  A few days ago I finished the:

    [1] Orthogonalize the parser (leaks, XML, CSS)

  task from the list. I didn't commit it immediately because of
the pthreads issue we're still working on (it showed up as a parser
bug when in fact it was elsewhere).

   Now the large parser patch is committed (a 70KB beast!).

  From the ChangeLog:
<q>
   * Orthogonalized the generic parser: 
       - Fixes memory leaks and widget state when recovering from bad HTML.
       - Improves error detection and validation (needed by XHTML).
       - Makes DOC tree generation possible (needed by CSS).
       - Cleaner design of handling routines for bad HTML.
       - Orthodox treatment of double optional elements (HTML, HEAD, BODY).
       - Lots of minor code cleanups.
</q>

  That's the summary.

  Now tags are always pushed onto the stack when they're found,
and the respective closing functions are always called when
finishing them, whether this is triggered by an explicit closing
tag, an optional close, or bad-HTML cleanup. This is very important
for the reasons listed in the ChangeLog above: memory handling,
widget state and DOC tree generation all depend on it.
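
  To illustrate the push/close discipline described above, here is
a minimal sketch in C. It is not Dillo's code: every name in it
(push_tag, close_top, close_tag, finish_document, run_demo) is
hypothetical, and the per-tag state is reduced to a single flag.
It only shows the idea of a single exit path, so that the closing
routine runs for every pushed tag regardless of what triggered the
close.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch, NOT Dillo's parser: all names are made up. */
#define MAX_DEPTH 32

typedef enum { CLOSE_EXPLICIT, CLOSE_CLEANUP } CloseReason;

typedef struct {
   const char *name;
   int allocated;   /* stands in for per-tag state that must be released */
} StackItem;

static StackItem stack[MAX_DEPTH];
static int stack_top = 0;       /* number of currently open tags */
static int closes_called = 0;   /* how many times the close routine ran */

/* Every tag that is found gets pushed. Returns 0, or -1 if the stack is full. */
static int push_tag(const char *name)
{
   if (stack_top == MAX_DEPTH)
      return -1;
   stack[stack_top].name = name;
   stack[stack_top].allocated = 1;
   stack_top++;
   return 0;
}

/* The single exit path: every pushed tag leaves the stack through here. */
static void close_top(CloseReason why)
{
   StackItem *it;

   assert(stack_top > 0);
   it = &stack[--stack_top];
   it->allocated = 0;   /* release the per-tag state */
   closes_called++;
   printf("close <%s> (%s)\n", it->name,
          why == CLOSE_EXPLICIT ? "explicit" : "cleanup");
}

/* Handle a closing tag: close any tags left open above it, then the tag
 * itself. Returns -1 (and changes nothing) if the tag is not open. */
static int close_tag(const char *name)
{
   int i;

   for (i = stack_top - 1; i >= 0; i--)
      if (strcmp(stack[i].name, name) == 0)
         break;
   if (i < 0)
      return -1;
   while (stack_top - 1 > i)
      close_top(CLOSE_CLEANUP);
   close_top(CLOSE_EXPLICIT);
   return 0;
}

/* At end of input, close whatever is still open. */
static void finish_document(void)
{
   while (stack_top > 0)
      close_top(CLOSE_CLEANUP);
}

/* Small scenario: <html><body><p><b> ... </p> </i> EOF */
static int run_demo(void)
{
   stack_top = 0;
   closes_called = 0;
   push_tag("html");
   push_tag("body");
   push_tag("p");
   push_tag("b");
   close_tag("p");      /* closes the unclosed <b>, then <p> */
   close_tag("i");      /* stray close tag: ignored */
   finish_document();   /* closes <body> and <html> */
   return closes_called;
}
```

  In this scenario four tags are pushed and the close routine runs
four times, even though only one of them was closed explicitly.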

  The routines for handling HTML cleanup (optional, magic and bad
HTML) were mostly isolated into three functions:

    Html_test_section()
    Html_stack_cleanup_at_open()
    Html_tag_cleanup_at_close()

  (they are fairly well commented in the source).

  In brief: the good news about this patch is that it makes the
parser much easier to understand, improve and maintain (and
eventually to chop, merge or reuse too). It also fixes memory
handling, widget creation state and validation, and prepares the
ground for future CSS and XHTML support.

  For instance, look at the new <P> and </P> tag treatment at:

    http://www-106.ibm.com/developerworks/eserver/articles/framework.html

  Now, I'm back to pthreads, ah, and I have more cleanups already
done waiting for commit, some patches pending, the usual stuff!

-- 
  Cheers
  Jorge.-

Jorge Arellano Cid
