Hi there!

After all the trouble with trying to convince Slashdot's webmasters to fix their HTML (because of the loftier importance of adhering to standards), and considering Andreas S.'s comment that dillo was only producing a weird rendering, without helping to find the cause of the problem, I started a more in-depth review of it. As time passed, the code I was developing became more and more interesting, and finally here you have it!

This is a quote from the ChangeLog:

 * Adapted the generic parser to make HTML error detection, providing the line number and a hint (expected tag) in the error message!
 * Added 'show_html_wanings' option to dillorc (boolean).
 * Added information about optional, required and forbidden end tags.
 * Modified the parser's handling of closing tags to account for elements with an optional close tag, and for more accurate diagnosis messages.
 * Added 'use_old_parser' option to dillorc (boolean).
 * Fixed the handling of HEAD and BODY elements to account for their double optional condition (both open and close tags are optional).

This basically means that dillo will now detect badly formed HTML and print useful error messages about it.

After using it for a while, it helped me find bugs I had overlooked for months, and it proved very accurate and helpful! (It even caught one at the W3C site! :)

I hope it will be a good tool for webmasters, and for people authoring simple web pages.

On our side, it will help to polish our HTML parsing. For instance, it soon became clear that we're sometimes pulling tags from the stack when they should stay. For instance:

   <p>
   <ol><li> One <li> two </ol>
   </p>     <- detects an error here.

because <ol> prematurely removes <p> from the stack, so the closing </p> no longer finds a matching open tag. This may not look relevant, but when trying to build a DOM tree it certainly is!

Some comments:

The new parsing behaviour makes some sites render differently from the way they used to. Sometimes they look better, sometimes worse. After all, it is badly formed HTML that we're dealing with. For instance:

   heise.de       a bit worse, but usable.
   slashdot.org   renders OK.
   sf.net         worse, but usable.

Please bear in mind that dillo can now diagnose these errors at render time so, as a user, you can now send a meaningful error report to a site's webmaster! (For instance, slashdot is diagnosed on the fly.) Our experience is that most webmasters welcome these reports.

Finally, remember that you can turn off the HTML error messages:

   show_html_wanings=NO

and that you can fall back to the old parser:

   use_old_parser=YES

Enjoy! (I hope ;)

Jorge.-
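
For readers who want to see the idea behind the <p>/<ol> diagnosis above, here is a minimal sketch in Python. It is not dillo's code (dillo is written in C), and everything in it is an assumption made for illustration: the function name check_html, the three small tag tables, and the rule that opening a block element pops an open <p>. It only shows how a tag stack plus a table of optional and forbidden end tags can yield a line number and an "expected tag" hint.

```python
import re

# Hypothetical tables for this sketch; HTML 4 defines many more elements.
OPTIONAL_END = {"p", "li", "html", "head", "body"}   # end tag may be omitted
FORBIDDEN_END = {"br", "img", "hr", "meta"}          # end tag is not allowed
CLOSES_OPEN_P = {"ol", "ul", "p", "table", "div"}    # opening these pops an open <p>

TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def check_html(text):
    """Return a list of (line_number, message), one per nesting error found."""
    warnings = []
    stack = []  # names of the elements that are currently open
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in TAG_RE.finditer(line):
            closing = match.group(1) == "/"
            name = match.group(2).lower()

            if not closing:
                # Implicit closes: a block element ends an open <p>,
                # and a new <li> ends the previous one.
                if name in CLOSES_OPEN_P and stack and stack[-1] == "p":
                    stack.pop()
                if name == "li" and stack and stack[-1] == "li":
                    stack.pop()
                if name not in FORBIDDEN_END:
                    stack.append(name)
                continue

            if name in FORBIDDEN_END:
                warnings.append((lineno, f"</{name}> is a forbidden end tag"))
            elif name not in stack:
                if stack:
                    msg = f"unexpected </{name}>; expected </{stack[-1]}>"
                else:
                    msg = f"unexpected </{name}>; no element is open"
                warnings.append((lineno, msg))
            else:
                # Unwind to the matching element. Elements whose end tag
                # is optional are closed silently; the rest are reported.
                while stack[-1] != name:
                    skipped = stack.pop()
                    if skipped not in OPTIONAL_END:
                        warnings.append(
                            (lineno, f"</{name}> found; expected </{skipped}>")
                        )
                stack.pop()
    return warnings


if __name__ == "__main__":
    sample = "<p>\n<ol><li> One <li> two </ol>\n</p>\n"
    for lineno, message in check_html(sample):
        print(f"line {lineno}: {message}")
```

Run on the three-line snippet from the message, the sketch reports a single warning on line 3, because <p> was already popped when <ol> opened. Keeping the optional and forbidden end tags in plain lookup tables is just one simple way to organise that information; the real parser may well store it differently.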
Jorge Arellano Cid