[Dillo-dev] HTML Policy and New parser improvements
Hi there,

Most of us are familiar with the aphorism about our HTML parsing policy stated in the [Project Notes]:

"Our policy with HTML is not to try to render badly written HTML, ideally send a warning message, and not to crash!"

Here goes the rationale behind it. I wrote it some time ago as an email answer, and here I quote an improved version:

<q>
About our parsing policy
------------------------

These days I have thought a lot about this subject. In fact, an important part of the work of a project maintainer is to take a stance on the difficult decisions: those that are not black or white, but a trade-off.

With regard to our parsing policy, imagine a triangle with the following vertices:

* Standards (W3C)
* Web site authors
* Users

Each vertex represents the exact position of the group it is named after. The inside area is the whole space of stances anyone could take. Each group has its own interests, sometimes opposed and sometimes very close. The position of a web browser can be visualized as a point within the area of that triangle, determined by its development team.

Dillo ought to be at the W3C vertex, but that would make it almost useless because of the horrible state of the HTML on the Web (aka "Tag Soup"). For that reason we make some exceptions so that dillo can render a larger part of the web, by correcting some HTML faults, but we keep close to the W3C vertex.

A standards-compatible browser, such as Mozilla, should be close to the W3C vertex, but I understand it would never manage to be a candidate to replace IE if it were (a trade-off). Actually, it sits along the authors-users side, AFAIU.

As our main objective is the democratization of access to the internet's information, and that is directly related to the use of standards, we follow the path of respecting and promoting them.

The idea of adding an HTML quality meter to the interface, in the form of a face icon and the number of detected errors, came up as a good way to improve dillo as a QA tool for content authors.

I've also thought of adding a "combat mode rendering" button: a way of parsing the worst sites into a basic and simple rendering. That way, users would be one click away from being able to "see" pages with awful HTML.

These two ideas would help dillo keep close to the correct vertex of the triangle, while also becoming a tool that helps web authors provide more standards-compliant content.
</q>

After the new parser was introduced (0.8.0), Dillo featured much better HTML error detection, but it rendered malformed HTML a bit worse. It was a good trade-off from the "standards" vertex of the above-mentioned triangle, but I also knew that it was not going to be much fun for the "users" vertex.

These days I've been working on improving the parser and bug-meter by introducing information about the inline, block and flow content models of HTML.

After having that information in place, it was easy to produce better and more accurate bug detection, and also to bring the rendering closer to what it used to be.

So that's the good news: the new CVS contains code with an improved parser that will hopefully be a pleasant surprise for our users.

From the Changelog:

* Added container|inline model information to the HTML element table, and made the bug-meter and the parser aware of it. This improves both bug detection and rendering.
* Fixed newly detected HTML bugs in bookmarks dpi and file.c.
* Fixed opening files with a ':' character in their names (again).
* Added binaryconst.h (allows for binary constants in C).
* Fixed the ladder effect with lists (BUG#534).

So go ahead and try it!

Cheers
Jorge.-
On Fri, 16 Apr 2004 12:59:35 -0400 (CLT) Jorge Arellano Cid <jcid@dillo.org> wrote:
After the new parser was introduced (0.8.0), Dillo featured much better HTML error detection, but it rendered malformed HTML a bit worse. It was a good trade-off from the "standards" vertex of the above-mentioned triangle, but I also knew that it was not going to be much fun for the "users" vertex.
These days I've been working on improving the parser and bug-meter by introducing information about the inline, block and flow content models of HTML.
After having that information in place, it was easy to produce better and more accurate bug detection, and also to bring the rendering closer to what it used to be.
So that's the good news: the new CVS contains code with an improved parser that will hopefully be a pleasant surprise for our users.
Wow! I can see a big improvement: lots of sites that rendered readably, but 'iffy', are now much clearer and cleaner.

--
jim nutt
home: jim@nuttz.org
jabber: jimnutt@jabber.com
work: jimnutt@vestek.com
ms msg: jim@nuttz.org
pgp id: 1ECBCC78
participants (2): Jim Nutt, Jorge Arellano Cid