[Dillo-dev]Buggy HTML, the new parser and webmasters
Hi all,
First : my apologies for a rather large mail !
Now that the bug counter is in dillo, and with Madis' last post, it's a good coincidence that I just tried to convince the webmasters of 3 sites I usually visit to fix their HTML. From 2 I haven't heard a thing (and they are not even THAT huge), but from www.osnews.com I got a reply very fast - and it turned out to be an extremely interesting conversation. I asked her for permission to forward it to the list, and here it goes. I sorted it chronologically, with some comments of mine in between. I omitted my first mail - it was just the HTML warnings of the new parser along with a kind request to fix the page.
WARNING, BIG REMINDER : Don't flame her - it's not about this individual site, it's about general practice and the general attitude towards standards ... After all, the e-mail conversation started in private and she didn't have in mind that a lot of people would read it. And I think she deserves credit for taking her time, laying out her arguments and explaining everything in a very friendly manner.
On Sat, Jan 24, 2004 at 06:36:27PM -0800, Eugenia Loli-Queru wrote:
Hello Andreas, thanks for the heads up. I looked at the error messages, and I know what is wrong. Please note that all forms and tables ARE closing in the code, it is just that your parser is not realizing it. You see, I start the form inside a table's cell, but I close the form outside the table. This is not 100% valid code, but it is like that for a reason. This is causing your parser to spit all these errors (all the errors below the form error are just a side effect of the first error). So, the reason I close the form outside that table is because I am trying to go around bugs on Netscape 4 and IE. As you can see, the cell that holds the form has a black background color. If I close the form inside that cell, IE and NS4 create a new paragraph inside that cell, even if my code never asked for it. And that looks ugly.
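To make the cascade described above concrete, here is a minimal Python sketch of a strict nesting check. It is an illustration only and is not taken from dillo's source: the class name `NestingChecker`, the helper `check_nesting`, and the sample markup are all made up for this example, and the short `VOID_ELEMENTS` set is an assumption that covers just enough tags for the sample.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag, so they are not pushed on the stack.
# (Deliberately incomplete; enough for the sample markup below.)
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class NestingChecker(HTMLParser):
    """Hypothetical strict checker: records closing tags that arrive out of order."""

    def __init__(self):
        super().__init__()
        self.open_elements = []  # stack of currently open tag names
        self.warnings = []       # (closing tag seen, innermost open tag) pairs

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if self.open_elements and self.open_elements[-1] == tag:
            self.open_elements.pop()
        else:
            innermost = self.open_elements[-1] if self.open_elements else None
            self.warnings.append((tag, innermost))


def check_nesting(markup):
    """Return (warnings, elements still open at end of input)."""
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.warnings, checker.open_elements


# The form opens inside a table cell but closes after the table.
page = "<table><tr><td><form><input name='q'></td></tr></table></form>"
warnings, still_open = check_nesting(page)
for closing, innermost in warnings:
    print(f"</{closing}> seen while <{innermost}> is still open")
print("left open:", still_open)
```

With this naive check, the single misplaced `</form>` makes every table-related closing tag look wrong, which matches the remark that the later errors are a side effect of the first one.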
Up to here, you could argue a lot about how to design a web page: about not wanting to dictate how the page looks and leaving that to the browser, about using CSS, and whatnot. With enough arguing one could even convince this very author to change (not likely though, see below). BUT : the interesting part comes now :
So, the "trick" that many web developers of my time use a lot (see: hand-written HTML), is to close the </FORM> outside of tables or places where it screws up the design. It is a well-known trick, and so your new parser is going to meet it a number of times in the future. I suggest you
The above is frightening ....
fix your parser to understand "non-normal" code like that, and realize that the form does close as expected.
Of course, for us that's not fixing but breaking ;-) .... Here is now my reply (rather long, and you will know the arguments if you follow the mailing list) :
Hi Eugenia,
Thanks for the reply and explanation. First of all : I am not a core developer of that browser - just a steady user and occasional contributor.
However, if this mail came from one of the developers or if I told them to follow your advice, you would either be ignored or flamed ;-) ... I will try to do neither. So please do keep in mind that my response is meant as a friendly food for thought or possibly some constructive criticism.
Discussions about fixing broken HTML do come up on the mailing list from time to time. And the consensus is to just not do it - for a number of reasons (I probably will forget some).
- What good are standards if nobody conforms to them, and instead a (not very strict !) pseudo-standard of its own develops ?
- It's not a browser's job to guess what the author wanted to have displayed.
- It costs man-power and CPU cycles to detect and fix broken HTML. We prefer to spend the man-power on developing something useful instead of code that detects broken HTML. And we prefer to give the CPU cycles to the user (BTW, dillo is the snappiest graphical browser I know - maybe Links2)
All right, but we already detect the problem of your page, you may say :-) .. and indeed, the *old* parser was closing out of order. In fact, I could configure dillo to still use the old parser and your page will look great. BUT : then other pages break :-( - they have buggy HTML, too. And there, closing tags with the new method is better than with the old method. So, in choosing which parser to use, the default is the standards-compliant method of interpreting HTML (also, the old parser was buggy and not quite right). Oh - and if we care about the web sites we also tell/told the authors of these other pages that they have buggy code !
Now, why then are Mozilla or the other browsers displaying all correctly ? Probably because they did spend man-power and CPU-cycles to guess what the author wanted. I had to smile when you wrote, "Fix your parser to understand non-normal HTML", because I thought, "That's not fixing it, that's breaking it".
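One way to picture why a stricter closing rule changes how the form trick renders is the sketch below. It is only an assumption about how such a policy could work, not a description of dillo's old or new parser: the function name `close_with_implicit_pops` and the stack contents are hypothetical. Under this policy a closing tag also closes everything opened after its element, and a closing tag with no open element is treated as stray.

```python
def close_with_implicit_pops(open_elements, tag):
    """Close `tag` on the stack, implicitly closing anything opened after it.

    Returns (implicitly_closed, matched). If `tag` is not open at all,
    the stack is left untouched and the closing tag counts as stray.
    """
    if tag not in open_elements:
        return [], False
    implicitly_closed = []
    while open_elements[-1] != tag:
        implicitly_closed.append(open_elements.pop())
    open_elements.pop()  # the element that actually matches `tag`
    return implicitly_closed, True


# State after reading <table><tr><td><form> from the page under discussion.
stack = ["table", "tr", "td", "form"]

on_td = close_with_implicit_pops(stack, "td")        # </td> also ends the form
on_tr = close_with_implicit_pops(stack, "tr")
on_table = close_with_implicit_pops(stack, "table")
on_form = close_with_implicit_pops(stack, "form")    # nothing left to close

print(on_td, on_tr, on_table, on_form)
```

In this sketch the form ends at `</td>`, so the later `</form>` has nothing to close. A parser that instead hunts for the late `</form>` and keeps the form alive until then is doing the kind of guessing the paragraph above argues against.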
When I design web pages, I usually make them pass (if possible) validator.w3.org and if they look ugly in ancient - and buggy ! - browsers, bad luck ! I realize, my pages are nothing professional like your site, but still, is it worth supporting buggy browsers ? How many visitors are actually using Netscape 4 ? Probably more than dillo, but the comparison should be non-buggy vs. buggy browsers. About IE - are all versions of IE behaving like this ? Then my argument would be, why support MS's interpretation of standards ? (and outside of HTML you probably will agree ;-) ... after all, don't we all know lots of alternatives and fight for them ?) You as a professional web site probably do have to support buggy browsers, but the question still may be asked.
Finally, some links to archived previous discussions :
http://news.gmane.org/group/gmane.comp.web.dillo.devel/thread=1324/force_loa...
(if you don't want to read all of it, read at least Eric Gaudet's posts)
And here is the explanation of the new parser :
http://article.gmane.org/gmane.comp.web.dillo.devel/1434
By now, you know that the other problematic site was slashdot ;-) ... (among others !) which displays ok with the current parser - but also is buggy HTML - and they know it !
And before I forget : Yes, I saw a lot of pages that look quite ugly with the strict interpretation. And I also only tell the webmasters of pages for which I care :-)
And as a really last word : the new parser may indeed have room for improvement, and some more closing may indeed be possible as a compromise, BUT, as I said, I'm not a core developer.
As I said in the beginning, this was meant as food for thought, not flaming. If you should decide to change HTML after all, that would be fantastic. If not, hopefully I at least did explain myself well.
Cheers Andreas
And here is her 2nd answer:
Thank you for the email Andreas, however, I must say that the code in that form won't change. IE has more than 70,000 viewers daily on osnews, and Dillo has about 20. So, as you can understand, being compatible with the broken IE is more important to us, even if that means that we have to not be validated. In fact, not being validated, it is a given for big sites, because in order to give a specific/certain look, you have to do tricks like the one I described to you in order to render the same on all big browsers.
Now you could, of course, go and say "screw this site" .... but the following is at least interesting food for thought. I don't fully agree, but I can understand the point of view.
Also, please keep in mind: my husband is in the browser business. He is a web browser developer for OpenWave.com among his team. They don't have the specific problem of out-of-order-tag-closing because they know that many sites are like that, so they have to go around this problem, OR they won't be able to sell their browser to anyone.
This is a market. Even for the free Dillo. Dillo's job is not to create the most "correct" parser in the world, but to render most sites in the world. If the first was true, then Dillo is nothing but an "Amaya", a test, a proof of concept. But if the second is true, then Dillo is really a browser usable by people.
So, it depends on what its developers want. If they want a proof of concept, no problem with me, osnews and most other sites won't work. If they want a browser that works for most people, they need to sit down and go around buggy HTML (created either by amateur developers or by developers who had to go around bugs). For example, IE even goes around this bug <a/> and understands that the developer wanted to write </a> instead. Big browsers do such error checking and go around buggy HTML, because it is the only way they can be widely used. In my case of course, it was I who had to go around their bugs.
So, it all depends what the Dillo people want. It is not what I, or you, want, but what their real targets are: proof of concept project or a real USABLE browser.
If that was me, it would have been the second. And it ain't my fault that IE has these bugs that I have to go around, it is my job to make sure my page renders well for most of my readers. :)
Rgds, Eugenia
--- Editor in Chief http://www.OSNews.com
When I asked her for permission to forward her mails to the list she added another interesting quote :
On Sun, Jan 25, 2004 at 08:53:17PM -0800, Eugenia Loli-Queru wrote:
You can go ahead and quote some of the stuff I said. BTW, I discussed the matter with my husband (who as I said is in the browser business) and he told me:
"Creating an HTML browser and a Web browser are two different things. The first one just needs to do the job right and parse HTML correctly, the other one just has to be able to render web pages in any possible mean. These are two different beasts and targets --unfortunately."
I am afraid this is the reality of any product though, not just browsers... Reality market... :(
Eugenia
Bottom line : it may be possible to convince individual sites to fix their HTML, but what about really big sites that use these ... err ... "workarounds" ? Do they really care how a little browser interprets their HTML ? They care if HTML looks good in IE - reality sucks. It's extremely frustrating. And even more so, when I have to look at garbled pages ;-) .... The situation may improve over time when more and more sites use CSS and don't have to rely on such dirty formatting tricks - but then again, how many bugs does IE have there that authors want to (or will) work around ?
And to repeat my reminder : Don't flame her - it's not about this individual site, it's about general practice and the general attitude towards standards ... And again I think she deserves credit for taking her time, laying out her arguments and explaining everything in a very friendly manner.
Cheers, Andreas
--
**************************** NEW ADDRESS ******************************
Hamburger Sternwarte Universitaet Hamburg
Gojenbergsweg 112 Tel. ++49 40 42891 4016
D-21029 Hamburg, Germany Fax. ++49 40 42891 4198
On Tue, 27 Jan 2004 14:01:14 +0100 Andreas Schweitzer <Andreas.Schweitzer@hs.uni-hamburg.de> wrote:
On Sat, Jan 24, 2004 at 06:36:27PM -0800, Eugenia Loli-Queru wrote: [...]
So, the reason I close the form outside that table is because I am trying to go around bugs on Netscape 4 and IE. As you can see, the cell that holds the form has a black background color. If I close the form inside that cell, IE and NS4 create a new paragraph inside that cell, even if my code never asked for it. And that looks ugly. [...] So, the "trick" that many web developers of my time use a lot (see: hand-written HTML), is to close the </FORM> outside of tables or places where it screws up the design. It is a well-known trick, and so your new parser is going to meet it a number of times in the future. I suggest you
http://www.cs.tut.fi/~jkorpela/forms/extraspace.html (maybe forward this?)
i had that problem myself... personally i went for the CSS solution, as dillo (non-css capable) doesn't have that "feature" anyway. when playing with the more or less valid options, i even found some combinations where *mozilla* would NOT display the form AT ALL, while IE still did.
Greetings, Thorben Thuermer
On Sun, Jan 25, 2004 at 08:53:17PM -0800, Eugenia Loli-Queru wrote:
"Creating an HTML browser and a Web browser are two different things. The first one just needs to do the job right and parse HTML correctly, the other one just has to be able to render web pages in any possible mean. These are two different beasts and targets --unfortunately."
For some other commentary on the effects of buggy HTML, you might be interested in some recent postings on Surfin' Safari:
http://weblogs.mozillazine.org/hyatt/archives/2004_01.html
Dave Hyatt has a series of posts on how to best handle XML errors, and he talks about the difficulty of getting WebCore to handle all the broken HTML filling the web today. One of his key points: "The #1 reason that HTML pages render incorrectly in alternate browsers is because of differences in error handling and recovery." He describes his experiences trying to make Safari handle broken HTML the same way IE does (much as IE originally had to handle broken HTML the same way Netscape did).
His main point is about XML error handling (specifically, if all XML browsers insist on well-formed documents from the start, no one will have to deal with the nightmare of different error-correction schemes), but he has some interesting comments on dealing with broken HTML.
Kelson Vibber
www.hyperborea.org
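The "insist on well-formed documents" approach mentioned above can be seen in any conforming XML parser. As a small sketch (not taken from any of the browsers discussed), Python's standard `xml.etree.ElementTree` refuses mis-nested markup outright instead of trying to recover; the helper name `is_well_formed` and the two sample strings are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if `document` parses as XML, False if the parser rejects it."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# Form opened and closed inside the same cell: properly nested.
nested_ok = "<table><tr><td><form></form></td></tr></table>"
# Form opened inside the cell but closed after the table: mis-nested.
nested_bad = "<table><tr><td><form></td></tr></table></form>"

print(is_well_formed(nested_ok))
print(is_well_formed(nested_bad))
```

Because the parser stops at the first mismatched tag, there is no error-recovery behaviour for different implementations to disagree about.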
participants (3)
- Andreas Schweitzer
- Kelson Vibber
- Thorben Thuermer