Some comments on /. stuff:
This is the part where I rhetorically ask why anyone would write a web browser from scratch, the goal of which is to render 100% syntactically correct webpages really really fast. Are there not enough existing web browsers whose goals are to render pages really really fast?
Oh, which ones? On my P 120, which is my only computer, I tried Mozilla and its derivatives; they filled up my 56 MB of RAM just loading, and took a dozen minutes to show even a simple page. And that's without counting the megabytes they took on my hard drive. So I had a few choices: switch back to Windows 98 and Internet Explorer, which *is*, in comparison, damn fast; use Lynx on my Debian; or find another fast graphical Linux browser. I browsed the net. I tested "browsers" like Chimera2, which no one has ever heard of, and was very disappointed with every browser I tried until I found Dillo. Even if Dillo is very strict regarding the HTML compliance of pages, it is, as far as I know, the only graphical Linux browser that works correctly on legacy computers.
But I have other things to worry about, and you have other browsers to choose from that render Slashdot just fine.
Yes - it works fine with Lynx. I guess maybe we pushed the issue a little too hard - after all, Slashdot is not the only website that doesn't display correctly in Dillo. As the development philosophy is "stick to the standards", I think we shouldn't care too much about badly formatted pages. Maybe by version 1.0 we can include some kind of "render_uglily_coded_webpage" function to deal with that, but as far as I'm concerned I'm very happy with Dillo the way it is. I guess for people like me, who use old computers in everyday life, Dillo+Lynx is a very good way of accessing a great part of the web's resources. Best regards, Mathieu -- "Just living is not enough," said the butterfly, "one must also have freedom, sunshine, and a little flower." Hans Christian Andersen
On Sat, Oct 18, 2003 at 09:58:05PM +0200, The Night Howler wrote: <snip>
Maybe by version 1.0 we can include some kind of "render_uglily_coded_webpage" function to deal with that, but as far as I'm concerned I'm very happy with dillo the way it is.
I have a suggestion regarding this: while supporting broken HTML isn't really in the project goals, would it be possible to wrap a tool like htmltidy using the dpi? That way Dillo itself never even gets delivered broken HTML at all, because it's all been pre-processed into something valid... Just a thought :) -- Stephen Lewis
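The shape of Stephen's filter idea can be sketched in a few lines. This is purely hypothetical illustration code, not the actual dpi interface, and it does far less than real htmltidy (no attribute repair, no entity or DTD handling); it only shows the "pre-process first, hand the renderer balanced markup" pattern, using Python's standard-library HTML parser:

```python
from html.parser import HTMLParser

VOID = {"br", "img", "hr", "meta", "link", "input"}  # tags with no close tag

class Balancer(HTMLParser):
    """Re-emit a page, closing any tags the author left open."""

    def __init__(self):
        super().__init__()
        self.out = []    # re-emitted markup
        self.stack = []  # currently open tags

    def handle_starttag(self, tag, attrs):
        a = "".join(f' {k}="{v}"' for k, v in attrs if v is not None)
        self.out.append(f"<{tag}{a}>")
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)  # <img/> etc.: emit, never push

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray close tag: drop it
        while self.stack[-1] != tag:  # close inner tags left open
            self.out.append(f"</{self.stack.pop()}>")
        self.stack.pop()
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def result(self):
        while self.stack:  # close anything still open at end of input
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)

def pre_tidy(html):
    b = Balancer()
    b.feed(html)
    b.close()  # flush any buffered trailing text
    return b.result()
```

For example, `pre_tidy("<p><b>bold")` yields `<p><b>bold</b></p>`: the renderer behind the filter only ever sees balanced tags.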
On Sun, Oct 19, 2003 at 09:46:01AM +1300, Stephen Lewis wrote:
I have a suggestion regarding this: While supporting broken HTML isn't really in the project goals, would it be possible to wrap a tool like htmltidy using the dpi?
So that dillo itself never even gets delivered broken HTML at all, because it's all been pre-processed into something valid...
One of the objections to accepting and attempting to display broken HTML is that you have no idea whether the browser's interpretation of the bad HTML matches the intention of the author. HTML is well described, so it's not exactly rocket science to generate grammatically correct HTML source. The fact that so many programmatically generated web pages are seriously broken, and that there is huge resistance to correcting the situation, is worrying. It means that browsers not only have to deal with the many published versions of the HTML standards and the common proprietary extensions, they also have to be compatible with other browsers' bugs and their interpretations of bad HTML. The result will be that HTML is actually defined by the unpublished, de facto behaviour of the browser with the biggest installed base. It also means that browsers become hugely bloated as more and more variations of incorrect HTML have to be dealt with; and with the bloat come bugs that are harder to fix. Now we know what to avoid, perhaps it's time to give up on HTML and try again :-) -- Geoff Lane McDonalds hamburgers are made from 100% real clown meat.
On Saturday 18 October 2003 2:22 pm, Geoff Lane wrote:
Now we know what to avoid, perhaps it's time to give up on HTML and try again :-)
I believe that's the reason the XHTML spec says it should not even be rendered if it's invalid. And believe me, this is a good way to ensure you've got valid code. I switched over a few sections of my website to send application/xhtml+xml when appropriate, and suddenly it was *really* obvious when a page had bugs on it! (The disturbing part was that Mozilla caught a few errors that slipped past both the WDG offline validator and the W3C online validator.) -- Kelson Vibber www.hyperborea.org
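The strictness Kelson describes boils down to an XML well-formedness check, something any XML parser can reproduce (this is only an illustration of the rule, not how Mozilla or any validator is actually implemented). A minimal sketch using Python's standard library:

```python
import xml.etree.ElementTree as ET

def is_well_formed(doc):
    """True if doc parses as well-formed XML -- the same bar an
    application/xhtml+xml page must clear before a strict browser
    will render it at all."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False
```

A properly nested fragment like `<p><b>ok</b></p>` passes, while a page with a missing close tag, such as `<blockquote><img src='x'/>`, is rejected outright rather than rendered by guesswork.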
On Sunday 19 October 2003 01:53, Kelson Vibber wrote:
I believe that's the reason the XHTML spec says it should not even be rendered if it's invalid. And believe me, this is a good way to ensure you've got valid code.
Does this also work on IE? If it does, God forbid the day that some bright engineer at MS decides they should change their code to try to render it anyway. --Ivo
On Saturday 18 October 2003 5:50 pm, Ivo wrote:
On Sunday 19 October 2003 01:53, Kelson Vibber wrote:
I believe that's the reason the XHTML spec says it should not even be rendered if it's invalid. And believe me, this is a good way to ensure you've got valid code.
Does this also work on IE? If it does, God forbid the day that some bright engineer at MS decides they should change their code to try to render it anyway.
No, IE doesn't handle XHTML at all. However, XHTML was designed with backward compatibility in mind. If an XHTML page is sent with the content type text/html, it looks like HTML 4.01 with a few extra attributes that can be ignored. The content type for XHTML, application/xhtml+xml, is only recognized by a few browsers right now, possibly only the Gecko-based ones and Amaya.

Since Mozilla trusts the server to send the correct mime type, it assumes that a text/html page is HTML, not XHTML, and parses it as HTML. Any errors found are treated the same way it treats errors in HTML. It's only when a page is served with the XHTML mime type that Mozilla enforces the well-formed XML requirement.

IE can trash XHTML if you're not careful, though - since IE assumes the server is likely to be misconfigured, it will sometimes see the XML declarations at the beginning of an XHTML file and try to display it as plain XML, even if your server has told it the file is HTML. I saw this when I had more than one stylesheet attached as an XML declaration.

Browsers that really display XHTML (through an XML parser, requiring it to be well-formed) should put application/xhtml+xml in their HTTP Accept header. (Gecko browsers do.) Documents in XHTML 1.1, or using features only found in XHTML, should always be served with this mime type. For XHTML 1.0 documents intended for a wider audience, the server should use content negotiation to determine which the browser can handle, and serve either text/html or application/xhtml+xml.

I should mention that, AFAIK, the don't-even-render-it rule only applies to XML well-formedness. So mis-nested or extra tags (like the extra </td></tr> in Slashdot) would cause the browser to display an error, but unfamiliar tags (such as RDF embedded in XHTML) would simply be ignored as long as they don't invalidate the XML structure.
For example, you can serve the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Blah</title></head>
<body>
<h1>Header</h1>
<blargquote><img src="whatever.jpg" /></blargquote>
</body>
</html>

...and it will display the way you would expect, ignoring the fictitious "blargquote" element (which would be caught by a validator). But if you instead do this (note the missing close tag for blockquote):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Blah</title></head>
<body>
<h1>Header</h1>
<blockquote><img src="whatever.jpg" />
</body>
</html>

...Mozilla will display this error message:

XML Parsing Error: mismatched tag. Expected: </blockquote>.
Location: test.xhtml
Line Number 8, Column 3:
</body>
--^

But if you send the exact same file with the text/html content type, it will use the HTML parser and try to guess where the blockquote is supposed to end. -- Kelson Vibber www.hyperborea.org
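The content-negotiation step Kelson mentions can be sketched in a few lines. This is a hypothetical server-side fragment (function name invented for illustration); a production version should parse q-values in the Accept header rather than doing a substring test:

```python
def choose_content_type(accept_header):
    """Pick the mime type for an XHTML 1.0 page based on the browser's
    HTTP Accept header: serve application/xhtml+xml only to browsers
    that announce support for it, and fall back to text/html otherwise."""
    if "application/xhtml+xml" in accept_header:
        return "application/xhtml+xml"
    return "text/html"
```

A Gecko browser advertising `application/xhtml+xml` gets the strict XML type; a browser like IE, which never lists it, gets `text/html` and parses the page as ordinary HTML.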
-- In reply to "Re: [Dillo-dev] Re: /. crap" from Geoff Lane, on 18-Oct-2003:
I have a suggestion regarding this: While supporting broken HTML isn't really in the project goals, would it be possible to wrap a tool like htmltidy using the dpi?
So that dillo itself never even gets delivered broken HTML at all, because it's all been pre-processed into something valid...
One of the objections to accepting and attempting to display broken HTML is that you have no idea if the browser interpretation of the bad HTML matches the intention of the author.
Not only that, but it is also a significant effort to:

- detect that the HTML is broken (it can have thousands of different problems);
- pick the right fix for it (the one that most browsers will agree on, because, by definition, there is no standard for rendering broken pages);
- fix it so it displays somewhat properly (meaning, change the internal representation of the page).

By "significant effort", I mean we need development time to add a lot of non-trivial error-recovery code, and it will also slow down the rendering process because it will need some sort of second-pass parsing of the page. That's why Dillo's attitude is to stick to the standard - not because we're standards fanatics, but because that's the only way to have a bounded and reliable definition of what rendering should be. The only effort we make towards broken HTML is trying as hard as we can not to segfault. If a broken page is not rendering properly in Dillo (or any browser), it's a bug in the page, not a bug in Dillo. Period. Best, Eric ------------------------------------------------------------------------ Eric GAUDET <eric@rti-zone.org> On 18-Oct-2003 at 20:10:05 "Speaking to say nothing, and saying nothing in order to speak, are the two major and rigorous principles of all those who would do better to shut their mouths before opening them." ------------------------------------------------------------------------
On Sat, Oct 18, 2003 at 08:19:49PM -0700, Eric GAUDET wrote:
-- In reply to "Re: [Dillo-dev] Re: /. crap" from Geoff Lane, on 18-Oct-2003:
I have a suggestion regarding this: While supporting broken HTML isn't really in the project goals, would it be possible to wrap a tool like htmltidy using the dpi?
So that dillo itself never even gets delivered broken HTML at all, because it's all been pre-processed into something valid...
One of the objections to accepting and attempting to display broken HTML is that you have no idea if the browser interpretation of the bad HTML matches the intention of the author. <snip> The only effort we make towards broken html is trying as hard as we can not to segfault. If a broken page is not rendering properly in Dillo (or any browser), it's a bug in the page, not a bug in Dillo. Period.
OK, your response (and the others) has convinced me that this was a bad idea :) I realize this thread comes up periodically; I only mentioned it this time because it was a way of minimizing the problem outside of the core, by reusing existing tools (like tidy). Also, it gets tiresome loading up another browser after a while for large sites, like Slashdot, that can't easily fix their problems. (That's one downside of using Dillo - once you get used to it, it's hard to tolerate anything else :) -- Stephen Lewis
...tidy...
OK, your response ( and the others ) has convinced me that this was a bad idea :)
:-) More generally, my point was: there's a lot to do to improve Dillo, let's not waste our time fixing problems that are not supposed to exist in the first place.
I realize this thread comes up periodically; I only mentioned it this time because it was a way of minimizing the problem outside of the core, by reusing existing tools (like tidy). Also, it gets tiresome loading up another browser after a while for large sites, like Slashdot, that can't easily fix their problems. (That's one downside of using Dillo - once you get used to it, it's hard to tolerate anything else :)
Tidy is actually an excellent idea ... server side :-) This /. guy should give it a try. On the other hand, fixing a broken page client side every time it's loaded really sounds terrible.

About the list of excuses given for not fixing the bug right away, the only real one is the management's priorities. As for the rest, I would just say:

- not fixing bugs because some customers rely on them is bad for business;
- not fixing bugs because they have a lot of dependencies suggests the design is probably wrong;
- not fixing bugs because "it's complicated" is really lame;
- "why would anybody write a browser from scratch?" - why would anybody do anything? Let's all use Windows and IE6!

Best, EG
participants (6)
- Eric GAUDET
- Geoff Lane
- Ivo
- Kelson Vibber
- Stephen Lewis
- The Night Howler