[Dillo-dev] Preparing the 0.8.1 release
Hi guys,
0.8.1 is getting closer every day. It will feature what's currently in CVS (basically the improved HTML bug detection and rendering).
I'd love to be able to switch this dillo version from alpha to beta stable, but that requires a lot of testing: the new parser (going to lots of sites and asserting there are no crashes), and the recently found problems with puppy linux, downloading and FTP (has anyone experienced trouble with these dillo plugins on their systems?).
Please test our current CVS hardly, and tell us how it did.
Thanks to Jim, who was the first one to send some feedback:
Wow! I can see a big improvement, lots of sites that rendered readably, but 'iffy', are now much clearer and cleaner.
Cheers Jorge.-
Please test our current CVS hardly, and tell us how it did.
Directories like http://fy.chalmers.se/~appro/linux/DVD+RW/tools/ or others are not rendered correctly Jens
On Thu, 22 Apr 2004, Jens Arm wrote:
Please test our current CVS hardly, and tell us how it did.
Directories like
http://fy.chalmers.se/~appro/linux/DVD+RW/tools/
or others are not rendered correctly
Yes. This is a complex problem.
Apache (the web server) generates bad HTML for directory listings (at least in the 1.3.x series):
<PRE> is an inline container excluding:
{IMG, OBJECT, APPLET, BIG, SMALL, SUB, SUP, FONT, BASEFONT}
Also, it can't hold block elements (e.g. HR).
So there's a bug in having <HR> inside <PRE> and one bug per directory entry because <PRE> can't contain <IMG>.
As usual, the problem is not black or white, because producing valid HTML for the directory listing would require tables (if you want to keep the image). Probably, a long time ago, they considered the situation and found that bad HTML using PRE was more portable than one based on TABLE. I don't know for sure.
Perhaps the 2.x.x Apache series produces XML. Does anyone know?
After analyzing the produced HTML for directory listings, I found that isolating the <HR> outside <PRE> is a trivial fix that helps keep the block element outside.
That is:
- <HR>
+ </PRE><HR><PRE>
Very simple. It would allow for violations of the type "inline container with excluded inline element within" (instead of "inline container with a block inside").
Does anyone know an apache developer? It seems that sending the patch through bugzilla could take years to be noticed...
OTOH, I don't like very much breaking our policies with code to handle bad HTML, and yes I know this is the most used web server. Maybe a dillorc option to enable a workaround for apache could do it, but I'd really like to talk with one of the developers.
Comments?
Cheers
Jorge.-
PS: Patching Dillo to render it is trivial. Though it could break other pages...
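To make the proposed change concrete, here is a minimal sketch of the "- <HR> / + </PRE><HR><PRE>" rewrite applied to an already generated listing. It is only an illustration in Python: the function name isolate_hr and the regex-based approach are assumptions made for this example, not code from Apache or from Dillo, and a regex pass like this would not cope with nested or unterminated <PRE> blocks.

```python
import re

# Matches an <HR> tag in any letter case, with optional attributes.
HR_TAG = re.compile(r"<hr\b[^>]*>", re.IGNORECASE)
# Matches one <PRE>...</PRE> region (non-greedy, may span several lines).
PRE_BLOCK = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre>)", re.IGNORECASE | re.DOTALL)


def isolate_hr(html):
    """Close <PRE> before every <HR> found inside it and reopen it afterwards.

    Hypothetical helper for illustration only; <HR> tags that are already
    outside <PRE> are left untouched.
    """

    def fix_block(match):
        open_tag, body, close_tag = match.groups()
        # Reuse the original opening tag so any attributes on <PRE> survive.
        body = HR_TAG.sub(lambda hr: "</PRE>" + hr.group(0) + open_tag, body)
        return open_tag + body + close_tag

    return PRE_BLOCK.sub(fix_block, html)


if __name__ == "__main__":
    print(isolate_hr("<PRE>Name  Size<HR>file.txt  1k</PRE>"))
```

The result still leaves the <IMG> tags inside <PRE>, which is exactly the remaining "excluded inline element" violation described above.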
On Monday 26 April 2004 4:48 pm, Jorge Arellano Cid wrote:
OTOH, I don't like very much breaking our policies with code to handle bad HTML, and yes I know this is the most used web server. Maybe a dillorc option to enable a workaround for apache could do it, but I'd really like to talk with one of the developers.
No offense, but you have *got* to be kidding about (a) not working around this in some way, (b) expecting a program with an install base orders of magnitude higher than Dillo to change immediately, and (c) actually adding an "unbreak my application" preference. Even if Apache is fixed next week, how long will it take for the millions of sites using it to upgrade? What about all those installations using an OS distribution that keeps using the same old 1.3.10 or whatever, just backporting security patches? I see little harm in working around this bug, but if you prefer not to cater to it, perhaps a compromise would be to discard the invalid HR, rather than end the PRE block early. -- Kelson Vibber www.hyperborea.org
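The compromise suggested here (discard the invalid HR instead of ending the PRE block early) can be sketched the same way. Again this is only an assumption-laden illustration on the HTML text, with a made-up function name drop_hr_in_pre; it says nothing about how Dillo's parser would actually implement it.

```python
import re

# One <PRE>...</PRE> region, any letter case, possibly spanning lines.
PRE_BLOCK = re.compile(r"(<pre\b[^>]*>)(.*?)(</pre>)", re.IGNORECASE | re.DOTALL)
# An <HR> tag with optional attributes.
HR_TAG = re.compile(r"<hr\b[^>]*>", re.IGNORECASE)


def drop_hr_in_pre(html):
    """Remove <HR> tags that appear inside <PRE>; leave all others alone.

    Hypothetical helper for illustration only.
    """
    return PRE_BLOCK.sub(
        lambda m: m.group(1) + HR_TAG.sub("", m.group(2)) + m.group(3),
        html,
    )


if __name__ == "__main__":
    print(drop_hr_in_pre("<PRE>Name  Size<HR>file.txt  1k</PRE><HR>"))
```

The trade-off is visible in the output: the listing stays in one preformatted block, but the rule separating the header from the entries is lost.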
Jorge Arellano Cid wrote:
OTOH, I don't like very much breaking our policies with code to handle bad HTML, and yes I know this is the most used web server. Maybe a dillorc option to enable a workaround for apache could do it, but I'd really like to talk with one of the developers.
Comments?
Yes, I think dillo should render those pages. Even if a new 1.3 version comes out, many people won't upgrade immediately; old versions of apache are around for a very, very long time and I would respect this. I think we should have an option like iCab's, which allows "strict" rendering and then options to make it more and more tolerant. After all, we have to use dillo in the real world, not in an ideal one. -R
On Mon, Apr 26, 2004 at 07:48:11PM -0400, Jorge Arellano Cid wrote:
On Thu, 22 Apr 2004, Jens Arm wrote:
Please test our current CVS hardly, and tell us how it did.
Directories like
http://fy.chalmers.se/~appro/linux/DVD+RW/tools/
or others are not rendered correctly
Yes. This is a complex problem.
Apache (the web server), generates bad HTML for directory listings (at least 1.3.x series):
<PRE> is an inline container excluding:
{IMG, OBJECT, APPLET, BIG, SMALL, SUB, SUP, FONT, BASEFONT}
Also, it can't hold block elements (e.g. HR).
So there's a bug in having <HR> inside <PRE> and one bug per directory entry because <PRE> can't contain <IMG>.
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=9307 and http://nagoya.apache.org/bugzilla/show_bug.cgi?id=13351 suggest that the apache developers know, and it's not going to change in the 1.3 series. http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10880 suggests that the default won't change in the 2.0 series, but there may be IndexOptions settings that the web server administrator can use to make it valid.
Perhaps the 2.x.x Apache series produces XML. Does anyone know?
http://httpd.apache.org/docs-2.0/mod/mod_autoindex.html#indexoptions says that it can (2.0.49 and later) if the admin configures it that way. (I haven't verified it, though.)
After analyzing the produced HTML for directory listings, I found that isolating the <HR> outside <PRE> is a trivial fix that helps keep the block element outside.
That is:
- <HR>
+ </PRE><HR><PRE>
Very simple. It would allow for violations of the type "inline container with excluded inline element within" (instead of "inline container with a block inside").
Does anyone know an apache developer? It seems that sending the patch through bugzilla could take years to be noticed...
It can be a bit hit and miss. If one of the core team is directly interested and not otherwise engaged, they can get things done quickly. Their most recent attempt not to lose patches seems to be described in http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=108012889313136&w=2 which suggests that bugzilla may be being watched closely.
OTOH, I don't like very much breaking our policies with code to handle bad HTML, and yes I know this is the most used web server. Maybe a dillorc option to enable a workaround for apache could do it, but I'd really like to talk with one of the developers.
Comments?
As others have said, any future change to apache won't change current servers for years to come. So rendering it usefully, if not ideally, would be prudent. And a big black mark in the bug meter is in order, too. Correctness vs usefulness... f -- Francis Daly francis@daoine.org
On Tue, Apr 27, 2004 at 12:19:51PM +0100, Francis Daly wrote:
On Mon, Apr 26, 2004 at 07:48:11PM -0400, Jorge Arellano Cid wrote:
On Thu, 22 Apr 2004, Jens Arm wrote:
Please test our current CVS hardly, and tell us how it did.
Directories like
http://fy.chalmers.se/~appro/linux/DVD+RW/tools/
or others are not rendered correctly
Yes. This is a complex problem.
Apache (the web server), generates bad HTML for directory listings (at least 1.3.x series):
<PRE> is an inline container excluding:
{IMG, OBJECT, APPLET, BIG, SMALL, SUB, SUP, FONT, BASEFONT}
Also, it can't hold block elements (e.g. HR).
I may be missing something with this. What I think I am reading is that this is an Apache issue as far as why the page isn't rendering well. I am running Thttp for a server. I copied the source and stuck it on my server. It then will not render well with Dillo either. I sense I am still missing part of this discussion but thought I would add my 2 cents anyways. *;o) -- Pete http://milneweb.com http://nomorevirus.com
On Tue, Apr 27, 2004 at 08:44:45AM -0600, Pete wrote:
On Tue, Apr 27, 2004 at 12:19:51PM +0100, Francis Daly wrote:
On Mon, Apr 26, 2004 at 07:48:11PM -0400, Jorge Arellano Cid wrote:
On Thu, 22 Apr 2004, Jens Arm wrote:
Directories like
http://fy.chalmers.se/~appro/linux/DVD+RW/tools/
or others are not rendered correctly
Yes. This is a complex problem.
Apache (the web server), generates bad HTML for directory listings (at least 1.3.x series):
I may be missing something with this. What I think I am reading is that this is an Apache issue as far as why the page isn't rendering well. I am running Thttp for a server. I copied the source and stuck it on my server. It then will not render well with Dillo either.
The fact is that the html is invalid, and therefore renders funny. Saving the html and serving it from another web server, or fetching it from a file, won't change that it is invalid. The reason it is an apache problem is that the html is generated by apache, not that it is served by apache.
I sense I am still missing part of this discussion but thought I would add my 2 cents anyways. *;o)
Does that fill in the missing bit? As it happens, thttpd (2.25b) has a very similar problem, in that its default directory listing, if enabled, looks like === <H2>Index of /</H2> <PRE> mode links bytes last-changed name <HR>dr-x 2 4096 Mar 23 12:23 <A HREF="/./">.</A>/ === which current dillo will also recognise as invalid, and render not as the server author presumably hoped. Cheers, f -- Francis Daly francis@daoine.org
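The point that the markup itself is invalid, whichever server delivers it, can be checked mechanically. Below is a rough sketch of such a check using Python's standard html.parser. It flags elements that the thread says <PRE> cannot contain. The exclusion set is copied from the list quoted above; the block-element set is a small illustrative subset chosen for this example (the thread itself only names HR), and the class and function names are made up. This is not how Dillo's own bug meter works.

```python
from html.parser import HTMLParser

# Inline elements excluded from <PRE>, as listed earlier in this thread.
PRE_EXCLUDED = {"img", "object", "applet", "big", "small",
                "sub", "sup", "font", "basefont"}
# Illustrative subset of block-level elements (assumption for this sketch).
BLOCK_ELEMENTS = {"hr", "p", "div", "table", "ul", "ol"}


class PreChecker(HTMLParser):
    """Collects (tag, reason) pairs for elements that are invalid inside <PRE>."""

    def __init__(self):
        super().__init__()
        self.pre_depth = 0   # > 0 while we are inside an open <PRE>
        self.problems = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names already lowercased.
        if tag == "pre":
            self.pre_depth += 1
        elif self.pre_depth:
            if tag in BLOCK_ELEMENTS:
                self.problems.append((tag, "block element inside <PRE>"))
            elif tag in PRE_EXCLUDED:
                self.problems.append((tag, "excluded inline element inside <PRE>"))

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth:
            self.pre_depth -= 1


def check_pre(html):
    """Return the list of <PRE> content violations found in html."""
    checker = PreChecker()
    checker.feed(html)
    checker.close()
    return checker.problems


if __name__ == "__main__":
    # The thttpd listing fragment quoted above (it is cut off before </PRE>).
    sample = ('<H2>Index of /</H2> <PRE> mode links bytes last-changed name '
              '<HR>dr-x 2 4096 Mar 23 12:23 <A HREF="/./">.</A>/')
    print(check_pre(sample))  # [('hr', 'block element inside <PRE>')]
```

Run against a saved copy of a listing, the result is the same whether the file came from Apache, thttpd, or the local disk, which is the distinction between "generated by" and "served by" made above.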
On Tue, Apr 27, 2004 at 04:11:19PM +0100, Francis Daly wrote:
I may be missing something with this. What I think I am reading is that this is an Apache issue as far as why the page isn't rendering well. I am running Thttp for a server. I copied the source and stuck it on my server. It then will not render well with Dillo either.
The fact is that the html is invalid, and therefore renders funny. Saving the html and serving it from another web server, or fetching it from a file, won't change that it is invalid.
The reason it is an apache problem is that the html is generated by apache, not that it is served by apache.
Ok, that is what I was thinking but wasn't sure. Seeing all these webservers serve up the html incorrectly, do you make them correct the issue or do you have Dillo also 'render incorrectly'? I know there are a lot of sites that are not rendered correctly with Dillo. Whether these issues would be the reason or not I am not educated enough to know. Thanks for the explanation though. -- Pete http://milneweb.com http://nomorevirus.com
Hi Francis, First of all, thanks for your research and well backed answer. Here I go: On Tue, 27 Apr 2004, Francis Daly wrote:
On Mon, Apr 26, 2004 at 07:48:11PM -0400, Jorge Arellano Cid wrote:
On Thu, 22 Apr 2004, Jens Arm wrote:
Please test our current CVS hardly, and tell us how it did.
Directories like
http://fy.chalmers.se/~appro/linux/DVD+RW/tools/
or others are not rendered correctly
Yes. This is a complex problem.
Apache (the web server), generates bad HTML for directory listings (at least 1.3.x series):
<PRE> is an inline container excluding:
{IMG, OBJECT, APPLET, BIG, SMALL, SUB, SUP, FONT, BASEFONT}
Also, it can't hold block elements (e.g. HR).
So there's a bug in having <HR> inside <PRE> and one bug per directory entry because <PRE> can't contain <IMG>.
<q from above URL> Yes, it's a known issue. In 2.0 and later, tables can be enabled by configuration. 1.3 will stick with the <pre> stuff, also because of backwards compat, sorry. </q>
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=13351 suggest that the apache developers know, and it's not going to change in the 1.3 series.
Well, they know the bad HTML issue and made a tables version for the directory listing (that was also faulty but that as of the bug entry should be corrected now).
http://nagoya.apache.org/bugzilla/show_bug.cgi?id=10880 suggests that the default won't change in the 2.0 series,
<q from above URL> Item 1 has been fixed in 2.1 and is proposed for backport. Item 2 ... ehm? what standard? html? agreed. ;-) As said you can use IndexOptions to fix that issues. For backwards compat we should stick with the current defaults. Item 3 doesn't appear in the current tree so I'm assuming it fixed already. </q>
The backwards compatibility is a point that was easy to foresee. Although the developer above doesn't seem to be much concerned about the standards, and this surprises me.
A simple solution would be to replace:
-<HR>
+</PRE><HR><PRE>
and replace the <IMG> tag with its alt text, and make _that_ the default in apache. They could provide an option to produce the old directory listing, and thus, leave the responsibility of deciding whether to serve bad HTML to every webmaster, but the default should be valid HTML (IMHO).
In brief, the webmaster would have:
- Good HTML by default (no images) based on <PRE>
- The option to serve bad HTML with images based on <PRE>
- The option to serve good HTML with images based on <TABLE>
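The second half of this suggestion, replacing each <IMG> with its alt text, could look roughly like the sketch below. It is a Python illustration on the generated text only; the function name img_to_alt is invented for this example, the sample listing line is illustrative, and the regexes assume the alt value contains no ">" character. It is not a patch for Apache's mod_autoindex.

```python
import re

# An <IMG> tag with its attributes (assumes no ">" inside attribute values).
IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
# The alt attribute: double-quoted, single-quoted, or unquoted value.
ALT_ATTR = re.compile(
    r"""\balt\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""",
    re.IGNORECASE,
)


def img_to_alt(html):
    """Replace every <IMG> tag with the text of its alt attribute.

    Hypothetical helper for illustration only; images without an alt
    attribute are simply removed.
    """

    def replace(match):
        alt = ALT_ATTR.search(match.group(0))
        if alt is None:
            return ""
        # Exactly one of the three capture groups matched; return it.
        return next(group for group in alt.groups() if group is not None)

    return IMG_TAG.sub(replace, html)


if __name__ == "__main__":
    line = '<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="tools/">tools/</A>'
    print(img_to_alt(line))  # [DIR] <A HREF="tools/">tools/</A>
```

Combined with moving <HR> outside <PRE>, this would give the first option in the list above: a <PRE>-based listing with no images and no content that <PRE> forbids.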
After analyzing the produced HTML for directory listings, I found that isolating the <HR> outside <PRE> is a trivial fix that helps keep the block element outside.
That is:
- <HR>
+ </PRE><HR><PRE>
Very simple. It would allow for violations of the type "inline container with excluded inline element within" (instead of "inline container with a block inside").
Does anyone know an apache developer? It seems that sending the patch through bugzilla could take years to be noticed...
It can be a bit hit and miss. If one of the core team is directly interested and not otherwise engaged, they can get things done quickly.
I got a bit discouraged to make the patch and submit it because it's a toilsome work that I don't know will have a warm reception ;) If you feel like doing it, please go ahead, but I'd suggest first to contact one of the developers and explain the idea before making a patch. It's reasonable, unless they have to support some scripts that rely on the directory listing source being served exactly the old way...
Their most recent attempt not to lose patches seems to be described in http://marc.theaimsgroup.com/?l=apache-httpd-dev&m=108012889313136&w=2 which suggests that bugzilla may be being watched closely.
This is good news. But the whole issue is more a "backwards compatibility" problem than "valid HTML". This is better to be asked in advance.
OTOH, I don't like very much breaking our policies with code to handle bad HTML, and yes I know this is the most used web server. Maybe a dillorc option to enable a workaround for apache could do it, but I'd really like to talk with one of the developers.
Comments?
As others have said, any future change to apache won't change current servers for years to come.
Sigh, I suppose this is right. Even under the pressure of a security-fix release...
So rendering it usefully, if not ideally, would be prudent. And a big black mark in the bug meter is in order, too.
Correctness vs usefulness...
Yes Francis, I think the same as you, but I'd appreciate if someone asks them if serving good HTML without the images for the directory listing is a good default (providing the alternatives described above). For those that read this far: the CVS contains the patch I made for this issue. Cheers Jorge.-
Jorge Arellano Cid wrote
Please test our current CVS hardly, and tell us how it did.
Hi, I have been using dillo CVS since a day or so after you announced the improved HTML bug detection and error recovery. It is certainly a great improvement over 0.8.0. No more occasional crashes (I notice the "aborts on HTML table inside <sup>" bug I reported is fixed, which is well cool). Slashdot renders OK now (though I suspect this is partially due to them cleaning up their cruddy HTML a little). Have there been any more improvements in CVS? If so I'll update and check them out! Kudos to the dillo project for a great tool! Regards, Jeremy Henty
On Fri, 23 Apr 2004, Jeremy Henty wrote:
Jorge Arellano Cid wrote
Please test our current CVS hardly, and tell us how it did.
Hi, I have been using dillo CVS since a day or so after you announced the improved HTML bug detection and error recovery. It is certainly a great improvement over 0.8.0. No more occasional crashes (I notice the "aborts on HTML table inside <sup>" bug I reported is fixed, which is well cool). Slashdot renders OK now (though I suspect this is partially due to them cleaning up their cruddy HTML a little).
Have there been any more improvements in CVS? If so I'll update and check them out!
Sure there are. You can check the change entries here: http://cvs.auriga.wearlab.de/cgi-bin/cvsweb.cgi/dillo/dillo/ChangeLog BTW, I still don't know how to recover the last commit-log entries from CVS (by date if possible). Any CVS wizard here?
Kudos to the dillo project for a great tool!
Thanks a lot. Cheers Jorge.-
[Fri, 23 Apr 2004 09:00:24 -0400 (CLT)] Jorge Arellano Cid <jcid@dillo.org> eut le bonheur d'écrire:
BTW, I still don't know how to recover the last commit-log entries from CVS (by date if possible). Any CVS wizard here?
I'm not too much of a CVS wizard, but do you mean something like cvs log -N -S -d ">= 2004-04-12 00:00:00"? Or do you wish to sort log entries by date across multiple files? Actually "-S" is in my cvs manpage but is refused by auriga.wearlab.de's cvs server. Quite a pain, as it should restrict display to files with modifications. Unless the server handles this option, you'll have to find the interesting parts by hand. Maybe grepping... Hoping it helps -- Nipo
On Fri, 23 Apr 2004, Nicolas Pouillon wrote:
[Fri, 23 Apr 2004 09:00:24 -0400 (CLT)] Jorge Arellano Cid <jcid@dillo.org> eut le bonheur d'écrire:
BTW, I still don't know how to recover the last commit-log entries from CVS (by date if possible). Any CVS wizard here?
I'm not too much of CVS wizard, but do you mean something like cvs log -N -S -d ">= 2004-04-12 00:00:00" ?
Yes, but that's too large. Based on it, I'd recommend: cvs log -N -d ">= 2004-04-21 00:00:00" ChangeLog
Or do you wish to sort log entries by date across multiple files?
I was thinking of a way to recover the last commit-log entries by date, and the respective diff file. It's not important though, I can do it very well using diff between the CVS and my tree.
Hoping it helps
Sure, thanks Jorge.-
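For anyone who wants to script the command recommended above, a small sketch follows. The cvs log arguments are the ones from this thread ("-S" is left out because the server refuses it). The cvs diff helper is an addition for this example: to my knowledge "cvs diff -D <date>" compares the working files with the repository state as of that date, but treat that part, and all function names, as assumptions. Running either command requires a checked-out dillo tree.

```python
import subprocess


def cvs_log_since(since, path="ChangeLog"):
    """Build the `cvs log` command from the thread for entries on or after `since`."""
    return ["cvs", "log", "-N", "-d", ">= " + since, path]


def cvs_diff_since(since):
    """Build a unified-diff command against the tree as of `since` (assumed usage)."""
    return ["cvs", "diff", "-u", "-D", since]


def run(command):
    """Run a command inside a CVS working copy and return its standard output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the command line; call run(...) from inside a checkout.
    print(" ".join(cvs_log_since("2004-04-21 00:00:00")))
```

Note that cvs diff exits with a non-zero status when differences exist, so run() with check=True would raise for it; a real script would need to handle that case.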
On Fri, 23 Apr 2004 15:47:18 +0200 Nicolas Pouillon <nipo@ssji.net> wrote:
Actually "-S" is in my cvs manpage but is refused by auriga.wearlab.de's cvs server.. Quite a pain as it should restrict display to files with modifications. Unless server handles this option, you'll have to find out the interesting parts by hand. Maybe grepping...
That option is afaik only available in more recent versions of cvs. Greetings Andreas Kemnade
Hi,
That is, the new parser (going to lots of sites and asserting there're no crashes), and the recently found problems with puppy linux and downloading and FTP (has anyone experienced trouble with these dillo plugins on their systems?).
New renderer seems good. Improvements on some sites. No crashes or anything like that. The FTP plugin seems to be working too, for directories. It doesn't seem to handle files though, it gives me "Illegal Seek" errors. A simple example is ftp.sf.net, and then clicking on index-sf.html or robots.txt. One question... if I go to an invalid ftp URL, it doesn't say in the statusbar "not a directory" or whatever, it only says so in the console, is this correct behaviour? Thanks all, Dan -- This message has been scanned for viruses and dangerous content by MailScanner, and is believed to be clean.
Sorry, forgot to mention platform, Latest CVS (11:30am - Fri 23 Apr 2004), x86 Gentoo Linux System, gcc version 3.3.3 20040217 compiled with CFLAGS='-pipe -O2 -mcpu=pentium2 -march=pentium2 -mmmx -fomit-frame-pointer' Dan
participants (10)
- Andreas Kemnade
- Daniel Fairhead
- Francis Daly
- Jens Arm
- Jeremy Henty
- Jorge Arellano Cid
- Kelson Vibber
- Nicolas Pouillon
- Pete
- Riccardo Mottola