Hello,
I've tried out dillo-0.8.3-rc1 for a few days on a Debian GNU/Linux 3.0 "woody"/i386 system.
It compiled and installed cleanly, and I did not encounter any regressions against 0.8.2 yet. The parser changes are a real improvement -- several formerly problematic sites I've tried work now, and the "Detected HTML errors" view produces much more meaningful results.
Two small problems I've encountered:
- It seems that, if a pair of <a href="..."> </a> tags contains something illegal, like <div>, then the <a> tags are ignored, rather than the illegal stuff inside. That's suboptimal.
  As an example, look at http://www.kmelektronik.de/ (admittedly a very broken site, standards-wise). The text fragments "Light-Version Versand" and "Light-Version Shop" near the page bottom should really be links. If this is hard to fix, then don't bother.
- This one is not new, but I fixed it a while ago for myself and then forgot about it: the "This page uses the NON-STANDARD meta refresh tag..." warning usually occurs inside <head>, but is immediately sent to the parser, which in turn switches its HTML processing state from IN_HEAD to IN_BODY prematurely. After that, head-only tags like <title> and <base> are ignored. A minimal fix is at the end of the mail.
Best regards,
Björn Brill
-- 
Bj"orn Brill <brill@fs.math.uni-frankfurt.de>
Frankfurt am Main, Germany

--- dillo-0.8.3-rc1/src/html.c.orig	Tue Sep 21 17:42:12 2004
+++ dillo-0.8.3-rc1/src/html.c	Tue Oct 5 19:21:42 2004
@@ -3101,7 +3101,11 @@ static void Html_tag_open_meta(DilloHtml
 
       /* Send a custom HTML message */
       html_msg = g_strdup_printf(meta_template, content, delay_str);
-      Html_write_raw(html, html_msg, strlen(html_msg), 0);
+      {
+         DilloHtmlProcessingState SaveFlags = html->InFlags;
+         Html_write_raw(html, html_msg, strlen(html_msg), 0);
+         html->InFlags = SaveFlags;
+      }
       g_free(html_msg);
    }
 }
On Wed, Oct 06, 2004 at 03:47:32PM +0200, Bjoern Brill wrote:
Hello,
Hi Björn.
I've tried out dillo-0.8.3-rc1 for a few days on a Debian GNU/Linux 3.0 "woody"/i386 system.
It compiled and installed cleanly, and I did not encounter any regressions against 0.8.2 yet. The parser changes are a real improvement -- several formerly problematic sites I've tried work now, and the "Detected HTML errors" view produces much more meaningful results.
Thanks for the good report!
Two small problems I've encountered:
- It seems that, if a pair of <a href="..."> </a> tags contains something illegal, like <div>, then the <a> tags are ignored, rather than the illegal stuff inside. That's suboptimal.
Can you elaborate on "suboptimal"?
The reason why <a> is closed is that INLINE elements can't contain BLOCK elements, so when a BLOCK element such as <div> is opened, any INLINE elements left open are closed first. This cleanup has proven very healthy.
From the SPEC:
<q> The DIV and SPAN elements, in conjunction with the id and class attributes, offer a generic mechanism for adding structure to documents. These elements define content to be inline (SPAN) or block-level (DIV) but impose no other presentational idioms on the content. Thus, authors may use these elements in conjunction with style sheets, the lang attribute, etc., to tailor HTML to their own needs and tastes. </q>
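The cleanup rule described above can be sketched as a stack of open elements: opening a block element first pops every inline element left open on top of the stack. This is an illustrative sketch, not dillo's parser; the OpenStack type, the helper names and the (partial) inline tag list are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

/* Illustrative sketch, not dillo's code: a stack of open element names. */
#define MAX_DEPTH 64

typedef struct {
   const char *tag[MAX_DEPTH];
   int depth;
} OpenStack;

/* A subset of the INLINE elements, enough for the example. */
static const char *const inline_tags[] = {
   "tt", "i", "b", "u", "s", "strike", "big", "small", "font", "em",
   "strong", "dfn", "code", "samp", "kbd", "var", "cite", "abbr",
   "acronym", "sub", "sup", "q", "span", "bdo", "a", NULL
};

static int is_inline(const char *tag)
{
   for (int i = 0; inline_tags[i]; i++)
      if (strcmp(tag, inline_tags[i]) == 0)
         return 1;
   return 0;
}

static void push(OpenStack *s, const char *tag)
{
   assert(s->depth < MAX_DEPTH);
   s->tag[s->depth++] = tag;
}

/* Opening a block element (e.g. <div>) first pops every inline element
 * left open on top of the stack. Returns the number of elements closed. */
static int open_block(OpenStack *s, const char *tag)
{
   int closed = 0;
   while (s->depth > 0 && is_inline(s->tag[s->depth - 1])) {
      s->depth--;
      closed++;
   }
   push(s, tag);
   return closed;
}
```

With <body><a> open, open_block(&s, "div") closes the <a> before the <div> starts, which is exactly how <a><div>click here</div></a> ends up as an empty link followed by the div.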
As an example, look at http://www.kmelektronik.de/ (admittedly a very broken site, standards-wise). The text fragments "Light-Version Versand" and "Light-Version Shop" near the page bottom should really be links. If this is hard to fix, then don't bother.
Yes. The only other site I've found is www.lynucs.org, which I expect will correct the problem when told.
I tried a small hack, but it has the side effect of not cleaning up any INLINE element upon <div> openings. This solves the problem with the above-mentioned pages, but may create bigger problems than the ones it solves. For instance, INLINEs include:
TT, I, B, U, S, STRIKE, BIG, SMALL, FONT, EM, STRONG, DFN, CODE, SAMP, KBD, VAR, CITE, ABBR, ACRONYM, SUB, SUP, Q, SPAN, BDO, A, OBJECT, APPLET, PARAM, IMG, BASEFONT, BR, SCRIPT, MAP, AREA, INPUT, SELECT, OPTGROUP, TEXTAREA, LABEL, BUTTON
Not having them closed (cleaned up) upon <div> opening makes me shudder...
Maybe a good solution is to only allow an exception when <a> directly precedes the <div>. This would be much safer.
Of course it wouldn't work with <a ...><b><div> </div></b></a>.
Now, considering the small number of sites doing this, it may be overkill. Please share your thoughts.
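The proposed exception (keep <a> open only when it sits directly on top of the open-element stack) might look like the following. This is a hypothetical sketch of the idea, not the workaround that went into CVS; the function names and the short inline list are assumptions. It also shows why <a><b><div> is not covered: the <b> on top triggers the normal cleanup, which then closes the <a> as well.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the proposed exception, not dillo's code:
 * when a block element opens and the element directly on top of the
 * open-element stack is <a>, keep the <a> open instead of closing it.
 * Any other inline element on top still triggers the normal cleanup. */
static int is_inline(const char *tag)
{
   static const char *const inl[] = { "a", "b", "i", "em", "span", "font", NULL };
   for (int i = 0; inl[i]; i++)
      if (strcmp(tag, inl[i]) == 0)
         return 1;
   return 0;
}

/* stack[0..*depth-1] holds open tag names; returns 1 if an <a> was kept. */
static int open_block_with_a_exception(const char **stack, int *depth,
                                       int max, const char *tag)
{
   int a_kept = 0;
   if (*depth > 0 && strcmp(stack[*depth - 1], "a") == 0) {
      a_kept = 1;                    /* exception: <a> directly precedes */
   } else {
      while (*depth > 0 && is_inline(stack[*depth - 1]))
         (*depth)--;                 /* normal cleanup of open inlines */
   }
   assert(*depth < max);
   stack[(*depth)++] = tag;
   return a_kept;
}
```

The check only looks at the top of the stack, which keeps it cheap and limits the change in behaviour to the <a><div> pattern.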
- This one is not new, but I fixed it a while ago for myself and then forgot about it: the "This page uses the NON-STANDARD meta refresh tag..." warning usually occurs inside <head>, but is immediately sent to the parser, which in turn switches its HTML processing state from IN_HEAD to IN_BODY prematurely. After that, head-only tags like <title> and <base> are ignored. A minimal fix is at the end of the mail.
Thanks. Most probably it will make its way into rc2.
-- 
Cheers
Jorge.-
On Thu, 7 Oct 2004, Jorge Arellano Cid wrote:
On Wed, Oct 06, 2004 at 03:47:32PM +0200, Bjoern Brill wrote: [...]
Two small problems I've encountered:
- It seems that, if a pair of <a href="..."> </a> tags contains something illegal, like <div>, then the <a> tags are ignored, rather than the illegal stuff inside. That's suboptimal.
Can you elaborate on "suboptimal"?
If a heuristic tries to repair broken stuff, then the repair method with the most likely usable result is optimal. As things are now, <a><div>click here</div></a> ends up as <a></a><div>click here</div>, i.e. the link is empty and thus inaccessible.
The reason why <a> is closed is that INLINE elements can't contain BLOCK elements, so any inline elements left open are closed. This cleanup has proven very healthy.
[...] Yes, I fully agree here. My point is just that, since no visible trace of <a> is left, and links are a rather important part of page contents, another cleanup method may be better in this case.
As an example, look at http://www.kmelektronik.de/ (admittedly a very broken site, standards-wise). The text fragments "Light-Version Versand" and "Light-Version Shop" near the page bottom should really be links. If this is hard to fix, then don't bother.
Yes. The only other site I've found is www.lynucs.org which I expect to correct the problem when told.
I tried a small hack, but it has the side effect of not cleaning up any INLINE element upon <div> openings. This solves the problem with the above-mentioned pages, but may create bigger problems than the ones it solves.
[...]
Maybe a good solution is to only allow an exception when <a> precedes the <div>. This would be much safer.
Of course it wouldn't work with <a ...><b><div> </div></b></a>.
Now, considering the small number of sites doing this, it may be overkill. Please share your thoughts.
What about this: before forcibly closing <a> (no matter why), render some text like "[...]" or "[HTML bug]" to the page, so that the user can see what's going on?
Special-casing <a><div> would likely be overkill. Special-casing <a><block_element> would not be, as long as the workaround is easy, simple and reasonably safe. If such a workaround does not exist, the issue isn't really worth more effort.
Regards,
Björn
-- 
Bj"orn Brill <brill@fs.math.uni-frankfurt.de>
Frankfurt am Main, Germany
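Björn's suggestion of a visible marker could be hooked into the cleanup step that forcibly closes elements. The sketch below is hypothetical: the Page buffer, page_append(), force_close() and the BUG_MARKER text are illustrative names, not dillo functions, and real rendering is reduced to appending to a string.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch of the suggestion above, not dillo's code: when the
 * cleanup forcibly closes an open <a>, append a visible marker to the page
 * text first, so the reader can tell that a link was cut short. */
#define BUG_MARKER "[HTML bug]"

typedef struct {
   char text[256];        /* rendered page text (simplified) */
} Page;

static void page_append(Page *pg, const char *s)
{
   size_t used = strlen(pg->text);
   assert(used + strlen(s) < sizeof pg->text);
   strcat(pg->text, s);
}

/* Called for each element the cleanup closes because it was left open. */
static void force_close(Page *pg, const char *tag)
{
   if (strcmp(tag, "a") == 0)
      page_append(pg, BUG_MARKER);
   /* ...the element itself would be closed here... */
}
```

Only <a> gets a marker in this sketch, since a silently dropped link is the case the thread is concerned with; other inline elements are closed without a trace.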
Björn,
I committed a commented-out workaround to CVS (along with an enhancement to the color-choosing algorithm for visited links). If you want to test it, go to html.c:4394 and uncomment the code there.
-- 
Cheers
Jorge.-
On Wed, Oct 06, 2004 at 03:47:32PM +0200, Bjoern Brill wrote:
Hello,
[...] - This one is not new, but I fixed it a while ago for myself and then forgot about it: the "This page uses the NON-STANDARD meta refresh tag..." warning usually occurs inside <head>, but is immediately sent to the parser, which in turn switches its HTML processing state from IN_HEAD to IN_BODY prematurely. After that, head-only tags like <title> and <base> are ignored. A minimal fix is at the end of the mail.
Done.
-- 
Cheers
Jorge.-
Participants (2): Bjoern Brill, Jorge Arellano Cid