On Sun, Oct 18, 2009 at 11:35:43AM +0100, Jeremy Henty wrote:
As I understand it, the purpose of the w3c_plus_heuristics preference is to control whether Dillo will force close some tags when it sees an unmatched close tag. But it seems to have no effect! If you try this HTML:
<html> <head><title></title></head> <body> <p> <table><tr><td><s></table> </body> </html>
then the <s> tag is closed no matter what w3c_plus_heuristics is!
The bug is in Html_tag_cleanup_at_close() in html.cc, in this code:
/* Look for the candidate tag to close */ stack_idx = html->stack->size() - 1; while (stack_idx && (cmp = (new_idx != html->stack->getRef(stack_idx)->tag_idx)) && ((w3c_mode && Tags[html->stack->getRef(stack_idx)->tag_idx].EndTag == 'O') || ((!w3c_mode && (Tags[html->stack->getRef(stack_idx)->tag_idx].EndTag == 'O')) || Tags[html->stack->getRef(stack_idx)->tag_idx].TagLevel < Tags[new_idx].TagLevel))) { --stack_idx; }
The indentation makes it appear that the comparison between the TagLevel fields is bracketed with !w3c_mode but both the indentation and the bracketing are wrong. In fact this code is equivalent to:
/* Look for the candidate tag to close */ stack_idx = html->stack->size() - 1; while (stack_idx && (cmp = (new_idx != html->stack->getRef(stack_idx)->tag_idx)) && ((w3c_mode && Tags[html->stack->getRef(stack_idx)->tag_idx].EndTag == 'O') || (!w3c_mode && (Tags[html->stack->getRef(stack_idx)->tag_idx].EndTag == 'O')) || Tags[html->stack->getRef(stack_idx)->tag_idx].TagLevel < Tags[new_idx].TagLevel)) { --stack_idx; }
which is in turn equivalent to
/* Look for the candidate tag to close */ stack_idx = html->stack->size() - 1; while (stack_idx && (cmp = (new_idx != html->stack->getRef(stack_idx)->tag_idx)) && (Tags[html->stack->getRef(stack_idx)->tag_idx].EndTag == 'O' || Tags[html->stack->getRef(stack_idx)->tag_idx].TagLevel < Tags[new_idx].TagLevel)) { --stack_idx; }
so that w3c_mode has no effect. In other words w3c_plus_heuristics is effectively always YES, even if dillorc says otherwise.
I have attached a patch that implements (what I *think* is) the correct behaviour.
Committed! Thanks Jeremy, that part of the code had always been tricky and the patch looks like a big boolean algebra simplification that makes it much easier to undersatand. -- Cheers Jorge.-