On Sun, Aug 01, 2010 at 12:56:23PM +0100, Jeremy Henty wrote:
> If you point dillo at http://www.reddit.com/ you will see many javascript fragments. These come from div elements with an onclick="$(this).vote('<$>votehash</$>', null, event)" attribute. Dillo thinks that the '>' looks suspiciously like an end tag character. It looks ahead, sees the '<' character and decides that its suspicions are correct, so it terminates the tag at the '>', warns of a missing attribute close quote and renders the rest as HTML.
>
> This algorithm may have been effective in the early days of the web but I think it clearly does not work well today. I can't think of a better way of doing the job so I suggest removing it and instead just finding the matching close quote in all cases. Patch attached.
Indeed, this is a problem. Unfortunately, the final-quote approach creates more trouble than it seems; see doc/HtmlParser.txt for an explanation. A simple heuristic that handles both cases equally well would certainly be appreciated. For instance, if there's a missing quote (according to the current heuristics) and there's an unmatched '(' in the attribute value, we could look for a matching ')' in the vicinity and, if found, look for the closing quote after it. If that is found too, we can reverse the first decision. This approach has a low testing cost (O(n), and only in the missing-quote case), and may help discriminate between the two cases: a real missing quote and a scripting-language fragment.

--
Cheers
Jorge.-
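
For concreteness, here is a minimal sketch in C of the paren-aware check suggested above. It is not taken from the dillo sources: AttrScan, open_parens(), scan_attr_value() and the vicinity parameter are all hypothetical names, and the '>'/'<' lookahead is only modelled on the description earlier in this thread, so the real parser may well differ in detail.

```c
#include <stddef.h>

/* Hypothetical result type for this sketch; not taken from the dillo sources. */
typedef struct {
    size_t end;         /* index of the closing quote, or of the '>' that ends
                           the tag; len if neither was found */
    int missing_quote;  /* 1 if the value was judged to lack its closing quote */
} AttrScan;

/* Number of '(' still open at the end of s[from, to). */
static int open_parens(const char *s, size_t from, size_t to)
{
    int depth = 0;
    for (size_t i = from; i < to; i++) {
        if (s[i] == '(')
            depth++;
        else if (s[i] == ')' && depth > 0)
            depth--;
    }
    return depth;
}

/* s[open] must be the opening quote of the attribute value.
 * vicinity (an assumed tuning knob) bounds how far past a suspicious '>'
 * we are willing to search for the matching ')' and the closing quote. */
AttrScan scan_attr_value(const char *s, size_t len, size_t open, size_t vicinity)
{
    const char quote = s[open];
    AttrScan r = { len, 1 };

    for (size_t i = open + 1; i < len; i++) {
        if (s[i] == quote) {            /* ordinary case: closing quote found */
            r.end = i;
            r.missing_quote = 0;
            return r;
        }
        if (s[i] != '>')
            continue;

        /* Suspicious '>': does a '<' show up before the next quote? */
        size_t j = i + 1;
        while (j < len && s[j] != '<' && s[j] != quote)
            j++;
        if (j >= len || s[j] != '<')
            continue;                   /* no: keep scanning for the quote */

        /* Missing quote suspected. Second opinion: if a '(' is still open,
         * look nearby for its ')' and then for the closing quote. */
        int depth = open_parens(s, open + 1, i);
        if (depth > 0) {
            size_t limit = (len - (i + 1) > vicinity) ? i + 1 + vicinity : len;
            for (size_t k = i + 1; k < limit; k++) {
                if (s[k] == quote) {
                    if (depth == 0) {   /* parens balanced, quote follows */
                        r.end = k;
                        r.missing_quote = 0;
                        return r;
                    }
                    break;              /* quote before ')': keep first decision */
                }
                if (s[k] == '(')
                    depth++;
                else if (s[k] == ')' && depth > 0)
                    depth--;
            }
        }
        r.end = i;                      /* end the tag at the '>' */
        return r;
    }
    return r;
}
```

With a vicinity of, say, 64 characters, the reddit onclick value is kept whole, while something like href="foo.html>link</a> is still cut at the '>'. The extra scan only runs once a missing quote is already suspected, so the common path is unchanged.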