Johannes Hofmann wrote:
But for me firefox 3.6.3 shows something given the following HTML (as does current dillo):
<div title="foo >hello world</div>dillo is great
That's Firefox's workaround that I described in my original post: if it sees EOF while parsing a quoted attribute value (i.e. if it *never* sees a matching quote) then it goes back to the opening quote, discards it, and parses an unquoted attribute value. So it ends up parsing your example exactly as it would parse

    <div title=foo >hello world</div>dillo is great

which gives the same result as vanilla Dillo, but for entirely different reasons. But Firefox only does that if it can't find the matching quote at all; if you feed it

    <div title="foo >hello world</div>dillo is great [... repeat 'dillo is great' 10000 times ...]</div><div title="bar">

then it matches the second double quote with the first and *all* the text disappears. Which is exactly what HTML5 says it should do.

Of course vanilla Dillo does *better* than Firefox for this example, but in the real world I think it does *worse*. JavaScript fragments that confound Dillo's algorithm are far more common than examples such as the above that it handles well.

OK, here's a new proposal: when parsing quoted attribute values, let's copy Firefox! That would:

(a) sensibly handle the missing quotes examples that people have suggested (which my proposed patch does not do),
(b) handle well-formed JavaScript fragments correctly (which vanilla Dillo does not do),
(c) parse well-formed HTML5 as per the HTML5 specification,
(d) conform to Firefox's established practice, and
(e) not break Reddit!

That's 5 wins!

It's true that we can't expect people to fix their HTML just because the HTML5 specification says it's broken. And it's even less likely that they will fix it just because it breaks in Dillo. But it is very likely that they will fix it if it breaks in Firefox, so copying Firefox is a good idea, even if you don't care about the HTML5 specification.
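To make the proposal concrete, here is a rough sketch of the Firefox behaviour as I described it earlier in this message: look for the matching quote, and only if EOF arrives first, drop the opening quote and reparse as an unquoted value. This is an illustration only, not code from Firefox or Dillo; the function name parse_attr_value and the choice of whitespace or '>' as the terminators of an unquoted value are my assumptions for the sketch.

```python
# Illustrative sketch only: parse_attr_value is a hypothetical helper,
# not a function from Dillo or Firefox.
WHITESPACE = " \t\n\r\f"


def parse_attr_value(src, pos):
    """Parse the attribute value starting at src[pos] (just after '=').

    Returns (value, next_pos), where next_pos is the index at which
    parsing of the rest of the tag would resume.
    """
    if pos < len(src) and src[pos] in "\"'":
        quote = src[pos]
        end = src.find(quote, pos + 1)
        if end != -1:
            # Matching quote found, however far away it is.
            return src[pos + 1:end], end + 1
        # EOF before a matching quote: discard the opening quote and
        # fall through to the unquoted case.
        pos += 1
    # Unquoted value: assumed to run up to whitespace or '>'.
    end = pos
    while end < len(src) and src[end] not in WHITESPACE + ">":
        end += 1
    return src[pos:end], end


# No closing quote anywhere: the value falls back to the unquoted "foo".
unterminated = '<div title="foo >hello world</div>dillo is great'
value, _ = parse_attr_value(unterminated, unterminated.index("=") + 1)
print(repr(value))

# A later quote exists: everything up to it is swallowed into the value.
swallowed = unterminated + '</div><div title="bar">'
value, _ = parse_attr_value(swallowed, swallowed.index("=") + 1)
print(repr(value))
```

The first call yields just foo, so the text after the tag still renders; the second swallows everything up to the quote that opens "bar", which is the case where all the text disappears.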
And why should we care about edge cases that vanilla Dillo handles better than Firefox, since those are precisely the cases that people will fix to keep their Firefox users happy and that we can therefore expect *not* to see? There's no point in having an algorithm that in theory is better than Firefox's, because in practice it's not.

So, why not just copy Firefox? I can't see any downside.

Regards,

Jeremy Henty