[Dillo-dev] Quoted attribute parsing: summary

Aug. 18, 2010

Johannes Hofmann wrote:
> ...
> But for me firefox 3.6.3 shows something given the following HTML
> (as does current dillo):
>
>     <div title="foo >hello world</div>dillo is great

That's Firefox's workaround that I described in my original post: if
it sees EOF while parsing a quoted attribute value (i.e. if it *never*
sees a matching quote) then it goes back to the opening quote,
discards it, and parses an unquoted attribute value.  So it ends up
parsing your example exactly as it would parse

    <div title=foo >hello world</div>dillo is great

which gives the same result as vanilla Dillo, but for entirely
different reasons.
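
To make the fallback concrete, here is a minimal Python sketch of the
behaviour as described above.  It is a simplified model, not Firefox's
(or Dillo's) actual code: the function name parse_attr_value and the
rule that an unquoted value ends at whitespace or '>' are assumptions
made for illustration only.

```python
def parse_attr_value(text, pos):
    """Parse an attribute value starting at text[pos].

    Returns (value, next_pos).  Hypothetical helper: a simplified model
    of the EOF fallback described in this post, not real browser code.
    """
    quote = text[pos] if pos < len(text) and text[pos] in "\"'" else None
    if quote is not None:
        end = text.find(quote, pos + 1)
        if end != -1:
            # Matching quote found: the value is everything in between.
            return text[pos + 1:end], end + 1
        # EOF reached without a matching quote: discard the opening
        # quote and re-parse from there as an unquoted value.
        pos += 1
    # Unquoted value (assumed here to end at whitespace or '>').
    end = pos
    while end < len(text) and not text[end].isspace() and text[end] != ">":
        end += 1
    return text[pos:end], end


broken = '<div title="foo >hello world</div>dillo is great'
plain = '<div title=foo >hello world</div>dillo is great'

# Both inputs yield the same value and leave the same text unparsed.
value_b, next_b = parse_attr_value(broken, broken.index("=") + 1)
value_p, next_p = parse_attr_value(plain, plain.index("=") + 1)
```

With no closing quote anywhere in the input, the two calls agree, so
the text after the tag is still rendered.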

But Firefox only does that if it can't find the matching quote at all;
if you feed it

    <div title="foo >hello world</div>dillo is great [... repeat
    'dillo is great' 10000 times ...]</div><div title="bar">

then it matches the second double quote with the first and *all* the
text disappears.  Which is exactly what HTML5 says it should do.  Of
course vanilla Dillo does *better* than Firefox for this example, but
in the real world I think it does *worse*.  JavaScript fragments that
confound Dillo's algorithm are far more common than examples such as
the above that it handles well.
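
As a rough illustration of why the text disappears, the following
Python sketch builds an input along the lines of the one above and
looks for the quote that closes the first attribute value.  The helper
quoted_value is hypothetical and only models "scan to the next
matching quote"; it is not taken from any browser.

```python
def quoted_value(text, pos):
    """Return the quoted value opening at text[pos], or None if the
    matching quote never appears.  Hypothetical helper for illustration."""
    quote = text[pos]
    end = text.find(quote, pos + 1)
    return None if end == -1 else text[pos + 1:end]


html = ('<div title="foo >hello world</div>'
        + "dillo is great " * 10000
        + '</div><div title="bar">')

# The first quote pairs with the quote that opens title="bar", so the
# attribute value swallows everything in between.
value = quoted_value(html, html.index('"'))
```

Here every copy of 'dillo is great' ends up inside the title attribute
rather than in the rendered text.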

OK, here's a new proposal: when parsing quoted attribute values, let's
copy Firefox!  That would: (a) sensibly handle the missing-quote
examples that people have suggested (which my proposed patch does not
do), (b) handle well-formed JavaScript fragments correctly (which
vanilla Dillo does not do), (c) parse well-formed HTML5 as per the
HTML5 specification, (d) conform to Firefox's established practice,
and (e) not break Reddit!  That's 5 wins!

It's true that we can't expect people to fix their HTML just because
the HTML5 specification says it's broken.  And it's even less likely
that they will fix it just because it breaks in Dillo.  But it is very
likely that they will fix it if it breaks in Firefox, so copying
Firefox is a good idea, even if you don't care about the HTML5
specification.

And why should we care about edge cases that vanilla Dillo handles
better than Firefox, since those are precisely the cases that people
will fix to keep their Firefox users happy and that we can therefore
expect *not* to see?  There's no point in having an algorithm that in
theory is better than Firefox's, because in practice it's not.

So, why not just copy Firefox?  I can't see any downside.

Regards,

Jeremy Henty

onepoint＠starurchin.org