[Dillo-dev] Quoted attribute parsing: summary

Aug. 16, 2010

Prompted by some private conversation with corvid, I've been digging
through specs and source code to see what the state of play is.

The HTML5 specification[1] states that the user agent should consume
text, converting character references, until it finds the matching
close quote. If there is no matching close quote (i.e. it sees an EOF
first), then it stops and the unfinished tag is dropped (strictly
speaking, it switches to the data state and reconsumes the EOF, which
makes it emit an EOF token).
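As a rough illustration of that rule, here is a simplified Python model (not code from the spec or from any browser). The function name `scan_quoted_value` is hypothetical. It scans for the literal close quote rather than stepping through tokenizer states one character at a time, and it uses the standard library's `html.unescape` for character references, which applies the general HTML5 rules rather than the attribute-specific quirks for legacy references lacking a trailing `;`.

```python
import html


def scan_quoted_value(text, pos):
    """Scan a quoted attribute value whose open quote is text[pos].

    Returns (value, next_pos) when the matching close quote is found.
    Returns (None, len(text)) when EOF comes first: the HTML5 tokenizer
    then drops the unfinished tag and emits an EOF token.
    """
    quote = text[pos]
    if quote not in "\"'":
        raise ValueError("text[pos] must be a quote character")
    close = text.find(quote, pos + 1)
    if close == -1:
        # No matching close quote before EOF: no attribute value at all.
        return None, len(text)
    # Everything up to the close quote is accepted, '>' and '<' included.
    # html.unescape converts character references using the general HTML5
    # rules; the attribute-specific handling of legacy references without
    # a trailing ';' is not modelled here.
    return html.unescape(text[pos + 1:close]), close + 1


print(scan_quoted_value('"a&amp;b" x', 0))   # ('a&b', 9)
print(scan_quoted_value('"a>b" c', 0))       # ('a>b', 5)
print(scan_quoted_value('"no close', 0))     # (None, 9)
```

Note that a `>` inside the quotes does not end the value; only the matching quote or EOF does.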

Taking out Dillo's bogus attribute value detection as I proposed would
make Dillo parse quoted attribute values as per the HTML5 spec.

The Hubbub HTML parser library[2] parses quoted attribute values as
per the HTML5 spec.

Firefox parses quoted attribute values as per the HTML5 spec *except*
that if it sees an EOF, it backs up to the open quote, discards it,
and then reparses as though it were expecting an unquoted attribute
value. Otherwise (i.e. if it finds the matching close quote) it makes
no attempt to detect a broken attribute value, no matter what content
the attribute value has swallowed up.
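A sketch of the Firefox behaviour described above, again in Python and under stated assumptions: this models only the observable behaviour, not Firefox's actual tokenizer code, and the function names are hypothetical. The unquoted fallback ends the value at whitespace (tab, LF, FF, space) or `>`, as in the HTML5 unquoted-attribute-value state; character references go through `html.unescape`, which is a simplification.

```python
import html
import re

# In the HTML5 unquoted-attribute-value state, the value ends at
# whitespace (tab, LF, FF, space) or '>'.
_UNQUOTED_END = re.compile(r"[\t\n\f >]")


def scan_unquoted_value(text, pos):
    """Scan an unquoted attribute value starting at text[pos]."""
    m = _UNQUOTED_END.search(text, pos)
    end = m.start() if m else len(text)
    return html.unescape(text[pos:end]), end


def scan_value_with_eof_fallback(text, pos):
    """Quoted-value scan with the EOF fallback described for Firefox.

    text[pos] is the open quote. If a matching close quote exists, the
    value is everything before it, however much markup that swallows.
    Otherwise the open quote is discarded and the text after it is
    reparsed as an unquoted value.
    """
    quote = text[pos]
    close = text.find(quote, pos + 1)
    if close != -1:
        return html.unescape(text[pos + 1:close]), close + 1
    # EOF before the close quote: back up, skip the quote, reparse.
    return scan_unquoted_value(text, pos + 1)


# A missing close quote before EOF: falls back to an unquoted value.
print(scan_value_with_eof_fallback('"foo bar>', 0))   # ('foo', 4)
# A later quote in the document "closes" the value instead.
print(scan_value_with_eof_fallback('"x>text</a><b c="y">', 0))
# ('x>text</a><b c=', 17)
```

The second call shows the point made above: once a matching quote turns up anywhere later, the swallowed markup is simply accepted as the value.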

So it seems that the world at large has given up on trying to detect
and correct broken attribute values.

Jeremy Henty

[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/
[2] http://www.netsurf-browser.org/projects/hubbub/


onepoint＠starurchin.org