On Thu, Jun 07, 2012 at 04:45:42PM +0400, 123 wrote:
On Thu, Jun 07, 2012 at 12:14:06AM +0100, Jeremy Henty wrote:
123 wrote:
On Sun, Jun 03, 2012 at 09:00:55PM +0100, Jeremy Henty wrote:
If you don't detect and ignore that extra double quote you will break many pages that every other browser renders perfectly well.
Then there should be some logic for detecting double quotes.
I agree. The hard question is: what logic?
IMO the right way to do it is to implement the HTML Standard [1]. When Html_write_raw is called, the parser is in the Data state. When it returns to the Data state again, that is the end of the token.
I have implemented standard comment/DOCTYPE parsing. It is incomplete: EOF is not handled and DOCTYPE parsing is not changed. The patch is attached. The next step is to rewrite tag parsing in the standard way.
[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.htm...
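To make the "end of token is where the parser returns to the Data state" idea concrete, here is a rough sketch in Python. It is not the attached patch and not Dillo code. The function name tag_end and the State enum are made up for illustration, and the state set is a cut-down version of the tag states in the standard (no comments, DOCTYPE, self-closing handling, character references or EOF). The point it tries to show is that a quote only opens an attribute value right after "=", so a stray extra double quote is absorbed into a bogus attribute name and the next ">" still ends the tag.

```python
from enum import Enum, auto


class State(Enum):
    # Simplified subset of the tag-related tokenizer states (illustrative only).
    TAG_NAME = auto()
    BEFORE_ATTR_NAME = auto()
    ATTR_NAME = auto()
    AFTER_ATTR_NAME = auto()
    BEFORE_ATTR_VALUE = auto()
    ATTR_VALUE_DQ = auto()
    ATTR_VALUE_SQ = auto()
    ATTR_VALUE_UNQ = auto()
    AFTER_ATTR_VALUE_Q = auto()


WHITESPACE = " \t\n\f\r"


def tag_end(buf, start):
    """Return the index just past the tag that starts at buf[start] == '<'.

    Returns -1 if the buffer ends before the tag does (more input needed).
    Hypothetical helper, assuming buf[start + 1] begins a tag name.
    """
    state = State.TAG_NAME
    for i in range(start + 1, len(buf)):
        c = buf[i]
        if state is State.ATTR_VALUE_DQ:
            # Inside a double-quoted value only '"' is special; '>' is data.
            if c == '"':
                state = State.AFTER_ATTR_VALUE_Q
        elif state is State.ATTR_VALUE_SQ:
            if c == "'":
                state = State.AFTER_ATTR_VALUE_Q
        elif c == ">":
            # In every other state '>' emits the tag: back to the Data state.
            return i + 1
        elif state is State.TAG_NAME:
            if c in WHITESPACE:
                state = State.BEFORE_ATTR_NAME
        elif state is State.BEFORE_ATTR_NAME:
            if c not in WHITESPACE:
                state = State.ATTR_NAME
        elif state is State.ATTR_NAME:
            if c == "=":
                state = State.BEFORE_ATTR_VALUE
            elif c in WHITESPACE:
                state = State.AFTER_ATTR_NAME
        elif state is State.AFTER_ATTR_NAME:
            if c == "=":
                state = State.BEFORE_ATTR_VALUE
            elif c not in WHITESPACE:
                state = State.ATTR_NAME
        elif state is State.BEFORE_ATTR_VALUE:
            if c == '"':
                state = State.ATTR_VALUE_DQ
            elif c == "'":
                state = State.ATTR_VALUE_SQ
            elif c not in WHITESPACE:
                state = State.ATTR_VALUE_UNQ
        elif state is State.ATTR_VALUE_UNQ:
            if c in WHITESPACE:
                state = State.BEFORE_ATTR_NAME
        elif state is State.AFTER_ATTR_VALUE_Q:
            # A stray quote here is a parse error, but it only starts a
            # bogus attribute name; it does not open a new quoted string.
            state = (State.BEFORE_ATTR_NAME if c in WHITESPACE
                     else State.ATTR_NAME)
    return -1
```

With this, tag_end('<a href="x"">text</a>', 0) stops right after the first ">" instead of swallowing the link text, while a ">" inside a properly quoted value (title="a>b") is still skipped. A real implementation would also need the comment, DOCTYPE and EOF handling mentioned above.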
With your patch, text disappears e.g. on http://www.cnas.org/blogs/abumuqawama/2011/04/quote-day.html-0 (which was mentioned by Jeremy), while it renders OK with current dillo and Firefox.

I would rather put together a test page that includes all the cases Jeremy brought up, plus the reddit one. Also, looking into the Firefox or other browser sources might be a good start.

Cheers,
Johannes