[Dillo-dev] [patch] HTML parser bugfix

June 12, 2012

On Thu, Jun 07, 2012 at 04:45:42PM +0400, 123 wrote:
...
On Thu, Jun 07, 2012 at 12:14:06AM +0100, Jeremy Henty wrote:
...
123 wrote:
...
On Sun, Jun 03, 2012 at 09:00:55PM +0100, Jeremy Henty wrote:
...
...
If you don't detect and ignore that extra double quote you will
break many pages that every other browser renders perfectly well.

Then there should be some logic for detecting double quotes.

I agree.  The hard question is: what logic?

IMO the right way to do it is to implement the HTML Standard [1]. When
Html_write_raw is called, the parser is in the Data state. When it
returns to the Data state again, that is the end of the token.
I have implemented standard comment/DOCTYPE parsing. It is incomplete:
EOF is not handled and DOCTYPE parsing is not changed. The patch is
attached. The next step is to rewrite tag parsing in the standard way.
[1] http://www.whatwg.org/specs/web-apps/current-work/multipage/tokenization.htm...
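
For readers following the thread, the "token ends when the parser is
back in the Data state" idea can be sketched roughly as below. This is
not the attached patch and not Dillo code: the state names, the
function comment_token_len() and its return convention are all
hypothetical, and the state set is a simplified subset of the
tokenizer states in the HTML Standard. In particular, "--!>" handling
is omitted, DOCTYPE is treated like a bogus comment that ends at the
first '>', and a buffer that ends mid-token just yields 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state names, loosely modelled on the HTML Standard
 * tokenizer; none of these identifiers come from Dillo's source. */
typedef enum {
   DATA, TAG_OPEN, MARKUP_DECL_OPEN, COMMENT_START, COMMENT_START_DASH,
   COMMENT, COMMENT_END_DASH, COMMENT_END, BOGUS_COMMENT
} TokState;

/* Scan one "<!...>" token that starts at buf[0].
 * Returns the token length once the machine is back in DATA, or 0 if
 * the buffer ends first or the input is not a "<!" construct. */
size_t comment_token_len(const char *buf, size_t len)
{
   TokState st = DATA;
   size_t i = 0;

   assert(buf != NULL);
   while (i < len) {
      char c = buf[i];

      switch (st) {
      case DATA:
         if (c != '<')
            return 0;
         st = TAG_OPEN;
         i++;
         break;
      case TAG_OPEN:
         if (c != '!')
            return 0;              /* ordinary tags are out of scope */
         st = MARKUP_DECL_OPEN;
         i++;
         break;
      case MARKUP_DECL_OPEN:
         if (len - i >= 2 && c == '-' && buf[i + 1] == '-') {
            st = COMMENT_START;
            i += 2;
         } else if (len - i < 2 && c == '-') {
            return 0;              /* cannot decide yet: need more input */
         } else {
            st = BOGUS_COMMENT;    /* reconsume c in the new state */
         }
         break;
      case COMMENT_START:
         st = (c == '-') ? COMMENT_START_DASH : (c == '>') ? DATA : COMMENT;
         i++;
         break;
      case COMMENT_START_DASH:
         st = (c == '-') ? COMMENT_END : (c == '>') ? DATA : COMMENT;
         i++;
         break;
      case COMMENT:
         if (c == '-')
            st = COMMENT_END_DASH;
         i++;
         break;
      case COMMENT_END_DASH:
         st = (c == '-') ? COMMENT_END : COMMENT;
         i++;
         break;
      case COMMENT_END:
         if (c == '>')
            st = DATA;
         else if (c != '-')
            st = COMMENT;          /* "--" inside a comment does not end it */
         i++;
         break;
      case BOGUS_COMMENT:
         if (c == '>')
            st = DATA;
         i++;
         break;
      }
      if (st == DATA)
         return i;                 /* back in Data: the token is complete */
   }
   return 0;                       /* ran out of input inside the token */
}
```

With this sketch, a "--" in the middle of a comment does not terminate
it, since only "-->" leads back to the Data state. A caller would
presumably keep the unparsed bytes and retry when more data arrives
whenever 0 is returned.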
With your patch, text disappears e.g. on
http://www.cnas.org/blogs/abumuqawama/2011/04/quote-day.html-0
which was mentioned by Jeremy, whereas it renders ok with current dillo
and firefox.

I would rather put together a test page that includes all the cases
Jeremy brought up plus the reddit one.

Also, looking into the firefox or other browser sources might be a good
start.

Cheers,
Johannes

Johannes.Hofmann＠gmx.de