123 wrote:
On Sun, Jun 03, 2012 at 09:00:55PM +0100, Jeremy Henty wrote:
If you don't detect and ignore that extra double quote you will break many pages that every other browser renders perfectly well.
Then there should be some logic for detecting double quotes.
I agree. The hard question is: what logic?
Searching for < inside quotes is not the right way, as it breaks valid pages like the reddit main page.
Again, I agree. In fact, I decided to experiment with removing Dillo's misquote-detection just because it broke reddit. Unfortunately it is very hard to come up with an algorithm that correctly handles the various quoting horrors that you see all the time, yet also correctly parses embedded JavaScript fragments like: onclick='alert("clicked")' (IIRC Dillo mis-renders reddit because of embedded JavaScript like this.)
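To make the tension concrete, here is a minimal sketch in Python. It is not Dillo's code and not Firefox's algorithm; the names read_attr_value, parse_attrs and looks_misquoted are hypothetical, and looks_misquoted is only a simplified stand-in for the "search for < inside quotes" idea. The scanner closes a quoted value only at the same quote character that opened it, so the fragment above parses cleanly, while the naive check raises a false alarm on a perfectly valid value that happens to contain <. It makes no attempt to recover from genuinely misquoted markup.

```python
def read_attr_value(s, i):
    """Read one attribute value starting at s[i]; return (value, next_index).

    A quoted value ends only at the same quote character that opened it,
    so the other quote character and '<' are ordinary content.
    """
    if i < len(s) and s[i] in "\"'":
        quote = s[i]
        end = s.find(quote, i + 1)
        if end == -1:
            # Unterminated quote: take the rest of the input as the value.
            return s[i + 1:], len(s)
        return s[i + 1:end], end + 1
    # Unquoted value: runs up to whitespace or '>'.
    j = i
    while j < len(s) and not s[j].isspace() and s[j] != ">":
        j += 1
    return s[i:j], j


def parse_attrs(tag_body):
    """Parse name=value pairs from the text between a tag name and '>'."""
    attrs = []
    i, n = 0, len(tag_body)
    while i < n:
        # Skip whitespace between attributes.
        while i < n and tag_body[i].isspace():
            i += 1
        # Read the attribute name.
        start = i
        while i < n and not tag_body[i].isspace() and tag_body[i] != "=":
            i += 1
        name = tag_body[start:i]
        if not name:
            if i < n:
                i += 1  # stray '=' with no name: skip it
            continue
        if i < n and tag_body[i] == "=":
            value, i = read_attr_value(tag_body, i + 1)
        else:
            value = ""  # attribute given without a value
        attrs.append((name, value))
    return attrs


def looks_misquoted(value):
    """Naive heuristic (assumed, simplified): any '<' inside a quoted
    value is taken as a sign that the closing quote is missing."""
    return "<" in value


if __name__ == "__main__":
    print(parse_attrs("onclick='alert(\"clicked\")' class=btn"))
    valid = parse_attrs("onclick='document.write(\"<b>hi</b>\")'")[0][1]
    print(valid, looks_misquoted(valid))
```

The second call is the awkward case: the value is valid, yet the naive check flags it, which is the kind of false positive being discussed here.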
Can you give examples of real web pages with double quotes?
I have attached a bunch of links that I collected after spending a lot of time debugging pages that broke my locally-patched Dillo. (Ignore the file:///... URLs as they obviously won't work for you.) I have also attached the two patches I use. I think the first does the same thing that you propose, although I haven't checked that in detail. The second attempts to copy Firefox's misquote-detection algorithm.

I agree that Dillo's misquote-detection isn't good enough, but just taking it out is a step backwards, and it is a mistake to propose changes to it based solely on particular pages that Dillo mis-renders, because any such change needs to be tested against the many other broken pages that Dillo currently renders more or less correctly.

Regards,

Jeremy Henty