On Thu, Mar 30, 2006 at 06:45:16PM +0100, Jeremy Henty (ie. me) wrote:
It breaks http://news.independent.co.uk/ : -rc3 renders it as plain text, but -rc2 is fine.
It breaks http://www.slate.com/ too.

It's a combination of two factors: these sites report the Content-type as "text/html;<extra_crud>" instead of "text/html", and they insert text before the HTML (http://news.independent.co.uk/ has an XML header and http://www.slate.com/ has an HTML comment).

Sites with extra text before the HTML hit a bug in Dillo's content type guesser: it doesn't skip that text, so it doesn't see the HTML, so it guesses "text/plain" instead. This bug is *almost* completely masked by a workaround: if the server says "text/html" and Dillo guesses "text/plain", then Dillo assumes it's got tag soup and changes its mind to "text/html". But if the server says "text/html;<extra_crud>", then Dillo doesn't even recognise that content type, so it defaults to going along with its original guess.

I've hacked my copy of -rc3 so that its content type guesser skips leading XML headers and HTML comments. Let's see if that's enough!

Regards,

Jeremy Henty
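
For illustration only, here is a rough sketch in C of the two behaviours described above: skipping a leading XML header or HTML comment before sniffing for HTML, and comparing only the media type of a Content-type header while ignoring anything after the ";". This is not the actual patch and not Dillo's code. The function names (skip_preamble, looks_like_html, media_type_is) are hypothetical, and the set of tags sniffed for is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive check that buf[0..len) begins with prefix. */
static int starts_with_ci(const char *buf, size_t len, const char *prefix)
{
    size_t n = strlen(prefix);
    if (len < n)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)buf[i]) != tolower((unsigned char)prefix[i]))
            return 0;
    return 1;
}

/* Offset just past the first occurrence of needle at or after from,
 * or len if the needle is not present in the buffer. */
static size_t find_end(const char *buf, size_t len, size_t from,
                       const char *needle)
{
    size_t n = strlen(needle);
    for (size_t i = from; i + n <= len; i++)
        if (memcmp(buf + i, needle, n) == 0)
            return i + n;
    return len;
}

/* Hypothetical helper: skip whitespace, "<?...?>" headers and
 * "<!--...-->" comments; return the offset of what follows. */
size_t skip_preamble(const char *buf, size_t len)
{
    size_t i = 0;
    for (;;) {
        while (i < len && isspace((unsigned char)buf[i]))
            i++;
        if (starts_with_ci(buf + i, len - i, "<?"))
            i = find_end(buf, len, i + 2, "?>");
        else if (starts_with_ci(buf + i, len - i, "<!--"))
            i = find_end(buf, len, i + 4, "-->");
        else
            return i;
    }
}

/* Hypothetical guesser: does the data look like HTML once the
 * preamble is skipped? The tag list here is an assumption. */
int looks_like_html(const char *buf, size_t len)
{
    size_t i = skip_preamble(buf, len);
    return starts_with_ci(buf + i, len - i, "<!doctype html") ||
           starts_with_ci(buf + i, len - i, "<html");
}

/* Hypothetical header check: compare only the media type, so that
 * "text/html;<extra_crud>" still counts as "text/html". */
int media_type_is(const char *header, const char *type)
{
    size_t n = strcspn(header, ";");   /* drop any parameters */
    while (n > 0 && isspace((unsigned char)header[n - 1]))
        n--;                           /* trim trailing blanks */
    while (n > 0 && isspace((unsigned char)*header)) {
        header++;                      /* trim leading blanks */
        n--;
    }
    return n == strlen(type) && starts_with_ci(header, n, type);
}
```

With something like this, a page that starts with an XML header followed by a doctype would be sniffed as HTML, and a header such as "text/html; charset=utf-8" would still be recognised as "text/html", so either change on its own should cover the two sites mentioned.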