Re: [Dillo-dev] dillo-0.8.6-rc3.tar.bz2

March 30, 2006

On Thu, Mar 30, 2006 at 06:45:16PM +0100, Jeremy Henty (ie. me) wrote:
> ...
> It breaks http://news.independent.co.uk/ , -rc3 renders it as plain
> text, -rc2 is fine.

It breaks http://www.slate.com/ too. It's a combination of two
factors: these sites report the Content-type as
"text/html;<extra_crud>" instead of "text/html", and they insert text
before the HTML (http://news.independent.co.uk/ has an XML header and
http://www.slate.com/ has an HTML comment).

Sites with extra text before the HTML hit a bug in Dillo's content
type guesser; it doesn't skip that text, so it doesn't see the HTML,
so it guesses "text/plain" instead.  This bug is *almost* completely
masked by a workaround: if the server says "text/html" and Dillo
guesses "text/plain" then Dillo assumes it's got tag soup and changes
its mind to "text/html".  But if the server says
"text/html;<extra_crud>" then Dillo doesn't even recognise that
content type, so it defaults to going along with its original guess.
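
To illustrate the second factor, here is a rough sketch of comparing a
Content-Type value against a media type while ignoring anything after
the ';'.  This is not Dillo's actual code: content_type_is() is a name
I made up for illustration, and I'm assuming the header value is a
NUL-terminated string.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not taken from the Dillo sources.
 * Returns 1 if the Content-Type header value names the media type
 * `type`, ignoring letter case and any parameters after ';'.
 * Returns 0 otherwise. */
int content_type_is(const char *header, const char *type)
{
    size_t n = strlen(type);
    size_t i;

    /* Skip leading blanks in the header value. */
    while (*header == ' ' || *header == '\t')
        header++;

    /* Compare the media type itself, case-insensitively.  If the
     * header is shorter than `type`, its NUL terminator mismatches
     * and we return before reading past the end. */
    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)header[i]) !=
            tolower((unsigned char)type[i]))
            return 0;
    }

    /* After the media type, allow only blanks, then either the end
     * of the value or the start of the parameters. */
    header += n;
    while (*header == ' ' || *header == '\t')
        header++;
    return *header == '\0' || *header == ';';
}
```

With something like this, content_type_is("text/html; charset=utf-8",
"text/html") would match, so the tag soup workaround would still kick
in, while "text/htmlx" would not match.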

I've hacked my copy of -rc3 so that its content type guesser skips
leading XML headers and HTML comments.  Let's see if that's enough!
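
For the curious, the idea is roughly the following.  This is only a
sketch, not the actual patch: skip_prologue() and looks_like_html() are
hypothetical names, the buffer is assumed to be NUL-terminated, and the
HTML check is deliberately simplistic (a real guesser would look for
more than these two markers).

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: skip leading whitespace, "<?...?>" blocks
 * (e.g. an XML header) and "<!--...-->" comments.  Returns a pointer
 * to the first byte that belongs to none of these.  An unterminated
 * block is left in place. */
const char *skip_prologue(const char *p)
{
    for (;;) {
        const char *end;

        while (isspace((unsigned char)*p))
            p++;

        if (strncmp(p, "<?", 2) == 0) {
            end = strstr(p + 2, "?>");
            if (end == NULL)
                return p;          /* unterminated: stop here */
            p = end + 2;
        } else if (strncmp(p, "<!--", 4) == 0) {
            end = strstr(p + 4, "-->");
            if (end == NULL)
                return p;          /* unterminated: stop here */
            p = end + 3;
        } else {
            return p;
        }
    }
}

/* Case-insensitive prefix test; stops safely at the end of `s`. */
int has_prefix_nocase(const char *s, const char *prefix)
{
    for (; *prefix != '\0'; s++, prefix++) {
        if (tolower((unsigned char)*s) !=
            tolower((unsigned char)*prefix))
            return 0;
    }
    return 1;
}

/* Simplified stand-in for the guesser: does the data look like HTML
 * once the prologue has been skipped? */
int looks_like_html(const char *buf)
{
    const char *p = skip_prologue(buf);

    return has_prefix_nocase(p, "<html") ||
           has_prefix_nocase(p, "<!doctype html");
}
```

The point is just that the guess is made on whatever follows the XML
header or comment, rather than on the first bytes of the response.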

Regards, 

Jeremy Henty