Re: [Dillo-dev] dillo-0.8.6-rc3.tar.bz2

April 1, 2006

On Thu, Mar 30, 2006 at 10:30:32PM +0100, Jeremy Henty (i.e. me) wrote:
...
...
It breaks http://news.independent.co.uk/ , -rc3 renders it as plain
text, -rc2 is fine.
It breaks http://www.slate.com/ too.
Aargh!  *More* breakage!  I've discovered MP3s often start with
"ID3<lots of whitespace>" so -rc3 guesses they are text/plain .  Don't
know how to fix this.
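One possible direction, offered only as a hypothetical sketch: an MP3 that begins with an ID3v2 tag has a fixed 10-byte header, so a guesser could recognize that signature before its text heuristics run. The helper name `looks_like_id3v2` is invented for illustration. The byte pattern follows the ID3v2 specification: "ID3", two version bytes below 0xFF, a flags byte, and four "syncsafe" size bytes below 0x80.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `data` starts with an ID3v2 tag header,
 * 0 otherwise.  Per the ID3v2 spec the 10-byte header is
 *   "ID3" yy yy xx zz zz zz zz
 * where both version bytes (yy) are below 0xFF, xx is the flags byte, and
 * each byte of the "syncsafe" tag size (zz) is below 0x80. */
static int looks_like_id3v2(const unsigned char *data, size_t size)
{
   size_t i;

   if (size < 10 || memcmp(data, "ID3", 3) != 0)
      return 0;
   if (data[3] == 0xFF || data[4] == 0xFF)
      return 0;                 /* invalid version/revision */
   for (i = 6; i < 10; i++)
      if (data[i] & 0x80)
         return 0;              /* not a valid syncsafe size */
   return 1;
}
```

A guesser could call something like this first and treat a match as binary audio rather than text/plain. This only covers ID3v2; ID3v1 tags live at the end of the file and would not help here.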

http://www.nostalgia.com/nf_moreinfo.html?sku=10576 starts with
"<!HAS_WEBDNA_TAGS>", again -rc3 guesses text/plain .  I can work
around this by skipping all leading "<!...>" tags except the
"<!DOCTYPE...>".
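The workaround of skipping leading "<!...>" tags except "<!DOCTYPE...>" could look roughly like the C sketch below. The function names are hypothetical, not taken from Dillo; the sketch returns the offset at which guessing should start, or `size` if nothing is left.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive check for "<!DOCTYPE" at `p` (at most `n` bytes available). */
static int starts_with_doctype(const char *p, size_t n)
{
   static const char kw[] = "<!DOCTYPE";
   size_t k;

   if (n < sizeof kw - 1)
      return 0;
   for (k = 0; k < sizeof kw - 1; k++)
      if (toupper((unsigned char)p[k]) != kw[k])
         return 0;
   return 1;
}

/* Hypothetical helper: return the offset of the first byte worth guessing
 * from, skipping leading whitespace and any "<!...>" declaration other than
 * "<!DOCTYPE...>".  Returns `size` if nothing is left. */
static size_t skip_leading_decls(const char *data, size_t size)
{
   size_t i = 0;

   for (;;) {
      while (i < size && isspace((unsigned char)data[i]))
         i++;
      if (i + 2 > size || data[i] != '<' || data[i + 1] != '!')
         return i;              /* not a "<!" declaration: guess from here */
      if (starts_with_doctype(data + i, size - i))
         return i;              /* keep DOCTYPE, it is a useful HTML signal */
      while (i < size && data[i] != '>')
         i++;                   /* skip to the end of this declaration */
      if (i == size)
         return size;           /* unterminated: nothing left to guess from */
      i++;                      /* step past '>' */
   }
}
```

Skipping to the first '>' is a simplification: an HTML comment such as "<!-- a > b -->" would stop early, though the remaining "b -->" still would not be mistaken for a declaration.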

http://www.techworld.com/applications/news/index.cfm?NewsID=5685&inkc=0
starts with a *huge* amount of whitespace and triggers a bug in the
content type guesser: it gets a buffer full of whitespace, skips all
the way to the end, and then guesses based on whatever garbage lies in
memory past the end of the buffer.  This turns out to be 8 or so binary
characters followed by ASCII text from a previous buffer.  Sometimes
the text is part of a previous page, sometimes it contains the message
"waiting for the server".  -rc3 guesses text/plain .  I fixed this bug
by adding "if (i==Size) return st;" before doing any guessing - clearly
Dillo should not try to guess if it has nothing to guess with.
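The failure and the one-line fix can be illustrated with a small C sketch. Everything here is hypothetical: `guess_type`, its parameters and the two MIME constants are invented for illustration, and the final "<" test merely stands in for the real heuristics. What matters is the order of operations: skip whitespace with a bounds check, then return the fallback `st` when `i == size`, before looking at any byte.

```c
#include <ctype.h>
#include <stddef.h>

/* Illustrative MIME strings returned by the sketch below. */
static const char TYPE_HTML[] = "text/html";
static const char TYPE_PLAIN[] = "text/plain";

/* Hypothetical guesser entry point; the name, signature and classification
 * are illustrative, not Dillo's code.  `st` is the result to return when
 * the data gives no evidence either way. */
static const char *guess_type(const char *data, size_t size, const char *st)
{
   size_t i = 0;

   /* Skip leading whitespace without ever reading past data[size - 1]. */
   while (i < size && isspace((unsigned char)data[i]))
      i++;

   /* The guard described above: an all-whitespace buffer carries no
    * information, so bail out instead of inspecting memory past the end. */
   if (i == size)
      return st;

   /* Deliberately naive stand-in for the real heuristics. */
   return data[i] == '<' ? TYPE_HTML : TYPE_PLAIN;
}
```

Without the `i == size` check, the next statement would read `data[size]`, which is exactly the out-of-bounds byte that produced the stale "waiting for the server" text.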

I vote against content type guessing unless it can be improved a lot.
It just doesn't work well enough.  (BTW, I've frequently run Dillo
-rcs before and this is the first one that gave me any trouble at
all.)

Jeremy Henty