Re: [Dillo-dev] dillo-0.8.6-rc3.tar.bz2

April 2, 2006

      Hi,

  Thanks for the excellent information!

  Don't worry guys, this patch obviously needs major rethinking.

  Please read these articles (suggested in this thread):

   http://www.mozilla.org/docs/web-developer/mimetypes.html
   http://diveintomark.org/archives/2004/08/13/safari-content-sniffing
   http://ppewww.ph.gla.ac.uk/%7Eflavell/www/content-type.html

  It's somewhat funny. The third one has thoughtful comments and
suggestions; it attacks IE for its spec-violating content-type
sniffing. Note that the first article shows that Mozilla is doing
this too (I guess not for the same reasons though :-).

  The second one shows a shameful case of content-sniffing bugs.

  This is a complex problem, but has some solutions. BTW, I agree
a lot with what Kelson wrote (attached below).

On Thu, Mar 30, 2006 at 02:16:36PM -0800, Kelson Vibber wrote:
...
Yeah, I'm seeing problems as well.
Putting the character set into the Content-Type header in the form 
'text/html; charset=utf-8' is actually recommended by the W3C:
http://www.w3.org/International/O-HTTP-charset
Some of the problems with the current implementation might be resolved 
by normalizing the content-type.
In general, I'm not a big fan of content sniffing unless it's done in 
very limited circumstances.  Often the server has a reason to use that 
content-type.
The way Mozilla handled this was to only do content sniffing if the 
server used text/plain.  It's more likely that you'll find a server 
that's sending the default type* than one that's deliberately serving 
something incorrect.**
http://www.mozilla.org/docs/web-developer/mimetypes.html
Even then, you have to be careful, as illustrated here:
http://diveintomark.org/archives/2004/08/13/safari-content-sniffing
It's a text document, properly served as text/plain, that mentions XHTML 
in the first line.  The then-current version of Safari decided to render 
it as XHTML.
* Apache uses a default of text/plain for all files it can't identify. 
I think this is a bad idea, but it's probably still used for historical 
reasons.  IIS uses application/octet-stream, which doesn't tell you 
anything, but at least it usually triggers a download instead of trying 
to display binary data as text.  This is one of those rare cases where I 
think IIS got it right and Apache got it wrong.
** This is also why so many RPM packages are served as RealPlayer files. 
RealPlayer was the first popular use for the .rpm extension, so lots of 
servers got configured that way.
-- 
Kelson Vibber
www.hyperborea.org

  I also want to quote this part from the third article:

<q>
 The HTTP protocol specifications (1.0 and 1.1) effectively forbid
 a  browser that has received a valid Content-type header from the
 server,  from  making  its  own  unilateral  determination of the
 content-type - see RFC2616 section 7.2.1 (my emphasis): 

   If  and  only  if  the  media type is not given by a Content-Type
   field,  the  recipient  MAY  attempt  to guess the media type via
   inspection of its content and/or the name extension(s) of the URI
   used to identify the resource. 

 The  consequences  of  this when a server is misconfigured aren't
 always  immediately  evident;  for example, consider an HTML page
 (sent  out  correctly  as text/html) which calls out a stylesheet
 and  a  number  of in-lined images: if the server sends these out
 with  a  wrong Content-type, then the browser might be displaying
 the  HTML page's main content, but the browser has every right to
 ignore  the  offending  stylesheet,  or  to  omit  the  offending
 image(s)  from the display: indeed a strict interpretation of the
 rules  would  say  that it must behave that way. Faking the wrong
 Content-type from the server is potentially a way of compromising
 security, so there's a genuine reason for this rule being the way
 that it is. 
</q>
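
  To make the quoted rule concrete, here is a minimal sketch in C
of the "if and only if" test. The function name is hypothetical
(it is not Dillo code), and treating a present-but-empty header
the same as a missing one is my assumption, not something the RFC
text above spells out.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of Dillo.
 * Returns 1 when no media type was given, i.e. the only case in which
 * RFC 2616 section 7.2.1 lets the recipient guess. Returns 0 otherwise.
 * Assumption: a header that is present but blank counts as "not given".
 */
int may_guess_media_type(const char *content_type_header)
{
   const char *p = content_type_header;

   if (p == NULL)
      return 1;                       /* header absent */
   while (*p && isspace((unsigned char)*p))
      p++;                            /* skip leading whitespace */
   return *p == '\0';                 /* nothing but whitespace */
}
```

  With this, a response carrying "text/plain" yields 0 (no
guessing allowed), while a response with no Content-Type at all
yields 1.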

  Solutions I see so far:

  1.- Keep it as is
  2.- Do Content-type sniffing and follow the SPEC, i.e. as
stated above, exercise the right to ignore the offending
content.
  3.- Do Content-type sniffing and act on the result (like
Mozilla).

  I like the second one best. Basically: if I get a binary stream
as "text/plain" or "text/html", or an image that's not an image,
then issue a warning and ignore it (abort). Note: this is a basic
security procedure.

  This has the advantage of protecting the browser against
attacks and following the SPEC. The user is left to decide (for
instance to retry with "save link as").

  The third option looks more "user friendly" but it goes against
the SPEC.

  Note: One huge problem with option 1 is that if you start a
download this way, it ends up in _main memory_. If you get an ISO
or movie, eventually the browser will trigger swapping or abort
(out-of-memory), and you lose the file and the time spent on the
download. A no-no.

--
  Cheers
  Jorge.-

Jorge Arellano Cid