Hi,

Thanks for the excellent information!

Don't worry guys, this patch obviously needs major rethinking.

Please read these articles (suggested in this thread):

http://www.mozilla.org/docs/web-developer/mimetypes.html
http://diveintomark.org/archives/2004/08/13/safari-content-sniffing
http://ppewww.ph.gla.ac.uk/%7Eflavell/www/content-type.html

It's somewhat funny. The third one has thoughtful comments and suggestions; it attacks IE for content-type sniffing that goes against the specs. Note that the first article shows that Mozilla is doing this too (I guess not for the same reasons though :-). The second one shows a shameful content-sniffing bug in a browser (Safari).

This is a complex problem, but it has some solutions.

BTW, I agree a lot with what Kelson wrote (attached below).

On Thu, Mar 30, 2006 at 02:16:36PM -0800, Kelson Vibber wrote:
Yeah, I'm seeing problems as well.
Putting the character set into the Content-Type header in the form 'text/html; charset=utf-8' is actually recommended by the W3C: http://www.w3.org/International/O-HTTP-charset
Some of the problems with the current implementation might be resolved by normalizing the content-type.
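As a minimal sketch of what "normalizing" could mean here (in Python, purely for illustration): split off the parameters, then lowercase and trim the media type so that 'Text/HTML; charset=utf-8' and 'text/html' compare equal. The function name is hypothetical, and the naive split on ';' assumes no quoted parameter value contains a semicolon.

```python
def normalize_content_type(header_value):
    """Split a Content-Type value into (media_type, params).

    The media type is lowercased and stripped of whitespace.
    Parameter names are lowercased; values keep their case but lose
    surrounding whitespace and double quotes.
    Assumption: no quoted parameter value contains ';'.
    """
    parts = header_value.split(";")
    media_type = parts[0].strip().lower()
    params = {}
    for part in parts[1:]:
        name, sep, value = part.partition("=")
        if sep:  # skip fragments that have no '='
            params[name.strip().lower()] = value.strip().strip('"')
    return media_type, params
```

With this, any later comparison is done on the media type alone, and the charset is still available for whoever needs it.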
In general, I'm not a big fan of content sniffing unless it's done in very limited circumstances. Often the server has a reason to use that content-type.
The way Mozilla handled this was to only do content sniffing if the server used text/plain. It's more likely that you'll find a server that's sending the default type* than one that's deliberately serving something incorrect.** http://www.mozilla.org/docs/web-developer/mimetypes.html
Even then, you have to be careful, as illustrated here: http://diveintomark.org/archives/2004/08/13/safari-content-sniffing It's a text document, properly served as text/plain, that mentions XHTML in the first line. The then-current version of Safari decided to render it as XHTML.
* Apache uses a default of text/plain for all files it can't identify. I think this is a bad idea, but it's probably still used for historical reasons. IIS uses application/octet-stream, which doesn't tell you anything, but at least it usually triggers a download instead of trying to display binary data as text. This is one of those rare cases where I think IIS got it right and Apache got it wrong.
** This is also why so many RPM packages are served as RealPlayer files. RealPlayer was the first popular use for the .rpm extension, so lots of servers got configured that way.
-- Kelson Vibber www.hyperborea.org
I also want to quote this part from the third article:

<q>
The HTTP protocol specifications (1.0 and 1.1) effectively forbid a browser that has received a valid Content-type header from the server, from making its own unilateral determination of the content-type - see RFC2616 section 7.2.1 (my emphasis):

  If and only if the media type is not given by a Content-Type field, the recipient MAY attempt to guess the media type via inspection of its content and/or the name extension(s) of the URI used to identify the resource.

The consequences of this when a server is misconfigured aren't always immediately evident; for example, consider an HTML page (sent out correctly as text/html) which calls out a stylesheet and a number of in-lined images: if the server sends these out with a wrong Content-type, then the browser might be displaying the HTML page's main content, but the browser has every right to ignore the offending stylesheet, or to omit the offending image(s) from the display: indeed a strict interpretation of the rules would say that it must behave that way.

Faking the wrong Content-type from the server is potentially a way of compromising security, so there's a genuine reason for this rule being the way that it is.
</q>

Solutions I see so far:

1.- Keep it as is.
2.- Do Content-type sniffing and follow the SPEC, i.e., as stated above, exercise the right to ignore the offending content.
3.- Do Content-type sniffing and take actions (like Mozilla).

I like the second one best. Basically: if I get a binary stream as "text/plain" or "text/html", or an image that's not an image, then issue a warning and ignore it (abort). Note: this is a basic security procedure.

This has the advantage of protecting the browser against attacks while following the SPEC. The user is left to decide (for instance, to retry with "save link as").

The third option looks more "user friendly", but it goes against the SPEC.
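To make option 2 a bit more concrete, here is a rough sketch in Python (for illustration only, not a patch). It never re-labels the content; it only accepts or refuses what the server declared. The function names, the 256-byte sample size and the control-character heuristic are my own assumptions, and the heuristic would wrongly flag UTF-16 text, so take it as a starting point.

```python
# Leading bytes ("magic numbers") of a few common image formats.
IMAGE_MAGIC = {
    "image/png": b"\x89PNG\r\n\x1a\n",
    "image/gif": b"GIF8",           # covers GIF87a and GIF89a
    "image/jpeg": b"\xff\xd8\xff",
}

# Control characters that do show up in ordinary text:
# tab, LF, FF, CR, ESC.
TEXT_CONTROLS = {9, 10, 12, 13, 27}


def looks_binary(data, sample_size=256):
    """Heuristic (an assumption, not from any spec): the data is
    considered binary if its first bytes contain NUL or another
    control character that is unusual in text."""
    sample = data[:sample_size]
    return any(b < 32 and b not in TEXT_CONTROLS for b in sample)


def check_declared_type(media_type, data):
    """Return (ok, reason) for a normalized media type and the first
    chunk of the body. On a mismatch the caller is expected to warn
    the user and abort, not to guess a different type."""
    if media_type in ("text/plain", "text/html") and looks_binary(data):
        return False, "binary data served as " + media_type
    magic = IMAGE_MAGIC.get(media_type)
    if magic is not None and not data.startswith(magic):
        return False, "data does not look like " + media_type
    return True, ""
```

For example, check_declared_type("text/plain", first_chunk) on an ISO image served as text/plain would come back as not ok after the first chunk, so the transfer can be stopped right there.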
Note: One huge problem with option 1 is that a download started this way ends up in _main memory_. If you get an ISO or a movie, the browser will eventually trigger swapping or abort (out of memory), losing both the file and the time spent on the download. A no-no.

--
Cheers
Jorge.-