[Dillo-dev] XHTML (was Re: /. crap)

Oct. 19, 2003

On Saturday 18 October 2003 5:50 pm, Ivo wrote:
...
> On Sunday 19 October 2003 01:53, Kelson Vibber wrote:
...
>> I believe that's the reason the XHTML spec says it should not even be
>> rendered if it's invalid.  And believe me, this is a good way to ensure
>> you've got valid code.
> Does this also work on IE? If it does, God forbid the day that some bright
> engineer at MS decides they should change their code to try to render it
> anyway.

No, IE doesn't handle XHTML as XML at all.  However, XHTML was designed with
backward compatibility in mind.  If an XHTML page is sent with the content
type text/html, it looks like HTML 4.01 with a few extra attributes that can
be ignored.

The content type for XHTML, application/xhtml+xml, is only recognized by a few
browsers right now, possibly only the Gecko-based ones and Amaya.  Since
Mozilla trusts the server to send the correct MIME type, it assumes that a
text/html page is HTML, not XHTML, and parses it as HTML.  Any errors it finds
are handled the same way as errors in HTML.  It's only when the page is
served with the XHTML MIME type that Mozilla enforces the well-formed XML
requirement.
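
To make the decision concrete, here is a rough Python sketch of choosing a
parser from the Content-Type header alone.  The function name parser_for and
the "xml"/"html" labels are made up for illustration; this is not how Mozilla
is actually structured, only the rule described above.

```python
def parser_for(content_type):
    """Pick a parsing mode from the Content-Type header value only.

    Hypothetical helper: the document body is never inspected, so the
    server's declared type is trusted completely.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    modes = {
        "application/xhtml+xml": "xml",   # strict: must be well-formed
        "text/html": "html",              # lenient: recover from errors
    }
    return modes.get(media_type)          # None for types not covered here


if __name__ == "__main__":
    print(parser_for("application/xhtml+xml"))
    print(parser_for("text/html; charset=ISO-8859-1"))
```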

IE can trash XHTML if you're not careful, though: since IE assumes the server
is likely to be misconfigured, it will sometimes see the XML declaration or
processing instructions at the beginning of an XHTML file and try to display
it as plain XML, even if your server has told it the file is HTML.  I saw this
when I had more than one stylesheet attached with xml-stylesheet processing
instructions.

Browsers that really display XHTML (through an XML parser, requiring it to be
well-formed) should put application/xhtml+xml in their HTTP Accept header.
(Gecko browsers do.)  Documents in XHTML 1.1 or using features only found in
XHTML should always be served with this MIME type.  For XHTML 1.0 documents
intended for a wider audience, the server should use content negotiation to
determine which type the browser can handle, and serve either text/html or
application/xhtml+xml.
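
The negotiation step could look something like the following Python sketch.
The function name choose_content_type is hypothetical, and the policy is my
assumption: serve application/xhtml+xml only when the browser lists it
explicitly and doesn't rank text/html higher.  Wildcards like */* are ignored
on purpose, since a browser that sends only a wildcard hasn't really promised
it can handle XHTML.

```python
def choose_content_type(accept_header):
    """Return the MIME type to use for an XHTML 1.0 page (illustrative)."""
    xhtml_q = 0.0
    html_q = 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q value as "not accepted"
        if media_type == "application/xhtml+xml":
            xhtml_q = max(xhtml_q, q)
        elif media_type == "text/html":
            html_q = max(html_q, q)
        # Wildcards (*/*, text/*) are deliberately not counted.
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"


if __name__ == "__main__":
    # An Accept header of the general shape Gecko browsers send.
    print(choose_content_type(
        "application/xhtml+xml,text/html;q=0.9,*/*;q=0.1"))
    # A browser that only sends a wildcard gets plain text/html.
    print(choose_content_type("*/*"))
```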

I should mention that, AFAIK, the don't-even-render-it rule only applies to 
XML well-formedness.  So mis-nested or extra tags (like the extra </td></tr> 
in Slashdot) would cause the browser to display an error, but unfamiliar tags 
(such as RDF embedded in XHTML) would simply be ignored as long as they don't 
invalidate the XML structure.  For example, you can serve the following:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Blah</title></head>
<body>
<h1>Header</h1>
<blargquote><img src="whatever.jpg" /></blargquote>
</body>
</html>

...and it will display the way you would expect - ignoring the fictitious 
"blargquote" element (which would be caught by a validator).  But if you 
instead do this (note the missing close tag for blockquote):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>Blah</title></head>
<body>
<h1>Header</h1>
<blockquote><img src="whatever.jpg" />
</body>
</html>

... Mozilla will display this error message:

XML Parsing Error: mismatched tag. Expected: </blockquote>.
Location: test.xhtml
Line Number 8, Column 3:
</body>
--^

But if you send the exact same file with the text/html content type, it will 
use the HTML parser and try to guess where the blockquote is supposed to end.
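
If you want to see the same difference outside a browser, here is a small
Python sketch that runs both test documents through a non-validating XML
parser (expat, via the standard library) and then through Python's lenient
HTML tokenizer.  The helper names are made up for illustration, and Python's
html.parser is only a stand-in for a browser's HTML parser: it tolerates the
missing close tag but doesn't do the same error recovery.

```python
import xml.parsers.expat
from html.parser import HTMLParser

HEAD = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"\n'
    '    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
    '<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n'
    '<head><title>Blah</title></head>\n'
    '<body>\n'
    '<h1>Header</h1>\n'
)
# Unknown element, but every tag is closed and nested properly.
UNKNOWN_ELEMENT = (
    HEAD
    + '<blargquote><img src="whatever.jpg" /></blargquote>\n'
    + '</body>\n</html>\n'
)
# Known element, but the closing </blockquote> is missing.
MISSING_CLOSE = (
    HEAD
    + '<blockquote><img src="whatever.jpg" />\n'
    + '</body>\n</html>\n'
)


def well_formedness_error(document):
    """Return expat's error message, or None if the XML is well-formed."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)  # True marks the final chunk
    except xml.parsers.expat.ExpatError as err:
        return str(err)
    return None


class TagCollector(HTMLParser):
    """Lenient pass: record start tags and never complain about nesting."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(document):
    collector = TagCollector()
    collector.feed(document)
    collector.close()
    return collector.start_tags


if __name__ == "__main__":
    print(well_formedness_error(UNKNOWN_ELEMENT))  # no XML-level complaint
    print(well_formedness_error(MISSING_CLOSE))    # a mismatched-tag error
    print(html_start_tags(MISSING_CLOSE))          # HTML pass still succeeds
```

Note that expat never looks at the DTD here, so the made-up element passes;
catching that takes a validator, not just an XML parser.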

-- 
Kelson Vibber
www.hyperborea.org