---------- Forwarded message ----------
Date: Thu, 13 May 2004 12:33:13 +0300 (EEST)
From: Jukka K. Korpela <jkorpela@cs.tut.fi>
To: Jorge Arellano Cid <jcid@dillo.org>
Subject: Re: Is the null byte allowed in HTML?

On Wed, 12 May 2004, Jorge Arellano Cid wrote:
> I haven't yet found out whether the null byte character is allowed in
> HTML. Can you shed some light on this?

It is not. You could use http://validator.w3.org to check for disallowed
characters (it reports "non SGML character number 0"), but the ultimate
reference is
a) for HTML 4, the SGML declaration
   http://www.w3.org/TR/html4/sgml/sgmldecl.html
   where UNUSED effectively means 'disallowed'
b) for XHTML, the XML specification, see
   http://www.w3.org/TR/REC-xml/#charsets
   which says, among other things, that all characters below 9 (HT)
   are disallowed.

Thanks for a good question - I'm just finalizing a book on XHTML (in
Finnish, sorry) and I realized that I had forgotten to discuss the
character issue in sufficient detail. (I just realized that various
generators may produce data with control characters.)

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
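
For the XHTML case, the rule cited in (b) can be checked mechanically. The following is only a minimal sketch, assuming the Char production of XML 1.0 (#x9 | #xA | #xD | #x20-#xD7FF | #xE000-#xFFFD | #x10000-#x10FFFF); the function names is_xml_char and find_disallowed are made up for illustration and do not come from the validator or from any library. It does not cover the HTML 4 SGML declaration, which lists its own UNUSED ranges.

```python
def is_xml_char(cp):
    """Return True if code point cp matches the XML 1.0 Char production.

    Hypothetical helper, written for illustration only.
    """
    return (
        cp in (0x9, 0xA, 0xD)          # HT, LF, CR are the only allowed C0 controls
        or 0x20 <= cp <= 0xD7FF        # space up to just before the surrogates
        or 0xE000 <= cp <= 0xFFFD      # after the surrogates, excluding FFFE/FFFF
        or 0x10000 <= cp <= 0x10FFFF   # supplementary planes
    )


def find_disallowed(text):
    """Return a list of (offset, code point) for characters XML 1.0 disallows."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if not is_xml_char(ord(ch))]


if __name__ == "__main__":
    sample = "a\x00b"  # contains a null character at offset 1
    for offset, cp in find_disallowed(sample):
        print("disallowed character number %d at offset %d" % (cp, offset))
```

Run on generator output before serving it, a check like this would flag the null character as well as the other control characters below 9 (HT).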