[Dillo-dev] Re: null byte in HTML

May 13, 2004

      ---------- Forwarded message ----------
Date: Thu, 13 May 2004 12:33:13 +0300 (EEST)
From: Jukka K. Korpela <jkorpela@cs.tut.fi>
To: Jorge Arellano Cid <jcid@dillo.org>
Subject: Re: Is the null byte allowed in HTML?

On Wed, 12 May 2004, Jorge Arellano Cid wrote:
...
> I haven't yet found whether the null byte character is allowed in
> HTML. Can you shed some light on this?

It is not. You could use http://validator.w3.org to check for disallowed
characters (it reports "non SGML character number 0"), but the ultimate
reference is
a) for HTML 4, the SGML declaration
   http://www.w3.org/TR/html4/sgml/sgmldecl.html
   where UNUSED effectively means 'disallowed'
b) for XHTML, the XML specification, see
   http://www.w3.org/TR/REC-xml/#charsets
which say, among other things, that all characters below 9 (HT) are
disallowed.
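
As a rough illustration of the XML rule cited in (b), the following Python
sketch scans a string for characters that fall outside the Char production
of XML 1.0. The function names is_xml_char and find_disallowed and the
sample string are hypothetical, chosen only for this example. The sketch
covers the XML rule only; it does not model the HTML 4 SGML declaration.

```python
from typing import List, Tuple


def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)            # HT, LF, CR
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_disallowed(text: str) -> List[Tuple[int, int]]:
    """Return (offset, code point) for each character XML 1.0 disallows."""
    return [
        (i, ord(ch)) for i, ch in enumerate(text) if not is_xml_char(ord(ch))
    ]


# Hypothetical sample: a null byte at offset 5, plus a tab (which is allowed).
sample = "<p>ok\x00 and a tab\t</p>"
print(find_disallowed(sample))  # prints [(5, 0)]
```

A check like this could be run over generator output before serving it, to
catch stray control characters early.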

Thanks for a good question - I'm just finalizing a book on XHTML
(in Finnish, sorry) and I realized that I had forgotten to discuss the
character issue in sufficient detail. (I just realized that various
generators may produce data with control characters.)

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
