On Fri, 21 May 2004, Sebastian Geerken wrote:
On Fri, May 14, Jorge Arellano Cid wrote:
[...] but what do we do with this:
'<u>Some </u>text'
If we ignore white space after the start tag and before the end tag, it becomes
'<u>Some</u>text' (with no space at all!)
Which I do find reasonable. (See below for my reasons.)
Agreed.
If we "collapse" as the SPEC says should be done, we have two possibilities:
What part of the spec do you refer to?
HTML-4.01, sec 9.1: <q> [...] Note that a sequence of white spaces between words in the source document may result in an entirely different rendered inter-word spacing (except in the case of the PRE element). In particular, user agents should collapse input white space sequences when producing output inter-word space. This can and should be done even in the absence of language information (from the lang attribute, the HTTP "Content-Language" header field (see [RFC2616], section 14.12), user agent settings, etc.). [...] </q>
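To make the quoted rule concrete, here is a minimal sketch in Python of collapsing white space sequences inside a single text run. The function name `collapse_white_space` is made up for illustration and is not Dillo code; the character set is my reading of the white space characters listed in sec 9.1 (space, tab, form feed, zero-width space, plus CR and LF as line breaks).

```python
import re

# White space characters as read from HTML 4.01 sec 9.1 (an assumption of this
# sketch): space, tab, form feed, zero-width space, and the line breaks CR/LF.
_HTML_WS_RUN = re.compile("[ \t\f\u200b\r\n]+")


def collapse_white_space(text):
    """Replace every run of HTML white space in one text run with a single space."""
    return _HTML_WS_RUN.sub(" ", text)


print(repr(collapse_white_space("Some  \t\n text")))
```

Note that this only works within one text run; it says nothing yet about spaces that sit on either side of a tag, which is the open question in this thread.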
From what I have understood, spaces should be collapsed at the raw data level. That is, if you have the part "<p><u>Some </u>text</p>", it will be parsed into the following tree:
  ...
  `- <p>
     +- <u>
     |  `- "Some "
     `- "text"
Then, the question is, what should be done with the space at the end of "Some ".
If "the raw data level" means that this:
  <p><u>Some </u>text</p>
would be parsed as:
  `- <p>
     +- <u>
     |  `- "Some "
     `- "text"
and after that, the final space removed, producing:
  `- <p>
     +- <u>
     |  `- "Some"
     `- "text"
it seems a good idea for an implementation to me.
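The two steps described here (parse into a tree, then remove the final space) can be sketched in Python with the standard `html.parser` module. This is only an illustration under stated assumptions, not the Dillo parser: `Node`, `TreeBuilder`, `strip_trailing_space` and `parse` are hypothetical names, and void elements such as <br> or unbalanced tags are not handled.

```python
from html.parser import HTMLParser

HTML_WS = " \t\f\u200b\r\n"  # assumed white space set (HTML 4.01 sec 9.1)


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []  # Node instances or plain strings


class TreeBuilder(HTMLParser):
    """Builds a naive element tree; every start tag is assumed to be closed."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def strip_trailing_space(node):
    """Remove white space directly before each element's end tag."""
    if node.children and isinstance(node.children[-1], str):
        node.children[-1] = node.children[-1].rstrip(HTML_WS)
        if node.children[-1] == "":
            node.children.pop()
    for child in node.children:
        if isinstance(child, Node):
            strip_trailing_space(child)


def dump(node):
    """Return the tree as nested tuples for easy inspection."""
    return (node.tag, [dump(c) if isinstance(c, Node) else c for c in node.children])


def parse(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    elements = [c for c in builder.root.children if isinstance(c, Node)]
    for element in elements:
        strip_trailing_space(element)
    return [dump(e) for e in elements]


print(parse("<p><u>Some </u>text</p>"))
# prints [('p', [('u', ['Some']), 'text'])]
```

The words "Some" and "text" stay on different levels of the tree, and the trimming is decided per element without looking at the sibling text.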
Generally, I'd like to stick to this tree view and, especially in this example, regard the words "Some " and "text" as lying at different levels of the tree, not within a flat list. Historically, this was not always very clear for HTML, but it is much clearer for XHTML. The current parser does not actually build a tree, but should a bit like as it does.
Sorry for my English, but what does "a bit like as it does" mean?
This approach also makes the new HTML parser in the CSS prototype simpler.
Good.
[...] AFAICT, the SPEC leaves the choice open, and advises HTML authors against white space inside the tags.
IMO, always collapsing white space after the start tag and
            ^^^^^^^^^^
I'd say we should (for most elements) simply ignore white space after the opening tag and before the closing tag. This generally solves the problem with "Some <u> underlined </u> text.", and we should not relate spaces in different elements, e.g. by collapsing them.
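As a rough illustration of the "ignore" rule on the example just given, here is a source-level sketch in Python. It is deliberately naive (regular expressions over the raw markup, no handling of comments, attribute values containing ">", or the elements that would be excluded by "for most elements"), and `ignore_edge_white_space` is a hypothetical name, not anything from Dillo.

```python
import re

WS = " \t\f\u200b\r\n"  # assumed white space set (HTML 4.01 sec 9.1)

# White space directly after a start tag, e.g. "<u> underlined".
_AFTER_START = re.compile(r"(<[A-Za-z][^>]*>)[" + WS + "]+")
# White space directly before an end tag, e.g. "underlined </u>".
_BEFORE_END = re.compile("[" + WS + r"]+(</[A-Za-z][^>]*>)")


def ignore_edge_white_space(html):
    """Drop white space after every start tag and before every end tag."""
    html = _AFTER_START.sub(r"\1", html)
    return _BEFORE_END.sub(r"\1", html)


print(ignore_edge_white_space("Some <u> underlined </u> text."))
# prints: Some <u>underlined</u> text.
print(ignore_edge_white_space("<u>Some </u>text"))
# prints: <u>Some</u>text
```

The spaces outside the <u> element are left alone, so nothing is collapsed across element boundaries; each element's edges are handled on their own.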
Yes, "ignore" is a better word.
Now, this solution would also account for the special SGML line break rules:
<q source='HTML-4.01 SPEC B.3.1'>
SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately following a start tag must be ignored, as must a line break immediately before an end tag. This applies to all HTML elements without exception.
The following two HTML examples must be rendered identically:
<P>Thomas is watching TV.</P>
<P>
Thomas is watching TV.
</P>
So must the following two examples:
<A>My favorite Website</A>
<A>
My favorite Website
</A>
</q>
This is actually something different: It is only about line breaks, and it applies to *all* elements, including <pre>.
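To show how narrow the B.3.1 rule is compared with general white space handling, here is a small Python sketch that removes only one line break directly after a start tag and one directly before an end tag, and leaves spaces alone. The name `drop_sgml_line_breaks` is invented for this example, and the tag matching is as naive as a regular expression allows.

```python
import re

# One line break (CRLF, CR or LF) immediately after a start tag ...
_BREAK_AFTER_START = re.compile(r"(<[A-Za-z][^>]*>)(?:\r\n|\r|\n)")
# ... and one line break immediately before an end tag.
_BREAK_BEFORE_END = re.compile(r"(?:\r\n|\r|\n)(</[A-Za-z][^>]*>)")


def drop_sgml_line_breaks(html):
    """Apply the line break rule only; other white space is not touched."""
    html = _BREAK_AFTER_START.sub(r"\1", html)
    return _BREAK_BEFORE_END.sub(r"\1", html)


one_line = "<P>Thomas is watching TV.</P>"
multi_line = "<P>\nThomas is watching TV.\n</P>"

print(drop_sgml_line_breaks(multi_line) == one_line)   # prints True
print(drop_sgml_line_breaks("<u>Some </u>text"))        # prints <u>Some </u>text
```

Because nothing here depends on the element name, the same substitution would apply inside <pre> as well, which is the distinction being made.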
If you consider that line breaks are also white space characters (HTML-4.01 sec 9.1), this becomes a special case of general white space handling.

Cheers
Jorge.-

PS: It seems like we'll have to modify the parser to produce a parsing tree for CSS/XML to be supported properly.