On Fri, May 14, Jorge Arellano Cid wrote:
[...] For instance:
A different case is "<u>Some </u> text". Your patch will make "<u>Some </u>text" of it, but it should really be "<u>Some</u> text."
Yes, I agree, "collapsing" here should be:
'<u>Some </u> text' => '<u>Some</u> text'
as you note.
but what do we do with this:
'<u>Some </u>text'
If we ignore white space after the start tag and before the end tag, it becomes
'<u>Some</u>text' (with no space at all!)
Which I indeed find reasonable. (See below for my reasons.)
If we "collapse" as the SPEC says should be done, we have two possibilities:
What part of the spec do you refer to? From what I have understood, spaces should be collapsed at the raw data level. That is, if you have the part "<p><u>Some </u>text</p>", it will be parsed into the following tree:
    ...
     `- <p>
         +- <u>
         |   `- "Some "
         `- "text"
Then, the question is, what should be done with the space at the end of "Some ". Generally, I'd like to stick to this tree view, and especially regard, in this example, the words "Some " and "text" as lying in different levels in the tree, not within a flat list. Historically, this was not always very clear for HTML, but it is much clearer for XHTML. The current parser does not actually build a tree, but it should behave a bit as if it did. This approach also makes the new HTML parser in the CSS prototype simpler.
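To make the tree view concrete, here is a minimal Python sketch. It is not Dillo code: it uses Python's standard html.parser module, and Node, TreeBuilder, build_tree and dump are hypothetical names chosen only for this illustration. It assumes well-formed input where every end tag closes the current element.

```python
from html.parser import HTMLParser


class Node:
    """An element node; children are Node instances or raw text strings."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple element tree (assumes well-formed, properly nested input)."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Assumption: the end tag matches the element currently open.
        if self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        # Text is stored untouched, including trailing spaces.
        self.current.children.append(data)


def build_tree(markup):
    builder = TreeBuilder()
    builder.feed(markup)
    builder.close()
    return builder.root


def dump(node, depth=0):
    """Return one line per node, indented by tree depth."""
    lines = []
    for child in node.children:
        if isinstance(child, Node):
            lines.append("  " * depth + "<" + child.tag + ">")
            lines.extend(dump(child, depth + 1))
        else:
            lines.append("  " * depth + repr(child))
    return lines


print("\n".join(dump(build_tree("<p><u>Some </u>text</p>"))))
```

In the dump, 'Some ' sits one level deeper than 'text', which is the point: the two text nodes are not neighbours in a flat list.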
'<u>Some </u>text' (as it was: underline the whitespace)
and
'<u>Some</u> text' (move the space out of the tag)
AFAICT, the SPEC leaves the choice open, and advises HTML authors against whitespace inside the tags.
IMO, always collapsing white space after the start tag and
            ^^^^^^^^^^
I'd say we should (for most elements) simply ignore whitespace after the opening tag and before the closing tag. This generally solves the problem with "Some <u> underlined </u> text.", and we should not relate spaces in different elements, e.g. by collapsing them.
before the end tag is the simplest to implement. Even more, as the SPEC doesn't define what to do in this case, it's an option left to the User Agent:
<q source='HTML4.01 SPEC, 9.1'> In order to avoid problems with SGML line break rules and inconsistencies among extant implementations, authors should not rely on user agents to render white space immediately after a start tag or immediately before an end tag. Thus, authors, and in particular authoring tools, should write:
<P>We offer free <A>technical support</A> for subscribers.</P>
and not:
<P>We offer free<A> technical support </A>for subscribers.</P> </q>
So, at least, the authors are warned ;-)
Now, this solution would also account for the special SGML line break rules:
<q source='HTML-4.01 SPEC B.3.1'>
SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately following a start tag must be ignored, as must a line break immediately before an end tag. This applies to all HTML elements without exception.
The following two HTML examples must be rendered identically:
<P>Thomas is watching TV.</P>
<P>
Thomas is watching TV.
</P>
So must the following two examples:
<A>My favorite Website</A>
<A>
My favorite Website
</A>
</q>
This is actually something different: It is only about line breaks, and it applies to *all* elements, including <pre>.
Sebastian