On Fri, May 21, Jorge Arellano Cid wrote:
[...]
If we "collapse" as the SPEC says should be done, we have two possibilities:
What part of the spec do you refer to?
HTML-4.01, sec 9.1:
<q> [...] Note that a sequence of white spaces between words in the source document may result in an entirely different rendered inter-word spacing (except in the case of the PRE element). In particular, user agents should collapse input white space sequences when
^^^^^
Again, as I understand it, this refers to the lowest processing level; this is what the HTML parser has already done before.
producing output inter-word space. This can and should be done even in the absence of language information (from the lang attribute, the HTTP "Content-Language" header field (see [RFC2616], section 14.12), user agent settings, etc.). [...] </q>
[...]
Generally, I'd like to stick to this tree view, and especially regard, in this example, the words "Some " and "text" as lying at different levels in the tree, not within a flat list. Historically, this was not always very clear for HTML, but it is much clearer for XHTML. The current parser does not actually build a tree, but should a bit like as it does.
Sorry for my English, but what does "a bit like as it does" mean?
Sorry, this should say "[should behave] a bit like as *if* it does [build a tree]".
[...]
Now, this solution would also account for the special SGML line break rules:
<q source='HTML-4.01 SPEC B.3.1'>
SGML (see [ISO8879], section 7.6.1) specifies that a line break immediately following a start tag must be ignored, as must a line break immediately before an end tag. This applies to all HTML elements without exception.
The following two HTML examples must be rendered identically:
<P>Thomas is watching TV.</P>
<P>
Thomas is watching TV.
</P>
So must the following two examples:
<A>My favorite Website</A>
<A>
My favorite Website
</A>
</q>
This is actually something different: it is only about line breaks, and it applies to *all* elements, including <pre>.
If you consider that line breaks are also white space characters (HTML-4.01 sec 9.1), this becomes a special case of general white space handling.
Since this rule always applies, it should be handled at a level below (if I understand this correctly). I.e., first remove these line breaks (also for <pre>), and then (except for <pre>) collapse white space. BTW, I do not know whether this is also valid for XML; I did not find anything equivalent in the XML spec.
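To make the two levels concrete, here is a minimal sketch in Python. It is not taken from the Dillo parser: the function names are hypothetical, the input is assumed to be the raw text between one start tag and its end tag, and the white space set is simplified to space, tab, form feed, CR and LF.

```python
import re

# One line break in any of the usual encodings (CRLF, CR, LF).
LINE_BREAK = r"(?:\r\n|\r|\n)"


def strip_tag_line_breaks(content: str) -> str:
    """Lower level (HTML 4.01 B.3.1): drop a line break directly after the
    start tag and one directly before the end tag. Applies to all elements."""
    content = re.sub(r"\A" + LINE_BREAK, "", content)
    content = re.sub(LINE_BREAK + r"\Z", "", content)
    return content


def collapse_white_space(content: str) -> str:
    """Upper level (HTML 4.01 sec 9.1): collapse each white space run
    into a single space."""
    return re.sub(r"[ \t\f\r\n]+", " ", content)


def process_content(tag: str, content: str) -> str:
    """Hypothetical driver: the line-break rule always runs, the collapsing
    step is skipped for <pre>."""
    content = strip_tag_line_breaks(content)
    if tag.lower() != "pre":
        content = collapse_white_space(content)
    return content


if __name__ == "__main__":
    print(repr(process_content("p", "\nThomas is watching TV.\n")))
    print(repr(process_content("pre", "\n  a\n   b\n")))
```

With this split, the <pre> case needs no special treatment at the lower level; only the upper level has to know which element it is in.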
PS: It seems like we'll have to modify the parser to produce a parsing tree for CSS/XML to be supported properly.
This is already halfway done in the CSS prototype, since CSS does indeed depend on a document tree (mostly because CSS is processed asynchronously). It still has to be considered which white space handling is done at which level.

Sebastian