Re: [Dillo-dev]Re: White spaces handling (was: Weird glitch with rendering)

May 21, 2004

      On Fri, 21 May 2004, Sebastian Geerken wrote:
...
On Fri, May 14, Jorge Arellano Cid wrote:
...
[...]
  but what do we do with this:
'<u>Some </u>text'
If we ignore white space after the start tag and before the end
tag, it becomes
'<u>Some</u>text'       (with no space at all!)
Which I find quite reasonable. (See below for my reasons.)
Agreed.
...
...
If we "collapse" as the spec says we should, we have two
possibilities:
What part of the spec do you refer to?
HTML-4.01, sec 9.1:

<q>
 [...]
 Note that a sequence of white spaces between words in the source
 document may result in an entirely different rendered inter-word
 spacing (except in the case of the PRE element). In particular,
 user agents should collapse input white space sequences when
 producing output inter-word space. This can and should be done
 even in the absence of language information (from the lang
 attribute, the HTTP "Content-Language" header field (see
 [RFC2616], section 14.12), user agent settings, etc.).
 [...]
</q>
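To make the sec 9.1 rule concrete, here is a minimal C sketch of collapsing white space in a single text run. It is hypothetical: `is_html_space` and `collapse_space` are made-up names for illustration, not Dillo's parser code, and only the ASCII white space characters HTML 4.01 lists (space, tab, form feed, CR, LF) are handled; the zero-width space is left out.

```c
#include <assert.h>
#include <string.h>

/* ASCII white space as listed in HTML 4.01 sec 9.1 (U+200B omitted). */
static int is_html_space(char c)
{
   return c == ' ' || c == '\t' || c == '\f' || c == '\r' || c == '\n';
}

/* Collapse every run of white space in `s` into one ASCII space, in
 * place. Leading and trailing runs are collapsed too, not removed:
 * what to do with those is the open question in this thread. */
static void collapse_space(char *s)
{
   char *dst = s;
   int in_space = 0;

   assert(s != NULL);
   for (const char *src = s; *src; src++) {
      if (is_html_space(*src)) {
         if (!in_space)
            *dst++ = ' ';   /* first char of a run becomes one space */
         in_space = 1;
      } else {
         *dst++ = *src;
         in_space = 0;
      }
   }
   *dst = '\0';
}
```

With this, "Some", a tab, a newline and "text" become "Some text". Note that the trailing space of "Some " survives as a single space instead of disappearing, which is why collapsing alone does not settle the '<u>Some </u>text' case.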
...
From what I have understood, spaces should be collapsed at the raw
data level. That is, if you have the fragment "<p><u>Some </u>text</p>",
it will be parsed into the following tree:
...
   `- <p>
      +- <u>
      |  `- "Some "
      `- "text"
Then, the question is, what should be done with the space at the end
of "Some ".
If "the raw data level" means that this:

  <p><u>Some   </u>text</p>

  would be parsed as:

    `- <p>
       +- <u>
       |  `- "Some   "
       `- "text"

  and after that, the final space removed, producing:

    `- <p>
       +- <u>
       |  `- "Some  "
       `- "text"

  it seems like a good implementation approach to me.
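One way to read this proposal, sketched in C under loud assumptions: the `Node` structure and `strip_final_space` are hypothetical (the current parser does not build such a tree), and "the final space" is taken to mean a single trailing white space character of the element's last text child, removed when the end tag is reached.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, minimal document tree (not Dillo's data structures). */
typedef struct Node {
   const char *tag;          /* element name, or NULL for a text node */
   char *text;               /* text content, used by text nodes only */
   struct Node *children[8]; /* fixed-size child list, for brevity */
   int n_children;
} Node;

/* ASCII white space as listed in HTML 4.01 sec 9.1 (U+200B omitted). */
static int is_html_space(char c)
{
   return c == ' ' || c == '\t' || c == '\f' || c == '\r' || c == '\n';
}

/* Called when the end tag of `elem` is seen: if its last child is a
 * text node ending in white space, drop that one final character.
 * Any remaining white space is left for a later collapsing step. */
static void strip_final_space(Node *elem)
{
   Node *last;
   size_t len;

   assert(elem != NULL && elem->tag != NULL);
   if (elem->n_children == 0)
      return;
   last = elem->children[elem->n_children - 1];
   if (last->tag != NULL)   /* last child is an element, not text */
      return;
   len = strlen(last->text);
   if (len > 0 && is_html_space(last->text[len - 1]))
      last->text[len - 1] = '\0';
}
```

Applied to the <u> node holding "Some   ", this leaves "Some  ", matching the tree above; calling it on the enclosing <p> changes nothing, because the last child of <p> there is the <u> element, not a text node.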
...
Generally, I'd like to stick to this tree view, and, in this example,
especially regard the words "Some " and "text" as lying at different
levels of the tree, not within a flat list. Historically, this was not
always very clear for HTML, but it is much clearer for XHTML. The
current parser does not actually build a tree, but should a bit like
as it does.
Sorry, my English: what does "a bit like as it does" mean?
...
This approach also makes the new HTML parser in the CSS
prototype simpler.
Good.
...
...
[...]
  AFAICT, the spec leaves the choice open, and advises HTML
authors against white space inside the tags.
IMO, always collapsing white space after the start tag and
            ^^^^^^^^^^
I'd say we should (for most elements) simply ignore white space after
the opening tag and before the closing tag. This generally solves the
problem with "Some <u> underlined </u> text.", and we should not
relate spaces in different elements, e.g. by collapsing them.
Yes, "ignore" is a better word.
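A minimal C sketch of the "ignore" rule for one element's text content, assuming the caller decides which elements it applies to (the "for most elements" caveat; <pre> would presumably be excluded). `ignore_edge_space` and `is_html_space` are hypothetical names, and only ASCII white space is considered.

```c
#include <assert.h>
#include <string.h>

/* ASCII white space as listed in HTML 4.01 sec 9.1 (U+200B omitted). */
static int is_html_space(char c)
{
   return c == ' ' || c == '\t' || c == '\f' || c == '\r' || c == '\n';
}

/* Ignore white space right after the start tag and right before the
 * end tag by trimming it from both ends of the element's text, in
 * place. Interior white space is left untouched. */
static void ignore_edge_space(char *s)
{
   size_t start = 0, end;

   assert(s != NULL);
   end = strlen(s);
   while (start < end && is_html_space(s[start]))
      start++;                       /* skip space after start tag */
   while (end > start && is_html_space(s[end - 1]))
      end--;                         /* skip space before end tag */
   memmove(s, s + start, end - start);
   s[end - start] = '\0';
}
```

For "Some <u> underlined </u> text." the <u> content " underlined " becomes "underlined", while the spaces in the surrounding text "Some " and " text." belong to the enclosing element and are kept, so the words still render separated. No space in one element is related to a space in another.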
...
...
Now, this solution would also account for the special SGML line
break rules:
<q source='HTML-4.01 SPEC B.3.1'>
SGML (see [ISO8879], section 7.6.1) specifies that a line break
immediately following a start tag must be ignored, as must a line
break immediately before an end tag. This applies to all HTML
elements without exception.
The following two HTML examples must be rendered identically:
<P>Thomas is watching TV.</P>
<P>
Thomas is watching TV.
</P>
So must the following two examples:
<A>My favorite Website</A>
<A>
My favorite Website
</A>
</q>
This is actually something different: It is only about line breaks,
and it applies to *all* elements, including <pre>.
If you consider that line breaks are also white space
characters (HTML-4.01 sec 9.1), this becomes a special case of
general white space handling.
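The B.3.1 rule is narrower than the "ignore" rule: at most one line break is dropped at each end, and it applies to every element, <pre> included. A minimal C sketch, with hypothetical helper names and assuming a line break is CR LF, a lone CR, or a lone LF:

```c
#include <assert.h>
#include <string.h>

/* Length of the line break at the start of s: 2 for CR LF,
 * 1 for a lone CR or LF, 0 if s does not start with a break. */
static size_t leading_break(const char *s)
{
   if (s[0] == '\r' && s[1] == '\n')
      return 2;
   return (s[0] == '\r' || s[0] == '\n') ? 1 : 0;
}

/* Length of the line break at the end of the first len chars of s. */
static size_t trailing_break(const char *s, size_t len)
{
   if (len >= 2 && s[len - 2] == '\r' && s[len - 1] == '\n')
      return 2;
   return (len >= 1 && (s[len - 1] == '\r' || s[len - 1] == '\n')) ? 1 : 0;
}

/* HTML 4.01 B.3.1: drop one line break immediately after the start
 * tag and one immediately before the end tag, for any element. All
 * other white space (including further line breaks) is kept. */
static void strip_sgml_breaks(char *s)
{
   size_t len, head, tail;

   assert(s != NULL);
   len = strlen(s);
   head = leading_break(s);
   tail = trailing_break(s + head, len - head);
   memmove(s, s + head, len - head - tail);
   s[len - head - tail] = '\0';
}
```

For the <P> example, the content (newline, "Thomas is watching TV.", newline) becomes "Thomas is watching TV.", matching the one-line form. In a <pre> block, content of a newline, "  x", and two newlines keeps its indentation and its second newline, becoming "  x" plus one newline.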

  Cheers
  Jorge.-

PS: It seems we'll have to modify the parser to produce a
parse tree for CSS/XML to be supported properly.