New subject: White spaces handling

May 14, 2004

      Hi Sebastian,
...
> Just FYI: I remember from the old discussion that DwPage originally
> collapsed spaces (i.e. calls to a_Dw_page_add_space), and that I
> suggested some changes so that this is no longer done. Your patch
> would reverse that change.
> The discussion is attached.
  I read the whole thread you sent me. BTW, the whole whitespace
issue restarted when I found one of the patches you mentioned. I
then read the past emails and tried to follow them. It is not easy
to form a clear picture of it, though! This is mainly because of
the way whitespace used to be handled inside Dillo, and because the
SPECS do not seem definitive on this matter.

  Here I'll try to cite past emails and comment on my current
thoughts:
...
[...]
> This is because a_Dw_page_add_space is called twice, and this function
> does not actually add a space, but changes the current one. I've worked
> on two solutions:
> 1. The first patch (underlined-spaces-clean.diff) changes the
>    behavior of a_Dw_page_add_space, but needs some changes in the
>    HTML parser to better control when this function is called,
>    i.e. to ignore spaces after <A> and before </A>.
This one seems the best candidate.

  With regard to fixing the parser so that it ignores these spaces,
see my comments below.
...
> 2. The second one (underlined-spaces-kludgy.diff) tries to adjust
>    spaces, depending on rather hairy conditions. It works already
>    (if I'm not wrong, it has much the same results as the code
>    before, except for the bug).
This option is no longer valid. As you clearly described:

<q>
...
...
...
As I've now noticed, this will not work for CSS, e.g. the following
code:
<u>One <span style="text-decoration: none">non-underlined</span> word</u>
will be displayed:
   One non-underlined word
   ----              -----
just because this patch lets DwPage assume something about the
document structure (change from non-underline to underline ==
beginning of a tag) which it can (and should) not.
</q>

Now,
...
> Despite the file names, I'm not sure whether the changes in the HTML
> parser can be done cleanly. So, if you think this is difficult to
> implement, apply the second patch. In particular, another DwPage
> function would probably be needed to remove the last space again
> when </A> is read after a space.
  I believe the parser is not very hard to modify so that it ignores
the spaces, as patch 1 requires. The problem is that the SPEC is
not clear about exactly how these spaces should be collapsed.

For instance:
...
> A different case is "<u>Some </u> text". Your patch will make
> "Some text" of it, but it should really be
> "<u>Some</u> text".
Yes, I agree; "collapsing" here should be:

   '<u>Some </u> text'  =>
   '<u>Some</u> text'

  as you note.

  But what do we do with this:

   '<u>Some </u>text'

  If we ignore white space after the start tag and before the end
tag, it becomes

   '<u>Some</u>text'       (with no space at all!)

  If we "collapse" as the SPEC says we should, we have two
possibilities:

   '<u>Some </u>text'      (as it was: underline the whitespace)

  and

   '<u>Some</u> text'      (move the space out of the tag)

  AFAICT, the SPEC leaves the choice open, and advises HTML
authors against putting whitespace right after a start tag or
right before an end tag.

  IMO, always ignoring white space right after the start tag and
before the end tag is the simplest to implement. Moreover, since
the SPEC doesn't define what to do in this case, the choice is
left to the User Agent:

<q source='HTML4.01 SPEC, 9.1'>
 In order to avoid problems with SGML line break rules and
 inconsistencies among extant implementations, authors should not
 rely on user agents to render white space immediately after a
 start tag or immediately before an end tag. Thus, authors, and in
 particular authoring tools, should write:

    <P>We offer free <A>technical support</A> for subscribers.</P>

  and not:

    <P>We offer free<A> technical support </A>for subscribers.</P>
</q>

  Now, this solution would also account for the special SGML line
break rules:

<q source='HTML-4.01 SPEC B.3.1'>

SGML  (see  [ISO8879], section 7.6.1) specifies that a line break
immediately following a start tag must be ignored, as must a line
break  immediately  before  an  end tag. This applies to all HTML
elements without exception.

The following two HTML examples must be rendered identically:

<P>Thomas is watching TV.</P>

<P>
Thomas is watching TV.
</P>

So must the following two examples:

<A>My favorite Website</A>

<A>
My favorite Website
</A>

</q>

  Note that Firebird doesn't follow the SGML line break rules!
This is not to say we should follow Firebird, only that we may
find some buggy pages out there; at least on this point the SPEC
is quite clear. ;)
...
> I'm not completely sure; we should carefully evaluate the specs. I'd
> suggest postponing this to the 0.8.2 release.
Neither am I! :-)

  At this point I see that applying patch 1, plus making the
parser ignore whitespace after a start tag and before an end tag,
is a correct solution that can stand up to a reality test.

  I'll  start  coding it. Please let me know your thoughts or any
point I'm still missing.

  Best
  Jorge.-

PS: This work is for 0.8.2.

[Dillo-dev]Re: White spaces handling (was: Weird glitch with rendering)

Jorge Arellano Cid

Sebastian Geerken
