Re: [Dillo-dev]Parsing of XML empty tags

Nov. 6, 2004

On Fri, Nov 05, 2004 at 05:57:16PM +0100, Matthias Franz wrote:
...
On Fri, Nov 05, 2004 at 10:11:21AM -0300, Jorge Arellano Cid wrote:
...
Now the parser recognizes: </x>, <x /> and <x/>. (committed)
Maybe one should do this only in w3c_plus_heuristics mode (or in
a future XML mode), for two reasons:
Firstly, the slash "/" has a special meaning in SGML, and hence in HTML:
it terminates a so-called NET-enabling start tag. Essentially, this means that
<x/content/
is equivalent to
<x>content</x>
(This is one of the weird features of HTML that almost no browser
supports, see http://www.cs.tut.fi/~jkorpela/html/empty.html .)
Therefore, parsing <x/> or <x /> as <x></x> makes Dillo manifestly
non-HTML-conforming.
Don't worry, Dillo will never be SGML compliant! ;)

  Being HTML compliant (and therefore SGML compliant) involves
having an SGML parser, which is too big and complex for Dillo to
have.
...
Secondly, if I understand the HTML compatibility guidelines of Appendix C
of the XHTML 1.0 spec correctly, they suggest using <x /> only for elements
which have no close tag in HTML, like <hr> or <br> for instance. For others,
one should use an explicit end tag. This means that one can ignore a "/"
at the end of a tag for all XHTML documents which follow these guidelines.
(But note that according to the first point these guidelines are not
compatible with SGML.)
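
As a rough illustration of the Appendix C point above, a parser could
check whether an element is one of those that HTML 4.01 declares EMPTY
before honoring a trailing "/". This is only a minimal sketch in C: the
function names are hypothetical and are not taken from Dillo's sources.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two NUL-terminated ASCII names. */
static int name_equals_nocase(const char *a, const char *b)
{
   while (*a && *b) {
      if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
         return 0;
      a++;
      b++;
   }
   return *a == '\0' && *b == '\0';
}

/* Hypothetical helper (not Dillo's API): returns 1 if `name` is an
 * element that HTML 4.01 declares EMPTY, i.e. one with no end tag. */
int is_html4_empty_element(const char *name)
{
   static const char *const empty_elements[] = {
      "area", "base", "basefont", "br", "col", "frame", "hr",
      "img", "input", "isindex", "link", "meta", "param"
   };
   size_t i;

   for (i = 0; i < sizeof(empty_elements) / sizeof(empty_elements[0]); i++) {
      if (name_equals_nocase(name, empty_elements[i]))
         return 1;
   }
   return 0;
}
```

With such a check, a trailing "/" on <br /> or <hr /> can simply be
dropped, while <x /> on any other element would need separate handling.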
Yes, this is a known issue.

  What's more, "<br/>" is valid XML, notwithstanding the
compatibility recommendation of writing it as "<br />".

  And there's the HTML "<a href=http://foo.org/>" type of tag
(seen on Google, for instance); note the final "/>".

  So, as usual, I tried to code a solution that accounts for most
of the cases with a view to better usability.
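
  One way such a heuristic might look, purely as a sketch: treat a
final "/" as an empty-tag marker unless it falls inside an attribute
value, as it does in the unquoted "href=http://foo.org/" case. The
function below is hypothetical and is not the code that was committed;
it assumes `tag` holds one complete tag, from "<" to ">".

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not Dillo's actual code).
 * Returns 1 if the tag ends in a "/" that marks an empty element,
 * 0 if there is no such slash or it belongs to an attribute value. */
int tag_is_self_closing(const char *tag)
{
   size_t len = strlen(tag);
   size_t i;
   char quote = 0;        /* active quote character, or 0 if none */
   int in_unquoted = 0;   /* inside an unquoted attribute value */
   int after_eq = 0;      /* saw '=', value has not started yet */

   /* Shortest candidate is "<x/>"; it must end in "/>" */
   if (len < 4 || tag[0] != '<' ||
       tag[len - 1] != '>' || tag[len - 2] != '/')
      return 0;

   /* Scan everything between '<' and the final '/' */
   for (i = 1; i < len - 2; i++) {
      char c = tag[i];

      if (quote) {
         if (c == quote)
            quote = 0;               /* quoted value ends */
      } else if (in_unquoted) {
         if (isspace((unsigned char)c))
            in_unquoted = 0;         /* unquoted value ends at space */
      } else if (after_eq) {
         if (c == '"' || c == '\'') {
            quote = c;
            after_eq = 0;
         } else if (!isspace((unsigned char)c)) {
            in_unquoted = 1;
            after_eq = 0;
         }
      } else if (c == '=') {
         after_eq = 1;
      }
   }

   /* If a value is still open, the slash is part of that value */
   return !(quote || in_unquoted || after_eq);
}
```

  Under these assumptions, "<x/>", "<x />" and "<a href="u"/>" are
reported as empty tags, while "<a href=http://foo.org/>" is not.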

  BTW, I'm now well along in studying a way to modify the parser
so that it can generate the document tree from the tags, basically
by being more orthogonal about pushing and popping tags. This will
also help cut memory leaks with bad HTML.

-- 
  Cheers
  Jorge.-