On Fri, Nov 05, 2004 at 05:57:16PM +0100, Matthias Franz wrote:
On Fri, Nov 05, 2004 at 10:11:21AM -0300, Jorge Arellano Cid wrote:
Now the parser recognizes: </x>, <x /> and <x/>. (committed)
Maybe one should do this only in w3c_plus_heuristics mode (or in a future XML mode), for two reasons:
Firstly, the slash "/" has a special meaning in SGML, hence in HTML, as a so-called NET-enabling start tag. Essentially, this means that
<x/content/
is equivalent to
<x>content</x>
(This is one of the weird features of HTML that almost no browser supports, see http://www.cs.tut.fi/~jkorpela/html/empty.html .) Therefore, parsing <x/> or <x /> as <x></x> makes Dillo manifestly non-HTML-conforming.
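To make the NET equivalence concrete, here is a minimal sketch in C. It is not Dillo code: `expand_net` is a hypothetical helper that handles only the simplest form, a tag name with no attributes followed directly by "/", and it rewrites that form into the long form.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Dillo code: expand the simplest NET form
 * "<name/content/" into "<name>content</name>".
 * Returns 0 on success, -1 if the input is not of that form or the
 * result does not fit in `out`. */
int expand_net(const char *in, char *out, size_t outsz)
{
    const char *name, *content, *end;
    size_t nlen, clen;
    int n;

    if (in[0] != '<' || !isalpha((unsigned char)in[1]))
        return -1;
    name = in + 1;
    content = name;
    while (isalnum((unsigned char)*content))
        content++;
    if (*content != '/')          /* the NET-enabling "/" must follow the name */
        return -1;
    nlen = (size_t)(content - name);
    content++;                    /* skip the NET-enabling "/" */
    end = strchr(content, '/');   /* the next "/" acts as the null end tag */
    if (end == NULL)
        return -1;                /* element is still open, e.g. "<x/>" */
    clen = (size_t)(end - content);

    n = snprintf(out, outsz, "<%.*s>%.*s</%.*s>",
                 (int)nlen, name, (int)clen, content, (int)nlen, name);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```

Under this reading, "<x/>" is not an empty element: the ">" is the first character of the content and the element is still waiting for its closing "/", so the sketch reports failure for it.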
Don't worry, Dillo will never be SGML compliant! ;) Being HTML compliant (and therefore SGML compliant) involves having an SGML parser, which is too big and complex for Dillo to have.
Secondly, if I understand the HTML compatibility guidelines of Appendix C of the XHTML 1.0 spec correctly, they suggest using <x /> only for elements which have no close tag in HTML, like <hr> or <br> for instance. For others, one should use an explicit end tag. This means that one can ignore a "/" at the end of a tag for all XHTML documents which follow these guidelines. (But note that according to the first point these guidelines are not compatible with SGML.)
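Ignoring a "/" at the end of a tag could look roughly like the following C sketch. It is an illustration only, not the code that was committed to Dillo: `strip_trailing_slash` is a hypothetical name, and the function assumes it is handed the text between "<" and ">" in a writable buffer.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper, not Dillo code. `tag` holds the text between
 * "<" and ">" (e.g. "br /"). If it ends in "/", remove that slash and
 * any whitespace before it, and return 1; otherwise leave the buffer
 * alone and return 0. A lone "/" and close tags such as "/x" are not
 * touched. */
int strip_trailing_slash(char *tag)
{
    size_t len = strlen(tag);

    if (len < 2 || tag[len - 1] != '/')
        return 0;
    len--;                                    /* drop the "/" */
    while (len > 0 && isspace((unsigned char)tag[len - 1]))
        len--;                                /* drop blanks before it */
    tag[len] = '\0';
    return 1;
}
```

With this, "br /" and "br/" both reduce to "br". The sketch is deliberately naive: when the tag ends in an unquoted attribute value that itself ends in "/", it cannot tell whether the slash belongs to the value, so it strips it anyway.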
Yes, this is a known issue. Moreover, "<br/>" is valid XML, notwithstanding the compatibility recommendation to write it as "<br />". And there's the HTML "<a href=http://foo.org/>" type of tag (seen on Google, for instance); note the final "/>". So, as usual, I tried to code a solution that accounts for most of the cases, with a view to better usability.

BTW, I'm now well along in studying a way to modify the parser so that it can generate the document tree from the tags, basically by being more orthogonal at pushing and popping tags. This will also help cut memory leaks with bad HTML.

--
Cheers
Jorge.-