[Dillo-dev]Parsing of XML empty tags (e.g. <script/> as <script></script>).


Jorge Arellano Cid

Nov. 4, 2004

10:28 p.m.

Hi there,

Well, the priorities list draft had this one with top priority (among others):

<q> [1] Parsing of XML empty tags (e.g. <script/> as <script></script>). </q>

I'll post the priorities draft tomorrow for you to read, probably before Sebastian has a chance to review it...

As for the bug, it's BUG#514 and apparently also #587. It needs some testing.

Thorben: Do you have a test page for #587?

---

Topic change (cache control):

Brian: Take a look at BUG#161, Madis worked on the same problem long ago. Some ideas may be saved... I'm not saying the patch is wrong, I just happened to notice this entry while skimming.

--
Cheers
Jorge.-


Jorge Arellano Cid

November 2004

1:11 p.m.

On Thu, Nov 04, 2004 at 07:28:16PM -0300, Jorge Arellano Cid wrote:

...

> Hi there,
>
> Well, the priorities list draft had this one with top priority (among others):
>
> <q> [1] Parsing of XML empty tags (e.g. <script/> as <script></script>). </q>
>
> [...] As for the bug, it's BUG#514 and apparently also #587. It needs some testing.

As a matter of fact, I had to fix the patch because Google's answer links were not being rendered. Now the parser recognizes: </x>, <x /> and <x/>. (committed)

--
Cheers
Jorge.-
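Jorge's committed change isn't shown in the thread. As a rough illustration only, here is a C sketch that separates the three forms he lists while still treating a link like the "<a href=http://foo.org/>" he mentions later as an ordinary start tag. Assumptions: "classify_tag" and "TagKind" are hypothetical names, the tag arrives as one complete NUL-terminated string, and the unquoted-attribute rule is one possible heuristic, not necessarily what Dillo does.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tag kinds, for illustration only. */
typedef enum { TAG_START, TAG_END, TAG_EMPTY } TagKind;

/* Classify one complete tag, from '<' to '>' inclusive:
 *   "</x>"          -> TAG_END
 *   "<x/>", "<x />" -> TAG_EMPTY (to be handled like "<x></x>")
 *   anything else   -> TAG_START
 * Heuristic (an assumption): a '/' just before '>' that ends an unquoted
 * attribute value, as in <a href=http://foo.org/>, belongs to the value. */
TagKind classify_tag(const char *tag)
{
   enum { OUTSIDE, AFTER_EQ, UNQUOTED, QUOTED } st = OUTSIDE;
   char quote = 0;
   size_t len;

   assert(tag != NULL);
   len = strlen(tag);
   assert(len >= 3 && tag[0] == '<' && tag[len - 1] == '>');

   if (tag[1] == '/')
      return TAG_END;
   if (tag[len - 2] != '/')
      return TAG_START;

   /* Walk everything before the final "/>" and track attribute state. */
   for (size_t i = 1; i + 2 < len; i++) {
      unsigned char c = (unsigned char)tag[i];
      switch (st) {
      case OUTSIDE:          /* tag name, attribute names, whitespace */
         if (c == '=')
            st = AFTER_EQ;
         break;
      case AFTER_EQ:         /* the value may be quoted or unquoted */
         if (c == '"' || c == '\'') {
            quote = (char)c;
            st = QUOTED;
         } else if (!isspace(c)) {
            st = UNQUOTED;
         }
         break;
      case UNQUOTED:         /* unquoted values end at whitespace */
         if (isspace(c))
            st = OUTSIDE;
         break;
      case QUOTED:           /* quoted values end at the matching quote */
         if (c == (unsigned char)quote)
            st = OUTSIDE;
         break;
      }
   }
   /* A '/' glued to an unquoted value stays part of that value. */
   return st == UNQUOTED ? TAG_START : TAG_EMPTY;
}
```

With this heuristic "<br/>", "<br />" and "<img src="a.png"/>" come out as empty tags, while "<a href=http://foo.org/>" stays a start tag whose href ends in "/".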

Matthias Franz

4:57 p.m.

New subject: [Dillo-dev]Parsing of XML empty tags

On Fri, Nov 05, 2004 at 10:11:21AM -0300, Jorge Arellano Cid wrote:

...

> Now the parser recognizes: </x>, <x /> and <x/>. (committed)

Maybe one should do this only in w3c_plus_heuristics mode (or in a future XML mode), for two reasons:

Firstly, the slash "/" has a special meaning in SGML, hence in HTML, as a so-called NET-enabling start tag. Essentially, this means that

<x/content/

is equivalent to

<x>content</x>

(This is one of the weird features of HTML that almost no browser supports, see http://www.cs.tut.fi/~jkorpela/html/empty.html .) Therefore, parsing <x/> or <x /> as <x></x> makes Dillo manifestly non-HTML-conforming.

Secondly, if I understand the HTML compatibility guidelines of Appendix C of the XHTML 1.0 spec correctly, they suggest using <x /> only for elements which have no close tag in HTML, like <hr> or <br> for instance. For others, one should use an explicit end tag. This means that a "/" at the end of a tag can be ignored for all XHTML documents which follow these guidelines. (But note that, according to the first point, these guidelines are not compatible with SGML.)

All the best,

--
Matthias Franz
Section de Mathématiques, Université de Genève, Suisse
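Matthias's second point amounts to ignoring a trailing "/" and deciding from the element name alone whether it has content. A minimal C sketch of that alternative, assuming the set of elements declared EMPTY in the HTML 4.01 DTDs (strict, transitional and frameset combined); "closes_immediately" is a hypothetical name, not part of Dillo.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Elements declared EMPTY in the HTML 4.01 DTDs: they never take an
 * end tag, so "<br>", "<br/>" and "<br />" can all mean the same thing. */
static const char *const html4_empty_elements[] = {
   "area", "base", "basefont", "br", "col", "frame", "hr",
   "img", "input", "isindex", "link", "meta", "param"
};

/* Case-insensitive comparison of two NUL-terminated names. */
static bool name_eq(const char *a, const char *b)
{
   while (*a != '\0' && *b != '\0') {
      if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
         return false;
      a++;
      b++;
   }
   return *a == *b;   /* equal only if both strings ended together */
}

/* Appendix C style: the trailing "/" is ignored, and whether the element
 * closes right away depends only on its name. */
bool closes_immediately(const char *name)
{
   size_t n = sizeof html4_empty_elements / sizeof html4_empty_elements[0];

   assert(name != NULL);
   for (size_t i = 0; i < n; i++)
      if (name_eq(name, html4_empty_elements[i]))
         return true;
   return false;
}
```

Under this sketch "<script/>" would be read as an unclosed "<script>" start tag, so the case in the subject line only works if the parser honors the "/>" itself, as Jorge's commit does.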

Jorge Arellano Cid

10:04 p.m.

New subject: [Dillo-dev]Parsing of XML empty tags

On Fri, Nov 05, 2004 at 05:57:16PM +0100, Matthias Franz wrote:

...

> On Fri, Nov 05, 2004 at 10:11:21AM -0300, Jorge Arellano Cid wrote:
>
> ...
>> Now the parser recognizes: </x>, <x /> and <x/>. (committed)
>
> Maybe one should do this only in w3c_plus_heuristics mode (or in a future XML mode), for two reasons:
>
> Firstly, the slash "/" has a special meaning in SGML, hence in HTML, as a so-called NET-enabling start tag. Essentially, this means that
>
> <x/content/
>
> is equivalent to
>
> <x>content</x>
>
> (This is one of the weird features of HTML that almost no browser supports, see http://www.cs.tut.fi/~jkorpela/html/empty.html .) Therefore, parsing <x/> or <x /> as <x></x> makes Dillo manifestly non-HTML-conforming.

Don't worry, Dillo will never be SGML compliant! ;) Being HTML compliant (and therefore SGML compliant) involves having an SGML parser, which is too big and complex for Dillo to have.
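To make the NET rule from Matthias's first point concrete, here is a toy C sketch of what honoring it would mean: "<x/content/" is rewritten as "<x>content</x>". It is illustrative only; "expand_net" is a hypothetical name, it handles just the single, non-nested case from the example, and it does not reflect Dillo's parser, which per Jorge does not aim for SGML conformance.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Toy expansion of SGML NET-enabling start tags: "<name/" opens the
 * element and the next '/' (the null end tag) closes it. Non-nested
 * case only. Writes a NUL-terminated result into out.
 * Returns 0 on success, -1 if out is too small. */
int expand_net(const char *in, char *out, size_t outsz)
{
   size_t o = 0;

   assert(in != NULL && out != NULL && outsz > 0);
   while (*in != '\0') {
      if (in[0] == '<' && isalpha((unsigned char)in[1])) {
         const char *name = in + 1;
         size_t nlen = 0;

         while (isalnum((unsigned char)name[nlen]))
            nlen++;
         if (name[nlen] == '/') {                     /* NET-enabling tag */
            const char *content = name + nlen + 1;
            const char *net = strchr(content, '/');   /* null end tag */

            if (net != NULL) {
               size_t clen = (size_t)(net - content);
               /* "<" name ">" content "</" name ">" */
               size_t need = 1 + nlen + 1 + clen + 2 + nlen + 1;

               if (o + need >= outsz)
                  return -1;
               o += (size_t)snprintf(out + o, outsz - o, "<%.*s>%.*s</%.*s>",
                                     (int)nlen, name, (int)clen, content,
                                     (int)nlen, name);
               in = net + 1;
               continue;
            }
         }
      }
      if (o + 1 >= outsz)          /* keep room for the final NUL */
         return -1;
      out[o++] = *in++;            /* ordinary text is copied as-is */
   }
   out[o] = '\0';
   return 0;
}
```

Under this rule, the first "/" after such a start tag ends the element, wherever it happens to appear in the text.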

...

> Secondly, if I understand the HTML compatibility guidelines of Appendix C of the XHTML 1.0 spec correctly, they suggest using <x /> only for elements which have no close tag in HTML, like <hr> or <br> for instance. For others, one should use an explicit end tag. This means that a "/" at the end of a tag can be ignored for all XHTML documents which follow these guidelines. (But note that, according to the first point, these guidelines are not compatible with SGML.)

Yes, this is a known issue. Moreover, "<br/>" is valid XML, notwithstanding the compatibility recommendation of writing it as "<br />". And there's the HTML "<a href=http://foo.org/>" type of tag (for instance, with Google); note the final "/>". So, as usual, I tried to code a solution that accounts for most of the cases, with a view to better usability.

BTW, I'm now well advanced in studying a way to modify the parser so it can generate the document tree from the tags, basically by being more orthogonal in pushing and popping tags. This will also help cut memory leaks with bad HTML.

--
Cheers
Jorge.-
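One general way to read "more orthogonal at pushing and popping tags" is a stack of open elements where every push has exactly one matching pop that frees it, so unclosed or misnested elements in bad HTML are still released. This C sketch shows only that general idea; "TagStack", "stack_push", "stack_close" and "stack_clear" are hypothetical names, not Dillo's code, and tag names are assumed to be lowercased already.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stack of currently open elements (names assumed lowercase). */
typedef struct {
   char **names;
   size_t len, cap;
} TagStack;

/* Portable strdup replacement (strdup is POSIX, not ISO C99). */
static char *copy_name(const char *s)
{
   size_t n = strlen(s) + 1;
   char *p = malloc(n);
   if (p != NULL)
      memcpy(p, s, n);
   return p;
}

/* Start tag: push an owned copy of the name. Returns 0, or -1 on OOM. */
int stack_push(TagStack *st, const char *name)
{
   char *copy;

   assert(st != NULL && name != NULL);
   if (st->len == st->cap) {
      size_t ncap = st->cap ? st->cap * 2 : 8;
      char **p = realloc(st->names, ncap * sizeof *p);
      if (p == NULL)
         return -1;
      st->names = p;
      st->cap = ncap;
   }
   if ((copy = copy_name(name)) == NULL)
      return -1;
   st->names[st->len++] = copy;
   return 0;
}

/* End tag: pop up to and including the nearest matching open element,
 * implicitly closing anything still open inside it. Every pop frees what
 * its push allocated. A stray end tag with no match is ignored.
 * Returns the number of elements popped. */
size_t stack_close(TagStack *st, const char *name)
{
   size_t i = st->len;

   while (i > 0 && strcmp(st->names[i - 1], name) != 0)
      i--;
   if (i == 0)
      return 0;

   size_t popped = st->len - (i - 1);
   while (st->len >= i)
      free(st->names[--st->len]);
   return popped;
}

/* End of document: free whatever bad HTML left open, then the storage. */
void stack_clear(TagStack *st)
{
   while (st->len > 0)
      free(st->names[--st->len]);
   free(st->names);
   st->names = NULL;
   st->cap = 0;
}
```

In this design an empty tag such as "<x/>" is simply a push followed by a close, the same path as "<x></x>".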

Brian Hechinger

2:24 p.m.

On Thu, Nov 04, 2004 at 07:28:16PM -0300, Jorge Arellano Cid wrote:

...

> Topic change (cache control):
>
> Brian: Take a look at BUG#161, Madis worked on the same problem long ago. Some ideas may be saved... I'm not saying the patch is wrong, I just happened to notice this entry while skimming.

ok, i've got a copy. i'm going to put this on hold until a decision is made on the configuration stuff, however, since i would like to get that written first before i continue with cache control.

-brian

--
IHTFP: I firmly believe that people dumber than me exist solely for my amusement.
IHTFP: Okay, maybe not solely for my amusement. Some of them make good cake.


participants (3)

Brian Hechinger
Jorge Arellano Cid
Matthias Franz