What shall we do with the buggy RFC?
Dillo is buggy because RFC 2396 is buggy! An outrageous claim, I know, but hear me out!

The problem appears when resolving a relative URI reference such as "?foo=bar" relative to a base URI such as "http://a.b.c/e/f". Dillo resolves this to "http://a.b.c/e?foo=bar", i.e. it loses the last path component. This is clearly wrong, and it breaks pages such as http://www.guardian.co.uk/business/gallery/2008/jan/18/1?picture=332092597 (try clicking on the big picture: you won't get the next one in the gallery). Clearly Dillo should only drop the last path component of the base URI when the path of the relative reference is non-empty.

Nevertheless, this broken behaviour *is* RFC compliant! If we study the 7 steps in "5.2. Resolving Relative References to Absolute Form" of RFC 2396, we see a problem in step 6, which begins:

   a) All but the last segment of the base URI's path component is copied to the buffer. In other words, any characters after the last (right-most) slash character, if any, are excluded.

So we are instructed to lose the last path component in all cases. Unfortunately, a URI reference such as "?foo=bar" does reach step 6. Hence the breakage.

Patch attached.

Jeremy Henty
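To make the complaint concrete, here is a minimal Python sketch of step 6a applied literally to a query-only reference, next to the behaviour argued for above. The helper names `merge_rfc2396` and `merge_nonempty_only` are hypothetical, and the sketch covers only the path merge, not the dot-segment cleanup in the rest of step 6.

```python
def merge_rfc2396(base_path: str, ref_path: str) -> str:
    """RFC 2396 step 6a/6b, applied literally: drop everything after the
    last '/' of the base path, then append the (possibly empty) reference path."""
    return base_path[: base_path.rfind("/") + 1] + ref_path


def merge_nonempty_only(base_path: str, ref_path: str) -> str:
    """Proposed behaviour: only drop the last base segment when the
    reference actually has a path of its own."""
    if not ref_path:
        return base_path
    return merge_rfc2396(base_path, ref_path)


base = ("http", "a.b.c", "/e/f")     # scheme, authority, path of the base URI
ref_path, ref_query = "", "foo=bar"  # the reference "?foo=bar"

for merge in (merge_rfc2396, merge_nonempty_only):
    path = merge(base[2], ref_path)
    print(f"{merge.__name__}: {base[0]}://{base[1]}{path}?{ref_query}")
# merge_rfc2396: http://a.b.c/e/?foo=bar
# merge_nonempty_only: http://a.b.c/e/f?foo=bar
```

Applied literally, step 6a keeps the trailing slash, so the path becomes "/e/" rather than "/e"; either way the segment "f" is lost.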
On Sat, Mar 01, 2008 at 03:56:35PM +0000, Jeremy Henty wrote:
The problem appears when resolving a relative URI reference such as "?foo=bar" relative to a URI such as "http://a.b.c/e/f".
I don't think that's a valid relative URI, but I would treat the empty path component as ".". Joerg
On Sat, Mar 01, 2008 at 05:16:13PM +0100, Joerg Sonnenberger wrote:
On Sat, Mar 01, 2008 at 03:56:35PM +0000, Jeremy Henty wrote:
The problem appears when resolving a relative URI reference such as "?foo=bar" relative to a URI such as "http://a.b.c/e/f".
I don't think that's a valid relative URI,
RFC 2396 disagrees. Section "C.1. Normal Examples" explicitly cites "?y" as an example URI. That example also parses according to the regular expression they provide:

$ perl -wnle 'print if m(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$)' <<< '?foo=bar'
?foo=bar
$
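The same check can be sketched in Python, this time also printing the captured components. The pattern is the one from RFC 2396 Appendix B, where groups 2, 4, 5, 7 and 9 are scheme, authority, path, query and fragment; the variable names are illustrative only.

```python
import re

# Regular expression from RFC 2396, Appendix B (same as in the perl one-liner).
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$")

m = URI_RE.match("?foo=bar")
# Groups 2, 4, 5, 7 and 9 hold scheme, authority, path, query and fragment.
scheme, authority, path, query, fragment = m.group(2, 4, 5, 7, 9)
print(repr(scheme), repr(authority), repr(path), repr(query), repr(fragment))
# None None '' 'foo=bar' None
```

The empty path component is exactly what the resolution steps then have to deal with.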
... but I would treat the empty path component as ".".
I think that would be consistent with RFC 2396 but not the earlier RFC 1808. As I said elsewhere I think RFC 1808 is clearly right on this point and I'm puzzled that RFC 2396 does not explain or even mention the change. Regards, Jeremy Henty
On Sat, Mar 01, 2008 at 05:15:52PM +0000, I wrote:
"?y" as an example URI. That example also parses according to the regular expression they provide:
$ perl -wnle 'print if m(^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$)' <<< '?foo=bar'
?foo=bar
$
Heh, that turns out to be a vacuous remark. The regular expression matches *any* string! Jeremy Henty
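A quick way to see why the match proves nothing: every group is optional, the path class `[^?#]*` swallows everything up to the first "?" or "#", the query part swallows everything up to the first "#", and the fragment part takes the rest. The sample strings below are arbitrary.

```python
import re

# Regular expression from RFC 2396, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$")

# Even obvious non-URIs match, because some split into path/query/fragment always exists.
samples = ["", "?foo=bar", "::::", "not a uri at all", "#?/", "??##"]
for s in samples:
    print(repr(s), bool(URI_RE.match(s)))  # every line prints True
```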
On Sat, Mar 01, 2008 at 05:15:52PM +0000, Jeremy Henty wrote:
On Sat, Mar 01, 2008 at 05:16:13PM +0100, Joerg Sonnenberger wrote:
On Sat, Mar 01, 2008 at 03:56:35PM +0000, Jeremy Henty wrote:
The problem appears when resolving a relative URI reference such as "?foo=bar" relative to a URI such as "http://a.b.c/e/f".
I don't think that's a valid relative URI,
RFC 2396 disagrees. Section "C.1. Normal Examples" explicitly cites "?y" as an example URI. That example also parses according to the regular expression they provide:
I'm interpreting the grammar for rel_segment in section 5 a bit differently.
... but I would treat the empty path component as ".".
I think that would be consistent with RFC 2396 but not the earlier RFC 1808. As I said elsewhere I think RFC 1808 is clearly right on this point and I'm puzzled that RFC 2396 does not explain or even mention the change.
I read step 5 differently. If the embedded URL path is empty, it inherits the base URL path. That is what RFC 1808 says, and it is consistent with interpreting it as ".". Joerg
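For reference, RFC 1808 section 4, step 5 can be sketched as follows: when the embedded URL's path is empty (and not preceded by a slash), the base path is inherited whole, params are inherited if the embedded ones are empty, and the base query is inherited only if the embedded query is also empty. The `Parts` tuple and `rfc1808_step5` function are hypothetical names for illustration, not Dillo's code.

```python
from typing import NamedTuple, Optional


class Parts(NamedTuple):
    # Hypothetical container for the RFC 1808 components this step looks at.
    path: str
    params: str
    query: str


def rfc1808_step5(base: Parts, embedded: Parts) -> Optional[Parts]:
    """Sketch of RFC 1808 section 4, step 5.

    Returns the resolved parts when the embedded URL's path is empty,
    or None to signal that resolution continues with step 6 (path merge).
    """
    if embedded.path:
        return None
    # Empty path: the base path is inherited whole; no segment is dropped.
    if embedded.params:
        # 5a: non-empty params end the step; keep the embedded query as-is.
        return Parts(base.path, embedded.params, embedded.query)
    # 5b: inherit the base params, and the base query only if ours is empty.
    return Parts(base.path, base.params, embedded.query or base.query)


result = rfc1808_step5(Parts("/e/f", "", ""), Parts("", "", "foo=bar"))
print(f"http://a.b.c{result.path}?{result.query}")  # http://a.b.c/e/f?foo=bar
```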
On Sat, Mar 01, 2008 at 03:56:35PM +0000, Jeremy Henty (ie. *me*) wrote:
The problem appears when resolving a relative URI reference such as "?foo=bar" relative to a URI such as "http://a.b.c/e/f". Dillo resolves this to "http://a.b.c/e?foo=bar" ie. it loses the last path component.
I was slightly wrong. Dillo actually resolves the relative URI reference to "http://a.b.c/e/?foo=bar" (notice the trailing '/' in the path). However the example page I gave returns a redirect that strips the '/'.
Clearly Dillo should only lose the last path component of the base URI when the path of the relative is non-empty.
I've just noticed that RFC 2396 explicitly disagrees with me. From section C:

   C. Examples of Resolving Relative URI References

   Within an object with a well-defined base URI of

      http://a/b/c/d;p?q

   the relative URI would be resolved as follows:

   C.1. Normal Examples
   [snip]
      ?y = http://a/b/c/?y

Interestingly, RFC 1808 explicitly agrees with me.

   5. Examples and Recommended Practice

   Within an object with a well-defined base URL of

      Base: <URL:http://a/b/c/d;p?q#f>

   the relative URLs would be resolved as follows:

   5.1. Normal Examples
   [snip]
      ?y = <URL:http://a/b/c/d;p?y>

Even odder, in RFC 2396 "G.4. Modifications from RFC 1808" there is no mention of this change. It seems that they made the change but forgot to explain why. Weirderer and weirderer!

Jeremy Henty
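As an outside cross-check (not part of either RFC), Python's `urllib.parse.urljoin` has followed RFC 3986, the successor to RFC 2396, since Python 3.5, and RFC 3986 lists "?y" as resolving to the RFC 1808 answer. A short sketch, assuming Python 3.5 or later:

```python
from urllib.parse import urljoin

# Base URI from RFC 2396 section C (RFC 1808 uses the same one plus "#f").
base = "http://a/b/c/d;p?q"

print(urljoin(base, "?y"))                      # http://a/b/c/d;p?y
print(urljoin("http://a.b.c/e/f", "?foo=bar"))  # http://a.b.c/e/f?foo=bar
```

Both results keep the last path segment, matching RFC 1808 rather than the RFC 2396 example quoted above.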
participants (2)
- joerg.sonnenberger@web.de
- onepoint@starurchin.org