[Dillo-dev] character references, trailing ';', urls

May 5, 2014

On Sun, May 04, 2014 at 09:09:48PM +0000, eocene wrote:
...
I wrote:
...
Jorge wrote:
...
AFAIR the original routine was written to require the trailing ';'
and it worked well for some time. Then more pages started to show
unterminated entities inside, and it got so annoying we decided to
make it more flexible and not to require the ';' when the entity
name was found (IIRC).
Yeah, this is why I was considering just changing the get_attr case,
but of course I don't want to make the code messy and complicated
unless I need to.
...
It'd be good to find the reason for the change before reverting it.
I don't remember it now, but I do remember it was because the other way
started to be perceived as worse in some sense.
Maybe GMANE has the mailing list archives...
I guess I'll put some time into digging around.
http://lists.dillo.org/pipermail/dillo-dev/2005-January/002502.html
where we get the end of a conversation between Jorge and Matthias Franz.
This msg says that it was changed because it wasn't required under
certain conditions. HTML4 spec gives it as:
Note. In SGML, it is possible to eliminate the final ";" after a
  character reference in some cases (e.g., at a line break or
  immediately before a tag). In other circumstances it may not be
  eliminated (e.g., in the middle of a word). We strongly suggest
  using the ";" in all cases to avoid problems with user agents that
  require this character to be present.
...and there's an "IIRC" in the msg that XHTML requires it.
The HTML5 spec requires a terminating ';' in all cases.
Then, it looks like requiring it again in this case may be
the way to go (I seem to recall there were lots of unterminated NBSP).
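
A rough sketch of what "lenient in text, strict in attribute values"
could look like, in Python rather than Dillo's actual C++, just to make
the trade-off concrete. Everything here is hypothetical: ENTITIES is a
tiny stand-in table, and decode_refs and its require_semicolon flag are
made-up names, not Dillo's parser API. The URL example assumes the
get_attr trouble comes from query strings such as "&copy=2".

```python
import re

# Tiny illustrative table; a real parser carries the full HTML entity list.
ENTITIES = {
    "amp": "&", "lt": "<", "gt": ">", "quot": '"',
    "nbsp": "\u00a0", "copy": "\u00a9",
}

# "&name" optionally followed by ";" (numeric references are left out).
_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*)(;?)")


def decode_refs(text, require_semicolon):
    """Replace named character references found in text.

    With require_semicolon=True an unterminated "&name" is left alone,
    which is the behaviour one would want for attribute values.
    """
    def repl(match):
        name, semi = match.group(1), match.group(2)
        if name not in ENTITIES:
            return match.group(0)   # unknown name: keep it verbatim
        if require_semicolon and not semi:
            return match.group(0)   # strict mode: no ';', no decoding
        return ENTITIES[name]
    return _REF.sub(repl, text)


# Lenient mode rescues the unterminated NBSP seen in page text...
print(repr(decode_refs("a&nbsp b", False)))
# ...but mangles a URL whose query parameter happens to be named "copy".
print(decode_refs("http://example.com/?a=1&copy=2", False))
# Strict mode leaves the URL intact.
print(decode_refs("http://example.com/?a=1&copy=2", True))
```

The name match is greedy, so "&nbspfoo" is treated as an unknown name
and left alone; a real routine would have to decide about that case too.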

  A long long time ago people thought that SGML was the final
solution, then XML, then HTML5, now they're looking for an
alternative technology to base the web upon...

-- 
  Cheers
  Jorge.-

jcid＠dillo.org