I wrote:
Jorge wrote:
AFAIR the original routine was written to require the trailing ';' and it worked well for some time. Then more pages started to show unterminated entities inside, and it got so annoying we decided to make it more flexible and not to require the ';' when the entity name was found (IIRC).
Yeah, this is why I was considering just changing the get_attr case, but of course I don't want to make the code messy and complicated unless I need to.
It'd be good to find the reason for the change before reverting it. I don't remember it now, but I do remember it was because the other way started to be perceived as worse in some sense.
Maybe GMANE has the mailing list archives...
I guess I'll put some time into digging around.
http://lists.dillo.org/pipermail/dillo-dev/2005-January/002502.html

is where we get the end of a conversation between Jorge and Matthias Franz. This msg says that it was changed because the ';' wasn't required under certain conditions.

The HTML4 spec gives it as:

  Note. In SGML, it is possible to eliminate the final ";" after a character reference in some cases (e.g., at a line break or immediately before a tag). In other circumstances it may not be eliminated (e.g., in the middle of a word). We strongly suggest using the ";" in all cases to avoid problems with user agents that require this character to be present.

...and there's an "IIRC" in the msg that XHTML requires it.

The HTML5 spec requires a terminating ';' in all cases.
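
To make the HTML4 note concrete, here is a rough sketch in C of a check that accepts a missing ';' only in the cases the note names. This is not Dillo's actual code: the function name entity_ref_ends_here and its calling convention are made up for illustration. Accepting end of input and rejecting every other character are my own conservative assumptions, since the note only gives examples.

```c
/*
 * Hypothetical helper, NOT taken from Dillo.
 * 'after' points at the character right after the entity name,
 * e.g. for "&amp;x" it points at ';'.
 * Returns 1 if the reference may be treated as terminated there,
 * 0 otherwise.
 */
static int entity_ref_ends_here(const char *after)
{
   char c = *after;

   if (c == ';')
      return 1;   /* explicit terminator, always accepted */
   if (c == '\n' || c == '\r')
      return 1;   /* HTML4 note: "at a line break" */
   if (c == '<')
      return 1;   /* HTML4 note: "immediately before a tag" */
   if (c == '\0')
      return 1;   /* assumption: end of input ends the reference */

   /* Assumption: anything else (including "in the middle of a
    * word") is not treated as the end of a reference. */
   return 0;
}
```

With something like this, "&amp<b>" would still be decoded while "&ampersand" would be left alone. Whether get_attr should use the same rule as running text is a separate question.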