On Sun, Sep 12, 2010 at 01:14:11PM +0100, Jeremy Henty wrote:
This is an almost trivial cleanup, but I'd be more comfortable if someone else checked that I have not missed anything. (I have been using it locally for a week or two with no problems so far.)
Html_process_word() requires its argument to be null-terminated. This forces Html_write_raw() (the only caller of Html_process_word()) to write a null character before calling Html_process_word() and restore the original character afterwards.
It turns out to be easy to change Html_process_word() so that it no longer needs the null terminator. This eliminates the hack in Html_write_raw() and lets us make several function arguments "const char *" instead of "char *".
It's a little tricky to see that the patch is correct. It works because none of the functions that Html_process_word() calls require a null-terminator, and the return value of a_Html_parse_entities() *is* null-terminated.
Have I missed anything? If not, I'll push it.
A few years ago, parts of the parser were coded not to need null terminators (for speed's sake). It got really hard to maintain: every time a change was necessary, the whole function-call stack needed a review, and patches that looked semantically sound introduced bugs because library functions couldn't be used there either. One day I decided to use standard null-terminated strings and everything got easier. Not to mention there was no noticeable speed penalty. IOW, I learnt this the hard way! ;)

FWIW, profiling has surprised me a few times: once I wrote a minimal perfect hash for matching HTML elements, and the added complexity didn't pay off against a simple binary search.

-1

--
Cheers
Jorge.-