[Dillo-dev] experimental patch: Re: dillo2 wrapping Chinese

Aug. 2, 2009

      On Sun, Aug 02, 2009 at 01:04:26AM +0000, corvid wrote:
...
I made yet another version of this patch the other day,
and furaisanjin was kind enough to run on it a bit and
ensure it's doing what we want.
With the previous version, compiling without optimization,
gprof showed 3-4% loss of speed on a page that's mostly
English text.
- I removed the strdup/free from the common case.
- For nonideographic, it goes character by character instead of
  checking every byte. This should help for, e.g., Russian,
  but means some 'unnecessary' length-checking in the
  mostly-ASCII case.
And gprof suggested the slowdown might be 3% now, although
at this point, unless someone is paying you to perform a
lot of careful, drudgery-filled runs, it gets hard to pick the
signal out of the noise when examining the time spent in relevant
functions.
I wouldn't go with complex code for 1% gain.
...
I tried adding in some code to bypass the ideographic call for
ASCII, but 1) it made the code a little messy 2) it looked like
it might be saving a few tenths of a percent 3) the compiler
might perform optimizations under certain circumstances making
my efforts unnecessary or counterproductive.
I could definitely be talked into this one, though.
Please leave the messy part out.
...
I tried adding an addText() with a length argument and thought
it might help by saving me the trouble of writing NULL bytes into
the strings and saving the trouble of some strlen()ning in dw.
I couldn't see any difference in speed, although it did make
things cleaner and I would want to add it in eventually if this
all makes it into the repository.
Considering it makes things cleaner I'm for it.
...
There will be a duplicated ideographic() test, for instance, if
a word starts with digits and ends with some Japanese characters.
1) uncommon 2) complexity
Not worth the trouble of.
...
This patch is really two patches because it also handles the
case where whitespace numeric character references are
inside a word, but I wanted to see the effects of both
and they are each an excuse for the overhead of the loop.
I'd split it into two pieces if committing it.
OK.

  (i.e. please consider the above comments, split and commit).

  BTW: I'm working on the limit concurrent connections patch prototype.

-- 
  Cheers
  Jorge.-

[Dillo-dev] experimental patch: Re: dillo2 wrapping Chinese

jcid＠dillo.org