Hi Jorge,

On Mon, Jun 23, 2008 at 05:44:11PM -0400, Jorge Arellano Cid wrote:
On Sun, Jun 22, 2008 at 10:11:30AM -0400, Jorge Arellano Cid wrote:
-------------------------------------------
Huge page with different memory allocators:
-------------------------------------------

%MEM    VSZ    RSS TTY      STAT START   TIME COMMAND
 5.4 134912 113740 pts/17   S+   18:56   0:02 ./dillo-fltk   step: 2x
 5.4 134916 113752 pts/17   S+   18:57   0:02 ./dillo-fltk   step: 2x
 5.4 134828 113660 pts/17   S+   18:58   0:02 ./dillo-fltk   step: 2x

 5.0 114940 105432 pts/17   S+   18:53   0:02 ./dillo-fltk   step: 1
 5.0 111976 105488 pts/17   S+   18:54   0:03 ./dillo-fltk   step: 1
 5.0 114872 105416 pts/17   S+   18:54   0:02 ./dillo-fltk   step: 1

 5.1 117796 106436 pts/17   S+   18:59   0:02 ./dillo-fltk   step: cst
 5.1 117568 106404 pts/17   S+   19:00   0:02 ./dillo-fltk   step: cst
 5.1 117820 106460 pts/17   S+   19:00   0:02 ./dillo-fltk   step: cst
This data comes from an attempt to reduce memory usage by tuning the simpleVector memory allocator. The first set is the current allocator, which allocates in chunks of 2*numAlloc; the second allocates one by one; and the third is a custom one I made.
The 1-allocator reduces memory usage the most, but it's roughly 20% slower. The cst-allocator saves a bit less but is almost as fast as the 2x-allocator (2% slower or so).
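The trade-off between the 2x and the one-by-one strategies can be illustrated with a small simulation. This is only a sketch, not dillo code: `GrowthStats`, `grow_double`, `grow_exact` and `simulate` are hypothetical names, and the starting capacity of one element is an assumption. It counts how often a buffer would have to grow while elements are appended one at a time.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type, not part of dillo.
struct GrowthStats {
    std::size_t reallocs;   // number of times the capacity had to grow
    std::size_t capacity;   // final allocated element count
};

// "2x" policy: when the buffer is full, double the allocation.
inline std::size_t grow_double(std::size_t cap, std::size_t /*needed*/) {
    return cap * 2;
}

// "1" policy: allocate exactly as many elements as are needed.
inline std::size_t grow_exact(std::size_t /*cap*/, std::size_t needed) {
    return needed;
}

// Simulates appending n elements one at a time under the given policy.
template <typename Policy>
GrowthStats simulate(std::size_t n, Policy grow) {
    GrowthStats s{0, 1};    // assumption: start with room for one element
    for (std::size_t num = 1; num <= n; ++num) {
        if (num > s.capacity) {
            s.capacity = grow(s.capacity, num);
            assert(s.capacity >= num);   // the policy must make enough room
            ++s.reallocs;
        }
    }
    return s;
}
```

For 1000 appended elements, `simulate(1000, grow_double)` grows the buffer 10 times and ends with 24 unused slots, while `simulate(1000, grow_exact)` grows it 999 times and leaves none unused. Whether each growth step is expensive depends on how the platform's realloc behaves.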
Memory reduction:
                  VSZ    RSS
-----------------------------
1-allocator       15%    7.3%
cst-allocator     13%    6.5%
-----------------------------
After some more study I arrived at a second allocator, cst2, that reduces memory almost as much as the 1-allocator while being as fast as (or faster than) the 2x one!
%MEM    VSZ    RSS TTY      STAT START   TIME COMMAND
 5.0 114940 105432 pts/17   S+   18:53   0:02 ./dillo-fltk   step: 1
 5.0 111976 105488 pts/17   S+   18:54   0:03 ./dillo-fltk   step: 1
 5.0 114872 105416 pts/17   S+   18:54   0:02 ./dillo-fltk   step: 1

 5.1 113224 105820 pts/17   S+   13:57   0:02 ./dillo-fltk   step: cst2
 5.0 112948 105740 pts/17   S+   13:58   0:02 ./dillo-fltk   step: cst2
 5.0 115528 105628 pts/17   S+   13:59   0:02 ./dillo-fltk   step: cst2
So now we have:
-----------------------------
Memory reduction:

                  VSZ    RSS
-----------------------------
1-allocator       15%    7.3%
cst-allocator     13%    6.5%
cst2-allocator    15%    7.3%
-----------------------------
Committed.
I have some concerns regarding the new SimpleVector implementation:

* It introduces two arbitrary numbers (100 as the limit for switching to a different behaviour, and the 1/10 factor) which are optimized for a specific page on a specific platform. The original 2x method feels more general to me, even though it may be a bit less memory efficient.
* It probably only works fast because most malloc libraries use some heuristics to avoid a memcpy for every realloc. Depending on this behaviour may create problems on more exotic platforms.
* Rather than ignoring the initAlloc parameter, I would propose reconsidering the places where SimpleVector() is initialized with initAlloc > 1. There is also an interesting comment about initAlloc in the Textblock constructor.

Having said all this, it's not a big issue for me. If CPU usage should really become a problem on some platform, we could easily change the code. It's just 2 or 3 lines after all.

Cheers,
Johannes
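For readers following the thread, the two constants questioned above could enter a growth rule roughly as follows. This is a guess at the shape of the rule, not the committed code. It assumes exact-fit allocation below 100 elements and about 10% headroom from 100 upwards. `kThreshold`, `kSlackDivisor`, `next_capacity` and `count_reallocs` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical constants standing in for the two "arbitrary numbers".
constexpr std::size_t kThreshold = 100;    // below this, allocate exactly
constexpr std::size_t kSlackDivisor = 10;  // from here on, add num/10 headroom

// Returns the element capacity to request once `num` elements are needed.
// Assumed rule (not confirmed by the thread): exact fit for small vectors,
// roughly 10% extra for large ones.
constexpr std::size_t next_capacity(std::size_t num) {
    return (num < kThreshold) ? num : num + num / kSlackDivisor;
}

// Counts how many times a vector growing one element at a time
// up to n elements would have to reallocate under this rule.
inline std::size_t count_reallocs(std::size_t n) {
    std::size_t cap = 0;
    std::size_t reallocs = 0;
    for (std::size_t num = 1; num <= n; ++num) {
        if (num > cap) {
            cap = next_capacity(num);
            assert(cap >= num);   // the rule never shrinks below the need
            ++reallocs;
        }
    }
    return reallocs;
}
```

Under these assumptions every append below the threshold triggers a reallocation, so small vectors lean entirely on realloc being cheap, which is the portability concern raised in the second point. Changing either constant is a one-line edit.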