On Thu, May 22, 2008 at 11:55:26PM +0200, Johannes Hofmann wrote:
Hi Jorge,
On Thu, May 22, 2008 at 05:23:04PM -0400, Jorge Arellano Cid wrote:
Hi Johannes,
On Thu, May 22, 2008 at 12:21:51PM +0200, Johannes Hofmann wrote:
Hi,
to see which part of dillo consumes which amount of memory, I hacked together a version of dlib.c that counts memory that was allocated via dMalloc(). A malloc_tag allows to assign the mallocs to specific parts in the code. In the patch below I use malloc_tag = 1 for cache stuff, malloc_tag = 2 for text strings that are passed dw2 and malloc_tag = 0 for everything else. It's fun to see how memory get's allocated and free'd while loading. Here is the output once the huge mysql manual page is loaded:
====> malloc size 5185114 26967623 6602392 0
So that's 6M for text strings, 26M cache and 5M other. Of course there is also all the dw2 stuff that is not taken into account here.
Yes, and dw2 stuff should be the main part...
Yes, it probabely is, but I wanted to understand what parts use how much memory.
Understanding the code better is always a good thing.
One thing I've seen is that Local_Buf in html.cc keeps a copy of the page until it's deleted. Do you think we can free Local_Buf in closeParser() already? It works fine in my test.
Done.
Another thing is the way new incoming html is handled in DilloHtml::write(). It' get's decoded in a new string and is then copied to the end of Local_Buf. Maybe we could avoid the copy by letting the decoder write it to the right place?
I believe place is the best person to review this. He also knows about the internal handling of the decoders.
Other than that the memory usage (dlib based) seems reasonable to me.
:-)
BTW, textblock delete's the text strings that are allocated using malloc(). That's not allowed and might cause unexpected behaviour on some platforms.
This is clearly a bug.
A long time ago I thought of dw2 using dlib. Now dw2 is mainly using its own resources. The simple solution would be to let addText to allocate and delete the string. Another way is to study whether is worth to convert dw2 to dlib.
The easiest would be to use just use free() in the Textblock destructor but it would be cleaner to free the space where it was allocated.
I agree. BTW, it's funny how the test/ programs in dw also use strdup as a parameter to addText(). From an API point of view, it'd be cleaner if addText makes the allocation/deallocation of its data. Now, if we had gprof enabled we could see the impact of it.
It'd be great if we can have gprof profiling working on dillo again. Then use it to see what is the real impact of some optimizations.
A long long time ago (in dillo1), I used to heavily optimize functions to avoid extra strdups. It ended making the code a lot more complex and without any noticeable gain. After that we switched to the simple paradigm of having each function free what it allocated.
Yes, that's true. Some time ago I noticed the huge amount of malloc() and free() that happen while loading a page and I wrote an allocator that get's a large chunk via malloc and then hands out pieces for use in addText(). In the end all text could be free()'d with one call to free. But there was not much of a gain! The standard malloc routines seem to be quite optimized these days :-)
Yes. For the record, I keep this ancient comment in gif.c to enlighten me (because I fall from time to time): /* Notes 13 Oct 1997 --RLL * * Today, just for the hell of it, I implemented a new decoder from * scratch. It's oriented around pushing bytes, while the old decoder * was based around reads which may suspend. There were basically * three motivations. * * 1. To increase the speed. * * 2. To fix some bugs I had seen, most likely due to suspension. * * 3. To make sure that the code had no buffer overruns or the like. * * 4. So that the code could be released under a freer license. * * Let's see how we did on speed. I used a large image for testing * (fvwm95-2.gif). * * The old decoder spent a total of about 1.04 seconds decoding the * image. Another .58 seconds went into Image_line, almost * entirely conversion from colormap to RGB. * * The new decoder spent a total of 0.46 seconds decoding the image. * However, the time for Image_line went up to 1.01 seconds. * Thus, even though the decoder seems to be about twice as fast, * the net gain is pretty minimal. Could this be because of cache * effects? * * Lessons learned: The first, which I keep learning over and over, is * not to try to optimize too much. It doesn't work. Just keep things * simple. * * Second, it seems that the colormap to RGB conversion is really a * significant part of the overall time. It's possible that going * directly to 16 bits would help, but that's optimization again :) */ Maybe it's a good idea to have it somewhere in the website. I'd say that optimizations are worth when backed with numbers. It could be time, memory, gprof data, etc. e.g. the Local_Buf free was backed by the huge memory saving with huge pages as the mysql manual.
Can you give a hand at enabling profiling again?
I have it working for some time again. My problem was just DragonFly specific and I now have a workaround. Apart from that I compile dw2 and dillo2 with --enable-gprof and fltk2 with --disable-xft. Finally I manually link dillo with the -static and -pg options and the -lXau -lXdmcp libraries added.
I'll try to put some time on this. -- Cheers Jorge.-