Hi, I did some more tests regarding that image related memory leak, Jeremy reported some time ago. It turns out that memory related to aborted images is not released. With CSS and the related repushs this seems to happen frequently. But I can also trigger it without CSS by hitting Ctrl-R while loading a large image. Attached diagnostic patch shows the issue. Load a page with a single large JPEG image and hit Ctrl-R while it is loading. In that case only the second instance of DilloJpeg is deleted. I tried to figure out, why the first cache client is not getting aborted properly but the cache related code is quite complex... Does anyone know how this should work? Cheers, Johannes
On Thu, Apr 23, 2009 at 06:01:56PM +0200, Hofmann Johannes wrote:
It turns out that memory related to aborted images is not released.
What about the other way round, ie. memory being released and then accessed? If you check the invalid read errors at the valgrind logs page[1] for the past few days you will see they are all due to reading freed image memory. Is this another aspect of the same thing, or something else?
I tried to figure out, why the first cache client is not getting aborted properly but the cache related code is quite complex... Does anyone know how this should work?
No, I found that part of the code rather brainbending I'm afraid. Regards, Jeremy Henty [1] http://starurchin.org/dillo/valgrind.html
On Thu, Apr 23, 2009 at 09:36:23PM +0100, Jeremy Henty wrote:
On Thu, Apr 23, 2009 at 06:01:56PM +0200, Hofmann Johannes wrote:
It turns out that memory related to aborted images is not released.
What about the other way round, ie. memory being released and then accessed? If you check the invalid read errors at the valgrind logs page[1] for the past few days you will see they are all due to reading freed image memory. Is this another aspect of the same thing, or something else?
I'd say it's something different. It's always a read of size 4, isn't it? I guess that's just one uninitialized pointer. Cheers, Johannes
On Fri, Apr 24, 2009 at 06:20:20PM +0200, Hofmann Johannes wrote:
On Thu, Apr 23, 2009 at 09:36:23PM +0100, Jeremy Henty wrote:
On Thu, Apr 23, 2009 at 06:01:56PM +0200, Hofmann Johannes wrote:
It turns out that memory related to aborted images is not released.
What about the other way round, ie. memory being released and then accessed? If you check the invalid read errors at the valgrind logs page[1] for the past few days you will see they are all due to reading freed image memory. Is this another aspect of the same thing, or something else?
I'd say it's something different. It's always a read of size 4, isn't it? I guess that's just one uninitialized pointer.
I don't think that it's uninitialized pointers. If it were then you would expect valgrind to complain that the address had never been malloced or freed. That's not what we see. Eg. look at the first instance of 'Jpeg_write: Invalid read of size 4'[1]. The freed memory is a DilloImage structure. The line that does the invalid read is reading ... a member of a DilloImage structure! The same is true for the errors from dw::core::Widget::getLayout() , Png_datarow_callback and Png_datainfo_callback . In every case the pointer refers to a malloced structure of the right type. The problem is that the structure has since been freed. I think it's pretty clear that the problem is not uninitialised pointers, it's that the code is using pointers after freeing them. It's probably significant that these errors all come from within image callbacks (a_Jpeg_callback and a_Png_callback). It looks like a race condition to me: callbacks are running after the image has been freed. Does that make sense? Regards, Jeremy Henty [1] http://starurchin.singularity/dillo/valgrind/91efbf6a9f7430ca10f371db76030c4...
On Sat, Apr 25, 2009 at 01:18:19PM +0100, Jeremy Henty wrote:
On Fri, Apr 24, 2009 at 06:20:20PM +0200, Hofmann Johannes wrote:
On Thu, Apr 23, 2009 at 09:36:23PM +0100, Jeremy Henty wrote:
On Thu, Apr 23, 2009 at 06:01:56PM +0200, Hofmann Johannes wrote:
It turns out that memory related to aborted images is not released.
What about the other way round, ie. memory being released and then accessed? If you check the invalid read errors at the valgrind logs page[1] for the past few days you will see they are all due to reading freed image memory. Is this another aspect of the same thing, or something else?
I'd say it's something different. It's always a read of size 4, isn't it? I guess that's just one uninitialized pointer.
I don't think that it's uninitialized pointers. If it were then you would expect valgrind to complain that the address had never been malloced or freed. That's not what we see. Eg. look at the first instance of 'Jpeg_write: Invalid read of size 4'[1]. The freed memory is a DilloImage structure. The line that does the invalid read is reading ... a member of a DilloImage structure! The same is true for the errors from dw::core::Widget::getLayout() , Png_datarow_callback and Png_datainfo_callback . In every case the pointer refers to a malloced structure of the right type. The problem is that the structure has since been freed. I think it's pretty clear that the problem is not uninitialised pointers, it's that the code is using pointers after freeing them.
It's probably significant that these errors all come from within image callbacks (a_Jpeg_callback and a_Png_callback). It looks like a race condition to me: callbacks are running after the image has been freed. Does that make sense?
Yes, it makes sense to me. Have you found a simplified test-case to reproduce it? Finally now I have some time to investigate this bug. I'll start with the problem Johannes described (large jpeg) because of the test-case. Let's start... -- Cheers Jorge.-
On Thu, May 07, 2009 at 11:43:36AM -0400, Jorge Arellano Cid wrote:
Have you found a simplified test-case to reproduce it?
Unfortunately no. Here's what I know from good-old printf() debugging. It's definitely the case that sometimes an image callback is called with a freed DilloImage. In fact this happens a lot, much more often than the valgrind logs suggest, so I think it doesn't always cause a bad memory access. Mostly the callback is only called two extra times after the free, but sometimes it is called many more times than that. The only way this could happen is if the DilloImage is put in a DilloWeb and then Cache_client_dequeue() is called without the CCC being shut down. Cache_client_dequeue() is called at various places in cache.c but the whenever the problem appears it turns out that it was called from a_Cache_stop_client(). I have not yet traced the stack further back. I'll do that next and let you know what it shows. Regards, Jeremy Henty
You should note that I have applied this patch: http://starurchin.org/dillo/patch/img_ref.diff.txt I list all my local patches at http://starurchin.org/dillo/valgrind.html . Only img_ref.diff.txt affects how DilloImages are managed. Regards, Jeremy Henty
On Thu, Apr 23, 2009 at 06:01:56PM +0200, Hofmann Johannes wrote:
Hi,
I did some more tests regarding that image related memory leak, Jeremy reported some time ago. It turns out that memory related to aborted images is not released. With CSS and the related repushs this seems to happen frequently. But I can also trigger it without CSS by hitting Ctrl-R while loading a large image. Attached diagnostic patch shows the issue. Load a page with a single large JPEG image and hit Ctrl-R while it is loading. In that case only the second instance of DilloJpeg is deleted.
I tried to figure out, why the first cache client is not getting aborted properly but the cache related code is quite complex... Does anyone know how this should work?
The repository has a patch for this bug. It has the debug messages for easy testing. I've given it some stress but would like to see how it works in the outside world. Please give it some testing. -- Cheers Jorge.-
On Fri, May 08, 2009 at 05:08:35PM -0400, Jorge Arellano Cid wrote:
On Thu, Apr 23, 2009 at 06:01:56PM +0200, Hofmann Johannes wrote:
Hi,
I did some more tests regarding that image related memory leak, Jeremy reported some time ago. It turns out that memory related to aborted images is not released. With CSS and the related repushs this seems to happen frequently. But I can also trigger it without CSS by hitting Ctrl-R while loading a large image. Attached diagnostic patch shows the issue. Load a page with a single large JPEG image and hit Ctrl-R while it is loading. In that case only the second instance of DilloJpeg is deleted.
I tried to figure out, why the first cache client is not getting aborted properly but the cache related code is quite complex... Does anyone know how this should work?
The repository has a patch for this bug. It has the debug messages for easy testing. I've given it some stress but would like to see how it works in the outside world.
Please give it some testing.
Cool! Also partially loaded images I saw sometimes seem to be fixed now. Cheers, Johannes
participants (3)
-
jcid@dillo.org
-
Johannes.Hofmann@gmx.de
-
onepoint@starurchin.org