valgrind still unhappy despite expect changes

newer
prefs.show_url

older
valgrind doctree leak

jcid＠dillo.org

Aug. 29, 2011

4:08 p.m.

On Sun, Aug 28, 2011 at 06:53:54PM +0000, corvid wrote:

...

...
I can't reproduce this with valgrind-3.6.1 on wheezy.

No valgrind error report before or after these debug messages:

a_Nav_cancel_expect_if_eq:: calling a_Nav_cancel_expect Dillo: normal exit!

I tested by pushing a remote URL on CLI and closing the window before the page data starts flowing (i.e. to cancel the "expect"), and also with the cancelled download code path. Neither one produced an error.

I saw that there was another log entry the other day http://starurchin.org/dillo/valgrind/7ea9fc809376ddf7dde2908e2ecf999aea27413...

Do you think something sneaky could be going on, or does it seem like it must come down to configuration differences somehow?

AFAIS, either one is possible... Me not being able to reproduce it says it's most probably be OK for the codepaths I tested (FWIW, the ones I can see). If there's an untested codepath, well... OTOH, valgrind produces some false positives (its docs say it) and gcc optimization can also play a role... So, I'd try to find a simple test case that produces the log. Then, test this problematic codepath here, and see what comes out of the exercise. Can you reproduce the bug? Do you have a simple test case? -- Cheers Jorge.-

Show replies by date

corvid＠lavabit.com

August 2011

6:39 p.m.

Jorge wrote:

...

On Sun, Aug 28, 2011 at 06:53:54PM +0000, corvid wrote:

...
...
I can't reproduce this with valgrind-3.6.1 on wheezy.

No valgrind error report before or after these debug messages:

a_Nav_cancel_expect_if_eq:: calling a_Nav_cancel_expect Dillo: normal exit!

I tested by pushing a remote URL on CLI and closing the window before the page data starts flowing (i.e. to cancel the "expect"), and also with the cancelled download code path. Neither one produced an error.

I saw that there was another log entry the other day http://starurchin.org/dillo/valgrind/7ea9fc809376ddf7dde2908e2ecf999aea27413...

Do you think something sneaky could be going on, or does it seem like it must come down to configuration differences somehow?

AFAIS, either one is possible...

Me not being able to reproduce it says it's most probably be OK for the codepaths I tested (FWIW, the ones I can see). If there's an untested codepath, well...

OTOH, valgrind produces some false positives (its docs say it) and gcc optimization can also play a role...

So, I'd try to find a simple test case that produces the log. Then, test this problematic codepath here, and see what comes out of the exercise.

Can you reproduce the bug? Do you have a simple test case?

No. I haven't really scrutinized the code at all. I've tried a little "I wonder whether this could do it" just from looking at the traces supplied by valgrind, but without hitting upon the trick.

onepoint＠starurchin.org

1:35 a.m.

Jorge Arellano Cid wrote:

...

OTOH, valgrind produces some false positives (its docs say it) and gcc optimization can also play a role...

I compile Dillo with 'configure "CFLAGS=-g -O0" "CXXFLAGS= -g -O0"' to minimise false positives from gcc optimization. Regards, Jeremy Henty

jcid＠dillo.org

3:02 p.m.

Hi Jeremy, Good to see you in this thread! On Tue, Aug 30, 2011 at 12:34:59AM +0100, Jeremy Henty wrote:

...

Jorge Arellano Cid wrote:

...
OTOH, valgrind produces some false positives (its docs say it) and gcc optimization can also play a role...

I compile Dillo with 'configure "CFLAGS=-g -O0" "CXXFLAGS= -g -O0"' to minimise false positives from gcc optimization.

Ack. Given you got the log. Do you have a way to reproduce it, or at least have a memory of what you were doing? -- Cheers Jorge.-

onepoint＠starurchin.org

7:20 p.m.

Jorge Arellano Cid wrote:

...

On Tue, Aug 30, 2011 at 12:34:59AM +0100, Jeremy Henty wrote:

...
I compile Dillo with 'configure "CFLAGS=-g -O0" "CXXFLAGS= -g -O0"' to minimise false positives from gcc optimization.

Ack.

Given you got the log. Do you have a way to reproduce it, or at least have a memory of what you were doing?

Unfortunately no. I automatically run dillo inside valgrind and log the output. A cron job reads the logs and updates the reports. These days I rarely even look at the logs unless something in the mailing list prompts me to. I am wondering how to add debugging information that could help tie valgrind reports to Dillo's actions. Perhaps the CCC could log which of its chains is active and which URL it was serving? Then if dlib logged its allocations and frees maybe we could identify the URL that was responsible? Regards, Jeremy Henty

jcid＠dillo.org

9:01 p.m.

On Tue, Aug 30, 2011 at 06:19:38PM +0100, Jeremy Henty wrote:

...

Jorge Arellano Cid wrote:

...
On Tue, Aug 30, 2011 at 12:34:59AM +0100, Jeremy Henty wrote:

...
I compile Dillo with 'configure "CFLAGS=-g -O0" "CXXFLAGS= -g -O0"' to minimise false positives from gcc optimization.

Ack.

Given you got the log. Do you have a way to reproduce it, or at least have a memory of what you were doing?

Unfortunately no. I automatically run dillo inside valgrind and log the output. A cron job reads the logs and updates the reports. These days I rarely even look at the logs unless something in the mailing list prompts me to.

I am wondering how to add debugging information that could help tie valgrind reports to Dillo's actions. Perhaps the CCC could log which of its chains is active and which URL it was serving? Then if dlib logged its allocations and frees maybe we could identify the URL that was responsible?

Even having the URL wouldn't help much in this case because we need to know which "cancel" action triggered the last call to a_Nav_cancel_expect_if_eq(). This would require logging user actions. OTOH, trying to re-build a user action pattern from the CCC logs would be almost impossible for a human brain (if possible at all). ;) More technically: Each bw has its own nav_expect_url; a private copy of a DilloUrl. Valgrind complains about nav_expect_url, not the bw. So I wonder how can the bw be valid and not the nav_expect_url, which is either NULL or a private copy of a DilloUrl? -- Cheers Jorge.-

corvid＠lavabit.com

11:42 p.m.

Jorge wrote:

...

On Tue, Aug 30, 2011 at 06:19:38PM +0100, Jeremy Henty wrote:

...
Jorge Arellano Cid wrote:

...
On Tue, Aug 30, 2011 at 12:34:59AM +0100, Jeremy Henty wrote:

...
I compile Dillo with 'configure "CFLAGS=-g -O0" "CXXFLAGS= -g -O0"' to minimise false positives from gcc optimization.

Ack.

Given you got the log. Do you have a way to reproduce it, or at least have a memory of what you were doing?

Unfortunately no. I automatically run dillo inside valgrind and log the output. A cron job reads the logs and updates the reports. These days I rarely even look at the logs unless something in the mailing list prompts me to.

I am wondering how to add debugging information that could help tie valgrind reports to Dillo's actions. Perhaps the CCC could log which of its chains is active and which URL it was serving? Then if dlib logged its allocations and frees maybe we could identify the URL that was responsible?

Even having the URL wouldn't help much in this case because we need to know which "cancel" action triggered the last call to a_Nav_cancel_expect_if_eq(). This would require logging user actions.

OTOH, trying to re-build a user action pattern from the CCC logs would be almost impossible for a human brain (if possible at all). ;)

More technically:

Each bw has its own nav_expect_url; a private copy of a DilloUrl. Valgrind complains about nav_expect_url, not the bw. So I wonder how can the bw be valid and not the nav_expect_url, which is either NULL or a private copy of a DilloUrl?

In this one http://starurchin.org/dillo/valgrind/7ea9fc809376ddf7dde2908e2ecf999aea27413... at least, it looks pretty clear that the bw is gone.

jcid＠dillo.org

September 2011

2:58 p.m.

On Wed, Aug 31, 2011 at 09:42:21PM +0000, corvid wrote:

...

Jorge wrote:

...
On Tue, Aug 30, 2011 at 06:19:38PM +0100, Jeremy Henty wrote:

...
Jorge Arellano Cid wrote:

...
On Tue, Aug 30, 2011 at 12:34:59AM +0100, Jeremy Henty wrote:

...
I compile Dillo with 'configure "CFLAGS=-g -O0" "CXXFLAGS= -g -O0"' to minimise false positives from gcc optimization.

Ack.

Given you got the log. Do you have a way to reproduce it, or at least have a memory of what you were doing?

Unfortunately no. I automatically run dillo inside valgrind and log the output. A cron job reads the logs and updates the reports. These days I rarely even look at the logs unless something in the mailing list prompts me to.

I am wondering how to add debugging information that could help tie valgrind reports to Dillo's actions. Perhaps the CCC could log which of its chains is active and which URL it was serving? Then if dlib logged its allocations and frees maybe we could identify the URL that was responsible?

Even having the URL wouldn't help much in this case because we need to know which "cancel" action triggered the last call to a_Nav_cancel_expect_if_eq(). This would require logging user actions.

OTOH, trying to re-build a user action pattern from the CCC logs would be almost impossible for a human brain (if possible at all). ;)

More technically:

Each bw has its own nav_expect_url; a private copy of a DilloUrl. Valgrind complains about nav_expect_url, not the bw. So I wonder how can the bw be valid and not the nav_expect_url, which is either NULL or a private copy of a DilloUrl?

In this one http://starurchin.org/dillo/valgrind/7ea9fc809376ddf7dde2908e2ecf999aea27413... at least, it looks pretty clear that the bw is gone.

Oh, it may be my mistake then. ==28991== Invalid read of size 4 ==28991== at 0x805760A: a_Bw_expected_url (bw.c:336) bw.c:336 return bw->nav_expect_url; Does the invalid read apply to 'bw' or 'bw->nav_expect_url'? (I don't know valgrind's semantics on it) -- Cheers Jorge.-

johannes.hofmann＠gmx.de

3:39 p.m.

On Thu, Sep 01, 2011 at 09:57:39AM -0300, Jorge Arellano Cid wrote:

...

On Wed, Aug 31, 2011 at 09:42:21PM +0000, corvid wrote:

...
Jorge wrote:

...
On Tue, Aug 30, 2011 at 06:19:38PM +0100, Jeremy Henty wrote:

...
Jorge Arellano Cid wrote:

...
On Tue, Aug 30, 2011 at 12:34:59AM +0100, Jeremy Henty wrote:

...
I compile Dillo with 'configure "CFLAGS=-g -O0" "CXXFLAGS= -g -O0"' to minimise false positives from gcc optimization.

Ack.

Given you got the log. Do you have a way to reproduce it, or at least have a memory of what you were doing?

Unfortunately no. I automatically run dillo inside valgrind and log the output. A cron job reads the logs and updates the reports. These days I rarely even look at the logs unless something in the mailing list prompts me to.

I am wondering how to add debugging information that could help tie valgrind reports to Dillo's actions. Perhaps the CCC could log which of its chains is active and which URL it was serving? Then if dlib logged its allocations and frees maybe we could identify the URL that was responsible?

Even having the URL wouldn't help much in this case because we need to know which "cancel" action triggered the last call to a_Nav_cancel_expect_if_eq(). This would require logging user actions.

OTOH, trying to re-build a user action pattern from the CCC logs would be almost impossible for a human brain (if possible at all). ;)

More technically:

Each bw has its own nav_expect_url; a private copy of a DilloUrl. Valgrind complains about nav_expect_url, not the bw. So I wonder how can the bw be valid and not the nav_expect_url, which is either NULL or a private copy of a DilloUrl?

In this one http://starurchin.org/dillo/valgrind/7ea9fc809376ddf7dde2908e2ecf999aea27413... at least, it looks pretty clear that the bw is gone.

Oh, it may be my mistake then.

==28991== Invalid read of size 4 ==28991== at 0x805760A: a_Bw_expected_url (bw.c:336)

bw.c:336 return bw->nav_expect_url;

Does the invalid read apply to 'bw' or 'bw->nav_expect_url'? (I don't know valgrind's semantics on it)

I'd guess bw is no longer pointing to valid memory (i.e. has been free'd) and therefore reading the 4 byte nav_expect_url pointer causes the valgrind message.

jcid＠dillo.org

3:57 p.m.

On Thu, Sep 01, 2011 at 03:38:58PM +0200, Johannes Hofmann wrote:

...

On Thu, Sep 01, 2011 at 09:57:39AM -0300, Jorge Arellano Cid wrote:

...
On Wed, Aug 31, 2011 at 09:42:21PM +0000, corvid wrote:

...
Jorge wrote:

...
On Tue, Aug 30, 2011 at 06:19:38PM +0100, Jeremy Henty wrote:

...
Jorge Arellano Cid wrote:

...
On Tue, Aug 30, 2011 at 12:34:59AM +0100, Jeremy Henty wrote: > > I compile Dillo with 'configure "CFLAGS=-g -O0" "CXXFLAGS= -g > -O0"' to minimise false positives from gcc optimization.

Ack.

Given you got the log. Do you have a way to reproduce it, or at least have a memory of what you were doing?

Unfortunately no. I automatically run dillo inside valgrind and log the output. A cron job reads the logs and updates the reports. These days I rarely even look at the logs unless something in the mailing list prompts me to.

I am wondering how to add debugging information that could help tie valgrind reports to Dillo's actions. Perhaps the CCC could log which of its chains is active and which URL it was serving? Then if dlib logged its allocations and frees maybe we could identify the URL that was responsible?

Even having the URL wouldn't help much in this case because we need to know which "cancel" action triggered the last call to a_Nav_cancel_expect_if_eq(). This would require logging user actions.

OTOH, trying to re-build a user action pattern from the CCC logs would be almost impossible for a human brain (if possible at all). ;)

More technically:

Each bw has its own nav_expect_url; a private copy of a DilloUrl. Valgrind complains about nav_expect_url, not the bw. So I wonder how can the bw be valid and not the nav_expect_url, which is either NULL or a private copy of a DilloUrl?

In this one http://starurchin.org/dillo/valgrind/7ea9fc809376ddf7dde2908e2ecf999aea27413... at least, it looks pretty clear that the bw is gone.

Oh, it may be my mistake then.

==28991== Invalid read of size 4 ==28991== at 0x805760A: a_Bw_expected_url (bw.c:336)

bw.c:336 return bw->nav_expect_url;

Does the invalid read apply to 'bw' or 'bw->nav_expect_url'? (I don't know valgrind's semantics on it)

I'd guess bw is no longer pointing to valid memory (i.e. has been free'd) and therefore reading the 4 byte nav_expect_url pointer causes the valgrind message.

OK, this line gave light in that direction: ==28991== Address 0x6961a34 is 44 bytes inside a block of size 68 free'd ... and I observe that bw has 17 items, 17*4 = 68, and the expect URL is the 12th item, (12-1)*4 = 44. So it's an already freed 'bw' and not its expected URL. This starts to make sense... ;-) -- Cheers Jorge.-

jcid＠dillo.org

7:59 p.m.

On Thu, Sep 01, 2011 at 10:56:19AM -0300, Jorge Arellano Cid wrote:

...

On Thu, Sep 01, 2011 at 03:38:58PM +0200, Johannes Hofmann wrote:

...
On Thu, Sep 01, 2011 at 09:57:39AM -0300, Jorge Arellano Cid wrote:

...
On Wed, Aug 31, 2011 at 09:42:21PM +0000, corvid wrote:

...
Jorge wrote:

...

...
...
...
[...] In this one http://starurchin.org/dillo/valgrind/7ea9fc809376ddf7dde2908e2ecf999aea27413... at least, it looks pretty clear that the bw is gone.

...

...
...
[...] Does the invalid read apply to 'bw' or 'bw->nav_expect_url'? (I don't know valgrind's semantics on it)

I'd guess bw is no longer pointing to valid memory (i.e. has been free'd) and therefore reading the 4 byte nav_expect_url pointer causes the valgrind message.

OK, this line gave light in that direction:

==28991== Address 0x6961a34 is 44 bytes inside a block of size 68 free'd

... and I observe that bw has 17 items, 17*4 = 68, and the expect URL is the 12th item, (12-1)*4 = 44.

So it's an already freed 'bw' and not its expected URL.

This starts to make sense... ;-)

BTW, why there are two different stack traces? What does the second one mean with respect to the first? If I look the first one (in reading order), it's clear that a keyboard event closed a browser window cancelling the expect. This is, for instance, middle-click a link, hit Ctrl-W before the data stream starts to arrive (while dillo displays the "contacting host" message). In my machine this happens in the expected order with no problems. Can anybody reproduce the valgrind log using this procedure? OTOH, analyzing the code: AFAIS: * The only place where a bw is freed is in a_UIcmd_close_bw [3]: (the only call to a_Bw_free() in the codebase). * bw is valid at [1] * The problematic a_Nav_cancel_expect_if_eq() is called from inside the call chain at [2]. * No other a_UIcmd_close_bw() appears in the log, so how came bw to be invalid then? void a_UIcmd_close_bw(void *vbw) { BrowserWindow *bw = (BrowserWindow *)vbw; UI *ui = BW2UI(bw); CustTabs *tabs = ui->tabs(); 1 Layout *layout = (Layout*)bw->render_layout; MSG("a_UIcmd_close_bw bw=%p\n", bw); 2 a_Bw_stop_clients(bw, BW_Root + BW_Img + BW_Force); delete(layout); if (tabs) { tabs->remove_tab(ui); if (tabs->num_tabs() == 0) delete tabs->window(); } 3 a_Bw_free(bw); MSG("a_UIcmd_close_bw freed(bw=%p)\n", bw); } Any help is appreciatted. -- Cheers Jorge.-

jcid＠dillo.org

9:10 p.m.

On Thu, Sep 01, 2011 at 02:58:09PM -0300, Jorge Arellano Cid wrote:

...

On Thu, Sep 01, 2011 at 10:56:19AM -0300, Jorge Arellano Cid wrote:

...
On Thu, Sep 01, 2011 at 03:38:58PM +0200, Johannes Hofmann wrote:

...
On Thu, Sep 01, 2011 at 09:57:39AM -0300, Jorge Arellano Cid wrote:

...
On Wed, Aug 31, 2011 at 09:42:21PM +0000, corvid wrote:

...
Jorge wrote:

...
...
...
...
[...] In this one http://starurchin.org/dillo/valgrind/7ea9fc809376ddf7dde2908e2ecf999aea27413... at least, it looks pretty clear that the bw is gone.

...
...
...
[...] Does the invalid read apply to 'bw' or 'bw->nav_expect_url'? (I don't know valgrind's semantics on it)

I'd guess bw is no longer pointing to valid memory (i.e. has been free'd) and therefore reading the 4 byte nav_expect_url pointer causes the valgrind message.

OK, this line gave light in that direction:

==28991== Address 0x6961a34 is 44 bytes inside a block of size 68 free'd

... and I observe that bw has 17 items, 17*4 = 68, and the expect URL is the 12th item, (12-1)*4 = 44.

So it's an already freed 'bw' and not its expected URL.

This starts to make sense... ;-)

FYI, looking into the other direction (a false positive), we have this: "Valgrind Bug254420 - memory pool tracking broken" http://bugs.kde.org/show_bug.cgi?id=254420 fixed in 3.6.1. If Jeremy or anybody can find a way to reproduce the log, we'd be very close to nailing the issue down. @Jeremy: what do you get from valgrind using the procedure described in my previous email? TIA. -- Cheers Jorge.-

corvid＠lavabit.com

9:48 p.m.

Jorge wrote:

...

On Thu, Sep 01, 2011 at 10:56:19AM -0300, Jorge Arellano Cid wrote:

...
On Thu, Sep 01, 2011 at 03:38:58PM +0200, Johannes Hofmann wrote:

...
On Thu, Sep 01, 2011 at 09:57:39AM -0300, Jorge Arellano Cid wrote:

...
On Wed, Aug 31, 2011 at 09:42:21PM +0000, corvid wrote:

...
Jorge wrote:

...
...
...
...
[...] In this one http://starurchin.org/dillo/valgrind/7ea9fc809376ddf7dde2908e2ecf999aea27413... at least, it looks pretty clear that the bw is gone.

...
...
...
[...] Does the invalid read apply to 'bw' or 'bw->nav_expect_url'? (I don't know valgrind's semantics on it)

I'd guess bw is no longer pointing to valid memory (i.e. has been free'd) and therefore reading the 4 byte nav_expect_url pointer causes the valgrind message.

OK, this line gave light in that direction:

==28991== Address 0x6961a34 is 44 bytes inside a block of size 68 free'd

... and I observe that bw has 17 items, 17*4 = 68, and the expect URL is the 12th item, (12-1)*4 = 44.

So it's an already freed 'bw' and not its expected URL.

This starts to make sense... ;-)

BTW, why there are two different stack traces? What does the second one mean with respect to the first?

The top one shows the invalid read, and the bottom shows how the memory was freed that the top tried to read.

...

If I look the first one (in reading order), it's clear that a keyboard event closed a browser window cancelling the expect. This is, for instance, middle-click a link, hit Ctrl-W before the data stream starts to arrive (while dillo displays the "contacting host" message).

In my machine this happens in the expected order with no problems. Can anybody reproduce the valgrind log using this procedure?

OTOH, analyzing the code:

AFAIS:

* The only place where a bw is freed is in a_UIcmd_close_bw [3]: (the only call to a_Bw_free() in the codebase). * bw is valid at [1] * The problematic a_Nav_cancel_expect_if_eq() is called from inside the call chain at [2]. * No other a_UIcmd_close_bw() appears in the log, so how came bw to be invalid then?

void a_UIcmd_close_bw(void *vbw) { BrowserWindow *bw = (BrowserWindow *)vbw; UI *ui = BW2UI(bw); CustTabs *tabs = ui->tabs(); 1 Layout *layout = (Layout*)bw->render_layout;

MSG("a_UIcmd_close_bw bw=%p\n", bw); 2 a_Bw_stop_clients(bw, BW_Root + BW_Img + BW_Force); delete(layout); if (tabs) { tabs->remove_tab(ui); if (tabs->num_tabs() == 0) delete tabs->window(); } 3 a_Bw_free(bw); MSG("a_UIcmd_close_bw freed(bw=%p)\n", bw); }

Any help is appreciatted.

My guess is that it could be something like first, one bw asks for a page, and that becomes conn->bw, and then another bw starts also asking for the page, and the first bw closes, invalidating the conn->bw, and then the second bw closes and the cancel expect code tries to use the old bw. I'm sure it won't turn out to be exactly that, but when I saw that there was a conn->bw, it seemed that there have to be ways for a conn to outlast a bw.

jcid＠dillo.org

3:32 p.m.

On Thu, Sep 01, 2011 at 07:47:48PM +0000, corvid wrote:

...

Jorge wrote:

...
On Thu, Sep 01, 2011 at 10:56:19AM -0300, Jorge Arellano Cid wrote:

...
On Thu, Sep 01, 2011 at 03:38:58PM +0200, Johannes Hofmann wrote:

...
On Thu, Sep 01, 2011 at 09:57:39AM -0300, Jorge Arellano Cid wrote:

...
On Wed, Aug 31, 2011 at 09:42:21PM +0000, corvid wrote:

...
Jorge wrote:

...
...
...
...
[...] In this one http://starurchin.org/dillo/valgrind/7ea9fc809376ddf7dde2908e2ecf999aea27413... at least, it looks pretty clear that the bw is gone.

...
...
...
[...] Does the invalid read apply to 'bw' or 'bw->nav_expect_url'? (I don't know valgrind's semantics on it)

I'd guess bw is no longer pointing to valid memory (i.e. has been free'd) and therefore reading the 4 byte nav_expect_url pointer causes the valgrind message.

OK, this line gave light in that direction:

==28991== Address 0x6961a34 is 44 bytes inside a block of size 68 free'd

... and I observe that bw has 17 items, 17*4 = 68, and the expect URL is the 12th item, (12-1)*4 = 44.

So it's an already freed 'bw' and not its expected URL.

This starts to make sense... ;-)

BTW, why there are two different stack traces? What does the second one mean with respect to the first?

The top one shows the invalid read, and the bottom shows how the memory was freed that the top tried to read.

Thanks. Well, in that case we have a weird situation: Both stacks have: ==28991== by 0x8053F49: CustTabs::handle(int) (uicmd.cc:204) Which translates into a bw (window or tab) that's closed twice by using the keyboard. Strange as it seems, let's analyze it a bit: For __both_ cases: 1.- fltk is sending the SHORTCUT event for KEYS_CLOSE_TAB 2.- dillo receives it in CustTabs::handle() and gets the bw with: UI *ui = (UI*)wizard()->value(); BrowserWindow *bw = a_UIcmd_get_bw_by_widget(ui); (which means the ui is the current one and that the bw is in our valid bw list.) 3.- dillo calls a_UIcmd_close_bw() which: a) stops the clients b) tabs->remove_tab(ui) // This switches to another bw and removes the ui from wizard * Here the question is how can dillo find the ui and same bw, in the wizard as current, and in the valid bw list (after 3c), for a second time at 2.- ? c) a_Bw_free(bw) // the bw is removed from our valid list and freed. The only answer I can find for the above question is: The memory space for bw is reused with a new ui. This is: * create a bw (e.g. address 0x888). * close the bw, free the bw (0x888 is available again) * create a bw (malloc reuses address 0x888). * close the bw // Here we have the situation! * free the bw (0x888 is available again) which is compatible with: "Valgrind Bug254420 - memory pool tracking broken" http://bugs.kde.org/show_bug.cgi?id=254420 but, of course, I may be wrong...

...

...
[...] OTOH, analyzing the code:

AFAIS:

* The only place where a bw is freed is in a_UIcmd_close_bw [3]: (the only call to a_Bw_free() in the codebase). * bw is valid at [1] * The problematic a_Nav_cancel_expect_if_eq() is called from inside the call chain at [2]. * No other a_UIcmd_close_bw() appears in the log, so how came bw to be invalid then?

void a_UIcmd_close_bw(void *vbw) { BrowserWindow *bw = (BrowserWindow *)vbw; UI *ui = BW2UI(bw); CustTabs *tabs = ui->tabs(); 1 Layout *layout = (Layout*)bw->render_layout;

MSG("a_UIcmd_close_bw bw=%p\n", bw); 2 a_Bw_stop_clients(bw, BW_Root + BW_Img + BW_Force); delete(layout); if (tabs) { tabs->remove_tab(ui); if (tabs->num_tabs() == 0) delete tabs->window(); } 3 a_Bw_free(bw); MSG("a_UIcmd_close_bw freed(bw=%p)\n", bw); }

Any help is appreciatted.

My guess is that it could be something like first, one bw asks for a page, and that becomes conn->bw, and then another bw starts also asking for the page, and the first bw closes, invalidating the conn->bw, and then the second bw closes and the cancel expect code tries to use the old bw.

"bw" is a bit a misleading name, bacause each tab is a bw. OTOH, each tab has its own control panel and thus becomes a browser window object. In the case you propose there'd be two different bw.

...

...
...
...
...
...
>>>>>

Bottom line: This bug is currently holding the dillo-3.0 release. Given the analysis, that nobody knows how to reproduce it, that even if real, it doesn't do much harm, the situation is: if there's no news in a couple of days, the release will be made. -- Cheers Jorge.-

corvid＠lavabit.com

5:59 p.m.

Jorge wrote:

...

On Thu, Sep 01, 2011 at 07:47:48PM +0000, corvid wrote:

...
Jorge wrote:

...
On Thu, Sep 01, 2011 at 10:56:19AM -0300, Jorge Arellano Cid wrote:

...
On Thu, Sep 01, 2011 at 03:38:58PM +0200, Johannes Hofmann wrote:

...
On Thu, Sep 01, 2011 at 09:57:39AM -0300, Jorge Arellano Cid wrote:

...
On Wed, Aug 31, 2011 at 09:42:21PM +0000, corvid wrote: > Jorge wrote:

...
...
...
> [...] > In this one > http://starurchin.org/dillo/valgrind/7ea9fc809376ddf7dde2908e2ecf999aea27413... > at least, it looks pretty clear that the bw is gone.

...
...
...
[...] Does the invalid read apply to 'bw' or 'bw->nav_expect_url'? (I don't know valgrind's semantics on it)

I'd guess bw is no longer pointing to valid memory (i.e. has been free'd) and therefore reading the 4 byte nav_expect_url pointer causes the valgrind message.

OK, this line gave light in that direction:

==28991== Address 0x6961a34 is 44 bytes inside a block of size 68 free'd

... and I observe that bw has 17 items, 17*4 = 68, and the expect URL is the 12th item, (12-1)*4 = 44.

So it's an already freed 'bw' and not its expected URL.

This starts to make sense... ;-)

BTW, why there are two different stack traces? What does the second one mean with respect to the first?

The top one shows the invalid read, and the bottom shows how the memory was freed that the top tried to read.

Thanks.

Well, in that case we have a weird situation:

Both stacks have:

==28991== by 0x8053F49: CustTabs::handle(int) (uicmd.cc:204)

Which translates into a bw (window or tab) that's closed twice by using the keyboard.

Strange as it seems, let's analyze it a bit: For __both_ cases:

1.- fltk is sending the SHORTCUT event for KEYS_CLOSE_TAB 2.- dillo receives it in CustTabs::handle() and gets the bw with:

UI *ui = (UI*)wizard()->value(); BrowserWindow *bw = a_UIcmd_get_bw_by_widget(ui);

(which means the ui is the current one and that the bw is in our valid bw list.)

3.- dillo calls a_UIcmd_close_bw() which:

a) stops the clients b) tabs->remove_tab(ui) // This switches to another bw and removes the ui from wizard

* Here the question is how can dillo find the ui and same bw, in the wizard as current, and in the valid bw list (after 3c), for a second time at 2.- ?

c) a_Bw_free(bw) // the bw is removed from our valid list and freed.

The only answer I can find for the above question is: The memory space for bw is reused with a new ui. This is:

* create a bw (e.g. address 0x888). * close the bw, free the bw (0x888 is available again) * create a bw (malloc reuses address 0x888). * close the bw // Here we have the situation! * free the bw (0x888 is available again)

which is compatible with:

"Valgrind Bug254420 - memory pool tracking broken" http://bugs.kde.org/show_bug.cgi?id=254420

but, of course, I may be wrong...

...
...
[...] OTOH, analyzing the code:

AFAIS:

* The only place where a bw is freed is in a_UIcmd_close_bw [3]: (the only call to a_Bw_free() in the codebase). * bw is valid at [1] * The problematic a_Nav_cancel_expect_if_eq() is called from inside the call chain at [2]. * No other a_UIcmd_close_bw() appears in the log, so how came bw to be invalid then?

void a_UIcmd_close_bw(void *vbw) { BrowserWindow *bw = (BrowserWindow *)vbw; UI *ui = BW2UI(bw); CustTabs *tabs = ui->tabs(); 1 Layout *layout = (Layout*)bw->render_layout;

MSG("a_UIcmd_close_bw bw=%p\n", bw); 2 a_Bw_stop_clients(bw, BW_Root + BW_Img + BW_Force); delete(layout); if (tabs) { tabs->remove_tab(ui); if (tabs->num_tabs() == 0) delete tabs->window(); } 3 a_Bw_free(bw); MSG("a_UIcmd_close_bw freed(bw=%p)\n", bw); }

Any help is appreciatted.

My guess is that it could be something like first, one bw asks for a page, and that becomes conn->bw, and then another bw starts also asking for the page, and the first bw closes, invalidating the conn->bw, and then the second bw closes and the cancel expect code tries to use the old bw.

"bw" is a bit a misleading name, bacause each tab is a bw.

OTOH, each tab has its own control panel and thus becomes a browser window object.

In the case you propose there'd be two different bw.

Right. Why do you feel that they can't be two bws that were both closed from the keyboard?

...

Bottom line:

This bug is currently holding the dillo-3.0 release.

Given the analysis, that nobody knows how to reproduce it, that even if real, it doesn't do much harm, the situation is: if there's no news in a couple of days, the release will be made.

No objections here.

jcid＠dillo.org

6:49 p.m.

On Fri, Sep 02, 2011 at 03:59:17PM +0000, corvid wrote:

...

Jorge wrote:

...
On Thu, Sep 01, 2011 at 07:47:48PM +0000, corvid wrote:

...
Jorge wrote:

...
On Thu, Sep 01, 2011 at 10:56:19AM -0300, Jorge Arellano Cid wrote:

...

...
...
...
[...] Any help is appreciatted.

My guess is that it could be something like first, one bw asks for a page, and that becomes conn->bw, and then another bw starts also asking for the page, and the first bw closes, invalidating the conn->bw, and then the second bw closes and the cancel expect code tries to use the old bw.

"bw" is a bit a misleading name, bacause each tab is a bw.

OTOH, each tab has its own control panel and thus becomes a browser window object.

In the case you propose there'd be two different bw.

Right. Why do you feel that they can't be two bws that were both closed from the keyboard?

I have no problems with that. What I don't see is two bws with the same memory address being closed from the keyboard. But now I see what you mean: which is a lingering bw in conn, (which is compatible with my analysis), as there's no need for it to exist in the UI when the event arrives. So we'de have two close events handled OK by two perfectly sane bws, but one call made on an already defunct bw (i.e. not present in the UI). And you hit it on the nail, because now I can reproduce it!!! -- Cheers Jorge.-

jcid＠dillo.org

9:55 p.m.

On Fri, Sep 02, 2011 at 01:48:29PM -0300, Jorge Arellano Cid wrote:

...

On Fri, Sep 02, 2011 at 03:59:17PM +0000, corvid wrote:

...
Jorge wrote:

...
On Thu, Sep 01, 2011 at 07:47:48PM +0000, corvid wrote:

...
Jorge wrote:

...
On Thu, Sep 01, 2011 at 10:56:19AM -0300, Jorge Arellano Cid wrote:

...
...
...
...
[...] Any help is appreciatted.

My guess is that it could be something like first, one bw asks for a page, and that becomes conn->bw, and then another bw starts also asking for the page, and the first bw closes, invalidating the conn->bw, and then the second bw closes and the cancel expect code tries to use the old bw.

"bw" is a bit a misleading name, bacause each tab is a bw.

OTOH, each tab has its own control panel and thus becomes a browser window object.

In the case you propose there'd be two different bw.

Right. Why do you feel that they can't be two bws that were both closed from the keyboard?

I have no problems with that.

What I don't see is two bws with the same memory address being closed from the keyboard.

But now I see what you mean: which is a lingering bw in conn, (which is compatible with my analysis), as there's no need for it to exist in the UI when the event arrives.

So we'de have two close events handled OK by two perfectly sane bws, but one call made on an already defunct bw (i.e. not present in the UI).

And you hit it on the nail, because now I can reproduce it!!!

Patch committed. FWIW, the testcase I used produced: ==17336== ERROR SUMMARY: 115 errors from 76 contexts (suppressed: 13 from 10) without the patch, and ==17468== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 13 from 10) with it. :-) -- Cheers Jorge.-

5243

Age (days ago)

5247

Last active (days ago)

List overview

Download

16 comments

4 participants

participants (4)

corvid＠lavabit.com
jcid＠dillo.org
johannes.hofmann＠gmx.de
onepoint＠starurchin.org