tight loop on IO error
When viewing the page [warning, many images]: <URL:http://primates.ximian.com/~miguel/all.html> dillo runs into a tight loop when there's a broken pipe error on retrieving one of the last couple of images. I've tracked down the cause; I'm running the 0.7.3 code, with Frank's patch, but have confirmed the presence of the code in HEAD via the WebCVS interface (wonderful thing). In IO/IO.c:IO_write(), if the write() returns (-1) then unless the error is one of EINTR or EGAIN, the case isn't handled and the "while (1) {}" loop runs wild. With a quick change (including new debug statement) I now see: IO_callback: [GIOcond 4] IO_write IO_write: Broken pipe [errno 32] [St -1] IO_write: aborting: Broken pipe [errno 32] The patch includes some extra cosmetic DEBUG changes which can be safely discarded (and the DEBUG_LEVEL change probably should be ...) -----------------------------< cut here >------------------------------- --- /home/pdp/tmp/zshmiVuTI Mon Sep 1 13:43:24 2003 +++ IO.c Mon Sep 1 13:42:58 2003 @@ -30,7 +30,7 @@ #include "../klist.h" #include "IO.h" -#define DEBUG_LEVEL 5 +#define DEBUG_LEVEL 1 #include "../debug.h" @@ -399,8 +399,12 @@ St = read(io->FD, io->Buf, io->BufSize); io->Status = St; - DEBUG_MSG(3, " IO_read: %s [errno %d] [St %d]\n", - g_strerror(errno), errno, St); + if (errno) { + DEBUG_MSG(3, " IO_read: %s [errno %d] [St %d]\n", + g_strerror(errno), errno, St); + } else { + DEBUG_MSG(3, " IO_read: [St %d] success\n", St); + } if ( St < 0 ) { /* Error */ @@ -446,18 +450,27 @@ St = write(io->FD, io->Buf, io->BufSize); io->Status = St; - DEBUG_MSG(3, " IO_write: %s [errno %d] [St %d]\n", - g_strerror(errno), errno, St); + if (errno) { + DEBUG_MSG(3, " IO_write: %s [errno %d] [St %d]\n", + g_strerror(errno), errno, St); + } else { + DEBUG_MSG(3, " IO_write: success [St %d]\n", St); + } if ( St < 0 ) { /* Error */ io->Status = -errno; - if (errno == EINTR) { + switch (errno) { + case EINTR: continue; - } else if (errno == EAGAIN) { + case EAGAIN: DEBUG_MSG(4, " IO_write: EAGAIN\n"); ret = TRUE; break; + default: + DEBUG_MSG(5, " IO_write: aborting: %s [errno %d]\n", + g_strerror(errno), errno); + return FALSE; } } else if ( St < io->BufSize ){ /* Not all data written */
Hi, On Mon, Sep 01, 2003 at 11:45:06AM +0000, Phil Pennock wrote:
When viewing the page [warning, many images]: ^^^^^^^^^^^^^^^^^^^^ and incorrect HTML :-) <URL:http://primates.ximian.com/~miguel/all.html> dillo runs into a tight loop when there's a broken pipe error on retrieving one of the last couple of images.
I've noticed similar behaviour a lot of times on the FreeBSD machine here but not on my Linux box. Since I couldn't actually pin point the exact problem I did not file a bug report yet. Occasionally I would use gdb to see what's up and found indeed broken pipes - but not always. Also it seemed to be hard to reproduce on my Linux machine.
I've tracked down the cause; I'm running the 0.7.3 code, with Frank's patch, but have confirmed the presence of the code in HEAD via the WebCVS interface (wonderful thing).
I noticed my lock-ups on both the current CVS version, the 0.7.3 release version and patched versions.
In IO/IO.c:IO_write(), if the write() returns (-1) then unless the error is one of EINTR or EGAIN, the case isn't handled and the "while (1) {}" loop runs wild.
The error conditions and the error handling is probably different enough on FreeBSD vs. Linux so that I notice it more on FreeBSD. Now, that the problem is clear, here is another way to trigger the problem on FreeBSD : try to access a non-responsive web server, and eventually CVS-dillo will freeze not noticing the timeout error, with the patch it no longer freezes ! Great ! (right now you can try www.sco.com, another DOS-attack ? :-) ... and wait 1-2 minutes until the timeout kicks in and watch the unpatched dillo hang with high CPU usage) Cheers, Andreas -- **************************** NEW ADDRESS ****************************** Hamburger Sternwarte Universitaet Hamburg Gojenbergsweg 112 Tel. ++49 40 42891 4016 D-21029 Hamburg, Germany Fax. ++49 40 42891 4198
On 2003-09-01 at 14:38 +0200, Andreas Schweitzer wrote:
I've noticed similar behaviour a lot of times on the FreeBSD machine here but not on my Linux box. Since I couldn't actually pin point the exact problem I did not file a bug report yet.
This was FreeBSD too. Perhaps some different error-code elsewhere? All I did was fix the problem for me -- it's possible that there's still some object cleanup elsewhere that really needs doing ... -- 2001: Blogging invented. Promises to change the way people bore strangers with banal anecdotes about their pets. <http://www.thelemon.net/issues/timeline.php>
All I did was fix the problem for me -- it's possible that there's still some object cleanup elsewhere that really needs doing ...
It almost looks like that there needs some more digging. I did some tests comparing Linux and FreeBSD. 1) The patch needs to be slightly modified for the Linux version, otherwise dillo won't even start correctly :-) 2a) When the timeout occurs while contacting unresponsive sites: The Linux version actually stops eventually and puts "ERROR: unable to connect to remote host" in the status bar. However, none of the error debug messgaes actually occur ... 2b) Same as 2a but under FreeBSD: dillo still keeps the "Contacting host" in the status bar, and the stop-icon is still unfaded. And the error message gets dumped as indicated in the patch. But I'm afraid the logic how all these routines are accessed is a bit too compliacted for me right now :-) Cheers, Andreas -- **************************** NEW ADDRESS ****************************** Hamburger Sternwarte Universitaet Hamburg Gojenbergsweg 112 Tel. ++49 40 42891 4016 D-21029 Hamburg, Germany Fax. ++49 40 42891 4198
On 2003-09-01 at 16:38 +0200, Andreas Schweitzer wrote:
I did some tests comparing Linux and FreeBSD. 1) The patch needs to be slightly modified for the Linux version, otherwise dillo won't even start correctly :-)
*blinks repeatedly* In that case, there's something happening on Linux where a write() at startup (cookie management, perhaps?) is failing the first time and succeeding later on, trying busily until it succeeds. This is the only thing that I can think of. Because the only _real_ change (ie: ignoring the cosmetic fluff to make the debugging easier on my eyes) was to change from: if (failure = a) { foo; /* break or continue or return */ } else if (failure = b) { bar; } to this: select(failure) { case a: foo; /* break or continue or return */ case b: bar; /* break or continue or return */ default: return FALSE; } This went on the theory that it's easier to spot that you need a default (and force yourself to always write a default handler) with the switch(){} syntax. I then went for the simplest handler. IF this changes behaviour from "appears to work" to "won't do X", then "X" relies upon repeatedly trying the write() inside a busy loop. Provided that the patch applied cleanly, that is. *shrugs* -- 2001: Blogging invented. Promises to change the way people bore strangers with banal anecdotes about their pets. <http://www.thelemon.net/issues/timeline.php>
participants (3)
-
Andreas Schweitzer
-
Phil Pennock
-
Phil Pennock