Improved stability patch for the dpi framework!
Hi Ferdi (et al.),

After spending the last few weeks hunting down bugs deep in the dpi framework, I've managed to track them down and, finally, to fix them!

From time to time there were multiple dpid instances. This was due to a race condition while waiting for dpid to come "online": if the daemon took around a second to start, dillo could end up launching several instances of it. (I left a commented-out sleep in the dpid code that reliably reproduces it.)

Another nasty bug was that dillo sometimes crashed or locked up when requesting a dpi service, for instance:

  launch dillo -> ask for bm -> kill dpid -> ask for bm -> crash

...this was caused by FD inheritance! The bm dpi survived and kept its parent's FD open, so the connect-test succeeded but there was no server to answer. This pair of bugs was really hard to track down.

As a by-product, lsof showed an FD leak in dpi.c (one per dpi request) and another one for dillorc.

All of these bugs are fixed in CVS. The good news is that this is a big step towards the 0.8.0 release, because stabilizing the dpi framework is a necessity.

There is still at least one pending thing to polish, but it is not a big problem to live with it in 0.8.0 because it causes little harm: if dpid crashes (or is killed), the dpi servers survive, but the new dpid will launch new processes for them (the old ones will last until their bore time is reached). No crashes, no lockups.

My first idea was to have dpid run a connection test against the dpi servers at start time, but unfortunately the parent-child relationship would be lost for the second dpid, so we have to find another way to tell dpid that a dpi server has finished. The "srs" may be used, although it may be a bit tricky to do when the dpi server has crashed; in any case, the connection test from dpid can detect the crash and replace the old server with a new one.

Cheers,
Jorge.-
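A minimal sketch of one common way to avoid the FD inheritance described above: mark the daemon's own descriptors close-on-exec, so a dpi server started with fork()+exec() does not keep them open after the daemon dies. This is not necessarily how the CVS fix does it; `make_private_socket` is a hypothetical helper, and closing the descriptor in the child right after fork() is an equally valid alternative.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Mark fd close-on-exec so children started with fork()+exec()
 * (e.g. dpi servers) do not inherit it. Returns 0 on success, -1 on error. */
static int set_cloexec(int fd)
{
   int flags = fcntl(fd, F_GETFD);
   if (flags == -1)
      return -1;
   return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}

/* Hypothetical helper: create a Unix-domain stream socket that will be
 * closed automatically in any exec'd child. Returns the fd or -1. */
static int make_private_socket(void)
{
   int fd = socket(AF_UNIX, SOCK_STREAM, 0);
   if (fd == -1)
      return -1;
   if (set_cloexec(fd) == -1) {
      close(fd);               /* don't leak the fd on failure either */
      return -1;
   }
   return fd;
}
```

With the flag set, once the daemon exits, nothing else holds its listening socket, so a later connect-test fails instead of succeeding against a process that will never answer.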
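The dpid startup race can be illustrated with a sketch that launches the daemon at most once and then polls until it answers, instead of re-checking and relaunching while it is still starting up. `ensure_daemon`, its probe/launch callbacks and the `fake_daemon` simulation are hypothetical names for illustration, not dillo's code. The sketch only covers a single dillo process; two dillo instances starting at the same moment would still need some cross-process lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <time.h>

/* Hypothetical callbacks: "is the daemon answering?" and "fork/exec it". */
typedef bool (*probe_fn)(void *ctx);
typedef int (*launch_fn)(void *ctx);

/* Start the daemon at most once, then wait for it to come online.
 * Returns 0 once probe() succeeds, -1 on launch failure or timeout. */
static int ensure_daemon(probe_fn probe, launch_fn launch, void *ctx,
                         int max_tries, long poll_ms)
{
   struct timespec ts = { poll_ms / 1000, (poll_ms % 1000) * 1000000L };

   if (probe(ctx))
      return 0;                /* already running: nothing to launch */
   if (launch(ctx) != 0)
      return -1;               /* the one and only launch attempt */
   for (int i = 0; i < max_tries; i++) {
      nanosleep(&ts, NULL);    /* give the daemon time to start listening */
      if (probe(ctx))
         return 0;
   }
   return -1;                  /* timed out; do not relaunch blindly */
}

/* Simulated slow daemon: it only answers after being probed
 * `startup_polls` times following its launch. */
struct fake_daemon {
   int launches;
   int polls_since_launch;
   int startup_polls;
};

static bool fake_probe(void *ctx)
{
   struct fake_daemon *d = ctx;
   if (d->launches == 0)
      return false;
   return ++d->polls_since_launch > d->startup_polls;
}

static int fake_launch(void *ctx)
{
   ((struct fake_daemon *)ctx)->launches++;
   return 0;
}
```

However slow the simulated daemon is, `launches` never exceeds 1; a timeout is reported to the caller rather than triggering a second instance.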
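The connection test from dpid mentioned above could look roughly like this sketch, assuming each dpi server listens on a Unix-domain socket at a known path (`dpi_server_alive` is a hypothetical name). It is only meaningful once FD inheritance is fixed: a listening socket held open by a surviving process still lets connect() succeed even if nobody will answer. A server probed this way also sees a connection that sends no request, so it has to tolerate that.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Return true if some process is listening on the Unix-domain socket at
 * `path`. A socket file left behind by a crashed server makes connect()
 * fail (ECONNREFUSED), and a missing file fails with ENOENT. */
static bool dpi_server_alive(const char *path)
{
   struct sockaddr_un sa;
   int fd;
   bool alive;

   if (strlen(path) >= sizeof(sa.sun_path))
      return false;            /* path does not fit in sun_path */
   memset(&sa, 0, sizeof(sa));
   sa.sun_family = AF_UNIX;
   strcpy(sa.sun_path, path);

   fd = socket(AF_UNIX, SOCK_STREAM, 0);
   if (fd == -1)
      return false;
   alive = connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0;
   close(fd);                  /* one probe must not leak one FD */
   return alive;
}
```

If the probe reports a dead server, dpid can remove the stale socket file and start a fresh server in its place.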
Jorge Arellano Cid