On Fri, Nov 26, 2004 at 02:14:53PM -0200, Livio Baldini Soares wrote:
Hi Jorge!
Jorge Arellano Cid writes:
Errata:
but it usually ends with:
jcid 19262 0.0 0.4 1652 400 pts/10 S+ 08:18 0:00 ./pth_mem2 cd
[...]
jcid 19027 0.0 0.4 2676 408 pts/10 S+ 08:14 0:00 ./pth_mem2 cd
Oh, the PID and time in the above example are quoted wrong; it should say:
jcid 19262 0.0 0.4 1652 400 pts/10 S+ 08:18 0:00 ./pth_mem2 cd
[...]
jcid 19262 0.0 0.4 2676 408 pts/10 S+ 08:18 0:00 ./pth_mem2 cd
I can't really tell if that's a leak or not. On my system there is an increased use of memory between the first iteration and the second, but from then on things become _very_ stable (box with Linux 2.4 and libc 2.3.2 from Debian/unstable):
I understand. On my box the "leak" usually occurs near the fifth iteration, if at all...
livio 20097 0.0 0.0 1664 336 pts/21 S+ 10:24 0:00 ./pth_mem cd
livio 20097 0.0 0.0 32556 468 pts/21 S+ 10:24 0:00 ./pth_mem cd
livio 20097 0.0 0.0 1796 396 pts/21 S+ 10:24 0:00 ./pth_mem cd
livio 20097 0.0 0.0 32556 476 pts/21 S+ 10:24 0:00 ./pth_mem cd
livio 20097 0.0 0.0 1796 396 pts/21 S+ 10:25 0:00 ./pth_mem cd
So there is certainly an increase in memory use (in my case a lot smaller than yours). But look, there is no memory leak, because on each iteration the memory use is equal to that of the previous phase.
Or do you get an increase of 0.5-1 MiB for _each_ phase?
No, but I get a "random" one in one of them. For instance, I just ran this one and there was no leak:

<q>
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
jcid 26194 0.0 0.3 1640 336 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.5 32412 476 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.4 1652 404 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.5 32412 484 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.4 1652 404 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.5 32412 484 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.4 1652 404 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.5 32412 484 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.4 1652 404 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.5 32412 484 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.4 1652 404 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.5 32412 484 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.4 1652 404 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.5 32412 484 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.4 1652 404 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.5 32412 484 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
jcid 26194 0.0 0.4 1652 404 pts/29 S+ 17:50 0:00 ./pth_mem2 cd
</q>

Then I ran this other one (with a leak):

<q>
jcid 26878 0.0 0.3 1640 336 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.5 32412 472 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.4 1652 400 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.5 32412 480 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.4 1652 400 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.5 32412 480 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.4 1652 400 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.5 33436 484 pts/29 S+ 18:27 0:00 ./pth_mem2 d   *
jcid 26878 0.0 0.4 2676 404 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.5 33436 484 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.4 2676 404 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.5 33436 484 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.4 2676 404 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.5 33436 484 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.4 2676 404 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.5 33436 484 pts/29 S+ 18:27 0:00 ./pth_mem2 d
jcid 26878 0.0 0.4 2676 404 pts/29 S+ 18:27 0:00 ./pth_mem2 d
</q>

I can easily get results with and without leaks for the three ways, though. From my tests, it seems the "leak" can appear on any iteration.
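For reference, the general shape of this kind of per-phase test is roughly the following. This is a simplified, hypothetical sketch, not the real pth_mem2 source: the phase and thread counts, the detached-thread choice, the sleep() and the worker body are placeholders. It just spawns a batch of threads per phase, lets them finish, and samples VSZ/RSS from /proc/self/statm in between.

<q>
/* Hypothetical sketch only -- NOT the actual pth_mem2 source.
 * Each phase spawns a batch of detached threads, waits a bit so they
 * can exit, and then prints VSZ/RSS read from /proc/self/statm. */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define THREADS_PER_PHASE 10
#define PHASES 8

static void *worker(void *arg)
{
   (void)arg;
   usleep(1000);              /* stand-in for the real per-thread work */
   return NULL;
}

static void print_mem(int phase)
{
   unsigned long vsz = 0, rss = 0;
   FILE *f = fopen("/proc/self/statm", "r");

   if (f && fscanf(f, "%lu %lu", &vsz, &rss) == 2)
      printf("phase %d: VSZ %lu KiB, RSS %lu KiB\n",
             phase, vsz * 4, rss * 4);       /* assumes 4 KiB pages */
   if (f)
      fclose(f);
}

int main(void)
{
   pthread_attr_t attr;
   int phase, i;

   pthread_attr_init(&attr);
   pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

   for (phase = 0; phase < PHASES; phase++) {
      for (i = 0; i < THREADS_PER_PHASE; i++) {
         pthread_t tid;
         if (pthread_create(&tid, &attr, worker, NULL) != 0)
            perror("pthread_create");
      }
      sleep(1);               /* crude: let the detached threads finish */
      print_mem(phase);
   }
   pthread_attr_destroy(&attr);
   return 0;
}
</q>

The numbers it prints are essentially the same VSZ/RSS that ps reports above, just sampled from inside the process.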
To make my point clearer, look at how my home box handles the fixed test program (I'm using Linux kernel 2.6 with NPTL, and libc 2.3.2 from Debian/unstable):
livio 4300 0.0 0.0 1440 316 pts/7 S+ 10:40 0:00 ./pth_mem d
livio 4300 0.0 0.0 93812 432 pts/7 Sl+ 10:40 0:00 ./pth_mem d
livio 4300 0.0 0.0 34356 372 pts/7 S+ 10:40 0:00 ./pth_mem d
livio 4300 0.0 0.0 93812 436 pts/7 Sl+ 10:40 0:00 ./pth_mem d
etc. Even though it makes humongous use of memory, I'm pretty sure it's not leaking (since the numbers are the same after the first iteration).
A thing that's not happening on my box.
Looking at straces, I see that libc is doing mmap2() on anonymous memory for allocation, and in the first phase it actually allocates 8 MiB worth of stack for each thread! (I know it's stack because it sets PROT_EXEC on, which it doesn't for regular malloc()s.) Subsequently, each thread mmap2()s _2_ MiB of anonymous memory (instead of only one).
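As an aside (an illustration, not something taken from the strace above): the 8 MiB per thread is only the default stack reservation; a program can ask for a smaller one with pthread_attr_setstacksize(). A minimal sketch, where the 64 KiB figure is just an example value:

<q>
/* Sketch: create a thread with a smaller stack reservation than the
 * glibc default (8 MiB here).  The 64 KiB size is only an example;
 * it must stay above PTHREAD_STACK_MIN and fit the thread's real needs. */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg)
{
   (void)arg;
   return NULL;
}

int main(void)
{
   pthread_attr_t attr;
   pthread_t tid;

   pthread_attr_init(&attr);
   pthread_attr_setstacksize(&attr, 64 * 1024);

   if (pthread_create(&tid, &attr, worker, NULL) != 0)
      perror("pthread_create");
   else
      pthread_join(tid, NULL);

   pthread_attr_destroy(&attr);
   return 0;
}
</q>

This only shows how the per-thread reservation can be changed; whether it is worth doing for these IO threads is a separate question.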
Anyway, without getting too much into the details, it's hard to predict what your libc is doing for you. I can see that libc correctly munmap()s all the mapped segments (effectively freeing the memory). _And_ the VSZ stays the same throughout iterations.
REMINDER: In this case a big VSZ does not mean big memory usage. It means a _potential_ to use a lot of memory (I've seen only rare cases where a thread actually uses more than 1 MiB of stack, for example, let alone 8 MiB). If the mmap2()ed memory is never touched, a page never gets assigned to the process's page table.
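To make the demand-paging point concrete (a self-contained illustration, not taken from the test program): anonymous mmap()ed memory shows up in VSZ right away, but pages only count toward RSS (VmRSS) once they are actually touched:

<q>
/* Sketch: 32 MiB of anonymous mmap() raises VSZ (VmSize) immediately,
 * but RSS (VmRSS in /proc/self/status) only grows as pages are touched. */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

static void show_status(const char *label)
{
   char line[256];
   FILE *f = fopen("/proc/self/status", "r");

   if (!f)
      return;
   printf("--- %s ---\n", label);
   while (fgets(line, sizeof line, f))
      if (!strncmp(line, "VmSize", 6) || !strncmp(line, "VmRSS", 5))
         fputs(line, stdout);
   fclose(f);
}

int main(void)
{
   size_t len = 32 * 1024 * 1024;
   char *p;

   show_status("before mmap");
   p = mmap(NULL, len, PROT_READ | PROT_WRITE,
            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
   if (p == MAP_FAILED) {
      perror("mmap");
      return 1;
   }
   show_status("after mmap, untouched");    /* VmSize up, VmRSS unchanged */
   memset(p, 1, len);                        /* fault every page in */
   show_status("after touching all pages");  /* now VmRSS grows too */
   munmap(p, len);
   return 0;
}
</q>

Comparing VmSize/VmRSS across the three snapshots shows the VSZ jump right after the mmap(), while RSS only grows after the memset().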
BTW, is the VSZ reserved somehow, at least on swap? Or is it just a potential size that we can happily ignore?

I'm adding six more test results, this time with Dillo and a small page with a hundred tiny images. There are three samples for each technique. Note: I'm using the file dpi server, so dillo uses a pthread for each IO.

Using the 'd' technique:

<q>
  VSZ   RSS COMMAND
13820 11168 ./dillo
15492 12624 ./dillo
15600 12736 ./dillo
</q>

Using the 'cd' technique:

<q>
  VSZ   RSS COMMAND
15496 12640 ./dillo
 9272  6888 ./dillo
10408  7940 ./dillo
</q>

Here the RSS is smaller for 'cd', so I opted for this one in the interim patch just committed to CVS.

--
Cheers
Jorge.-