New subject: Accept-Encoding: gzip

Nov. 13, 2007

      Hi,

  [gzip decoder]

On Mon, Nov 12, 2007 at 02:52:43PM +0000, place wrote:
...
> Hmm, how unfortunate. It had been working just fine for me, though I took it
> out yesterday so that I could have a clean tree... I wonder whether
> we're constantly going to see timing problems or something, where the
> people with fast machines write things that work on fast machines and
> the person with a slow machine will write things that work on a slow machine.

  Good news! Committed.

  I found a workaround for the segfault and decided to commit, so that
further investigation and polishing can start from the code in CVS.

  It doesn't look like a race condition, but rather a problem with how
redirections and the null_decoder are handled in cache.c (the redirection
code is somewhat ad hoc and not well designed yet).

  Attached is a page to reproduce the segfault (without the patch in CVS).
For instance:

  1- save the attached page in /tmp
  2- dillo-fltk /tmp
  3- click on the page
  4- go back
  5- go forward (segfault) // you may need to repeat steps 4 and 5

BTW, the workaround is mainly:

    -   dStr_append_l(entry->Data, buf, (int)buf_size);
    +   /* Assert we have a Decoder.
    +    * BUG: this is a workaround, more study and a proper design
    +    * for handling redirects is required */
    +   if (entry->Decoder != NULL) {
    +      decodedBuf = a_Decode_process(entry->Decoder, buf, buf_size);
    +      dStr_append_l(entry->Data, decodedBuf->str, decodedBuf->len);
    +      dStr_free(decodedBuf, 1);
    +   } else {
    +      dStr_append_l(entry->Data, buf, buf_size);
    +   }
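
  For anyone reading without the tree at hand, the shape of the workaround
can be sketched in isolation. This is only a minimal sketch and not the code
in cache.c: Buf, Decoder, Entry, buf_append, upper_process and entry_append
are hypothetical stand-ins for the Dstr, decoder and cache entry types, and
the toy decoder merely upper-cases its input.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer (stand-in for a Dstr). */
typedef struct {
   char *str;
   size_t len;
} Buf;

static void buf_append(Buf *b, const char *data, size_t n)
{
   char *p = realloc(b->str, b->len + n + 1);
   assert(p != NULL);   /* sketch only: no real out-of-memory handling */
   memcpy(p + b->len, data, n);
   b->len += n;
   p[b->len] = '\0';
   b->str = p;
}

/* Hypothetical decoder: turns one chunk into a freshly allocated Buf. */
typedef struct Decoder {
   Buf *(*process)(struct Decoder *self, const char *data, size_t n);
} Decoder;

/* Toy decoder used only for illustration: upper-cases its input. */
static Buf *upper_process(Decoder *self, const char *data, size_t n)
{
   Buf *out = calloc(1, sizeof(*out));
   size_t i;
   (void)self;
   assert(out != NULL);
   buf_append(out, data, n);
   for (i = 0; i < out->len; i++)
      out->str[i] = (char)toupper((unsigned char)out->str[i]);
   return out;
}

/* Hypothetical cache entry. */
typedef struct {
   Buf Data;
   Decoder *decoder;   /* may be NULL, e.g. on the redirect path */
} Entry;

/* Append a received chunk, decoding it only when a decoder is present. */
static void entry_append(Entry *e, const char *data, size_t n)
{
   if (e->decoder != NULL) {
      Buf *decoded = e->decoder->process(e->decoder, data, n);
      buf_append(&e->Data, decoded->str, decoded->len);
      free(decoded->str);
      free(decoded);
   } else {
      buf_append(&e->Data, data, n);   /* pass the bytes through untouched */
   }
}
```

  The point is only that an entry without a decoder falls back to a plain
append instead of dereferencing a NULL pointer; why the decoder is missing
after a redirect is the part that still needs study.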

  With regard to:

// Doesn't work. I could make TotalSize into something like BytesRemaining,
// seeing whether it goes precisely to 0.

  I'd prefer TransferSize (that is, the whole HTTP transfer size minus the
header length, i.e. the Content-Length).
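
  To make the naming concrete, here is a small sketch of how such a field
could be used to tell when the body is complete. Transfer, Received,
transfer_feed and transfer_done are hypothetical names, not existing code;
the only thing taken from the discussion is that TransferSize holds the
Content-Length. Note that with a gzip Content-Encoding the Content-Length
counts the compressed bytes, so the sketch counts raw bytes before decoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-transfer bookkeeping. */
typedef struct {
   long TransferSize;   /* Content-Length, or -1 when the header is absent */
   long Received;       /* raw body bytes seen so far, before any decoding */
} Transfer;

/* Account for one raw chunk of body data as it comes off the wire. */
static void transfer_feed(Transfer *t, size_t chunk_len)
{
   assert(t != NULL);
   t->Received += (long)chunk_len;
}

/* Returns 1 when the whole body has arrived, 0 if not yet, -1 if unknown. */
static int transfer_done(const Transfer *t)
{
   assert(t != NULL);
   if (t->TransferSize < 0)
      return -1;
   return t->Received >= t->TransferSize;
}
```

  Counting up towards a fixed TransferSize keeps the original value around
for later use, which a BytesRemaining counter that runs down to 0 would not.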

  With regard to the iconv decoder:

  Please note that we may still need the original data, to be able to
save the page verbatim. That is, if the original page is encoded in
latin2 (with a <meta http-equiv> charset line in the source) and it is
saved translated to UTF-8, we have two problems: the misleading "meta",
and the user getting a page that's not a verbatim copy of the original.

  One way to solve this is to re-encode into the original charset at
save time. This is not 8-bit clean but could work most of the time.

  Another way is to keep a copy of the verbatim data. In this case we're
8-bit clean, and it would only take more memory when the original is not
UTF-8.

  I *feel* 8-bit clean is the correct path, and here there are lots of
ways to optimize. For instance, with UTF-8 pages we don't need an extra
buffer. For pages that need one, we can deallocate the UTF-8 encoded one
when leaving the page (and re-create it if the page is visited again).
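
  A rough sketch of that idea follows, under loud assumptions: Page,
page_text and page_leave are hypothetical names, and the conversion is
hard-wired to Latin-1 (ISO-8859-1) only because its bytes map directly to
Unicode code points. A real decoder for latin2 and other charsets would go
through iconv instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical page record: verbatim bytes plus an optional UTF-8 copy. */
typedef struct {
   unsigned char *raw;    /* bytes exactly as received; used for saving */
   size_t raw_len;
   int is_utf8;           /* nonzero when the original is already UTF-8 */
   unsigned char *utf8;   /* converted copy, built on demand, may be NULL */
   size_t utf8_len;
} Page;

/* Return the text to render, converting only when it is really needed. */
static const unsigned char *page_text(Page *p, size_t *len)
{
   size_t i, o = 0;

   if (p->is_utf8) {            /* no extra buffer for UTF-8 pages */
      *len = p->raw_len;
      return p->raw;
   }
   if (p->utf8 == NULL) {       /* first visit, or buffer was dropped */
      /* Each Latin-1 byte needs at most two UTF-8 bytes. */
      p->utf8 = malloc(2 * p->raw_len + 1);
      assert(p->utf8 != NULL);  /* sketch only: no real error handling */
      for (i = 0; i < p->raw_len; i++) {
         unsigned char c = p->raw[i];
         if (c < 0x80) {
            p->utf8[o++] = c;
         } else {
            p->utf8[o++] = (unsigned char)(0xC0 | (c >> 6));
            p->utf8[o++] = (unsigned char)(0x80 | (c & 0x3F));
         }
      }
      p->utf8[o] = '\0';
      p->utf8_len = o;
   }
   *len = p->utf8_len;
   return p->utf8;
}

/* Called when leaving the page: drop the converted copy, keep the raw one. */
static void page_leave(Page *p)
{
   free(p->utf8);
   p->utf8 = NULL;
   p->utf8_len = 0;
}
```

  Saving always writes raw, so the saved file stays a verbatim copy and the
"meta" line stays truthful, while the extra memory is only held for
non-UTF-8 pages that are currently being viewed.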

  This is just some food for thought.

-- 
  Cheers
  Jorge.-
