[Dillo-dev] Re: charset conversion issues

Nov. 28, 2007

      Hi,

On Sat, Nov 24, 2007 at 10:50:18PM +0000, place wrote:
...
With rough code to look at.
No sense polishing up and completing what I have yet when you'll want changes...
OK.
...
Charset in the meta tag. 
My options seemed to be:
- Client->Callback returns information to cache.
  Probably not supposed to happen. Too big of a change.
Agreed. At least to postpone until other approaches prove messier. ;-)
...
- Reach the meta tag and undo/redo within html.cc.
  Somewhat fragile.
  Whatever I might come up with, some html out there would surely find a way
  to outsmart me.
This is the one I like most at first sight.

  Considering the charset can be given by HTTP or a META element:

  We can assume ASCII in the html text until the HEAD element is
closed. If there's a charset in the META, then the decoder can be
switched from null to the specified one.

  This approach has the advantage of working whether the charset
comes via HTTP or META (<HEAD> content is ASCII).

  We can even add a text buffer for the HEAD element and append
it to the whole HTML content if the offset is hard to set for the
new decoder.

  Do the problems you found apply to this scheme too?
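
  To make the scheme concrete, here is a rough sketch in C of the
"scan the ASCII head for a charset" step. It is only an illustration,
not code from html.cc: the names find_nocase() and sniff_meta_charset()
are made up for this example, and it does not check that "charset="
really sits inside a META tag, so comments or scripts in the HEAD
could still fool it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive search for `needle` in buf[0..len).
 * Returns the offset of the first match, or -1 if there is none. */
static long find_nocase(const char *buf, size_t len, const char *needle)
{
   size_t n = strlen(needle);

   if (n == 0 || n > len)
      return -1;
   for (size_t i = 0; i + n <= len; i++) {
      size_t j = 0;
      while (j < n && tolower((unsigned char)buf[i + j]) ==
                      tolower((unsigned char)needle[j]))
         j++;
      if (j == n)
         return (long)i;
   }
   return -1;
}

/* Hypothetical helper: look for "charset=" before "</head" in the bytes
 * received so far and copy its value into `out`.
 * Returns  1 if a charset name was found,
 *          0 if the HEAD is closed and no charset was given,
 *         -1 if more data is needed before deciding. */
int sniff_meta_charset(const char *buf, size_t len, char *out, size_t outsz)
{
   assert(buf != NULL && out != NULL && outsz > 0);

   long head_end = find_nocase(buf, len, "</head");
   size_t limit = (head_end >= 0) ? (size_t)head_end : len;
   long pos = find_nocase(buf, limit, "charset=");

   out[0] = '\0';
   if (pos < 0)
      return (head_end >= 0) ? 0 : -1;

   size_t i = (size_t)pos + strlen("charset=");
   size_t k = 0;

   /* Skip one optional opening quote. */
   if (i < limit && (buf[i] == '"' || buf[i] == '\''))
      i++;
   /* Copy the characters that may appear in a charset name. */
   while (i < limit && k + 1 < outsz &&
          (isalnum((unsigned char)buf[i]) || strchr("-_.:", buf[i]) != NULL))
      out[k++] = buf[i++];
   out[k] = '\0';

   /* The value ran up to the end of the data: it may be truncated. */
   if (i == limit && head_end < 0)
      return -1;
   return (k > 0) ? 1 : 0;
}
```

  The caller would keep feeding bytes through the null decoder while
this returns -1, and switch decoders (or keep the one given by HTTP)
once it returns 0 or 1.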
...
- Bail out and start again.
  This is what I have now. In fact, not knowing at all what to do, I just
  call a_Nav_cancel_expect() and a_Nav_push() right then and there.
  Surprisingly, it functions instead of crashing, but I'm probably spilling
  resources And Violating Principles.
  (And, since a_Nav_push() isn't quite right for the situation even then,
   I have to fix the nav stack ptr in Nav_cleanup())
There's a dialog box to set the Content-Type for a page. If you change it
to text/plain, you get source in a proper window (with searching) instead
of the little fltk window. No line numbers, though.
The idea of using Dillo for view source is interesting, though I'm
not sure. Maybe an FLTK widget (like a stripped-down editor, as the
one in FLTK's test/ directory) which gets UTF-8 via
a_Capi_get_utf8_data(Url) would also be good.
...
Not wanting to mess with TypeDet and TypeHdr until finding out what you
think, I stuck in another one, TypeSet.
I'd prefer to try to apply the second method (described above).
...
I'm directly calling a_Cache_[sg]et_content_type() from just wherever,
which may or may not be all right.
Cache should be called from Capi only, but at this stage...
...
I knew that latin1 != UTF-8, but I was not thinking when I wrote
that Decode code. Must've been thinking about the Unicode code numbers.
FLTK has a workaround in it to deal with fools like me who send it latin1
or cp1252, but better that I stop being foolish :)
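
For reference, the latin1 point can be sketched in C as below. Each
ISO-8859-1 byte has the same number as its Unicode code point, but
bytes from 0x80 up still have to be written as two UTF-8 bytes. This
is an illustration only, not the Decode code from the patch: the name
latin1_to_utf8() is made up here, and it covers ISO-8859-1 proper, not
the cp1252 characters in the 0x80-0x9F range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert `inlen` ISO-8859-1 bytes to a
 * NUL-terminated UTF-8 string in `out`.
 * Returns the number of bytes written (not counting the NUL),
 * or (size_t)-1 if `out` is too small. */
size_t latin1_to_utf8(const unsigned char *in, size_t inlen,
                      char *out, size_t outsz)
{
   assert(in != NULL && out != NULL);

   size_t k = 0;

   for (size_t i = 0; i < inlen; i++) {
      unsigned char c = in[i];

      if (c < 0x80) {
         /* ASCII: one byte, unchanged. */
         if (k + 1 >= outsz)
            return (size_t)-1;
         out[k++] = (char)c;
      } else {
         /* U+0080..U+00FF: two bytes, 110xxxxx 10xxxxxx. */
         if (k + 2 >= outsz)
            return (size_t)-1;
         out[k++] = (char)(0xC0 | (c >> 6));
         out[k++] = (char)(0x80 | (c & 0x3F));
      }
   }
   if (k >= outsz)
      return (size_t)-1;
   out[k] = '\0';
   return k;
}
```

Sizing `out` at 2 * inlen + 1 bytes is always enough, since no input
byte expands to more than two.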
I don't have Plain or View Source set up to decode yet. I don't know whether
the cache should know about decoding charsets. Or a Text base class or
something.
The real-type-matches-header-type code will have to learn to relax a bit,
now that html might contain all kinds of bytes...
-- 
  Cheers
  Jorge.-


jcid＠dillo.org