[Dillo-dev] Fwd: Re: Dillo early exit

Dec. 15, 2012

On Fri, Dec 14, Jorge Arellano Cid wrote:
...
On Fri, Dec 14, 2012 at 04:48:23PM +0100, Sebastian Geerken wrote:
...
The HTML parser passes invalid UTF-8 to dw::Textblock. I will make
nextUtf8Char more robust (of course, dillo should not crash), but
Jorge's page is HTML, encoded in ISO-8859-1, not UTF-8, as seen here:
000009b0  34 38 22 3e 2d 20 4b 65  79 73 74 72 6f 6b 65 20  |48">- Keystroke |
000009c0  4c 6f 67 67 69 6e 67 20  77 69 74 68 20 42 65 61  |Logging with Bea|
000009d0  63 6f 6e 20 ab 20 53 74  72 61 74 65 67 69 63 20  |con . Strategic |
                      ^^
It seems that the FLTK functions do some checks, and sometimes decode
bytes as ISO-8859-1.
AFAIR from comments in FLTK, some UTF-8 functions dealt with a mix of
Latin-1, UTF-8 and some Windows codec.
They got into it because the mix was inevitable for them.
I've modified my code so that it works in a similar way, but I have not
yet dealt with the differences between ISO-8859-1, ISO-8859-15, and
Windows-1252. Anyway, these differences are marginal.
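
To illustrate the idea, here is a minimal sketch of a lenient
"next character" step. It is not dillo's actual code: the name
nextCharLenient and its signature are made up for this example. It
assumes that any byte which does not start a well-formed UTF-8 sequence
is simply treated as a one-byte (Latin-1 style) character, and it never
steps past the end of the buffer. It does not reject overlong 3- and
4-byte forms or surrogates.

```cpp
#include <cstddef>

// Hypothetical stand-in for a hardened "next char" helper; not dillo's code.
// Returns the index just past the character starting at s[idx].
// Precondition: idx < len.
std::size_t nextCharLenient(const char *s, std::size_t len, std::size_t idx) {
    const unsigned char *p = reinterpret_cast<const unsigned char *>(s);
    unsigned char b0 = p[idx];

    // Expected sequence length from the lead byte. Anything that is not a
    // valid lead byte (e.g. 0xAB) keeps n == 1 and counts as one character.
    std::size_t n = 1;
    if (b0 >= 0xC2 && b0 <= 0xDF) n = 2;
    else if (b0 >= 0xE0 && b0 <= 0xEF) n = 3;
    else if (b0 >= 0xF0 && b0 <= 0xF4) n = 4;

    // Truncated sequence at the end of the buffer: consume one byte only.
    if (idx + n > len) return idx + 1;

    // Every following byte must be a continuation byte (10xxxxxx);
    // otherwise fall back to consuming the lead byte alone.
    for (std::size_t k = 1; k < n; k++)
        if ((p[idx + k] & 0xC0) != 0x80) return idx + 1;

    return idx + n;
}
```

With the bytes from the dump above, the stray 0xab is stepped over as a
single character instead of being taken for part of a longer sequence.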

However, IMO there should be a conversion to clean UTF-8, so that only
a small part of dillo has to deal with such problems, while most parts
can rely on clean UTF-8. (Something to consider after the release.)
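
A rough sketch of what such a conversion could look like, again only
as an illustration: toCleanUtf8 and validSeqLen are hypothetical names,
and the sketch assumes that every byte which is not part of a
well-formed UTF-8 sequence should be read as ISO-8859-1 (it ignores the
ISO-8859-15 and Windows-1252 differences mentioned above).

```cpp
#include <cstddef>
#include <string>

// Length of the well-formed UTF-8 sequence starting at s[i], or 0 if the
// bytes there are not valid UTF-8 (stray byte, truncation, overlong form,
// surrogate, or value above U+10FFFF). Hypothetical helper.
static std::size_t validSeqLen(const std::string &s, std::size_t i) {
    unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t n;
    unsigned long cp;
    if (b0 < 0x80) return 1;
    else if (b0 >= 0xC2 && b0 <= 0xDF) { n = 2; cp = b0 & 0x1F; }
    else if (b0 >= 0xE0 && b0 <= 0xEF) { n = 3; cp = b0 & 0x0F; }
    else if (b0 >= 0xF0 && b0 <= 0xF4) { n = 4; cp = b0 & 0x07; }
    else return 0;
    if (i + n > s.size()) return 0;
    for (std::size_t k = 1; k < n; k++) {
        unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (b & 0x3F);
    }
    if ((n == 3 && cp < 0x800) || (n == 4 && cp < 0x10000)) return 0;
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) return 0;
    return n;
}

// Copies valid UTF-8 sequences unchanged and re-encodes every other byte
// as the two-byte UTF-8 form of the same ISO-8859-1 character.
std::string toCleanUtf8(const std::string &in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ) {
        std::size_t n = validSeqLen(in, i);
        if (n > 0) {
            out.append(in, i, n);
            i += n;
        } else {
            // An invalid byte is always >= 0x80, so it maps to U+0080..U+00FF.
            unsigned char b = static_cast<unsigned char>(in[i]);
            out += static_cast<char>(0xC0 | (b >> 6));
            out += static_cast<char>(0x80 | (b & 0x3F));
            i++;
        }
    }
    return out;
}
```

Run once at the boundary (for example where the parser hands text on),
this would let everything downstream assume valid UTF-8. Already clean
input passes through unchanged.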

Sebastian

sgeerken＠dillo.org