[Dillo-dev] More on the UTF-8 issue
First of all, I would like to thank Jorge for working on Dillo; it fills the need for a lightweight *nix browser for older systems, and for times when one doesn't want to devote a lot of system resources just to access the World Wide Web. It is an excellent browser.

While performing cross-browser verification, I discovered that Dillo 0.8.1 does not honor the following line in an HTML header:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

Despite the presence of this line, Dillo assumed that the web page was encoded in Latin-1.

Jorge, in a message with the subject "UTF-8", sent on June 3rd, wondered why someone would encode an English-language webpage with UTF-8. I cannot speak for other webmasters, but I personally encode my webpage in UTF-8 because I do my web design on a Fedora Core 2 box, an OS that generally assumes that plain text files are encoded with Unicode instead of the iso-8859 mess. Also, Unicode gives one the flexibility to have Greek letters for Greek words, Cyrillic letters for Russian words, and whatnot, not to mention a number of attractive typographical symbols that Latin-1 does not have.

Now, obviously, considering Dillo's target audience, it doesn't need full Unicode support. But it would be nice if it could handle basic UTF-8 encoded Latin-1 characters in web pages [1].

As an aside, I think Dillo currently has excellent CSS support (namely, none whatsoever). The thinking behind CSS is that a webpage should be perfectly readable on a browser with no CSS support; the content (if not the presentation) of a CSS webpage is perfectly accessible from a non-CSS browser. I come from the "CSS shouldn't be done at all, or should be done perfectly" camp. Considering the number of hacks web designers have come up with to make CSS visible to this browser yet invisible to that browser [2], the last thing a webmaster wants is yet another buggy CSS implementation.
- Sam

[1] Some hacky code I once wrote, that reads UTF-8 from standard input, and outputs iso 8859-1:

    if(c < 128) {
        printf("%c", c);
    } else {
        if(c < 0xe0) { /* two-byte sequence */
            v = c & 0x1f;
            v <<= 6;
            c = getc(stdin);
            v = v + (c & 0x3f);
            printf("%c", v);
        } else { /* multi-byte sequence: skip its continuation bytes */
            getc(stdin); getc(stdin);      /* three-byte sequence */
            if(c >= 0xf0) { getc(stdin); } /* four-byte sequence */
        }
    }

(This code, FWIW, is public domain)

It is probably better to simply use iconv.

[2] http://centricle.com/ref/css/filters/ shows all of the common CSS hacks webmasters do these days.
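The UTF-8 to ISO 8859-1 conversion described in footnote [1] can be sketched a little more completely as a buffer-to-buffer function. This is only an illustration, not Dillo code: the name utf8_to_latin1 and the choice of '?' as a replacement character are assumptions made for the example. It does not reject overlong encodings, so it should not be used as a validator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of Dillo): convert a UTF-8 buffer to
 * ISO 8859-1. Code points above U+00FF and malformed bytes become '?'.
 * Every output byte consumes at least one input byte, so `out` needs
 * room for at most in_len bytes. Returns the number of bytes written. */
size_t utf8_to_latin1(const unsigned char *in, size_t in_len,
                      unsigned char *out)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);

    while (i < in_len) {
        unsigned char c = in[i++];
        unsigned long cp;   /* code point being assembled */
        int extra;          /* continuation bytes still expected */
        int ok = 1;

        if (c < 0x80)      { cp = c;        extra = 0; }  /* ASCII */
        else if (c < 0xc0) { out[o++] = '?'; continue; }  /* stray continuation */
        else if (c < 0xe0) { cp = c & 0x1f; extra = 1; }  /* two-byte lead */
        else if (c < 0xf0) { cp = c & 0x0f; extra = 2; }  /* three-byte lead */
        else if (c < 0xf8) { cp = c & 0x07; extra = 3; }  /* four-byte lead */
        else               { out[o++] = '?'; continue; }  /* invalid lead byte */

        while (extra-- > 0) {
            /* The parentheses matter: == binds tighter than & in C. */
            if (i >= in_len || (in[i] & 0xc0) != 0x80) {
                ok = 0;     /* truncated or malformed sequence */
                break;
            }
            cp = (cp << 6) | (in[i++] & 0x3f);
        }

        out[o++] = (ok && cp <= 0xff) ? (unsigned char)cp : '?';
    }
    return o;
}
```

A caller would pass the raw page bytes and an output buffer of the same size; characters outside Latin-1 (Greek, Cyrillic, most typographical symbols) come out as '?' rather than being silently dropped.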
On Wed, 16 Jun 2004 sam+dillo@chaosring.org wrote:
Jorge, in a message with the subject "UTF-8", sent on June 3rd, wondered why someone would encode an English-language webpage with UTF-8. [ ... reasons snipped ... ] Now, obviously, considering Dillo's target audience, it doesn't need full Unicode support. But it would be nice if it could handle basic UTF-8 encoded Latin-1 characters in web pages [1]. [...snip...] [1] Some hacky code I once wrote, that reads UTF-8 from standard input, and outputs iso 8859-1:
    if(c < 128) {
        printf("%c", c);
    } else {
        if(c < 0xe0) { /* two-byte sequence */
            v = c & 0x1f;
            v <<= 6;
            c = getc(stdin);
            v = v + (c & 0x3f);
            printf("%c", v);
        } else { /* multi-byte sequence: skip its continuation bytes */
            getc(stdin); getc(stdin);      /* three-byte sequence */
            if(c >= 0xf0) { getc(stdin); } /* four-byte sequence */
        }
    }
(This code, FWIW, is public domain)
It is probably better to simply use iconv.
Just as a note, I find no iconv(1) or iconv(3) on my (old!) system, so ./configure would have to be able to make some decisions if things went that way...

--
-- David McKee
-- dmckee@jlab.org
-- (757) 269-7492 (Office)
On Wed, Jun 16, 2004 at 12:46:17PM -0700, sam+dillo@chaosring.org wrote: <snip>
As an aside, I think Dillo currently has excellent CSS support (namely, none whatsoever). The thinking behind CSS is that a webpage should be perfectly readable on a browser with no CSS support; the content (if not the presentation) of a CSS webpage is perfectly accessible from a non-CSS browser.
I come from the "CSS shouldn't be done at all, or should be done perfectly" camp. Considering the number of hacks web designers have come up with to make CSS visible to this browser yet invisible to that browser [2], the last thing a webmaster wants is yet another buggy CSS implementation.
I couldn't agree more, and it's one of the reasons I use Dillo: to view pages according to their HTML structure. It's when you look at CSS-heavy pages that you appreciate webmasters who use the visibility feature of CSS to hide the navigation jumps from CSS-aware browsers; those pages become very user-friendly in browsers that are not CSS-aware. I suppose lynx would give the same effect :)

Todd
participants (3)
- David McKee
- sam+dillo@chaosring.org
- Todd Slater