patch: meta charset changes decoder
This is no sort of proper fix whatsoever, but I was tired of seeing the "would set charset" message. It's actually surprisingly useful for remote files, since they're generally still at the javascript-and-css point at the end of the first packet.
On Fri, Mar 07, 2008 at 06:17:17AM +0000, place wrote:
This is no sort of proper fix whatsoever, but I was tired of seeing the "would set charset" message. It's actually surprisingly useful for remote files, since they're generally still at the javascript-and-css point at the end of the first packet.
You mean, the decoder injection usually happens in the javascript-and-css section, right? Please post some test-case URLs. -- Cheers Jorge.-
Jorge wrote:
On Fri, Mar 07, 2008 at 06:17:17AM +0000, place wrote:
This is no sort of proper fix whatsoever, but I was tired of seeing the "would set charset" message. It's actually surprisingly useful for remote files, since they're generally still at the javascript-and-css point at the end of the first packet.
You mean, the decoder injection usually happens in the javascript-and-css section, right?
Please post some test-case URLs.
Ummm, okay. I'll search for ????????? and see what comes up, looking for pages where meta is setting it to something other than utf-8.
http://www.nbu.bg/ works.
http://www.biforum.org/ works.
http://bgstudent.8m.com/ works.
http://www.newobjects.com/dict.asp works.
http://www.mastylo.net/ fails that content-type testing in misc.c.
http://bultext.tripod.com/ works. (tripod's still around?)
http://textove.com/ works.
http://bgstories.athost.net/ works.
http://www.bglekar.com/ works even though it sets charset to ISO-8859-1! (it's all entities)
Getting bored, but I'll keep going until one breaks by having body text in the first packet.
http://www.bds-bg.org/ works.
http://www.kursove-neg.com/ works.
http://avast.110mb.com/ fails that content-type testing in misc.c.
http://esperanto.vnvsoft.com/ works.
http://www.dfbulgaria.org/ works.
http://www.bghelsinki.org/ works.
http://www.bcnl.org/ works.
http://www.angelfire.com/ca/canbul/Bcacb.html works.
http://www.ibl.bas.bg/ works.
http://bgjedi.com/ breaks. So no Bulgarian jedi for dillo yet.
And of course the titles are all broken, but my window manager can only understand latin1 anyway.
Hi, On Fri, Mar 07, 2008 at 04:21:16PM +0000, place wrote:
Jorge wrote:
On Fri, Mar 07, 2008 at 06:17:17AM +0000, place wrote:
This is no sort of proper fix whatsoever, but I was tired of seeing the "would set charset" message. It's actually surprisingly useful for remote files, since they're generally still at the javascript-and-css point at the end of the first packet.
You mean, the decoder injection usually happens in the javascript-and-css section, right?
Please post some test-case URLs.
Ummm, okay. I'll search for ?????????????????? and see what comes up, looking for pages where meta is setting it to something other than utf-8.
OK, committed as is. I tried to make a "clean" switch by re-starting parsing all over again, but it needs more work (prototype works but unstable).
http://www.nbu.bg/ works.
http://www.biforum.org/ works.
http://bgstudent.8m.com/ works.
http://www.newobjects.com/dict.asp works.
http://www.mastylo.net/ fails that content-type testing in misc.c.
http://bultext.tripod.com/ works. (tripod's still around?)
http://textove.com/ works.
http://bgstories.athost.net/ works.
http://www.bglekar.com/ works even though it sets charset to ISO-8859-1! (it's all entities)
Getting bored, but I'll keep going until one breaks by having body text in the first packet.
http://www.bds-bg.org/ works.
http://www.kursove-neg.com/ works.
http://avast.110mb.com/ fails that content-type testing in misc.c.
http://esperanto.vnvsoft.com/ works.
http://www.dfbulgaria.org/ works.
http://www.bghelsinki.org/ works.
http://www.bcnl.org/ works.
http://www.angelfire.com/ca/canbul/Bcacb.html works.
http://www.ibl.bas.bg/ works.
http://bgjedi.com/ breaks. So no Bulgarian jedi for dillo yet.
Thanks for the examples.
And of course the titles are all broken, but my window manager can only understand latin1 anyway.
Oh, titles and local files work here (with the unstable prototype). -- Cheers Jorge.-
Hi, On Fri, Mar 07, 2008 at 06:17:17AM +0000, place wrote:
This is no sort of proper fix whatsoever, but I was tired of seeing the "would set charset" message. It's actually surprisingly useful for remote files, since they're generally still at the javascript-and-css point at the end of the first packet.
In CVS there is a new approach that restarts parsing the whole page. It works with local/remote files, and page title. Please test it. -- Cheers Jorge.-
Jorge wrote:
On Fri, Mar 07, 2008 at 06:17:17AM +0000, place wrote:
This is no sort of proper fix whatsoever, but I was tired of seeing the "would set charset" message. It's actually surprisingly useful for remote files, since they're generally still at the javascript-and-css point at the end of the first packet.
In CVS there is a new approach that restarts parsing the whole page. It works with local/remote files, and page title. Please test it.
In the past, I've encountered pages that go
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
</head>
Here's code just sticking in a flag when meta charset's been seen.
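The patch itself is not reproduced in this archive. As a rough illustration only, a "first META charset wins" flag could look something like the C sketch below. DocState, extract_charset, same_charset and handle_meta_content are hypothetical names made up for this example and are not Dillo's actual code; the sketch also assumes an unquoted, lowercase "charset=" parameter.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-document state; not Dillo's real data structure. */
typedef struct {
    char charset[32];        /* charset the decoder is currently using */
    bool meta_charset_seen;  /* set once a META charset has been handled */
} DocState;

/* Case-insensitive comparison, since charset names are case-insensitive. */
bool same_charset(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == *b;
}

/* Copy the value after "charset=" in a Content-Type string into out.
 * Returns false when no charset parameter is found. Assumption: the
 * parameter name is lowercase and the value is not quoted. */
bool extract_charset(const char *content, char *out, size_t outlen)
{
    assert(content != NULL && out != NULL && outlen > 0);
    const char *p = strstr(content, "charset=");
    if (p == NULL)
        return false;
    p += strlen("charset=");
    size_t n = 0;
    while (p[n] != '\0' && p[n] != ';' &&
           !isspace((unsigned char)p[n]) && n + 1 < outlen) {
        out[n] = p[n];
        n++;
    }
    out[n] = '\0';
    return n > 0;
}

/* Handle one META Content-Type value. Only the first META that carries a
 * charset is honoured; later ones are ignored. Returns true when the
 * decoder charset actually changed. */
bool handle_meta_content(DocState *doc, const char *content)
{
    assert(doc != NULL);
    char cs[sizeof doc->charset];
    if (doc->meta_charset_seen || !extract_charset(content, cs, sizeof cs))
        return false;
    doc->meta_charset_seen = true;
    if (same_charset(doc->charset, cs))
        return false;
    strcpy(doc->charset, cs);  /* safe: cs has the same size as charset */
    return true;
}
```

With the two META tags above and a document that starts out as ISO-8859-1, the first call would switch the decoder to utf-8 and the second call would be ignored.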
On Thu, Mar 13, 2008 at 04:21:42PM +0000, place wrote:
Jorge wrote:
On Fri, Mar 07, 2008 at 06:17:17AM +0000, place wrote:
This is no sort of proper fix whatsoever, but I was tired of seeing the "would set charset" message. It's actually surprisingly useful for remote files, since they're generally still at the javascript-and-css point at the end of the first packet.
In CVS there is a new approach that restarts parsing the whole page. It works with local/remote files, and page title. Please test it.
In the past, I've encountered pages that go
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
</head>
Here's code just sticking in a flag when meta charset's been seen.
Committed a slightly modified version. Please check. -- Cheers Jorge.-
On Sun, Mar 16, 2008 at 09:34:13AM -0400, Jorge Arellano Cid wrote:
On Thu, Mar 13, 2008 at 04:21:42PM +0000, place wrote:
Jorge wrote:
On Fri, Mar 07, 2008 at 06:17:17AM +0000, place wrote:
This is no sort of proper fix whatsoever, but I was tired of seeing the "would set charset" message. It's actually surprisingly useful for remote files, since they're generally still at the javascript-and-css point at the end of the first packet.
In CVS there is a new approach that restarts parsing the whole page. It works with local/remote files, and page title. Please test it.
In the past, I've encountered pages that go
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
<meta http-equiv="content-type" content="text/html; charset=iso-8859-1">
</head>
Here's code just sticking in a flag when meta charset's been seen.
Committed a slightly modified version. Please check.
ebay.com works again for me. Thanks, Johannes
Hi, On Thu, Mar 13, 2008 at 10:49:47AM -0400, Jorge Arellano Cid wrote:
Hi,
On Fri, Mar 07, 2008 at 06:17:17AM +0000, place wrote:
This is no sort of proper fix whatsoever, but I was tired of seeing the "would set charset" message. It's actually surprisingly useful for remote files, since they're generally still at the javascript-and-css point at the end of the first packet.
In CVS there is a new approach that restarts parsing the whole page. It works with local/remote files, and page title. Please test it.
Not 100% sure if it's this change, but since recently dillo-fltk loops for me on http://www.ebay.com:
META Content-Type changes charset to: ISO-8859-1
Nav_open_url: new url='http://www.ebay.com/'
HTTP Content-Type gave charset as: ISO-8859-1
META Content-Type gave charset as: ISO-8859-1
FltkViewBase::drawTotal
META Content-Type changes charset to: Cp1252
Nav_open_url: new url='http://www.ebay.com/'
HTTP Content-Type gave charset as: ISO-8859-1
META Content-Type gave charset as: Cp1252
FltkViewBase::drawTotal
META Content-Type changes charset to: ISO-8859-1
Nav_open_url: new url='http://www.ebay.com/'
HTTP Content-Type gave charset as: ISO-8859-1
META Content-Type gave charset as: ISO-8859-1
FltkViewBase::drawTotal
....
Cheers, Johannes
Johannes wrote:
On Thu, Mar 13, 2008 at 10:49:47AM -0400, Jorge Arellano Cid wrote:
Hi,
On Fri, Mar 07, 2008 at 06:17:17AM +0000, place wrote:
This is no sort of proper fix whatsoever, but I was tired of seeing the "would set charset" message. It's actually surprisingly useful for remote files, since they're generally still at the javascript-and-css point at the end of the first packet.
In CVS there is a new approach that restarts parsing the whole page. It works with local/remote files, and page title. Please test it.
Not 100% sure if it's this change, but since recently dillo-fltk loops for me on http://www.ebay.com:
META Content-Type changes charset to: ISO-8859-1
Nav_open_url: new url='http://www.ebay.com/'
HTTP Content-Type gave charset as: ISO-8859-1
META Content-Type gave charset as: ISO-8859-1
FltkViewBase::drawTotal
META Content-Type changes charset to: Cp1252
Nav_open_url: new url='http://www.ebay.com/'
HTTP Content-Type gave charset as: ISO-8859-1
META Content-Type gave charset as: Cp1252
FltkViewBase::drawTotal
META Content-Type changes charset to: ISO-8859-1
Nav_open_url: new url='http://www.ebay.com/'
HTTP Content-Type gave charset as: ISO-8859-1
META Content-Type gave charset as: ISO-8859-1
FltkViewBase::drawTotal
I should have made it clearer that my little patch from a day ago was intended to fix such looping.
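To make the looping concrete, here is a small C simulation of the restart behaviour. It is only a model under stated assumptions: that the page carries two conflicting META charsets (as the log suggests but does not prove), that every charset change restarts parsing from the top, and that the "seen" flag survives a restart. count_restarts and its parameters are invented for this sketch and do not correspond to Dillo functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simulate parsing a page whose META tags declare the charsets in metas[].
 * Whenever a META names a charset different from the current one, the
 * current charset is switched and parsing starts over from the first META.
 * Returns the number of restarts, giving up once max_restarts is reached.
 * If first_meta_only is true, only the first META ever seen is honoured
 * (assumption: the flag is kept across restarts). */
int count_restarts(const char *http_charset, const char *const metas[],
                   size_t n_metas, bool first_meta_only, int max_restarts)
{
    assert(http_charset != NULL);
    const char *current = http_charset;
    bool meta_seen = false;
    int restarts = 0;
    size_t i = 0;

    while (i < n_metas && restarts < max_restarts) {
        if (first_meta_only && meta_seen) {
            i++;                    /* later METAs are ignored */
            continue;
        }
        meta_seen = true;
        if (strcmp(current, metas[i]) != 0) {
            current = metas[i];     /* switch decoder ...            */
            restarts++;
            i = 0;                  /* ... and re-parse from the top */
        } else {
            i++;
        }
    }
    return restarts;
}
```

In this model, a page served as ISO-8859-1 with META charsets ISO-8859-1 and Cp1252 keeps restarting until the cap is hit when every META is honoured, and does not restart at all once only the first META counts.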
participants (3)
- jcid@dillo.org
- Johannes.Hofmann@gmx.de
- place@gobigwest.com