On Tue, 23 Apr 2024 23:29:45 +0200 Rodrigo Arias <rodarima@gmail.com> wrote:
On Sat, Apr 20, 2024 at 02:35:10PM +0200, Rodrigo Arias wrote:
Yeah, the current detection mechanism in Dillo for content types is not very good. It searches for the doctype line at the beginning of the document[1] but it doesn't handle comments.
[1]: https://github.com/dillo-browser/dillo/blob/v3.1.0-rc1/src/misc.c#L148
We should rely on the Content-Type provided by the server, or at least improve the detection.
So, this is a tricky case.
Dillo keeps several content types for a single document, sorted by priority. The first one that is set defines the content type of the document:
1. The "override type" used to override the type (highest priority)
2. The "meta type" given by the <meta ... content="..."> tag in HTML
3. The "http type" given by the HTTP Content-Type header
4. The "guessed type" based on the document data (lowest priority)
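The priority order above can be sketched in C roughly as follows. This is only an illustration of "first one set wins": the struct and function names are hypothetical and are not taken from the Dillo source.

```c
#include <stddef.h>

/* Hypothetical container for the four candidate types listed above.
 * A NULL pointer means "not set". These names are illustrative only
 * and do not come from the Dillo source. */
typedef struct {
   const char *override_type; /* 1. highest priority */
   const char *meta_type;     /* 2. from <meta ... content="..."> */
   const char *http_type;     /* 3. from the HTTP Content-Type header */
   const char *guessed_type;  /* 4. guessed from the data, lowest priority */
} TypeCandidates;

/* Return the first candidate that is set, in priority order,
 * or NULL if none of them is set. */
const char *effective_type(const TypeCandidates *c)
{
   const char *order[4];
   size_t i;

   order[0] = c->override_type;
   order[1] = c->meta_type;
   order[2] = c->http_type;
   order[3] = c->guessed_type;

   for (i = 0; i < 4; i++) {
      if (order[i] != NULL)
         return order[i];
   }
   return NULL;
}
```

In this sketch, a document with no override or meta type and an HTTP type of "text/xml" resolves to "text/xml", whatever the guessed type would have been.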
Thanks for the explanation. This also makes clearer an issue I had with XHTML image indexes generated by ImageMagick Montage, which were getting (by an unusual sequence of events) the incorrect HTTP Content-Type of "text/xml" (and they don't contain a meta tag). They'd load properly via file:// but show as text over http://. Now I know to ideally force the HTTP Content-Type to "application/xhtml+xml" instead of "text/html", which I used to fix the problem originally.
Regarding the type guessing bug, I think I can improve it by assuming that if we find the "<!doctype html" string in the first 1024 bytes or so, it is an HTML-like type, but it incurs more overhead.
But if it aborts that search upon encountering the first thing that isn't "spaces, newlines, tabs, and comments", most text files will be detected within the first few bytes. I'm not sure how that approach would work with ImageMagick image index XHTML pages, though, which start like this:

<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Example: http://www.ombertech.com/cnk/dillo/STS-133_Pictures/photo_index.html

I don't really understand how XHTML is supposed to work, and I don't have time to learn, so perhaps I'm ignoring a distinction between different flavours of XHTML that can begin in different ways? Anyway, I like how ImageMagick image map pages are viewable in Dillo at the moment.
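A minimal sketch of the early-abort search described above, in C. It skips whitespace and comments, and stops at the first thing that is anything else. Skipping a leading <?xml ...?> declaration is an extra assumption added here to cover the ImageMagick case; it is not something the thread settles on. The function names are hypothetical and this is not the code in Dillo's src/misc.c.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive test: does buf[0..len) start with the string lit? */
static int starts_with_ci(const char *buf, size_t len, const char *lit)
{
   size_t n = strlen(lit), i;

   if (len < n)
      return 0;
   for (i = 0; i < n; i++) {
      if (tolower((unsigned char)buf[i]) != tolower((unsigned char)lit[i]))
         return 0;
   }
   return 1;
}

/* Search buf[pos..len) for the terminator `end`. Return the index just
 * past it, or len if the terminator is not found. */
static size_t skip_past(const char *buf, size_t len, size_t pos,
                        const char *end)
{
   size_t n = strlen(end);

   for (; pos + n <= len; pos++) {
      if (memcmp(buf + pos, end, n) == 0)
         return pos + n;
   }
   return len;
}

/* Return 1 if the first significant token is "<!doctype html"
 * (any letter case), 0 otherwise. */
int looks_like_html(const char *buf, size_t len)
{
   size_t pos = 0;

   while (pos < len) {
      if (isspace((unsigned char)buf[pos])) {
         pos++;                                       /* spaces, tabs, newlines */
      } else if (starts_with_ci(buf + pos, len - pos, "<!--")) {
         pos = skip_past(buf, len, pos + 4, "-->");   /* comment */
      } else if (starts_with_ci(buf + pos, len - pos, "<?xml")) {
         pos = skip_past(buf, len, pos + 5, "?>");    /* assumption: XML decl */
      } else {
         /* First significant token: decide here and stop scanning. */
         return starts_with_ci(buf + pos, len - pos, "<!doctype html");
      }
   }
   return 0; /* only whitespace, comments, or an unterminated comment */
}
```

A plain text file is rejected at its first non-space byte, so the extra cost should mostly be paid by documents that really do open with long comments.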
So I think for now we can rely on the correction of "text/xhtml" to "application/xhtml+xml", which seems to work fine. I don't like adding quirks, but I will keep this one as it was already there. Here is the PR:
I've built Dillo from that branch and pages on www.lemis.com now render correctly, thanks! If I save the homepage as lemis.xhtml it still shows as plain text when loaded with file://, though it is rendered if the comments before <!DOCTYPE> are removed or if the original file is saved as lemis.html. Not much of an issue, but it could cause confusion for someone.