[Dillo-dev] Re: Dillo not rendering HTML with comments before <!DOCTYPE>

April 24, 2024

On Tue, 23 Apr 2024 23:29:45 +0200
Rodrigo Arias <rodarima@gmail.com> wrote:
...
On Sat, Apr 20, 2024 at 02:35:10PM +0200, Rodrigo Arias wrote:
...
Yeah, the current detection mechanism in Dillo for content types is 
not very good. It searches for the doctype line at the beginning of 
the document[1] but it doesn't handle comments.
[1]: https://github.com/dillo-browser/dillo/blob/v3.1.0-rc1/src/misc.c#L148
We should rely on the Content-Type provided by the server, or at
least improve the detection.
So, this is a tricky case.
Dillo has several content types for a single document sorted by 
priority; the first one set defines the content type of the document:
1. The "override type" used to override the type (highest priority)
2. The "meta type" given by the <meta ... content="..."> tag in HTML
3. The "http type" given by the HTTP Content-Type header
4. The "guessed type" based on the document data (lowest priority)
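
The priority order above can be sketched as a simple lookup. This is only an illustration in Python, not Dillo's actual C code; the dictionary keys and the function name `effective_type` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical source names, listed from highest to lowest priority,
# mirroring the four content types described in the list above.
PRIORITY = ("override", "meta", "http", "guessed")


def effective_type(types: Dict[str, str]) -> Optional[str]:
    """Return the first content type that is set, in priority order."""
    for source in PRIORITY:
        value = types.get(source)
        if value:
            return value
    return None
```

Under this sketch, a document with only an "http" and a "guessed" type set resolves to the HTTP one, while a document with no "http" type falls through to the guessed type.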
Thanks for the explanation, this also makes clearer an issue I had with
XHTML image indexes generated by ImageMagick Montage which were getting
(by an unusual sequence of events) the incorrect HTTP Content-Type
of "text/xml" (and they don't contain a meta tag). They'd load properly
via file:// but show as text over http://. Now I know to ideally force
the HTTP Content-Type to "application/xhtml+xml" instead of "text/html"
which I used to fix the problem originally.
...
Regarding the type guessing bug, I think I can improve it by assuming
that if we find the "<!doctype html" string in the first 1024 bytes or
so, it is an HTML-like type, but it incurs more overhead.
But if it aborts that search upon encountering the first thing that
isn't "spaces, newlines, tabs, and comments", most text files will be
detected within the first few bytes.
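
A minimal sketch of that early-abort idea, assuming only whitespace and `<!-- ... -->` comments may precede the doctype. It is written in Python for brevity and is not taken from Dillo's source; the function names and the 1024-byte default are illustrative.

```python
WHITESPACE = b" \t\r\n"


def skip_blank_and_comments(data: bytes) -> int:
    """Return the offset of the first byte that is neither whitespace
    nor part of a <!-- ... --> comment."""
    i = 0
    while i < len(data):
        if data[i] in WHITESPACE:
            i += 1
        elif data.startswith(b"<!--", i):
            end = data.find(b"-->", i + 4)
            if end < 0:
                # Unterminated comment: nothing else left to inspect.
                return len(data)
            i = end + 3
        else:
            # First "real" byte found: stop scanning here.
            break
    return i


def looks_like_html(data: bytes, limit: int = 1024) -> bool:
    """Guess HTML if a doctype follows only whitespace and comments."""
    head = data[:limit]
    start = skip_blank_and_comments(head)
    return head[start:].lower().startswith(b"<!doctype html")
```

For a plain text file the loop stops at the very first byte, so the extra cost is only paid by documents that really do begin with whitespace or comments.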

I'm not sure how that approach would work with ImageMagick image index
XHTML pages which start like this though:
  <?xml version="1.0" encoding="US-ASCII"?>
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

Example:
http://www.ombertech.com/cnk/dillo/STS-133_Pictures/photo_index.html
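
One hypothetical way to cover a prolog like the one quoted above would be to also skip `<?...?>` declarations before looking for the doctype. The Python sketch below is an assumption about how such a check could look, not a description of what Dillo does; the names `PROLOG`, `DOCTYPE` and `looks_like_html` are made up for the example.

```python
import re

# Leading material tolerated before the doctype: whitespace, comments,
# and XML declarations / processing instructions such as <?xml ... ?>.
PROLOG = re.compile(rb"(?:\s|<!--.*?-->|<\?.*?\?>)*", re.DOTALL)
DOCTYPE = re.compile(rb"<!doctype\s+html", re.IGNORECASE)


def looks_like_html(data: bytes, limit: int = 1024) -> bool:
    """Guess an HTML-like type if a doctype follows the tolerated prolog."""
    head = data[:limit]
    start = PROLOG.match(head).end()  # always matches, possibly empty
    return DOCTYPE.match(head, start) is not None
```

With this variant the three-line ImageMagick prolog would be classified as HTML-like, while an XML file without an HTML doctype would not.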

I don't really understand how XHTML is supposed to work, and I don't
have time to learn, so perhaps I'm ignoring a distinction between
different flavours of XHTML that can begin in different ways? Anyway I
like how ImageMagick image map pages are viewable in Dillo at the
moment.
...
So I think for now we can rely on the correction of "text/xhtml" to
"application/xhtml+xml", which seems to work fine. I don't like adding
quirks, but I will keep this one as it was already there. Here is the 
PR:
https://github.com/dillo-browser/dillo/pull/140
I've built Dillo from that branch and pages on www.lemis.com now render
correctly, thanks! If I save the homepage as lemis.xhtml it still shows
as plain text when loaded with file://, though it is rendered if the
comments before <!DOCTYPE> are removed or if the original file is saved
as lemis.html. Not much of an issue, but it could cause confusion for
someone.

Kevin Koster