[Dillo-dev] Re: Dillo not rendering HTML with comments before <!DOCTYPE>

April 25, 2024

Hi,

On Thu, Apr 25, 2024 at 09:23:38AM +1000, Kevin Koster wrote:
...
Thanks for the explanation, this also makes clearer an issue I had with
XHTML image indexes generated by ImageMagick Montage which were getting
(by an unusual sequence of events) the incorrect HTTP Content-Type of
"text/xml" (and they don't contain a meta tag). They'd load properly
via file:// but show as text over http://. Now I know that I should
ideally force the HTTP Content-Type to "application/xhtml+xml" rather
than the "text/html" I originally used to fix the problem.
For Dillo, "application/xhtml+xml" and "text/html" are handled by the
same HTML parser, which later identifies which version of HTML/XHTML
the document is, based on the doctype. The problem is failing to set
the content type to either of those two, like when using "text/xml".

AFAIK, the proper content type for XHTML is "application/xhtml+xml",
which should be set on the HTTP Content-Type header.
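
To illustrate the point above, here is a minimal sketch in Python of how
a Content-Type header could be mapped to a handler. The names
HTML_PARSER_TYPES and pick_handler are made up for this example; this is
not Dillo's actual code, only a model of the behaviour described.

```python
# Hypothetical names for illustration; not taken from the Dillo source.
# Media types that, per the explanation above, go to the HTML parser.
HTML_PARSER_TYPES = {"text/html", "application/xhtml+xml"}


def pick_handler(content_type_header: str) -> str:
    """Return "html" if the media type is routed to the HTML parser,
    otherwise "other" (for example, shown as text)."""
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return "html" if media_type in HTML_PARSER_TYPES else "other"
```

With this model, "text/html; charset=utf-8" and "application/xhtml+xml"
both reach the HTML parser, while "text/xml" does not.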
...
...
Regarding the type guessing bug, I think I can improve it by assuming
that if we find the "<!doctype html" string in the first 1024 bytes or
so, it is an HTML-like type, but it incurs more overhead.
But if it aborts that search upon encountering the first thing that
isn't "spaces, newlines, tabs, and comments", most text files will be
detected within the first few bytes.
I'm not sure how that approach would work with ImageMagick image index
XHTML pages which start like this though:
 <?xml version="1.0" encoding="US-ASCII"?>
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Example:
http://www.ombertech.com/cnk/dillo/STS-133_Pictures/photo_index.html
I don't really understand how XHTML is supposed to work, and I don't
have time to learn, so perhaps I'm ignoring a distinction between
different flavours of XHTML that can begin in different ways? Anyway I
like how ImageMagick image map pages are viewable in Dillo at the
moment.
We can improve the content detection to handle both HTML and XML-style
comments, but I prefer to defer it until after the 3.1.0 release.
Websites shouldn't rely on the browser to guess the content type; it
should be stated in the HTTP header or the meta tag. So I don't
consider this a priority that should block the release any longer.

If you want to work on it, feel free to do so :-)
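
For anyone who wants to pick this up, the detection idea discussed above
could be sketched roughly as follows, in Python for brevity. This is
only an illustration under assumptions: the function name, the
SNIFF_LIMIT constant and the skip_xml_decl flag are invented here, and
skipping the "<?xml ...?>" declaration is an extension that was not
agreed on in this thread. It is not the code in Dillo.

```python
SNIFF_LIMIT = 1024            # "the first 1024 bytes or so"
DOCTYPE = b"<!doctype html"   # compared case-insensitively


def looks_like_html(data: bytes, skip_xml_decl: bool = False) -> bool:
    """Skip leading whitespace and comments, then test for the doctype.

    The scan stops at the first thing that is neither whitespace nor a
    comment, so plain text is rejected within the first few bytes.
    """
    buf = data[:SNIFF_LIMIT]
    pos = 0
    while pos < len(buf):
        if buf[pos:pos + 1] in (b" ", b"\t", b"\r", b"\n"):
            pos += 1
        elif buf.startswith(b"<!--", pos):
            end = buf.find(b"-->", pos + 4)
            if end == -1:
                return False  # comment does not end inside the window
            pos = end + 3
        elif skip_xml_decl and buf.startswith(b"<?xml", pos):
            # Assumed extension: also skip an XML declaration.
            end = buf.find(b"?>", pos + 5)
            if end == -1:
                return False
            pos = end + 2
        else:
            # First significant content: it is the doctype or it is not.
            return buf[pos:pos + len(DOCTYPE)].lower() == DOCTYPE
    return False
```

In this sketch, a page that starts with comments followed by
"<!DOCTYPE html" is accepted, and ordinary text is rejected at its first
non-blank byte. A page that starts with an XML declaration, like the
ImageMagick index quoted above, is only accepted when skip_xml_decl is
set.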
...
...
So I think for now we can rely on the correction of "text/xhtml" to
"application/xhtml+xml", which seems to work fine. I don't like adding
quirks, but I will keep this one as it was already there. Here is the
PR:
https://github.com/dillo-browser/dillo/pull/140
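
As a rough illustration of that kind of correction (a sketch only, with
hypothetical names; the actual change is in the PR above), the quirk
amounts to a small lookup applied to the declared media type:

```python
# Hypothetical table and function names, for illustration only.
CONTENT_TYPE_FIXES = {
    "text/xhtml": "application/xhtml+xml",
}


def normalize_content_type(media_type: str) -> str:
    """Lowercase the media type and replace known-wrong values."""
    key = media_type.strip().lower()
    return CONTENT_TYPE_FIXES.get(key, key)
```

Any value not listed in the table is passed through unchanged apart from
case and surrounding whitespace.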
I've built Dillo from that branch and pages on www.lemis.com now render
correctly, thanks! If I save the homepage as lemis.xhtml it still shows
as plain text when loaded with file://, though it is rendered if the
comments before <!DOCTYPE> are removed or if the original file is saved
as lemis.html. Not much of an issue, but it could cause confusion for
someone.
I pushed another patch that should fix this issue. It is caused 
primarily by the ".xhtml" extension not being recognized by the file 
plugin, which then tries to detect the doctype and fails in the same 
way, falling back to text/plain.
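
The failure mode can be modelled like this (a Python sketch with made-up
names, not the file plugin's real code): the extension is looked up
first, and only unknown extensions reach the content detection, which
falls back to text/plain when it finds nothing.

```python
import os

# Hypothetical extension table; the point is that ".xhtml" is listed.
EXT_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".xhtml": "application/xhtml+xml",
}


def guess_file_type(path, data, sniff):
    """Pick a content type for a local file.

    `sniff` is any callable taking the file's bytes and returning True
    when the content looks like HTML.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext in EXT_TYPES:
        return EXT_TYPES[ext]      # extension known: no sniffing needed
    return "text/html" if sniff(data) else "text/plain"
```

With ".xhtml" in the table, a file such as lemis.xhtml gets its type
from the extension, so comments before the doctype no longer matter for
it.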

Best,
Rodrigo.

Rodrigo Arias