Hi Alex,

On Tue, Jun 18, 2024 at 03:28:20PM +0200, a1ex@dismail.de wrote:
> > This also allows patching the HTML of sites so you can fix them to work better (or at all) in Dillo. This is also done by Firefox from the webcompat[2] project in what they call "interventions", as sometimes page authors don't fix them or take a long time, so they patch it from the browser directly. You can open about:compat to see the long list of patches, here[3] is one for YouTube.
> This would be very impressive, a real step forward for Dillo in my opinion. Do you think this is something that would be relatively straight-forward to implement, or is it more of a long-term goal with lots of work required to get there? Either way, sounds like there are exciting times ahead for Dillo!
Adding a mechanism to rewrite the HTML is surprisingly not super complicated, as the internal design of Dillo is centered around the CCC, the "Concomitant Control Chain", which is basically a chain of bi-directional pipes connected together to pass data around.

Here is how Dillo currently receives data from a TLS server (AFAIK). I'm only drawing the incoming direction, but the outgoing link is similar:

 Net  +--------+    +-------+    +------+    +-------+
 ---->| TLS IO |--->|  IO   |--->| HTTP |--->| CACHE |-...
      +--------+    +-------+    +------+    +-------+
      src/tls.c     src/IO.c    src/http.c  src/capi.c

And adding a new rewrite module (named SED in the diagram) would require rerouting the chain to add a new element (not hard):

 Net  +--------+    +-------+    +------+    +=====+    +-------+
 ---->| TLS IO |--->|  IO   |--->| HTTP |---># SED #--->| CACHE |-...
      +--------+    +-------+    +------+    +=====+    +-------+
      src/tls.c     src/IO.c    src/http.c      |      src/capi.c
                                                |
                                           +---------+
                                           | rulesrc |
                                           |   ...   |
                                           +---------+

The module can then forward the content parsed by the HTTP module to the appropriate scripts defined in the rules, then read their output and forward it to the next step in the chain. When no rule applies, it can just forward the content to the cache as-is.

Now, the interesting part is that we can place another SED module between the IO and the HTTP nodes, so we can rewrite the HTML content *and* the HTTP headers too. This would allow, for example, writing a plugin that matches a given mime type and rewrites it on the fly into an HTML file, changing the Content-Type header.

This is already done by the plugins, but they mix the two things together. For example, we can display a .gmi file served via the "gemini:" protocol, but we cannot display a local .gmi file. Same for manual pages with the "man:" protocol, which cannot open manual pages served via "file:" or "http(s):".

The solution with the new design would involve:

 1) Open a "gemini:" link.
 2) The request is routed to the gemini: dpi handler (like now).
 3) The gemini plugin returns the .gmi file as-is as an HTTP response, instead of converting it to HTML.
 4) The .gmi mime type matches a rewrite rule and is rewritten into HTML in the SED node.

Now, if we open a .gmi via HTTP:

 1) Open an "https:" link.
 2) The request is routed to the usual HTTP/IO/TLS chain.
 3) The HTTP server returns the .gmi file as-is as an HTTP response.
 4) The .gmi mime type matches a rewrite rule and is rewritten into HTML in the SED node.

Notice that the HTTP content can be compressed. So, for example, this simple rewrite script:

  #!/bin/sh
  sed 's_www.youtube.com_inv.vern.cc_g'

would only work well in the SED node *after* the HTTP content is uncompressed and the headers removed. The rewrite rules should indicate in which position of the chain they apply.

As a side note, keep in mind that all of these pieces work in stream mode. Each node reads a bit of data, processes it and sends it to the next node of the chain, without the need to store the whole thing in memory. The same goes for the sed command I wrote as an example.

Best,
Rodrigo.
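P.S. Just to make the rules part a bit more concrete, here is a rough sketch of what a rule file could look like. The format, the stage names and the gmi2html filter are made up for illustration only, nothing here is decided:

  # stage   match-by   pattern        command
  body      mime       text/gemini    gmi2html
  body      mime       text/html      sed 's_www.youtube.com_inv.vern.cc_g'
  header    mime       text/gemini    sed 's_^Content-Type:.*_Content-Type: text/html_'

The "stage" column would say where in the chain the rule sits (a header rule runs in the SED node between IO and HTTP, a body rule after HTTP, once the content is uncompressed and the headers removed), and the command reads the matched content on stdin and writes the rewritten version on stdout, so it keeps working in stream mode like the rest of the chain.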