Rodrigo Arias <rodarima-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
On Wed, Jun 19, 2024 at 10:03:20AM +1000, Kevin Koster wrote:
This mechanism might suit an idea I've had to do remote downscaling of extremely large images, which are increasingly being included in web pages. The script would send a list of the URLs in all <img> tags within the HTML (or ideally just the ones for large image files) to a remote server, e.g. on a VPS, then rewrite the URLs in the HTML to point to the remote server, where the converted images are available over HTTP/S.
You can create a script that rewrites the <img> src attribute
<img src="https://foo.com/img1.png">
To point to an endpoint of your server:
<img src="https://yourserver.com/downscale?url=https://foo.com/img1.png">
Yes, that would work too. However, the idea of sending a URL list was that it allows my server to re-use the HTTPS connection to the image host server while downloading lots of images for a webpage, whereas with your approach each image request by my server would be a new HTTPS connection, and therefore potentially slower. It's a minor difference though.
And then on the server you simply downscale it. Here is how you could do it with rules:
# Script that would rewrite images to a server for downscaling
action downscale filter 'rewrite-img.sh'

define mime header 'Content-Type'

match mime 'text/html' action downscale
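A rough sketch of what rewrite-img.sh could look like, assuming the filter receives the HTML on stdin and writes the rewritten HTML to stdout (it only handles double-quoted absolute URLs and does not URL-encode the query value):

#!/bin/sh
# Point every <img> src at the downscaling endpoint instead of the
# original image URL.
sed -E 's|<img([^>]*)src="(https?://[^"]+)"|<img\1src="https://yourserver.com/downscale?url=\2"|g'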
That should work. I would personally find it clearer to read if there was a character indicating assignment of values, since this would make it obvious which words are commands and which are arguments. Such as:

# Script that would rewrite images to a server for downscaling
action downscale filter='rewrite-img.sh'

define mime header='Content-Type'

match mime='text/html' action=downscale

Using "action" as an argument to "match" as well as the name of a command might still be a little confusing.
Or a deeper approach would be to apply the same approach as this rewrite engine to binary content as well, and have Dillo do it transparently via 'rewrite'/convert rules for image MIME types. Then the HTML would stay the same and Dillo would trigger a command that requested a downscaled image from the converter server instead of the original image's server. That would be more elegant, but expands the scope of your proposed system a little.
Rewriting the binary image directly would be possible, but then you would have already wasted the bandwidth bringing it to Dillo, and you would still have to send it to the server to downscale it.
No, the idea is to reduce the bandwidth usage on Dillo's connection, so for this approach Dillo would have to abort the connection to the image server if the image size was over the limit and fetch it from the script instead, which might do:

wget -q -O - "https://yourserver.com/downscale?url=https://foo.com/img1.png"

A way for Dillo to take the replacement URL from the script would be better, but I suspect that would make this system more complicated to implement, because then state matters between different connections. Granted, I've lost the ability to re-use the HTTPS connection at my server with this approach, but I like that fetching small images wouldn't be slowed down by having them needlessly downloaded by my server first, which would generally be a bigger advantage overall. Most images are still small enough that I wouldn't downscale them; just the 1MB+ ones would get that treatment. The surprise 10MB+ ones are the real evil, and just the option to block these outright (abort the connection and don't run any script) would be better than nothing.

To complicate things further, it would be good to have a right-click menu option to bypass this rule and allow fetching the full-size image. Alternatively, an external handler could be assigned (via the earlier-discussed mechanism) to a script that downloads the image URL and opens it in an external image viewer, which might actually be better to use for that.
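Such a handler script could be roughly as simple as this sketch (assuming it receives the image URL as its first argument; the temporary file location and the viewer are just examples):

#!/bin/sh
# Download the full-size image to a temporary file and open it in an
# external image viewer.
url="$1"
tmp=$(mktemp /tmp/fullsize-XXXXXX) || exit 1
wget -q -O "$tmp" "$url" && xdg-open "$tmp"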
In any case, imagine you want to downscale it locally anyway. Here is how I would think about it:
# Script that would downscale an image and write to stdout
action downscale filter 'downscale-img.sh'

# Define headers from the HTTP content with shorter names
define mime header 'Content-Type'
define size header 'Content-Length'

# Downscale big images
match mime =~ 'image/.*' and size > 10K action downscale
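For example, downscale-img.sh could be little more than this sketch, assuming the filter receives the image data on stdin and has to write the converted image to stdout (it relies on ImageMagick, and the 1024-pixel limit is an arbitrary choice):

#!/bin/sh
# Shrink the image from stdin to fit within 1024x1024 and write it to
# stdout; with no explicit output format, convert keeps the input format.
convert - -resize '1024x1024>' -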
Notice that this can be triggered for any image, not only ones served over HTTP/HTTPS, but also ones fetched via other protocols like gemini, as long as they are adapted to speak HTTP and also provide a Content-Length header.
I added the =~ and > operators: the former matches a regex and the latter performs a numeric comparison. You can assume that if the header is not present, the default is to make any comparison fail.
I have also added the "define" keyword to define properties like "mime" or "size", which are parsed from the HTTP headers and are shorter and easier to write.
That looks good. Maybe I'd find it clearer without the whitespace around the operators, as in my '=' example above.
Maybe, since it still requires a remote Web server, this problem would be better solved via a Web proxy (I did look into Squid before, but drowned in confusing documentation). But I just thought I'd mention it as an example of a more complex usage for this proposed rewrite system.
But then you would need to pass all the traffic through the server so it can perform the substitution there.
Another solution which may be better is to mark from Dillo which requests are being done from img elements (filtering them before going to the network).
If Dillo marks those requests in the HTTP headers for example, then you could do:
# Script that transforms image HTTP requests to a server that
# downscales the image
action downscale filter 'downscale-req.sh'
define source header 'Dillo-Request-Source'
# Downscale images coming from <img> elements
match source 'img' action downscale
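As a rough sketch, downscale-req.sh could look like this, assuming the filter reads the original request URL on stdin and writes the URL that Dillo should fetch instead to stdout (no URL encoding of the query value is done here):

#!/bin/sh
# Redirect image requests through the downscaling endpoint.
while read -r url; do
    printf 'https://yourserver.com/downscale?url=%s\n' "$url"
done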
Yes, that's neat.
This would have the benefit that Dillo already performs the parsing of the HTML for you, and only the images that are loaded are passed to the downscaling server. Additionally, cookies would be sent in the HTTP request, so you can access login protected images this way too.
Good point, although for my usage login protected images wouldn't be much of a concern.