Rodrigo Arias <rodarima-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
On Sat, Jun 22, 2024 at 09:43:00AM +1000, Kevin Koster wrote:
But the proxy approach allows old versions/binaries to keep working too. If Dillo 3.0.5 had supported it, then the HTTPS issues from lacking SNI support could have been worked around by running an 'old-style' HTTPS proxy with SNI support on localhost.
A related term seems to be "reverse proxy":
Yes, though in practice that tends to imply a caching proxy run by website operators, hence "reverse" because a normal proxy is run by the person accessing the content on other people's servers, but these are run by the server operators. But the mechanism is the same as my usage. Real-world "reverse proxies" just tend to be geared away from internet-wide usage to serve requests to any server, not least because HTTPS requires the proxy to have the target website's certificate. Or at least based on looking at Squid and Nginx, internet-wide "reverse" proxy usage isn't documented clearly enough for me to spot the easy way of doing it (while also manipulating the content being retrieved). I did see a very hard way to do it with Squid and an abandoned 3rd-party library that wouldn't even compile for me.
From the "Fun with Crypto Ancienne" post I understand that you want Dillo to get an HTTP or HTTPS URL and always perform an HTTP GET towards your proxy, as the Mosaic configuration suggests:
https 127.0.0.1 8765 http
http 127.0.0.1 8765 http
Yes. Which, if it helps to decode my earlier posts, is equivalent to this Wget command before they secretly changed its https_proxy behaviour:

  http_proxy=127.0.0.1:8765 https_proxy=127.0.0.1:8765 wget https://example.com
Other than those old browsers, I don't think you can do this with any (relatively) modern tool.
Which is annoying because I didn't need it until (relatively) recently. It looks like current Lynx is designed to use HTTPS tunneling through proxies now, based on use of the do_connect variable in WWW/Library/Implementation/HTTP.c of its source code. Though with Lynx 2.8.9rel.1 and 2.9.0dev.6 I can't actually get it to talk to anything (inc. Netcat on localhost) with http_proxy or https_proxy, even though setting each variable causes page loads to fail over the respective protocol. Not a firewall issue.
This is what Dillo is currently doing for https and http URLs:
hop% http_proxy=http://localhost:1234 dillo http://www.google.com

hop% nc -l 1234
GET http://www.google.com/ HTTP/1.1
Host: www.google.com
User-Agent: Dillo/3.1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://www.google.com/
Connection: keep-alive

hop% http_proxy=http://localhost:1234 dillo https://www.google.com

hop% nc -l 1234
CONNECT www.google.com:443 HTTP/1.1
Host: www.google.com:443
While for the latter you'll want:
GET https://www.google.com/ HTTP/1.1
Host: www.google.com
User-Agent: Dillo/3.1.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Encoding: gzip, deflate
DNT: 1
Referer: https://www.google.com/
Connection: keep-alive
You'll need to instruct Dillo to ignore the HTTPS handling and just treat it as an HTTP GET request to the proxy (using the HTTPS url), so no CONNECT is used.
Yes, though ideally like in Wget this behaviour wouldn't be caused by setting 'http_proxy'. You'd have a variable/setting like Wget's 'https_proxy' which only causes HTTPS connections to try and use the proxy, while HTTP connections go through directly. Then you could set 'http_proxy' as well if you wanted to use a proxy for HTTP connections too.

For example (with 'https_endpoint' instead of the confusing 'https_proxy' variable/setting name I proposed earlier):

  http_proxy=127.0.0.1:1234 dillo http://www.google.com
    [ 127.0.0.1:1234 ] GET http://www.google.com/ HTTP/1.1

  http_proxy=127.0.0.1:1234 dillo https://www.google.com
    [ 127.0.0.1:1234 ] CONNECT www.google.com:443 HTTP/1.1

  https_endpoint=127.0.0.1:5678 dillo http://www.google.com
    [ www.google.com:80 ] GET http://www.google.com/ HTTP/1.1

  https_endpoint=127.0.0.1:5678 dillo https://www.google.com
    [ 127.0.0.1:5678 ] GET https://www.google.com/ HTTP/1.1

  http_proxy=127.0.0.1:1234 https_endpoint=127.0.0.1:5678 dillo http://www.google.com
    [ 127.0.0.1:1234 ] GET http://www.google.com/ HTTP/1.1

  http_proxy=127.0.0.1:1234 https_endpoint=127.0.0.1:5678 dillo https://www.google.com
    [ 127.0.0.1:5678 ] GET https://www.google.com/ HTTP/1.1

So the behaviour of 'http_proxy' alone isn't changed. Also I don't think that 'https_endpoint' actually needs to be set from an environment variable, just a dillorc setting would do.
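
In case it makes the proposal easier to reason about, here is a rough Python sketch of the selection logic I have in mind. It is not Dillo code: the function name route_request() is made up for illustration, and 'https_endpoint' is only the setting name proposed above. It returns which peer the browser would connect to and the first request line it would send. One assumption of mine: for direct connections it emits the origin-form request line ("GET / HTTP/1.1") that clients normally send to the server itself, whereas my examples above just show the full URL for brevity.

```python
from urllib.parse import urlsplit


def route_request(url, http_proxy=None, https_endpoint=None):
    """Return (peer, request_line) for the first request sent for `url`.

    `http_proxy` and `https_endpoint` are "host:port" strings or None.
    Hypothetical helper for illustration only, not part of Dillo.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    absolute = f"{parts.scheme}://{parts.netloc}{path}"

    if parts.scheme == "https":
        port = parts.port or 443
        if https_endpoint:
            # Proposed behaviour: plain HTTP GET of the https URL, no CONNECT.
            return https_endpoint, f"GET {absolute} HTTP/1.1"
        if http_proxy:
            # Current behaviour: ask the proxy for a tunnel.
            return http_proxy, f"CONNECT {parts.hostname}:{port} HTTP/1.1"
        # No proxy at all: connect directly (TLS handshake happens first).
        return f"{parts.hostname}:{port}", f"GET {path} HTTP/1.1"

    # Plain http URLs are never affected by https_endpoint.
    if http_proxy:
        return http_proxy, f"GET {absolute} HTTP/1.1"
    return f"{parts.hostname}:{parts.port or 80}", f"GET {path} HTTP/1.1"


if __name__ == "__main__":
    for env in ({"http_proxy": "127.0.0.1:1234"},
                {"https_endpoint": "127.0.0.1:5678"},
                {"http_proxy": "127.0.0.1:1234",
                 "https_endpoint": "127.0.0.1:5678"}):
        for url in ("http://www.google.com", "https://www.google.com"):
            peer, line = route_request(url, **env)
            print(env, url, "->", f"[ {peer} ]", line)
```

The point of checking 'https_endpoint' before 'http_proxy' for https URLs is that setting both gives the last two cases in my list, while setting 'http_proxy' alone still produces CONNECT exactly as it does today.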