[Dillo-dev] Re: Fingerprinting lessons from 'curl-impersonate'

Dec. 30, 2024

      Rodrigo Arias wrote:
...
On Mon, Dec 30, 2024 at 05:35:50PM +0100, a1ex-J7K0XVabL0iELgA04lAiVw@public.gmane.org wrote:
...
There was an interesting post[1] on HN today about 'curl-impersonate',
which is a patch[2] to curl which allows it to act like various big
browsers, bypassing various fingerprinting techniques which would
otherwise prevent the client from accessing the page.
Looking at the patch, maybe there could be some useful ideas here for
Dillo to use to load more sites. The SSL library also obviously plays a
large role, maybe that's something we will need to consider as well.
I experienced problems with the user-agent being banned, and had to
impersonate Firefox to load some sites. I haven't yet found examples of
this deep fingerprinting for TLS or similar; have you?
In any case, it would be trivial to identify Dillo as we don't support
JS, so it can be banned if they decide to.
I've found that sometimes I go to a webpage and see one of the
"enable Javascript to continue" pages in Dillo, then I load the
same page in Firefox with NoScript blocking all its scripts, and
it comes up fine without running any such Javascript. That could
be just the User-Agent header though because I don't try faking
that.
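
One way to check the User-Agent theory is to request the same page
twice, changing only that header, and compare what comes back. The
following is a rough sketch using only the Python standard library.
Both User-Agent strings and the function names are illustrative
assumptions, not what Dillo or Firefox actually send, and a difference
in the results would only point at the header, not rule out TLS or
Javascript checks.

```python
import urllib.error
import urllib.request

# Placeholder User-Agent strings (assumptions for illustration only).
UA_DILLO = "Dillo/3.1"
UA_FIREFOX = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"


def build_request(url, user_agent):
    """Build a GET request that differs from the default only in User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_and_size(url, user_agent, timeout=10):
    """Return (HTTP status, body length) for one fetch of url."""
    request = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, len(response.read())
    except urllib.error.HTTPError as error:
        # A 403 or similar here is itself a useful data point.
        return error.code, 0


def compare(url):
    """Fetch url once per User-Agent and return both results."""
    return {
        "dillo": status_and_size(url, UA_DILLO),
        "firefox": status_and_size(url, UA_FIREFOX),
    }
```

Calling compare() on a page that shows the "enable Javascript" notice
and seeing a different status or a much larger body for the Firefox
string would suggest the header alone is being used to decide.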

Rather than add Chrome-faking features to Dillo, maybe this would
be an extra application of the Rule-based content manipulation RFC:
https://github.com/dillo-browser/rfc/blob/rfc-002/rfc-002-rule-based-content...

Make a rule for some sites (or Web server responses?) that has
Dillo call curl-impersonate to retrieve a Web page instead of doing
it in Dillo?
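
As a very rough sketch of what such a rule could look like (this is
not the syntax proposed in the RFC, which is not reproduced here), the
idea is a table mapping host patterns to an external fetch command.
The rule table, the function name, and the wrapper name
"curl_chrome116" are all assumptions for illustration; the wrapper
stands in for whichever curl-impersonate script happens to be
installed. The sketch only builds the command line and does not run
it.

```python
import fnmatch
from urllib.parse import urlsplit

# Hypothetical rule table: (host glob pattern, external fetcher command).
# "curl_chrome116" is an assumed curl-impersonate wrapper name.
RULES = [
    ("*.example.gov", ["curl_chrome116", "--silent", "--location"]),
]


def fetch_command(url, rules=RULES):
    """Return the external command for url, or None to fetch natively."""
    host = urlsplit(url).hostname or ""
    for pattern, command in rules:
        if fnmatch.fnmatch(host, pattern):
            # Build a new list so the rule table itself is not modified.
            return command + [url]
    return None
```

A URL whose host matches no rule returns None, meaning the browser
would fetch it itself as usual; only the listed sites pay the cost of
spawning an external program.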

By the way, being a Git failure, I really can't see where that MD
document lives. I look at the "rfc" repo via the GitHub website in
Dillo and there's just a readme. I clone the repo and I just get a
readme. I had to look back to your RFC repo announcement to find
that link. I guess they're in separate branches or something but I
forget things about Git faster than I learn them and can't be
bothered learning how to use branches yet again today. I really
think it would be better to list them together somewhere obvious,
e.g. a new Developer Documentation webpage.

I can see from this URL mangling that there are probably only two
RFCs so far:
https://github.com/dillo-browser/rfc/tree/rfc-001/ (rfc-001-dillo-rfc-documents.md)
https://github.com/dillo-browser/rfc/tree/rfc-002/ (rfc-002-rule-based-content-manipulation.md)
https://github.com/dillo-browser/rfc/tree/rfc-003/ (404)
...
In my experience, it is generally not worth reading a website
that performs this type of discrimination.
That's often my approach, but then big offenders are things like
government websites which one is obliged to read sometimes.

Kevin Koster