Option to define external link handler
Hi Rodrigo and list,

Thanks for all the recent improvements!

I have one more feature which I would like to propose, or hopefully at least get some guidance with. An option to open a link with an external program. This could be a custom downloader, or whatever else the user wants.

Ideally it would be a setting in dillorc, something like:

external_downloader="/usr/bin/some_command -x -y -z"

And in the right-click Link menu there would be an option to open the link with the external program, something like "Open link with external app".

I can envision various workflows, especially repetitive ones, in which this would be a real time-saver. There is nothing really wrong with the built-in downloader, but it's not always ideal. Obviously it's possible to copy the link and use it in whatever program, but that results in extra clicks / key-presses.

Is there any interest in a feature like this?

I've been looking at the source, but am not really seeing a way to easily replace the current download program (or even where it is defined). If anyone has hints on how to do this, I'd appreciate the help. Or, even better, someone more talented could step up and implement the feature :)

Regards,
Alex
Hi Alex, On Thu, Jun 13, 2024 at 05:52:55PM +0200, a1ex@dismail.de wrote:
Hi Rodrigo and list,
Thanks for all the recent improvements!
I have one more feature which I would like to propose, or hopefully at least get some guidance with. An option to open a link with an external program. This could be a custom downloader, or whatever else the user wants.
Ideally it would be a setting in dillorc, something like: external_downloader="/usr/bin/some_command -x -y -z"
And in the right-click Link menu there would be an option to open the link with the external program, something like "Open link with external app".
I can envision various workflows, especially repetitive ones, in which this would be a real time-saver. There is nothing really wrong with the built-in downloader, but it's not always ideal. Obviously it's possible to copy the link and use it in whatever program, but that results in extra clicks / key-presses.
Is there any interest in a feature like this?
Certainly! It is one of the first issues I opened back in 2020: https://github.com/dillo-browser/dillo/issues/3

However, there are several concerns that we may want to take into account.

When downloading a file, it is often not only the URL that is needed to retrieve it. A lot of sites require the session cookies and, less frequently, the user agent. So it would be nice to allow passing that information to the external tool.

It may be nice to have multiple options, so for example I could define an action for YouTube videos and another for PDF files.

However, I would like to support a workflow in which specific URLs are matched and the appropriate tool is selected to handle it. For example, I could open YouTube URLs directly in MPV (or similar) by just clicking on a link, without even going to the "right-click > open in MPV" menu.

This has the problem that sometimes the URL is not enough; you need to determine other things. For example, the server may reply that the URL I just clicked on is in fact a PDF. It would be nice to be able to tell Dillo to download the file and open it in my PDF viewer *after* the server has replied with the MIME type. It should be possible to directly pipe the server reply into the tool, no need to wait for the download to finish. There is an issue to cover this (and other things): https://github.com/dillo-browser/dillo/issues/56

Those tools may not only take care of downloading the file; they can also perform some substitutions in HTML or CSS files and send them back to Dillo, allowing arbitrary modifications of pages.

Now, going back to your specific case, it may be good to have a simpler solution first that can later be extended to the more general approach, so we don't have to wait too long. I have some ideas of how to do it, but I would like to ask you to give concrete examples of that feature so I can test it with those programs.
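To make the cookies and user agent point concrete: a handler could only forward them if Dillo exported that information to the environment, which it does not do today. A minimal sketch under that assumption ($url, $cookies and $user_agent are hypothetical variables, not something Dillo currently provides):

#!/bin/sh
# Hypothetical handler: assumes Dillo has exported $url, $cookies and
# $user_agent before running the command. None of these exist yet.
exec curl --cookie "$cookies" --user-agent "$user_agent" -O -- "$url"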
I've been looking at the source, but am not really seeing a way to easily replace the current download program (or even where it is defined). If anyone has hints on how to do this, I'd appreciate the help. Or, even better, someone more talented could step up and implement the feature :)
A similar feature has been implemented in dilloNG:

https://github.com/w00fpack/dilloNG/commit/7bd6b1c4592466d6e717eaf795e98b7ce...

But as it is currently implemented it is not very extensible; only a "media player" and a "media downloader" option are possible.

Best,
Rodrigo.
When downloading a file, it is often not only the URL that is needed to retrieve it. A lot of sites require the session cookies and, less frequently, the user agent. So it would be nice to allow passing that information to the external tool.
It may be nice to have multiple options, so for example I could define an action for YouTube videos and another for PDF files.
However, I would like to support a workflow in which specific URLs are matched and the appropriate tool is selected to handle it. For example, I could open YouTube URLs directly into MPV (or similar) by just clicking on a link without even going to the "right-click > open in MPV" menu.
This has the problem that sometimes the URL is not enough; you need to determine other things. For example, the server may reply that the URL I just clicked on is in fact a PDF. It would be nice to be able to tell Dillo to download the file and open it in my PDF viewer *after* the server has replied with the MIME type. It should be possible to directly pipe the server reply into the tool, no need to wait for the download to finish.
Now, going back to your specific case, it may be good to have a simpler solution first that can later be extended to the more general approach, so we don't have to wait too long.
I have some ideas of how to do it, but I would like to ask you to give concrete examples of that feature so I can test it with those programs.
For me, it could be a number of filetypes, plus things like youtube or other media links. These would be opened in various programs. Obviously, many programs can't directly open a URL and will fail.

Why not just have 3 generic options and let the user decide:

link_handler_1=""
link_handler_2=""
link_handler_3=""

or whatever name is more suitable. If the option is undefined, it doesn't show up in the link menu.

Even if there was only one handler option, personally I would just have it call a shell script which parses the URL and extension and takes appropriate action. This doesn't have to be a 'one size fits all' feature, and I think most Dillo users are advanced enough to use this to their advantage.

Your ideas above regarding Dillo handling the logic for various file and content types are excellent, and also complicated :) But, these can be two separate features, with optional external link handlers acting as an override for the built-in handling.
A similar feature has been implemented in dilloNG:
https://github.com/w00fpack/dilloNG/commit/7bd6b1c4592466d6e717eaf795e98b7ce...
But as it is currently implemented it is not very extensible; only a "media player" and a "media downloader" option are possible.
Thanks for pointing this out, looks like a good template for testing. If it works well, I'll post a demo patch for anyone interested. Regards, Alex
Hi, On Thu, Jun 13, 2024 at 09:59:03PM +0200, a1ex@dismail.de wrote:
I have some ideas of how to do it, but I would like to ask you to give concrete examples of that feature so I can test it with those programs.
For me, it could be a number of filetypes, plus things like youtube or other media links. These would be opened in various programs. Obviously, many programs can't directly open a URL and will fail.
Why not just have 3 generic options and let the user decide:
link_handler_1=""
link_handler_2=""
link_handler_3=""
or whatever name is more suitable.
If the option is undefined, it doesn't show up in the link menu.
Maybe it would be more suitable to follow the syntax of the "search_url" option, which can be used multiple times to define a list of things, so we don't need a predefined number of options. You probably want to set a list of actions with names too:

action=Open with MPV;mpv "$url"
action=Open with Feh;feh -- "$url"
action=Open with Firefox;firefox "$url"

Then let the shell expand $url from the environment, previously set by Dillo with setenv(). What I don't like is that we are mixing the menu label and the command line in the same option value. And we also have to take into account how the shell will interact with the quotes and other symbols (#).
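To picture the mechanism (purely illustrative, this is not how Dillo works today): the command part of each action would be handed to a shell that already has $url in its environment, roughly equivalent to typing:

# simulate the first action above from a terminal;
# the single quotes keep the outer shell from expanding $url itself
url='https://example.com/video' sh -c 'mpv "$url"'

which is also why the quoting around "$url" in the command matters.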
Even if there was only one handler option, personally I would just have it call a shell script which parses the URL and extension and takes appropriate action.
That would be something like this:

action=Open with script;/path/to/script.sh

And then the script can use $url and possibly other variables from the environment set by Dillo.
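Such a script could be as small as a case statement on the URL; a sketch, assuming only that Dillo exports $url as discussed above (mpv and xdg-open are just example programs):

#!/bin/sh
# Hypothetical /path/to/script.sh: dispatch on the URL exported by Dillo
case "$url" in
   *.pdf|*.PDF)              exec xdg-open "$url" ;;
   *youtube.com*|*youtu.be*) exec mpv -- "$url" ;;
   *)                        exec xdg-open "$url" ;;
esac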
This doesn't have to be a 'one size fits all' feature, and I think most Dillo users are advanced enough to use this to their advantage.
Your ideas above regarding Dillo handling the logic for various file and content types are excellent, and also complicated :)
But, these can be two separate features, with optional external link handlers acting as an override for the built-in handling.
Yes, they can be two separate features. I don't have a strong opinion about it yet, I'll have to think about the consequences of having two systems.

On the other hand, I think we can design a very simple rule language just to model the actions that can later be extended to other more complex rules without breaking the compatibility with previous configurations. This way we cover all uses with a unified syntax. The previous examples could be written in ~/.dillo/rulesrc as:

action "Open with MPV" shell "mpv $url"
action "Open with Feh" shell "feh -- $url"
action "Open with Firefox" shell "firefox $url"

Which only defines a set of available actions, and by default they appear on the link menu as you suggest. I think using a small language is a more elegant solution than trying to squeeze the menu label and the command in a single dillorc option.

For now I would only focus on a single rule:

action <name> shell <command>

Then we can do other things like this to pipe the page to some program:

action "Open in editor" pipe "gvim -"

Or this, to only show "Download video" on YouTube:

match url "http[s]://[www\.]youtube\.com" {
   action "Download video" shell "yt-dlp -f 18 -- $url"
}

But see how the syntax is the same, so we don't break the compatibility with previously defined rules.
A similar feature has been implemented in dilloNG:
https://github.com/w00fpack/dilloNG/commit/7bd6b1c4592466d6e717eaf795e98b7ce...
But as it is currently implemented it is not very extensible; only a "media player" and a "media downloader" option are possible.
Thanks for pointing this out, looks like a good template for testing. If it works well, I'll post a demo patch for anyone interested.
Thanks, Rodrigo
Hi Rodrigo,
But, these can be two separate features, with optional external link handlers acting as an override for the built-in handling.
Yes, they can be two separate features. I don't have a strong opinion about it yet, I'll have to think about the consequences of having two systems.
On the other hand, I think we can design a very simple rule language just to model the actions that can later be extended to other more complex rules without breaking the compatibility with previous configurations. This way we cover all uses with a unified syntax.
The previous examples could be written in ~/.dillo/rulesrc as:
action "Open with MPV" shell "mpv $url" action "Open with Feh" shell "feh -- $url" action "Open with Firefox" shell "firefox $url"
Which only defines a set of available actions, and by default they appear on the link menu as you suggest. I think using a small language is a more elegant solution than trying to squeeze the menu label and the command in a single dillorc option.
I like this approach, but not sure how much I can help with it.

For now, I have taken the example you gave and made a barebones patch to add an external handler to the link menu. It works quite well for me, in fact it's something I'll probably be using every day. A real time saver. It's obviously nowhere near as featureful as what you are proposing, but for the time being, it gets the job done :)

Here is the patch for anyone interested, hopefully not too mangled. Sent as an attachment as well, just in case. I haven't gotten around to setting up git send-email yet.

diff -u dillorc.orig dillorc
--- dillorc.orig    Wed Jun 12 21:25:35 2024
+++ dillorc         Fri Jun 14 16:37:49 2024
@@ -46,6 +46,9 @@
 # height of the visible page area.
 #scroll_step=100
 
+# Set the external link handler
+#ext_handler="mpv"
+
 #-------------------------------------------------------------------------
 # RENDERING SECTION
 #-------------------------------------------------------------------------
diff -u src/orig/menu.cc src/menu.cc
--- src/orig/menu.cc    Fri Jun 14 16:14:55 2024
+++ src/menu.cc         Fri Jun 14 16:21:34 2024
@@ -122,6 +122,21 @@
 }
 
 /**
+ * Open URL in external handler
+ */
+static void Menu_open_url_ex_cb(Fl_Widget*, void *user_data)
+{
+   DilloUrl *url = (DilloUrl *)user_data;
+   char str[500];
+   strcpy(str, prefs.ext_handler);
+   strcat(str, " ");
+   strcat(str, URL_STR_(url));
+   strcat(str, "> /dev/null 2>&1 &");
+   puts(str);
+   system(str);
+}
+
+/**
  * Add bookmark
  */
 static void Menu_add_bookmark_cb(Fl_Widget*, void *user_data)
@@ -432,8 +447,8 @@
 
 static Fl_Menu_Item link_menu[] = {
    {"Open link in new tab", 0, Menu_open_url_nt_cb,0,0,0,0,0,0},
-   {"Open link in new window", 0, Menu_open_url_nw_cb,0,FL_MENU_DIVIDER,0,0,
-    0,0},
+   {"Open link in new window", 0, Menu_open_url_nw_cb,0,0,0,0,0,0},
+   {"Open with external", 0, Menu_open_url_ex_cb,0,FL_MENU_DIVIDER,0,0,0,0},
    {"Bookmark this link", 0, Menu_add_bookmark_cb,0,0,0,0,0,0},
    {"Copy link location", 0, Menu_copy_urlstr_cb,0,FL_MENU_DIVIDER,0,0,0,0},
    {"Save link as...", 0, Menu_save_link_cb,0,0,0,0,0,0},
diff -u src/orig/prefs.c src/prefs.c
--- src/orig/prefs.c    Fri Jun 14 16:14:55 2024
+++ src/prefs.c         Fri Jun 14 16:24:11 2024
@@ -26,6 +26,7 @@
 #define PREFS_HTTP_REFERER "host"
 #define PREFS_HTTP_USER_AGENT "Dillo/" VERSION
 #define PREFS_THEME "none"
+#define PREFS_EXT_HANDLER "mpv"
 
 /*-----------------------------------------------------------------------------
  * Global Data
@@ -71,6 +72,7 @@
    prefs.http_strict_transport_security = TRUE;
    prefs.http_force_https = FALSE;
    prefs.http_user_agent = dStrdup(PREFS_HTTP_USER_AGENT);
+   prefs.ext_handler = dStrdup(PREFS_EXT_HANDLER);
    prefs.limit_text_width = FALSE;
    prefs.adjust_min_width = TRUE;
    prefs.adjust_table_min_width = TRUE;
@@ -151,6 +153,7 @@
    dFree(prefs.http_proxyuser);
    dFree(prefs.http_referer);
    dFree(prefs.http_user_agent);
+   dFree(prefs.ext_handler);
    dFree(prefs.no_proxy);
    dFree(prefs.save_dir);
    for (i = 0; i < dList_length(prefs.search_urls); ++i)
diff -u src/orig/prefs.h src/prefs.h
--- src/orig/prefs.h    Fri Jun 14 16:14:55 2024
+++ src/prefs.h         Fri Jun 14 16:24:44 2024
@@ -45,6 +45,7 @@
    char *http_proxyuser;
    char *http_referer;
    char *http_user_agent;
+   char *ext_handler;
    char *no_proxy;
    DilloUrl *start_page;
    DilloUrl *home;
diff -u src/orig/prefsparser.cc src/prefsparser.cc
--- src/orig/prefsparser.cc    Fri Jun 14 16:14:55 2024
+++ src/prefsparser.cc         Fri Jun 14 16:25:40 2024
@@ -187,6 +187,7 @@
    { "adjust_table_min_width", &prefs.adjust_table_min_width, PREFS_BOOL, 0 },
    { "load_images", &prefs.load_images, PREFS_BOOL, 0 },
    { "load_background_images", &prefs.load_background_images, PREFS_BOOL, 0 },
+   { "ext_handler", &prefs.ext_handler, PREFS_STRING, 0 },
    { "load_stylesheets", &prefs.load_stylesheets, PREFS_BOOL, 0 },
    { "middle_click_drags_page", &prefs.middle_click_drags_page, PREFS_BOOL,
      0 },

mpv is just used as a common example. I'm using this with a custom shell script which does some basic parsing and takes various actions depending on the URL.

Regards,
Alex
Hi Alex, On Fri, Jun 14, 2024 at 05:25:16PM +0200, a1ex@dismail.de wrote:
The previous examples could be written in ~/.dillo/rulesrc as:
action "Open with MPV" shell "mpv $url" action "Open with Feh" shell "feh -- $url" action "Open with Firefox" shell "firefox $url"
Which only defines a set of available actions, and by default they appear on the link menu as you suggest. I think using a small language is a more elegant solution than trying to squeeze the menu label and the command in a single dillorc option.
I like this approach, but not sure how much I can help with it.
I've made a proof of concept. It is still ugly, but it seems to be working:

https://github.com/dillo-browser/dillo/pull/199

The syntax is the one I commented before. Here is my rulesrc file:

action "Open with MPV" shell "mpv $url"
action "Open with MPV (only audio)" shell "mpv --no-video $url"
action "Open with Firefox" shell "firefox $url"

And here is how it looks like: https://0x0.st/XcyR.png
For now, I have taken the example you gave and made barebones patch to add an external handler to the link menu. It works quite well for me, in fact its something I'll probably be using every day. A real time saver.
Its obviously nowhere near as featureful as what you are proposing, but for the time being, it gets the job done :)
Thanks for the patch. I will try to have something merged soon to at least cover this feature. But if it gets stuck, I wouldn't mind merging your patch or something similar.

Best,
Rodrigo.
I've made a proof of concept. It is still ugly, but it seems to be working:
Very cool! I have briefly tested this and it seems to work fine. Will continue testing to see if any issues come up.
And here is how it looks like: https://0x0.st/XcyR.png
I did a double-take when I saw this. Hello from a fellow vaporwave fan! I happen to have several macroblank tracks in my playlist currently :) Regards, Alex
Hi Alex, On Sat, Jun 15, 2024 at 01:31:31PM +0200, a1ex@dismail.de wrote:
I've made a proof of concept. It is still ugly, but it seems to be working:
Very cool! I have briefly tested this and it seems to work fine. Will continue testing to see if any issues come up.
Thanks! I'm interested in how you are using a script to match several URLs and do some actions automatically, not sure if you would be interested in sharing it (or some parts).
And here is how it looks like: https://0x0.st/XcyR.png
I did a double-take when I saw this. Hello from a fellow vaporwave fan! I happen to have several macroblank tracks in my playlist currently :)
Nice to hear :-) Best, Rodrigo.
On Sat, 15 Jun 2024 22:16:12 +0200 Rodrigo Arias <rodarima@gmail.com> wrote:
Thanks! I'm interested in how you are using a script to match several URLs and do some actions automatically, not sure if you would be interested in sharing it (or some parts).
Sure, here is a very basic example which mainly just goes by the extension in the URL to handle some common filetypes. I suppose that for more complex links in which the filetype can't be parsed like this, you could curl the link into the 'file' command to determine the type:

curl -s "https://example.com/filename" | file -

But that's overkill for my needs. There are many ways to improve on this, but hopefully it at least illustrates the concept.

---------------------------------------------------------------------
#!/bin/sh
# Example Dillo URL action handler script

# handlers
youtube="mpv"
audio="mpv"
video="mpv"
image="nsxiv"
pdf="mupdf"

# temporary file location
tmp_dir="/tmp/dillo"

# routine to download link and define $file
dl() {
   if [ ! -d $tmp_dir ] ; then mkdir $tmp_dir ; fi
   wget "$1" --no-use-server-timestamps -P $tmp_dir
   file="$tmp_dir/$(ls -tp $tmp_dir | grep -v /$ | head -1)"
}

# youtube
if echo $1 | grep outu
then
   $youtube "$1"
else

# audio
if echo $1 | tail -c -5 | grep -e mp3 -e ogg
then
   $audio "$1"
else

# video
if echo $1 | tail -c -5 | grep -e mkv -e mp4 -e webm
then
   $video "$1"
else

# images
if echo $1 | tail -c -5 | grep -e jpg -e jpeg -e png -e gif -e tif
then
   dl "$1"
   $image "$file"
else

# pdf
if echo $1 | tail -c -5 | grep pdf
then
   dl "$1"
   $pdf "$file"

fi ; fi ; fi ; fi ; fi
-----------------------------------------------------------------------

Regards,
Alex
On Sun, 16 Jun 2024 13:07:38 +0200 <a1ex@dismail.de> wrote:
On Sat, 15 Jun 2024 22:16:12 +0200 Rodrigo Arias <rodarima@gmail.com> wrote:
Thanks! I'm interested in how you are using a script to match several URLs and do some actions automatically, not sure if you would be interesting in sharing it (or some parts).
Sure, here is a very basic example which mainly just goes by the extension in the URL to handle some common filetypes. I suppose that for more complex links in which the filetype can't be parsed like this, you could curl the link into the 'file' command to determine the type: curl -s "https://example.com/filename" | file - But that's overkill for my needs. There are many ways to improve on this, but hopefully it at least illustrates the concept.
Just for fun, here is a version which uses curl instead of wget. This has the advantage that curl can output the Content-Type from the server, and we can use this to perform actions instead of relying on the file extension. Another advantage is that curl can tell us the output filename, so we don't have to mess around to get that anymore.

---------------------------------------------------------------------
#!/bin/sh
# Example Dillo URL action handler script using curl

# handlers
youtube="mpv"
audio="mpv"
video="mpv"
image="nsxiv"
pdf="mupdf"

# temporary file location
tmp_dir="/tmp/dillo"

# youtube. checks for "outu" string in URL
if echo $1 | grep outu
then
   $youtube "$1"
else

# audio. uses file extension from URL
if echo $1 | tail -c -5 | grep -e mp3 -e ogg
then
   $audio "$1"
else

# video. uses file extension from URL
if echo $1 | tail -c -5 | grep -e mkv -e mp4 -e webm
then
   $video "$1"
else

# use curl to download the file, and save Content-Type and filename
if [ ! -d $tmp_dir ] ; then mkdir $tmp_dir ; fi
curl -s --write-out "%{content_type} \n%{filename_effective}" \
   --remote-name --output-dir "$tmp_dir" "$1" > $tmp_dir/file_info

# set the filename based on curl output to file_info
filename=$(tail -1 $tmp_dir/file_info)

# images. checks content type from file_info
if head -1 $tmp_dir/file_info | grep image ; then
   $image "$filename"
fi

# pdf. checks content type from file_info
if head -1 $tmp_dir/file_info | grep pdf ; then
   $pdf "$filename"
fi

fi
fi
fi
--------------------------------------------------------------------

Regards,
Alex
Hi Alex, On Sun, Jun 16, 2024 at 06:54:41PM +0200, a1ex@dismail.de wrote:
Sure, here is a very basic example which mainly just goes by the extension in the URL to handle some common filetypes. I suppose that for more complex links in which the filetype can't be parsed like this, you could curl the link into the 'file' command to determine the type: curl -s "https://example.com/filename" | file - But that's overkill for my needs. There are many ways to improve on this, but hopefully it at least illustrates the concept.
Just for fun, here is a version which uses curl instead of wget. This has the advantage that curl can output the Content-Type from the server, and we can use this to perform actions instead of relying on the file extension. Another advantage is that curl can tell us the output filename, so we don't have to mess around to get that anymore.
These examples are very useful to test the ideas of rules and actions. Here is how I could imagine it being written:

action youtube label "Open YouTube in MPV" shell "mpv $url"
action audio label "Open audio in MPV" shell "mpv $url"
action video label "Open video in MPV" shell "mpv $url"
action image label "Open image in NSXIV" download shell "nsxiv $out"
action mupdf label "Open PDF in MuPDF" download shell "mupdf $out"
action default open-new-tab "$url"

match url "outu" action youtube
match url "\.(mp3|ogg)$" action audio
match url "\.(mkv|mp4|webm)$" action video

# From here on Dillo needs to fetch the headers
match mime-type "image" action image
match mime-type "pdf" action mupdf

# Catchall for non-matched urls
match any action default

Although, we may use Dillo to download the content on the fly, like this:

action image label "Open image in NSXIV" pipe "nsxiv -"
action mupdf label "Open PDF in MuPDF" pipe "mupdf -"

So there is no need to download the file completely before it is piped to the tools.

On the other hand, I can already spot a problem. When we right-click on a link to find out which actions are available, some of them require the mime type to be known, which would require a request to the server. So we cannot determine which actions will match beforehand.

Maybe a better way is to define the set of menu entries first, which could also be altered by match rules:

menu label "Open in default program" tag open
menu label "Save in notes" tag save-notes

Then we use the information of which menu item was selected *and* the clicked url to do the full handling:

action youtube shell "mpv $url"
action audio shell "mpv $url"
action video shell "mpv $url"
action image download shell "nsxiv $out"
action mupdf download shell "mupdf $out"
action save-notes shell "echo $url >> ~/.dillo/notes.txt && echo Saved"
action default open-new-tab "$url"

match tag open {
   match url "outu" action youtube
   match url "\.(mp3|ogg)$" action audio
   match url "\.(mkv|mp4|webm)$" action video
   # From here on Dillo needs to fetch the headers
   match mime-type "image" action image
   match mime-type "pdf" action mupdf
}

match tag save-notes action save-notes
match any action default

So when the user clicks "Save in notes" it doesn't go through the normal "open with" rules, but only matches the save-notes rule.

Anyway, I need to think more about it. I also need to consider how this could be moved out of Dillo, so we don't bring more complexity to it.

Executing a user script is fine, but I also want to be able to rewrite the page and bring it back to Dillo for display, which requires more cooperation.
# temporary file location
tmp_dir="/tmp/dillo"
# youtube. checks for "outu" string in URL
if echo $1 | grep outu
then
   $youtube "$1"
else
Just a small comment, you can use `exec` to avoid nesting the ifs as it won't return:

if echo $1 | grep outu; then
   exec $youtube "$1"
fi

Or exit:

if echo $1 | grep outu; then
   $youtube "$1"
   exit $?
fi

Best,
Rodrigo.
Hi Rodrigo,
Just for fun, here is a version which uses curl instead of wget. This has the advantage that curl can output the Content-Type from the server, and we can use this to perform actions instead of relying on the file extension. Another advantage is that curl can tell us the output filename, so we don't have to mess around to get that anymore.
These examples are very useful to test the ideas of rules and actions. Here is how I could imagine it being written:
action youtube label "Open YouTube in MPV" shell "mpv $url"
action audio label "Open audio in MPV" shell "mpv $url"
action video label "Open video in MPV" shell "mpv $url"
action image label "Open image in NSXIV" download shell "nsxiv $out"
action mupdf label "Open PDF in MuPDF" download shell "mupdf $out"
action default open-new-tab "$url"
match url "outu" action youtube match url "\.(mp3|ogg)$" action audio match url "\.(mkv|mp4|webm)$" action video # From here on Dillo needs to fetch the headers match mime-type "image" action image match mime-type "pdf" action mupdf # Catchall for non-matched urls match any action default
This looks pretty reasonable. I guess the trick is to not make things too complicated, and yet still make it flexible enough.
Although, we may use Dillo to download the content on the fly, like this:
action image label "Open image in NSXIV" pipe "nsxiv -"
action mupdf label "Open PDF in MuPDF" pipe "mupdf -"
So there is no need to download the file completely before it is piped to the tools.
I don't think mupdf will open a file piped from stdin. Neither does nsxiv. Maybe I'm missing something.

It is possible to determine the Content-Type without downloading the file with something like:

curl -XHEAD -s -w '%{content_type}' $url

Also, it may be desirable to restrict the download filesize to a reasonable limit, which can also be determined with a similar method.
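For the size limit, a sketch of the same idea (a separate HEAD request reading Content-Length; the 10 MB cap is just an arbitrary example):

# read the advertised size, bail out above ~10 MB
size=$(curl -sI -- "$url" | tr -d '\r' | awk 'tolower($1) == "content-length:" {print $2}' | tail -n 1)
[ "${size:-0}" -gt 10485760 ] && exit 1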
Anyway, I need to think more about it. I also need to consider how this could be moved out of Dillo, so we don't bring more complexity to it.
Executing a user script is fine, but I also want to be able to rewrite the page and bring it back to Dillo for display, which requires more cooperation.
I'm curious what the use-case for this is. Sounds interesting.
Just a small comment, you can use `exec` to avoid nesting the ifs as it won't return:
if echo $1 | grep outu; then exec $youtube "$1" fi
Or exit:
if echo $1 | grep outu; then $youtube "$1" exit $? fi
Oops! I should know better than that, but there it is. Thanks for pointing it out.

Regards,
Alex
Hi, On Mon, Jun 17, 2024 at 02:57:44PM +0200, a1ex@dismail.de wrote:
So there is no need to download the file completely before it is piped to the tools.
I don't think mupdf will open a file piped from stdin. Neither does nsxiv. Maybe I'm missing something.
Apparently that's the case, although it should be possible to do: https://mupdf.readthedocs.io/en/1.22.0/progressive-loading.html

However, Okular and Zathura seem to be able to open PDFs from the standard input as a pipe. Same with feh for images.
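For example, something along these lines already works from a terminal today, with no Dillo support at all (zathura accepts "-" for stdin):

# stream a PDF straight into the viewer without saving it first
curl -s -- "$url" | zathura -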
It is possible to determine the Content-Type without downloading the file with something like:
curl -XHEAD -s -w '%{content_type}' $url
Yes, but it will require another GET request to download the actual file.

I think we can just begin a GET request from Dillo, parse the HTTP headers, select the appropriate handler from the MIME type, and either pipe the content to it or write it to a file and then pass the file path. The first option would allow doing it without waiting for the whole file to download.

In fact, it should be possible to not download more than what is being consumed by the handler program. One of Dillo's main objectives is to support slow (or metered) download speeds gracefully.
Executing a user script is fine, but I also want to be able to rewrite the page and bring it back to Dillo for display, which requires more cooperation.
I'm curious what the use-case for this is. Sounds interesting.
For example, you could write a filter program that parses HTML and rewrites the links to JS-hungry websites to alternative ones, in the same way libredirect[1] works.

[1]: https://libredirect.github.io/

This also allows patching the HTML of sites so you can fix them to work better (or at all) in Dillo. This is also done by Firefox from the webcompat[2] project in what they call "interventions", as sometimes page authors don't fix them or take a long time, so they patch it from the browser directly. You can open about:compat to see the long list of patches, here[3] is one for YouTube.

[2]: https://webcompat.com/
[3]: https://hg.mozilla.org/mozilla-central/rev/1fa7de8dec52

This already happened to Dillo with Hacker News, and there are still some minor issues not solved. The matching rules should apply those corrections only to the set of matching URLs.

Best,
Rodrigo.
On Mon, 17 Jun 2024 18:23:32 +0200 Rodrigo Arias <rodarima@gmail.com> wrote:
Executing a user script is fine, but I also want to be able to rewrite the page and bring it back to Dillo for display, which requires more cooperation.
I'm curious what the use-case for this is. Sounds interesting.
For example, you could write a filter program that parses HTML and rewrites the links to JS hungry websites to alternative ones, in the same way libredirect[1] works.
[1]: https://libredirect.github.io/
This also allows patching the HTML of sites so you can fix them to work better (or at all) in Dillo. This is also done by Firefox from the webcompat[2] project in what they call "interventions", as sometimes page authors don't fix them or take a long time, so they patch it from the browser directly. You can open about:compat to see the long list of patches, here[3] is one for YouTube.
This would be very impressive, a real step forward for Dillo in my opinion. Do you think this is something that would be relatively straight-forward to implement, or is it more of a long-term goal with lots of work required to get there? Either way, sounds like there are exciting times ahead for Dillo! Thanks! Alex
Hi Alex, On Tue, Jun 18, 2024 at 03:28:20PM +0200, a1ex@dismail.de wrote:
This also allows patching the HTML of sites so you can fix them to work better (or at all) in Dillo. This is also done by Firefox from the webcompat[2] project in what they call "interventions", as sometimes page authors don't fix them or take a long time, so they patch it from the browser directly. You can open about:compat to see the long list of patches, here[3] is one for YouTube.
This would be very impressive, a real step forward for Dillo in my opinion. Do you think this is something that would be relatively straight-forward to implement, or is it more of a long-term goal with lots of work required to get there? Either way, sounds like there are exciting times ahead for Dillo!
Adding a mechanism to rewrite the HTML is surprisingly not super complicated, as the internal design of Dillo is centered around the CCC, the "Concomitant Control Chain", which is basically a chain of bi-directional pipes connected together to pass data around.

Here is how Dillo currently receives data from a TLS server (AFAIK). I'm only drawing the incoming direction, but the outgoing link is similar.

 Net  +--------+    +-------+    +------+    +-------+
 ---->| TLS IO |--->|  IO   |--->| HTTP |--->| CACHE |-...
      +--------+    +-------+    +------+    +-------+
      src/tls.c     src/IO.c    src/http.c  src/capi.c

And adding a new rewrite module (named SED in the diagram) would require rerouting the chain to add a new element (not hard):

 Net  +--------+    +-------+    +------+    +=====+    +-------+
 ---->| TLS IO |--->|  IO   |--->| HTTP |---># SED #--->| CACHE |-...
      +--------+    +-------+    +------+    +=====+    +-------+
      src/tls.c     src/IO.c    src/http.c      |      src/capi.c
                                                |
                                           +---------+
                                           | rulesrc |
                                           |   ...   |
                                           +---------+

The module can then forward the content parsed from the HTTP module to the appropriate scripts defined in the rules, and then read the output and forward it to the next steps in the chain. When no rules apply, it can just forward the content to the cache as-is.

Now, the interesting part is that we can place another SED module between the IO and the HTTP nodes, so we can rewrite the HTML content *and* the HTTP headers too. This would allow for example writing a plugin that matches a given mime type and on-the-fly rewrites it into an HTML file, changing the Content-Type header.

This is already done by the plugins, but they mix the two things together. For example, we can display a .gmi file served via the "gemini:" protocol, but we cannot display a local .gmi file. Same for manual pages with the "man:" protocol, which cannot open manual pages served via "file:" or "http(s):".

The solution with the new design would involve:

1) Open a "gemini:" link
2) The request is routed to the gemini: dpi handler (like now)
3) The gemini plugin returns the .gmi file as-is as an HTTP response, instead of converting it to HTML
4) The .gmi mime type matches a rewrite rule and is rewritten into HTML in the SED node.

Now, if we open a .gmi via HTTP:

1) Open a "https:" link
2) The request is routed to the usual HTTP/IO/TLS chain
3) The HTTP server returns the .gmi file as-is as an HTTP response.
4) The .gmi mime type matches a rewrite rule, and is rewritten into HTML in the SED node.

Notice that the HTTP content can be compressed. So, for example, this simple rewrite script:

#!/bin/sh
sed 's_www.youtube.com_inv.vern.cc_g'

Would only work well in the SED node *after* the HTTP content is uncompressed and the headers removed. The rewrite rules should indicate in which position of the chain they apply.

As a side note, keep in mind that all of these pieces work in stream mode. Each node reads a bit of data, processes it and sends it to the next node of the chain, without the need to store the whole thing in memory. Same with that sed command I wrote as an example.

Best,
Rodrigo.
Rodrigo Arias <rodarima-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
The solution with the new design would involve:
1) Open a "gemini:" link 2) The request is routed to the gemini: dpi handler (like now) 3) The gemini plugin returns the .gmi file as-is as an HTTP response, instead of converting it to HTML 4) The .gmi mime type matches a rewrite rule and is rewritten into HTML in the SED node.
Now, if we open a .gmi via HTTP:
1) Open a "https:" link 2) The request is routed to the usual HTTP/IO/TLS chain 3) The HTTP server returns the .gmi file as-is as an HTTP response. 4) The .gmi mime type matches a rewrite rule, and is rewritten into HTML in the SED node.
Notice that the HTTP content can be compressed. So, for example, this simple rewrite script:
#!/bin/sh
sed 's_www.youtube.com_inv.vern.cc_g'
Would only work well in the SED node *after* the HTTP content is uncompressed and the headers removed. The rewrite rules should indicate in which position of the chain they apply.
This mechanism might suit an idea I've had to do remote downscaling of extremely large images, which are increasingly being included in web pages. The script would send a list of URLs in all <img> tags within the HTML to a remote server (e.g. on a VPS), or ideally just the ones for large image files, then rewrite the URLs in the HTML to point to the remote server where the converted images are available over HTTP/S.

Or a deeper approach would be to apply the same approach as this rewrite engine to binary content as well, and have Dillo do it transparently via 'rewrite'/convert rules for image MIME types. Then the HTML would stay the same and Dillo would trigger a command that requested a downscaled image from the converter server instead of the original image's server. That would be more elegant, but expands the scope of your proposed system a little.

Maybe since it still requires a remote Web server this problem would be better solved via a Web proxy (I did look into Squid before, but drowned in confusing documentation). But I just thought I'd mention it as an example of a more complex usage for this proposed rewrite system.
Hi, On Wed, Jun 19, 2024 at 10:03:20AM +1000, Kevin Koster wrote:
Notice that the HTTP content can be compressed. So, for example, this simple rewrite script:
#!/bin/sh sed 's_www.youtube.com_inv.vern.cc_g'
Would only work well in the SED node *after* the HTTP content is uncompressed and the headers removed. The rewrite rules should indicate in which position of the chain they apply.
This mechanism might suit an idea I've had to do remote downscaling of extremely large images, which are increasingly being included in web pages. The script would send a list of URLs in all <img> tags within the HTML to a remote server (eg. on a VPS), or ideally just the ones for large image files, then rewrite the URLs in the HTML to point to the remote server where the converted images are available over HTTP/S.
You can create a script that rewrites the <img> src attribute

<img src="https://foo.com/img1.png">

To point to an endpoint of your server:

<img src="https://yourserver.com/downscale?url=https://foo.com/img1.png">

And then in the server you simply downscale it. Here is how you could do it with rules:

# Script that would rewrite images to a server for downscaling
action downscale filter 'rewrite-img.sh'

define mime header 'Content-Type'

match mime 'text/html' action downscale
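A minimal sketch of what rewrite-img.sh could be, assuming the filter receives the HTML on stdin and writes the rewritten HTML to stdout (the downscale endpoint is hypothetical, and the sed is naive: it rewrites every <img>, relative URLs included):

#!/bin/sh
# rewrite <img src="..."> to point at the downscaling endpoint
exec sed 's|<img src="\([^"]*\)"|<img src="https://yourserver.com/downscale?url=\1"|g'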
Or a deeper approach would be to apply the same approach as this rewrite engine to binary content as well, and have Dillo do it transparently via 'rewrite'/convert rules for image MIME types. Then the HTML would stay the same and Dillo would trigger a command that requested a downscaled image from the converter server instead of the original image's server. That would be more elegant, but expands the scope of your proposed system a little.
Rewriting the binary image directly would be possible, but then you would have wasted the bandwidth bringing it to Dillo, and now you have to send it to the server to downscale it. You probably want to use the previous approach for this case.

In any case, imagine you want to downscale it locally anyway. Here is how I can think about it:

# Script that would downscale an image and write to stdout
action downscale filter 'downscale-img.sh'

# Define headers from the HTTP content with shorter names
define mime header 'Content-Type'
define size header 'Content-Length'

# Downscale big images
match mime =~ 'image/.*' and size > 10K action downscale

Notice that this can be triggered for any image, not only ones provided via HTTP/HTTPS, but also via other protocols like gemini that are adapted to speak HTTP and also provide a Content-Length header.

I added the =~ and > operators, as the former would match a regex and the latter will use a numeric comparator. You can assume that the default if the header is not present is to make any comparison fail.

I have also added the "define" keyword to define properties like "mime" or "size" which are parsed from the HTTP headers and are shorter and easier to write.
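And downscale-img.sh itself could be a one-liner, assuming the filter gets the image on stdin and must write the result to stdout (this needs ImageMagick installed; the 50% factor is arbitrary):

#!/bin/sh
# read an image from stdin, write a half-size version to stdout
exec convert - -resize 50% -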
Maybe since it still requires a remote Web server this problem would be better solved via a Web proxy (I did look into Squid before, but drowned in confusing documentation). But I just thought I'd mention it as an example of a more complex usage for this proposed rewrite system.
But then you will need to pass all the traffic through the server so it performs the substitution there.

Another solution which may be better is to mark from Dillo which requests are being done from img elements (filtering them before going to the network). If Dillo marks those requests in the HTTP headers, for example, then you could do:

# Script that transforms image HTTP requests to a server that
# downscales the image
action downscale filter 'downscale-req.sh'

define source header 'Dillo-Request-Source'

# Downscale images coming from <img> elements
match source 'img' action downscale

This would have the benefit that Dillo already performs the parsing of the HTML for you, and only the images that are loaded are passed to the downscaling server. Additionally, cookies would be sent in the HTTP request, so you can access login protected images this way too.

Working on these examples is very helpful to design the rule system, so feel free to mention more cases.

Best,
Rodrigo.
Rodrigo Arias <rodarima-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
On Wed, Jun 19, 2024 at 10:03:20AM +1000, Kevin Koster wrote:
This mechanism might suit an idea I've had to do remote downscaling of extremely large images, which are increasingly being included in web pages. The script would send a list of URLs in all <img> tags within the HTML to a remote server (eg. on a VPS), or ideally just the ones for large image files, then rewrite the URLs in the HTML to point to the remote server where the converted images are available over HTTP/S.
You can create a script that rewrites the <img> src attribute
<img src="https://foo.com/img1.png">
To point to an endpoint of your server:
<img src="https://yourserver.com/downscale?url=https://foo.com/img1.png">
Yes that would work too, however the idea of sending a URL list was that it allows my server to re-use the HTTPS connection to the image host server while downloading lots of images for a webpage, whereas the way you have it each image request by my server would be a new HTTPS connection, and therefore potentially slower. It's a minor difference though.
And then in the server you simply downscale it. Here is how you could do it with rules:
# Script that would rewrite images to a server for downscaling action downscale filter 'rewrite-img.sh' define mime header 'Content-Type' match mime 'text/html' action downscale
That should work. I would personally find it clearer to read if there was a character indicating assignment values, since this would make it obvious which words are commands and which are arguments. Such as:

# Script that would rewrite images to a server for downscaling
action downscale filter='rewrite-img.sh'

define mime header='Content-Type'

match mime='text/html' action=downscale

Using "action" as an argument to "match" as well as the name of a command might still be a little confusing.
Or a deeper approach would be to apply the same approach as this rewrite engine to binary content as well, and have Dillo do it transparently via 'rewrite'/convert rules for image MIME types. Then the HTML would stay the same and Dillo would trigger a command that requested a downscaled image from the converter server instead of the original image's server. That would be more elegant, but expands the scope of your proposed system a little.
Rewriting the binary image directly would be possible, but then you would have wasted the bandwidth bringing it to Dillo, and now you have to send it to the server to downscale it.
No, the idea is to reduce the bandwidth usage on Dillo's connection, so for this approach Dillo would have to abort the connection to the image server if the image size was over the limit and fetch it from the script instead, which might do:

wget -q -O - "https://yourserver.com/downscale?url=https://foo.com/img1.png"

A way to get Dillo to take the replacement URL from the script would be better, but I suspect that would make this system more complicated to implement, because then state matters between different connections.

Granted, I've lost the ability to re-use the HTTPS connection at my server with this approach. I like how it wouldn't affect the performance of fetching small images by waiting for them to be needlessly downloaded by my server first though, which would generally be a bigger advantage overall. Most images are still small enough that I wouldn't down-scale them, just the 1MB+ ones would get that treatment. The surprise 10MB+ ones are the real evil, and just the option to block these outright (abort the connection and don't run any script) would be better than nothing.

To complicate things further, it would be good to have a right-click menu option to bypass this rule and allow fetching the full-size image. Alternatively, an external handler could be assigned (via the earlier-discussed mechanism) to a script that downloads and opens the image URL in an external image viewer, which might actually be better to use for that.
In any case, imagine you want to downscale it locally anyway. Here is how I can think about it:
# Script that would downscale an image and write to stdout
action downscale filter 'downscale-img.sh'
# Define headers from the HTTP content with shorter names
define mime header 'Content-Type'
define size header 'Content-Length'
# Downscale big images
match mime =~ 'image/.*' and size > 10K action downscale
Notice that this can be triggered for any image, not only ones provided via HTTP/HTTPS, but also via other protocols like gemini that are adapted to speak HTTP and also provide a Content-Length header.
I added the =~ and > operators, as the former would match a regex and the latter will use a numeric comparator. You can assume that the default if the header is not present is to make any comparison fail.
I have also added the "define" keyword to define properties like "mime" or "size" which are parsed from the HTTP headers and are shorter and easier to write.
That looks good. Maybe I'd find it clearer without the whitespace between operators as with my '=' example above.
Maybe since it still requires a remote Web server this problem would be better solved via a Web proxy (I did look into Squid before, but drowned in confusing documentation). But I just thought I'd mention it as an example of a more complex usage for this proposed rewrite system.
But then you will need to pass all the traffic through the server so it performs the substitution there.
Another solution which may be better is to mark from Dillo which requests are being done from img elements (filtering them before going to the network).
If Dillo marks those requests in the HTTP headers for example, then you could do:
# Script that transforms image HTTP requests to a server that
# downscales the image
action downscale filter 'downscale-req.sh'
define source header 'Dillo-Request-Source'
# Downscale images coming from <img> elements
match source 'img' action downscale
Yes, that's neat.
This would have the benefit that Dillo already performs the parsing of the HTML for you, and only the images that are loaded are passed to the downscaling server. Additionally, cookies would be sent in the HTTP request, so you can access login protected images this way too.
Good point, although for my usage login protected images wouldn't be much of a concern.
Rodrigo Arias wrote:
Working on these examples is very helpful to design the rule system, so feel free to mention more cases.
In addition to HTTP header fields, it would be handy to assign rewrite rules according to HTTP status codes. I spend a lot of time browsing dead/dying websites so I'm always using the Wayback Machine. I'd like to use this system to process 404 (500, etc.) error pages through a program/script that adds a link at the top to look up the URL at the Wayback Machine. It might also check for a copy I've archived locally, if I get my website archives more organised, and link to that as well.

I've already got the Wayback Machine set up as a search option in dillorc which returns the latest archived copy of a URL:

search_url="w Wayback Machine http://web.archive.org/web/%s"

But I use it so much that the ease of just clicking a link would be a real advantage, especially after opening multiple broken links from a page. I might even try to write something that detects archived error pages and works back to the last actual copy of the page.
Hi Kevin, On Sun, Jul 07, 2024 at 10:21:05AM +1000, Kevin Koster wrote:
Rodrigo Arias wrote:
Working on these examples is very helpful to design the rule system, so feel free to mention more cases.
In addition to HTTP header fields, it would be handy to assign rewrite rules according to HTTP status codes. I spend a lot of time browsing dead/dying websites so I'm always using the Wayback Machine. I'd like to use this system to process 404 (500, etc.) error pages through a program/script that adds a link at the top to look up the URL at the Wayback Machine. It might also check for a copy I've archived locally, if I get my website archives more organised, and link to that as well.
Good idea, this should be doable with the rule mechanism. An important consideration is to be able to quickly forward good requests or responses, so we reduce the overhead in overall browsing.

We could probably just hook it to >= 400 and then you handle those broken responses as you want, while quickly forwarding 200 to Dillo:

match http-status-code >= 400 action broken-page

I think we may need to define several "tables" like in iptables so we can have rules that handle the traffic at different stages. This one doesn't require decoding the compressed content.
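As a rough sketch of what such a broken-page action could run (assuming, and this is not specified anywhere yet, that the filter gets the failing URL in $url, the original error page on stdin, and that whatever it writes to stdout is shown instead):

#!/bin/sh
# prepend a Wayback Machine link to the error page
printf '<p><a href="http://web.archive.org/web/%s">Look up this URL in the Wayback Machine</a></p><hr>\n' "$url"
cat   # then pass the original error page through unchanged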
I've already got the Wayback Machine set up as a search option in dillorc which returns the latest archived copy of a URL:

search_url="w Wayback Machine http://web.archive.org/web/%s"
But I use it so much that the ease of just clicking a link would be a real advantage, especially after opening multiple broken links from a page. I might even try to write something that detects archived error pages and works back to the last actual copy of the page.
I think this is a very good addition to the default set of search engines. Thanks, Rodrigo.
On Sun, 7 Jul 2024 19:10:57 +0200 Rodrigo Arias <rodarima@gmail.com> wrote:
On Sun, Jul 07, 2024 at 10:21:05AM +1000, Kevin Koster wrote:
In addition to HTTP header fields, it would be handy to assign rewrite rules according to HTTP status codes. I spend a lot of time browsing dead/dying websites so I'm always using the Wayback Machine. I'd like to use this system to process 404 (500, etc.) error pages through a program/script that adds a link at the top to look up the URL at the Wayback Machine. It might also check for a copy I've archived locally, if I get my website archives more organised, and link to that as well.
Good idea, this should be doable with the rule mechanism. An important consideration is to be able to quickly forward good requests or responses, so we reduce the overhead in overall browsing.
We could probably just hook it to >= 400 and then you handle those broken responses as you want, while quickly forwarding 200 to Dillo:
match http-status-code >= 400 action broken-page
I think we may need to define several "tables" like in iptables so we can have rules that handle the traffic at different stages. This one doesn't require decoding the compressed content.
That would be great. Provided it doesn't actually end up as complicated as iptables configuration of course. :)
On Mon, Jul 08, 2024 at 09:59:20AM +1000, Kevin Koster wrote:
That would be great. Provided it doesn't actually end up as complicated as iptables configuration of course. :)
Yeah, I'll have to think how to avoid that too :-)

I'm putting my thoughts into an RFC here:

https://github.com/dillo-browser/rfc/blob/rfc-002/rfc-002-rule-based-content...

(On Dillo it renders ok without remote CSS)

Let's see if I can keep it simple.

Rodrigo.
On Sat, 13 Jul 2024 at 10:01, Rodrigo Arias (<rodarima@gmail.com>) wrote:
On Mon, Jul 08, 2024 at 09:59:20AM +1000, Kevin Koster wrote:
That would be great. Provided it doesn't actually end up as complicated as iptables configuration of course. :)
Yeah, I'll have to think how to avoid that too :-)
I'm putting my thoughts into an RFC here:
https://github.com/dillo-browser/rfc/blob/rfc-002/rfc-002-rule-based-content...
(On Dillo it renders ok without remote CSS)
Let's see if I can keep it simple.
Rodrigo.
I think a good way is to reduce functionality to a minimum. Not to make a full scripting language, but rather moderately complex hook rules that can call external scripts or DPIs with complex functionality.

Another mistake that can be made is to replace the DPI infrastructure. I think a cooperation between the two is better. DPIs and authorized scripts can set temporary rules and be called by Dillo on events. Fixed rules in a rules file call complex scripts or DPIs.

The problem with the DPI infrastructure is that it is an unfinished infrastructure with very little documentation. For example: unhandled MIME types can be implemented by sending a DPI command to the DPI service "mime.<mime_type_name>" and using the same functions as vsource. To handle, for example, text/markdown or image/webp, put mime.text/markdown=... and mime.image/webp=... in dpidrc.

I think this is the only way to have a lightweight browser with more functionality loaded only when it is used.

(Sorry for my English. My native language is Spanish)

Diego Sáenz
Hi Diego On Sat, Jul 13, 2024 at 01:36:19PM +0200, Diego wrote:
I think a good way is to reduce functionality to a minimum. Not to make a full scripting language, but rather moderately complex hook rules that can call external scripts or DPIs with complex functionality.
This is not a scripting language, just a way to describe rules that can be transformed into quick operations at runtime. We won't be reinventing JS.
Another mistake that can be made is to replace the DPI infrastructure. I think a cooperation between the two is better.
There is no plan to remove the support for DPI plugins, just to increase the capabilities for hooking.
DPIs and authorized scripts can set temporary rules and be called by dillo on events. Fixed rules in a rules file call complex scripts or DPIs.
That's more or less the idea.
The problem with the DPI infrastructure is that it is an unfinished infrastructure with very little documentation.
There are other structural problems, please check the related issues. https://github.com/dillo-browser/dillo/issues/65 https://github.com/dillo-browser/dillo/issues/56 Best, Rodrigo.