Some points about the dpi's
Hi,
All right, I did it. I made my first dpi! :-) (heavily based on the hello dpi, though). I will polish it a bit and release it soon. It can handle finger URLs.
Two caveats. First, there is no standard for finger URLs, only some drafts:
http://www.watersprings.org/pub/id/draft-ietf-uri-url-finger-00.txt
http://www.watersprings.org/pub/id/draft-ietf-uri-url-finger-01.txt
http://www.watersprings.org/pub/id/draft-ietf-uri-url-finger-02.txt
http://ftp.ics.uci.edu/pub/ietf/uri/draft-ietf-uri-url-finger-03.txt
I implemented versions 01 and 03 (if you check the documents you'll know why). However, I did find real finger URLs in the wild.
The second caveat: dillo does not automatically call this plugin upon such a URL. It is easy to patch dillo, though. However, this brings me to my questions:
1) Will there be a generic URL->dpi mapping? I could imagine that dpid could actually handle that. I think one would only need to modify IO/Url.c and capi.c to simply pass through EVERYTHING to the dpi framework. However, some "core URL handling dpis" that will become part of the dillo package can probably stay hardcoded in dillo.
1b) Either way, I am planning on writing a "catch-all-URL handler" or "fallback handler" or "universal URL handler" or the like. It will give the user the possibility to start an external program for handling certain URLs. In fact, with my current dpi, this is quite easy to do :-).
2) Does it make sense to write a library for all the dpis? I, for myself, did start simply collecting all the functions that I found in dpi/ and that are used by many dpis, and put them in a library. Like the various send_stream functions or the get_attr_value function, or the encoding and decoding functions.
3) About configuration (and I think I read comments about this in the code), shouldn't there be only one config file, and not dpidrc and possibly different configs for all the dpis? Maybe even merge cookiesrc with dillorc?
I guess some of this (or all of it?) is being worked on by some of you. Maybe this mail helps to avoid duplication :-)
Cheers, Andreas
P.S.: I also have some ideas about improving the ftp plugin. There is a pretty simple ftp library, and I think it does make sense to actually log in to an ftp server and stay logged in for all the operations (unlike wget, which logs in and out for every file). This is not only faster but helps with busy ftp sites, too!
--
**************************** NEW ADDRESS ******************************
Hamburger Sternwarte Universitaet Hamburg
Gojenbergsweg 112 Tel. ++49 40 42891 4016
D-21029 Hamburg, Germany Fax. ++49 40 42891 4198
On Tue, Aug 19, 2003 at 12:33:53AM +0200, Andreas Schweitzer wrote:
Hi,
All right, I did it. I made my first dpi ! :-) (heavily
That's great, I hope to make a new patch for dpid available soon.
The second caveat : dillo does not automatically call this plugin upon such a URL. It is easy to patch dillo, though.
Creating a new dpi still requires more effort than it should.
However, this brings me to my questions : 1) will there be a generic URL->dpi mapping ? I could imagine that dpid could actually handle that.
Yes, but I think you mean URL->SERVICE; a dpi is just an implementation of a service.
I think, one would need to only modify IO/Url.c and capi.c to simply pass through EVERYTHING to the dpi framework. However, some "core URL handling dpi's" that will become part of the dillo package can probably stay hardcoded in dillo.
1b) Either way, I am planning on writing a "catch-all-URL handler" or "fallback handler" or "universal URL handler" or the like. It will give the user the possibility to start an external program for handling certain URL's. In fact, with my current dpi, this is quite easy to do :-).
OK, but I think we should hold off on 1 and 1b until I make the next dpid patch available.
2) Does it make sense to write a library for all the dpi's ? I, for myself, did start simply collecting all the functions that I found in dpi/ and are used by many dpi's and put them in a library. Like the various send_stream functions or the get_attr_value function, or the encoding and decoding functions.
Yes, that would be very useful. Your strategy of creating a library of functions which are actually used by many dpis is a good one.
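To make the library idea concrete, here is a minimal sketch of the kind of helper that several dpis could share. It is not dillo's actual get_attr_value: the name sketch_attr_value, its signature, and the assumed tag layout (attributes written as name='value', single-quoted) are all hypothetical and chosen only for illustration.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical shared helper (NOT dillo's real get_attr_value).
 * Returns a newly allocated copy of the value of `attr` inside a tag
 * such as
 *     <dpi cmd='open_url' url='finger://host/user'>
 * or NULL if the attribute is missing. The caller frees the result.
 * Assumption: values are single-quoted and contain no quote characters.
 */
char *sketch_attr_value(const char *tag, const char *attr)
{
    size_t alen = strlen(attr);
    const char *p = tag;

    if (alen == 0)
        return NULL;

    while ((p = strstr(p, attr)) != NULL) {
        /* Accept only a whole attribute name directly followed by =' */
        int starts_word = (p == tag) || p[-1] == ' ' || p[-1] == '<';

        if (starts_word && p[alen] == '=' && p[alen + 1] == '\'') {
            const char *start = p + alen + 2;
            const char *end = strchr(start, '\'');
            size_t n;
            char *out;

            if (end == NULL)
                return NULL;            /* unterminated value */
            n = (size_t)(end - start);
            out = malloc(n + 1);
            if (out == NULL)
                return NULL;
            memcpy(out, start, n);
            out[n] = '\0';
            return out;
        }
        p += alen;                      /* keep searching past this hit */
    }
    return NULL;
}
```

Keeping such a function in one library means every dpi parses tags the same way, and a fix to the quoting rules only has to be made once.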
3) About configuration (and I think I read comments about this in the code), shouldn't there be only one config file ? and not dpidrc and possibly different configs for all the dpis ? maybe even merge cookiesrc with dillorc ?
Yes, my next dpid patch gets rid of dpidrc.
I guess some of this (or all of it ?) is being worked on by some of you. Maybe this mail helps to avoid duplication :-)
Cheers, Andreas
P.S.: I also have some ideas about improving the ftp plugin. There is a pretty simple ftp library, and I think it does make sense to actually log in an ftp server and stay logged in for all the operations. (unlike wget which logs in and out for every file). This is not only faster but helps with busy ftp sites, too !
Developing the ftp plugin is a high priority and your help here would be appreciated.
Best regards Ferdi
Andreas, Here go my comments...
On Tue, Aug 19, 2003 at 12:33:53AM +0200, Andreas Schweitzer wrote:
Hi,
All right, I did it. I made my first dpi ! :-) (heavily
That's great, I hope to make a new patch for dpid available soon.
Good! That means it's understandable now! :-)
The second caveat : dillo does not automatically call this plugin upon such a URL. It is easy to patch dillo, though.
Creating a new dpi still requires more effort than it should.
Eventually there'll be a mechanism for automating the inclusion of new URL-handling plugins. We're working on it with Ferdi.
However, this brings me to my questions : 1) will there be a generic URL->dpi mapping ? I could imagine that dpid could actually handle that.
Yes, but I think you mean URL->SERVICE; a dpi is just an implementation of a service.
Ditto! The distinction is relevant because there may be several different dpi programs that implement the same service. The user makes a choice and uses _one_ of them.
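The URL->SERVICE idea can be sketched as a small lookup table keyed on the URL scheme. Everything here is an assumption made for illustration: the struct, the function name sketch_service_for_url, and the service names ("proto.finger" and so on) are hypothetical, and a real mechanism would presumably read the mapping from configuration rather than compile it in.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: URL scheme -> service name (names are made up). */
struct service_map {
    const char *scheme;
    const char *service;
};

static const struct service_map sketch_services[] = {
    { "finger", "proto.finger" },
    { "ftp",    "proto.ftp"    },
    { "https",  "proto.https"  },
};

/*
 * Return the service name for the scheme of `url`, or NULL when no
 * service is registered, in which case the caller would warn the user
 * that the service is not available.
 * Simplification: the scheme comparison is case-sensitive.
 */
const char *sketch_service_for_url(const char *url)
{
    const char *colon = strchr(url, ':');
    size_t len, i;

    if (colon == NULL)
        return NULL;
    len = (size_t)(colon - url);

    for (i = 0; i < sizeof sketch_services / sizeof sketch_services[0]; i++) {
        const char *s = sketch_services[i].scheme;

        if (strlen(s) == len && strncmp(url, s, len) == 0)
            return sketch_services[i].service;
    }
    return NULL;
}
```

Which dpi program actually implements a given service stays a separate decision, so the user can swap one implementation for another without touching this table.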
I think, one would need to only modify IO/Url.c and capi.c to simply pass through EVERYTHING to the dpi framework. However, some "core URL handling dpi's" that will become part of the dillo package can probably stay hardcoded in dillo.
And attached to a service name... If there's a dpi for handling it, no problem. If not, warn that it is not an available service.
1b) Either way, I am planning on writing a "catch-all-URL handler" or "fallback handler" or "universal URL handler" or the like. It will give the user the possibility to start an external program for handling certain URL's. In fact, with my current dpi, this is quite easy to do :-).
OK, but I think we should hold off on 1 and 1b until I make the next dpid patch available.
Me too.
2) Does it make sense to write a library for all the dpi's ? I, for myself, did start simply collecting all the functions that I found in dpi/ and are used by many dpi's and put them in a library. Like the various send_stream functions or the get_attr_value function, or the encoding and decoding functions.
Yes, that would be very useful. Your strategy of creating a library of functions which are actually used by many dpis is a good one.
3) About configuration (and I think I read comments about this in the code), shouldn't there be only one config file ? and not dpidrc and possibly different configs for all the dpis ? maybe even merge cookiesrc with dillorc ?
Yes, my next dpid patch gets rid of dpidrc.
This is the only point on which I have some doubts. I've always thought that it is a good idea to keep separate configuration files for dillo and its dpis (and dpid). One file with a lengthy collection of loosely related configuration directives sounds to me like low cohesion and high coupling (I'm not sure what the words in English are for those technical terms. In Spanish it is: cohesión y acoplamiento).
The idea of having a preferences plugin providing an HTML GUI for dillo, dpid, and each dpi is what I picture in the long term. Something like:
 .-------.
 | Dillo | Dpid | ftp | https | .... |
 |       '----------------------------------------------------.
 |                                                            |
 |   <dillo's preferences here>                               |
 |   ...                                                      |
 |                                                            |
 '------------------------------------------------------------'
     Restore    Apply    Save
I think that is easier and cleaner with separate configuration files. Also, the rc for each dpi would be located in its own dpi directory tree (available in the next dpid patch).
Note that the task of the preferences dpi is to change the rc file through a nice GUI, not to make the changes effective. Taking the changes into account is something each program knows best how to do. The preferences dpi will "tell" that program to reread the rc and update its state, probably through dpidc or dpip.
I guess some of this (or all of it ?) is being worked on by some of you. Maybe this mail helps to avoid duplication :-)
Yes it is!
Cheers, Andreas
P.S.: I also have some ideas about improving the ftp plugin. There is a pretty simple ftp library,
Which one?
and I think it does make sense to actually log in an ftp server and stay logged in for all the operations. (unlike wget which logs in and out for every file). This is not only faster but helps with busy ftp sites, too !
Developing the ftp plugin is a high priority and your help here would be appreciated.
Just as hinted in the comments inside ftp.c!
Note that you only need to provide the HTML output for FTP directories. Downloads are to be bounced back to dillo for it to forward them to the downloads plugin (that way the whole downloading is centralized in a single place, stats can be provided, etc.).
I left some code inside ftp.c that shows a beautiful trick to tell whether an FTP URL is a directory when using wget (it can probably be re-used).
The most time-consuming task with this approach is to parse the ASCII from the ftp server into HTML (easy in theory, but you'll find different layouts). In fact that's what I did with an ancient FTP plugin I wrote.
If the library you are thinking of using writes HTML itself, great! If not, just tell me to dig out my old FTP code and send it to you.
Cheers
Jorge.-
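As a rough illustration of the listing-to-HTML step, the sketch below converts one Unix-style listing line into an HTML link. It is not the code from ftp.c or from the old FTP plugin: the function name sketch_listing_to_html is hypothetical, and it assumes exactly one layout (nine whitespace-separated fields with the name last, directories marked by a leading 'd'). Other server layouts, symlinks, and HTML escaping are left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sketch: turn one Unix-style listing line, e.g.
 *     drwxr-xr-x   2 ftp ftp  4096 Aug 19  2003 pub
 * into an HTML anchor written to `out`.
 * Returns 0 on success, -1 if the line does not match the assumed
 * layout or the output buffer is too small.
 */
int sketch_listing_to_html(const char *line, const char *base_url,
                           char *out, size_t outlen)
{
    char perms[16];
    char name[256];
    int is_dir;
    int n;

    /* Fields: perms, then 7 skipped (links, owner, group, size,
     * month, day, time-or-year), then the name up to end of line. */
    if (sscanf(line, "%15s %*s %*s %*s %*s %*s %*s %*s %255[^\r\n]",
               perms, name) != 2)
        return -1;

    is_dir = (perms[0] == 'd');
    n = snprintf(out, outlen, "<a href=\"%s%s%s\">%s%s</a><br>\n",
                 base_url, name, is_dir ? "/" : "",
                 name, is_dir ? "/" : "");
    if (n < 0 || (size_t)n >= outlen)
        return -1;
    return 0;
}
```

Lines that do not fit the assumed layout (such as a leading "total 12" line) are rejected, which is where support for the other layouts would have to be added.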
Hi, This e-mail account was cut off the net (literally ...) for 2 days, hence the delay in answering.
However, this brings me to my questions : 1) will there be a generic URL->dpi mapping ? I could imagine that dpid could actually handle that.
Yes, but I think you mean URL->SERVICE; a dpi is just an implementation of a service.
Ditto!
I understand. I think my original wording was not the best. But in the end, if the user can easily manage and configure URL->(via services)->dpi-program w/o compiling dillo, (s)he will be happy :-)
1b) Either way, I am planning on writing a "catch-all-URL handler" or "fallback handler" or "universal URL handler" or the like. It will give the user the possibility to start an external program for handling certain URL's. In fact, with my current dpi, this is quite easy to do :-).
OK, but I think we should hold off on 1 and 1b until I make the next dpid patch available.
Me too.
I'll put it on hold :-)
3) About configuration (and I think I read comments about this in the code), shouldn't there be only one config file ? and not dpidrc and possibly different configs for all the dpis ? maybe even merge cookiesrc with dillorc ?
Yes, my next dpid patch gets rid of dpidrc.
This is the only point on which I have some doubts.
I've always thought that it is a good idea to keep separate configuration files for dillo and its dpis (and dpid).
One file with a lengthy collection of loosely related configuration directives sounds to me like low cohesion and high coupling (I'm not sure what the words in English are for those technical terms. In Spanish it is: cohesión y acoplamiento).
OTOH, for a user (like myself) who likes editing configuration files by hand, it is more convenient to have it in one place ... just an opinion though :-) ... but ... as you say :
The idea of having a preferences plugin providing an HTML GUI for: dillo, dpid, and each dpi, is what I picture in the long term. Something like:
.-------.
| Dillo | Dpid | ftp | https | .... |
---------------------------------------------------.
          ^^^^
          should be named "Plugin Management" or the like :-)
This is of course also acceptable. Thinking a bit longer ... I think Mozilla is very much like that. You can configure Mozilla from within, and it scatters its config files through ~/.mozilla/ within a maze of directories. Every time I look inside that directory trying to fix things by hand, I am completely lost ... hence my desire for a one-stop config file :-)
P.S.: I also have some ideas about improving the ftp plugin. There is a pretty simple ftp library,
Which one?
http://nbpfaus.net/~pfau/ftplib/ seems to be relatively old, but the ftp protocol hasn't really changed that much :-) I *think* it is also used in some other applications.
and I think it does make sense to actually log in an ftp server and stay logged in for all the operations. (unlike wget which logs in and out for every file). This is not only faster but helps with busy ftp sites, too !
Developing the ftp plugin is a high priority and your help here would be appreciated.
Just as hinted in the comments inside ftp.c!
Note that you only need to provide the HTML output for FTP directories.
and for files the user clicks on, like text files, images and even html files that exist in the ftp directory.
Downloads are to be bounced back to dillo for it to forward them to the downloads plugin (that way the whole downloading is centralized in a single place, stats can be provided etc.).
I have been thinking back and forth about that point. I think what I will end up doing is at least implementing the actual download from within the ftp plugin. Whether it will be included in the "official" one, or if I make 2 versions (a basic one and a fancy one), or whatever, remains to be seen. I guess one could also think about sending the data from the ftp plugin directly to the download plugin w/o relaying it via dillo.
I left some code inside ftp.c that shows a beautiful trick to tell whether an FTP URL is a directory when using wget (it probably can be re-used).
I saw it, and this is indeed a problem. I checked the ftp rfc and there is no command that will tell me if something is a regular file or a directory. Also, after checking the ftp rfc and the url syntax rfc (rfc 1738), sections 3.2.4 and 3.2.5 in particular make staying logged in to ftp sites somewhat problematic. Essentially, it says that unless you actually travel an ftp site like within an interactive client, there is no way of telling how to get from ftp://site.com/foo/bar/ (which could be one link in an html page) to ftp://site.com/foo2/bar2/ (which could be a 2nd link in an html page).
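To illustrate why this is awkward for a persistent login, here is a minimal sketch of the RFC 1738 reading in which each directory segment of the url-path becomes its own CWD command, issued relative to wherever the session currently is. The function name sketch_cwd_sequence is hypothetical, the path is assumed to be passed without its leading slash, and percent-decoding and the ";type=" suffix are left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Sketch: write the CWD commands for the directory part of an FTP
 * url-path such as "foo/bar/" or "foo/bar/file.txt" into `out`.
 * Every segment that is followed by '/' becomes one CWD; whatever
 * follows the last '/' is treated as the file name and is not emitted.
 * An empty segment yields a CWD with an empty argument.
 * Returns the number of CWD commands, or -1 if `out` is too small.
 */
int sketch_cwd_sequence(const char *path, char *out, size_t outlen)
{
    const char *seg = path;
    const char *slash;
    size_t used = 0;
    int count = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    while ((slash = strchr(seg, '/')) != NULL) {
        int seglen = (int)(slash - seg);
        int n = snprintf(out + used, outlen - used,
                         "CWD %.*s\r\n", seglen, seg);

        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
        count++;
        seg = slash + 1;
    }
    return count;
}
```

For "foo/bar/" this gives CWD foo followed by CWD bar. Replaying the sequence for "foo2/bar2/" on the same connection only lands in the intended place if the session is first brought back to its starting directory, which is the difficulty described above.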
The most time-consuming task with this approach is to parse the ASCII from the ftp server into HTML (easy in theory, but you'll find different layouts). In fact that's what I did with an ancient FTP plugin I wrote.
If the library you think of using writes HTML itself, great! If not, just tell me to dig in for my old FTP code, and to send it to you.
I think either way it could be useful to look at :-)
Cheers, Andreas
Participants (3): Andreas Schweitzer, Ferdi Franceschini, Jorge Arellano Cid