Hello all!

I've been doing some analysis of this Improvement task [1] and have concerns about whether the proposed solutions actually improve things. It helps me to write things up, so here it goes...

I'll start with some of the facts I've found and follow that with a list of the proposed solutions I've heard or read, with comments about them.

_Facts_

I wrote a script [2] to tell me how many times a downloaded source file is used in more than one port in core, opt, xorg, and config. The result is:

    ports with the same source URL: 30
    Total repos checked: 4
    Total ports checked: 1404

These may or may not be collisions. Multiple ports may use the same source. The script does not distinguish the two cases because pkgmk does not either; it depends on the .md5sum.

I wrote another script [3] to sum up the size of those downloaded source files and got:

    Total size is 259,239,211 bytes for 30 urls.

A change to the Pkgfile that affects the meaning of source=() will require changes to these scripts in core or opt:

- dllist (bash)
- pkgmk (bash)
- pkgsize (bash)
- prtsweep (bash)
- prtwash (bash)

_Proposed Solutions_

The as.diff included with the initial improvement request changes the meaning of the contents of source=() in the Pkgfile. There is now a possible keyword in there. This means all consumers of source=() will have to be changed.
-5

The comma-separated proposal suffers the same fate. We now have source=() meaning two things: remote source and local source. Again, all other consumers of source=() will have to be updated.
-3

A new array: names=(). This is better. No change to source=(), so no direct breakage of other consumers. But consumers that look for the local copy of the source will fail.
-1

Downloading to separate directories for each port, like /usr/ports/dist/$name/file.tar.gz. That is already accomplished by the default /etc/pkgmk.conf in pkgutils; the source is stored with the Pkgfile.
-10

_IMHO_

All the proposed improvements came up negative in my personal view. The least negative is a new array. The delimiter separation is the next in line.

Now that I've written this I think I need to go back and do some more analysis. There is no implementation patch for the new array proposal.

(a couple of days go by)

And now there is a sample implementation of the proposed new array. I chose the array name rename=() so it might be clearer what it is doing in the Pkgfile. Attached is a patch and Pkgfile for the pkgutils port that implements renaming in pkgmk.

There is a one-to-one, order-based correlation between the source=() array and the rename=() array. An equal sign (=) can be used as a placeholder where the existing name should be kept, for instance when only the third source=() item needs renaming.

So a Pkgfile would look like:

    # Description: text email client
    # URL: http://www.washington.edu/alpine/
    # Maintainer: Daryl Fonseca-Holt, wyatt at prairieturtle dot ca
    # Packager: Daryl Fonseca-Holt, wyatt at prairieturtle dot ca
    # Depends on: aspell openssl

    name=alpine
    version=2.00
    release=2
    source=(ftp://ftp.cac.washington.edu/alpine/alpine.tar.bz2)
    rename=(ftp://ftp.cac.washington.edu/alpine/alpine-$version.tar.bz2)

    build() {
        cd ${name}-${version}
        ./configure \
            --prefix=/usr \
            --mandir=/usr/man \
            --with-passfile=.pinepw \
            --with-ssl-lib-dir=/lib \
            --with-ssl-dir=/usr \
            --with-web-bin=/usr/share/alpine
        make -j3
        make DESTDIR=$PKG install
        rm -rf $PKG/home
    }

An example of laziness^Werror prevention is:

    source=(http://example.com/well-named-2.0.tar.gz static.file config.file http://example.com/26490.tar.gz)
    rename=(= = = http://example.com/$name-extra-$version.tar.gz)

Through trickery in the patch, the .md5sum still displays the old name of the file, but md5sum is actually run on the renamed file.

If this seems worthwhile I can add it to the Flyspray bug.

Comments?

[1] https://crux.nu/bugs/index.php?do=details&task_id=923
[2] http://crux.wyatt.fastmail.fm/scripts/find-same-source
[3] http://crux.wyatt.fastmail.fm/scripts/sum-inet-file-sizes

-Daryl
IRC: darfo or nthwyatt
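The pairing between source=() and rename=() described above can be sketched roughly as follows. This is only an illustration, not the attached patch: the helper name `local_names` is hypothetical, and treating a missing rename=() entry the same as `=` is an assumption.

```bash
#!/usr/bin/env bash
# Hypothetical helper, NOT the code from the attached patch.
# For each source=() entry, print the local filename it would be stored
# under, honouring the rename=() entry at the same index.
local_names() {
    local i entry target
    for i in "${!source[@]}"; do
        entry=${source[$i]}
        target=${rename[$i]:-"="}    # assumption: no rename entry means "keep the name"
        if [ "$target" = "=" ]; then
            target=$entry
        fi
        echo "${target##*/}"         # strip everything up to the last slash
    done
}

name=well-named
version=2.0
source=(http://example.com/well-named-2.0.tar.gz static.file config.file
        http://example.com/26490.tar.gz)
rename=(= = = http://example.com/$name-extra-$version.tar.gz)

local_names
```

With the example arrays, only the fourth entry gets a new local name; the first three keep theirs.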
On 2016-10-04 11:21, Daryl F wrote:
> I wrote a script [2] to tell me how many times a downloaded source file is used in more than one port in core, opt, xorg, and config. The result is:
>
> ports with the same source URL: 30
> Total repos checked: 4
> Total ports checked: 1404
>
> These may or may not be collisions. Multiple ports may use the same source. The script does not check for this because pkgmk does not check for this but depends on the .md5sum .
Hi Daryl,

Thanks for investigating this. I took the time to write an alternative script (attached) that distinguishes between source collisions and sharing. It compares the filename only (as it should), not the whole URL.

My results are: 31 files shared, 1 collision (see below).

Unless I missed something, at the moment there is no problem, just a small "inconvenience".
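One way to tell sharing from collision could look like the sketch below. It is not the attached script: the function name `classify_sources`, the "port URL" input format, and the rule "same filename with the same URL is sharing, same filename with a different URL is a collision" are all assumptions made for illustration. It needs bash 4 or later for associative arrays.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the attached script. Requires bash 4+.
# Reads "port URL" lines on stdin and groups them by filename:
#   same filename, same URL      -> shared
#   same filename, different URL -> collision
classify_sources() {
    local -A first_url first_port
    local port url file
    while read -r port url; do
        file=${url##*/}                 # compare on the filename only
        [ -n "$file" ] || continue      # skip URLs that end in a slash
        if [ -z "${first_url[$file]+set}" ]; then
            first_url[$file]=$url       # first time this filename is seen
            first_port[$file]=$port
        elif [ "${first_url[$file]}" = "$url" ]; then
            echo "shared: $file (${first_port[$file]}, $port)"
        else
            echo "collision: $file (${first_port[$file]}, $port)"
        fi
    done
}

# Made-up ports and URLs, for demonstration only.
classify_sources <<'EOF'
foo http://example.com/foo/archive/1.0.tar.gz
bar http://example.com/bar/archive/1.0.tar.gz
foo-docs http://example.com/foo/archive/1.0.tar.gz
EOF
```

A stricter check would compare checksums rather than URLs, since the same file can be published under more than one URL.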
> Downloading to separate directories for each port, like /usr/ports/dist/$name/file.tar.gz . That is already accomplished by the default /etc/pkgmk.conf in pkgutils; the source is stored with the Pkgfile. -10
The default is to save source files in /usr/ports/$repo/$name/file.tar.gz. You won't have a problem this way, but admittedly it's a bit messy. You can use separate download directories by name (like I suggested in the bug report) by using /usr/ports/dist/$name/file.tar.gz. It requires a change in pkgmk to create missing directories; I have this patch in my personal repo [1]. The change in pkgmk is small, and nobody has to change their ports.
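The "create missing directories" step could be sketched as below. This is not the patch from the personal repo: `ensure_source_dir` is a hypothetical helper, a temporary directory stands in for /usr/ports/dist, and the sketch assumes $name is already set by the time PKGMK_SOURCE_DIR is evaluated.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the patch from the personal repo mentioned above.

# Create the per-port source directory if it is missing, then check that
# it is writable.
ensure_source_dir() {
    local dir=$1
    [ -d "$dir" ] || mkdir -p "$dir" || return 1
    [ -w "$dir" ]
}

name=alpine
dist_root=$(mktemp -d)                 # stand-in for /usr/ports/dist
PKGMK_SOURCE_DIR="$dist_root/$name"    # one download directory per port

if ensure_source_dir "$PKGMK_SOURCE_DIR"; then
    echo "sources for $name go to $PKGMK_SOURCE_DIR"
fi
```

Because the directory is derived from $name, two ports that download a file with the same name no longer overwrite each other, but they also no longer share a single copy.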
> An example of laziness^Werror prevention is:
>
> source=(http://example.com/well-named-2.0.tar.gz static.file config.file http://example.com/26490.tar.gz)
> rename=(= = = http://example.com/$name-extra-$version.tar.gz)
If anything I would prefer this:

    name=cairo-dock
    version=3.4.1
    release=1
    declare -A source=([$name-$version.tar.gz]=https://github.com/Cairo-Dock/$name-core/archive/$version.tar.gz)
    ...

    name=cairo-dock-plug-ins
    version=3.4.1
    release=1
    declare -A source=([$name-$version.tar.gz]=https://github.com/Cairo-Dock/$name/archive/$version.tar.gz)
    ...

Using an associative array instead of two separate ones will prevent mistakes. Ports with no risk of filename collisions (e.g. those that include $name in the filename) should continue using a normal array. Pkgmk would have to check the type of the array and behave accordingly (to be backward compatible).

But my preferred solution continues to be my pkgutils patch (or to just continue ignoring this).

[1] http://www.mizrahi.com.ve/crux/ports/pkgutils/
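The type check mentioned above might be sketched like this. The helper names `is_assoc` and `list_sources` are hypothetical and not part of pkgmk. The sketch relies on `declare -p` printing the array's attributes, and it needs bash 4 or later.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not part of pkgmk. Requires bash 4+.

# Succeed if the named variable is an associative array.
is_assoc() {
    [[ "$(declare -p "$1" 2>/dev/null)" == "declare -A"* ]]
}

# Print "local-filename URL" pairs for either style of source array.
list_sources() {
    local key
    if is_assoc source; then
        # New style: the key is the local filename, the value is the URL.
        for key in "${!source[@]}"; do
            echo "$key ${source[$key]}"
        done
    else
        # Old style: the local filename is the last URL component.
        for key in "${source[@]}"; do
            echo "${key##*/} $key"
        done
    fi
}

name=cairo-dock-plug-ins
version=3.4.1
declare -A source=([$name-$version.tar.gz]=https://github.com/Cairo-Dock/$name/archive/$version.tar.gz)

list_sources
```

Ports that keep a plain indexed source=() take the second branch, so they would not need to change.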
On Wed, 5 Oct 2016, Alan Mizrahi wrote:
> My results are: 31 files shared, 1 collision (see below).
I don't know why we got different counts of sharing. My main interest was the amount of disk space that will be consumed and the extra Internet use since the sharing will stop with some of the proposed solutions.
> Unless I missed something, at the moment there is no problem, just a small "inconvenience".
Agreed. For more experienced users of CRUX it is not hard to understand what's going on. I wrote up the patch simply to see what it would cost to do renaming but still allow sharing. I am not invested in it.
> You can use separate download directories by name (like I suggested in the bug report) by using /usr/ports/dist/$name/file.tar.gz. It requires a change in pkgmk to create missing directories, I have this patch in my personal repo [1]. The change in pkgmk is small, and nobody has to change their ports.

This is nice. It doesn't require changes to any of the other pkgutils scripts.
> If anything I would prefer this:
>
> name=cairo-dock
> version=3.4.1
> release=1
> declare -A source=([$name-$version.tar.gz]=https://github.com/Cairo-Dock/$name-core/archive/$version.tar.gz)
> ...
>
> name=cairo-dock-plug-ins
> version=3.4.1
> release=1
> declare -A source=([$name-$version.tar.gz]=https://github.com/Cairo-Dock/$name/archive/$version.tar.gz)
> ...
>
> Using associative arrays instead of two separate ones will prevent mistakes. And ports without risks of filename collisions (eg: including $name in the filename), should continue using a normal array. Pkgmk would have to check the type of the array and behave accordingly (to be backward compatible).
This is better than parallel arrays. It still requires that some of the other scripts in pkgutils be updated, but it still allows source sharing. I wonder if the code in the pkgmk script would be more elegant (obvious) than my patch? Probably. Hashing is usually easier to code than indexing.
> But my preferred solution continues to be my pkgutils patch (or to just continue ignoring this).

I prefer ignoring it if the solution doesn't include source sharing but I am OK with any solution the CRUX team comes up with. They (you) rock!

I have a greater appreciation for the simplicity of CRUX infrastructure from developing that patch. My bash-fu has lots of room for improvement :)

-Daryl
Hi,

I think it's possible to use separate directories without losing the ability to share sources. One way is what I proposed in #923, which is based on the presumption that any two ports that share source files can't have source name collisions. I think it's a fair presumption.

Basically, the way to do it is to use separate intermediate directories for ports with source name collisions, but the same intermediate directory for ports that share source files. The best place to specify the name of the intermediate directory is in the Pkgfile. The patch is simple (see #923), and nothing has to be changed for normal ports or scripts; only ports that have source name collisions need modification (which has to be done anyway).

It's less messy, by the way, as only ports that have source name collisions require intermediate directories. For other ports the location of source files is unchanged.

It can also overcome the problem I mentioned in #923: the names of some source files don't include version numbers, so pkgmk won't download newer versions while updating. The fix for this is simple: just include a version number in the intermediate directory name.

For example, chromium-pepperflash/Pkgfile currently is:

    name=chromium-pepperflash
    version=47.0.2526.80
    release=1
    source=()

    build() {
        wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
        ar -x google-chrome-stable_current_amd64.deb
        tar -xJf data.tar.xz
        install -D $SRC/opt/google/chrome/PepperFlash/libpepflashplayer.so \
            $PKG/usr/lib/PepperFlash/libpepflashplayer.so
    }

After applying the patch, it would be:

    name=chromium-pepperflash
    version=47.0.2526.80
    release=1
    source=(https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb)
    srcdir=chromium-pepperflash-$version

    build() {
        ar -x google-chrome-stable_current_amd64.deb
        tar -xJf data.tar.xz
        install -D $SRC/opt/google/chrome/PepperFlash/libpepflashplayer.so \
            $PKG/usr/lib/PepperFlash/libpepflashplayer.so
    }

Assign the same srcdir to ports that share source files. Normal ports don't have srcdir, so nothing changes for them.
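How an optional srcdir might affect where a source file is stored can be sketched as follows. This is not the patch attached to #923: `source_path` is a hypothetical helper, and /usr/ports/dist is used only as an example base directory.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the srcdir idea, NOT the patch attached to #923.

# Print the path a source file would be stored at. base_dir stands in for
# pkgmk's source directory; srcdir comes from the Pkgfile and may be unset.
source_path() {
    local base_dir=$1 url=$2
    local file=${url##*/}
    if [ -n "${srcdir:-}" ]; then
        echo "$base_dir/$srcdir/$file"   # port asked for an intermediate directory
    else
        echo "$base_dir/$file"           # normal port: location unchanged
    fi
}

name=chromium-pepperflash
version=47.0.2526.80
source=(https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb)
srcdir=chromium-pepperflash-$version

source_path /usr/ports/dist "${source[0]}"
```

Since $version is part of srcdir, a version bump changes the path, so the unversioned .deb would be downloaded again rather than reused.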
participants (3)
- Alan Mizrahi
- Daryl F
- phi