pkgutils footprint enhancement

11 Jun 2019

      I observed while working on my pkgutils rewrite that the original wastes 
time by making two passes through each archive when generating a 
footprint. I have revised the algorithm to cache the necessary metadata 
it needs to generate the footprint in a first pass through the archive 
and then process it all during a second pass through the cached metadata.

The observable differences from the original algorithm are as follows. 
First, the new algorithm takes roughly half the time of the original 
when generating footprints. Second, the unsorted line order has been 
changed from the order of their appearance in the original archive to a 
sorted order by path. I did this to support the faster binary search 
lookups for the hard links that the original algorithm was doing via a 
hash table.

If this is considered undesirable, I can switch back to using a hash 
table to retain the original order. I switched it because I didn't want 
to increase the RAM usage anymore than I already had to for the metadata 
cache. An auxiliary hashtable would require extra RAM whereas this 
method allows me to reuse the same array I use for the metadata cache.

In practice this should shave a few seconds off large port builds, 
though it's relative to how much time it spends building the archive and 
the composition of the resulting package. Thoughts?

James Buren

tags

participants (1)