I observed while working on my pkgutils rewrite that the original wastes time by making two passes through each archive when generating a footprint. I have revised the algorithm to cache the necessary metadata it needs to generate the footprint in a first pass through the archive and then process it all during a second pass through the cached metadata. The observable differences from the original algorithm are as follows. First, the new algorithm takes roughly half the time of the original when generating footprints. Second, the unsorted line order has been changed from the order of their appearance in the original archive to a sorted order by path. I did this to support the faster binary search lookups for the hard links that the original algorithm was doing via a hash table. If this is considered undesirable, I can switch back to using a hash table to retain the original order. I switched it because I didn't want to increase the RAM usage anymore than I already had to for the metadata cache. An auxiliary hashtable would require extra RAM whereas this method allows me to reuse the same array I use for the metadata cache. In practice this should shave a few seconds off large port builds, though it's relative to how much time it spends building the archive and the composition of the resulting package. Thoughts?
participants (1)
-
James Buren