While working on my pkgutils rewrite, I noticed that the original wastes
time by making two passes through each archive when generating a
footprint. I have revised the algorithm to cache the metadata needed for
the footprint during a single pass through the archive and then produce
the output in a second pass over the cached metadata.
The observable differences from the original algorithm are as follows.
First, the new algorithm generates footprints in roughly half the time
of the original. Second, the lines are now sorted by path rather than
appearing in archive order. I made this change to support fast
binary-search lookups for hard links, which the original algorithm
handled with a hash table.
If this is considered undesirable, I can switch back to a hash table to
retain the original order. I changed it because I didn't want to
increase RAM usage any more than the metadata cache already requires: an
auxiliary hash table would need extra memory, whereas binary search
reuses the same array that holds the metadata cache.
In practice this should shave a few seconds off large port builds,
though the savings depend on how much time is spent building the
archive and on the composition of the resulting package. Thoughts?