Re: [crux-devel] pkgutils 6

11 Aug 2007

      On Sat, Aug 11, 2007 at 05:10:12PM +0200, Tilman Sauerbeck wrote:
...
Tilman Sauerbeck [2007-08-09 01:16]:
...
over the last few days I have started working on the next major
incarnation of pkgutils, version 6.
I was bored earlier today and compared the run times and memory usage of
"pkginfo -i" from pkgutils-c 1.5 and the tip of the pkgutils6 branch.
I only ran each binary once, as I'm mostly interested in memory usage
here (memory usage should be constant no matter whether you're running
with a hot or a cold cache, whereas run time typically won't be).
Here are the graphs from valgrind's massif tool:
pkgutils-c 1.5: http://crux.nu/~tilman/pkginfo_i.1.5.png
pkgutils6:      http://crux.nu/~tilman/pkginfo_i.6.png
I'll leave the interpretation of these graphs to you ;)
I really doubt that you didn't know the reason:

Let's see what pkgutils6 is doing:

static void
list_installed_cb (PkgPackage *pkg, void *user_data)
{
        printf ("%s %s-%s\n", pkg->name, pkg->version, pkg->release);
}
...
        pkg_database_read_package_list (db, PKG_DATABASE_READ_NAMES_ONLY);
        pkg_database_foreach (db, list_installed_cb, NULL);
        pkg_database_unref (db);
...

I.e. you're reading only the package names into memory. So you're
using an ad-hoc method, and yes, it's faster and eats less memory.
Meanwhile, pkgutils-c's database functions are written without
premature optimizations:

        pkg_init_db();

        list_for_each(_pkg, &pkg_db) {
                struct pkg_desc *pkg = _pkg->data;

                printf("%s %s\n", pkg->name, pkg->version);
        }

        pkg_free_db();

That means pkgutils-c reads the whole database, including the file
listings, even where they are unnecessary (well, only in the pkginfo -i
case, the only one I can imagine). Is it easy to optimize (if you
really wanted to)? Yes, with a patch of less than 30 lines. I won't
believe that you couldn't find a way to do such an optimization in
pkgutils-c, if you had wanted to.
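
To make the difference concrete, here is a rough sketch of a names-only
read. It is not code from either project. The record layout (name line,
version line, file lines, blank line as terminator) and the names
read_package_list, read_mode and read_stats are assumptions made up for
this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout, assumed for illustration only:
 *   name, version, zero or more file paths, then a blank line. */
enum read_mode { READ_FULL, READ_NAMES_ONLY };

struct read_stats {
        size_t packages;   /* records seen */
        size_t files_kept; /* file lines a full read would have to store */
};

/* Walk all records in `in` and print "name version" for each to `out`.
 * In READ_NAMES_ONLY mode the file lines are skipped, not stored.
 * Lines longer than the buffer are not handled in this sketch. */
static struct read_stats
read_package_list(FILE *in, FILE *out, enum read_mode mode)
{
        struct read_stats st = { 0, 0 };
        char line[4096], name[4096];
        int field = 0; /* 0 = name, 1 = version, 2 = file entries */

        assert(in != NULL && out != NULL);

        while (fgets(line, sizeof line, in)) {
                line[strcspn(line, "\n")] = '\0';

                if (line[0] == '\0') {  /* blank line ends the record */
                        field = 0;
                        continue;
                }
                if (field == 0) {
                        strcpy(name, line);
                        field = 1;
                } else if (field == 1) {
                        fprintf(out, "%s %s\n", name, line);
                        st.packages++;
                        field = 2;
                } else if (mode == READ_FULL) {
                        st.files_kept++; /* a real reader would allocate here */
                }
        }
        return st;
}
```

The only difference between the two modes is the last branch, which is
why the memory graphs for a pure listing differ so much.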

But... thanks for the idea, maybe I'll implement that specific
optimization myself.

Now, let's compare pkgrm. pkgadd would be more interesting, but you
haven't done that one. Both pkgadd and pkgrm need the whole db.

pkgutils6/src# time ./pkgrm gtk

real    0m0.979s
user    0m0.867s
sys     0m0.043s

# time pkgrm gtk

real    0m0.440s
user    0m0.153s
sys     0m0.080s

Both are warm starts. pkgutils-c runs about twice as fast (0.440s vs.
0.979s real). And looking at pkgutils6's code, I can tell that
pkgutils-c's lead will grow as the db and package sizes grow, while
pkgutils6 will get slower and slower by comparison. At the same time,
pkgutils-c still eats a bit more memory. Why? Because in addition to
the db itself, pkgutils-c uses temporary storage for sorting, so that
it can use faster algorithms.
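
As an illustration of that trade-off (not pkgutils-c's actual code), here
is a sketch that sorts a linked list by copying the node pointers into a
temporary array and running qsort() on it. struct node and
sort_list_via_array are hypothetical names used only here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node, for illustration only. */
struct node {
        const char *name;
        struct node *next;
};

/* qsort() comparator: the array elements are pointers to nodes. */
static int
cmp_node_names(const void *a, const void *b)
{
        const struct node *x = *(const struct node *const *)a;
        const struct node *y = *(const struct node *const *)b;

        return strcmp(x->name, y->name);
}

/* Sort the list by name using a temporary pointer array.
 * Returns the new head; if the allocation fails, the list is
 * returned unchanged (and unsorted). */
static struct node *
sort_list_via_array(struct node *head)
{
        size_t n = 0, i;
        struct node *p, **tmp;

        for (p = head; p; p = p->next)
                n++;
        if (n < 2)
                return head;

        tmp = malloc(n * sizeof *tmp); /* the extra memory */
        if (!tmp)
                return head;

        for (i = 0, p = head; p; p = p->next)
                tmp[i++] = p;
        assert(i == n);

        qsort(tmp, n, sizeof *tmp, cmp_node_names);

        /* Relink the nodes in sorted order. */
        for (i = 0; i + 1 < n; i++)
                tmp[i]->next = tmp[i + 1];
        tmp[n - 1]->next = NULL;

        head = tmp[0];
        free(tmp);
        return head;
}
```

The temporary array costs one pointer per node for the duration of the
sort, on top of the list itself.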

Can I make pkgutils-c faster with less memory consumption? It depends.
In-place list sorting is slower, but maybe there would be some win
from doing less memory management. Not sure, too lazy to calculate.
(And btw, keeping the db in an rb-tree would definitely be faster, with
less memory consumption. But it would complicate the code a bit.)
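
For comparison, an in-place variant could look like the merge sort
below, which only relinks nodes and never allocates. Again this is a
sketch, not pkgutils-c code; struct pkg_node, merge_sorted and
merge_sort_in_place are hypothetical names, and whether it wins or
loses against a temporary-array sort would have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical list node, for illustration only. */
struct pkg_node {
        const char *name;
        struct pkg_node *next;
};

/* Merge two already sorted lists by relinking their nodes. */
static struct pkg_node *
merge_sorted(struct pkg_node *a, struct pkg_node *b)
{
        struct pkg_node head = { NULL, NULL }, *tail = &head;

        while (a && b) {
                if (strcmp(a->name, b->name) <= 0) {
                        tail->next = a;
                        a = a->next;
                } else {
                        tail->next = b;
                        b = b->next;
                }
                tail = tail->next;
        }
        tail->next = a ? a : b; /* append whatever is left */
        return head.next;
}

/* Recursive merge sort on a singly linked list: no heap allocation,
 * only the call stack (depth grows with log2 of the list length). */
static struct pkg_node *
merge_sort_in_place(struct pkg_node *list)
{
        struct pkg_node *slow, *fast, *second;

        if (!list || !list->next)
                return list;

        /* Find the middle with slow/fast pointers and split there. */
        slow = list;
        fast = list->next;
        while (fast && fast->next) {
                slow = slow->next;
                fast = fast->next->next;
        }
        second = slow->next;
        assert(second != NULL);
        slow->next = NULL;

        return merge_sorted(merge_sort_in_place(list),
                            merge_sort_in_place(second));
}
```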

Who will do these tests and optimizations for pkgutils-c? Not you,
surely.

NIH syndrome won't let you do this. At the same time, you'll test and
profile your pkgutils6. Quite illogical, but that's what NIH syndrome
is all about. You won't optimize someone else's code, but you will
write your own, and then test/fix/profile/optimize it _anyway_.

Oh, I've just said "quite illogical"? No, it's damn moronic.

Anyhow, good luck with it,

-- 
Anton Vorontsov
email: cbou@mail.ru
backup email: ya-cbou@yandex.ru
irc://irc.freenode.net/bd2