On Sat, Aug 11, 2007 at 05:10:12PM +0200, Tilman Sauerbeck wrote:
Tilman Sauerbeck [2007-08-09 01:16]:
over the last few days I have started working on the next major incarnation of pkgutils, version 6.
I was bored earlier today and compared the run times and memory usage of "pkginfo -i" from pkgutils-c 1.5 and the tip of the pkgutils6 branch.
I only ran each binary once, as I'm mostly interested in memory usage here (memory usage should be constant no matter whether you're running with a hot or a cold cache, whereas run time typically won't be).
Here are the graphs from valgrind's massif tool:
pkgutils-c 1.5: http://crux.nu/~tilman/pkginfo_i.1.5.png
pkgutils6: http://crux.nu/~tilman/pkginfo_i.6.png
I'll leave the interpretation of these graphs to you ;)
I really doubt that you didn't know the reason. Let's see what pkgutils6 is doing:

    static void
    list_installed_cb (PkgPackage *pkg, void *user_data)
    {
        printf ("%s %s-%s\n", pkg->name, pkg->version, pkg->release);
    }

    ...

    pkg_database_read_package_list (db, PKG_DATABASE_READ_NAMES_ONLY);
    pkg_database_foreach (db, list_installed_cb, NULL);
    pkg_database_unref (db);

    ...

I.e. you're reading only the package names into memory. So you're using ad-hoc methods, and yes, they're faster and eat less memory.

pkgutils-c's database functions, on the other hand, are written without premature optimizations:

    pkg_init_db();

    list_for_each(_pkg, &pkg_db) {
        struct pkg_desc *pkg = _pkg->data;
        printf("%s %s\n", pkg->name, pkg->version);
    }

    pkg_free_db();

That means pkgutils-c reads the whole database, including the file listings, even where that is unnecessary (well, only in the "pkginfo -i" case, the only one I can imagine). Is it easy to optimize (if you really wanted to)? Yes, with a patch of less than 30 lines. I won't believe that you couldn't have found a way to do such an optimization in pkgutils-c, had you wanted to. But... thanks for the idea, maybe I'll implement this specific optimization myself.

Now, let's compare pkgrm. pkgadd would be more interesting, but you haven't done it. pkgadd and pkgrm need the whole db.

    pkgutils6/src# time ./pkgrm gtk
    real 0m0.979s
    user 0m0.867s
    sys 0m0.043s

    # time pkgrm gtk
    real 0m0.440s
    user 0m0.153s
    sys 0m0.080s

That's a warm start. pkgutils-c runs about twice as fast. And looking at pkgutils6's code, I can tell that the gap will widen as the db and package sizes grow: pkgutils6 will get slower and slower relative to pkgutils-c.

At the same time, pkgutils-c still eats a bit more memory. Why? Because in addition to the db itself, pkgutils-c uses temporary storage for sorting, so that it can use faster algorithms. Can I make pkgutils-c faster with less memory consumption? It depends. In-place list sorting is slower, but maybe there would be some win from doing less memory management. Not sure, too lazy to calculate.
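To make the sorting trade-off above concrete, here is a minimal sketch of the two approaches. It is not pkgutils-c's actual code: `struct pkg_node`, `sort_in_place()` and `sort_with_array()` are hypothetical names made up for illustration, and the merge sort and `qsort()` are just stand-ins for "in-place list sorting" and "sorting via temporary storage". The first variant relinks the list without any heap allocation; the second allocates a temporary array of n pointers, sorts that, and relinks.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical list node, NOT the real pkgutils-c structure. */
struct pkg_node {
    const char *name;
    struct pkg_node *next;
};

/* Merge two already sorted lists by relinking their nodes. */
static struct pkg_node *merge(struct pkg_node *a, struct pkg_node *b)
{
    struct pkg_node head = { NULL, NULL };
    struct pkg_node *tail = &head;

    while (a && b) {
        if (strcmp(a->name, b->name) <= 0) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = a ? a : b;
    return head.next;
}

/* Variant A: in-place merge sort, no heap allocation at all. */
struct pkg_node *sort_in_place(struct pkg_node *list)
{
    struct pkg_node *slow, *fast, *second;

    if (!list || !list->next)
        return list;

    /* Find the middle with slow/fast pointers and split there. */
    slow = list;
    fast = list->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    second = slow->next;
    slow->next = NULL;

    return merge(sort_in_place(list), sort_in_place(second));
}

/* qsort() comparator: each array element is a pointer to a node. */
static int cmp_nodes(const void *pa, const void *pb)
{
    const struct pkg_node *a = *(struct pkg_node *const *)pa;
    const struct pkg_node *b = *(struct pkg_node *const *)pb;

    return strcmp(a->name, b->name);
}

/* Variant B: copy node pointers into a temporary array, sort, relink. */
struct pkg_node *sort_with_array(struct pkg_node *list)
{
    size_t n = 0, i = 0;
    struct pkg_node *p, **arr;

    for (p = list; p; p = p->next)
        n++;
    if (n < 2)
        return list;

    arr = malloc(n * sizeof *arr);   /* the extra memory in question */
    if (!arr)
        return sort_in_place(list);  /* fall back if allocation fails */

    for (p = list; p; p = p->next)
        arr[i++] = p;
    assert(i == n);

    qsort(arr, n, sizeof *arr, cmp_nodes);

    for (i = 0; i + 1 < n; i++)
        arr[i]->next = arr[i + 1];
    arr[n - 1]->next = NULL;

    p = arr[0];
    free(arr);
    return p;
}
```

Both functions return the new head of the list. Which one wins on run time would have to be measured on a real db; the sketch only shows where the additional memory comes from.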
(And btw, keeping the db in an rb-tree would definitely be faster, with less memory consumption. But it would complicate the code a bit.)

Who will do these tests and optimizations for pkgutils-c? Not you, surely. NIH syndrome won't let you. At the same time, you'll test and profile your pkgutils6. Quite illogical, but that's what NIH syndrome is all about. You won't optimize someone else's code, but you will write your own, and then test/fix/profile/optimize it _anyway_.

Oh, did I just say "quite illogical"? No, it's damn moronic.

Anyhow, good luck with it,

--
Anton Vorontsov
email: cbou@mail.ru
backup email: ya-cbou@yandex.ru
irc://irc.freenode.net/bd2