Hi guys,

over the last few days I have started working on the next major incarnation of pkgutils, version 6.

All of the package/archive and database handling has been moved into a library (atm called libpkgutils), which is used by our small trio of apps, pkg{info,add,rm}. As we had planned for a long time now, this rewrite is done in C instead of C++.

None of the libpkgutils API is final yet of course, so feel free to make comments or suggestions.

So far I've implemented functionality for almost all of pkginfo's features, and pkgrm. Feel free to test those, but please make a backup of your package database before trying pkgrm :)

The code is available in the pkgutils6 branch of the pkgutils.git repository:
http://crux.nu/gitweb/?p=tools/pkgutils.git;a=shortlog;h=pkgutils6

Regards,
Tilman

--
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?
Tilman Sauerbeck [2007-08-09 01:16]:
over the last few days I have started working on the next major incarnation of pkgutils, version 6.
I was bored earlier today and compared the run times and memory usage of "pkginfo -i" from pkgutils-c 1.5 and the tip of the pkgutils6 branch.

I only ran each binary once, as I'm mostly interested in memory usage here (memory usage should be constant no matter whether you're running with a hot or a cold cache, whereas run time typically won't be).

Here's the graphs from valgrind resp. massif:

pkgutils-c 1.5: http://crux.nu/~tilman/pkginfo_i.1.5.png
pkgutils6: http://crux.nu/~tilman/pkginfo_i.6.png

I'll leave the interpretation of these graphs to you ;)

NB: pkgutils6 is work-in-progress, so take these numbers with a grain of salt.

Regards,
Tilman
On Sat, Aug 11, 2007 at 05:10:12PM +0200, Tilman Sauerbeck wrote:
Tilman Sauerbeck [2007-08-09 01:16]:
over the last few days I have started working on the next major incarnation of pkgutils, version 6.
I was bored earlier today and compared the run times and memory usage of "pkginfo -i" from pkgutils-c 1.5 and the tip of the pkgutils6 branch.
I only ran each binary once, as I'm mostly interested in memory usage here (memory usage should be constant no matter whether you're running with a hot or a cold cache, whereas run time typically won't be).
Here's the graphs from valgrind resp. massif:
pkgutils-c 1.5: http://crux.nu/~tilman/pkginfo_i.1.5.png pkgutils6: http://crux.nu/~tilman/pkginfo_i.6.png
I'll leave the interpretation of these graphs to you ;)
I really doubt that you didn't know the reason:

Let's see what pkgutils6 is doing:

    static void
    list_installed_cb (PkgPackage *pkg, void *user_data)
    {
        printf ("%s %s-%s\n", pkg->name, pkg->version, pkg->release);
    }

    ...
    pkg_database_read_package_list (db, PKG_DATABASE_READ_NAMES_ONLY);
    pkg_database_foreach (db, list_installed_cb, NULL);
    pkg_database_unref (db);
    ...

I.e. you're only reading the package names into memory. So you're using ad-hoc methods, and yes, they're faster and eat less memory. pkgutils-c's database functions, on the other hand, are written w/o premature optimizations:

    pkg_init_db();
    list_for_each(_pkg, &pkg_db) {
        struct pkg_desc *pkg = _pkg->data;
        printf("%s %s\n", pkg->name, pkg->version);
    }
    pkg_free_db();

That means pkgutils-c reads the whole database, including the file listings, even where that is unnecessary (well, only in the pkginfo -i case, the only one I can imagine).

Is it easy to optimize (if you really wanted to)? Yes, it's a patch of less than 30 lines. I won't believe that you couldn't have found a way to do such an optimization in pkgutils-c if you had wanted to. But... thanks for the idea, maybe I'll implement this specific optimization myself.

Now, let's compare pkgrm. pkgadd would be more interesting, but you haven't done that one yet. pkgadd/pkgrm need the whole db.

    pkgutils6/src# time ./pkgrm gtk
    real 0m0.979s
    user 0m0.867s
    sys 0m0.043s

    # time pkgrm gtk
    real 0m0.440s
    user 0m0.153s
    sys 0m0.080s

That's a warm start. pkgutils-c runs about twice as fast. And looking at pkgutils6's code, I can tell that pkgutils-c's lead will grow as the db and package sizes grow, while pkgutils6 will get slower and slower in comparison.

At the same time pkgutils-c still eats a bit more memory. Why? Because in addition to the db itself, pkgutils-c uses temporary storage for sorting, so that it can use faster algorithms. Can I make pkgutils-c faster with less memory consumption? It depends. In-place list sorting is slower, but maybe there would be some win because of less memory management overhead. Not sure, too lazy to calculate.

(And btw, keeping the db in an rb-tree would definitely be faster, with less memory consumption. But it would complicate the code a bit.)

Who will do these tests and optimizations for pkgutils-c? Not you, surely. NIH syndrome won't let you do this. At the same time, you'll test and profile your pkgutils6. Quite illogical, but that's what NIH syndrome is all about. You won't optimize someone else's code, but you will write your own, and then test/fix/profile/optimize it _anyway_.

Oh, I've just said "quite illogical"? No, it's damn moronic.

Anyhow, good luck with it,

--
Anton Vorontsov
email: cbou@mail.ru
backup email: ya-cbou@yandex.ru
irc://irc.freenode.net/bd2
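The names-only versus whole-database difference discussed in this message can be illustrated with a small, self-contained sketch. It is not code from pkgutils6 or pkgutils-c: `scan_db`, `enum read_mode` and `struct read_stats` are hypothetical names, and the record layout (a name line, a version line, any number of file lines, then a blank line) is an assumption made for the example. The function only tallies how many bytes a reader would have to keep in each mode.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag, loosely modelled on the idea behind
 * PKG_DATABASE_READ_NAMES_ONLY; not the real libpkgutils API. */
enum read_mode { READ_ALL, READ_NAMES_ONLY };

struct read_stats {
    size_t packages;    /* number of records seen */
    size_t bytes_kept;  /* bytes a reader would have to keep in memory */
};

/*
 * Walk a database image held in a NUL-terminated buffer.
 * Assumed layout per record: name line, version line,
 * zero or more file lines, then one blank line.
 */
struct read_stats scan_db(const char *db, enum read_mode mode)
{
    struct read_stats st = { 0, 0 };
    size_t line_no = 0;  /* index of the current line within its record */

    assert(db != NULL);

    while (*db) {
        size_t len = strcspn(db, "\n");  /* length of the current line */

        if (len == 0) {
            /* A blank line terminates the current record. */
            line_no = 0;
        } else {
            if (line_no == 0)
                st.packages++;
            /* Name and version are always kept; file lines only
             * when the caller asked for the full database. */
            if (line_no < 2 || mode == READ_ALL)
                st.bytes_kept += len + 1;  /* +1 for the terminating NUL */
            line_no++;
        }

        db += len;
        if (*db == '\n')
            db++;
    }
    return st;
}
```

Calling `scan_db` twice on the same buffer, once per mode, gives the same package count but a smaller `bytes_kept` for `READ_NAMES_ONLY`; the gap grows with the number of file lines per package, which is why the two tools compare so differently for "pkginfo -i" and so similarly for operations that need the file listings anyway.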
Anton Vorontsov [2007-08-12 02:35]:
I really doubt that you didn't know the reason:
Yes, I do know the reason. Hint: don't start your replies by implying ignorance. It doesn't really increase my willingness to give a thorough answer.
I.e. you're only reading the package names into memory. So you're using ad-hoc methods, and yes, they're faster and eat less memory. pkgutils-c's database functions, on the other hand, are written w/o premature optimizations:
I remember that you accused me of wasting memory. Guess in the pkginfo -i case it's you instead who's eating RAM alive.

...and that's the whole point of my previous mail.

Tilman
On Sun, Aug 12, 2007 at 10:34:16AM +0200, Tilman Sauerbeck wrote:
Anton Vorontsov [2007-08-12 02:35]:
I.e. you're only reading the package names into memory. So you're using ad-hoc methods, and yes, they're faster and eat less memory. pkgutils-c's database functions, on the other hand, are written w/o premature optimizations:
I remember that you accused me of wasting memory. Guess in the pkginfo -i case it's you instead who's eating RAM alive.
You're speaking about completely different things, and you _know_ it. Doing

    struct pkg {
        char name[NAMESIZE];
        char version[VERSIONSIZE];
        ...
    };

is just ugly and a regression against the _original_ pkgutils: you're adding constraints and wasting memory w/o any hope of easily optimizing it. And I told you about _that_. You seem to have "fixed" it afterwards:

    struct pkg {
        ...
        char version[VERSIONSIZE];
        char name[];
    };

Heh. You've "fixed" name, but in the current code you can't easily fix version's constraint (or any other), and well... you didn't.

On the other hand, not reading the file listings on -i is a light and easy optimization, not some memory management issue that you've made ugly from the very start. And I can hardly imagine what similarities you've found between these two issues, or why you've decided to use the silly "-i" example to show that pkgutils-c "ZOMG wastes" memory.

Unrelated to this email: please remove all those constraints, implement all the features, test, fix bugs and then compare. Will pkgutils6 be faster? With less memory consumption? Okay, then I'll have a reference point to optimize my code against. Until then, _you_ have two reference points, pkgutils-c and the C++ish pkgutils, while I have none today. And I know it's easier to have one than not (I was using the C++ish pkgutils for that).

Good luck,

--
Anton Vorontsov
email: cbou@mail.ru
backup email: ya-cbou@yandex.ru
irc://irc.freenode.net/bd2
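The trade-off between fixed-size string fields and a flexible array member can be made concrete with a short sketch. A C struct may contain only one flexible array member, which is why making `name` flexible leaves `version` bounded. The layout below is one possible way around that, assuming both strings are packed back to back into a single flexible array. It is not taken from either code base; `struct pkg_fixed`, `struct pkg_flex`, the `pkg_flex_*` helpers and the values of `NAMESIZE` and `VERSIONSIZE` are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical limits; the real values are not given in the thread. */
#define NAMESIZE    64
#define VERSIONSIZE 32

/* Fixed-size fields: every package costs NAMESIZE + VERSIONSIZE bytes,
 * and longer strings simply do not fit. */
struct pkg_fixed {
    char name[NAMESIZE];
    char version[VERSIONSIZE];
};

/* One flexible array member holding "name\0version\0". */
struct pkg_flex {
    size_t version_off;  /* offset of the version string inside data[] */
    char data[];         /* flexible array member (C99) */
};

/* Bytes needed for one pkg_flex holding the given strings. */
size_t pkg_flex_size(const char *name, const char *version)
{
    return sizeof(struct pkg_flex) + strlen(name) + 1 + strlen(version) + 1;
}

/* Allocate header and both strings in a single malloc() call.
 * Returns NULL on allocation failure; release with free(). */
struct pkg_flex *pkg_flex_new(const char *name, const char *version)
{
    size_t nlen, vlen;
    struct pkg_flex *p;

    assert(name != NULL && version != NULL);

    nlen = strlen(name) + 1;     /* include the terminating NUL */
    vlen = strlen(version) + 1;

    p = malloc(sizeof(*p) + nlen + vlen);
    if (p == NULL)
        return NULL;

    memcpy(p->data, name, nlen);
    memcpy(p->data + nlen, version, vlen);
    p->version_off = nlen;
    return p;
}

const char *pkg_flex_name(const struct pkg_flex *p)
{
    return p->data;
}

const char *pkg_flex_version(const struct pkg_flex *p)
{
    return p->data + p->version_off;
}
```

With this layout neither string has a compile-time length limit, and a short entry such as "gtk" / "1.2.10-1" takes far less space than `sizeof(struct pkg_fixed)`. The cost is an extra offset field and accessor functions instead of direct member access, so whether it is worth it depends on how the rest of the code uses the struct.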