Hello list,

I run into a segfault whenever I run pkgadd (pkgutils 5.20). Am I the only one on the list with this problem?

Now that Per has retired from the Crux project, who is going to support these programs?

This is the main server at the company I work for, and the hardware is running fine (I don't believe this is a hardware problem). The machine has an uptime of 116 days and counting, with all services running flawlessly since day 1.

I compiled pkgadd with -ggdb and with the NDEBUG define disabled, then ran it under gdb, and this is what I get:

[root@centauris pkgutils-5.20]# gdb ./pkgadd
GNU gdb 6.3
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) run -u /usr/ports/pkgs/libgmp#4.1.4-3.pkg.tar.gz
Starting program: /tmp/pkgutils-5.20/pkgadd -u /usr/ports/pkgs/libgmp#4.1.4-3.pkg.tar.gz
145 packages found in database
Configuration:
  ^etc/.*$ 0
  ^var/log/.*$ 0
  ^etc/mail/cf/.*$ 1
  ^etc/ports/drivers/.*$ 1
  ^etc/X11/.*$ 1
  ^etc/rc.*$ 1
  ^etc/rc\.local$ 0
  ^etc/rc\.modules$ 0
  ^etc/rc\.conf$ 0
  ^etc/rc\.d/net$ 0
Conflicts phase 1 (conflicts in database):
  usr/
  usr/include/
  usr/lib/
Conflicts phase 2 (conflicts in filesystem added):
  usr/
  usr/include/
  usr/include/gmp.h
  usr/include/gmpxx.h
  usr/lib/
  usr/lib/libgmp.a
  usr/lib/libgmp.la
  usr/lib/libgmp.so
  usr/lib/libgmp.so.3
  usr/lib/libgmp.so.3.3.3
  usr/lib/libgmpxx.a
  usr/lib/libgmpxx.la
  usr/lib/libgmpxx.so
  usr/lib/libgmpxx.so.3
  usr/lib/libgmpxx.so.3.0.5
Conflicts phase 3 (directories excluded):
  usr/include/gmp.h
  usr/include/gmpxx.h
  usr/lib/libgmp.a
  usr/lib/libgmp.la
  usr/lib/libgmp.so
  usr/lib/libgmp.so.3
  usr/lib/libgmp.so.3.3.3
  usr/lib/libgmpxx.a
  usr/lib/libgmpxx.la
  usr/lib/libgmpxx.so
  usr/lib/libgmpxx.so.3
  usr/lib/libgmpxx.so.3.0.5
Conflicts phase 4 (files already owned by this package excluded):
Keep list:
Removing package phase 1 (all files in package):
  usr/
  usr/include/
  usr/include/gmp.h
  usr/include/gmpxx.h
  usr/lib/
  usr/lib/libgmp.a
  usr/lib/libgmp.la
  usr/lib/libgmp.so
  usr/lib/libgmp.so.3
  usr/lib/libgmp.so.3.3.3
  usr/lib/libgmpxx.a
  usr/lib/libgmpxx.la
  usr/lib/libgmpxx.so
  usr/lib/libgmpxx.so.3
  usr/lib/libgmpxx.so.3.0.5
Removing package phase 2 (files that is in the keep list excluded):
  usr/
  usr/include/
  usr/include/gmp.h
  usr/include/gmpxx.h
  usr/lib/
  usr/lib/libgmp.a
  usr/lib/libgmp.la
  usr/lib/libgmp.so
  usr/lib/libgmp.so.3
  usr/lib/libgmp.so.3.3.3
  usr/lib/libgmpxx.a
  usr/lib/libgmpxx.la
  usr/lib/libgmpxx.so
  usr/lib/libgmpxx.so.3
  usr/lib/libgmpxx.so.3.0.5
Removing package phase 3 (files that still have references excluded):
  usr/include/gmp.h
  usr/include/gmpxx.h
  usr/lib/libgmp.a
  usr/lib/libgmp.la
  usr/lib/libgmp.so
  usr/lib/libgmp.so.3
  usr/lib/libgmp.so.3.3.3
  usr/lib/libgmpxx.a
  usr/lib/libgmpxx.la
  usr/lib/libgmpxx.so
  usr/lib/libgmpxx.so.3
  usr/lib/libgmpxx.so.3.0.5
145 packages written to database

Program received signal SIGSEGV, Segmentation fault.
0x080d58d0 in _IO_un_link ()
(gdb) bt
#0 0x080d58d0 in _IO_un_link ()
#1 0x080cf24f in fclose ()
#2 0x0805e0b8 in destroy ()
#3 0x0805c1fa in tar_close (t=0x818cc30) at handle.c:118
#4 0x0804dd0e in pkgutil::pkg_install (this=0x816cb00, filename=@0xbfb49f50, keep_list=@0xbfb49e70) at pkgutil.cc:425
#5 0x080568d6 in pkgadd::run (this=0x816cb00, argc=-1078681936, argv=0xbfb4a2c4) at pkgadd.cc:104
#6 0x08048687 in main (argc=3, argv=0xbfb4a2c4) at memory:285

Can anybody help me figure this out?

Regards,
--
Alan Mizrahi

Hi Alan,

Your gdb-backtrace says it dies at handle.c:118

i = (*(t->type->closefunc))(t->fd);

#0 0x080d58d0 in _IO_un_link ()
#1 0x080cf24f in fclose ()
#2 0x0805e0b8 in destroy ()
#3 0x0805c1fa in tar_close (t=0x818cc30) at handle.c:118

"closefunc" is a pointer to zlib's "gzclose" function (defined in libtar/libtar.c:97). Have you (re-)compiled your zlib with some strange optimization options?

bye, danm
--
Daniel Mueller
Berlin, Germany
OpenPGP: 1024D/E4F4383A

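For readers unfamiliar with the pattern Daniel describes, here is a minimal sketch of closing a handle through a function pointer stored in a per-type table. The names `archive`, `archive_type`, `close_stdio` and `archive_close` are hypothetical stand-ins, not libtar's real definitions; in pkgadd the pointer would refer to gzclose rather than to a stdio wrapper.

```c
#include <stdio.h>

/* Hypothetical, simplified stand-ins; not libtar's actual types. */
typedef int (*closefunc_t)(void *handle);

struct archive_type {
    closefunc_t closefunc;   /* routine that closes this kind of handle */
};

struct archive {
    const struct archive_type *type;
    void *handle;            /* opaque handle passed to closefunc */
};

/* One possible close routine: wraps fclose() for a FILE * handle. */
static int close_stdio(void *handle)
{
    return fclose((FILE *)handle);
}

/* Same shape as the call at handle.c:118: dispatch through the pointer. */
int archive_close(struct archive *a)
{
    return (*(a->type->closefunc))(a->handle);
}
```

Because the caller never names the close routine, a crash inside it shows up in the backtrace below the dispatching function, which matches frames #1 to #3 quoted above.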
On Wednesday, 8 March 2006 7:59 pm, Daniel Mueller wrote:
Hi Alan,
Your gdb-backtrace says it dies at handle.c:118
i = (*(t->type->closefunc))(t->fd);
#0 0x080d58d0 in _IO_un_link ()
#1 0x080cf24f in fclose ()
#2 0x0805e0b8 in destroy ()
#3 0x0805c1fa in tar_close (t=0x818cc30) at handle.c:118
"closefunc" is a pointer to zlib's "gzclose" function (defined in libtar/libtar.c:97). Have you (re-)compiled your zlib with some strange optimization options?
bye, danm

I didn't use any strange flags. I have always used "-O2 -march=i686 -pipe" as my CFLAGS and CXXFLAGS, and it has always worked fine (this is a Pentium II).

Anyway, I rebuilt libz.a with -O2 -DDEBUG -ggdb, then built pkgutils again (with -O2 -ggdb), and now I get this backtrace:

Program received signal SIGSEGV, Segmentation fault.
0x080d5ee0 in _IO_un_link ()
(gdb) bt
#0 0x080d5ee0 in _IO_un_link ()
#1 0x080cf85f in fclose ()
#2 0x0805e0b8 in destroy (s=0x818dc30) at gzio.c:375
#3 0x0805c1fa in tar_close (t=0x82508c0) at handle.c:118
#4 0x0804dd0e in pkgutil::pkg_install (this=0x816db00, filename=@0xbff4c8e0, keep_list=@0xbff4c800) at pkgutil.cc:425
#5 0x080568d6 in pkgadd::run (this=0x816db00, argc=-1074476992, argv=0xbff4cc54) at pkgadd.cc:104
#6 0x08048687 in main (argc=3, argv=0xbff4cc54) at memory:285

This is gzio.c:

368:    if (s->stream.state != NULL) {
369:        if (s->mode == 'w') {
xxx:#ifdef NO_GZCOMPRESS
370:            err = Z_STREAM_ERROR;
xxx:#else
370:            err = deflateEnd(&(s->stream));
xxx:#endif
371:        } else if (s->mode == 'r') {
372:            err = inflateEnd(&(s->stream));
373:        }
374:    }
375:    if (s->file != NULL && fclose(s->file)) {
xxx:#ifdef ESPIPE
xxx:        if (errno != ESPIPE) /* fclose is broken for pipes in HP/UX */
xxx:#endif
xxx:            err = Z_ERRNO;
xxx:    }

I guess the segfault is at fclose(s->file), but why is this happening?

After this test, I installed the stock zlib from crux 2.1 and rebuilt pkgutils-5.20, with the same result.

Then I tried installing the stock pkgutils from crux 2.1, and still got the same result. This makes no sense.

The next thing I tried was running pkgadd under strace. I got this output:

munmap(0xb7b95000, 131072) = 0
rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
stat64("/etc/openldap/ldap.conf", {st_mode=S_IFREG|0644, st_size=1043, ...}) = 0
geteuid32() = 0
stat64("/etc/openldap/ldap.conf", {st_mode=S_IFREG|0644, st_size=1043, ...}) = 0
geteuid32() = 0
time(NULL) = 1141878180
write(5, "0\201\234\2\1\30c\201\226\4\27ou=group,dc=bwv2,dc=c"..., 159) = 159
select(1024, [5], [], NULL, NULL) = 1 (in [5])
read(5, "0H\2\1\30dC\4", 8) = 8
read(5, "\37cn=root,ou=Group,dc=bwv2,dc=com"..., 66) = 66
select(1024, [5], [], NULL, NULL) = 1 (in [5])
read(5, "0\f\2\1\30e\7\n", 8) = 8
read(5, "\1\0\4\0\4\0", 6) = 6
time(NULL) = 1141878180
time([1141878180]) = 1141878180
rt_sigaction(SIGPIPE, {SIG_DFL}, NULL, 8) = 0
geteuid32() = 0
lchown32("/usr/include/gmpxx.h", 0, 0) = 0
utime("/usr/include/gmpxx.h", [2006/03/08-16:55:24, 2006/03/08-16:55:24]) = 0
chmod("/usr/include/gmpxx.h", 0100644) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

As you can see, I am using nss-ldap. So the next thing I did was comment out ldap in my nsswitch.conf, and it worked! pkgadd doesn't segfault anymore!

This is very strange, since I run many programs that call the nss functions on ldap, and they always work. By many I mean exim, sshd, samba, mysql, apache, etc. I even tried different nss-ldap versions, with the same results.

If this wasn't strange enough, read this: if I put ldap back in my nsswitch.conf and run nscd, everything works fine!

So my guess is that this is a glibc bug. What do you think?

I'll take a deeper look when I have more time, but for now I am satisfied (and puzzled) with the results.

Thanks for reading.

Regards,
--
Alan Mizrahi

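One way to narrow this down would be a small standalone program that does the same kind of work outside pkgadd: resolve a user and a group name through NSS, then open and close a file with stdio. The following is only a sketch of such a test. The function name `lookup_then_close` is hypothetical, and whether this sequence is enough to trigger the crash is an assumption, not something established in this thread.

```c
#include <grp.h>
#include <pwd.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Hypothetical reproduction helper (not part of pkgutils or libtar).
 * Resolves a user and a group name through whatever sources
 * nsswitch.conf lists, then opens and closes a file with stdio.
 * Returns 0 if the file was opened and closed cleanly, -1 otherwise.
 */
int lookup_then_close(const char *user, const char *group, const char *path)
{
    struct passwd *pw = getpwnam(user);   /* may go through nss_ldap */
    struct group *gr = getgrnam(group);   /* may go through nss_ldap */

    printf("user %s -> %s\n", user, pw ? "found" : "not found");
    printf("group %s -> %s\n", group, gr ? "found" : "not found");

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    return fclose(fp) == 0 ? 0 : -1;
}
```

Calling it from a small main, for example as lookup_then_close("root", "root", "/etc/passwd"), with ldap enabled and disabled in nsswitch.conf and with nscd running and stopped, would show whether the crash depends on pkgadd itself.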
Hi!

Maybe this is totally off, but I remember that I sometimes got a segmentation fault with pkgadd when files inside the package were owned by a user that was not present on the system (e.g. 1017/1017 appeared in .footprint). Please check the contents of the *.pkg.tar.gz file.

I also use nss_ldap, which is responsible for the number/username translation.

Pavel Samek

On Thursday 09 March 2006 05:30, Alan Mizrahi wrote:
If this wasn't strange enough, read this: If I put back ldap in my nsswitch.conf, and run nscd, everything works fine!
What the..? o_O
So my guess is that this is a glibc bug, what do you think?
It could be nss_ldap as well.

bye, danm
--
Daniel Mueller
Berlin, Germany
OpenPGP: 1024D/E4F4383A