Date: Tue, 6 Aug 2002 10:15:53 +0200 (CEST)
From: Martin Blapp <mb@imp.ch>
To: Alexander Kabaev <ak03@gte.com>
Cc: <openoffice@FreeBSD.ORG>, <jdp@FreeBSD.ORG>, <hackers@FreeBSD.ORG>
Subject: Help needed. Deadlock in rtld makes openoffice build hang again
Message-ID: <20020806095745.M58571-100000@levais.imp.ch>
In-Reply-To: <20020805110611.4292e3d5.ak03@gte.com>

Hi,

Out of 10 builds, about 6 hang, and I need to restart them. This is
not a usable solution for a package building cluster.

I end up with a process that consumes all CPU resources and hangs
waiting for a lock to be released, which never happens.

The problem is exit(). Replacing exit() with _exit() did not help.

[Switching to Process 4968, Thread 1]
0x28050784 in sigprocmask () from /usr/libexec/ld-elf.so.1
(gdb) bt
#0  0x28050784 in sigprocmask () from /usr/libexec/ld-elf.so.1
#1  0x2804f2d1 in xprintf () from /usr/libexec/ld-elf.so.1
#2  0x2804df78 in find_symdef () from /usr/libexec/ld-elf.so.1
#3  0x2838dbd8 in exit () from /usr/lib/libc_r.so.4
#4  0x08048c77 in _start ()

I tried adding the following lines, as proposed by Alexander Kabaev,
to libexec/rtld-elf/i386/lockdflt.c:

> Martin, try to add the loop below to the wlock_acquire function
> to make it look more like lock80386_acquire:

>       while (l->lock != 0)
>               ;       /* Spin */

Now it hangs there ...

[Switching to Process 93059, Thread 1]
0x28050923 in wlock_acquire (lock=0x28067000)
    at /usr/src/libexec/rtld-elf/i386/lockdflt.c:188
188             while (l->lock != 0)
(gdb) bt
#0  0x28050923 in wlock_acquire (lock=0x28067000)
    at /usr/src/libexec/rtld-elf/i386/lockdflt.c:188
#1  0x280505ee in wlock_acquire () at /usr/src/libexec/rtld-elf/rtld.c:202
#2  0x2804ee60 in rtld_exit () at /usr/src/libexec/rtld-elf/rtld.c:1428
#3  0x28390bd8 in exit () from /usr/lib/libc_r.so.4
#4  0x08048c77 in _start ()

(gdb) p l->lock
$2 = 2
(gdb) p tmp_oldsigmask
$3 = {__bits = {0, 0, 0, 0}}
(gdb) p fullsigmask
$4 = {__bits = {4294963463, 4294967295, 4294967295, 4294967295}}

I tried to do this:

(gdb) set l->lock=0
(gdb) c

And got this ...

/usr/libexec/ld-elf.so.1: Application locking error: 1 readers and 1 writers
in dynamic linker.  See DLLOCKINIT(3) in manual pages.

I'll now try to change it like this:

 static void
 wlock_acquire(void *lock)
 {
     Lock *l = (Lock *)lock;
     sigset_t tmp_oldsigmask;

     for ( ; ; ) {
         sigprocmask(SIG_BLOCK, &fullsigmask, &tmp_oldsigmask);
         if (cmpxchgl(0, WAFLAG, &l->lock) == 0)
             break;
         sigprocmask(SIG_SETMASK, &tmp_oldsigmask, NULL);
+        while (l->lock & WAFLAG)
+            ;       /* Spin */
     }
     oldsigmask = tmp_oldsigmask;
 }

Does anybody have any idea how to fix this issue?

Martin

Martin Blapp, <mb@imp.ch> <mbr@FreeBSD.org>
------------------------------------------------------------------
ImproWare AG, UNIXSP & ISP, Zurlindenstrasse 29, 4133 Pratteln, CH
Phone: +41 061 826 93 00: +41 61 826 93 01
PGP: <finger -l mbr@freebsd.org>
PGP Fingerprint: B434 53FC C87C FE7B 0A18 B84C 8686 EF22 D300 551E
------------------------------------------------------------------

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message