From owner-freebsd-hackers@FreeBSD.ORG Sat Nov 5 07:02:47 2005 Return-Path: X-Original-To: freebsd-hackers@freebsd.org Delivered-To: freebsd-hackers@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 7D93816A41F; Sat, 5 Nov 2005 07:02:47 +0000 (GMT) (envelope-from bra@fsn.hu) Received: from people.fsn.hu (people.fsn.hu [195.228.252.137]) by mx1.FreeBSD.org (Postfix) with ESMTP id D30BD43D45; Sat, 5 Nov 2005 07:02:46 +0000 (GMT) (envelope-from bra@fsn.hu) Received: from localhost (localhost [127.0.0.1]) by people.fsn.hu (Postfix) with ESMTP id 900A784420; Fri, 4 Nov 2005 17:15:32 +0100 (CET) Received: from people.fsn.hu ([127.0.0.1]) by localhost (people.fsn.hu [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 55893-01-3; Fri, 4 Nov 2005 17:15:19 +0100 (CET) Received: from [127.0.0.1] (unknown [192.168.2.3]) by people.fsn.hu (Postfix) with ESMTP id 4AFD58441E; Fri, 4 Nov 2005 17:15:19 +0100 (CET) Message-ID: <436B8918.9060105@fsn.hu> Date: Fri, 04 Nov 2005 17:15:20 +0100 From: Attila Nagy User-Agent: Mozilla Thunderbird 1.0.7 (X11/20050930) X-Accept-Language: en-us, en MIME-Version: 1.0 To: Robert Watson References: <436A868F.4010606@fsn.hu> <20051104095129.M9692@fledge.watson.org> In-Reply-To: <20051104095129.M9692@fledge.watson.org> Content-Type: text/plain; charset=ISO-8859-2; format=flowed Content-Transfer-Encoding: 7bit X-Virus-Scanned: amavisd-new at fsn.hu Cc: freebsd-hackers@freebsd.org, freebsd-amd64@freebsd.org Subject: Re: Reproduceable freeze with quotas enabled X-BeenThere: freebsd-hackers@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Technical Discussions relating to FreeBSD List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 05 Nov 2005 07:02:47 -0000 Hello, Robert Watson wrote: > > On Thu, 3 Nov 2005, Attila Nagy wrote: > >> Hello, >> >> I have an "easily" (at least to me) reproduceable freeze with both >> 6-STABLE and 7-CURRENT on an amd64 SMP machine. >> >> What I do is simply copy a lot of directories, files and symlinks with >> different uids from another machine to this one, using rsync. > > > This is probably a vnode lock leak or deadlock. As it looks like you > have both the ability to get into the debugger and also a serial or > other remote console, the output of: > > show allpcpu > alltrace > show lockedvnods > show allocks > > would be helpful. There are also "options DEBUG_LOCKS" which extends > the debugging information available via "show lockedvnods" with > information about where the lock was acquired, which can be quite helpful. > > Robert N M Watson > >> >> With quotas enabled it freezes constantly, without quotas it doesn't. >> >> I have MP_WATCHDOG enabled (the kernel config is the default >> GENERIC+SMP with debuggers and MP_WATCHDOG only) so after the freeze >> it drops me into the debugger in which I can see the following: >> >> FreeBSD/amd64 (helen) (ttyv0) >> >> login: Watchdog timer: 3 >> Watchdog timer: 2 >> Watchdog timer: 1 >> Watchdog timer: 0 >> Watchdog firing! >> NMI ... going to debugger >> [thread pid 157 tid 100143 ] >> Stopped at vinactive+0x83: jz 0x365245a9 >> db> bt >> Tracing pid 157 tid 100143 td 0xffffff007b9c14c0 >> vinactive() at vinactive+0x83 >> vput() at vput+0x1cf >> qsync() at qsync+0x260 >> ffs_sync() at ffs_sync+0x159 >> sync_fsync() at sync_fsync+0x191 >> VOP_FSYNC_APV() at VOP_FSYNC_APV+0x3d >> sched_sync() at sched_sync+0x428 >> fork_exit() at fork_exit+0x86 >> fork_trampoline() at fork_trampoline+0xe >> --- trap 0, rip = 0, rsp = 0xffffffffb2029d00, rbp = 0 --- >> db> ps >> 157 ffffff007b9489c0 0 0 0 0000204 [CPU 0] syncer This is an HP machine, so I can access it via it's ILO port. But it mangles larger outputs, so alltrace will be truncated. I will find a way to get a serial console attached to it, but until that... I've recompiled the kernel with "options DEBUG_LOCKS", rebooted and started the rsync process. This is what I see now: login: Watchdog timer: 3 Watchdog timer: 2 Watchdog timer: 1 Watchdog timer: 0 Watchdog firing! NMI ... going to debugger [thread pid 14 tid 100010 ] db> bt Tracing pid 14 tid 100010 td 0xffffff007b94ebe0 acpi_cpu_c1() at acpi_cpu_c1+0x6 acpi_cpu_idle() at acpi_cpu_idle+0x174 idle_proc() at idle_proc+0x14e fork_exit() at fork_exit+0x86 fork_trampoline() at fork_trampoline+0xe --- trap 0, rip = 0, rsp = 0xffffffffb1e81d00, rbp = 0 --- db> ps [...] 14 ffffff007b97b680 0 0 0 000020c [CPU 0] idle: cpu0 [...] db> show allpcpu cpuid = 0 curthread = 0xffffff007b94ebe0: pid 14 "idle: cpu0" curpcb = 0xffffffffb1e81d10 fpcurthread = none idlethread = 0xffffff007b94ebe0: pid 14 "idle: cpu0" cpuid = 1 curthread = 0xffffff007b94e000: pid 13 "idle: cpu1" curpcb = 0xffffffffb1e68d10 fpcurthread = none idlethread = 0xffffff007b94e000: pid 13 "idle: cpu1" cpuid = 2 curthread = 0xffffff007b9014c0: pid 157 "syncer" curpcb = 0xffffffffb20e8d10 fpcurthread = none idlethread = 0xffffff007b94dbe0: pid 12 "idle: cpu2" cpuid = 3 curthread = 0xffffff007b94d980: pid 11 "mp_watchdog cpu 3" curpcb = 0xffffffffb1e5ed10 fpcurthread = none idlethread = 0xffffff007b94d980: pid 11 "mp_watchdog cpu 3" db> show lockedvnods Locked vnodes 0xffffff0061c05470: tag syncer, type VNON usecount 1, writecount 0, refcount 2 mountedhere 0 flags () lock type syncer: EXCL (count 1) by thread 0xffffff007b9014c0 (pid 157) 0xffffff00418086a8: tag ufs, type VDIR usecount 1, writecount 0, refcount 90 mountedhere 0 flags () lock type ufs: EXCL (count 1) by thread 0xffffff005d435980 (pid 1180) ino 18983427, on dev da1s1d 0xffffff0015a1b6a8: tag ufs, type VLNK usecount 0, writecount 0, refcount 1 mountedhere 0 flags (VI_DOOMED) lock type ufs: EXCL (count 1) by thread 0xffffff005d435980 (pid 1180) ino 19058341, on dev da1s1d WARNING this is not complete! db> alltrace --- syscall (93, FreeBSD ELF64, select), rip = 0x800d4c14c, rsp = 0x7fffffffc2d8 , rbp = 0x1 --- Tracing command rsync pid 1180 tid 100267 td 0xffffff005d435980 sched_switch() at sched_switch+0x11f mi_switch() at mi_switch+0x14c turnstile_wait() at turnstile_wait+0x272 _mtx_lock_sleep() at _mtx_lock_sleep+0xa6 msleep() at msleep+0x2a6 acquire() at acquire+0x76 debuglockmgr() at debuglockmgr+0x500 vop_stdlock() at vop_stdlock+0x3e VOP_LOCK_APV() at VOP_LOCK_APV+0x49 VOP_LOCK_APV() at VOP_LOCK_APV+0x49 debug_vn_lock() at debug_vn_lock+0x8f dqsync() at dqsync+0x95 dqrele() at dqrele+0x68 ufs_reclaim() at ufs_reclaim+0x100 VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0x3d vgonel() at vgonel+0x12f vnlru_free() at vnlru_free+0x29f getnewvnode() at getnewvnode+0x2ca ffs_vget() at ffs_vget+0xe9 ffs_valloc() at ffs_valloc+0x109 ufs_makeinode() at ufs_makeinode+0x6e ufs_symlink() at ufs_symlink+0x3f VOP_SYMLINK_APV() at VOP_SYMLINK_APV+0x65 kern_symlink() at kern_symlink+0x3d3 syscall() at syscall+0x642 Xfast_syscall() at Xfast_syscall+0xa8 --- syscall (57, FreeBSD ELF64, symlink), rip = 0x8006b93ec, rsp = 0x7fffffffb7d 8, rbp = 0xa64388 --- Tracing command sh pid 1178 tid 100202 td 0xffffff00601f8be0 sched_switch() at sched_switch+0x11f mi_switch() at mi_switch+0x14c sleepq_wait_sig() at sleepq_wait_sig+0xe cv_wait_sig() at cv_wait_sig+0x17f kern_select() at kern_select+0xbd4 select() at select+0x3e syscall() at syscall+0x642 Xfast_syscall() at Xfast_syscall+0xa8 --- syscall (93, FreeBSD ELF64, select), rip = 0x4165ac, rsp = 0x7fffffffe948, r bp = 0x7fffffffe970 --- Tracing command adjkerntz pid 271 tid 100195 td 0xffffff0062431000 sched_switch() at sched_switch+0x11f mi_switch() at mi_switch+0x14c sleepq_wait_sig() at sleepq_wait_sig+0xe msleep() at msleep+0x21b kern_sigsuspend() at kern_sigsuspend+0xb1 sigsuspend() at sigsuspend+0x40 syscall() at syscall+0x642 Xfast_syscall() at Xfast_syscall+0xa8 --- syscall (341, FreeBSD ELF64, sigsuspend), rip = 0x80067a18c, rsp = 0x7ffffff fed28, rbp = 0xfffffffffffff1f0 --- (null)() at 0x80067a18c *** error reading from address fffffffffffff1f8 *** I will try to provide a full alltrace output tomorrow. Thanks, -- Attila Nagy e-mail: Attila.Nagy@fsn.hu Adopt a directory on our free software phone: +3630 306 6758 server! http://www.fsn.hu/?f=brick