Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 04 Nov 2005 17:15:20 +0100
From:      Attila Nagy <bra@fsn.hu>
To:        Robert Watson <rwatson@FreeBSD.org>
Cc:        freebsd-hackers@freebsd.org, freebsd-amd64@freebsd.org
Subject:   Re: Reproduceable freeze with quotas enabled
Message-ID:  <436B8918.9060105@fsn.hu>
In-Reply-To: <20051104095129.M9692@fledge.watson.org>
References:  <436A868F.4010606@fsn.hu> <20051104095129.M9692@fledge.watson.org>

next in thread | previous in thread | raw e-mail | index | archive | help
Hello,

Robert Watson wrote:
> 
> On Thu, 3 Nov 2005, Attila Nagy wrote:
> 
>> Hello,
>>
>> I have an "easily" (at least to me) reproduceable freeze with both 
>> 6-STABLE and 7-CURRENT on an amd64 SMP machine.
>>
>> What I do is simply copy a lot of directories, files and symlinks with 
>> different uids from another machine to this one, using rsync.
> 
> 
> This is probably a vnode lock leak or deadlock.  As it looks like you 
> have both the ability to get into the debugger and also a serial or 
> other remote console, the output of:
> 
>   show allpcpu
>   alltrace
>   show lockedvnods
>   show allocks
> 
> would be helpful.  There are also "options DEBUG_LOCKS" which extends 
> the debugging information available via "show lockedvnods" with 
> information about where the lock was acquired, which can be quite helpful.
> 
> Robert N M Watson
> 
>>
>> With quotas enabled it freezes constantly, without quotas it doesn't.
>>
>> I have MP_WATCHDOG enabled (the kernel config is the default 
>> GENERIC+SMP with debuggers and MP_WATCHDOG only) so after the freeze 
>> it drops me into the debugger in which I can see the following:
>>
>> FreeBSD/amd64 (helen) (ttyv0)
>>
>> login: Watchdog timer: 3
>> Watchdog timer: 2
>> Watchdog timer: 1
>> Watchdog timer: 0
>> Watchdog firing!
>> NMI ... going to debugger
>> [thread pid 157 tid 100143 ]
>> Stopped at      vinactive+0x83: jz      0x365245a9
>> db> bt
>> Tracing pid 157 tid 100143 td 0xffffff007b9c14c0
>> vinactive() at vinactive+0x83
>> vput() at vput+0x1cf
>> qsync() at qsync+0x260
>> ffs_sync() at ffs_sync+0x159
>> sync_fsync() at sync_fsync+0x191
>> VOP_FSYNC_APV() at VOP_FSYNC_APV+0x3d
>> sched_sync() at sched_sync+0x428
>> fork_exit() at fork_exit+0x86
>> fork_trampoline() at fork_trampoline+0xe
>> --- trap 0, rip = 0, rsp = 0xffffffffb2029d00, rbp = 0 ---
>> db> ps
>>  157 ffffff007b9489c0    0     0     0 0000204 [CPU 0] syncer
This is an HP machine, so I can access it via it's ILO port. But it 
mangles larger outputs, so alltrace will be truncated.

I will find a way to get a serial console attached to it, but until that...

I've recompiled the kernel with "options DEBUG_LOCKS", rebooted and 
started the rsync process.

This is what I see now:
login: Watchdog timer: 3
Watchdog timer: 2
Watchdog timer: 1
Watchdog timer: 0
Watchdog firing!
NMI ... going to debugger
[thread pid 14 tid 100010 ]
db> bt
Tracing pid 14 tid 100010 td 0xffffff007b94ebe0
acpi_cpu_c1() at acpi_cpu_c1+0x6
acpi_cpu_idle() at acpi_cpu_idle+0x174
idle_proc() at idle_proc+0x14e
fork_exit() at fork_exit+0x86
fork_trampoline() at fork_trampoline+0xe
--- trap 0, rip = 0, rsp = 0xffffffffb1e81d00, rbp = 0 ---
db> ps
[...]
    14 ffffff007b97b680    0     0     0 000020c [CPU 0] idle: cpu0
[...]
db> show allpcpu
cpuid        = 0
curthread    = 0xffffff007b94ebe0: pid 14 "idle: cpu0"
curpcb       = 0xffffffffb1e81d10
fpcurthread  = none
idlethread   = 0xffffff007b94ebe0: pid 14 "idle: cpu0"

cpuid        = 1
curthread    = 0xffffff007b94e000: pid 13 "idle: cpu1"
curpcb       = 0xffffffffb1e68d10
fpcurthread  = none
idlethread   = 0xffffff007b94e000: pid 13 "idle: cpu1"

cpuid        = 2
curthread    = 0xffffff007b9014c0: pid 157 "syncer"
curpcb       = 0xffffffffb20e8d10
fpcurthread  = none
idlethread   = 0xffffff007b94dbe0: pid 12 "idle: cpu2"

cpuid        = 3
curthread    = 0xffffff007b94d980: pid 11 "mp_watchdog cpu 3"
curpcb       = 0xffffffffb1e5ed10
fpcurthread  = none
idlethread   = 0xffffff007b94d980: pid 11 "mp_watchdog cpu 3"

db> show lockedvnods
Locked vnodes

0xffffff0061c05470: tag syncer, type VNON
     usecount 1, writecount 0, refcount 2 mountedhere 0
     flags ()
      lock type syncer: EXCL (count 1) by thread 0xffffff007b9014c0 (pid 
157)

0xffffff00418086a8: tag ufs, type VDIR
     usecount 1, writecount 0, refcount 90 mountedhere 0
     flags ()
      lock type ufs: EXCL (count 1) by thread 0xffffff005d435980 (pid 1180)
         ino 18983427, on dev da1s1d

0xffffff0015a1b6a8: tag ufs, type VLNK
     usecount 0, writecount 0, refcount 1 mountedhere 0
     flags (VI_DOOMED)
      lock type ufs: EXCL (count 1) by thread 0xffffff005d435980 (pid 1180)
         ino 19058341, on dev da1s1d

WARNING this is not complete!
db> alltrace
--- syscall (93, FreeBSD ELF64, select), rip = 0x800d4c14c, rsp = 
0x7fffffffc2d8
, rbp = 0x1 ---

Tracing command rsync pid 1180 tid 100267 td 0xffffff005d435980
sched_switch() at sched_switch+0x11f
mi_switch() at mi_switch+0x14c
turnstile_wait() at turnstile_wait+0x272
_mtx_lock_sleep() at _mtx_lock_sleep+0xa6
msleep() at msleep+0x2a6
acquire() at acquire+0x76
debuglockmgr() at debuglockmgr+0x500
vop_stdlock() at vop_stdlock+0x3e
VOP_LOCK_APV() at VOP_LOCK_APV+0x49
VOP_LOCK_APV() at VOP_LOCK_APV+0x49
debug_vn_lock() at debug_vn_lock+0x8f
dqsync() at dqsync+0x95
dqrele() at dqrele+0x68
ufs_reclaim() at ufs_reclaim+0x100
VOP_RECLAIM_APV() at VOP_RECLAIM_APV+0x3d
vgonel() at vgonel+0x12f
vnlru_free() at vnlru_free+0x29f
getnewvnode() at getnewvnode+0x2ca
ffs_vget() at ffs_vget+0xe9
ffs_valloc() at ffs_valloc+0x109
ufs_makeinode() at ufs_makeinode+0x6e
ufs_symlink() at ufs_symlink+0x3f
VOP_SYMLINK_APV() at VOP_SYMLINK_APV+0x65
kern_symlink() at kern_symlink+0x3d3
syscall() at syscall+0x642
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (57, FreeBSD ELF64, symlink), rip = 0x8006b93ec, rsp = 
0x7fffffffb7d
8, rbp = 0xa64388 ---
Tracing command sh pid 1178 tid 100202 td 0xffffff00601f8be0
sched_switch() at sched_switch+0x11f
mi_switch() at mi_switch+0x14c
sleepq_wait_sig() at sleepq_wait_sig+0xe
cv_wait_sig() at cv_wait_sig+0x17f
kern_select() at kern_select+0xbd4
select() at select+0x3e
syscall() at syscall+0x642
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (93, FreeBSD ELF64, select), rip = 0x4165ac, rsp = 
0x7fffffffe948, r
bp = 0x7fffffffe970 ---

Tracing command adjkerntz pid 271 tid 100195 td 0xffffff0062431000
sched_switch() at sched_switch+0x11f
mi_switch() at mi_switch+0x14c
sleepq_wait_sig() at sleepq_wait_sig+0xe
msleep() at msleep+0x21b
kern_sigsuspend() at kern_sigsuspend+0xb1
sigsuspend() at sigsuspend+0x40
syscall() at syscall+0x642
Xfast_syscall() at Xfast_syscall+0xa8
--- syscall (341, FreeBSD ELF64, sigsuspend), rip = 0x80067a18c, rsp = 
0x7ffffff
fed28, rbp = 0xfffffffffffff1f0 ---
(null)() at 0x80067a18c
*** error reading from address fffffffffffff1f8 ***

I will try to provide a full alltrace output tomorrow.

Thanks,
-- 
Attila Nagy                                   e-mail: Attila.Nagy@fsn.hu
Adopt a directory on our free software         phone: +3630 306 6758
server! http://www.fsn.hu/?f=brick



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?436B8918.9060105>