Date:      Wed, 4 May 2011 13:05:05 +0400
From:      Sergey Kandaurov <pluknet@gmail.com>
To:        Garrett Cooper <yanegomi@gmail.com>
Cc:        Kirk McKusick <mckusick@mckusick.com>, FreeBSD Current <freebsd-current@freebsd.org>
Subject:   Re: Nasty non-recursive lockmgr panic on softdep only enabled UFS partition when filesystem full
Message-ID:  <BANLkTikEygiJKefTOjcY4sKdab8NN2jykQ@mail.gmail.com>
In-Reply-To: <BANLkTikAQ6Jz4Jbjxh51iA-cjCYmdx1mSg@mail.gmail.com>
References:  <BANLkTik4=O_1PWB2GzGzY=m51dG-Kbhe%2BQ@mail.gmail.com> <201105040559.p445xEJ5024585@chez.mckusick.com> <BANLkTikAQ6Jz4Jbjxh51iA-cjCYmdx1mSg@mail.gmail.com>

On 4 May 2011 10:42, Garrett Cooper <yanegomi@gmail.com> wrote:
> On Tue, May 3, 2011 at 10:59 PM, Kirk McKusick <mckusick@mckusick.com> wrote:
>>> Date: Tue, 3 May 2011 22:40:26 -0700
>>> Subject: Nasty non-recursive lockmgr panic on softdep only enabled UFS
>>>  partition when filesystem full
>>> From: Garrett Cooper <yanegomi@gmail.com>
>>> To: Jeff Roberson <jeff@freebsd.org>,
>>>         Marshall Kirk McKusick <mckusick@mckusick.com>
>>> Cc: FreeBSD Current <freebsd-current@freebsd.org>
>>>
>>> Hi Jeff and Dr. McKusick,
>>>     Ran into this panic when /usr ran out of space doing a make
>>> universe on amd64/r221219 (it took ~15 minutes for the panic to occur
>>> after the filesystem ran out of space -- wasn't quite sure what it was
>>> doing at the time):
>>>
>>> ...
>>>
>>>     Let me know what other commands you would like for me to run in kgdb.
>>> Thanks,
>>> -Garrett
>>
>> You did not indicate whether you are running an 8.X system or a 9-current
>> system. It would be helpful to know that.
>
> I've actually been running CURRENT for a few years now, but you're right --
> I didn't mention that part.
>
>> Jeff thinks that there may be a potential race in the locking code for
>> softdep_request_cleanup. If so, this patch for 9-current should fix it:
>>
>> Index: ffs_softdep.c
>> ===================================================================
>> --- ffs_softdep.c       (revision 221385)
>> +++ ffs_softdep.c       (working copy)
>> @@ -11380,7 +11380,8 @@
>>                                  continue;
>>                          }
>>                          MNT_IUNLOCK(mp);
>> -                        if (vget(lvp, LK_EXCLUSIVE | LK_INTERLOCK, curthread)) {
>> +                        if (vget(lvp, LK_EXCLUSIVE | LK_NOWAIT | LK_INTERLOCK,
>> +                            curthread)) {
>>                                  MNT_ILOCK(mp);
>>                                  continue;
>>                          }
>>

FYI,
I was playing with head (without the above patch) to reproduce the panic and got
this LOR when the filesystem eventually filled up.
I'm not sure the patch would fix the panic, but I think it should at
least fix the LOR.

kernel: pid 66153 (dd), uid 0 inumber 4 on /mnt: filesystem full
lock order reversal:
 1st 0xfffffe001d7d3310 ufs (ufs) @ /usr/src/sys/kern/vfs_vnops.c:614
 2nd 0xffffff807ba8a800 bufwait (bufwait) @ /usr/src/sys/kern/vfs_bio.c:2658
 3rd 0xfffffe001ade7588 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2126
KDB: stack backtrace:
db_trace_self_wrapper() at 0xffffffff802d9eba = db_trace_self_wrapper+0x2a
kdb_backtrace() at 0xffffffff80475d17 = kdb_backtrace+0x37
_witness_debugger() at 0xffffffff8048b4fe = _witness_debugger+0x2e
witness_checkorder() at 0xffffffff8048c7a7 = witness_checkorder+0x807
__lockmgr_args() at 0xffffffff80427553 = __lockmgr_args+0xd63
ffs_lock() at 0xffffffff806578fc = ffs_lock+0x9c
VOP_LOCK1_APV() at 0xffffffff806f285f = VOP_LOCK1_APV+0xbf
_vn_lock() at 0xffffffff804e87c7 = _vn_lock+0x57
vget() at 0xffffffff804dbb5b = vget+0x7b
softdep_request_cleanup() at 0xffffffff80649f31 = softdep_request_cleanup+0x311
ffs_alloc() at 0xffffffff80630b64 = ffs_alloc+0x134
ffs_balloc_ufs2() at 0xffffffff8063426c = ffs_balloc_ufs2+0x11ac
ffs_write() at 0xffffffff8065889f = ffs_write+0x22f
VOP_WRITE_APV() at 0xffffffff806f33dd = VOP_WRITE_APV+0x14d
vn_write() at 0xffffffff804e9a42 = vn_write+0x2a2
dofilewrite() at 0xffffffff8048df25 = dofilewrite+0x85
kern_writev() at 0xffffffff8048f740 = kern_writev+0x60
write() at 0xffffffff8048f845 = write+0x55
syscallenter() at 0xffffffff80483cbb = syscallenter+0x1cb
syscall() at 0xffffffff806abaf0 = syscall+0x60
Xfast_syscall() at 0xffffffff8069670d = Xfast_syscall+0xdd
--- syscall (4, FreeBSD ELF64, write), rip = 0x8009438fc, rsp = 0x7fffffffda68, rbp = 0xa00000 ---

-- 
wbr,
pluknet