Date:      Wed, 12 May 2010 22:55:02 +0200
From:      Attilio Rao <attilio@freebsd.org>
To:        Jeff Roberson <jroberson@jroberson.net>
Cc:        current@freebsd.org, Peter Jeremy <peterjeremy@acm.org>
Subject:   Re: LOR: ufs vs bufwait
Message-ID:  <r2s3bbf2fe11005121355k43a44e22tf4c0b6849024afd1@mail.gmail.com>
In-Reply-To: <alpine.BSF.2.00.1005121040390.1398@desktop>
References:  <20100508102005.GB1867@elmar.spoerlein.net> <20100510061057.GA93038@server.vk2pj.dyndns.org> <u2h3bbf2fe11005101353k493f3ca3v7c1216e840820c67@mail.gmail.com> <20100512141154.GF88504@acme.spoerlein.net> <alpine.BSF.2.00.1005121040390.1398@desktop>

2010/5/12 Jeff Roberson <jroberson@jroberson.net>:
> On Wed, 12 May 2010, Ulrich Spörlein wrote:
>
>> On Mon, 10.05.2010 at 22:53:32 +0200, Attilio Rao wrote:
>>>
>>> 2010/5/10 Peter Jeremy <peterjeremy@acm.org>:
>>>>
>>>> On 2010-May-08 12:20:05 +0200, Ulrich Spörlein <uqs@spoerlein.net>
>>>> wrote:
>>>>>
>>>>> This LOR also is not yet listed on the LOR page, so I guess it's rather
>>>>> new. I do use SUJ.
>>>>>
>>>>> lock order reversal:
>>>>> 1st 0xc48388d8 ufs (ufs) @ /usr/src/sys/kern/vfs_lookup.c:502
>>>>> 2nd 0xec0fe304 bufwait (bufwait) @
>>>>> /usr/src/sys/ufs/ffs/ffs_softdep.c:11363
>>>>> 3rd 0xc49e56b8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2091
>>>>
>>>> I'm seeing exactly the same LOR (and subsequent deadlock) on a recent
>>>> -current without SUJ.
>>>
>>> I think this LOR was first reported a long time ago.
>>> The deadlock may be new and somehow related to the vm_page_lock work
>>> (if not SUJ).
>>
>> I was not able to reproduce this with a kernel prior to SUJ; a kernel
>> from just after SUJ went in shows this "deadlock" or infinite loop ...
>>
>> Now it might be that the SUJ kernel only increases the pressure so it
>> happens during a system's uptime. It does not seem directly related to
>> actually using SUJ on a volume, as I could reproduce it with SU only,
>> too.
>>
>> I will try to get a hang not involving GELI and also re-do my tests when
>> the volumes have neither SUJ nor SU enabled, which led to 10-20s "hangs"
>> of the system IIRC. It seems SU/SUJ then only prolongs these hangs ad
>> infinitum.
>
> I think Peter Holm also saw this once while we were testing SUJ and
> reproduced ~30 second hangs with stock sources.  At this point we need to
> brainstorm ideas for adding debugging instrumentation and come up with the
> quickest possible repro.
>
> It would probably be good to add some KTR tracing and log that when it
> wedges.  The core I looked at was hung in bufwait.  Is there any cpu
> activity or io activity when things hang?  You'll probably have to keep
> iostat/vmstat in memory to find out so they don't try to fault in pages once
> things are hung.

I think I also have some reports of a deadlock on unmount -f (not
specific to UFS) that seems to me to be the same buffer cache async
deadlock.
I will forward you the traces in a separate e-mail (Peter managed to
reproduce it with KTR on).

Attilio


-- 
Peace can only be achieved by understanding - A. Einstein


