From: Attilio Rao
To: Jeff Roberson
Cc: current@freebsd.org, Peter Jeremy
Date: Wed, 12 May 2010 22:55:02 +0200
Subject: Re: LOR: ufs vs bufwait

2010/5/12 Jeff Roberson:
> On Wed, 12 May 2010, Ulrich Spörlein wrote:
>
>> On Mon, 10.05.2010 at 22:53:32 +0200, Attilio Rao wrote:
>>>
>>> 2010/5/10 Peter Jeremy:
>>>>
>>>> On 2010-May-08 12:20:05 +0200, Ulrich Spörlein
>>>> wrote:
>>>>>
>>>>> This LOR is also not yet listed on the LOR page, so I guess it's rather
>>>>> new. I do use SUJ.
>>>>>
>>>>> lock order reversal:
>>>>> 1st 0xc48388d8 ufs (ufs) @ /usr/src/sys/kern/vfs_lookup.c:502
>>>>> 2nd 0xec0fe304 bufwait (bufwait) @ /usr/src/sys/ufs/ffs/ffs_softdep.c:11363
>>>>> 3rd 0xc49e56b8 ufs (ufs) @ /usr/src/sys/kern/vfs_subr.c:2091
>>>>
>>>> I'm seeing exactly the same LOR (and subsequent deadlock) on a recent
>>>> -current without SUJ.
>>>
>>> I think this LOR has been reported for a long time.
>>> The deadlock may be new and somehow related to the vm_page_lock work
>>> (if not SUJ).
>>
>> I was not able to reproduce this with a kernel from before SUJ; a kernel
>> from just after SUJ went in shows this "deadlock" or infinite loop ...
>>
>> Now it might be that the SUJ kernel only increases the pressure so it
>> happens during a system's uptime. It does not seem directly related to
>> actually using SUJ on a volume, as I could reproduce it with SU only,
>> too.
>>
>> I will try to get a hang not involving GELI and also re-do my tests when
>> the volumes have neither SUJ nor SU enabled, which led to 10-20s "hangs"
>> of the system IIRC. It seems SU/SUJ then only prolongs these hangs ad
>> infinitum.
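To make the reported LOR concrete: a ufs vnode lock was held when bufwait was taken (ffs_softdep.c), and a second ufs vnode lock was then taken while bufwait was still held (vfs_subr.c), so the two classes were acquired in both orders. The C sketch below is only a toy model of that idea, loosely inspired by what WITNESS reports, not its actual implementation; the enum, lor_acquire(), lor_release() and lor_reset() are hypothetical names, and it tracks a single thread only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lock classes named after the two classes in the report. */
enum lock_class { LC_UFS, LC_BUFWAIT, LC_NCLASSES };

static const char *const class_name[LC_NCLASSES] = { "ufs", "bufwait" };

/* order[a][b] becomes true once class b was taken while class a was held. */
static bool order[LC_NCLASSES][LC_NCLASSES];

/* Lock classes currently held, innermost last (one thread only). */
#define MAX_HELD 8
static enum lock_class held[MAX_HELD];
static int nheld;

static void lor_reset(void)
{
    memset(order, 0, sizeof order);
    nheld = 0;
}

/*
 * Record taking a lock of class c. Returns true, and prints a message,
 * if some held class h was already seen being acquired after c, i.e.
 * the established order c -> h is now being violated by h -> c.
 */
static bool lor_acquire(enum lock_class c)
{
    bool reversal = false;

    assert(nheld < MAX_HELD);
    for (int i = 0; i < nheld; i++) {
        enum lock_class h = held[i];

        if (h == c)
            continue;   /* same class held twice; not modeled here */
        if (order[c][h]) {
            printf("lock order reversal: %s held while taking %s\n",
                class_name[h], class_name[c]);
            reversal = true;
        } else {
            order[h][c] = true;
        }
    }
    held[nheld++] = c;
    return reversal;
}

/* Release the most recently taken lock. */
static void lor_release(void)
{
    assert(nheld > 0);
    nheld--;
}
```

Feeding the report's sequence (ufs, then bufwait, then a second ufs vnode) through lor_acquire() returns false, false, true, and the third call prints "lock order reversal: bufwait held while taking ufs". Orders are keyed by class rather than by lock instance, which is why two different ufs vnodes (0xc48388d8 and 0xc49e56b8) still count as the same class. A reported reversal only indicates a potential deadlock, not an actual one.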
>
> I think Peter Holm also saw this once while we were testing SUJ and
> reproduced ~30 second hangs with stock sources. At this point we need to
> brainstorm ideas for adding debugging instrumentation and come up with the
> quickest possible repro.
>
> It would probably be good to add some KTR tracing and log that when it
> wedges. The core I looked at was hung in bufwait. Is there any cpu
> activity or io activity when things hang? You'll probably have to keep
> iostat/vmstat in memory to find out, so they don't try to fault in pages
> once things are hung.

I think I also have some reports of a deadlock on unmount -f (not specific
to UFS) that still look to me like the same buffer cache async deadlock.
I will forward you the traces in a separate e-mail (Peter managed to
reproduce it with KTR enabled).

Attilio

--
Peace can only be achieved by understanding - A. Einstein
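Jeff's point about keeping the monitoring tools resident can be illustrated with a tiny self-contained monitor that locks its own pages into RAM with mlockall(2) before the hang, so sampling never needs a page-in once I/O wedges. This is a hypothetical sketch, not a change to iostat or vmstat: pin_monitor(), format_sample() and monitor() are made-up names, and getloadavg(3) merely stands in for the disk and VM counters those tools actually report. mlockall typically requires superuser privileges or a sufficient locked-memory limit.

```c
#define _DEFAULT_SOURCE /* expose getloadavg() in glibc's <stdlib.h> */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Lock every current and future page of this process into RAM so the
 * monitor keeps running without page-ins once I/O wedges.
 * Returns 0 on success, -1 (after printing the reason) on failure.
 */
static int pin_monitor(void)
{
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
        perror("mlockall");
        return -1;
    }
    return 0;
}

/* Format one sample into a caller-supplied buffer; returns snprintf's result. */
static int format_sample(char *buf, size_t len, unsigned long seq,
    const double load[3])
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "%lu load %.2f %.2f %.2f\n",
        seq, load[0], load[1], load[2]);
}

/*
 * Take nsamples samples, interval seconds apart, writing each line to
 * stderr with write(2). stderr should be a console or serial line and
 * never a file on the affected filesystem.
 */
static void monitor(unsigned long nsamples, unsigned int interval)
{
    char buf[128];
    double load[3];

    for (unsigned long i = 0; i < nsamples; i++) {
        if (getloadavg(load, 3) != 3)
            load[0] = load[1] = load[2] = -1.0; /* not available */
        int n = format_sample(buf, sizeof buf, i, load);
        if (n > 0 && (size_t)n < sizeof buf) {
            ssize_t w = write(STDERR_FILENO, buf, (size_t)n);
            (void)w; /* nothing useful to do if a console write fails */
        }
        if (i + 1 < nsamples)
            sleep(interval);
    }
}
```

The idea is to call pin_monitor() once, long before the hang, and then leave monitor() running on a console; the fixed-size stack buffer and raw write(2) keep the sampling loop from allocating memory after it has been pinned.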