From owner-freebsd-stable@FreeBSD.ORG Sat Oct 27 21:07:51 2012 Return-Path: Delivered-To: stable@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [69.147.83.52]) by hub.freebsd.org (Postfix) with ESMTP id 5FBFFA97; Sat, 27 Oct 2012 21:07:51 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from mail-la0-f54.google.com (mail-la0-f54.google.com [209.85.215.54]) by mx1.freebsd.org (Postfix) with ESMTP id 9DA7D8FC14; Sat, 27 Oct 2012 21:07:50 +0000 (UTC) Received: by mail-la0-f54.google.com with SMTP id e12so3875186lag.13 for ; Sat, 27 Oct 2012 14:07:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:reply-to:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type; bh=fJpPlXpVuxUiK8L40K3oZf9vE5jFjuJupdTkvHVOUbk=; b=mHPYB2V13B6MLcs5MFbUbhv08aFOzlVSTdQNMsencTyMRSje3pYH5B9e7eYAzAfPZy uYwm8pc/3Mxl4L50OdiWl1uF9w321wEuuetWpPK3I+faDCc7j0OFnwqzTTQFqE1w2dVM DeL+RomPUnxduRjyVsFekifzOTmHDHcluaimTdbH1uaycJEEyHhHHMhyxtFiZdX3tFx4 B5xzEzUvceXgMAmk0RNr6+zEqSe6evm/WJXPZTbRYZo6SvxqenyBdvc3Gi6WCpbtjQCv cnpubSnM0ZatwVTndQzzkIoZR5l8tpDP6zQoVc1VtRV9uOMWGNGsHDo8iEoPv+tyN/OV sbvQ== MIME-Version: 1.0 Received: by 10.152.123.103 with SMTP id lz7mr23348684lab.21.1351372069237; Sat, 27 Oct 2012 14:07:49 -0700 (PDT) Sender: asmrookie@gmail.com Received: by 10.112.30.37 with HTTP; Sat, 27 Oct 2012 14:07:49 -0700 (PDT) In-Reply-To: References: <5022840B.3060708@omnilan.de> <5048C6D1.8020007@omnilan.de> Date: Sat, 27 Oct 2012 22:07:49 +0100 X-Google-Sender-Auth: NNv21HsIrAxkHRxk2IkZmS-fXeI Message-ID: Subject: Re: lock violation in unionfs (9.0-STABLE r230270) From: Attilio Rao To: Harald Schmalzbauer Content-Type: text/plain; charset=UTF-8 Cc: stable@freebsd.org, daichi@freebsd.org, Pavel Polyakov X-BeenThere: freebsd-stable@freebsd.org X-Mailman-Version: 2.1.14 Precedence: list Reply-To: attilio@FreeBSD.org List-Id: Production branch of FreeBSD source code List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Sat, 27 Oct 2012 21:07:51 -0000 On Sat, Oct 27, 2012 at 9:46 PM, Attilio Rao wrote: > On Sat, Sep 8, 2012 at 12:48 AM, Attilio Rao wrote: >> On Thu, Sep 6, 2012 at 4:52 PM, Harald Schmalzbauer >> wrote: >>> schrieb Attilio Rao am 09.08.2012 20:26 (localtime): >>>> On 8/8/12, Harald Schmalzbauer wrote: >>>>> schrieb Pavel Polyakov am 06.03.2012 11:20 (localtime): >>>>>>>> mount -t unionfs -o noatime /usr /mnt >>>>>>>> >>>>>>>> insmntque: mp-safe fs and non-locked vp: 0xfffffe01d96704f0 is not >>>>>>>> exclusive locked but should be >>>>>>>> KDB: enter: lock violation >>>>>>> Pavel, >>>>>>> can you give a spin to this patch?: >>>>>>> http://www.freebsd.org/~attilio/unionfs_missing_insmntque_lock.patch >>>>>>> >>>>>>> I think that the unlocking is due at that point as the vnode lock can >>>>>>> be switch later on. >>>>>>> >>>>>>> Let me know what you think about it and what the test does. >>>>>> Thanks! >>>>>> This patch fixes the problem with lock violation. Sorry I've tested it so >>>>>> late. >>>>> Hello, >>>>> >>>>> this patch still applies cleanly to RELENG_9_1. Was there another fix >>>>> for the issue or has it just not been PR-sent and thus forgotten? >>>> Can you and Pavel try the attached patch? Unfortunately I had no time >>>> to test it, I just made in 5 free mins from a non-FreeBSD workstation, >>> >>> Sorry, couldn't test earlier, but now I did: >>> With this patch applied the machine hangs without debug kernel and the >>> latter gives the following panic: >>> System call nmount returning with the following locks held: >>> exclusive lockmgr ufs (ufs) r = 0 (0xc5438278) locked @ >>> src/sys/fs/unionfs/union_vnops.c:1938 >>> panic: witness_warn >>> cpuid = 0 >>> KDB: stack backtrace: >>> db_trace_self_wrapper(c0a04f7f,c0c112c4,d1de3bb4,c097aa8c,fc,...) at >>> db_trace_self_wrapper+0x26 >>> kdb_backtrace(c0a4965f,0,c09c2ede3c1c,0,...) at kdb_backtrace+0x2a >>> witness_warn(2,0,c0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4 >>> syscall(d1de3d08) ar syscall+0x415 >>> Xint0x80_syscall() at Xint0x80_syscall+0x21 >>> --- syscall (0, FreeBSD ELF32, nosys), eip = 0x280b883f,esp = >>> 0xbfbfe46c, ebp = 0xbfbfede8 --- >>> KDB: enter: panic >>> [ thread pid 86 tid 100054 ] >>> Stopped ad kdb_enter+0x3a: movl $0,kdb_why >>> db> bt >>> Tracing pid 86 tid 100054 td 0xc541b000 >>> kdb_enter(c0a00d16,c0a09130,0,0,0,...) at panix+0x190 >>> witness_warn(2,0,x0a4ac34,c0a0990a,286,...) at witness_warn+0x1e4 >>> syscall(d1de3d08) at syscall+0x415 >>> Xint0x80_syscall() at Xint0x80_syscall+0x21 >>> >>> Hmm, I guess I forgot to install kernel debug symbols... >>> Coming back if I have more >> >> Unfortunately unionfs does very wrong things with the insmntque() locking. >> It basically expects the vnode to return locked in the same way >> requested by the precedent namei() (when that happens) but when you do >> insmntque() you can only have an LK_EXCLUSIVE lock on the vnode. > > Hello, > the following patch should workout the issues around unionfs_nodeget() a bit: > http://www.freebsd.org/~attilio/unionfs_nodeget2.patch > > Unfortunately unionfs code is rather messy in the lookup path about > locking requirements so follow what it needs to be done there is a bit > difficult. > I have no way to test this patch, so it is just test-compiled at the > moment, but I would need that you also test lookup path (so directory > "ls", find(1) on the whole unionfs volume, etc.) to validate it > someway. On a second thought, I think that locking in lookup (and also other operations) is so fragile and difficult to follow that it makes all vnops real locking landmines. I think that the following patch fixes the insmntque insertion and follows the old approach well enough to be committed separately: http://www.freebsd.org/~attilio/unionfs_nodeget3.patch However I strongly suggest that someone does review & sweep all the locking from nodeget and related functions removing the tedious lkflags conditional, reinforcing and expliciting locking rules within functions, checking out for races (which I'm sure are quite a few by the fact that vn lock gets dropped indiscriminately in many points) and possibly review the highly proficient usage of LK_RETRY that I'm sure is not always safe. All these steps should really be carried out separately. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein