From: "Attilio Rao"
Sender: asmrookie@gmail.com
Date: Tue, 22 Jul 2008 18:54:04 +0200
To: "Kostik Belousov"
Message-ID: <3bbf2fe10807220954q60ee6747x40076e39884daf19@mail.gmail.com>
In-Reply-To: <20080722154825.GZ17123@deviant.kiev.zoral.com.ua>
References: <4884F992.7090008@cs.duke.edu> <20080722154825.GZ17123@deviant.kiev.zoral.com.ua>
Cc: freebsd-current@freebsd.org, Andrew Gallatin
Subject: Re: reproducible "panic: share->excl"
List-Id: Discussions about the use of FreeBSD-current

2008/7/22, Kostik Belousov:
> On Mon, Jul 21, 2008 at 05:03:14PM -0400, Andrew Gallatin wrote:
> > I can panic today's -current reliably (or hang it with
> > WITNESS/INVARIANTS disabled). When it crashes, I see
> > the appended panic messages.
> >
> > It seems to be 100% reproducible on my box (AMD64 x2,
> > 512MB ram, UFS2). If anybody savvy in this area would
> > like to reproduce it, I've left the program at ~gallatin/ahunt.c
> > on freefall. Compile it, and run it as:
> > ./a.out -mmbfileinit -madvise=/var/tmp/zot -random -size=95536
> > -touch=4096 -rewrite=2
> >
> >
> > Cheers,
> >
> > Drew
> >
> > PS: Here is a serial console log from the panic:
> > ...
> >
> > login: shared lock of (lockmgr) ufs @ kern/vfs_subr.c:2044
> > while exclusively locked from kern/vfs_vnops.c:593
> > panic: share->excl
> > cpuid = 1
> > KDB: enter: panic
> > [thread pid 1702 tid 100149 ]
> > Stopped at kdb_enter+0x3d: movq $0,0x639958(%rip)
> > db> tr
> > Tracing pid 1702 tid 100149 td 0xffffff000d08f000
> > kdb_enter() at kdb_enter+0x3d
> > panic() at panic+0x176
> > witness_checkorder() at witness_checkorder+0x137
> > __lockmgr_args() at __lockmgr_args+0xc74
> > ffs_lock() at ffs_lock+0x8c
> > VOP_LOCK1_APV() at VOP_LOCK1_APV+0x9b
> > _vn_lock() at _vn_lock+0x47
> > vget() at vget+0x7b
> > vnode_pager_lock() at vnode_pager_lock+0x146
> > vm_fault() at vm_fault+0x1e2
> > trap_pfault() at trap_pfault+0x128
> > trap() at trap+0x395
> > calltrap() at calltrap+0x8
> > --- trap 0xc, rip = 0xffffffff8079f2bd, rsp = 0xfffffffe58c2f7b0, rbp =
> > 0xfffffffe58c2f830 ---
> > copyin() at copyin+0x3d
> > ffs_write() at ffs_write+0x2f8
> > VOP_WRITE_APV() at VOP_WRITE_APV+0x10b
> > vn_write() at vn_write+0x23f
> > dofilewrite() at dofilewrite+0x85
> > --More--
> >
> > kern_writev() at kern_writev+0x60
> > write() at write+0x54
> > syscall() at syscall+0x1dd
> > Xfast_syscall() at Xfast_syscall+0xab
> > --- syscall (4, FreeBSD ELF64, write), rip = 0x8007296ec, rsp =
> > 0x7fffffffe158, rbp = 0x7fffffffe210 ---
> > db> show locks
> > exclusive sleep mutex vnode interlock r = 0 (0xffffff000d0dc0c0) locked
> > @ vm/vnode_pager.c:1199
> > exclusive sx user map r = 0 (0xffffff000d054360) locked @ vm/vm_map.c:3115
> > exclusive lockmgr bufwait r = 0 (0xfffffffe5047f278) locked @
> > kern/vfs_bio.c:1783
> > exclusive lockmgr ufs r = 0 (0xffffff000d0dc098) locked @
> > kern/vfs_vnops.c:593
> > db>
> >
>
> Essentially, you tried to do the write of the part of the region mmaped
> from the file, to the file. The VOP_WRITE() is called with exclusively
> locked vnode, while fault handler tried to lock the vnode in shared mode
> to page in.
>
> The following change fixed it for me.
> Attilio, would it make sense to consider LK_CANRECURSE | LK_SHARED as
> a request for the exclusive lock when the current thread already holds the
> exclusive lock instead ? I think this would be a proper solution.

I don't like this kind of magic and encoding in lockmgr.
I think the better thing to do here is to recurse on the exclusive lock,
as you pass to vget().
Also note that without WITNESS the code will return EDEADLK in this case,
whereas traditionally the lockmgr lock would have been silently
downgraded; as you can expect, that is a very dangerous practice.

Thanks,
Attilio

-- 
Peace can only be achieved by understanding - A. Einstein
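
For readers following the thread: the pattern Kostik describes, a write()
whose source buffer is a shared mapping of the very file being written, can
be sketched in user space roughly as below. This is not ahunt.c (its source
is not shown in the thread); the function name write_mapped_region_back(),
the path and the length are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from ahunt.c.
 * Creates `path`, sizes it to `len` bytes, maps it MAP_SHARED, and then
 * write()s the mapped bytes back into the same file at offset 0.
 * Returns the number of bytes written, or -1 on error.
 */
ssize_t
write_mapped_region_back(const char *path, size_t len)
{
	assert(path != NULL && len > 0);

	int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return (-1);

	/* Give the file a size so the whole mapping is backed by it. */
	if (ftruncate(fd, (off_t)len) != 0) {
		close(fd);
		unlink(path);
		return (-1);
	}

	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		close(fd);
		unlink(path);
		return (-1);
	}

	/*
	 * The source buffer is backed by the file being written.  If its
	 * pages are not resident, the kernel's copy from user space faults
	 * and has to page in from the same vnode that the write path
	 * already holds locked (the copyin -> vm_fault -> vget path in the
	 * trace above).
	 */
	ssize_t n = write(fd, p, len);

	munmap(p, len);
	close(fd);
	unlink(path);
	return (n);
}
```

On a kernel without the bug, a call such as
write_mapped_region_back("/var/tmp/zot.sketch", 65536) is expected to return
65536. Whether this minimal form alone is enough to trigger the panic on the
-current of that date is not claimed here.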